HTTP Explained: The HyperText Transfer Protocol, Part 1
This is part of a larger series titled “How To Program Anything: The Internet”
The HyperText Transfer Protocol
Many of us these days don’t really pay much attention to those enigmatic prefixes on the URLs that point us to our favorite webpages; they’ve been relegated to a bit of the modern wizardry we take for granted. In fact, many browsers now hide “http://” or “https://” from us entirely as we browse the web! However, when using a web browser to browse the “World Wide Web”, as Tim Berners-Lee titled his seminal hypertext project, we use the http URI scheme and its resultant protocol all the time, for just about everything. (Don’t know what a URI or URI scheme is? I have a tutorial explaining all things URI, URL, URN, and their respective components at Anatomy of a URI.) But how does it work? What does it mean?
What is Hypertext?
Hypertext is an abstract concept that’s integrally related to “traditional” text. “Hyper-” as a prefix in English comes from the Greek “ὑπερ” (pronounced, as given in classical IPA: /hupér/, or in more modern IPA: /hypér/). In Latin you might think of the word “super”, and in something more Germanic you might consider the prefix “über,” such as in the übermensch of Nietzschean philosophy. The relation is in fact similar to Nietzsche’s use of it: where the übermensch is a higher man who has transcended simple humanity, hypertext is text that has transcended, or overcome, the constraints of written text, most notably its linear nature.
The key way of achieving this is by being able to embed “hyperlinks” (as they’ve come to be known) in content both textual and graphical, each of which bears a relation to another piece of content. By interacting with this embedded link the reader can immediately access hopefully pertinent information, and in some artistic uses other “paths”, related to what is being discussed at hand. This ability to “interact” with the text and media, even if it is a simple touch, selection, or “click”, is imperative to the ability to embed such links in the text, and thus hypertext as we know it must be displayed on devices capable of such interaction.
Take, for example, the prototypical “memex” invented by Vannevar Bush (though the name “memex” was coined after publication) in his essay Mechanization and the Record in 1939 (later reworked and expanded as an article published in The Atlantic in July 1945 titled “As We May Think”). Vannevar expressed concern for science’s destructive achievements at the time (contemporary events were the atomic bombings of Hiroshima and Nagasaki, something Vannevar had been involved in through the initiation and early administration of The Manhattan Project) and looked forward to post-war scientists focusing on furthering understanding and other such goodwill. That shift was already leading to an information explosion, and would lead to more of one, the nature of which could hinder or hamper scientific pursuit. He imagined basically a device like a piece of furniture that could associate various records (microfilms) together using relationships, where people could peruse and produce “trails” by going from one record to another through relational links. He also imagined some way of transferring information from one memex to another. His hope was that “wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.” (“As We May Think”) Not only that, he proposed a certain collective memory using the machine, making knowledge more accessible. Sound familiar? (Think: Wikipedia) The idea is that you could use codes to tell the machine to jump to and pull up information at “random”, or in the construction of a trail. Sounds like a hyperlink to me.
Things progressed from 1945 at a pretty rapid pace, and on December 9, 1968 Douglas Engelbart put on “The Mother of All Demos” at the ACM/IEEE – Computer Society’s Fall Joint Computer Conference in San Francisco. This 90 minute live presentation demonstrated a computer mouse, bitmap displays, and on top of those windows, graphics, hypertext, video conferencing, dynamic file linking, revision control, and a collaborative real-time editor. All of this was demonstrated in a single system called the NLS (the oN-Line System). This demonstration initiated the work Xerox did towards Graphical User Interfaces and highly influenced both the Apple Macintosh and Microsoft Windows operating systems in the 1980s. Remember, this demonstration occurred in 1968. Funny enough, before the demonstration much of the computer science community viewed Douglas as somewhat of a “crackpot”. (The Mother of All Demos - 150 years ahead of its time).
With these beautiful machines coming into being, a machine like a memex could potentially be made! That’s why in 1989 Tim Berners-Lee, a scientist at CERN, proposed a new project he called “WorldWideWeb”. The idea was that we could build a system that offered simple, immediate information-sharing among physicists working at academic institutions. He wrote:
HyperText is a way to link and access information of various kinds as a web of nodes in which the user can browse at will. Potentially, HyperText provides a single user-interface to many large classes of stored information, such as reports, notes, data-bases, computer documentation and on-line systems help. We propose the implementation of a simple scheme to incorporate several different servers of machine-stored information already available at CERN, including an analysis of the requirements for information access needs by experiments… A program which provides access to the hypertext world we call a browser. ― T. Berners-Lee, R. Cailliau, 12 November 1990, CERN
Finally in 1992, Lynx was born as an early Internet web browser. Its ability to provide hypertext links within documents that could reach into documents anywhere on the Internet began the creation of the Web on the Internet. In fact, Lynx was the first web browser I ever used, as part of the ACLIN terminal system I could dial into at my local library. Actually, I used the ACLIN library 800 number system to telnet in, launch Lynx, and then navigate to MUCKs and MUDs of interest… it was my only connection to the outside world before AOL!
Currently in 2017, after Douglas’ ideas and inventions trickled into the realm of personal computing from the Xerox Alto to the immensely popular Apple Macintosh, the interactivity hypertext requires is generally achieved with computers: the screens display the hypertext, and the user uses some input mechanism, such as a touchscreen or mouse, to select and interact with the embedded link. And the links themselves serve pretty much the function a memex was to serve, pulling up information from various disparate sources across the world in a sort of collective informational intelligence called the World Wide Web.
Note however! Don’t confuse HyperText with HTML. HyperText is an abstract idea of content linked together in various ways, whereas HTML is the HyperText Markup Language. HTML is one of many means to format information into a HyperText system.
The World Wide Web is a HyperText System
As you can see then, what came about to be the World Wide Web, is the ultimate HyperText system. You are using the World Wide Web and HyperText technologies such as HTML and HTTP to even simply view this page and all its contents, including images and other wizardry. It appears Vannevar’s dream became a reality, and we are now harnessing increasingly explosive information in ways that are understandable by society and individuals.
Computer to Computer Communication
Computer networking is an extraordinarily complex and detailed topic, so it’s difficult to pinpoint how exactly to convey where protocols and various other mechanisms come in and what they do. In short, the internet as we know it generally, but not always, runs on a technology collectively called TCP/IP (Transmission Control Protocol / Internet Protocol) also called the Internet Protocol Suite. It is divided into “layers” that operate one “on top” of the other, that is, depending on the previous. It starts basically with the link layer, then the internet layer, then the transport layer, and finally the application layer. Each layer deals with linking up computers and sending messages from one node in a network to another node in the network.
Today, for HTTP, thankfully we’ll only be using and focusing on the application layer. This is the “layer” or environment where applications can simply connect to a port on another machine and “automagically” send data to that port, whereupon the said port will “automagically” receive that information and whatever program is listening on that port will process the given data. The most interactive and barebones example of this layer is a basic Telnet connection. You can “Telnet” to a particular port using your command line, type letters and return, and voila, those characters are sent to that computer’s port. HTTP works with this type of connection, and, to be shown later, you can actually establish a Telnet connection to a web server’s port (such as wunk.me:80), manually type in an HTTP request, and receive the raw HTTP response back in your terminal screen. This is essentially what your web browser does “behind the scenes.”
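To make the Telnet idea a little more concrete, here is a minimal Python sketch of doing the same thing with a plain socket. The function names build_get_request and fetch_raw are my own for illustration, and the request shown is only the bare minimum a server would typically accept, not a full treatment of the format.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request.

    This is the same text you would type by hand in a Telnet session.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",                   # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Open a TCP connection, send the request, and return the raw reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling fetch_raw("wunk.me") on a machine with network access should return the status line, headers, and HTML as one raw byte string, which is roughly what the browser receives before it renders anything.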
What is a Protocol?
You can think of a protocol in a number of ways. A protocol in terms of computer network communication is simply an agreement on the format of given data, its order of presentation, and the algorithms used to interpret that data. A protocol doesn’t necessarily concern itself with what is being communicated, although sometimes it does, but is more focused on how something is communicated. In a way, written and spoken languages are a form of protocol. The order of terms and the meanings of sets of sounds in each language are different, and unless two parties have agreed to speak the same language then, to keep it simple, no communication occurs. Or say you and your best friend decide that you want to send messages to each other, but you don’t want anyone in between to read those messages. So you decide to use an encryption method, garbling up your messages before you send them. Your friend on the receiving end has an algorithm or method of ungarbling the messages… without which the message would mean nothing. In this sense you have a protocol of encryption in your communication process.
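As a rough illustration of “an agreement on the format of given data”, here is a made-up toy protocol in Python. It is not HTTP or any real standard: both sides simply agree that every message is a 4-byte big-endian length followed by that many bytes of UTF-8 text. The function names are hypothetical.

```python
import struct


def encode_message(text: str) -> bytes:
    """Sender side: prefix the UTF-8 payload with its length (4 bytes, big-endian)."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def decode_messages(stream: bytes) -> list[str]:
    """Receiver side: walk the byte stream and recover each message in order."""
    messages = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        messages.append(stream[offset:offset + length].decode("utf-8"))
        offset += length
    return messages
```

Notice that neither function cares what the text says. A receiver that assumed a 2-byte length prefix instead would read garbage, which is the “no communication occurs” case described above.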
Protocols can also exist outside of this context as well and in fact help tell computers and intermediary hardware how to pass one message from point A to point B. In this case the agreement is on how given data will be routed from machine to machine, just like the rules a telephone operator may have followed in the early days to connect two parties.
As you can see, protocols are encapsulations of processes of communication, and in fact can be compared to programming languages in the sense that those are encapsulations of processes of computation. I find the analogy to be abstract at best, and it begins to fall apart when you consider what a programming language can do outside of the context of a protocol.
HTTP, HyperText Transfer Protocol, is made up of specific formatting rules that tell the sender or receiver various pieces of information. In HTTP a “client” sends a request to another machine. That machine, the “server”, processes the request it has received and sends a response back to the client. In this case the request and response have to do with pieces of information to be retrieved, such as files containing HTML, CSS, JS, JPGs, PNGs, etc.
Server and Client: Computer Relationships
I thought I’d take a moment to comment on the “client/server” relationship that’s talked about so often in computer networking and communication protocols. For quite a while I found the tenuous identity of whether a computer was a server or a client a very confusing thing. For one, I didn’t understand that the roles could change depending on context, and in some cases a server may act as a client, and a client as a server. Or that one computer could be both a server and a client. Let’s clear this up…
I finally realized that in terms of “server” and “client” it really comes down to how we consider a single communication. That is, every instance of a communication establishes who/what is the server, and who/what the client. In most closed conversation, particularly in HTTP where everything is either a request or a response, you send out a request for information (basically, ask a question) and the other entity answers your question with a response. The entity sending a request for information (or request for action as we’ll see) is the client in any given communication, and the entity sending a response containing information is the server in any given communication. This can be summed up in the image below:
In practice, for HTTP, you generally set up a software program called a “server”, such as Apache or nginx, or even Lighty (lighttpd), on a computer and use it to “serve” up webpages to other people. The programs that access those servers using HTTP (the browsers and REST clients) don’t generally run their own servers as well, and act mostly as clients all the time. However, when I think of server and client, I simply ask myself who is requesting information (or an action) and who is providing a response, or feedback.
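Here is a small sketch of both roles living in one Python program, using only the standard library. The handler class and the round_trip function are names I made up for the example; a real deployment would use one of the server programs mentioned above rather than this.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Plays the server role: it only ever answers requests."""

    def do_GET(self):
        body = b"Hello from the server role\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def round_trip():
    """Start a server on a free local port, then act as a client toward it."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = pick any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # From here on this same process plays the client role.
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

The same machine, and even the same process, is both server and client here. Which one it “is” depends only on which side of a given request you are looking at.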
The Many Faces of HTTP
HTTP itself has had multiple versions, usually portrayed by typing HTTP/#. The documents I’m referencing when I speak of HTTP are, much like in the URI tutorial, the RFCs that ultimately defined the second-to-latest version of HTTP (HTTP/1.1), which is in widest use today. These are RFC7230 (Message Syntax and Routing), RFC7231 (Semantics and Content), RFC7232 (Conditional Requests), RFC7233 (Range Requests), RFC7234 (Caching) and RFC7235 (Authentication). These documents are called “Requests For Comments”, and they help define the mechanics and standards of the internet. There is also an HTTP/2 specification, covered in RFC7540, which is currently the latest version of HTTP to also have wide support.
The focusing idea behind HTTP is that it is a protocol meant for retrieving/sending/acting on resources in a HyperText system (the World Wide Web). These resources are identified and located by using Uniform Resource Locators (or URLs). URLs use the Uniform Resource Identifier schemes “http” and “https”. For more information on URIs, schemes, and their structures see my previous tutorial.
The version we’re covering here in depth is HTTP/1.1, which is a revision of HTTP/1.0. The biggest and most important difference between the two is that in HTTP/1.0 a separate connection is made to the server for every resource request, that being every CSS file, image source file, JS file, etc. This increases the time it takes to load a complete “page” as each connection requires the establishment of a TCP connection which in terms of processing time is expensive. In HTTP/1.1 the client and server connection once established can be used multiple times to download additional files. Without having to open multiple connections latency is minimized.
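A back-of-the-envelope model can show why this matters. The sketch below is deliberately crude: it assumes requests happen one after another, that every connection setup costs the same, and it ignores parallel connections entirely. The function name and the millisecond figures are invented for illustration.

```python
def page_load_ms(resources: int, connect_ms: int, transfer_ms: int,
                 persistent: bool) -> int:
    """Estimate total load time for a page made of `resources` files.

    persistent=False models HTTP/1.0 (one TCP connection per resource);
    persistent=True models HTTP/1.1 reusing a single connection.
    """
    connections = 1 if persistent else resources
    return connections * connect_ms + resources * transfer_ms


# Ten files, 100 ms to set up a connection, 50 ms to transfer each file.
http_1_0 = page_load_ms(10, 100, 50, persistent=False)  # 10*100 + 10*50
http_1_1 = page_load_ms(10, 100, 50, persistent=True)   # 1*100 + 10*50
```

Under these made-up numbers the HTTP/1.0 style costs 1500 ms and the persistent style 600 ms; the whole difference comes from the nine connection setups that were avoided.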
Ideas of HTTP
There are multiple ideas that govern the philosophy of the design of HTTP. Some of these ideas are simply names for parts of the process and format, while others govern the context or lack of context that HTTP has.
Headers and Payload
HTTP requests and responses are split up into two sections: headers and content. The headers are all the HTTP pertinent information in the request and response, such as the version, the method, the URL, the status/error code, etc. The content, or otherwise known as the entity, is a representation of any given resource. This translates to for example, I request a JPG file, I get an HTTP message back with a status code of 200 OK in the headers, and in the content (or entity) all the beautiful 1s and 0s that make up the JPG file. Or I request wunk.me, in the payload/content/entity of the HTTP response is all the HTML making up my homepage.
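To see the split between headers and content, here is a hedged Python sketch that takes a raw response apart. The sample response is one I typed up by hand, and split_response is a hypothetical helper; it handles only the simple case and is no substitute for a real HTTP parser.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into its header section and its content."""
    # The first blank line (CRLF CRLF) separates headers from content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: version, status code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 14\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
version, status, reason, headers, body = split_response(sample)
```

After running this, status is 200, headers["content-type"] is "text/html", and body holds the 14 bytes of HTML, which is the “content” or “entity” half of the message.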
HTTP is a stateless protocol. This means that the only context any given request or response has is its corresponding response or request. That’s it. One request or response can’t set up data that has to be remembered or considered in later requests.
For example, you can’t send one HTTP request, as defined, to log into a server and then have all further HTTP requests automatically know that you have done this, that is, without adding additional information.
But, I can do that! You say. Well, kind of. Browsers and other technologies have developed ways around this including cookies, which are not really part of the pure HTTP standard and are defined in other RFCs elsewhere. However, when you have a cookie, the browser sends that cookie information to the server with every single HTTP request it makes. This is because without re-stating itself in the HTTP request, in the HTTP realm, it wouldn’t exist anymore. Every request is a blank slate, no context, and this is stateless.
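A small sketch may help show what “re-stating itself” looks like from the server’s side. Everything here is hypothetical: the SESSIONS table, the session_id cookie name, and the who_is_asking function are stand-ins for whatever a real application would use.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side table mapping session ids to users.
SESSIONS = {"abc123": "alice"}


def who_is_asking(request_headers: dict) -> str:
    """Identify the user from this one request's headers and nothing else."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    morsel = cookie.get("session_id")
    if morsel is None:
        return "anonymous"  # no cookie sent, so no context at all
    return SESSIONS.get(morsel.value, "anonymous")
```

A request carrying the header Cookie: session_id=abc123 is treated as alice, but the very next request without that header is anonymous again. The server only “remembers” because the client repeats the cookie every time.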
The HTTP Session
Despite being stateless, there is such a thing as an HTTP session. A session is defined by a series of network request-response transactions. An HTTP client initiates a request to a particular host on a particular port (such as 80 or 8080), and the server on the other end listens and receives the request. The server then processes this request and sends back a status line and a message of its own. From beginning to end, this is an HTTP session.
With a keep-alive connection, or persistent connection, introduced in HTTP/1.1, the client may make further HTTP requests while the connection is still open, rather than closing the connection after every request. The HTTP session is then terminated when the connection is closed (in my opinion).
When an HTTP request is formed, its single most important piece of information is the identification of a resource through the use of a URL. The second most important piece of information is the “verb” or “action” that is to affect the identified resource. These are commonly known as “methods” (for those programming inclined, you can think of them a bit like methods of a class). HTTP/1.0 defined the GET, POST, and HEAD methods; HTTP/1.1 added OPTIONS, PUT, DELETE, TRACE, and CONNECT as well. Technically there is no limit to the number of methods that can be defined, and in fact WebDAV defines 7 new methods for itself. Ideally, any client can use any method, and any server can support any combination of methods. I go over the methods below:
- GET – This method specifies that a given representation of the indicated URL is being requested. According to the specifications a GET request method is meant to only retrieve data. What this means is that issuing a GET method request to a server should not have the side-effect of changing the resource on the server side, or other related data.
- POST – This method actually passes data to the server in relation to the resource requested. For example, the resource may be a comment thread, and a POST method request to that comment thread resource would add a new comment to the thread. Another example may be a POST method request containing a JPG image to an Instagram user profile, causing the JPG image to become another resource under that user’s account. It is commonly used to submit data from a web form to a data-handling process on the server.
- HEAD – This method is exactly like GET, however, instead of getting the whole response (that being the resource’s representation, such as an HTML or JPG file) you only get the header part of the HTTP response. You can check HTTP meta-information using this without asking for the whole file. Useful if you want to find out HTTP information on a map image, without transferring a 20 MB map image.
- PUT – (added in HTTP/1.1) The idea here is that this is a request to “upload” or store the data in the content under a particular URI. If the URI refers to an already existing resource, it is modified. If the URI doesn’t resolve to anything on the server, the server would then presumably create that resource and reference it with the given URI.
- DELETE – (added in HTTP/1.1) This is pretty self explanatory. Like PUT, it specifies a URI that in this case should be deleted. That is, any record of it is to be wiped clean and its storage or representation no longer is to exist.
- OPTIONS – (added in HTTP/1.1) The OPTIONS method request returns the HTTP methods that the server supports for a specific URL. For example, I can determine if the server would respond to a PUT or POST request at a particular URL before I necessarily attempt to send those requests.
- TRACE – (added in HTTP/1.1) The idea behind this method is that sometimes we want to see what exactly the end-point server is receiving after our message gets routed around the internet. This method causes the server to reprint, or echo, the received request back to the sender so that the client can see if any changes or additions have been made by all the machines in between.
- Important Security Information: The TRACE method can be used in what is known as a cross-site tracing attack. This is a vulnerability that can expose and grant access to malicious users to various online resources, and thus, TRACE should be disabled on all production ready public facing servers (this includes the TRACK method as well that exists on Microsoft IIS servers.)
- PATCH – (added with RFC5789) This is somewhat like a PUT method, except that instead of overwriting the resource, it would only apply partial modifications to it. So change one thing, but not others. This would be akin to GET-ing a resource, modifying it, and then PUT-ing the resource back.
- CONNECT – (added in HTTP/1.1) This method is more technical and network intensive. It’s a way to convert the request connection to a transparent TCP/IP tunnel. This is usually done to facilitate SSL communication (Secure Sockets Layer) through an unencrypted proxy.
The expectation of HTTP servers everywhere is that the baseline for a general purpose server is to implement the GET and HEAD methods. On top of that, they should also implement the OPTIONS method so that they can tell people that they can only GET and HEAD.
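As a rough sketch of how a server might treat a handful of these methods, here is a toy in-memory “server” in Python. It is an illustration only: the ToyServer class is invented, it returns a simple (status, payload) pair instead of a real HTTP message, and a real server would report the allowed methods in an Allow header rather than as the payload.

```python
class ToyServer:
    """An in-memory stand-in for a server; resources live in a dict."""

    ALLOWED = "GET, HEAD, PUT, DELETE, OPTIONS"

    def __init__(self):
        self.resources = {}

    def handle(self, method, url, body=None):
        """Return a (status code, payload) pair for the given request."""
        if method == "OPTIONS":
            return 204, self.ALLOWED
        if method in ("GET", "HEAD"):
            if url not in self.resources:
                return 404, None
            # HEAD reports the same status as GET but carries no content.
            return 200, (self.resources[url] if method == "GET" else None)
        if method == "PUT":
            created = url not in self.resources
            self.resources[url] = body  # create or overwrite
            return (201 if created else 200), None
        if method == "DELETE":
            if url not in self.resources:
                return 404, None
            del self.resources[url]
            return 204, None
        return 405, None  # Method Not Allowed
```

For example, a PUT to "/a" followed by a GET of "/a" returns the stored body, a HEAD of "/a" returns the same status with no payload, and a POST is refused with 405 because this toy does not support it.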
There are a few concepts relating to HTTP request methods that are important in their design and implementation:
- Safety – There are methods defined as safe and unsafe. The criteria for safety rests on the fact that some of the methods are designed for information retrieval exclusively. This means that whatever URL they are requesting, nothing should “happen” to the resource on the server or otherwise, other than sending its representation over the network. These methods include HEAD, GET, OPTIONS, and TRACE. It’s called safe because the idea is that you could arbitrarily make all sorts of random GET requests to a server, keeping track of nothing including the server’s state, and nothing will be disrupted. Unfortunately, the specs have no way to enforce this safety, and safety in these terms isn’t guaranteed across all implementations. However, in general this is the case, and it is important, as otherwise web spiders/robots/crawlers couldn’t issue GET request after GET request to index a site without wreaking total havoc. Likewise, the unsafe methods include POST, PUT, DELETE, and PATCH. These are deemed unsafe in this regard because their reception/execution may cause side effects on the server or otherwise. These methods are meant to DO something, which can alter the resources being referenced. For example, PUT-ing multiple different files to the same URI can result in a succession of over-writes of that resource, giving us only the last one uploaded.
- Idempotence – Methods PUT and DELETE are meant to be idempotent. Idempotence in this context refers to the ability to “apply” the same operation twice with the same result. I quote Wikipedia: “A unary operation (or function) is idempotent if, whenever it is applied twice to any value, it gives the same result as if it were applied once; i.e., ƒ(ƒ(x)) ≡ ƒ(x). For example, the absolute value function, where abs(abs(x)) ≡ abs(x), is idempotent.” What this basically means is that if you were to make multiple identical requests with these methods, it will be as if you only made one request. It is not that you’ll get the same response from the server, but you’ll get the same result on the server in regards to the resources in question. Two identical DELETE method requests in a row should only delete the given resource once (the second time it won’t delete anything because it’ll already be gone). Two identical PUT method requests in a row should produce one URL with one resource representation (e.g. JPG image), not two separate files on the server. Because safe methods aren’t supposed to change anything on the server in the first place, the GET, HEAD, OPTIONS, and TRACE methods are also idempotent.
The POST method is not idempotent! It can be idempotent, but there are absolutely no guarantees. What this means is that multiple identical POST requests don’t necessarily end up with the same result. For example, a POST request that causes the side-effect of leaving a comment on a blog will, if sent again, post the same comment again. Without proper protection, clicking on “Buy Now!” may cause a POST request that charges your credit card twice! This is why browsers often show a warning to a user when they want to reload a page that was the result of a POST request, since that would require sending the POST request again. Applications themselves can be written to handle these unfortunate mistakes, and often are, but it’s an important consideration.
- Implementation – Unfortunately the above safety and idempotence standards are not enforced in any way. This means that an errant programmer could create a server that might trigger all sorts of weird side-effects on a GET request, or put an additional file without overwriting the old one on every PUT request. It is up to programmers to ensure the correctness of their system and to ensure that they build something an unassuming user won’t muck up vastly.
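Idempotence is easier to feel than to define, so here is a minimal Python sketch. The three operations are stand-ins I wrote for what a PUT, a DELETE, and a POST might do to server state, and the check compares one starting state after one application versus two; it is a demonstration, not a proof.

```python
def put_photo(store):
    """Stand-in for PUT /photo.jpg: overwrite whatever is there."""
    store["/photo.jpg"] = "new bytes"


def delete_photo(store):
    """Stand-in for DELETE /photo.jpg: remove it if present."""
    store.pop("/photo.jpg", None)


def post_comment(store):
    """Stand-in for POST /comments: append a new comment."""
    store["/comments"].append("Nice!")


def state_after(operation, times):
    """Apply the operation `times` times to a fresh copy of the server state."""
    store = {"/photo.jpg": "old bytes", "/comments": []}
    for _ in range(times):
        operation(store)
    return store


def looks_idempotent(operation):
    """True if doing it twice leaves the same state as doing it once."""
    return state_after(operation, 1) == state_after(operation, 2)
```

With these stand-ins, looks_idempotent is true for put_photo and delete_photo and false for post_comment, because the second POST leaves two comments behind instead of one.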
HTTP responses are actually formatted differently than HTTP requests. Instead of a method defined in the HTTP headers, in the response you have what is called an HTTP Status Code. This has been true since HTTP/1.0. You’ve very likely seen the most famous of these status codes, particularly in the earlier days of the internet, that being 404. A status code essentially is a three digit number followed by a phrase (the reason phrase). So in our continuing example, you’ve probably seen 404 Not Found often. I particularly enjoy 403 Forbidden. Oooh, verboten!
The first digit of the code determines the classification of the status code. That way, custom status codes can be used, although many standardized status codes exist. A more likely application is to customize the “reason phrase” with equivalents that are applicable for that location or application at the programmer’s discretion. This is mostly because the status code of an HTTP response is meant to be read by both machines and humans, the 3 digit code is for the computer/program, while the reason phrase is for the human.
The most commonly used code is the 200 OK status code, indicating that everything is working correctly and that there were no redirections or errors. The 3 digit codes go up to 599 and are classifiable in the following way:
- 1XX – Informational Responses
- 2XX – Successful Responses
- 3XX – Redirection Notifications
- 4XX – Client Error
- 5XX – Server Error
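The classification by first digit is simple enough to sketch in a few lines of Python. The classify and reason_phrase functions are my own names; the only library piece is http.HTTPStatus from the standard library, which knows the standardized codes and their usual reason phrases.

```python
from http import HTTPStatus

CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def classify(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return CLASSES[code // 100]


def reason_phrase(code: int) -> str:
    """Return the standard reason phrase, or '' for a non-standard code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return ""
```

So classify(404) gives "Client Error" and reason_phrase(404) gives "Not Found", while a custom code such as 299 still classifies as "Successful" even though it has no standard phrase. That is the point of the first digit: a program can react sensibly to a code it has never seen.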
When an HTTP request gets sent out into the internet to be passed around and finally find its destination, it may pass through other servers that serve particular purposes in regards to HTTP before reaching its final place. Three common HTTP intermediaries include the “proxy”, the “gateway” and the “tunnel”. These aren’t bound to an entire machine, because a single intermediary device may act as a proxy, gateway, or tunnel at any given time depending on the request. These intermediaries are beyond the scope of this article and get more into methodologies of networking and network setup. There is a fourth “kind” of intermediary to consider and these are network intermediaries that can act on “lower layers” (see above) of the network protocol, filtering or redirecting traffic without knowledge or permission. This is a man-in-the-middle attack from the viewpoint of the HTTP standard.
A client, server, intermediary, etc. may employ a caching system to cache appropriate HTTP responses and their payloads. What is deemed appropriate to cache and when these caches may be accessed is actually determined by further meta-information in the HTTP headers (see below). A server cannot cache while acting as an HTTP tunnel (see above). These caches can exist at any step of the chain, and even exist on the client device itself to cache commonly viewed images and other such resources. The purpose of a cache is to reduce network traffic, speed up content delivery, and balance the network load for new information.
Authentication is basically providing credentials proving you are who you say you are, or at least that you have the authority to view and operate on the resources identified by the URI. There are multiple authentication schemes available through HTTP, one being Basic Access authentication and another being Digest Access authentication. Both of these work off a challenge-response system: basically the server issues a challenge (generally, what is the username and password) and only once the challenge is satisfied is the request processed and served. These are two example authentication schemes, but HTTP also provides a general framework for access control and authentication. People also often build their own means and methods of access control using public and private keys, which you’ve probably seen in many HTTP services around the web. The challenge-response scheme is the general framework, and is extensible so that other schemes besides Basic and Digest can be implemented.
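Here is a very reduced sketch of the caching idea in Python. It looks at only two pieces of the Cache-Control header (max-age and no-store) and ignores everything else real caches have to consider. The TinyCache class is invented for this example, and the clock is passed in so the behavior can be shown without actually waiting.

```python
import re


class TinyCache:
    """Stores response bodies until their max-age runs out."""

    def __init__(self, clock):
        self._clock = clock   # a function returning the current time in seconds
        self._entries = {}    # url -> (expiry time, body)

    def store(self, url, headers, body):
        cache_control = headers.get("Cache-Control", "")
        if "no-store" in cache_control:
            return  # the response asked not to be cached
        match = re.search(r"max-age=(\d+)", cache_control)
        if match:
            self._entries[url] = (self._clock() + int(match.group(1)), body)

    def lookup(self, url):
        """Return the cached body, or None if absent or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[url]
            return None
        return body
```

A response stored with Cache-Control: max-age=60 is served from the cache for the next 60 seconds; after that, lookup returns None and the client would have to go back to the network.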
HTTP authentication also has what are called Authentication Realms. These are a little esoteric, but the idea is that a server can implement separate scopes for one URI allowing basically, as you might imagine, different login screens for the same URL depending on the existence of a realm value string. This type of authentication is somewhat out of the scope of this more introductory article.
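To give a feel for the challenge-response flow with the Basic scheme, here is a small Python sketch. The function names and the “staff” realm are made up; what is real is that the Basic scheme sends “username:password” encoded in base64 in the Authorization header, and that the server’s challenge arrives as a 401 status with a WWW-Authenticate header naming the realm.

```python
import base64


def basic_credentials(username: str, password: str) -> str:
    """Build the value of the Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def challenge(realm: str) -> dict:
    """Build the header a server sends to ask for credentials."""
    return {"WWW-Authenticate": f'Basic realm="{realm}"'}


def check(request_headers: dict, user: str, password: str, realm: str):
    """Return (status, extra headers): 200 if credentials match, else a 401 challenge."""
    if request_headers.get("Authorization") == basic_credentials(user, password):
        return 200, {}
    return 401, challenge(realm)
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the request can decode the password, which is one reason Basic authentication is normally only used over https.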
That’s it for part 1 of the HTTP protocol tutorial. We’ve covered the basic concepts of HTTP and the types of information that it deals with. We haven’t covered the specific implementation of an HTTP request or response however. In those requests and responses there can be a lot of “metadata” or data associated with a given HTTP message that pertains to content length, content type, caching, routing, etc. These are called header data fields. We’ll delve into the exact formatting of HTTP and these header data fields in the next part. Thanks!