Full Stack & Full Circle: What the Heck Happens In an HTTP Request-Response Cycle



Presented at Confident Coding III, San Francisco, CA. October 20, 2012.

Flying in from a 10,000-foot view (“Hey, browser, show me this”, “Okay, here it is”), we’ll take a thoughtful overview of the HTTP request/response cycle. Its essence is simply a series of questions & answers, accumulating portions of content to be gracefully assembled for the user.

We’ll home in on some key players amid the “full stack” of communications, with special attention to how an understanding of the HTTP lifecycle endows any developer or designer with the power to optimize for performance, cost, and UX.

  When Estelle said she wanted to do a day of teaching "everything else" about web development, I knew right away that the HTTP request-response cycle is something I just _had_ to talk about. It's the topic that always gets skipped. "Welcome to web development! Let's start with HTML, CSS, and JavaScript!" We pass right by it. And we shouldn't.
  Hypertext Transfer Protocol (HTTP) is THE CORE technology of the web. Yes, more essential than even HTML. Assets on the web can be HTML, but don't have to be. We download software, graphics, PDFs, Word docs, Flash...none of these is HTML. Nor are XML, JSON, CSS, or JavaScript. Data flowing across the web relies either on HTTP or on its secure counterpart, HTTPS. The web relies on the HTTP request-response cycle.
  As a developer, you want to take an interest in the cycle. It's where your work gets assembled. That assembly can be slow, or it can be fast. _Which_? Well, that depends on how the request-response cycle is managed. You want to _understand_ what the cycle passes through so that you can arrange it to do whatever you want.
  Web development is usually regarded as two parts: frontend and backend.
  Sometimes we refer instead to client side and server side.
  It's fine, we do it all the time. But it's technically imprecise: frontend & backend refer to infrastructure, while client side & server side refer to programming languages.
  So the cycle refers to the communication passing through our frontend & backend infrastructure. Later we'll look at how programming languages fit into this picture: client side (JavaScript) and server side (Python, Ruby, PHP, Perl, Java, etc.).
  Let's add a user to this mental picture, too. Because someone's got to make the request, right? And someone's got to be interested in what response it gets. So now we've got at least 3 potential components. They even correspond pretty nicely to a common team configuration: a user experience designer, a frontend developer, and a backend developer.
  Frontend and backend are different parts of a cohesive whole: the web site or app. It's the user and the site that are in a relationship with each other, wanting something of each other, desiring to communicate with each other. The frontend and backend _together_ form the stack of infrastructure that lets the site do that.
  That's what we mean by "full stack": the combination of frontend and backend infrastructure, plus the technologies that allow them to interact.
  Voila. The request-response cycle just refers to passing a request message through the stack, and the stack delivering a response to it. (Diagram: the request "GET /something" goes in; the response "HTTP/1.1 200 OK" comes back.)
  Request
  There's an important piece missing from that diagram so far, right? The URL. What is a URL anyway? Many people compare it to a street address: a series of progressively finer details that narrow it down to an exact location.
  Anatomy of a URL
  URL basics: http://cczona.com/ = <scheme>://<host>/
At minimum, a URL needs two parts:
Scheme - Browsers have lately been hiding the scheme from you. Don't be misled by that. The scheme is _mandatory_. It is an instruction specifying which protocol to use for the transfer.
Host - This part I _would_ compare to an address. It specifies where the host server is. We'll expand on that later.
  Common URL: http://cczona.com/blog/2012/10/20/confident-coding = <scheme>://<host>/<path>
It usually also has at least one more part: Path - the name of the resource being requested.
Consider these three parts the "How, Where, and What".
  URL with all the parts: http://carinazona:opensesame@cczona.com:3000/blog;year=2012?search&term=Confident&month=October#session2 = <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<fragment>
There are 6 optional additional parts available. I know, crazy, right?
  Less common URL parts: port, query, fragment
Port - the port on which the host server is expected to be listening for an HTTP request (80 by default for http). Query - one or more values intended to be used by an application. Fragment - the name of some part within the requested resource.
  URL parts rarely used for HTTP: username, password, parameters
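To make the anatomy concrete, here's a minimal sketch in Python (my choice for these examples; the talk itself shows no code) using the standard library's `urllib.parse`. It splits the "all the parts" URL into the pieces named above. `url_parts` is a hypothetical helper, not a standard function.

```python
from urllib.parse import urlparse, parse_qsl

# The "all the parts" example URL from the slide.
URL = ("http://carinazona:opensesame@cczona.com:3000/"
       "blog;year=2012?search&term=Confident&month=October#session2")

def url_parts(url: str) -> dict:
    """Split a URL into the named parts described above."""
    p = urlparse(url)
    return {
        "scheme": p.scheme,      # HOW: which protocol to use
        "user": p.username,
        "password": p.password,
        "host": p.hostname,      # WHERE: which server
        "port": p.port,          # None when the URL leaves it out
        "path": p.path,          # WHAT: which resource
        "params": p.params,      # the ";year=2012" on the last path segment
        # keep_blank_values=True keeps the bare "search" key instead of dropping it
        "query": parse_qsl(p.query, keep_blank_values=True),
        "fragment": p.fragment,  # used by the user-agent; browsers don't send it
    }

for name, value in url_parts(URL).items():
    print(f"{name:>8}: {value}")
```

When `port` comes back as `None`, the user-agent falls back to the scheme's default (80 for http, 443 for https).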
  A request is submitted via a User-Agent.
  User-Agent examples: browsers, crawlers/spiders/robots, screen readers, programs, scripts
Browsers - are by far the most common. But many other user-agent types exist as well. Crawlers/spiders/robots - automated site browsers. Some examples include search engine indexers (Googlebot), archivers (Archive.org, aka "The Internet Archive"), testing tools (curl-loader), and monitoring services (uptime).
Screen readers - assistive devices that read content aloud. Programs - such as cURL & Wget.
Scripts - these can be written in any language capable of speaking HTTP, which is pretty much any common one you can imagine. Perl, PHP, Ruby, and Python have HTTP libraries that make it easy to submit requests and accept responses.
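Here's roughly what a script user-agent looks like, sketched with Python's standard `urllib.request`. It only *prepares* a GET request carrying a self-chosen User-Agent header; `build_request` and the `example-script/0.1` identifier are made up for illustration.

```python
import urllib.request

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Prepare (but don't send) a GET request with a self-chosen User-Agent."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"User-Agent": user_agent},  # any string the script likes
    )

# "example-script/0.1" is a made-up identifier.
req = build_request("http://cczona.com/blog/", "example-script/0.1")
print(req.get_method(), req.host, req.selector)

# Actually sending it takes one more step (and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.headers["Content-Type"])
```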
  The User-Agent submits the request using one of the defined HTTP methods.
  HTTP Methods: GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE
GET - is the most common method. The request is communicated primarily via the URL itself.
POST - is also common, typically used via an HTML form action. The request is communicated via the URL, and the form's data is sent in the body of the request. URLs have practical length limits imposed by browsers and servers, so POST is a handy alternative when your form might receive a whole lot of data.
HEAD - the request is communicated the same way as a GET, via the URL. But HEAD signals that the response should contain only headers, not a body, whereas the response to a GET contains both. PUT, DELETE, OPTIONS & TRACE have not been widely adopted.
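To see where each method puts its data, here's a sketch that builds the raw text of a request message. It's a simplified model, assuming HTTP/1.1 and URL-encoded form data (what a plain HTML form sends by default); `request_message` is a hypothetical helper, not a real client.

```python
from typing import Optional
from urllib.parse import urlencode

CRLF = "\r\n"

def request_message(method: str, host: str, path: str,
                    form: Optional[dict] = None) -> str:
    """Build the text of a minimal HTTP/1.1 request (a sketch, not a full client)."""
    body = ""
    if form and method == "GET":
        # GET: form data rides along in the URL's query string.
        path += "?" + urlencode(form)
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if form and method == "POST":
        # POST: form data travels in the body, described by two headers.
        body = urlencode(form)
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line ends the headers; the body (if any) follows it.
    return CRLF.join(lines) + CRLF + CRLF + body

form = {"term": "Confident", "month": "October"}
print(request_message("GET", "cczona.com", "/blog/", form))
print(request_message("POST", "cczona.com", "/blog/", form))
```

A HEAD request comes out identical to a GET apart from the method name; the difference only shows up in the response.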
  I've been mentioning headers. The obvious question is: _what the heck is a header_?
  A header is metadata about the request or response.
  Browsers' developer tools can display headers. Let's take a look at these. Notice the structure: each line is one key/value pair. The line starts with a key, then a colon, then the value.
  (Screenshots: headers in Firefox Web Developer Tools, Chrome Developer Tools, and Safari Developer Tools.)
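Because each header line is just `Key: value`, reading a block of them takes only a few lines. A minimal sketch, assuming the input is the header lines that follow the request's first line (the sample values are illustrative, not captured from a real browser); `parse_headers` is a hypothetical name. It splits on the *first* colon so values such as URLs survive intact.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Key: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.splitlines():
        if not line.strip():
            break  # a blank line marks the end of the header block
        key, _, value = line.partition(":")  # split on the first colon only
        # Header names are case-insensitive, so normalize them.
        # (Repeated headers simply overwrite each other in this sketch.)
        headers[key.strip().lower()] = value.strip()
    return headers

SAMPLE = (
    "Host: cczona.com\r\n"
    "Accept: text/html\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
    "Referer: http://cczona.com/\r\n"
    "\r\n"
    "not: a header\r\n"
)
print(parse_headers(SAMPLE))
```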
Request headers originate at the user-agent. They include information about:
• the request
• the user's environment
• the user-agent's capabilities
• the user-agent's identity*
There's a huge caveat here. The "User-Agent" header frequently is a lie. There's a long story as to why. Never, ever depend on the User-Agent header to be accurate.
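To see why sniffing goes wrong, here's a deliberately naive sketch run against the Chrome User-Agent string from the server log later in this talk. The three helper names are hypothetical, and the rules are the kind people write, not ones anyone should use: every one of them says "yes" for a Chrome request.

```python
# The Chrome User-Agent string from the server log.
CHROME_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.4 "
             "(KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4")

# Hypothetical, deliberately naive sniffing rules.
def looks_like_mozilla(ua: str) -> bool:
    return ua.startswith("Mozilla/")

def looks_like_gecko(ua: str) -> bool:
    return "Gecko" in ua

def looks_like_safari(ua: str) -> bool:
    return "Safari" in ua

# All three match, even though this request came from Chrome.
print(looks_like_mozilla(CHROME_UA), looks_like_gecko(CHROME_UA), looks_like_safari(CHROME_UA))
```

And since the header is just a string the client chooses, any script can send whatever it likes there.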
  Domain Name System (DNS)
  DNS: a distributed map of domain names to IP addresses
When the user submits a request, the user-agent first has to determine where to send it. The URL's host part can be a domain name or an IP address, but the user-agent can only submit to an IP address. This is where DNS comes in. The Domain Name System (DNS) is a distributed directory of domains and their IP addresses.
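Most languages hand this lookup to the operating system's resolver, which checks local sources such as the hosts file and then asks DNS. A minimal Python sketch using the standard `socket.getaddrinfo`; `resolve` is a hypothetical wrapper name.

```python
import socket

def resolve(host: str, port: int = 80) -> list:
    """Return the IP addresses the OS resolver finds for host (via DNS if needed)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})

# An IP literal needs no lookup at all:
print(resolve("127.0.0.1"))   # ['127.0.0.1']
# A domain name triggers a real lookup (requires network access):
# print(resolve("cczona.com"))
```

The answer for a domain name can change whenever that domain's DNS records change, which is exactly why the user-agent asks instead of hard-coding addresses.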
  Nameservers
If you've ever registered a domain name, you may have noticed that the domain was either configured with a default set of nameservers, or you had to provide the set of nameservers for your web hosting provider. For example: NS1.dreamhost.com, NS2.dreamhost.com, NS3.dreamhost.com, NS4.dreamhost.com.
Nameservers are a second-level directory that DNS resolution depends on: the top-level registry only records which nameservers handle your domain, and those nameservers keep track of the IP addresses where each of _your domain's_ services is located.
  Host Server
  The host server is not the web server. Rather, it's the hardware and operating system that house the web server, and potentially other infrastructure used by the site. For example: OS X, Linux, Unix, Windows Server, or Novell NetWare.
  The host server passes the request off to the web server.
  Web Server
  A web server is software that's capable of taking an HTTP request as input and delivering an HTTP response as output. The most popular web servers are Apache, Microsoft IIS, and nginx ("engine X"). Let's take a look at the web server's log of our request.
  Web Server Request Log
  - - [20/Oct/2012:13:58:09 +0000] "GET /blog/?search&term=Confident&month=October HTTP/1.1" 200 11633 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4"
Looks pretty much like gibberish, right? Ah, but wait....
Now see how many parts you can name already. What's the request method? The IP address? The path? And the query string? Any thoughts on what that purple one is?
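For the record, here's how those fields line up. This sketch assumes the web server writes the common "combined" log format (the line above has that shape: client IP, identity, user, [time], "request line", status, response size in bytes, "referer", "user-agent"). The IP address didn't survive on the slide, so the sample line below substitutes the documentation address 203.0.113.7 as a placeholder; `parse_log_line` is a hypothetical name.

```python
import re

# Combined log format: IP ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line: str) -> dict:
    """Name each field of one combined-format access log line."""
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    return m.groupdict()

# 203.0.113.7 is a placeholder IP; the rest is the line from the slide.
LINE = ('203.0.113.7 - - [20/Oct/2012:13:58:09 +0000] '
        '"GET /blog/?search&term=Confident&month=October HTTP/1.1" 200 11633 "-" '
        '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.4 '
        '(KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4"')

for name, value in parse_log_line(LINE).items():
    print(f"{name:>10}: {value}")
```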
  The web server interprets the request and determines how it should be handled. It looks at the request path and checks whether it matches any redirection directives. These may be temporary (e.g. "offline for maintenance"), or they may be permanent (e.g. "blog" is now "old-blog"); the server typically answers with a 302 or a 301 status, respectively.
  In lines 1 & 2 of the example configuration, the web server gracefully handles files that have moved. In lines 4 & 5, it uses a regular expression to match any files that fit the pattern.
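The configuration from the slide isn't reproduced here, so as a stand-in, here's a hypothetical rule table in Python that mimics the idea: the first matching pattern wins, and the server answers with a status code plus a new location. The paths and the 301 (permanent) vs. 302 (temporary) choices are illustrative assumptions, not the original config.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules: (pattern, replacement, status code).
REDIRECTS = [
    (re.compile(r"^/blog/(.*)$"), r"/old-blog/\1", 301),    # moved for good
    (re.compile(r"^/shop/.*$"), "/maintenance.html", 302),  # temporarily offline
]

def check_redirects(path: str) -> Optional[Tuple[int, str]]:
    """Return (status, new location) for the first matching rule, else None."""
    for pattern, replacement, status in REDIRECTS:
        if pattern.match(path):
            return status, pattern.sub(replacement, path, count=1)
    return None  # no rule matched: carry on and look for a file

print(check_redirects("/blog/2012/10/20/confident-coding"))
print(check_redirects("/shop/cart"))
print(check_redirects("/about"))
```

The new location would go back to the user-agent in a `Location` header, and the user-agent would then make a fresh request for it.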
  Maybe you've seen something similar to this WordPress panel. It looks like some cool fancy feature. Naw. It's just using a web server rewrite rule. Regular expressions are awesome.
  Here's the rule. Every page and post of your WordPress blog is actually one script file plus a web server rewrite: the server internally hands each pretty URL to that script, without sending an HTTP redirect back to the browser. That's it. It only looks like infinite pages. Why do this? Flexibility, while enforcing consistency. You could change every URL without breaking a single link. Nice.
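WordPress does this in PHP, with rewrite rules (typically in an `.htaccess` file) that send any request not matching a real file or directory to `index.php`. The Python sketch below only models that idea; `rewrite`, `front_controller`, and the `POSTS` table are hypothetical.

```python
# Hypothetical post table standing in for the WordPress database.
POSTS = {"/blog/2012/10/20/confident-coding": "Full Stack & Full Circle"}

def rewrite(path: str, static_files: set) -> str:
    """Internal rewrite: real files pass through, everything else goes to one script."""
    return path if path in static_files else "/index.php"

def front_controller(original_path: str) -> tuple:
    """The single script: use the originally requested path to pick the content."""
    title = POSTS.get(original_path.rstrip("/"))
    return (200, title) if title else (404, "Not Found")

static = {"/wp-content/themes/style.css"}
requested = "/blog/2012/10/20/confident-coding/"
print(rewrite(requested, static))    # /index.php
print(front_controller(requested))   # (200, 'Full Stack & Full Circle')
```

The browser never sees `/index.php`; its address bar keeps the pretty URL, which is what separates a rewrite from a redirect.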
  Once the web server has finished doing redirects, it checks whether the new path matches a corresponding path in the web server's directory hierarchy. If it doesn't match, or there's a problem reading in the content, it's error time!
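A simplified sketch of that path-to-file step in Python, assuming a document root on local disk and `index.html` as the default document for directories. `serve` is a hypothetical helper; real web servers add far more checks (permissions, MIME types, caching headers).

```python
import tempfile
from pathlib import Path

def serve(docroot: str, url_path: str) -> tuple:
    """Map a URL path onto a file under docroot; return (status, body bytes)."""
    root = Path(docroot).resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse paths such as /../../etc/passwd that climb out of the document root.
    if target != root and root not in target.parents:
        return 404, b""
    if target.is_dir():
        target = target / "index.html"  # assumed default document name
    try:
        return 200, target.read_bytes()
    except FileNotFoundError:
        return 404, b""  # no matching path: 404 Not Found
    except OSError:
        return 500, b""  # path exists but can't be read: treat as a server error

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        Path(docroot, "index.html").write_text("<h1>Hello</h1>")
        print(serve(docroot, "/"))              # (200, b'<h1>Hello</h1>')
        print(serve(docroot, "/missing.html"))  # (404, b'')
```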
  Web Server 404 Page Not Found: Aww, sad panda. Well, sad Octocat.
  Web Server 500 Internal Server Error: Uh-oh. Headache Puppy says badness happened.
  When it does match, now the magic happens.
  Response
  The web server reads in whatever content the file provides: text, HTML, an image, output from a script...whatever.
  Passing the request down to a script gives us the opportunity to generate some (or all) content dynamically. Often this involves using values passed in via the URL and/or POSTed data.
  This example script generates output, starting with a response header (line 1), then declaring the end of headers (line 3), then outputting the response body (starting at line 5).
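The original script isn't reproduced here, so this is a stand-in sketch of the same shape in Python, assuming the classic CGI convention: header line(s), a blank line to end the headers, then the body. Under CGI, the web server passes the query string in the `QUERY_STRING` environment variable; `cgi_output` is a hypothetical name.

```python
import html
import os
from urllib.parse import parse_qs

def cgi_output(query_string: str) -> str:
    """Build CGI-style output: headers, a blank line, then the body."""
    params = parse_qs(query_string)
    term = params.get("term", ["(nothing)"])[0]
    lines = [
        "Content-Type: text/html; charset=utf-8",             # response header
        "",                                                   # blank line: end of headers
        f"<p>You searched for {html.escape(term)}.</p>",      # response body
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # The web server sets QUERY_STRING before running the script.
    print(cgi_output(os.environ.get("QUERY_STRING", "")), end="")
```

Escaping the term before dropping it into HTML matters here, since it comes straight from the URL.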
  A script may also want to call on other services, such as a database.
  Once the web server receives output suitable for delivery in an HTTP response, it sends it on its way.
  The response likewise has headers. Most of these describe the content and how it should be handled by the user-agent. Notice that the first line of the response states the HTTP version and the final status of the request. If things went well, that's "200 OK": HTTP/1.1 200 OK
  The user-agent receives the response and interprets it. In this case, the Content-Type header told the user-agent to interpret our response's body as HTML.
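Reading a response is the mirror image of writing a request. A minimal sketch, assuming a complete HTTP/1.1 response already sitting in memory (no chunked encoding, no streaming); `parse_response` and `media_type` are hypothetical helpers, and the sample response is made up. The media type from `Content-Type` is what tells the user-agent to treat the body as HTML.

```python
def parse_response(raw: bytes) -> tuple:
    """Split a raw HTTP/1.1 response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line separates head from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)   # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body

def media_type(headers: dict) -> str:
    """'text/html; charset=UTF-8' -> 'text/html'. Falls back to a generic binary type."""
    value = headers.get("content-type", "application/octet-stream")
    return value.split(";")[0].strip().lower()

RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<h1>Hi!</h1>\n")

version, status, reason, headers, body = parse_response(RAW)
print(version, status, reason, media_type(headers))   # HTTP/1.1 200 OK text/html
```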
  Back to the Frontend
  Now it's the user-agent's turn to process what it's received. It parses the content to determine how to render it for the user. Cycle complete!
  Now it can determine what _other_ assets are needed: JavaScript, CSS, images, etc. The user-agent fires off requests for those as well. A bunch more cycles!
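How does the user-agent discover those other assets? By scanning the HTML it just parsed. Here's a simplified sketch using Python's standard `html.parser`; it only looks at `<script src>`, `<img src>`, and `<link rel="stylesheet">`, the class and function names are hypothetical, and the file names in the sample page are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetFinder(HTMLParser):
    """Collect URLs of scripts, stylesheets, and images referenced by a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("script", "img") and attrs.get("src"):
            url = attrs["src"]
        elif tag == "link" and "stylesheet" in (attrs.get("rel") or "").lower().split():
            url = attrs.get("href")
        if url:
            # Relative references are resolved against the page's own URL.
            self.assets.append(urljoin(self.base_url, url))

def find_assets(page_url: str, html_text: str) -> list:
    finder = AssetFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.assets  # each one becomes another request-response cycle

PAGE = ('<html><head><link rel="stylesheet" href="/css/site.css">'
        '<script src="app.js"></script></head>'
        '<body><img src="images/octocat.png" alt=""></body></html>')
print(find_assets("http://cczona.com/blog/", PAGE))
```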
  When the user-agent has received everything it needs, its final steps are to lay out the content and render it for the user.
  Voila. Done.
  So far w
  Optimizing: AJAX, minification, caching, sprites, Expires headers, compression, database indexes, operations & joins, proxy server, partials, content delivery network (CDN), REST/APIs. @cczona Tuesday, October 23, 12
  You may have noticed that there's a lot of potential here to make a site seem sluggish. So what we learn from the HTTP request-response cycle is that there are many opportunities to optimize for fast performance.
  Stay tuned... Luckily, Estelle will be talking next about a bunch of methods for optimizing performance!
  Thank you! Any questions for me?
