HTTP at your local BigCo


Published in: Automotive, Technology


1. HTTP at your local BigCo: How the internet sausage gets made
   Peter Griess (@pgriess)
2. Goals and non-goals
   Basics of TCP/IP, DNS and HTTP and how they work together; pitfalls and optimizations
   A 1,000-foot view of scaling out HTTP infrastructure
   All manner of load balancing / traffic shaping
   Living on the edge
   Not: how to make a fast application (database access, rendering performance, etc.)
3. Background: DNS
   Maps hostnames to one or more IPs
   Resolution process
   Recursion (and what does the DNS server see?)
   Caching
   Latencies: on-host, cached in LAN, cached at ISP, miss
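The resolution process above can be exercised through the system resolver; a minimal Python sketch (depending on cache state, the answer comes from the on-host cache, the LAN/ISP resolver, or a full recursive lookup, i.e. the latency tiers on this slide):

```python
import socket

def resolve(hostname, port=80):
    # Ask the system resolver for IPv4 addresses for this hostname.
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry: (family, type, proto, canonname, (ip, port))
    return [addr[0] for *_rest, addr in infos]

# "localhost" keeps the demo offline; a real service name may return
# several IPs, which is what DNS load balancing (slide 11) relies on.
print(resolve("localhost"))
```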
4. Background: TCP
   Stateful protocol
   Negotiated by a synchronous 3-way handshake:
      2x RTT before the first byte is sent!
      e.g. USA => South America ~250ms RTT
   Seamless failover is hard (but not impossible)
   Load balancing must be aware of flows
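A quick way to see the handshake cost is to time a blocking connect(): against a loopback listener it is near zero, but over a WAN this is where the ~250ms RTT gets paid before any payload byte moves. A self-contained sketch:

```python
import socket
import time

def connect_time(host, port):
    # connect() returns once the SYN/SYN-ACK leg of the 3-way handshake
    # completes, so this measures roughly one network RTT.
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=5)
    elapsed = time.perf_counter() - start
    sock.close()
    return elapsed

# Demo against a local listener so the sketch runs anywhere.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print("connect took %.6f s" % connect_time("127.0.0.1", port))
listener.close()
```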
5. Background: HTTP
   Layered on top of TCP/TLS
   Has some useful bits:
      Compression
      Connection re-use
      Pipelining
      Caching
   Kind of sucks:
      Headers on all requests/responses
      Compression on bodies only
      Pipelining has to be disabled most of the time
      Pipelining suffers from head-of-line blocking
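The "compression on bodies only" point is easy to demonstrate: gzip (Content-Encoding: gzip) shrinks a repetitive body dramatically, while the request line, cookies, and other headers go on the wire verbatim with every request. A sketch (the header bytes and hostname are illustrative):

```python
import gzip

body = b'{"status": "making sausage", "user": "pgriess"}' * 40
headers = (b"GET /timeline HTTP/1.1\r\n"
           b"Host: bigco.example\r\n"
           b"Cookie: session=...\r\n\r\n")

# Bodies can be gzip-compressed end to end...
compressed = gzip.compress(body)
print(len(body), "body bytes ->", len(compressed), "compressed")

# ...but in HTTP/1.1 the headers above are resent uncompressed,
# on *every* request on the connection.
print(len(headers), "header bytes, every time")
```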
6. (Diagram: the big bad internet, speaking HTTP to a single server)
7. Problem?
8. Problem
   Availability
      Server goes down (kernel panic?)
      Network goes down (cable cut?)
      Datacenter goes down (EC2?)
   Overload
      Shed load (good, can be transparent)
      Get infinitely slow (not good)
9. Multi-server
   (Diagram: the big bad internet, ???, multiple servers)
10. We have options
    DNS load balancing
    IP load balancing
    HTTP load balancing
11. DNS load balancing
    The hostname resolves to IPs: A, B, C, D
    Add new IPs to scale out
    Remove IPs when hosts go down
    Benefits:
       Don't need extra hardware to do load balancing
       Can span datacenters
       DNS servers are cheap / fast
    Drawbacks:
       Hotspots due to caching
       Hotspots due to ordering in the result list
       Hotspots due to resolver size
       TTL / flexibility trade-off
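One common mitigation for the ordering hotspot is for the DNS server to rotate the A-record list per response, since naive clients just take the first entry. A sketch of that rotation (addresses are made up; note caching still defeats this, because one cached answer can serve a whole resolver's user population):

```python
import random

records = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]

def rotated_answer(records):
    # Return the same record set starting from a random offset, so no
    # single IP is always first for clients that just use records[0].
    i = random.randrange(len(records))
    return records[i:] + records[:i]

print(rotated_answer(records))
```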
12. DNS
    (Diagram: the big bad internet, a DNS server, DNS resolution)
13. IP load balancing (1)
    The hostname resolves to 1 public IP owned by an IP load balancer
    Add new backend hosts w/ private IPs to scale out
    Load balancer health-checks hosts actively or passively to avoid dead hosts
    Scheduling policies vs. failover
    DSR (direct server return)
14. IP load balancing (2)
    Benefits:
       Only 1 public IP (high DNS TTL)
       Backend network capacity/membership transparent to the internet
       Cheap-ish
       Failover is possible, not insanely difficult
    Drawbacks:
       Can't do what you can with HTTP
15. IP
    (Diagram: the big bad internet, a gateway (GW), a load balancer (LB))
16. HTTP load balancing (1)
    The hostname resolves to 1 public IP owned by an HTTP load balancer
    Largely the same as IP load balancing
    Terminates TCP connections (sees all bytes)
    Can make routing decisions based on HTTP
    Can autonomously serve requests (caching, access control, etc.)
    Examples:
       Send requests for /foo/* to pool A
       401 requests without cookie Q
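Once the balancer terminates the connection and can see the parsed request, the two example rules on this slide fit in a few lines. A sketch (the pool names and the cookie-Q check are the slide's examples; the exact cookie format is an assumption):

```python
def route(path, headers):
    # Rule 2 from the slide: 401 requests that don't carry cookie Q.
    if "Q=" not in headers.get("Cookie", ""):
        return ("respond", 401)
    # Rule 1 from the slide: send /foo/* to pool A; default pool otherwise.
    if path.startswith("/foo/"):
        return ("proxy", "pool-A")
    return ("proxy", "pool-default")

print(route("/foo/bar", {"Cookie": "Q=1"}))
```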
17. HTTP load balancing (2)
    Benefits:
       Largely the same as IP
       More flexible rules
       Can terminate TLS (security+, cost+)
    Drawbacks:
       No DSR
       Failover difficult
       Not as performant as IP
18. HTTP
    (Diagram: the big bad internet, a load balancer (LB), HTTP(S))
19. MOAR
    Eventually a single LB is going to be a problem
       Not enough capacity
       Availability
    Turtles all the way down
       LB of LBs!
       DNS load balancing between datacenters
       ...
20. HTTPS: myths and reality
    Too computationally expensive?
       Only a few percent; is your webserver actually CPU bound? Doubt it
       SSL acceleration cards, GPUs, etc.
    Too much latency?
       Handshaking is 5-7x RTT
       Session resume
       False start
       Snap start
    Caching breaks
21. My latency is huge in Japan
    RTT to the USA (or any single DC) can be huge
    Re-use connections (Connection: keep-alive)
    Send work in parallel (pipelining)
    Use compression (Content-Encoding)
    Lots of tricks for static resources (bundling, CDNs, caching, etc.)
    Pre-fetch data
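Connection re-use is the cheapest of these tricks: one keep-alive connection pays the handshake RTTs once and amortizes them over many requests. A self-contained sketch against a local HTTP/1.1 origin (in the slide's scenario, the origin would be an ocean away and each avoided handshake would save hundreds of milliseconds):

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 defaults to keep-alive

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Three requests over one TCP connection: no extra handshakes after the first.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies = []
for _ in range(3):
    conn.request("GET", "/")
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
print(bodies)
```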
22. Let's get crazy: SPDY
    Don't limit yourself to HTTP; use a different protocol
    SPDY: developed by Google, supported by Chrome (and soon others)
    Connection re-use w/o head-of-line blocking
    Headers always compressed
    Always SSL (but breaks caching)
23. Let's get crazy: TCP termination
    Synchronous RTTs: the silent killer
       Opening new TCP connections is very costly
    Run proxies close to users and proxy traffic back to core using an optimized protocol
       Low RTT to the proxy
       Do SPDY-like tricks between edge + core
       Potentially faster network to core than the public internet
    Advertise these proxies via DNS
       Geo-targeting
       AS-adjacency
    Akamai's CDN does this, sort of
24. Let's get crazy: DNS anycast
    Remember how DNS resolutions were slow?
    DNS servers could be far away from a user
    Advertise multiple network routes for the same DNS IP; let the IP stack pick the closest one