
Developing the fastest HTTP/2 server


Presentation material for TokyoRubyKaigi11.
Describes techniques used by H2O, including: techniques to optimize TCP for responsiveness, server-push and cache digests.



  1. Developing the fastest HTTP/2 server. Kazuho Oku, DeNA Co., Ltd. Copyright (C) 2016 DeNA Co.,Ltd. All Rights Reserved.
  2. Who am I?
     ■ Kazuho Oku
     ■ Major works:
       ⁃ Palmscape / Xiino (web browser for Palm OS)
         • awarded M.I.T. TR 100/2002
       ⁃ Mitoh project 2004 super creator
       ⁃ Q4M (message queue plugin for MySQL)
         • MySQL Conference Community Awards 2011
       ⁃ H2O (HTTP/2 server)
         • Japan OSS Contribution Award 2015
  3. Background
  4. Responsiveness is important
     ■ 500ms increase → -1.2% revenue
     source: http://radar.oreilly.com/2009/06/bing-and-google-agree-slow-pag.html
  5. Increasing size and # of requests
     source: http://httparchive.org/trends.php?s=All&minlabel=Aug+1+2011&maxlabel=Aug+1+2015#bytesTotal&reqTotal
  6. Bandwidth is also increasing
     ■ end-users' bandwidth increases 50% every year (Nielsen's Law)
     source: http://www.nngroup.com/articles/law-of-bandwidth/
  7. More bandwidth doesn't matter
     * effective bandwidth reaches a ceiling at around 1.6Mbps
     source: More Bandwidth Doesn't Matter - 2011 Mike Belshe (Google)
  8. Latency is the new bottleneck
     source: More Bandwidth Doesn't Matter - 2011 Mike Belshe (Google)
  9. Latency cannot be optimized
     ■ latency is bounded by the speed of light
       ⁃ round-trip between Japan and US: 80ms
     ■ mobile carriers have huge latency
       ⁃ LTE ~ 50ms
     ■ the Web is becoming more and more complex
  10. The Web is becoming slower ... unless we do something.
  11. Solution: new protocol
  12. HTTP/2!
  13. The reasons HTTP/1.1 is slow
      ■ concurrency is too small
        ⁃ multiple round-trips required when issuing many requests
      ■ no prioritization between requests
        ⁃ cannot suspend HTML / image streams in favor of CSS / JS
      ■ big request / response headers
        ⁃ typically hundreds of octets
        ⁃ becomes an overhead when issuing many requests
  14. HTTP/2
      ■ RFC 7540 (2015/5)
        ⁃ based on SPDY by Google
      ■ key features:
        ⁃ binary protocol
        ⁃ header compression
        ⁃ multiplexing
        ⁃ prioritization
  15. Benchmark
      ■ red bar: time spent until first-paint
      ■ big difference between server implementations
      ■ reason: quality of prioritization logic
      ■ H2O shows the true potential of HTTP/2
  16. Have we reached the limit?
  17. Let's consider what would be the ideal HTTP flow.
  18. TCP slow start
      ■ Initial Congestion Window (IW)=10
        ⁃ only 10 packets can be sent in first RTT
        ⁃ used to be IW=3
      ■ window increase: 1.5x/RTT
      [Figure: TCP slow start (IW10, MSS1460); x-axis: RTT (1 to 8); y-axis: bytes transmitted (0 to 800,000)]
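The growth described on this slide can be tabulated with a short sketch. It assumes the slide's own figures (IW=10, MSS=1460, 1.5x growth per RTT) and ignores packet loss and receive-window limits; `slow_start_bytes` is a name introduced here for illustration only.

```python
def slow_start_bytes(rtts, iw=10, mss=1460, growth=1.5):
    """Return the cumulative bytes sendable after each RTT of slow start.

    Assumes no loss and no receive-window limit: the congestion window
    starts at `iw` packets and is multiplied by `growth` every RTT.
    """
    totals = []
    cwnd = float(iw)  # congestion window, in packets
    sent = 0.0        # cumulative packets sent so far
    for _ in range(rtts):
        sent += cwnd
        totals.append(int(sent * mss))
        cwnd *= growth
    return totals


if __name__ == "__main__":
    for rtt, total in enumerate(slow_start_bytes(8), start=1):
        print(f"RTT {rtt}: {total:,} bytes")
```

Under these assumptions only 14,600 bytes fit in the first round trip, and roughly 719,000 bytes have been sent after eight.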
  19. Flow of the ideal HTTP
      ■ fastest within the limits of TCP/IP
      ■ receive a request 0-RTT, and:
        ⁃ first send CSS/JS*
        ⁃ then send the HTML
        ⁃ then send the images*
      *: but only the ones not cached by the browser
      [Diagram: client and server; request and response complete in 1 RTT]
  20. The reality in HTTP/2
      ■ TCP establishment: +1 RTT
      ■ TLS handshake: +2 RTT*
      ■ HTML fetch: +1 RTT
      ■ JS,CSS fetch: +2 RTT**
      ■ Total: 6 RTT
      *: 1 RTT on reconnection
      **: servers often cannot switch to sending JS,CSS instantly, due to the output buffered in TCP send buffer
      [Diagram: client and server exchanging TCP SYN, TCP SYNACK, four TLS handshake flights, GET /, HTML, GET css,js, and css, js]
  21. Ongoing optimizations
      ■ TCP Fast Open
        ⁃ connection establishment in 0 RTT
      ■ TLS 1.3
        ⁃ initial handshake complete in 1 RTT
        ⁃ resumption in 0 RTT
      ■ what can be done in the HTTP/2 layer?
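As a rough illustration, the round-trip budget from the two slides above can be added up as follows. The per-phase numbers are the ones given on the slides; combining them into the scenarios below is an assumption made for this sketch, and `total_rtts` is an illustrative name.

```python
def total_rtts(tcp=1, tls=2, html=1, assets=2):
    """Sum the round trips spent before JS/CSS arrive (defaults: the 6-RTT case)."""
    return tcp + tls + html + assets


baseline = total_rtts()                      # fresh connection
reconnect = total_rtts(tls=1)                # TLS handshake on reconnection
tfo_tls13 = total_rtts(tcp=0, tls=1)         # TCP Fast Open + TLS 1.3 initial handshake
tfo_tls13_resumed = total_rtts(tcp=0, tls=0) # TCP Fast Open + TLS 1.3 resumption

if __name__ == "__main__":
    print(baseline, reconnect, tfo_tls13, tfo_tls13_resumed)
```

Even in the most favorable of these scenarios, 3 RTT remain for the HTML and JS/CSS fetches, which is the part the HTTP/2 layer can still address.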
  22. Further optimizations in HTTP/2 layer
      ■ optimize TCP for responsiveness
      ■ cache-aware server push
  23. Optimizing TCP for responsiveness
  24. Typical sequence of HTTP/2
      [Diagram: the client sends GET /; the server starts sending "HTTP/2 200 OK <!DOCTYPE HTML> … <SCRIPT SRC="jquery.js"> …"; 1 RTT later (*) the client's GET /jquery.js reaches the server]
      ■ need to switch sending from HTML to JS at this very moment
        ⁃ means that the amount of data sent in * must be smaller than IW
  25. Buffering in TCP and TLS layer
      [Diagram labels: HTTP/2 frames, TLS records, BIO buffer, TCP send buffer (unacked, CWND, poll threshold); data within CWND is sent immediately, the rest is not immediately sent]
      // ordinary code (non-blocking); pseudocode: keep writing until OpenSSL reports SSL_ERROR_WANT_WRITE
      while (SSL_write(…) != SSL_ERROR_WANT_WRITE)
          ;
  26. Why do we have buffers?
      ■ TCP send buffer:
        ⁃ reduce ping-pong between kernel and application
      ■ BIO buffer:
        ⁃ for data that couldn't be stored in TCP send buffer
      [Diagram: same buffer layout as on the previous slide]
  27. Improvement: poll-then-write
      [Diagram labels: HTTP/2 frames, TLS records, TCP send buffer (unacked, CWND, poll threshold); sent immediately / not immediately sent]
      // only call SSL_write when poll notifies the app.
      while (poll_for_write(fd) == SOCKET_IS_READY)
          SSL_write(…);
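A minimal sketch of the poll-then-write idea, using plain non-blocking sockets in place of SSL_write (an assumption made to keep the example self-contained). `poll_then_write` and `next_chunk` are illustrative names, not H2O APIs.

```python
import selectors
import socket


def poll_then_write(sock, next_chunk, timeout=1.0):
    """Write to `sock` only when the kernel reports it writable.

    `next_chunk` is called as late as possible, right before each write,
    so the application can still change what it sends next. It returns
    bytes, or b"" when nothing is left. Returns the number of bytes written.
    """
    sock.setblocking(False)
    written = 0
    pending = b""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_WRITE)
        while True:
            if not sel.select(timeout):
                break                   # not writable within the timeout
            if not pending:
                pending = next_chunk()  # decide what to send only now
                if not pending:
                    break
            try:
                n = sock.send(pending)
            except BlockingIOError:
                continue                # spurious wakeup; poll again
            written += n
            pending = pending[n:]       # keep any unsent tail for next round
    return written
```

The point of the structure is that data is generated only after the poll returns, so nothing sits in an application-side buffer where it could delay a higher-priority response.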
  28. Adjust poll threshold
      ■ set poll threshold to the end of CWND?
        ⁃ setsockopt(TCP_NOTSENT_LOWAT)
        ⁃ in Linux, the minimum is CWND + 1 octet
          • becomes unstable when set to CWND + 0
      [Diagram labels: HTTP/2 frames, TLS records, TCP send buffer (unacked, CWND, poll threshold); sent immediately / not immediately sent]
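A hedged sketch of setting the option named on this slide from Python. It assumes a platform whose `socket` module exposes `TCP_NOTSENT_LOWAT` (recent Python on Linux does); `set_notsent_lowat` is an illustrative helper, and the value 1 mirrors the "CWND + 1 octet" minimum mentioned above.

```python
import socket


def set_notsent_lowat(sock, nbytes=1):
    """Ask the kernel to report `sock` writable only once the amount of
    queued-but-unsent data drops below `nbytes`.

    Returns False when the platform or Python build lacks the option.
    """
    option = getattr(socket, "TCP_NOTSENT_LOWAT", None)
    if option is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, option, nbytes)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("TCP_NOTSENT_LOWAT applied:", set_notsent_lowat(s, 1))
    s.close()
```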
  29. Adjust poll threshold
      [Diagram labels: HTTP/2 frames, TLS records, TCP send buffer (unacked, CWND, poll threshold); sent immediately / not immediately sent]
      // only call SSL_write when poll notifies the app.
      while (poll_for_write(fd) == SOCKET_IS_READY)
          SSL_write(…);
  30. Further improvement: read TCP states
      [Diagram labels: HTTP/2 frames, TLS records, TCP send buffer (unacked, CWND, poll threshold); sent immediately / not immediately sent]
      // calc size of data to send by calling getsockopt(TCP_INFO)
      if (poll_for_write(fd) == SOCKET_IS_READY) {
          capacity = CWND + unacked + ONE_MSS - TLS_overhead;
          SSL_write(prepare_http2_frames(capacity));
      }
  31. Issues in the proposed approach
      ■ increased delay between receiving an ACK and sending data
        ⁃ leads to slower peak speed
        ⁃ reason:
          • traditional approach: completes within kernel
          • this approach: application needs to be notified to generate new data
      ■ solution:
        ⁃ use the approach only when necessary
          • i.e. when RTT is big and CWND is small
          • increased delay can be ignored if: delay << RTT
  32. Code for calculating size of data to send
      size_t get_suggested_write_size() {
          /* getsockopt takes a pointer to the buffer length, not the length itself */
          socklen_t len = sizeof(tcp_info);
          getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, &len);
          /* apply the optimization only when RTT is big and CWND is small */
          if (tcp_info.tcpi_rtt < min_rtt || tcp_info.tcpi_snd_cwnd > max_cwnd)
              return UNKNOWN;
          switch (SSL_get_current_cipher(ssl)->id) {
          case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256:
          case …:
              /* record header (5) + explicit nonce (8) + authentication tag (16) */
              tls_overhead = 5 + 8 + 16;
              break;
          default:
              return UNKNOWN;
          }
          packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked ?
              tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked : 0;
          return (packets_sendable + 1) * (tcp_info.tcpi_snd_mss - tls_overhead);
      }
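The arithmetic in the function above can be checked in isolation. The sketch below takes the TCP_INFO fields as plain arguments instead of calling getsockopt; the default thresholds are placeholders, since the slide does not give values for `min_rtt` or `max_cwnd`, and `suggested_write_size` is an illustrative name.

```python
def suggested_write_size(rtt_us, cwnd, unacked, mss, tls_overhead,
                         min_rtt_us=50_000, max_cwnd=65_535):
    """Return how many payload bytes to hand to TLS right now, or None
    when the optimization should not be applied.

    rtt_us, cwnd, unacked and mss stand for tcpi_rtt, tcpi_snd_cwnd,
    tcpi_unacked and tcpi_snd_mss. The two thresholds are hypothetical.
    """
    if rtt_us < min_rtt_us or cwnd > max_cwnd:
        return None
    packets_sendable = max(cwnd - unacked, 0)
    # One extra packet accounts for the "CWND + 1 octet" poll threshold.
    return (packets_sendable + 1) * (mss - tls_overhead)


if __name__ == "__main__":
    # 250 ms RTT, CWND of 10 packets with 4 in flight, AES-GCM overhead of 29.
    print(suggested_write_size(250_000, 10, 4, 1460, 5 + 8 + 16))
```

With a CWND of 10 packets, 4 unacknowledged, an MSS of 1460 and 29 octets of TLS overhead, the result is (6 + 1) × 1431 = 10,017 bytes.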
  33. Benchmark
      ■ conditions:
        ⁃ server in Ireland, client in Japan (RTT 250ms)
        ⁃ load tiny JS at the top of a large HTML
      ■ result: delay decreased from 511ms to 250ms
        ⁃ i.e. JS fetch latency was 2 RTT, became 1 RTT
          • similar results in other environments
  34. Conclusion
      ■ near-optimal result can be achieved
        ⁃ by adjusting poll threshold and reading TCP states
        ⁃ 1-packet overhead due to restriction in Linux kernel
      ■ 1-RTT improvement in H2O
        ⁃ estimated 1-RTT improvement per the depth of the load graph
  35. Same problem exists with load balancers
      ■ L4 load balancers and TLS terminators also act as buffers
        ⁃ impact bigger than that of TCP send buffer of httpd
      ■ solution:
        ⁃ best: don't use a load balancer
        ⁃ next best: implement mitigations in the load balancer
        ⁃ long-term: TCP migration + L3 NAT or DSR
          • i.e. accept in the load balancer, then transfer the connection to the HTTP/2 server
  36. Cache-aware Server Push
  37. What is server-push?
      ■ start the delivery of CSS / JS when receiving a request for HTML
      ■ effect:
        ⁃ 1 RTT reduction, or more
  38. Use-case: conceal request process time
      ■ ex. RTT=50ms, process time=200ms
        ⁃ without push: 450ms (5 RTT + processing time)
        ⁃ with push: 250ms (1 RTT + processing time)
      [Diagram: without push, the assets are requested after the HTML arrives; with push, push-assets are sent while the server processes the request]
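The two totals on this slide follow from a one-line formula, sketched here with the slide's own numbers; `page_time_ms` is an illustrative name.

```python
def page_time_ms(rtt_ms, processing_ms, round_trips):
    """Total time = round trips on the wire + server-side processing."""
    return round_trips * rtt_ms + processing_ms


without_push = page_time_ms(rtt_ms=50, processing_ms=200, round_trips=5)
with_push = page_time_ms(rtt_ms=50, processing_ms=200, round_trips=1)

if __name__ == "__main__":
    print(without_push, with_push)
```

The gain grows with both the RTT and the number of round trips that push removes; processing time itself is unchanged, it is merely overlapped with asset delivery.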
  39. Use-case: conceal network distance
      ■ CDNs' use-case
        ⁃ utilize the connection while waiting for the application's response
        ⁃ side-effect: reduce the number of application data centers
      [Diagram: client, edge server (CDN), app. server; the edge server pushes assets to the client while its own request for the HTML is on its way to the app. server]
  40. Issues of server-push
      ■ how to determine if a resource is already cached
        ⁃ shouldn't push a resource already in cache
          • waste of bandwidth (and time)
        ⁃ can't issue a request to identify the cache state
          • since it would waste the 1 RTT we are trying to save!
  41. Cache-aware server push
      ■ experimental feature since H2O 1.5
      ■ create a digest of URLs found in browser cache
        ⁃ uses Golomb coded sets
          • space-efficient alternative to a Bloom filter
      ■ server uses the digest to determine whether or not to push
  42. Memo: fresh vs. stale
      ■ two states of a cached resource
      ■ fresh:
        ⁃ resource that can be used as-is
        ⁃ example: Expires: Jan 1 2030
      ■ stale:
        ⁃ needs revalidation before use
          • i.e. issue GET with If-Modified-Since
  43. Generating a digest
      1. calculate the hash code of the URL of every fresh cached resource
         ⁃ range: 0 .. #-of-URLs / false-positive-rate
      2. sort the hash codes, remove duplicates
      3. emit the first element using the following encoding:
         1. floor(value * FPR) using unary coding
         2. value mod (1 / FPR) using binary coding
      4. for every other element, emit the delta from the preceding element minus one, using the same encoding
      5. pad with 1s up to the byte boundary
  44. Generating a digest
      ■ scenario:
        ⁃ FPR: 1/256
        ⁃ URLs of fresh resources in cache:
          • https://example.com/ecma.js
          • https://example.com/style.css
      ■ calculate hashes modulo 512: 0x3d, 0x16b
      ■ sort, remove duplicates, and emit the deltas:
        ⁃ 0x3d → 0 00111101
        ⁃ 0x16b - 0x3d - 1 → 0x12d → 10 00101101
        ⁃ padding → 11111 (19 bits emitted, padded to 24)
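The encoding steps from the two "Generating a digest" slides can be sketched as follows. The hash values are taken as input because the slides do not say which hash function is used; `encode_gcs` and `decode_gcs` are illustrative names, and the false-positive rate is assumed to be 1 / 2**fpr_bits with fpr_bits of at least 1.

```python
def encode_gcs(hashes, fpr_bits):
    """Encode hash values as a Golomb-coded set.

    `hashes` are integers already reduced to 0 .. N * 2**fpr_bits.
    Returns the digest as bytes, padded with 1-bits.
    """
    bits = []
    previous = -1
    for value in sorted(set(hashes)):
        delta = value - previous - 1  # the first element is emitted as-is
        previous = value
        quotient, remainder = divmod(delta, 1 << fpr_bits)
        bits.append("1" * quotient + "0")                # unary part
        bits.append(format(remainder, f"0{fpr_bits}b"))  # binary part
    bitstring = "".join(bits)
    bitstring += "1" * (-len(bitstring) % 8)             # pad to a byte boundary
    return bytes(int(bitstring[i:i + 8], 2)
                 for i in range(0, len(bitstring), 8))


def decode_gcs(digest, fpr_bits):
    """Recover the sorted hash values from a digest built by encode_gcs."""
    bitstring = "".join(format(byte, "08b") for byte in digest)
    values, pos, previous = [], 0, -1
    while True:
        end = bitstring.find("0", pos)  # the 0 that terminates the unary part
        if end == -1 or end + 1 + fpr_bits > len(bitstring):
            break                       # only padding remains
        quotient = end - pos
        remainder = int(bitstring[end + 1:end + 1 + fpr_bits], 2)
        previous += (quotient << fpr_bits) + remainder + 1
        values.append(previous)
        pos = end + 1 + fpr_bits
    return values


if __name__ == "__main__":
    digest = encode_gcs([0x3d, 0x16b], fpr_bits=8)
    print(digest.hex(), decode_gcs(digest, fpr_bits=8))
```

For the scenario above (hash values 0x3d and 0x16b, FPR 1/256), the digest is the three bytes 1e c5 bf, and decoding returns the two hash values.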
  45. Overhead of sending the digest
      ■ size: #-of-URLs * (log2(1/FPR) + 1.x) bits
      ■ 1,400 URLs can be stored in 1 packet
        ⁃ when false-positive-rate is set to 1/128
      ■ can raise FPR to cram in more URLs
        ⁃ a false positive means the resource is not pushed; the browser can just pull it
        ⁃ pushing some of the required resources is better than none
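A rough size estimate, reading the per-URL cost as log2(1/FPR) + 1.x bits, i.e. 7 bits plus a little over 1 bit when the FPR is 1/128. The `unary_bits` default of 1.5 merely stands in for the slide's "1.x" and is a guess; `digest_size_bytes` is an illustrative name.

```python
import math


def digest_size_bytes(num_urls, fpr_bits, unary_bits=1.5):
    """Estimate the digest size for a false-positive rate of 1 / 2**fpr_bits.

    Each URL costs fpr_bits for the binary part plus `unary_bits` on
    average for the unary part (the slide's "1.x").
    """
    bits = num_urls * (fpr_bits + unary_bits)
    return math.ceil(bits / 8)


if __name__ == "__main__":
    print(digest_size_bytes(1400, fpr_bits=7, unary_bits=1.0))  # lower bound
    print(digest_size_bytes(1400, fpr_bits=7))                  # with the 1.5 guess
```

For 1,400 URLs this gives between 1,400 and about 1,490 bytes, which is close to the payload of a single packet.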
  46. Where to store the digest?
      ■ cookie
        ⁃ pros: runs on any browser, anytime
        ⁃ cons: digest becomes inaccurate
          • only the browser knows what's in the browser cache
      ■ ServiceWorker (+ServiceWorker Cache)
        ⁃ pros: runs on Chrome, Firefox
        ⁃ cons: doesn't start until leaving the landing page
      ■ HTTP/2 frame
        ⁃ pros: minimal octets transferred
          • thanks to the knowledge of HTTP/2 connection
        ⁃ cons: needs to be implemented by browser developers
  47. Discussion at IETF
      ■ IETF 95 (April)
        ⁃ initial submission of the internet draft
          • co-author: Mark Nottingham (HTTP WG Chair)
        ⁃ defines the HTTP/2 frame
          • since it's the best way in the long term
          • store the frame in headers / cookies for the short term
      ■ IETF 96, HTTP Workshop (July)
        ⁃ to define digest calculation of stale resources
  48. Handling stale resources
      ■ hash key changed to URL + ETag
        ⁃ does anyone need support for Last-Modified?
      ■ server uses URL + ETag of the resource to check the digest
        ⁃ push the resource in case a match is not found
        ⁃ push 304 Not Modified in case a match is found
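The server-side decision on this slide can be sketched as below. The digest is modeled as an already-decoded set of integers, and the SHA-256 based key derivation is an assumption made for this example, as are the names `digest_key` and `decide_push`.

```python
import hashlib


def digest_key(url, etag, num_entries, fpr_bits):
    """Hash URL + ETag into the digest's value range (assumed derivation)."""
    h = hashlib.sha256((url + etag).encode("utf-8")).digest()
    return int.from_bytes(h[:8], "big") % (num_entries << fpr_bits)


def decide_push(digest_values, url, etag, num_entries, fpr_bits):
    """Choose what to push for a resource the client may hold as stale."""
    if digest_key(url, etag, num_entries, fpr_bits) in digest_values:
        return "push 304 Not Modified"  # the client's copy matches this ETag
    return "push full response"         # no match: send the resource itself


if __name__ == "__main__":
    url, etag = "https://example.com/style.css", '"v1"'
    client_digest = {digest_key(url, etag, 2, 8)}
    print(decide_push(client_digest, url, etag, 2, 8))
    print(decide_push(set(), url, etag, 2, 8))
```

Because the digest is probabilistic, a different ETag can occasionally collide with an entry; at worst the client then receives a 304 for a copy it does not hold and has to fetch the resource itself.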
  49. Difficulties in pushing 304
      ■ ETag cannot always be obtained immediately
        ⁃ cannot build the If-None-Match request header without the ETag
        ⁃ the "request*" of a pushed resource SHOULD be sent before the main response
      ■ proposed solution:
        ⁃ allow 304 against a non-conditional GET
      *: in case of server-push, the server generates both the request and the response, and sends them to the client.
  50. Using server-push from Ruby
      ■ Link: rel=preload header
        ⁃ web server pushes the specified URL
            HTTP/1.1 200 OK
            Content-Type: text/html
            Link: </style.css>; rel=preload    # this header!!!
        ⁃ supported by:
          • H2O, nghttpx (nghttp2), mod_h2 (Apache)
        ⁃ patch for nginx exists
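The header is not specific to Ruby: any application behind a push-capable HTTP/2 server can emit it. As an illustration outside Ruby, here is a minimal WSGI sketch in Python; the page body and the `/style.css` path are placeholders.

```python
def app(environ, start_response):
    """WSGI app whose response asks the fronting HTTP/2 server to push /style.css."""
    body = b"<!DOCTYPE html><link rel=stylesheet href=/style.css>"
    start_response("200 OK", [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
        # Hint read by the HTTP/2 server sitting in front of the app.
        ("Link", "</style.css>; rel=preload"),
    ])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serves plain HTTP/1.1 locally; pushing only happens behind an HTTP/2 server.
    make_server("127.0.0.1", 8000, app).serve_forever()
```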
  51. The issue with Link: rel=preload
      ■ cannot initiate push while processing the request
      [Diagram: client, HTTP/2 server, web app. The server forwards GET / to the app and can't push at this moment; the Link: … header only arrives with the app's 200 OK, after the request has been processed]
  52. 1xx Early Metadata
      ■ send Link: rel=preload as interim response
        ⁃ application sends 1xx then processes the request
      ■ supported in H2O 2.1
      ■ might propose for standardization in IETF
          GET / HTTP/1.1
          Host: example.com

          HTTP/1.1 1xx Early Metadata
          Link: </style.css>; rel=preload

          HTTP/1.1 200 OK
          Content-Type: text/html; charset=utf-8

          <!DOCTYPE HTML>
          ...
  53. Sending 1xx from Rack
      ■ in case of Unicorn:
          Proc.new do |env|
            # write the interim response directly to the client socket
            env["unicorn.socket"].write(
              "HTTP/1.1 1xx Early Metadata\r\n" +
              "Link: </style.js>; rel=preload\r\n" +
              "\r\n");
            # time-consuming operation ...
            [ 200, [ ... ], [ ... ] ]
          end
      ...we need to define the formal API
  54. Conclusion
  55. Conclusion
      ■ the Web has become faster with HTTP/2
      ■ HTTP/2 can be made as fast as the limits of TCP/IP allow, with:
        ⁃ optimizing TCP for responsiveness
        ⁃ Cache Digest
        ⁃ 1xx Early Metadata
  56. Q&A
      ■ Q. Can it be made faster than the limits of TCP/IP?
      ■ A. Yes!
        ⁃ shorten the RTT!
          • CDNs' approach
        ⁃ make DNS query part of TLS handshake
          • was part of TLS 1.3 draft (removed as too premature)
        ⁃ fairness isn't an issue for a private network!
          • TCP optimizer for mobile carriers
