Programming TCP for responsiveness

Techniques to write a TCP/IP application optimized for responsiveness.

Published in: Internet

  1. Copyright (C) 2016 DeNA Co.,Ltd. All Rights Reserved. Programming TCP for responsiveness. DeNA Co., Ltd. Kazuho Oku
  2. This talk explains the TCP latency optimization implemented in the H2O HTTP/2 server, version 2.1.
  3. Background
  4. TCP slow start
     - Initial Congestion Window (IW) = 10
       - only 10 packets can be sent in the first RTT
       - used to be IW = 3
     - window increase: 1.5x per RTT
     [Chart: TCP slow start (IW10, MSS1460); x-axis: RTT (1 to 8), y-axis: bytes transmitted (0 to 800,000)]
  5. Why 1.5x?
     "During slow start, a TCP increments cwnd by at most SMSS bytes for each ACK received that cumulatively acknowledges new data. (snip) The delayed ACK algorithm specified in [RFC1122] SHOULD be used by a TCP receiver. When using delayed ACKs, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet." (TCP Congestion Control, RFC 5681)
     In other words, with one ACK for every two full-sized segments, cwnd grows by roughly half of its size per RTT.
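The 1.5x growth and the shape of the slow-start chart can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not code from H2O: the function names `next_cwnd` and `slow_start_bytes` are made up for illustration, and the model assumes no packet loss, no receiver-window limit, and a receiver that sends exactly one ACK per `segs_per_ack` full-sized segments.

```c
#include <stdint.h>

/* cwnd (in segments) one RTT later, when the receiver sends one ACK per
 * `segs_per_ack` full-sized segments and every ACK grows cwnd by one SMSS.
 * `segs_per_ack` must be non-zero. */
uint32_t next_cwnd(uint32_t cwnd, uint32_t segs_per_ack)
{
    return cwnd + cwnd / segs_per_ack;
}

/* Total bytes sent during the first `rtts` round trips of slow start,
 * starting from an initial window of `iw` segments of `mss` bytes each. */
uint64_t slow_start_bytes(uint32_t iw, uint32_t mss,
                          uint32_t segs_per_ack, unsigned rtts)
{
    uint64_t total = 0;
    uint32_t cwnd = iw;

    for (unsigned i = 0; i < rtts; ++i) {
        total += (uint64_t)cwnd * mss;
        cwnd = next_cwnd(cwnd, segs_per_ack);
    }
    return total;
}
```

With IW=10, MSS=1460 and delayed ACKs (`segs_per_ack` = 2), the first RTT carries 14,600 bytes and the window grows from 10 to 15 segments. Without delayed ACKs (`segs_per_ack` = 1) the same model doubles the window each RTT.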
  6. Flow of the ideal HTTP
     - fastest within the limits of TCP/IP
     - receive a request in 0-RTT, and:
       - first send CSS/JS*
       - then send the HTML
       - then send the images*
     *: but only the ones not cached by the browser
     [Diagram: the client sends a request and the server responds within 1 RTT]
  7. The reality in HTTP/2
     - TCP establishment: +1 RTT*
     - TLS handshake: +2 RTT**
     - HTML fetch: +1 RTT
     - JS, CSS fetch: +2 RTT***
     - Total: 6 RTT
     *: 0 RTT on reconnection
     **: 1 RTT on reconnection
     ***: servers often cannot switch to sending JS, CSS instantly, because of the output already buffered in the TCP send buffer
     [Sequence diagram between client and server: TCP SYN, TCP SYNACK, four TLS handshake flights, GET / answered with HTML, GET css,js answered with css, js]
  8. Ongoing optimizations
     - TCP Fast Open
       - initial establishment in 1 RTT
       - re-establishment in 0 RTT
     - TLS 1.3
       - initial handshake completes in 1 RTT
       - resumption in 0 RTT
     - what can be done in the HTTP/2 layer?
  9. Programming TCP for responsiveness
  10. Programming TCP for responsiveness. Answer: TCP Urgent Indications (i.e. MSG_OOB)
  11. Programming TCP for responsiveness. Answer: TCP Urgent Indications (i.e. MSG_OOB)
  12. TCP Urgent Indications
      - out-of-band messaging for TCP
        - used by telnet!
      - can only send 1 octet
        - conflicting specs on how to handle multi-octet messages
      - cannot be used for HTTP/2
      - RFC 6093 “recommends against the use of urgent mechanism” (RFC 7414)
  13. Typical sequence of HTTP/2
      [Sequence diagram: the client sends GET /; the server starts sending "HTTP/2 200 OK <!DOCTYPE HTML> … <SCRIPT SRC="jquery.js"> …"; the interval marked * lasts 1 RTT, after which the client's GET /jquery.js arrives]
      - the server needs to switch from sending HTML to sending JS at this very moment
      - this means that the amount of data sent during * must be smaller than IW
  14. Buffering in TCP and TLS layer

          // ordinary code (non-blocking)
          while (SSL_write(…) != SSL_ERR_WANT_WRITE)
              ;

      [Diagram: HTTP/2 frames are turned into TLS records; records that fit within CWND (past the unacked data) in the TCP send buffer are sent immediately, records up to the poll threshold are not immediately sent, and the rest wait in the BIO buffer]
  15. Why do we have buffers?
      - TCP send buffer:
        - reduces ping-pong between the kernel and the application
      - BIO buffer:
        - for data that couldn't be stored in the TCP send buffer
      [Diagram: same as the previous slide, showing the TCP send buffer (unacked, CWND, poll threshold) and the BIO buffer]
  16. Improvement: poll-then-write

          // only call SSL_write when poll notifies the app.
          while (poll_for_write(fd) == SOCKET_IS_READY)
              SSL_write(…);

      [Diagram: the BIO buffer is gone; TLS records are written only into the TCP send buffer, where those within CWND are sent immediately and those up to the poll threshold are not]
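The poll-then-write pattern can be sketched with plain POSIX calls. This is an illustration only: it uses `write()` on a raw descriptor instead of `SSL_write()` so that the example stays self-contained, and `write_when_ready` is a hypothetical helper name, not an H2O function.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait until the kernel reports `fd` as writable, then issue a single write.
 * Returns the number of bytes written, 0 on timeout, or -1 on error. */
ssize_t write_when_ready(int fd, const void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n;

    /* Retry if a signal interrupts the wait. */
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n <= 0)
        return n; /* 0: timed out, -1: poll failed */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;

    /* Data is generated and handed over only after the notification,
     * so nothing is queued in user space ahead of time. */
    return write(fd, buf, len);
}
```

Because the application decides what to send only after the kernel signals free space, it can still change its mind (for example, switch from HTML to JS) up to that point.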
  17. Adjust poll threshold
      - set the poll threshold to the end of CWND?
        - setsockopt(TCP_NOTSENT_LOWAT)
        - in Linux, the minimum is CWND + 1 octet
          - becomes unstable when set to CWND + 0
      [Diagram: TCP send buffer with unacked data, CWND and the poll threshold; TLS records within CWND are sent immediately, the rest are not]
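Setting the threshold is a single `setsockopt` call. The sketch below is an assumption-laden illustration: `set_notsent_lowat` is a hypothetical wrapper, and the `#ifdef` guard is there because the option is not defined on every platform.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask the kernel to report the socket as writable only while the amount of
 * not-yet-sent data in the send buffer is below `bytes`.
 * Returns 0 on success, -1 on failure (errno is set). */
int set_notsent_lowat(int fd, int bytes)
{
#ifdef TCP_NOTSENT_LOWAT
    return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &bytes, sizeof(bytes));
#else
    /* The option is not available on this platform. */
    (void)fd;
    (void)bytes;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

A call such as `set_notsent_lowat(fd, 1)` is presumably what the "CWND + 1 octet" minimum on the slide refers to; treat that mapping as an assumption rather than a documented guarantee.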
  18. Adjust poll threshold

          // only call SSL_write when poll notifies the app.
          while (poll_for_write(fd) == SOCKET_IS_READY)
              SSL_write(…);

      [Diagram: with the poll threshold moved next to the end of CWND, almost every TLS record written to the TCP send buffer is sent immediately]
  19. Further improvement: read TCP states

          // calc size of data to send by calling getsockopt(TCP_INFO)
          if (poll_for_write(fd) == SOCKET_IS_READY) {
              capacity = CWND - unacked + TWO_MSS - TLS_overhead;
              SSL_write(prepare_http2_frames(capacity));
          }

      [Diagram: the application writes only as many TLS records as fit between the unacked data and the poll threshold of the TCP send buffer]
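The capacity formula is easier to follow with concrete numbers. The sketch below works in packet units, as Linux reports CWND and unacked, and applies the TLS overhead per packet in the same way as the pseudo-code slide of this deck. The function name `write_capacity` and the parameter layout are assumptions made for the example.

```c
#include <stddef.h>

/* Payload bytes that can be handed to the TLS layer right now.
 * cwnd, unacked: in packets; mss, tls_overhead: in octets.
 * Two extra packets are allowed on top of the free congestion window. */
size_t write_capacity(size_t cwnd, size_t unacked, size_t mss, size_t tls_overhead)
{
    size_t packets_sendable = cwnd > unacked ? cwnd - unacked : 0;

    if (tls_overhead >= mss)
        return 0; /* no room for payload; avoid unsigned underflow */
    return (packets_sendable + 2) * (mss - tls_overhead);
}
```

For example, with CWND=10, 4 packets unacked, MSS=1460 and 29 octets of TLS overhead, the result is (6 + 2) * 1431 = 11,448 octets. When the window is already full, the function still allows two packets' worth of payload (2,862 octets).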
  20. Negative impact of additional delay
      - increased delay between receiving an ACK and sending data, since:
        - traditional approach: completes within the kernel
        - this approach: the application needs to be notified to generate new data
      - outcome:
        - increase of CWND becomes slower
        - leads to slower peak speed?
          - depends on how CWND at peak is calculated
        - does the kernel use the TCP timestamp for the matter?
  21. Countermeasures
      - optimize for responsiveness only when necessary
        - i.e. when RTT is big and CWND is small
        - impact of the optimization is proportional to unsent_bytes / CWND
      - disable the optimization if the additional delay is significant
        - when epoll returns immediately, the estimated additional delay is equal to the time spent by the loop
  22. Configuration Directives
      - http2-latopt-min-rtt
        - minimum TCP RTT to enable the optimization
        - default: UINT_MAX (disabled)
      - http2-latopt-max-cwnd
        - maximum CWND to enable (in octets)
        - default: 65535
      - http2-max-additional-delay
        - max. additional delay (as the ratio to TCP RTT)
        - latopt disabled if the delay is greater
        - default: 0.1
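How the three directives could combine into one on/off decision is sketched below. This is not H2O's implementation: `struct latopt_conf`, `LATOPT_DEFAULTS` and `latopt_should_apply` are hypothetical names, and the slides do not state the time unit, so the RTT and the loop time are simply assumed to share the unit used for the min-rtt setting.

```c
#include <limits.h>

struct latopt_conf {
    unsigned min_rtt;            /* http2-latopt-min-rtt; UINT_MAX disables */
    unsigned max_cwnd;           /* http2-latopt-max-cwnd, in octets */
    double max_additional_delay; /* http2-max-additional-delay, ratio to RTT */
};

/* Defaults as listed on the slide. */
static const struct latopt_conf LATOPT_DEFAULTS = { UINT_MAX, 65535, 0.1 };

/* Returns 1 if the latency optimization should be applied, 0 otherwise.
 * rtt and loop_time: same time unit as conf->min_rtt (an assumption);
 * cwnd_octets: current congestion window in octets. */
int latopt_should_apply(const struct latopt_conf *conf, unsigned rtt,
                        unsigned cwnd_octets, unsigned loop_time)
{
    if (rtt < conf->min_rtt)
        return 0; /* RTT too small to be worth optimizing */
    if (cwnd_octets > conf->max_cwnd)
        return 0; /* window already large */
    if ((double)loop_time > (double)rtt * conf->max_additional_delay)
        return 0; /* the event loop adds too much delay */
    return 1;
}
```

With the defaults the function effectively always returns 0, which matches "disabled" on the slide. Lowering `min_rtt` to 50, for instance, enables the optimization for a connection with RTT 250 and a 14,600-octet window as long as the loop time stays at or below 25.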
  23. Pseudo-code

          size_t get_suggested_write_size() {
              struct tcp_info tcp_info;
              socklen_t len = sizeof(tcp_info);
              size_t tls_overhead, packets_sendable;

              /* getsockopt takes a pointer to the length, not the length itself */
              if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, &len) != 0)
                  return UNKNOWN;
              /* optimize only when RTT is big and CWND is small */
              if (tcp_info.tcpi_rtt < min_rtt || tcp_info.tcpi_snd_cwnd > max_cwnd)
                  return UNKNOWN;
              switch (SSL_get_current_cipher(ssl)->id) {
              case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256:
              case …:
                  tls_overhead = 5 + 8 + 16; /* record header + explicit nonce + GCM tag */
                  break;
              default:
                  return UNKNOWN;
              }
              packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked
                  ? tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked : 0;
              return (packets_sendable + 2) * (tcp_info.tcpi_snd_mss - tls_overhead);
          }
  24. Benchmark (1)
      - conditions:
        - server in Ireland, client in Tokyo (RTT 250ms)
        - load a tiny JS at the top of a large HTML
      - result: delay decreased from 511ms to 250ms
        - i.e. JS fetch latency was 2 RTT, became 1 RTT
          - similar results in other environments
  25. Benchmark (2)
      - using the same data as the previous benchmark
      - server: Sakura VPS (Ishikari DC)
      [Bar chart: downloading HTML (and JS within), RTT ~25ms; time in milliseconds (0 to 300) for HTML and JS, master vs. latopt]
  26. Conclusion
      - near-optimal result can be achieved
        - by adjusting the poll threshold and reading TCP states
        - 1-packet overhead due to a restriction in the Linux kernel
      - 1-RTT improvement in H2O
        - estimated 1-RTT improvement per the depth of the load graph
  27. Under the hood
  28. TCP_NOTSENT_LOWAT
      - supported by Linux, OS X
      - on Linux:
        - sysctl:
          - set to -1: use kernel default
          - set to 0: sshd hangs
          - set to positive int: override kernel default
        - setsockopt:
          - set to 0: use default (sysctl or kernel)
          - set to int: override default
  29. Unit of CWND
      - Linux: # of packets
        - if INITCWND is 10, you can send at most 10 packets at once, regardless of their size
      - BSD (incl. OS X): octets
        - you can send CWND*MSS octets, regardless of the number of packets
          - if CWND=10 and MSS=1460, it is possible to send 14,600 packets containing 1-octet payload
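The difference in units can be made concrete with two small helpers. This is a sketch under the slide's description only; `enum cwnd_unit`, `cwnd_octets` and `max_packets` are names invented for the example.

```c
#include <stdint.h>

enum cwnd_unit {
    CWND_IN_PACKETS, /* Linux */
    CWND_IN_OCTETS   /* BSD, incl. OS X */
};

/* Upper bound, in octets, on the data the window permits. */
uint64_t cwnd_octets(uint64_t cwnd, enum cwnd_unit unit, uint32_t mss)
{
    return unit == CWND_IN_PACKETS ? cwnd * mss : cwnd;
}

/* Number of packets carrying `payload` octets each (non-zero) that the
 * window permits. A packet-based window ignores the payload size. */
uint64_t max_packets(uint64_t cwnd, enum cwnd_unit unit, uint32_t payload)
{
    return unit == CWND_IN_PACKETS ? cwnd : cwnd / payload;
}
```

For a window worth 10 full-sized segments at MSS=1460, both models allow 14,600 octets, but with 1-octet payloads the packet-based model allows 10 packets while the octet-based model allows 14,600.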
  30. Determining amount of data that can be sent immediately

      | OS      | MSS          | CWND           | inflight      | send buffer (inflight + unsent) |
      |---------|--------------|----------------|---------------|---------------------------------|
      | Linux   | tcpi_snd_mss | tcpi_snd_cwnd* | tcpi_unacked* | ioctl(SIOCOUTQ)                 |
      | OS X**  | tcpi_maxseg  | tcpi_snd_cwnd  | -             | tcpi_snd_sbbytes                |
      | FreeBSD | tcpi_snd_mss | tcpi_snd_cwnd  | -             | ioctl(FIONWRITE)                |
      | NetBSD  | tcpi_snd_mss | tcpi_snd_cwnd* | -             | ioctl(FIONWRITE)                |

      - calculate either of:
        - CWND - inflight
        - max(CWND - (inflight + unsent), 0)
      - units used in the calculation must be the same
        - NetBSD: fail (CWND is reported in packets, the send buffer in octets)
      *: units of values marked are packets, unmarked are octets
      **: sometimes the values of tcpi_* are returned as zeros
