
Network Performance: Making Every Packet Count - NET401 - re:Invent 2017

Many applications are network I/O bound, including common database-backed applications and service-based architectures. But operating systems and applications are often not tuned to deliver high performance. This session uncovers hidden issues that lead to low network performance, and shows you how to overcome them to obtain the best network performance possible.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Network Performance: Making Every Packet Count. Mike Furr, Principal Engineer, EC2. November 29, 2017. NET401
  2. What to expect from this session • Tuning TCP on Linux • TCP performance • Application
  3. [Section graphic]
  4. TCP
  5. TCP • Transmission Control Protocol • Underlies SSH, HTTP, *SQL, SMTP • Stream delivery, flow control
  6. TCP [Diagram: Jack and Jill]
  7. [Diagram: Jack and Jill exchanging data]
  8. Limiting in-flight data [Diagram: Jack and Jill, each with a receive window and a congestion window]
  9. Bandwidth delay product [Diagram: Jack and Jill, 2 ms round-trip time]
  10. Bandwidth delay product [Diagram: Jack and Jill, 100 ms round-trip time]
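     The bandwidth-delay product is the amount of data that must be in flight to keep the path full: BDP = bandwidth x RTT. A quick sketch of the arithmetic, assuming a 10 Gbit/s path (the link speed is an illustrative assumption, not taken from the slides):

       $ # BDP at 2 ms RTT: 10 Gbit/s, converted to bytes/s, times 0.002 s
       $ echo $((10 * 1000**3 / 8 * 2 / 1000))
       2500000
       $ # BDP at 100 ms RTT: same link, times 0.1 s
       $ echo $((10 * 1000**3 / 8 * 100 / 1000))
       125000000

     The same link needs ~2.4 MB in flight at 2 ms RTT but ~119 MB at 100 ms, so the receive and congestion windows must be able to grow to that size.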
  11. Receive window • Receiver controlled, signaled to the sender
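     The window a host can advertise is capped by its socket receive buffer, which Linux auto-tunes within the net.ipv4.tcp_rmem limits. A minimal sketch for inspecting and raising the ceiling on high-BDP paths (the 16 MB value is an illustrative assumption, not from the slides):

       $ sysctl net.ipv4.tcp_rmem                         # min, default, max buffer in bytes
       # sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
       # sysctl -w net.core.rmem_max=16777216             # cap for apps that set SO_RCVBUF explicitly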
  12. Congestion window [Diagram: Jack and Jill, each with a receive window and a congestion window]
  13. Congestion window • Sender controlled • Window is managed by the congestion control algorithm • Inputs vary by algorithm
  14. Initial congestion window
       $ ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link
       169.254.169.254 dev eth0 scope link
       Initial flight: 1448 + 1448 + 1448 = 4344 bytes
  15. Initial congestion window
       # ip route change 10.16.16.0/24 dev eth0 proto kernel scope link initcwnd 16
       $ ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link initcwnd 16
       169.254.169.254 dev eth0 scope link
       Initial flight: 16 segments of 1448 bytes = 23168 bytes
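     The receiving side has an analogous per-route knob for the initial advertised receive window; a sketch assuming your kernel and iproute2 support the initrwnd attribute (the value 16 is illustrative):

       # ip route change 10.16.16.0/24 dev eth0 proto kernel scope link initrwnd 16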
  16. Impact of loss on TCP throughput [Chart: relative throughput (0–100) versus loss rate (0%–10%)]
  17. Loss is visible as TCP retransmissions
       $ netstat -s | grep retransmit
       58496 segments retransmitted
       52788 fast retransmits
       135 forward retransmits
       3659 retransmits in slow start
       392 SACK retransmits failed
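     These counters are cumulative since boot; to see the current rate, sample deltas instead. A sketch using nstat from iproute2 (which prints the change since its previous invocation), with a crude netstat loop as an alternative:

       $ nstat TcpRetransSegs
       $ # or: periodically sample the cumulative counter
       $ while sleep 1; do netstat -s | grep 'segments retransmitted'; done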
  18. Socket level diagnostic (highlighted: TCP state)
       $ ss -ite
       State Recv-Q Send-Q Local Address:Port Peer Address:Port
       ESTAB 0 3829960 10.16.16.18:https 10.16.16.75:52008 timer:(on,012ms,0) uid:498 ino:7116021 sk:0001c286 <->
       ts sack cubic wscale:7,7 rto:204 rtt:1.423/0.14 ato:40 mss:1448 cwnd:138 ssthresh:80 send 1123.4Mbps unacked:138 retrans:0/11737 rcv_space:26847
  19. Socket level diagnostic (same output; highlighted: Send-Q, the bytes queued for transmission)
  20. Socket level diagnostic (same output; highlighted: cubic, the congestion control algorithm)
  21. Socket level diagnostic (same output; highlighted: rto:204, the retransmission timeout)
  22. Socket level diagnostic (same output; highlighted: cwnd:138, the congestion window)
  23. Socket level diagnostic (same output; highlighted: retrans:0/11737, retransmissions)
  24. Monitoring retransmissions in real time • Observable using Linux kernel tracing
       # tcpretrans
       TIME PID LADDR:LPORT -- RADDR:RPORT STATE
       03:31:07 106588 10.16.16.18:443 R> 10.16.16.75:52291 ESTABLISHED
       https://github.com/brendangregg/perf-tools/
  25. Congestion control algorithm [Diagram: Jack and Jill]
  26. Congestion control algorithms in Linux • New Reno: pre-2.6.8 • BIC: 2.6.8–2.6.18 • CUBIC: 2.6.19+ • Pluggable architecture • Other algorithms often available: BBR, Vegas, Illinois, Westwood, Highspeed, Scalable
  27. Tuning congestion control algorithm
       $ sysctl net.ipv4.tcp_available_congestion_control
       net.ipv4.tcp_available_congestion_control = cubic reno
       $ find /lib/modules -name 'tcp_*'
       […]
       # modprobe tcp_illinois
       $ sysctl net.ipv4.tcp_available_congestion_control
       net.ipv4.tcp_available_congestion_control = cubic reno illinois
  28. Tuning congestion control algorithm
       # sysctl net.ipv4.tcp_congestion_control=illinois
       net.ipv4.tcp_congestion_control = illinois
       # echo "net.ipv4.tcp_congestion_control = illinois" > /etc/sysctl.d/01-tcp.conf
       [Restart network processes]
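     The algorithm can also be pinned per destination rather than globally; a sketch assuming kernel 4.0+ and a matching iproute2, which expose a congctl route attribute (subnet reused from the earlier slides):

       # ip route change 10.16.16.0/24 dev eth0 proto kernel scope link congctl illinois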
  29. TCP-BBR • Available in Linux 4.9 • Uses pacing and active probing to estimate bandwidth and RTT • Starting in 4.13, fq no longer required
       # modprobe sch_fq
       # modprobe tcp_bbr
       # sysctl net.core.default_qdisc=fq
       # sysctl net.ipv4.tcp_congestion_control=bbr
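     To confirm the change took effect, check the sysctl and look for bbr in the per-socket details of ss; a quick sketch (the grep simply counts established sockets whose details mention bbr):

       $ sysctl net.ipv4.tcp_congestion_control
       $ ss -ti | grep -c bbr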
  30. Retransmission timer • Determines when the congestion control algorithm considers a packet lost • Too low: spurious retransmissions; congestion control can over-react and be slow to re-open the congestion window • Too high: increased latency while the algorithm determines a packet is lost and retransmits
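     For intuition: per RFC 6298, the timeout is computed roughly as RTO = SRTT + 4 * RTTVAR (ignoring clock granularity), then floored at the route's rto_min. With the rtt:1.423/0.14 (srtt/rttvar) sample from the ss output earlier:

       # 1.423 ms + 4 * 0.14 ms = 1.983 ms computed RTO
       # so the 200 ms default floor dominates on this low-RTT path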
  31. Tuning retransmission timer minimum • Default minimum: 200 ms
       # ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link   (route to other instances in our subnet, same AZ)
       169.254.169.254 dev eth0 scope link
  32. Tuning retransmission timer minimum
       # ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link
       169.254.169.254 dev eth0 scope link
       # ip route change 10.16.16.0/24 dev eth0 proto kernel scope link rto_min 50ms
       # ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link rto_min lock 50ms
       169.254.169.254 dev eth0 scope link
  33. Queueing along the network path [Diagram: Jack and Jill]
  34. Queueing along the network path • Intermediate routers along a path have interface buffers • High load leads to more packets in the buffer • Latency increases due to queue time • Can trigger retransmission timeouts
  35. Active queue management
       $ tc qdisc list
       qdisc mq 0: dev eth0 root
       qdisc pfifo_fast 0: dev eth0 parent :1 bands 3 […]
       qdisc pfifo_fast 0: dev eth0 parent :2 bands 3 […]
       # tc qdisc add dev eth0 root fq_codel
       qdisc fq_codel 8006: dev eth0 root refcnt 9 limit 10240p flows 1024 quantum 9015 target 5.0ms interval 100.0ms ecn
       www.bufferbloat.net/projects/codel/wiki
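     To make fq_codel the default instead of attaching it per device, kernels 3.12+ expose a sysctl; a sketch (it applies to qdiscs attached after the change, e.g. on new interfaces, and the sysctl.d file name is illustrative):

       # sysctl -w net.core.default_qdisc=fq_codel
       # echo "net.core.default_qdisc = fq_codel" > /etc/sysctl.d/02-qdisc.conf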
  36. Maximum transmission unit: 1448 B payload (3.47% overhead) versus 8949 B payload (0.58% overhead). Improvement seen among instances in your VPC.
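     The overhead figures follow from 52 bytes of headers per segment (20 B IPv4 + 20 B TCP + 12 B timestamp option):

       # 1500 B MTU: 52 / 1500 = 3.47% of every packet is header (payload 1448 B)
       # 9001 B MTU: 52 / 9001 = 0.58%                           (payload 8949 B)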
  37. Tuning maximum transmission unit
       # ip link list
       2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 06:f1:b7:e1:3b:e7
       # ip route list
       default via 10.16.16.1 dev eth0
       10.16.16.0/24 dev eth0 proto kernel scope link
       169.254.169.254 dev eth0 scope link
  38. Tuning maximum transmission unit
       # ip route change default via 10.16.16.1 dev eth0 mtu 1500
       # ip route list
       default via 10.16.16.1 dev eth0 mtu 1500
       10.16.16.0/24 dev eth0 proto kernel scope link
       169.254.169.254 dev eth0 scope link
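     After changing a route MTU, tracepath can confirm what the path actually supports; a quick sketch (the hostname is a placeholder):

       $ tracepath remote-host     # reports the discovered pmtu hop by hop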
  39. Amazon EC2 enhanced networking
  40. Amazon EC2 enhanced networking [Diagram: two instances, each reaching the hardware NIC through the virtualization layer via Xen-PV]
  41. Amazon EC2 enhanced networking [Diagram: instances using SR-IOV virtual functions (VF) on Intel 82599 hardware NICs, 10 Gbps]
  42. Amazon EC2 Elastic Network Adapter [Diagram: instances using ENA virtual functions, 20 Gbps and 25 Gbps]
  43. Verifying ENA is enabled ($ ethtool -i eth0)
       PV-Xen: driver: vif
       Enhanced networking: driver: ixgbevf (C3, C4, D2, I2, R3, M4 except m4.16xlarge)
       Elastic Network Adapter: driver: ena (F1, G3, I3, P2, P3, R4, X1, m4.16xlarge)
       https://github.com/amzn/amzn-drivers
  44. Applying our new knowledge
  45. Test setup • m4.16xlarge instances—Jack and Jill • Amazon Linux 2017.09 (kernel 4.9.51-10.52.amzn1) • Web server: Nginx 1.12.1 • Client: ApacheBench 2.3 • TLSv1.2,ECDHE-RSA-AES256-GCM-SHA384,2048,256 • Transferring incompressible data (random bits) • Origin data stored in tmpfs (RAM based; no server disk I/O) • Data discarded once retrieved (no client disk I/O)
  46. Application 1: HTTPS with intermediate network loss [Diagram: Jack and Jill, 0.5% loss]
  47. Test setup • 1 test server instance, 1 test client instance • 80 ms RTT • 80 parallel clients retrieving a 100 MB object: $ ab -n 1600 -c 80 https://server/100m • Simulated packet loss: # tc qdisc add dev eth0 root netem loss 0.5% • Goal: minimize throughput impact with 0.5% loss
  48. Results—application 1 [Chart: transfer times. Defaults: 23.2 s / 37.6 s; Defaults w/0.5% loss: 42.8 s / 52.3 s]
  49. Results—application 1 [Chart: Cubic w/0.5% loss: 42.8 s / 52.3 s; Illinois w/0.5% loss: 20.7 s / 41.5 s]
  50. Results—application 1 [Chart: Cubic w/0.5% loss: 42.8 s / 52.3 s; BBR w/0.5% loss: 11.1 s / 38.3 s] 74% decrease!
  51. Results—application 1 [Chart: BBR no loss: 8.8 s / 44.7 s; BBR w/0.5% loss: 11.1 s / 38.3 s]
  52. Results—application 1 [Chart: BBR no loss: 8.8 s / 44.7 s; BBR w/0.5% loss: 11.1 s / 38.3 s; Cubic no loss: 23.2 s / 37.6 s]
  53. Application 2: Data transfer over a low-RTT path [Diagram: Jack and Jill]
  54. Test setup • 1 test server instance, 1 test client instance • 1 ms RTT • 8 parallel clients retrieving a 10 MB object: $ ab -n 100000 -c 8 https://server/10m • Start at the default RTO, then decrease • Goal: minimize latency at high percentiles with 0.2% loss
  55. Results—application 2 [Chart: p50 latency 2 ms; p99.99 latency 200 ms]
  56. Results—application 2 [Chart: p99.99 latency with RTO 200: 200 ms; with RTO 50: 100 ms] 50% decrease!
  57. Application 3: High transaction rate HTTP service [Diagram: Jack and Jill]
  58. Test setup • 1 test server instance, 1 test client instance • 80 ms RTT • HTTP, not HTTPS • 1500 MTU • 200k requests for a 10 KB object: $ ab -n 200000 -c 200 http://server/10k • Goal: minimize latency
  59. Results—application 3
       Initial congestion window 3 packets: p50 latency 321 ms, avg bandwidth 12.550 Mbps
       Initial congestion window 10 packets: p50 latency 241 ms, avg bandwidth 16.765 Mbps
       Initial congestion window 16 packets: p50 latency 161 ms, avg bandwidth 22.518 Mbps
       79% increase!
  60. Takeaways
  61. Takeaways • The network doesn’t have to be a black box—Linux tools can be used to interrogate and understand it • Simple tweaks to settings can dramatically increase performance—test, measure, change • Understand what your application needs from the network, and tune accordingly
  62. Thank You
  63. Thank you!
  64. Remember to complete your evaluations!
