Beyond TCP: The evolution of Internet transport protocols

Revised and expanded version of the CNSM keynote with more information on MPTCP. Given at Polytechnique in Jan. 2016


  1. Beyond TCP: The evolution of Internet transport protocols. Olivier Bonaventure, UCL, http://inl.info.ucl.ac.be. Paris, Polytechnique, Jan. 2016
  2. Agenda • Internet transport protocols – TCP – SCTP • Multipath TCP – Basic principles – Use cases • What's next? – QUIC
  3. The origins of TCP. Source: http://spectrum.ieee.org/computing/software/the-strange-birth-and-long-life-of-unix
  4. The Unix pipe model (figure: echo and wc connected by a pipe; sample data: 1234, abbsbbbs)
  5. The TCP bytestream model (figure: client at IP:1.2.3.4 and server at IP:4.5.6.7 exchanging the bytestreams ABCDEF...111232 and 0988989...XYZZ)
  6. TCP: more than 30 years old!
  7. Congestion collapse. Jacobson, V. Congestion avoidance and control. In Proceedings of SIGCOMM '88 (Stanford, CA, Aug. 1988), ACM.
  8. Performance issues • TCP considered to be too complex by many – Software implementation cannot cope with increasing network bandwidth • For high performance, transport should be implemented in hardware – Transputers – Simpler transport protocols
  9. More limitations of TCP • Issues with the TCP pipe model – Only supports a single bytestream • Some applications need several streams with priorities – No support for messages – Connections are attached to one IP address on client and one IP address on server • No failover even if hosts have multiple interfaces • No support for mobility • No load balancing for multihomed hosts
  10. SCTP: An alternative to TCP
  11. SCTP in two slides • Modern transport protocol – Cleaner connection establishment • Four-way handshake to counter SYN flooding attacks – Cleaner protocol • Flexible TLV packet format that is easy to extend • Selective acknowledgements from the start – Richer semantics • Messages, multiple streams, unreliable delivery • Advanced API to replace socket API – Failover support • Connection can move from one IP address to another one
  12. SCTP connection establishment: INIT,Itag=1234; INIT-ACK,cookie,Itag=5678 (the server encrypts its state in the cookie and does not store it); COOKIE-ECHO,Vtag=5678,cookie (the server decrypts the cookie and recovers the info needed to create state); COOKIE-ACK,Vtag=1234
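The stateless cookie idea of slide 12 can be sketched in a few lines. This is only an illustration under stated assumptions, not SCTP's actual cookie format: the names `make_cookie`, `open_cookie` and `SECRET` are hypothetical, the state is encoded as JSON, and the sketch authenticates the state with an HMAC instead of encrypting it.

```python
import hashlib
import hmac
import json

SECRET = b"server-local-secret"  # hypothetical key, known only to the server


def make_cookie(state: dict) -> bytes:
    """Serialize the connection state and append a MAC; nothing is stored."""
    body = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SECRET, body, hashlib.sha256).digest()
    return body + tag


def open_cookie(cookie: bytes):
    """Return the state if the MAC verifies, otherwise None."""
    body, tag = cookie[:-32], cookie[-32:]  # SHA-256 tags are 32 bytes long
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)
```

The server would call `make_cookie` when it answers an INIT and `open_cookie` when the COOKIE-ECHO comes back, so a flood of INITs does not consume server memory.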
  13. What went wrong with SCTP? • Replacing a transport protocol (figure: protocol stack Physical / Datalink / Network / TCP or SCTP / Application) – Applications must be rewritten with new API – IP protocol=132 for SCTP packets
  14. Deploying SCTP • Application developers will invest in SCTP as soon as SCTP is implemented on – Clients – Servers
  15. The Internet architecture that we explain to our students (figure: protocol stacks Physical / Datalink / Network / Transport / Application on the end hosts; intermediate devices implement only the lower layers). O. Bonaventure, Computer Networking: Principles, Protocols and Practice, open ebook, http://inl.info.ucl.ac.be/cnp3
  16. SCTP deployment (figure: two end hosts with the full stack Physical / Datalink / Network / Transport / Application, connected through devices that implement only the lower layers; labels: TCP, SCTP)
  17. In reality – almost as many middleboxes as routers – various types of middleboxes are deployed. Sherry, Justine, et al. "Making middleboxes someone else's problem: Network processing as a cloud service." Proceedings of the ACM SIGCOMM 2012 conference. ACM, 2012.
  18. Internet devices according to Cisco, http://www.cisco.com/web/about/ac50/ac47/2.html (figure: device icons): Web Security Appliance, NAC Appliance, ACE XML Gateway, Streamer, VPN Concentrator, SSL Terminator, Cisco IOS Firewall, IP Telephony Router, PIX Firewall Right and Left, Voice Gateway, Content Engine, NAT
  19. Middleboxes in the architecture • In the official architecture, they do not exist • In reality... (figure: protocol stacks along a path; some intermediate devices also process TCP or even the Application layer)
  20. TCP segments processed by a router (figure: the IP and TCP header fields and the payload of a segment, before and after the router)
  21. TCP segments processed by a NAT (figure: the IP and TCP header fields and the payload of a segment, before and after the NAT)
  22. TCP segments processed by a NAT (2) • active mode ftp behind a NAT: 220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15] ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be' Name (ftp.belnet.be:obo): anonymous ---> USER anonymous 331 Anonymous login ok, send your complete email address as your password Password: ---> PASS XXXX ---> PORT 192,168,0,7,195,120 200 PORT command successful ---> LIST 150 Opening ASCII mode data connection for file list lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror 226 Transfer complete
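The PORT command in the session above carries an IP address and a port number inside the payload, which is why a NAT needs to look beyond the headers. A minimal sketch of how the six numbers decode (the helper name `parse_port_argument` is an assumption made for this example):

```python
def parse_port_argument(arg: str):
    """Decode the argument of an FTP PORT command: h1,h2,h3,h4,p1,p2."""
    parts = [int(p) for p in arg.split(",")]
    if len(parts) != 6 or not all(0 <= p <= 255 for p in parts):
        raise ValueError("expected six comma-separated byte values")
    ip = ".".join(str(p) for p in parts[:4])
    port = parts[4] * 256 + parts[5]  # the port is sent as two bytes
    return ip, port


print(parse_port_argument("192,168,0,7,195,120"))  # ('192.168.0.7', 50040)
```

The decoded address is the client's private address, so the server could not open the data connection unless a device on the path rewrites this part of the payload.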
  23. TCP segments processed by an ALG running on a NAT (figure: the IP and TCP header fields and the payload of a segment, before and after the ALG)
  24. © O. Bonaventure, 2011. How transparent is the Internet? • 25th September 2010 to 30th April 2011 • 142 access networks • 24 countries • Sent specific TCP segments from client to a server in Japan. Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
  25. End-to-end transparency today (figure: the IP and TCP header fields and the payload of a segment, at the sender and at the receiver). Middleboxes don't change the Protocol field, but some discard packets with a Protocol field other than TCP or UDP
  26. Agenda • Internet transport protocols – TCP – SCTP • Multipath TCP – Basic principles – Use cases • What's next? – QUIC
  27. TCP connection establishment • Three-way handshake: SYN,seq=1234,Options; SYN+ACK,ack=1235,seq=5678,Options; ACK,seq=1235,ack=5679
  28. Data transfer: seq=1234,"abcd"; ACK,ack=1238,win=4; seq=1238,"efgh"; ACK,ack=1242,win=0
  29. Connection release: seq=1234,"abcd"; RST
  30. Connection release: seq=1234,"abcd"; ACK,ack=1239; FIN,ack=350; seq=345,"ijkl"; FIN,seq=1238; FIN,seq=349
  31. Multipath TCP • How can we efficiently use the multiple interfaces that are available on today's hosts?
  32. Design objectives • Multipath TCP is an evolution of TCP • Design objectives – Support unmodified applications – Work over today's networks (IPv4 and IPv6) – Work in all networks where regular TCP works
  33. The Multipath TCP bytestream model (figure: client and server with addresses IP:1.2.3.4, IP:4.5.6.7, IP:2.3.4.5 and IP:6.7.8.9; the bytes of the bytestreams ABCDEF...111232 and 0988989...XYZZ are spread over several paths, e.g. A on one path and BCD on another)
  34. The Multipath TCP protocol • Control plane – How to manage a Multipath TCP connection that uses several paths? • Data plane – How to transport data? • Congestion control – How to control congestion over multiple paths?
  35. A naïve Multipath TCP: SYN+Option; SYN+ACK+Option; ACK; seq=123,"abc"; seq=126,"def"
  36. A naïve Multipath TCP in today's Internet? SYN+Option; SYN+ACK+Option; ACK; seq=123,"abc"; seq=126,"def" – There is no corresponding TCP connection
  37. Design decision – A Multipath TCP connection is composed of one or more regular TCP subflows that are combined • Each host maintains state that glues the TCP subflows that compose a Multipath TCP connection together • Each TCP subflow is sent over a single path and appears like a regular TCP connection along this path
  38. Multipath TCP and the architecture (figure: the Application uses a socket on top of Multipath TCP, which runs over the subflows TCP1, TCP2, ..., TCPn) – No modification, to ease deployment – Multiple subflows, to cope with middleboxes. A. Ford, C. Raiciu, M. Handley, S. Barre, and J. Iyengar, "Architectural guidelines for multipath TCP development", RFC 6182, 2011.
  39. A regular TCP connection • What is a regular TCP connection? – It starts with a three-way handshake • SYN segments may contain special options – All data segments are sent in sequence • There is no gap in the sequence numbers – It is terminated by using FIN or RST
  40. Multipath TCP: SYN+Option; SYN+ACK+Option; ACK; SYN+OtherOption; SYN+ACK+OtherOption; ACK
  41. How to combine two TCP subflows? SYN+Option; SYN+ACK+Option; ACK; SYN+OtherOption; SYN+ACK+OtherOption; ACK – How to link with blue subflow?
  42. TCP 101: identification of a TCP connection • Four-tuple – IPsource – IPdest – Portsource – Portdest • All TCP segments contain the four-tuple (figure: IP and TCP header fields)
  43. How to link TCP subflows? SYN,Portsrc=1234,Portdst=80+Option; SYN+ACK[...]; ACK; SYN,Portsrc=1235,Portdst=80+Option[link Portsrc=1234,Portdst=80] – A NAT could change addresses and port numbers
  44. How to link TCP subflows? SYN,Portsrc=1234,Portdst=80+Option[Token=5678]; SYN+ACK+Option[Token=6543]; ACK; SYN,Portsrc=1235,Portdst=80+Option[Token=6543]. State on one host: MyToken=5678, YourToken=6543. State on the other host: MyToken=6543, YourToken=5678
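The token exchange of slide 44 can be sketched as a lookup table on each host. This is a simplified illustration: the class and method names are assumptions, and the authentication that a real implementation needs when a subflow is added is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MptcpConnection:
    my_token: int
    your_token: int
    subflows: list = field(default_factory=list)  # one four-tuple per subflow


class Host:
    """Hypothetical host state: connections are found by token, not by four-tuple."""

    def __init__(self):
        self.by_token = {}

    def open_connection(self, my_token, your_token, four_tuple):
        conn = MptcpConnection(my_token, your_token, [four_tuple])
        self.by_token[my_token] = conn
        return conn

    def join(self, token, four_tuple):
        """Attach a new subflow to the connection identified by our token."""
        conn = self.by_token.get(token)
        if conn is None:
            return None  # unknown token: no connection to link the SYN with
        conn.subflows.append(four_tuple)
        return conn
```

Because the lookup key is the token carried in the option, the new subflow is still linked to the right connection when a NAT has rewritten its addresses or ports.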
  45. TCP subflows • Which subflows can be associated to a Multipath TCP connection? – At least one of the elements of the four-tuple needs to differ between two subflows • Local IP address • Remote IP address • Local port • Remote port
  46. TCP subflows in practice • Multipath TCP supports subflow agility – Client/server can add subflows at any time – Client/server can remove subflows at any time
  47. The Multipath TCP protocol • Control plane – How to manage a Multipath TCP connection that uses several paths? • Data plane – How to transport data? • Congestion control – How to control congestion over multiple paths?
  48. How to transfer data? seq=123,"a"; seq=124,"b"; seq=125,"c"; seq=126,"d"; ack=124; ack=126; ack=125; ack=127
  49. How to transfer data in today's Internet? seq=123,"a"; seq=124,"b"; seq=125,"c"; ack=124; ack=126; ack=125 – Gap in sequence numbering space – Some DPI will not allow this!
  50. Multipath TCP data transfer • Two levels of sequence numbers (figure: on each host, a socket on top of Multipath TCP on top of TCP1 and TCP2; the bytestream ABCDEF is numbered with the Data sequence #, and each subflow has its own TCP1 sequence # or TCP2 sequence #)
  51. Multipath TCP data transfer: DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DSeq=2,seq=124,"c"; DAck=1,ack=124; DAck=3,ack=125; DAck=2,ack=457
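The two levels of sequence numbers can be illustrated with a receiver that reorders data by Data sequence number, whatever subflow it arrived on. A minimal sketch, assuming segments never overlap and the data sequence starts at 0 (the class name `Reassembler` is hypothetical):

```python
class Reassembler:
    """Rebuild the bytestream from (DSeq, data) pairs received on any subflow."""

    def __init__(self):
        self.next_dseq = 0          # next in-order Data sequence number
        self.pending = {}           # out-of-order chunks, keyed by DSeq
        self.delivered = bytearray()

    def receive(self, dseq: int, data: bytes) -> int:
        """Store a chunk and return the cumulative Data ack (DAck)."""
        if dseq >= self.next_dseq:
            self.pending.setdefault(dseq, data)
        # Deliver everything that is now in sequence.
        while self.next_dseq in self.pending:
            chunk = self.pending.pop(self.next_dseq)
            self.delivered += chunk
            self.next_dseq += len(chunk)
        return self.next_dseq
```

If "c" (DSeq=2) arrives before "b" (DSeq=1), the DAck stays at 1 until "b" shows up, then jumps to 3.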
  52. Multipath TCP: how to deal with losses? • Data losses over one TCP subflow – Fast retransmit and timeout as in regular TCP. DSeq=0,seq=123,"a"; DAck=1,ack=124; DSeq=0,seq=123,"a"; DAck=1,ack=124
  53. Multipath TCP • What happens when a TCP subflow fails? DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DAck=0,ack=457; DSeq=0,seq=457,"a"; DAck=2,ack=458
  54. Retransmission heuristics • Heuristics used by current Linux implementation – Fast retransmit is performed on the same subflow as the original transmission – Upon timeout expiration, reevaluate whether the segment could be retransmitted over another subflow – Upon loss of a subflow, all the unacknowledged data are retransmitted on other subflows
  55. Flow control • How should the window-based flow control be performed? – Independent windows on each TCP subflow – A single window that is shared among all TCP subflows
  56. Independent windows: DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DAck=2,ack=457,win=100; DSeq=2,seq=457,"c"; DAck=3,ack=458,win=100; DAck=1,ack=124,win=0
  57. Independent windows: possible problem • Impossible to retransmit, window is already full on green subflow. DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DAck=2,ack=457,win=0
  58. A single window shared by all subflows: DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DAck=2,ack=457,win=10; DSeq=2,seq=457,"c"; DAck=3,ack=458,win=10; DAck=1,ack=124,win=10
  59. A single window shared by all subflows: impact of middleboxes. DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DAck=2,ack=457,win=100; DAck=1,ack=124,win=100; DAck=2,ack=457,win=5
  60. Multipath TCP windows • Multipath TCP maintains one window per Multipath TCP connection – Window is relative to the last acked data (Data Ack) – Window is shared among all subflows • It's up to the implementation to decide how the window is shared – Window is transmitted inside the window field of the regular TCP header – If middleboxes change window field, • use largest window received at MPTCP-level • use received window over each subflow to cope with the flow control imposed by the middlebox
  61. Multipath TCP buffers (figure: socket with send(...) and recv(...) on top of Multipath TCP with a Scheduler, on top of TCP1 and TCP2) – Transmit queues, process only regular TCP header – Reorder queue, processes only TCP header – MPTCP-level, resequencing possible
  62. Sending Multipath TCP information • How to exchange the Multipath TCP specific information between two hosts? • Option 1 – Use TLVs to encode data and control information inside payload of subflows • Option 2 – Use TCP options to encode all Multipath TCP information. Option 1: Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011
  63. Multipath TCP with only options • Advantages – Normal way of extending TCP – Should be able to go through middleboxes or fall back • Drawbacks – limited size of the TCP options, notably inside SYN – What happens when middleboxes drop TCP options in data segments?
  64. Multipath TCP using TLV • Advantages – Multipath TCP could start as regular TCP and move to Multipath only when needed – Could be implemented as a library in userspace – TLVs can be easily extended • Drawbacks – TCP segments contain TLVs including the data and not only the data • problem for middleboxes, DPI, ... – Middleboxes become more difficult. Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011
  65. Is it safe to use TCP options? • Known option (TS) in data segments. Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
  66. Is it safe to use TCP options? • Unknown option in data segments. Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
  67. Multipath TCP options • TCP option format: Kind, Length, Option-specific data • Initial design – One option kind for each purpose (e.g. Data Sequence number) • Final design – A single variable-length Multipath TCP option
  68. Multipath TCP option • A single option type – to minimise the risk of having one option accepted by middleboxes in SYN segments and rejected in segments carrying data • Format: Kind, Length, Subtype, Subtype-specific data (variable length)
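The Kind / Length / Subtype layout of slide 68 can be sketched with `struct`. The option kind 30 and the 4-bit subtype come from RFC 6824; the helper names are assumptions, and the sketch leaves the lower four bits of the third byte at zero although the real option uses them for subtype-specific flags.

```python
import struct

MPTCP_KIND = 30  # TCP option kind assigned to Multipath TCP (RFC 6824)


def build_option(subtype: int, payload: bytes) -> bytes:
    """Encode Kind, Length, Subtype (upper 4 bits) and subtype-specific data."""
    if not 0 <= subtype <= 15:
        raise ValueError("subtype is a 4-bit field")
    length = 3 + len(payload)  # the length covers the whole option
    return struct.pack("!BBB", MPTCP_KIND, length, subtype << 4) + payload


def parse_option(raw: bytes):
    """Return (subtype, subtype-specific data) for a Multipath TCP option."""
    kind, length, third = struct.unpack("!BBB", raw[:3])
    if kind != MPTCP_KIND or length != len(raw):
        raise ValueError("not a well-formed Multipath TCP option")
    return third >> 4, raw[3:]
```

Since every subtype shares the same kind, a middlebox that filters options by kind treats the options in SYN segments and in data segments the same way.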
  69. Data sequence numbers and TCP segments • How to transport Data sequence numbers? – Same solution as for TCP • Data sequence number in TCP option is the Data sequence number of the first byte of the segment (figure: TCP header fields, with the Data sequence number carried in the options, followed by the payload)
  70. Multipath TCP data transfer: DSeq=0,seq=123,"a"; DSeq=1,seq=456,"b"; DSeq=2,seq=124,"c"; DAck=1,ack=124; DAck=3,ack=125; DAck=2,ack=457
  71. Which middleboxes change TCP sequence numbers? • Some firewalls change TCP sequence numbers in SYN segments to ensure randomness – fix for old Windows 95 bug • Transparent proxies terminate TCP connections
  72. Middlebox interference • Data segments: Data,seq=12,"ab" and Data,seq=14,"cd" become Data,seq=12,"abcd" – Such a middlebox could also be the network adapter of the server that uses LRO to improve performance.
  73. Segment coalescing. Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
  74. Data sequence numbers and middleboxes: seq=123,DSeq=0,"a"; seq=456,DSeq=1,"b"; seq=124,DSeq=2,"c". A middlebox that buffers small segments and copies one option in the coalesced segment produces seq=123,DSeq=2,"ac" or seq=123,DSeq=0,"ac"
  75. Data sequence numbers and middleboxes: seq=123,DSeq=0,"ab" is split into DSeq=0,seq=123,"a" and DSeq=0,seq=124,"b" – Middlebox only understands regular TCP
  76. A "middlebox" that both splits and coalesces TCP segments
  77. Data sequence numbers and middleboxes • How to avoid desynchronisation between the bytestream and data sequence numbers? • Solution – Multipath TCP option carries mapping between Data sequence numbers and (difference between initial and current) subflow sequence numbers • mapping covers a part of the bytestream (length)
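The mapping described on slide 77 can be sketched as a lookup from a relative subflow sequence number to a Data sequence number. This is a minimal sketch with hypothetical names (`Mapping`, `data_seq_for`); relative subflow sequence numbers are counted from 0 here.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    dsn: int     # Data sequence number of the first mapped byte
    ssn: int     # relative subflow sequence number of that byte
    length: int  # number of bytes covered by the mapping


def data_seq_for(mappings, rel_ssn: int) -> Optional[int]:
    """Translate a subflow byte position into a Data sequence number."""
    for m in mappings:
        if m.ssn <= rel_ssn < m.ssn + m.length:
            return m.dsn + (rel_ssn - m.ssn)
    return None  # byte not covered by any mapping received so far


# Subflow 1 of the slides carries "a" (DSeq=0) and then "c" (DSeq=2).
subflow1 = [Mapping(dsn=0, ssn=0, length=1), Mapping(dsn=2, ssn=1, length=1)]
```

Because the translation depends on byte positions and not on segment boundaries, it gives the same answer whether "a" and "c" arrive in two segments or coalesced into one.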
  78. Multipath TCP data transfer: seq=123,DSS[0->123,len=1],"a"; seq=456,DSS[1->456,len=1],"b"; seq=124,DSS[2->124,len=1],"c"; DAck=1,ack=124; DAck=3,ack=125; DAck=2,ack=457
  79. Data sequence numbers and middleboxes: seq=123,DSS[0->123,len=1],"a"; seq=456,DSS[1->456,len=1],"b"; seq=124,DSS[2->124,len=1],"c"; seq=123,DSS[0->123,len=1],"ac"; DAck=2,ack=125; DSeq=0,ack=457; seq=125,DSS[2->125,len=1],"c"
  80. Data sequence numbers and middleboxes: seq=123,DSS[0->123,len=1],"a"; seq=456,DSS[1->456,len=1],"b"; seq=124,DSS[2->124,len=1],"c"; seq=123,DSS[2->124,len=1],"ac"; DAck=0,ack=125; seq=125,DSS[0->125,len=1],"a"; DAck=3,ack=126
  81. Multipath TCP and middleboxes • With the DSS mapping, Multipath TCP can cope with middleboxes that – combine segments – split segments • Are they the most annoying middleboxes for Multipath TCP? – Unfortunately not
  82. TCP sequence number and middleboxes. Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
  83. The worst middlebox • Is this an academic exercise or reality? seq=123,DSS[1->123,len=2],"aXXXb"; DAck=3,ack=125; seq=125,DSS[3->125,len=2],"cd"; seq=123,DSS[1->123,len=2],"ab"; DAck=3,ack=128; seq=128,DSS[3->125,len=2],"cd"
  84. The worst middlebox • Is unfortunately very old... – Any ALG for a NAT: 220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15] ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be' Name (ftp.belnet.be:obo): anonymous ---> USER anonymous 331 Anonymous login ok, send your complete email address as your password Password: ---> PASS XXXX ---> PORT 192,168,0,7,195,120 200 PORT command successful ---> LIST 150 Opening ASCII mode data connection for file list lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror 226 Transfer complete
  85. Coping with the worst middlebox • What should Multipath TCP do in the presence of such a worst middlebox? – Do nothing and ignore the middlebox • but then the bytestream and the application would be broken and this problem will be difficult to debug by network administrators – Detect the presence of the middlebox • and fall back to regular TCP (i.e. use a single path and nothing fancy). Multipath TCP MUST work in all networks where regular TCP works.
  86. Detecting the worst middlebox? • How can Multipath TCP detect a middlebox that modifies the bytestream and inserts/removes bytes? – Various solutions were explored – In the end, Multipath TCP chose to include its own checksum to detect insertion/deletion of bytes
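The checksum-based detection can be sketched with the 16-bit ones' complement sum that TCP itself uses. This is a simplified illustration: the function names are assumptions, and the sketch covers only the mapped data, whereas the real checksum also covers a pseudo header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum, as used by TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mapping_is_valid(stream: bytes, length: int, checksum: int) -> bool:
    """Check the bytes covered by a mapping against the sender's checksum."""
    covered = stream[:length]
    return len(covered) == length and internet_checksum(covered) == checksum
```

With the example of the slides, a mapping of length 2 computed over "ab" no longer verifies once a middlebox has turned the payload into "aXXXb", and the host can then fall back to regular TCP.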
  87. The worst middlebox: seq=123,DSS[1->123,len=2,Inv],"aXXXb"; seq=123,DSS[1->123,len=2,V],"ab"; RST,last DSeq=0; RST,last DSeq=0; seq=456,DSS[1->456,len=2,V],"ab"; DAck=3,ack=458
  88. Multipath TCP Data sequence numbers • What should be the length of the data sequence numbers? – 32 bits • compact and compatible with TCP • wrap-around problem at high speed requires PAWS – 64 bits • wrap-around is not an issue for most transfers today • takes more space inside each segment
  89. Multipath TCP Data sequence numbers • Data sequence numbers and Data acknowledgements – Maintained inside the implementation as 64-bit fields – Implementations can, as an optimisation, only transmit the lower 32 bits of the data sequence numbers and acknowledgements
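When only the lower 32 bits are transmitted, the receiver has to rebuild the 64-bit value. One plausible way, sketched here under the assumption that the true value is the candidate closest to the last full sequence number seen (the function name `expand_dsn` is hypothetical):

```python
MASK32 = 0xFFFFFFFF
WRAP = 1 << 32


def expand_dsn(low32: int, last_full: int) -> int:
    """Rebuild a 64-bit Data sequence number from its lower 32 bits."""
    # Start from the value that shares the upper 32 bits of last_full.
    candidate = (last_full & ~MASK32) | (low32 & MASK32)
    # The true value may sit just before or just after a 32-bit wrap.
    options = [c for c in (candidate - WRAP, candidate, candidate + WRAP) if c >= 0]
    return min(options, key=lambda c: abs(c - last_full))
```

For instance, if the last full value was 0xFFFFFFFD and the segment carries 5, the sketch returns 0x100000005 rather than 5.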
  90. Data Sequence Signal option (figure: option layout) – Cumulative Data ack – A = Data ACK present – a = Data ACK is 8 octets – M = mapping present – m = DSN is 8 octets – Length of mapping, can extend beyond this segment – Checksum computed over data covered by entire mapping + pseudo header
  91. The Multipath TCP protocol • Control plane – How to manage a Multipath TCP connection that uses several paths? • Data plane – How to transport data? • Congestion control – How to control congestion over multiple paths? – Congestion windows on subflows MUST be coupled to ensure that Multipath TCP remains fair with regular TCP
  92. AIMD in TCP • Congestion control mechanism – Each host maintains a congestion window (cwnd) – No congestion • Congestion avoidance (additive increase) – increase cwnd by one segment every round-trip-time – Congestion • TCP detects congestion by detecting losses • Mild congestion (fast retransmit – multiplicative decrease) – cwnd=cwnd/2 and restart congestion avoidance • Severe congestion (timeout) – cwnd=1, set slow-start-threshold and restart slow-start
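The AIMD rules of slide 92 can be written as a small state machine. This is a sketch in units of segments, with assumptions labelled in the comments: slow-start doubles cwnd once per round-trip-time up to the threshold, and the threshold after a timeout is set to half of the previous window.

```python
def aimd_step(cwnd: int, ssthresh: int, event: str):
    """Return the new (cwnd, ssthresh) after one event.

    event is 'rtt' (one loss-free round-trip-time), 'fast_retransmit'
    (mild congestion) or 'timeout' (severe congestion).
    """
    if event == "rtt":
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow-start: exponential increase
        else:
            cwnd += 1                       # congestion avoidance: additive increase
    elif event == "fast_retransmit":
        cwnd = max(cwnd // 2, 1)            # multiplicative decrease
        ssthresh = cwnd                     # continue in congestion avoidance
    elif event == "timeout":
        ssthresh = max(cwnd // 2, 2)        # assumption: half of the old window
        cwnd = 1                            # restart slow-start
    else:
        raise ValueError(f"unknown event: {event}")
    return cwnd, ssthresh
```

Starting from cwnd=1 and a threshold of 8, five loss-free round-trip-times give the windows 2, 4, 8, 9, 10; a fast retransmit then halves the window to 5.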
  93. Evolution of the congestion window (figure: cwnd versus time; slow-start with exponential increase of cwnd up to the threshold, then congestion avoidance with linear increase of cwnd; fast retransmit events)
  94. Congestion control for Multipath TCP • Simple approach – independent congestion windows (figure: one congestion window curve per subflow, each with its own threshold)
  95. Independent congestion windows • Problem (figure: 12 Mbps link)
  96. Coupled congestion control • Congestion windows are coupled – congestion window growth cannot be faster than TCP with a single flow – Coupled congestion control aims at moving traffic away from congested path
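One published coupled scheme is the Linked Increases Algorithm of RFC 6356; the slides do not say which algorithm they have in mind, so the sketch below is only one possible instance. Windows are in segments, RTTs in seconds, and the function names are assumptions.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor of RFC 6356, computed over all subflows."""
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom


def increase_per_ack(i, cwnds, rtts):
    """Window increase (in segments) applied to subflow i for each ACK."""
    alpha = lia_alpha(cwnds, rtts)
    # Never grow faster than a regular TCP flow would on the same subflow.
    return min(alpha / sum(cwnds), 1.0 / cwnds[i])
```

With a single subflow the increase equals the regular 1/cwnd per ACK; with two identical subflows of 10 segments each, every ACK adds 0.025 segment instead of 0.1.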
  97. Agenda • Internet transport protocols – TCP – SCTP • Multipath TCP – Basic principles – Use cases • What's next? – QUIC
  98. Multipath TCP use cases: The beast
  99. TCP on servers • How to increase server bandwidth? • Load balancing techniques – packet per packet – per flow load balancing • each TCP connection is mapped onto one interface
  100. Increasing server bandwidth with Multipath TCP • Load balancing with Multipath TCP – Congestion control efficiently uses the two links for each MPTCP connection – Automatic failover in case of failures
  101. How fast can Multipath TCP go? http://linux.slashdot.org/story/13/03/23/0054252/a-50-gbps-connection-with-multipath-tcp
  102. How fast can Multipath TCP go?
  103. Datacenters evolve • Traditional topologies are tree-based – Poor performance – Not fault tolerant • Shift towards multipath topologies: FatTree, BCube, VL2, Cisco, EC2 … C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
  104. Fat Tree Topology [Fares et al., 2008; Clos, 1953] (figure: K=4, 1 Gbps links; aggregation switches; K pods with K switches each; racks of servers)
  105. Fat Tree Topology [Fares et al., 2008; Clos, 1953] (figure: K=4; aggregation switches; K pods with K switches each; racks of servers). C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
  106. Collisions
  107. TCP in data centers
  108. TCP in Fat Tree networks: cost of collisions (figure: throughput (Mb/s), 0 to 1000, versus rank of flow; legend: MPTCP, Optimal Throughput, TCP Flow Throughput). C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
  109. How to get rid of these collisions? • Consider TCP performance as an optimisation problem
  110. The Multipath TCP way – Two subflows differ by their source port – ECMP balances the subflows over different paths. C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
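Why a different source port is enough can be sketched with a hash over the flow identifiers. Real switches use their own hash functions; SHA-256, the field order and the function name `ecmp_path` are assumptions chosen only to keep the example self-contained.

```python
import hashlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, n_paths, proto=6):
    """Pick one of n_paths equal-cost paths from a hash of the five-tuple."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Subflows of one connection: same addresses, different source ports.
paths = [ecmp_path("10.0.0.1", "10.0.1.1", port, 80, 4) for port in range(1024, 1032)]
```

All packets of one subflow hash to the same path, so a subflow still looks like a regular TCP connection, while several subflows are likely to be spread over several paths. Nothing prevents two subflows from landing on the same path.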
  111. MPTCP better utilizes the FatTree network (figure: throughput (Mb/s), 0 to 1000, versus rank of flow; legend: MPTCP, Optimal Throughput, TCP Flow Throughput). C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011. See also G. Detal, et al., Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, Computer Networks, April 2013, for extensions to ECMP for MPTCP
  112. How many subflows does Multipath TCP need? (figure: total throughput (% of optimal), 0 to 100; x axis: RLB, 2 to 8; series: Multipath TCP, TCP). C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
  113. Can we improve Multipath TCP? • Two subflows may follow similar paths
  114. Improving ECMP • ECMP's hash – good load balancing – impossible to predict result • CFLB – replaces hash with block cipher – hosts can select paths for Multipath TCP subflows provided they know datacenter topology. G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks
  115. Multipath TCP with CFLB in Fat-Tree. G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks
  116. Multipath TCP on EC2 • Amazon EC2: infrastructure as a service – We can borrow virtual machines by the hour – These run in Amazon data centers worldwide – We can boot our own kernel • A few availability zones have multipath topologies – 2-8 paths available between hosts not on the same machine or in the same rack – Available via ECMP
  117. Amazon EC2 Experiment • 40 medium CPU instances running MPTCP • During 12 hours, we sequentially ran all-to-all iperf cycling through: – TCP – MPTCP (2 and 4 subflows)
  118. MPTCP improves performance on EC2 (figure: throughput (Mb/s), 0 to 1000, versus flow rank; series: TCP; MPTCP, 4 subflows; MPTCP, 2 subflows; annotation: Same Rack). C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
  119. Motivation • One device, many IP-enabled interfaces
  120. ssh with Multipath TCP
  121. MPTCP over WiFi/3G (figure: two paths; 8 Mbps, 20 ms and 2 Mbps, 150 ms)
  122. TCP over WiFi/3G. C. Raiciu, et al. "How hard can it be? Designing and implementing a deployable multipath TCP," NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
  123. MPTCP over WiFi/3G. C. Raiciu, et al. "How hard can it be? Designing and implementing a deployable multipath TCP," NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
  124. MPTCP over WiFi/3G: Multipath TCP increases throughput
  125. MPTCP over WiFi/3G: What happened here?
  126. Understanding the performance issue (figure: two paths; 8 Mbps, 20 ms and 2 Mbps, 150 ms; segments A, B, C, D in the window) – Window full! No new data can be sent on WiFi path – Reinject segment A on fast path – Halve congestion window on slow subflow
  127. MPTCP over WiFi/3G
  128. Multipath TCP use cases: low latency for Siri • Long-lived TLS connections (figure labels: WiFi, 3G/LTE, voice samples)
  129. Multipath TCP use cases: high bandwidth on smartphones • Koreans want 800+ Mbps on smartphones (figure labels: WiFi, 4G/LTE, Multipath TCP, Regular TCP, SOCKS)
  130. Faster broadband networks?
  131. Multipath TCP use cases: Hybrid Access Networks (figure labels: DSL, 4G/LTE, Multipath TCP, Regular TCP, Hybrid Access Gateway, TCP)
  132. Agenda • Internet transport protocols – TCP – SCTP • Multipath TCP – Basic principles – Use cases • What's next? – QUIC
  133. Issues with the current stack • Physical / Datalink / IPv4/IPv6 / TCP / HTTP/1.1 – ASCII difficult to parse, no priority – Insecure – Wait for three-way handshake before data transfer – First bytes after 2 RTTs • Physical / Datalink / IPv4/IPv6 / TCP / TLS / HTTP/2 – Secure, but adds more delay – First bytes after 3-4 RTTs • Physical / Datalink / IPv4/IPv6 / UDP / QUIC – First bytes after 0 RTT
  134. QUIC in a nutshell • First connection attempt: CHLO[SNI, VER] (ServerName and Version); REJ[Config, Token, Certificate] (Rejected); CHLO[Token, Crypto info]; DATA[Encrypted]; SHLO[Config, Token, Certificate]; DATA[Encrypted]
  135. QUIC features • Congestion control – Leverages TCP's long history (CUBIC) • Retransmissions – Better than with regular TCP – Each segment has a different seqnum • Avoids retransmission ambiguities • Selective acknowledgements – Cleaner than in TCP
  136. QUIC usage at Google: QUIC handshakes fail when RTTs are greater than 2.5 seconds or when UDP is blocked. Source: J. Iyengar, QUIC Overview, IETF93, July 2015, Prague
  137. QUIC: reducing delays (figure: TCP, TCP + TLS, QUIC (equivalent to TCP + TLS)). Source: J. Iyengar, QUIC Overview, IETF93, July 2015, Prague
  138. Why run QUIC over UDP? • Simplest transport protocol – Supported correctly by all operating systems – Supported correctly by all middleboxes • Application can entirely control everything – Same version of QUIC runs on all platforms – QUIC can be upgraded as frequently as the application – Application developer does not need to coordinate with IETF or anyone
  139. How to cope with middleboxes? • Very few middleboxes interfere with UDP – Some middleboxes drop UDP segments • Applications will detect and fall back to TCP – Some middleboxes rate limit UDP • Applications will detect and fall back to TCP • What about middleboxes optimising QUIC/UDP? – Nightmare for Google – Everything in QUIC (payload and headers) is encrypted
  140. TFO: A Faster TCP • Simple idea: send data in SYN segments – Modern version of T/TCP. SYN(Src=C,seq=x,HTTP GET); HTTP GET; SYN+ACK(Dest=C,ack=x+1,seq=y,HTTP Resp); ACK(Src=A,seq=x)
  141. Internet transport layer • Still lots of innovation for an old layer… – TCP extensions • Initial window, TCP Fast Open, … – Multipath TCP is getting deployed • RFC 6824 was published in January 2013 – But middleboxes have ossified the Internet • Other protocols – QUIC • Pushed by Google for web applications – TCPINC • Support encryption inside transport layer – TLS 1.3 • Faster handshake and lower delays