7 tcp-congestion

  1. Week 7 UDP and TCP SCTP and Internet Congestion control
  2. Agenda • TCP • Connection establishment • Reliable data transfer • Connection release • SCTP • Congestion control
  3. TCP segment • Header : 20 bytes, laid out in 32-bit words • Source port, Destination port • Sequence number • Acknowledgement number • THL (segment header length), Reserved, Flags, Window • Checksum (computed over the entire segment and part of the IP header), Urgent pointer • Optional header extension • Payload • Flags : used to indicate the function of a segment • SYN : used during establishment • FIN : used during connection release • RST : used in case of problems • ACK : if true, means that the Acknowledgement number inside the segment is valid
  4. Three-way handshake ACK(seq=x+1, ack=y+1) CONNECT.req CONNECT.ind SYN+ACK(ack=x+1,seq=y) CONNECT.resp Initial sequence number (x) CONNECT.conf Initial sequence number (y) SYN(seq=x) Connection established Connection established The sequence numbers of all segments A->B will start at x+1 The sequence numbers of all segments B->A will start at y+1
  5. TCP FSM Init ?SYN / !SYN+ACK !SYN ?SYN / !SYN+ACK SYN RCVD SYN Sent Established ?SYN+ACK / !ACK ?ACK
  6. Simultaneous open CONNECT.conf SYN(seq=y) CONNECT.req CONNECT.req SYN(seq=x) Connection established Connection established CONNECT.conf SYN+ACK(seq=y, ack=x+1) SYN+ACK(seq=x, ack=y+1)
  7. Negotiating options ACK(seq=x+1, ack=y+1) CONNECT.req CONNECT.ind SYN+ACK(ack=x+1,seq=y) Option CONNECT.resp Initial sequence number (x) Option proposed CONNECT.conf Initial sequence number (y) Option accepted SYN(seq=x),Option Connection established Option accepted Connection established The sequence numbers of all segments A->B will start at x+1 The sequence numbers of all segments B->A will start at y+1
  8. TCP options • MSS • Selective acknowledgements • Timestamps • Window Scale • Multipath TCP • ...
  9. Agenda • TCP • Connection establishment • Reliable data transfer • Connection release • SCTP • Congestion control
  10. Reliable data transfer (seq=123,"abcd") (seq=127,"ef") (seq=123,"abcd") (seq=127,"ef") (ack=123) Retransmission timer (ack=129) (ack=129) "abcdef" unnecessary retransmission Retransmission of all unacked segments “ef” placed in buffer
  11. Retransmission timer • How to compute it ? • round-trip-time may change frequently during the lifetime of a TCP connection
  12. Retransmission timer • Algorithm • timer = est_mean(rtt) + 4*est_std_dev(rtt) • est_mean(rtt) = (1-α)*est_mean(rtt) + α*rtt_measured • est_std_dev(rtt) = (1-β)*est_std_dev(rtt) + β*|rtt_measured - est_mean(rtt)|
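The estimator on the retransmission timer slide can be sketched in a few lines of Python. This is a minimal illustration, not a real TCP stack: the class name `RtoEstimator`, the handling of the first sample (mean set to the sample, deviation to half of it) and the choice to update the deviation before the mean are assumptions borrowed from the RFC2988 conventions cited in the notes. The defaults for alpha and beta are the usual values given there (1/8 and 1/4).

```python
class RtoEstimator:
    """Smoothed RTT mean and deviation (hypothetical helper, illustrative only)."""

    def __init__(self, alpha=1 / 8, beta=1 / 4):
        self.alpha = alpha
        self.beta = beta
        self.est_mean = None  # smoothed round-trip time, in seconds
        self.est_dev = None   # smoothed deviation, in seconds

    def update(self, rtt_measured):
        """Feed one RTT sample and return the new retransmission timer."""
        if self.est_mean is None:
            # Assumption: first-sample initialisation in the style of RFC2988.
            self.est_mean = rtt_measured
            self.est_dev = rtt_measured / 2
        else:
            # Assumption: the deviation is updated first, using the old mean.
            self.est_dev = ((1 - self.beta) * self.est_dev
                            + self.beta * abs(rtt_measured - self.est_mean))
            self.est_mean = ((1 - self.alpha) * self.est_mean
                             + self.alpha * rtt_measured)
        return self.timer()

    def timer(self):
        # timer = est_mean(rtt) + 4 * est_std_dev(rtt)
        return self.est_mean + 4 * self.est_dev
```

With a stable RTT the deviation term decays at each sample, so the timer shrinks toward the mean; a sudden RTT spike inflates the deviation and pushes the timer up quickly.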
  13. RTT measurements (seq=120,"xyz") (ack=123) measured rtt (seq=123,"abcd") Timer (seq=123,"abcd") (ack=127) Which measured rtt is the right one ? • Solution (Karn/Partridge) • Do not measure rtt of retransmitted segments
  14. With Timestamp option (seq=120,TS=1, TS echo=7, "xyz") (ack=123, TS=12, TS echo=1) (seq=123,TS=3, TS echo=12, "abcd") (ack=127, TS=17, TS echo=3) measured rtt timer measured rtt (seq=123,TS=5, TS echo=12, "abcd")
  15. Fast retransmit (seq=123,"abcd") (ack=123) (ack=123) (ack=123) (ack=123) (ack=133) (seq=123,"abcd") "abcdefghij" (seq=127,"ef") Out of sequence, in buffer (seq=129,"gh") Out of sequence, in buffer (seq=131,"ij") Out of sequence, in buffer
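The trigger behind the fast retransmit exchange can be sketched as a duplicate-ack counter. The function name `fast_retransmit_points` and the idea of returning the list of retransmitted sequence numbers are illustrative assumptions; the threshold of three duplicate acks is the one the congestion control slides use.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ack numbers that trigger a fast retransmit.

    `acks` is the sequence of cumulative ack numbers seen by the sender.
    A retransmit fires when the same ack has been repeated `threshold`
    times after its first occurrence.
    """
    retransmitted = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                # The segment starting at `ack` is presumed lost.
                retransmitted.append(ack)
        else:
            last_ack = ack
            duplicates = 0
    return retransmitted
```

On the slide's trace, the four acks carrying 123 are one original plus three duplicates, so the sender resends the segment starting at 123 without waiting for the retransmission timer.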
  16. Selective Acks • Receiver reports SACK blocks • Negotiated during establishment (seq=123,"abcd") (ack=123) (seq=127,"ef") (ack=123,sack:127-128) (seq=129,"gh") (ack=123, sack:127-130) (seq=131,"ij") (ack=123, sack:127-132) Lost (seq=123,"abcd") (ack=133) "abcdefghij" only 123-126 must be retransmitted
  17. Delayed acks • Sending an ack per segment is costly • Tradeoff • In sequence data segment • no ack waiting, delay by up to 50msec • one ack waiting, send immediately • Out-of-sequence data segment • send ack immediately
  18. When to send data ? • When should a segment be sent ? • After each write system call • When there is a full segment of data
  19. Nagle algorithm • A new data segment can be sent if • This is a full segment (MSS bytes) • There are no unacknowledged bytes
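The two conditions on the Nagle slide reduce to a single predicate. This is a simplified sketch: `nagle_can_send` and its parameters are hypothetical names, and the interaction with the receive window and with delayed acks is left out.

```python
def nagle_can_send(pending_bytes, unacked_bytes, mss):
    """Decide whether a new data segment may be sent under the Nagle rule.

    pending_bytes: bytes written by the application and not yet sent
    unacked_bytes: bytes sent but not yet acknowledged
    mss:           maximum segment size, in bytes
    """
    if pending_bytes <= 0:
        return False  # nothing to send
    full_segment = pending_bytes >= mss
    nothing_in_flight = unacked_bytes == 0
    return full_segment or nothing_in_flight
```

A small write on an idle connection goes out at once, while further small writes are held back until the earlier data is acknowledged or a full segment has accumulated.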
  20. Observed IP packets http://www.caida.org/research/traffic-analysis/pkt_size_distribution/graphs.xml
  21. Flow control (seq=122,"abcd") (ack=126,rwin=0) Last_ack=122, swin=100, rwin=4 To transmit : abcdefghijklm Last_ack=122, swin=96, rwin=0 Last_ack=126, swin=100, rwin=0 (ack=126,rwin=2) (seq=126,"ef") (ack=128,rwin=20) Last_ack=126, swin=100, rwin=2 Last_ack=126, swin=98, rwin=0 Last_ack=128, swin=100, rwin=20 Last_ack=128, swin=93, rwin=13 (seq=128,"ghijklm") (ack=135,rwin=20) Last_ack=135, swin=100, rwin=20
  22. TCP flow control • Performance is a function of window size • Throughput ~= window/rtt • TCP window : 16 bits field • Window 8 Kbytes : 65.6 Mbps (rtt 1 msec), 6.5 Mbps (rtt 10 msec), 0.66 Mbps (rtt 100 msec) • Window 64 Kbytes : 524.3 Mbps (rtt 1 msec), 52.4 Mbps (rtt 10 msec), 5.2 Mbps (rtt 100 msec) • RFC1323 Window scale extension
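The relation throughput ~= window/rtt is easy to check numerically. In the sketch below the function names are illustrative assumptions; the left shift in `scaled_window` reflects how the RFC1323 window scale option multiplies the 16-bit window field by a power of two, with a shift count of at most 14.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one window of data per round-trip time."""
    return window_bytes * 8 / rtt_seconds


def scaled_window(window_field, shift):
    """Effective window in bytes when the window scale option is in use."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("the TCP window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("RFC1323 limits the shift count to 14")
    return window_field << shift


# A 64 Kbytes window over a 100 msec path, as in the slide's table.
print(max_throughput_bps(64 * 1024, 0.100) / 1e6, "Mbps")
```

Without scaling, a long path caps throughput at a few Mbps however fast the links are, which is why the window scale extension matters on high bandwidth-delay paths.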
  23. Agenda • TCP • Connection establishment • Reliable data transfer • Connection release • SCTP • Congestion control
  24. Connection release FIN(seq=x) DISCONNECT.req (A-B) DISCONNECT.ind(A-B) ACK(ack=x+1) DISCONNECT.conf(A-B) ACK(ack=y+1) DISCONNECT.req(B-A) DISCONNECT.conf(A-B) outgoing connection closed DISCONNECT.ind(B-A) FIN(seq=y) Time WAIT Maintain state for this connection during twice MSL to be able to retransmit ACK if a segment is received from the other entity incoming connection closed incoming connection closed outgoing connection closed State can be removed Last sent data : x-1 Last sent data : y-1
  25. Abrupt release RST(seq=x) DISCONNECT.req (abrupt) DISCONNECT.ind(abrupt) Connection closed Connection closed State can be removed State can be removed Last sent data : x • Data segments can be lost during such an abrupt release • No entity needs to wait in TIME_WAIT state after such a release • anyway, any segment received when there is no state causes the transmission of a RST segment
  26. TCP connection release SYN RCVD FIN Wait1 ?FIN/!ACK CLOSE Wait Established FIN Wait2 !FIN LAST-ACK Closing TIME Wait ?ACK Closed Timeout[2MSL] ?FIN/!ACK ?ACK !FIN ?ACK ?FIN/!ACK !FIN
  27. Agenda • TCP • Connection establishment • Reliable data transfer • Connection release • SCTP • Congestion control
  28. TCP limitations • Service • Only supports bytestream service • Extensibility • Limited space for options • Security • Various issues like Denial of Service attacks
  29. TCP establishment SYN(Src=C,seq=x) CONNECT.ind SYN+ACK(Dest=C,ack=x+1,seq=y) ACK(Src=A,seq=x) CONNECT.req
  30. DoS attack • Attacker sends 1000s of SYNs SYN(Src=A,seq=x) CONNECT.ind CONNECT.ind SYN+ACK(Dest=A,ack=x+1,seq=y) SYN(Src=B,seq=x) SYN+ACK(Dest=B,ack=x+1,seq=z)
  31. TCP Security • 20th century security • Server trusts Alice but not Bob • Server accepts all TCP connections from Alice's IP address without asking for a password • Server always asks for a password from Bob's IP address
  32. TCP Security • Can Bob create a fake TCP connection by spoofing Alice's IP when she is away ? SYN(seq=x) SYN+ACK(ack=x+1,seq=y) ACK(seq=x+1, ack=y+1) CONNECT.req CONNECT.ind CONNECT.resp CONNECT.conf
  33. TCP Security • Bob's view of the transfer SYN(Src=A,seq=x) SYN+ACK(Dst=A,ack=x+1,seq=y) ACK(seq=x+1, ack=y+1) Data(Src=A,seq=x+1)
  34. SYN Cookies SYN(seq=x) SYN+ACK(ack=x+1,seq=y) ACK(seq=x+1, ack=y+1) CONNECT.req CONNECT.ind CONNECT.conf No state created y=Hash(IPClient,PortClient,Secret) Verify that ack=1+Hash(IPClient,PortClient,Secret) State is created • Stateless passive opener
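The stateless check on the SYN cookies slide can be sketched with a keyed hash. This is a deliberately simplified illustration: the function names, the use of HMAC-SHA256 as the hash and the truncation to 32 bits are assumptions, and a practical cookie also has to carry extra information such as the client's MSS, as the editor's notes point out.

```python
import hashlib
import hmac


def syn_cookie(client_ip, client_port, secret):
    """Compute y = Hash(IPClient, PortClient, Secret) as a 32-bit number."""
    message = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # Assumption: keep the first 4 bytes so the value fits a sequence number.
    return int.from_bytes(digest[:4], "big")


def ack_is_valid(ack, client_ip, client_port, secret):
    """Verify that ack = 1 + Hash(IPClient, PortClient, Secret), modulo 2**32."""
    expected = (syn_cookie(client_ip, client_port, secret) + 1) % 2**32
    return ack == expected


# The server sends y in its SYN+ACK and keeps no state for the connection.
secret = b"server-local-secret"  # hypothetical value
y = syn_cookie("192.0.2.10", 40000, secret)
print(ack_is_valid((y + 1) % 2**32, "192.0.2.10", 40000, secret))
```

The server recomputes the hash when the third segment arrives, so it only creates state for clients that actually received the SYN+ACK.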
  35. SCTP • Segment format
  36. SCTP connection establishment
  37. Agenda • TCP • Connection establishment • Reliable data transfer • Connection release • SCTP • Congestion control
  38. TCP Congestion Control • Congestion detection • Packet loss • Explicit Congestion Notification • Congestion control • Additive Increase Multiplicative Decrease
  39. Additive Increase • No congestion ? • All acks move window • Additive increase • Increment cwnd by one MSS every rtt Cwnd Time
  40. Faster increase • How to speed up the growth of the congestion window at connection startup ? • Slow-start • Double cwnd every rtt Cwnd Slow-start exponential increase of cwnd Time Max window
  41. Multiplicative decrease • How to detect congestion ? • Three duplicate acks • mild congestion for TCP • cwnd/2 and restart additive increase • Expiration of retransmission timer • severe congestion • Reset cwnd at 1 MSS • Perform slow-start until half previous cwnd and then continue with congestion avoidance
  42. Cwnd Mild congestion Fast retransmit Threshold Fast retransmit Threshold Slow-start exponential increase of cwnd Congestion avoidance linear increase of cwnd
  43. Severe congestion Cwnd Time Timer expiration Threshold Timer expiration Threshold Slow-start exponential increase of cwnd Congestion avoidance linear increase of cwnd
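The congestion window behaviour on the last few slides can be reproduced with a small per-rtt simulation. It is a sketch under stated assumptions: cwnd is counted in MSS units, one event is processed per round-trip time, slow-start doubling is capped at the threshold, and the names `on_rtt` and `trace` are hypothetical.

```python
def on_rtt(cwnd, ssthresh, event="ack"):
    """Return (cwnd, ssthresh) after one round-trip time.

    event is "ack" (no congestion), "dupacks" (three duplicate acks,
    mild congestion) or "timeout" (retransmission timer expired,
    severe congestion).
    """
    if event == "timeout":
        # Restart from 1 MSS; slow-start runs until half the previous cwnd.
        return 1.0, cwnd / 2
    if event == "dupacks":
        # Halve cwnd and resume additive increase from there.
        half = cwnd / 2
        return half, half
    if cwnd < ssthresh:
        # Slow-start: double cwnd every rtt (capped at the threshold).
        return min(2 * cwnd, ssthresh), ssthresh
    # Congestion avoidance: one more MSS every rtt.
    return cwnd + 1, ssthresh


def trace(events, cwnd=1.0, ssthresh=16.0):
    """Apply a list of events and return cwnd after each rtt."""
    history = []
    for event in events:
        cwnd, ssthresh = on_rtt(cwnd, ssthresh, event)
        history.append(cwnd)
    return history


print(trace(["ack"] * 5 + ["dupacks", "ack", "timeout", "ack", "ack"]))
```

Plotting the returned list gives the sawtooth of the mild congestion slide after a "dupacks" event and the drop to 1 MSS of the severe congestion slide after a "timeout" event.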

Editor's Notes

  1. The urgent pointer is rarely used and will not be described. The THL is expressed in 32-bit words. The TCP header may contain options; these will be discussed later.
  2. MSL in IP networks : 120 seconds
  3. MSL in IP networks : 120 seconds
  4. The computation of TCP’s retransmission timer is described in RFC2988 Computing TCP's Retransmission Timer. V. Paxson, M. Allman. November 2000. Usual values for alpha and beta are 1/8 and 1/4.
  5. See P. Karn, C. Partridge, Improving round-trip time estimates in reliable transport protocols, Proc. ACM SIGCOMM87, August 1987
  6. TCP timestamps were introduced in: RFC1323 TCP Extensions for High Performance. V. Jacobson, R. Braden, D. Borman. May 1992. The use of these timestamps is negotiated during TCP connection establishment. Most current TCP implementations support these extensions.
  7. See e.g. RFC2001 TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. W. Stevens. January 1997.
  8. RFC2018 TCP Selective Acknowledgement Options. M. Mathis, J. Mahdavi, S. Floyd, A. Romanow. October 1996.
  9. Some heavily loaded web servers use abrupt release to close their connections, to avoid maintaining state for 2*MSL seconds.
  10. Most TCP implementations today have fixes for those problems. We will discuss them later.
  11. Using a hash function to compute the value of the initial sequence number is usually called a SYN cookie. In practice, the computation of the SYN cookie is slightly more complex than a simple hash function because the server must also remember the following information inside the cookie : - the MSS value advertised by the client - the optional use of TCP options such as RFC1323 large windows, timestamps or SACK by the sender. The original discussions that led to the development of the SYN cookie solution may be found in : http://cr.yp.to/syncookies/archive