Week 7
UDP and TCP
SCTP and Internet Congestion control
Agenda
• TCP
• Connection establishment
• Reliable data transfer
• Connection release
• SCTP
• Congestion control
TCP segment
Header fields, in order (each row is 32 bits wide, the fixed header is 20 bytes) :
• Source port, Destination port
• Sequence number
• Acknowledgement number
• THL (segment header length), Reserved, Flags, Window
• Checksum, Urgent pointer
• Optional header extension
• Payload
Checksum :
computed over the entire segment and part of the IP header
Flags :
used to indicate the function of a segment
• SYN : used during connection establishment
• FIN : used during connection release
• RST : used in case of problems
• ACK : if true, means that the Acknowledgement number inside the segment is valid
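The 20-byte fixed header described above can be unpacked with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_tcp_header` and the returned dictionary keys are illustrative assumptions, and options and checksum verification are ignored.

```python
import struct

# Flag bit positions in the low bits of the 16-bit THL/Reserved/Flags field.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (hypothetical helper)."""
    (src_port, dst_port, seq, ack_num,
     thl_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    thl_words = thl_flags >> 12          # THL is counted in 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack_num": ack_num,
        "header_len": thl_words * 4,     # header length in bytes
        "syn": bool(thl_flags & SYN),
        "fin": bool(thl_flags & FIN),
        "rst": bool(thl_flags & RST),
        "ack": bool(thl_flags & ACK),
        "window": window,
        "checksum": checksum,
        "payload": segment[thl_words * 4:],
    }
```

The acknowledgement number should only be interpreted when the `ack` flag is set.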
Three-way handshake
A : CONNECT.req, initial sequence number (x)
A -> B : SYN(seq=x)
B : CONNECT.ind, CONNECT.resp, initial sequence number (y)
B -> A : SYN+ACK(ack=x+1,seq=y)
A : CONNECT.conf, connection established
A -> B : ACK(seq=x+1, ack=y+1)
B : connection established
The sequence numbers of all
segments A->B will start at x+1
The sequence numbers of all
segments B->A will start at y+1
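The sequence and acknowledgement numbers exchanged during the three-way handshake can be sketched as follows. The function name and the dictionary representation of segments are assumptions made for illustration; the only point is the arithmetic on x and y, done modulo 2^32 because sequence numbers are 32-bit values.

```python
import random

MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around

def three_way_handshake(x=None, y=None):
    """Return the three segments of the handshake (illustrative sketch)."""
    x = random.getrandbits(32) if x is None else x   # initial sequence number of A
    y = random.getrandbits(32) if y is None else y   # initial sequence number of B
    syn = {"flags": "SYN", "seq": x}
    syn_ack = {"flags": "SYN+ACK", "seq": y, "ack": (syn["seq"] + 1) % MOD}
    ack = {"flags": "ACK", "seq": (x + 1) % MOD, "ack": (syn_ack["seq"] + 1) % MOD}
    return [syn, syn_ack, ack]
```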
TCP FSM
Init
?SYN / !SYN+ACK !SYN
?SYN / !SYN+ACK
SYN RCVD SYN Sent
Established
?SYN+ACK / !ACK
?ACK
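One possible reading of this state machine is the transition table below, where `?X` means that segment X is received and `!X` that it is sent. The table is a sketch reconstructed from the labels of the diagram; the event name `CONNECT.req` for the active open and the function `run` are assumptions.

```python
# (state, event) -> (next state, segment sent or None)
TRANSITIONS = {
    ("Init", "?SYN"): ("SYN RCVD", "!SYN+ACK"),          # passive open
    ("Init", "CONNECT.req"): ("SYN Sent", "!SYN"),       # active open
    ("SYN Sent", "?SYN"): ("SYN RCVD", "!SYN+ACK"),      # simultaneous open
    ("SYN Sent", "?SYN+ACK"): ("Established", "!ACK"),
    ("SYN RCVD", "?ACK"): ("Established", None),
}

def run(events, state="Init"):
    """Apply a list of events and return the final state and the segments sent."""
    sent = []
    for event in events:
        state, output = TRANSITIONS[(state, event)]
        if output is not None:
            sent.append(output)
    return state, sent
```

For example, `run(["CONNECT.req", "?SYN+ACK"])` follows the client side of the three-way handshake.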
Simultaneous open
A : CONNECT.req
B : CONNECT.req
A -> B : SYN(seq=x)
B -> A : SYN(seq=y)
A -> B : SYN+ACK(seq=x, ack=y+1)
B -> A : SYN+ACK(seq=y, ack=x+1)
A : CONNECT.conf, connection established
B : CONNECT.conf, connection established
Negotiating options
A : CONNECT.req, initial sequence number (x), option proposed
A -> B : SYN(seq=x), Option
B : CONNECT.ind, CONNECT.resp, initial sequence number (y), option accepted
B -> A : SYN+ACK(ack=x+1,seq=y), Option
A : CONNECT.conf, option accepted, connection established
A -> B : ACK(seq=x+1, ack=y+1)
B : connection established
The sequence numbers of all
segments A->B will start at x+1
The sequence numbers of all
segments B->A will start at y+1
Reliable data transfer
(seq=123,"abcd")
(seq=127,"ef")
(seq=123,"abcd")
(seq=127,"ef")
(ack=123)
Retransmission timer
(ack=129)
(ack=129)
"abcdef"
unnecessary
retransmission
Retransmission of all
unacked segments
“ef” placed in buffer
Retransmission timer
• How to compute it ?
• round-trip-time may change frequently
during the lifetime of a TCP connection
RTT measurements
(seq=120,"xyz")
(ack=123)
• Solution (Karn/Partridge)
• Do not measure rtt of retransmitted
segments
(seq=123,"abcd")
(ack=128)
measured rtt
Timer
which rtt measurement is the correct one ?
(seq=123,"abcd")
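A retransmission timer that adapts to a changing round-trip-time can be sketched as below. It follows the smoothed rtt and rtt variation computation of RFC2988 with alpha = 1/8 and beta = 1/4, and applies the Karn/Partridge rule by ignoring samples taken on retransmitted segments. The class name, the method names and the parameters `initial_rto` and `min_rto` are assumptions, and clock granularity is ignored.

```python
class RtoEstimator:
    """Sketch of an RFC2988-style retransmission timer estimator."""
    ALPHA = 1 / 8
    BETA = 1 / 4

    def __init__(self, initial_rto=3.0, min_rto=1.0):
        self.srtt = None        # smoothed round-trip-time, in seconds
        self.rttvar = None      # round-trip-time variation, in seconds
        self.initial_rto = initial_rto
        self.min_rto = min_rto

    def sample(self, rtt, retransmitted=False):
        if retransmitted:
            return              # Karn/Partridge: ambiguous sample, ignore it
        if self.srtt is None:   # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt

    def rto(self):
        if self.srtt is None:
            return self.initial_rto
        return max(self.min_rto, self.srtt + 4 * self.rttvar)
```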
Fast retransmit
(seq=123,"abcd")
(ack=123)
(ack=123)
(ack=123)
(ack=123)
(ack=133)
(seq=123,"abcd")
"abcdefghij"
(seq=127,"ef")
Out of sequence, in buffer
(seq=129,"gh")
Out of sequence, in buffer
(seq=131,"ij")
Out of sequence, in buffer
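On the sender side, fast retransmit amounts to counting duplicate acks. The sketch below returns the acknowledgement numbers for which a retransmission would be triggered; the function name and the `threshold` parameter are assumptions, with three duplicates as the usual value.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ack numbers that trigger a fast retransmit (sketch)."""
    retransmit = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmit.append(ack)   # resend the segment starting at this number
        else:
            last_ack = ack
            duplicates = 0
    return retransmit
```

With the acks of the example above, `fast_retransmit_points([123, 123, 123, 123, 133])` identifies the segment starting at 123 as the one to resend.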
Selective Acks
• Receiver reports SACK blocks
• Negotiated during establishment
(seq=123,"abcd")
(ack=123)
(seq=127,"ef")
(ack=123,sack:127-128)
(seq=129,"gh")
(ack=123, sack:127-130)
(seq=131,"ij")
(ack=123, sack:127-132)
Lost
(seq=123,"abcd")
(ack=133)
"abcdefghij"
only 123-126 must be
retransmitted
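The sender can derive what must be retransmitted from the cumulative ack and the SACK blocks. The sketch below uses inclusive byte ranges, as in the example above where sack:127-128 covers "ef"; the actual TCP option encodes the right edge as the sequence number following the last byte of the block. The function name and the `highest_sent` parameter are assumptions.

```python
def missing_ranges(ack, sack_blocks, highest_sent):
    """Return inclusive (first, last) byte ranges that were not reported as received."""
    missing = []
    next_needed = ack                       # first byte not cumulatively acked
    for first, last in sorted(sack_blocks):
        if first > next_needed:
            missing.append((next_needed, first - 1))
        next_needed = max(next_needed, last + 1)
    if next_needed <= highest_sent:
        missing.append((next_needed, highest_sent))
    return missing
```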
Delayed acks
• Sending an ack per segment is costly
• Tradeoff
• In sequence data segment
• no ack waiting : delay the ack by up to 50 msec
• one ack already waiting : send the ack immediately
• Out-of-sequence data segment
• send ack immediately
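The delayed ack rules above reduce to a small decision function. This is only a sketch of the tradeoff; the function name and the returned strings are assumptions.

```python
def ack_action(in_sequence, ack_pending):
    """Decide what to do with the ack for a newly received data segment."""
    if not in_sequence:
        return "send_now"    # out-of-sequence segment: ack immediately
    if ack_pending:
        return "send_now"    # one ack already waiting: ack both segments now
    return "delay"           # first in-sequence segment: start the delay timer
```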
When to send data ?
• When should a segment be sent ?
• After each write system call
• When there is a full segment of data
Nagle algorithm
• A new data segment can be sent if either
• It is a full segment (MSS bytes), or
• There are no unacknowledged bytes
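The Nagle test can be written as a single predicate. A minimal sketch, with a hypothetical function name and parameters:

```python
def nagle_can_send(buffered_bytes, mss, unacked_bytes):
    """Return True if a new data segment may be sent now (sketch)."""
    if buffered_bytes <= 0:
        return False                                  # nothing to send
    return buffered_bytes >= mss or unacked_bytes == 0
```

A small write is therefore sent at once on an idle connection, but waits for the pending ack when data is already in flight.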
Observed IP packets
http://www.caida.org/research/traffic-analysis/pkt_size_distribution/graphs.xml
Connection release
FIN(seq=x)
DISCONNECT.req (A-B)
DISCONNECT.ind(A-B)
ACK(ack=x+1)
DISCONNECT.conf(A-B)
ACK(ack=y+1)
DISCONNECT.req(B-A)
DISCONNECT.conf(B-A)
outgoing connection closed
DISCONNECT.ind(B-A)
FIN(seq=y)
TIME_WAIT
Maintain state for this
connection for twice the MSL
to be able to retransmit the ACK
if a segment is received from
the other entity
incoming connection closed
incoming connection closed
outgoing connection closed
State can be removed
Last sent data : x-1
Last sent data : y-1
Abrupt release
RST(seq=x)
DISCONNECT.req (abrupt)
DISCONNECT.ind(abrupt)
Connection closed
Connection closed
State can be removed
State can be removed
Last sent data : x
• Data segments can be lost during such an abrupt release
• No entity needs to wait in TIME_WAIT state after such a release
• anyway, any segment received when there is no state causes the
transmission of a RST segment
TCP connection release
SYN RCVD
FIN Wait1
?FIN/!ACK
CLOSE Wait
Established
FIN Wait2
!FIN
LAST-ACK
Closing
TIME Wait
?ACK
Closed
Timeout[2MSL]
?FIN/!ACK
?ACK
!FIN
?ACK
?FIN/!ACK
!FIN
TCP limitations
• Service
• Only supports bytestream service
• Extensibility
• Limited space for options
• Security
• Various issues like Denial of Service
attacks
DoS attack
• Attacker sends 1000s of SYNs
SYN(Src=A,seq=x)
CONNECT.ind
CONNECT.ind
SYN+ACK(Dest=A,ack=x+1,seq=y)
SYN(Src=B,seq=x)
SYN+ACK(Dest=B,ack=x+1,seq=z)
TCP Security
• 20th century security
• Server trusts Alice but not Bob
• Server accepts all TCP connections
from Alice's IP address without
asking for a password
• Server always asks for a password
from Bob's IP address
TCP Security
• Can Bob create a fake TCP connection
by spoofing Alice's IP when she is away ?
SYN(seq=x)
SYN+ACK(ack=x+1,seq=y)
ACK(seq=x+1, ack=y+1)
CONNECT.req
CONNECT.ind
CONNECT.resp
CONNECT.conf
TCP Security
• Bob's view of the transfer
SYN(Src=A,seq=x)
SYN+ACK(Dst=A,ack=x+1,seq=y)
ACK(seq=x+1, ack=y+1)
Data(Src=A,seq=x+1)
SYN Cookies
SYN(seq=x)
SYN+ACK(ack=x+1,seq=y)
ACK(seq=x+1, ack=y+1)
CONNECT.req
CONNECT.ind
CONNECT.conf
No state created
y=Hash(IPClient,PortClient,Secret)
Verify that
ack=1+Hash(IPClient,PortClient,Secret)
State is created
• Stateless passive opener
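The stateless check can be sketched as follows. The slide only says Hash(IPClient,PortClient,Secret); the use of HMAC-SHA256 truncated to 32 bits is an assumption made here for illustration, as are the function names. Real SYN cookies also encode extra information, which this sketch omits.

```python
import hashlib
import hmac

MOD = 2 ** 32  # sequence numbers are 32-bit values

def syn_cookie(client_ip, client_port, secret):
    """Compute y, the initial sequence number placed in the SYN+ACK."""
    message = f"{client_ip}:{client_port}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def ack_is_valid(ack, client_ip, client_port, secret):
    """Check the third segment: ack must be 1 + Hash(IPClient, PortClient, Secret)."""
    return ack == (syn_cookie(client_ip, client_port, secret) + 1) % MOD
```

The server keeps no per-connection state between sending the SYN+ACK and receiving a valid ACK; only `secret` is stored.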
TCP Congestion Control
• Congestion detection
• Packet loss
• Explicit Congestion Notification
• Congestion control
• Additive Increase Multiplicative
Decrease
Additive Increase
• No congestion ?
• All acks move window
• Additive increase
• Increment cwnd by one MSS every rtt
[Figure: Cwnd versus Time]
Faster increase
• How to speed up the growth of the
congestion window at connection
startup ?
• Slow-start
• Double cwnd every rtt
[Figure: Cwnd versus Time. Slow-start, exponential increase of cwnd up to the max window.]
Multiplicative decrease
• How to detect congestion ?
• Three duplicate acks
• mild congestion for TCP
• Halve cwnd and resume additive increase
• Expiration of retransmission timer
• severe congestion
• Reset cwnd to 1 MSS
• Perform slow-start until half of the previous cwnd
and then continue with congestion
avoidance
[Figure: Mild congestion. Cwnd versus Time. Slow-start (exponential increase of cwnd) up to the threshold, then congestion avoidance (linear increase of cwnd). Each fast retransmit sets a new threshold.]
[Figure: Severe congestion. Cwnd versus Time. Each timer expiration sets a new threshold and is followed by slow-start (exponential increase of cwnd), then congestion avoidance (linear increase of cwnd).]
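The evolution of cwnd described in the slides above can be sketched as a step function applied once per rtt or per congestion event. cwnd and the threshold are counted in MSS units; the function name, the event names and the clamping of slow-start at the threshold are simplifying assumptions, not the behaviour of a specific implementation.

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return the new (cwnd, ssthresh) after one event, in MSS units (sketch)."""
    if event == "rtt_no_loss":
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow-start: double cwnd every rtt
        else:
            cwnd += 1                        # congestion avoidance: one MSS per rtt
    elif event == "three_dup_acks":          # mild congestion
        ssthresh = max(cwnd // 2, 1)
        cwnd = ssthresh                      # halve cwnd, resume additive increase
    elif event == "timeout":                 # severe congestion
        ssthresh = max(cwnd // 2, 1)
        cwnd = 1                             # restart slow-start from 1 MSS
    else:
        raise ValueError(f"unknown event: {event}")
    return cwnd, ssthresh
```

Applying `"rtt_no_loss"` repeatedly from `(1, 8)` shows the exponential phase followed by the linear one.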
Editor's Notes
Urgent pointer is rarely used and will not be described.
The THL is expressed in blocks of 32 bits. The TCP header may contain options; these will be discussed later.
MSL in IP networks : 120 seconds
The computation of TCP’s retransmission timer is described in
RFC2988 Computing TCP's Retransmission Timer. V. Paxson, M. Allman. November 2000.
Usual values for alpha and beta are 1/8 and 1/4.
See
P. Karn, C. Partridge, Improving round-trip time estimates in reliable transport protocols, Proc. ACM SIGCOMM87, August 1987
TCP timestamps were introduced in :
RFC1323 TCP Extensions for High Performance. V. Jacobson, R. Braden, D. Borman. May 1992.
The use of these timestamps is negotiated during the establishment of the TCP connection. Most current TCP implementations support these extensions.
See e.g.
RFC2001 TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms. W. Stevens. January 1997.
RFC2018 TCP Selective Acknowledgement Options. M. Mathis, J. Mahdavi, S. Floyd, A. Romanow. October 1996.
Some heavily loaded web servers use abrupt release to close their connections to avoid maintaining state for 2*MSL seconds.
Most TCP implementations today have fixes for those problems. We will discuss them later.
This utilization of a hash function to compute the value of the initial sequence number is usually called a SYN cookie.
In practice, the computation of the SYN cookie is slightly more complex than a simple hash function because the server must also remember inside the cookie the following information :
- the MSS value advertised by the client
- the optional utilization of TCP options such as RFC1323 large windows or timestamps or SACK by the sender
The original discussions that led to the development of the SYN cookie solution may be found in :
http://cr.yp.to/syncookies/archive