2. Outline of the Lectures
1 Introduction
2 User Datagram Protocol
3 Transmission Control Protocol
Flow Control in TCP
TCP Operations
Congestion Control Mechanism
Dr. Ramana ( I.I.T Jodhpur ) Transport Layer Protocol 2 / 22
3. Introduction
Introduction to Transport Layer
Responsible for end-to-end packet delivery
Common properties
reliable message delivery – resolving packet reordering, detecting
duplicates, handling corrupted and dropped packets
support for arbitrarily large messages
flow control between sender and receiver entities
support for multiple application processes on each host
These services are offered over a network providing best-effort
service
4. User Datagram Protocol
User Datagram Protocol (UDP) – RFC 768
allows multiple applications to communicate over a shared channel
via multiplexing and demultiplexing
verifies the correctness of the message via a checksum (header + data +
pseudo header – srcip, dstip, protocol, UDP length)
the pseudo header guards against datagrams that are accidentally
delivered to the wrong destination
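The checksum coverage listed above can be sketched as follows. This is a minimal illustration for IPv4, not a reference implementation; the function names (`pseudo_header`, `udp_checksum`, `verify_udp_checksum`) are hypothetical and the addresses and ports in the usage note are made up.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """srcip, dstip, zero byte, protocol (17 = UDP), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_len))

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    udp_len = 8 + len(payload)  # 8-byte UDP header + data
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    total = ones_complement_sum(
        pseudo_header(src_ip, dst_ip, udp_len) + header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is sent as all ones

def verify_udp_checksum(src_ip, dst_ip, src_port, dst_port,
                        payload: bytes, checksum: int) -> bool:
    """Receiver side: the sum over everything, checksum included, is 0xFFFF."""
    udp_len = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, checksum)
    total = ones_complement_sum(
        pseudo_header(src_ip, dst_ip, udp_len) + header + payload)
    return total == 0xFFFF
```

For example, a checksum computed for a datagram from 10.0.0.1 to 10.0.0.2 verifies at 10.0.0.2, but fails verification if the same datagram is checked against 10.0.0.3, because the destination address is part of the pseudo header.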
6. Transmission Control Protocol
Transmission Control Protocol (TCP) – RFC 793
End-to-end issues
A logical connection between the peers
Variety of physical links, so different BDPs and RTTs
Packet drops, packet reordering, packet corruptions
Available resources at the peers and in the network may change
with time
Reliable byte stream oriented transport protocol
Properties of TCP
Connection oriented protocol and full-duplex connections via
three-way connection setup and four-way connection close
handshaking mechanisms
Reliability, In-order delivery, handling of duplicate and dropped
segments via sequence numbers, acknowledgements, timers, and
segment retransmissions
Flow control via advertising buffer sizes
Congestion control via timeout events and duplicate
acknowledgements
7. Transmission Control Protocol
Connection setup – Three way
[Figure: Connection Establishment Scenarios – (a) Active/Passive Open, (b) Active/Active Open. Each panel shows the State/(Command) columns of System A and System B moving from CLOSED through LISTEN or SYN SENT to ESTAB as SYN segments are exchanged.]
8. Transmission Control Protocol
(Cont.)
[Figure: Examples of Three-Way Handshake between A and B]
(a) Normal operation: A initiates a connection (SYN i); B accepts and acknowledges (SYN j, AN = i + 1); A acknowledges and begins transmission (SN = i + 1, AN = j + 1)
(b) Delayed SYN: obsolete SYN i arrives; B accepts and acknowledges (SYN j, AN = i + 1); A rejects B's connection (RST, AN = j)
(c) Delayed SYN, ACK: A initiates a connection (SYN i); old SYN arrives at A (SYN k, AN = p) and A rejects it (RST, AN = k); B accepts and acknowledges (SYN j, AN = i + 1); A acknowledges and begins transmission (SN = i + 1, AN = j + 1)
[Figure: Two-Way Handshake: Problem with Obsolete SYN Segments. Labels: obsolete SYN i arrives; B responds; A sends new SYN; B discards duplicate SYN; B rejects segment (SN = k + 1) as out of sequence; connection closed.]
10. Transmission Control Protocol Flow Control in TCP
Flow control
At receiver
LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
NextByteExpected = LastByteRcvd + 1 if data has arrived in order
AdvertisedWindow = MaxRcvrBuffer - ((NextByteExpected - 1) -
LastByteRead)
Advertises a zero window if no buffer space is available
At sender
LastByteAcked ≤ LastByteSent ≤ LastByteWritten
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)
I.e., data in flight must be ≤ available buffer at the receiver
LastByteWritten - LastByteAcked ≤ MaxSendBuff
Sends periodic 1-byte data segments if a zero-window advertisement is received
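The window bookkeeping above can be sketched in a few lines. This is a minimal illustration that reuses the slide's variable names; the `Receiver` and `Sender` classes and the numbers in the usage note are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Receiver:
    max_rcv_buffer: int
    last_byte_read: int = 0      # last byte consumed by the application
    next_byte_expected: int = 1  # next in-order byte the receiver waits for

    def advertised_window(self) -> int:
        # Bytes received in order but not yet read still occupy the buffer.
        buffered = (self.next_byte_expected - 1) - self.last_byte_read
        return self.max_rcv_buffer - buffered

@dataclass
class Sender:
    last_byte_acked: int = 0
    last_byte_sent: int = 0

    def effective_window(self, advertised_window: int) -> int:
        # Data in flight must not exceed what the receiver can buffer.
        in_flight = self.last_byte_sent - self.last_byte_acked
        return max(0, advertised_window - in_flight)
```

With a 4000-byte receive buffer, 3000 bytes received in order and 1000 of them read by the application, the receiver advertises 2000 bytes. A sender with 500 bytes in flight may then send 1500 more, and a sender facing a zero window gets an effective window of 0.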
11. Transmission Control Protocol Flow Control in TCP
[Figure: credit-based flow control between Transport Entity A and Transport Entity B, shown as a sequence of 200-octet segments and the window positions on each side]
B is prepared to receive 1400 octets, beginning with 1001; A may send 1400 octets
A sends SN = 1001, 1201, 1401 and shrinks its transmit window with each transmission
B acknowledges 3 segments (600 octets), but is only prepared to receive 200 additional octets beyond the original budget (i.e., B will accept octets 1601 through 2600): AN = 1601, W = 1000
A adjusts its window with each credit
A sends SN = 1601, 1801, 2001, 2201, 2401 and exhausts its credit
B acknowledges 5 segments (1000 octets) and restores the original amount of credit: AN = 2601, W = 1400
A receives new credit
12. Transmission Control Protocol TCP Operations
TCP Header Format
13. Transmission Control Protocol TCP Operations
TCP State-transition Diagram
14. Transmission Control Protocol TCP Operations
TCP Policies
Send policy: Send if MSS bytes are in SendBuffer and AdvertisedWindow ≥
MSS, or if the application issues a PUSH request, or if SendBuffer holds < MSS bytes and
no data is in flight – Nagle's algorithm
Delivery policy: If there is no PUSH from the sender, wait until MSS bytes have been
collected in RcvrBuffer; otherwise deliver immediately
Accept policy: Accept all segments within the receiver's AdvertisedWindow
(as opposed to accepting only in-order segments)
Retransmission policy: Set a single timer for the entire SendBuffer. If an Ack is received,
remove the acknowledged segment from the SendBuffer and reset the timer. If the
timer expires, retransmit the segment at the front of the SendBuffer
Acknowledgment policy: Start a delayed-Ack timer and wait for an outbound data
segment on which to piggyback a cumulative acknowledgment. On timeout, send an
empty segment with the appropriate Ack. If delayed Ack is disabled, send an Ack
immediately.
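The send policy above reduces to a small decision function. The sketch below follows the three conditions as the slide states them; the function name `should_send` and its parameters are hypothetical, and real stacks apply further checks that are not modelled here.

```python
def should_send(buffered: int, advertised_window: int, in_flight: int,
                mss: int, push: bool = False) -> bool:
    """Decide whether the sender may transmit a segment now.

    buffered          -- bytes waiting in SendBuffer
    advertised_window -- receiver's AdvertisedWindow in bytes
    in_flight         -- bytes sent but not yet acknowledged
    push              -- True if the application issued a PUSH request
    """
    if buffered == 0:
        return False                 # nothing to send
    if buffered >= mss and advertised_window >= mss:
        return True                  # a full-sized segment can go out
    if push:
        return True                  # the application asked for immediate sending
    # Nagle's algorithm: a small segment goes out only if nothing is in flight.
    return buffered < mss and in_flight == 0
```

With an MSS of 1460 bytes, 100 buffered bytes are held back while 500 bytes are unacknowledged, and sent once the pipe is empty or the application pushes.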
15. Transmission Control Protocol TCP Operations
Other issues
Sequence number wraparound
Uses 32 bits for sequence numbers, and the maximum segment
lifetime (MSL) is 120 seconds
So the sequence number should not wrap around within one MSL
In high-speed networks, say OC-48 (2.5 Gbps), the time to wrap around
is 14 seconds
Solution: send a timestamp in the TCP optional fields
Keep the pipe full
AdvertisedWindow uses 16 bits, i.e., 64 KB
A receiver with a larger buffer cannot advertise it
BDP of high-speed or long-delay links >> 64 KB, e.g., on OC-48 (2.5
Gbps) with a 100 ms RTT, BDP = 29.6 MB
Solution: send a scaling factor s via TCP options to advertise a larger
window size. In that case, the advertised window will be
s × AdvertisedWindow
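The wraparound time and BDP figures above can be checked with a short calculation. This is a sketch under stated assumptions: the link rate is taken as a round 2.5 Gbps, the helper names are hypothetical, and the scaling factor is modelled as a power of two (a shift count), which is how the TCP window scale option encodes it.

```python
def wraparound_time_s(bandwidth_bps: float, seq_bits: int = 32) -> float:
    """Seconds needed to consume the whole sequence-number space."""
    return (2 ** seq_bits) / (bandwidth_bps / 8)  # one sequence number per byte

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes needed in flight to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8

def min_window_shift(bdp: float, window_bits: int = 16) -> int:
    """Smallest shift s such that the largest scaled window covers the BDP."""
    shift = 0
    while ((2 ** window_bits - 1) << shift) < bdp:
        shift += 1
    return shift

oc48_bps = 2.5e9  # assumption: rounded OC-48 rate, as on the slide
wrap = wraparound_time_s(oc48_bps)   # well below the 120-second MSL
bdp = bdp_bytes(oc48_bps, 0.100)     # far above the 64 KB unscaled window
shift = min_window_shift(bdp)
```

At 2.5 Gbps this gives a wraparound time of about 13.7 seconds and a BDP of about 31.25 million bytes, which needs a window shift of 9. The 29.6 MB quoted above is presumably the same quantity computed from the exact OC-48 line rate and expressed in units of 2^20 bytes; that reading is an assumption.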
16. Transmission Control Protocol TCP Operations
(Cont.)
TCP optional fields
TCP timestamp – used in RTT computations and also to overcome the
wraparound problem
Window scaling factor
Selective acknowledgement (SACK)
TCP timers
TCP connection setup timer – if no Ack for the SYN arrives before the timeout (75
seconds), the SYN is resent
Delayed Ack timer – 200 ms
Persist timer – triggered when a zero window is advertised; on
timeout, sends a probe to the receiver
Keepalive timer – tests if the other side is still up (usually 2 hours
after the connection goes inactive)
Fin Wait 2 timer – prevents a connection from staying in the Fin Wait 2 state forever
2MSL timer – makes sure that the Ack of the FIN is received by the FIN
sender – MSL = 120 seconds
Retransmission timer – adaptively updates the RTO interval from RTTs
– Karn and Jacobson algorithms
17. Transmission Control Protocol TCP Operations
TCP RTO Estimation
Adaptive retransmission timeout interval
Original algorithm
EstimatedRTT = α× EstimatedRTT + (1 - α) × SampleRTT
TimeOut = 2 × EstimatedRTT
Karn/Partridge Algorithm
TCP stops taking RTT samples when it retransmits a segment
For each successive retransmission timeout, the next timeout interval
is twice the previous one (exponential backoff)
Jacobson/Karels Algorithm
Variation in the RTT samples is also taken into account
Diff = SampleRTT - EstimatedRTT
EstimatedRTT = (1 - α) × EstimatedRTT + α × SampleRTT
Deviation = (1 - β) × Deviation + β × |Diff|
TimeOut = µ× EstimatedRTT + θ× Deviation
µ = 1, θ = 4, α = 1/8, β = 1/4
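The Jacobson/Karels update, combined with the two Karn/Partridge rules, can be sketched as a small estimator. The class name `RtoEstimator`, the zero initial deviation, and the millisecond values in the usage note are assumptions made for illustration; they are not taken from the slides.

```python
class RtoEstimator:
    """Adaptive retransmission timeout, following the formulas above."""

    def __init__(self, first_sample: float, alpha: float = 1 / 8,
                 beta: float = 1 / 4, mu: float = 1.0, theta: float = 4.0):
        self.alpha, self.beta, self.mu, self.theta = alpha, beta, mu, theta
        self.estimated_rtt = first_sample
        self.deviation = 0.0  # assumption: start with no observed variation
        self.timeout = mu * self.estimated_rtt + theta * self.deviation

    def on_sample(self, sample_rtt: float, retransmitted: bool = False) -> float:
        if retransmitted:
            # Karn/Partridge: ignore samples from retransmitted segments.
            return self.timeout
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        self.deviation = ((1 - self.beta) * self.deviation
                          + self.beta * abs(diff))
        self.timeout = self.mu * self.estimated_rtt + self.theta * self.deviation
        return self.timeout

    def on_timeout(self) -> float:
        # Karn/Partridge: double the timeout after each retransmission timeout.
        self.timeout *= 2
        return self.timeout
```

Starting from a 100 ms estimate, a 180 ms sample gives Diff = 80, EstimatedRTT = 110, Deviation = 20, and TimeOut = 110 + 4 × 20 = 190 ms. A timeout at that point doubles the interval to 380 ms.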
18. Transmission Control Protocol Congestion Control Mechanism
TCP Congestion Control Algorithm (CCA), RFC 1122, 2581
Introduced in the late 1980s by Van Jacobson
Congestion collapse due to improper response to congestion
events
Packet-loss-based CCA vs rate-based CCA
Loss of an Ack or a retransmission timeout ⇒ congestion in the
network
Congestion window (cwnd) – determines the number of bytes that can
be outstanding
MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow - NumberOfBytesOutStanding
(i.e., LastByteSent - LastByteAcked)
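The two window formulas above combine into a single function. This is a minimal sketch; the function name and the byte counts in the usage note are illustrative assumptions.

```python
def effective_window(cwnd: int, advertised_window: int,
                     last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still inject, limited by network and receiver."""
    max_window = min(cwnd, advertised_window)
    outstanding = last_byte_sent - last_byte_acked
    return max(0, max_window - outstanding)
```

With cwnd = 8000, AdvertisedWindow = 6000 and 3000 bytes outstanding, the receiver is the bottleneck and 3000 more bytes may be sent. If cwnd drops to 2000, the network becomes the bottleneck and the sender must wait for Acks.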
19. Transmission Control Protocol Congestion Control Mechanism
Updating of Congestion Window
Multiplicative/Exponential increase (MI) – Slow-start (SS) phase
Additive/Linear increase (AI) – Congestion avoidance (CA) phase
20. Transmission Control Protocol Congestion Control Mechanism
(Cont.)
Transition from MI to AI – via the slow-start threshold,
ssthresh
Events that trigger a reduction in cwnd
Retransmission timeout
Triple duplicate Acks (fast retransmission)
How much reduction?
cwnd = 1 MSS (retransmission timeout, or triple duplicate Acks in Tahoe)
cwnd = cwnd/2 (fast recovery)
Variants of CCA
TCP Tahoe - Fast Retransmit
TCP Reno - Fast Retransmit + Fast Recovery
TCP NewReno - Fast Retransmit + Fast Recovery + Avoid multiple
reductions
TCP Cubic
High-speed TCP
TCP-Few (fractional window increase)
21. Transmission Control Protocol Congestion Control Mechanism
[Figure: State diagram of various TCP CC algorithms]
States: Slow Start, Congestion Avoidance, Fast Recovery
Slow Start: on each Ack, cwnd = cwnd + 1 MSS; moves to Congestion Avoidance when cwnd > ssthresh
Congestion Avoidance: on each Ack, cwnd = cwnd + 1/cwnd
Slow Start or Congestion Avoidance → Fast Recovery: 3rd DAck & not Tahoe
Fast Recovery: on DAck or (PAck & Newreno), artificially inflate cwnd; on Ack, cwnd = ssthresh and move to Congestion Avoidance
Any state → Slow Start: RTO or (Tahoe & 3rd DAck)
Actions for 3rd DAck & RTO
If (Newreno or Reno) & 3rd DAck: ssthresh = cwnd = cwnd/2
If (Tahoe & 3rd DAck) or RTO: ssthresh = cwnd / 2, cwnd = 1 MSS
Legend
DAck – Duplicate Ack
PAck – Partial Ack
Ack – TCP acknowledgment
RTO – Retransmission timeout
MSS – Maximum segment size
cwnd – TCP congestion window
ssthresh – Slow-start threshold
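The state diagram can be approximated by a small event-driven model. This is a simplified sketch, not the behaviour of any real TCP stack: cwnd is counted in MSS units, only Tahoe and Reno are modelled (NewReno's partial-Ack handling is left out), and the class and method names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CongestionControl:
    variant: str = "reno"        # "tahoe" or "reno"
    cwnd: float = 1.0            # congestion window, in MSS units
    ssthresh: float = 64.0       # slow-start threshold, in MSS units
    state: str = "slow_start"
    dup_acks: int = 0

    def on_ack(self) -> None:
        """A new (non-duplicate) Ack arrived."""
        self.dup_acks = 0
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh          # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += 1                     # +1 MSS per Ack
            if self.cwnd > self.ssthresh:
                self.state = "congestion_avoidance"
        else:
            self.cwnd += 1 / self.cwnd         # roughly +1 MSS per RTT

    def on_dup_ack(self) -> None:
        self.dup_acks += 1
        if self.state == "fast_recovery":
            self.cwnd += 1                     # artificially inflate cwnd
        elif self.dup_acks == 3:
            if self.variant == "tahoe":
                self._restart_slow_start()
            else:
                self.ssthresh = self.cwnd = self.cwnd / 2
                self.state = "fast_recovery"

    def on_timeout(self) -> None:
        """Retransmission timeout (RTO)."""
        self._restart_slow_start()

    def _restart_slow_start(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.state = "slow_start"
        self.dup_acks = 0
```

For instance, a Reno sender with cwnd = 8 in congestion avoidance that sees three duplicate Acks enters fast recovery with cwnd = ssthresh = 4, while a Tahoe sender in the same situation sets ssthresh = 4 and restarts slow start with cwnd = 1.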