Communication Networks
Sanjay K. Bose
Lecture Set VII
Transport Layer
TCP/IP Protocol Suite
• TCP/IP protocol stack is a layered architecture. TCP and IP
are the two most important protocols of this stack.
• It was originally developed by DARPA (Defense Advanced
Research Projects Agency) for an experimental packet-switched
network.
• It was later included in the Berkeley Software Distribution of
UNIX.
• It maps only loosely onto the OSI layers and runs over most
standard physical and data link protocols.
• It also includes specifications for such common applications as
e-mail, remote login, terminal emulation, and file transfer.
TCP/IP Protocol Suite (Transport Layer)
[Figure: protocol stack, top to bottom]
• Distributed applications: HTTP, SMTP, DNS, RTP
• TCP (reliable stream service) and UDP (user datagram service)
• IP, with ICMP and ARP (best-effort connectionless packet transfer)
• Network Interfaces 1, 2, 3 (variety of network technologies)
Transport services and protocols
• Provide logical communication
between application processes
running on end hosts
• Transport protocols run only in end
systems (not in routers/switches)
– Sender breaks application
messages into segments, passes
to network layer
– Receiver reassembles segments
into messages, passes to
application layer
• Transport protocols available are
TCP and UDP
[Figure: two end hosts, each running the full stack: application, transport, network, data link, physical]
Network Layer: Logical communication
between hosts
Transport Layer: Logical communication
between processes (relies on, and enhances,
network layer services)
TCP/IP Encapsulation
TCP Header contains
source & destination port
numbers for identifying
the application
IP Header contains source
and destination IP addresses;
transport protocol type (TCP
or UDP)
Ethernet Header
contains source &
destination MAC
addresses
Example application: HTTP
[Figure: encapsulation steps]
• HTTP Request
• TCP header | HTTP Request
• IP header | TCP header | HTTP Request
• Ethernet header | IP header | TCP header | HTTP Request | FCS
Transport Layer Multiplexing/Demultiplexing
[Figure: Host 1, Host 2 and Host 3, each with the application, transport, network, link and physical layers; processes P1 to P4 attach to the transport layer through sockets]
Multiplexing at Sender:
Gathering data from multiple
sockets, enveloping data with
header (later used for
demultiplexing)
Demultiplexing at Receiver:
Delivering received segments
to correct socket
Demultiplexing at the Transport Layer
• Host receives IP datagrams
– Each datagram has source
IP address, destination IP
address
– Each datagram carries 1
transport-layer segment
– Each segment has source,
destination port number
• Host uses IP addresses & port
numbers to direct segment to
appropriate socket
TCP/UDP Segment Format (each row 32 bits wide)
• Source Port # | Dest Port #
• other header fields
• Application Data (Message)
0-255 Well-known ports
256-1023 Less well-known ports
1024-65535 Ephemeral client ports
Connectionless Demultiplexing (UDP)
• Create sockets with port numbers:
DatagramSocket mySocket1 = new DatagramSocket(12534);
DatagramSocket mySocket2 = new DatagramSocket(12535);
• UDP socket identified by two-tuple
(dest IP address, dest port number)
• When host receives UDP
segment:
– checks destination port
number in segment
– directs UDP segment to
socket with that port
number
• IP datagrams with
different source IP
addresses and/or source
port numbers directed to
same socket if the
destination port number is
the same in the
destination host
Note that TCP does this differently,
using a 4-tuple (S-IP, SP, D-IP, DP) to
identify a socket!
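The port-only lookup described on this slide can be sketched as a toy model. This is a minimal illustration and not how a real IP stack is written: the class `UdpDemux` and its methods `bind`, `deliver` and `queued` are hypothetical names, and a list per port stands in for a socket's receive queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of UDP demultiplexing: the destination port alone selects the socket. */
final class UdpDemux {
    // One receive queue per bound port, playing the role of a socket (assumption).
    private final Map<Integer, List<String>> sockets = new HashMap<>();

    void bind(int port) {
        sockets.put(port, new ArrayList<>());
    }

    /** Returns true if delivered, false if no socket is bound to the destination port. */
    boolean deliver(String srcIp, int srcPort, int dstPort, String payload) {
        List<String> queue = sockets.get(dstPort);
        if (queue == null) {
            return false;
        }
        // Source IP and port are kept only as a return address; they do not affect the lookup.
        queue.add(srcIp + ":" + srcPort + " -> " + payload);
        return true;
    }

    int queued(int port) {
        List<String> queue = sockets.get(port);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        UdpDemux host = new UdpDemux();
        host.bind(12534);
        host.deliver("A", 9157, 12534, "from A");
        host.deliver("B", 5775, 12534, "from B");
        System.out.println("segments queued on 12534: " + host.queued(12534));
    }
}
```

Both senders end up in the same queue because only the destination port is consulted.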
Connectionless Demultiplexing (UDP)
Example: Server creates socket at 6428 to provide UDP service to some
application
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: clients A and B exchange UDP segments with server C. Segments to the server carry DP: 6428 with SP: 9157 or SP: 5775; the replies swap the two ports (SP: 6428 with DP: 9157 or DP: 5775)]
• Same socket (6428) at server for both clients in this example
• DP specifies the process to which data should be delivered at the Receiver
• SP specifies the process from which data is coming, for the specified source
IP address; acts like a return address for replies/responses if required to be
sent back
Connection-oriented Demultiplexing (TCP)
• TCP socket identified by 4-
tuple:
– source IP address
– source port number
– destination IP address
– destination port number
• Receiver host uses all four
values to direct segment to
appropriate socket; socket
is uniquely identified by 4-
tuple (S-IP, SP, D-IP, DP)
• Server host may support
many simultaneous TCP
sockets:
– each socket identified by its
own 4-tuple
• Web servers have different
sockets for each connecting
client
– non-persistent HTTP will
have different socket for
each request
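As a rough sketch of the 4-tuple lookup, the toy model below keeps one byte stream per (S-IP, SP, D-IP, DP). The names `TcpDemux` and `FourTuple` are hypothetical, and creating the socket on first use is a simplification: a real server would create it when it accepts the connection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy model of TCP demultiplexing: all four values select the socket. */
final class TcpDemux {
    /** Connection identifier: the 4-tuple (S-IP, SP, D-IP, DP). */
    static final class FourTuple {
        final String srcIp;
        final int srcPort;
        final String dstIp;
        final int dstPort;

        FourTuple(String srcIp, int srcPort, String dstIp, int dstPort) {
            this.srcIp = srcIp;
            this.srcPort = srcPort;
            this.dstIp = dstIp;
            this.dstPort = dstPort;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FourTuple)) {
                return false;
            }
            FourTuple t = (FourTuple) o;
            return srcPort == t.srcPort && dstPort == t.dstPort
                    && srcIp.equals(t.srcIp) && dstIp.equals(t.dstIp);
        }

        @Override
        public int hashCode() {
            return Objects.hash(srcIp, srcPort, dstIp, dstPort);
        }
    }

    // One received byte stream per connection.
    private final Map<FourTuple, StringBuilder> sockets = new HashMap<>();

    void deliver(FourTuple id, String payload) {
        sockets.computeIfAbsent(id, k -> new StringBuilder()).append(payload);
    }

    int socketCount() {
        return sockets.size();
    }

    String received(FourTuple id) {
        StringBuilder sb = sockets.get(id);
        return sb == null ? "" : sb.toString();
    }

    public static void main(String[] args) {
        TcpDemux server = new TcpDemux();
        server.deliver(new FourTuple("A", 9157, "C", 80), "GET /a");
        server.deliver(new FourTuple("B", 9157, "C", 80), "GET /b");
        System.out.println("sockets in use: " + server.socketCount());
    }
}
```

Two clients that happen to pick the same source port still get separate sockets, because their source IP addresses differ.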
Connection-oriented Demultiplexing (TCP)
[Figure: Web server C receives three segments, all with D-IP: C and DP: 80: (S-IP: A, SP: 9157), (S-IP: B, SP: 9157) and (S-IP: B, SP: 5775); each is delivered to a different socket, served by its own process]
• This is a Web Server example as the segments are being sent to Port 80
of the server which corresponds to the HTTP Service
• Note that in this case, the server is creating a separate process for each
of the sockets. This would be inefficient (see next slide for a more
efficient example with “threading”)
Connection-oriented Demultiplexing (TCP)
Threaded Web Server
[Figure: Web server C receives three segments, all with D-IP: C and DP: 80: (S-IP: A, SP: 9157), (S-IP: B, SP: 9157) and (S-IP: B, SP: 5775); each is delivered to a different socket, all served by one process]
• This is also a Web Server example as the segments are being sent to Port
80 of the server which corresponds to the HTTP Service
• Note that in this case, the server is creating one process for all the
sockets. A new thread (a lightweight unit of execution inside that process) is created for each socket
UDP: User Datagram Protocol
• “no frills,” “bare bones”
Internet transport
protocol
• “best effort” service, UDP
segments may be:
– lost
– delivered out of order
to application
• Connectionless:
– no handshaking between
UDP sender, receiver
– each UDP segment
handled independently
of others
Why have UDP?
• No connection
establishment (which can
add delay)
• Simple: no connection state
information kept at sender
or receiver
• Small Segment Header
• No Congestion Control: UDP
can transmit as fast as the
application and link allow
UDP: User Datagram Protocol
• Commonly used for
streaming multimedia
applications which tend
to be loss tolerant but
rate sensitive
• UDP also used for DNS
and SNMP
• For reliable transfer
over UDP one must add
reliability at the level
of the application layer,
e.g. application-specific
error recovery!
UDP Segment Format (each row 32 bits wide)
• Source Port # | Dest Port #
• Length | Checksum
• Application Data (Message)
Length: length, in bytes, of the UDP segment, including header
UDP Checksum: Standard Internet
Checksum added by the sender. Used by
the receiver to check for bit errors.
(See next slide)
UDP Checksum Calculation
• UDP checksum covers pseudoheader followed by UDP datagram
• IP addresses are included to guard against misdelivery
• Receiver recalculates the checksum and silently discards the
datagram if errors are detected (i.e. no error message is generated)
• Using UDP checksums is optional over IPv4, but hosts are required to
have checksum generation enabled by default
UDP Pseudoheader (bit positions 0, 8, 16, 31 across each 32-bit row)
• Source IP Address
• Destination IP Address
• 0 0 0 0 0 0 0 0 | Protocol = 17 | UDP Length
(used in checksum calculation but never actually transmitted,
nor is it included in the “Length”)
Note that IP Address information will come from another layer (Network
Layer). Strictly speaking, this goes against the philosophy of keeping the
layers separate from each other.
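The checksum calculation can be sketched as follows for IPv4. This is a minimal illustration, not a production implementation: the class and method names are hypothetical, and it leaves out the UDP rule that a computed value of zero is transmitted as 0xFFFF.

```java
/** Sketch of the Internet checksum over the UDP pseudoheader plus segment (IPv4). */
final class UdpChecksum {
    /** One's-complement sum of 16-bit big-endian words, with end-around carry. */
    static int onesComplementSum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd length with a zero byte
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // fold the carries back in
        }
        return (int) sum;
    }

    /** Checksum over the 12-byte pseudoheader followed by the UDP header and data. */
    static int checksum(byte[] srcIp, byte[] dstIp, byte[] udpSegment) {
        byte[] buf = new byte[12 + udpSegment.length];
        System.arraycopy(srcIp, 0, buf, 0, 4);
        System.arraycopy(dstIp, 0, buf, 4, 4);
        buf[8] = 0;                                  // eight zero bits
        buf[9] = 17;                                 // protocol = 17 (UDP)
        buf[10] = (byte) (udpSegment.length >> 8);   // UDP length, high byte
        buf[11] = (byte) udpSegment.length;          // UDP length, low byte
        System.arraycopy(udpSegment, 0, buf, 12, udpSegment.length);
        return ~onesComplementSum(buf) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] src = {10, 0, 0, 1};
        byte[] dst = {10, 0, 0, 2};
        // Ports 8989 -> 6428, length 10, checksum field zeroed, 2 data bytes.
        byte[] segment = {0x23, 0x1D, 0x19, 0x1C, 0x00, 0x0A, 0x00, 0x00, 'h', 'i'};
        System.out.printf("checksum = 0x%04X%n", checksum(src, dst, segment));
    }
}
```

A receiver can run the same routine over the segment with the checksum field filled in; a result of zero means no error was detected.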
UDP Destination Port Usage
[Figure: a UDP datagram arrives from the IP layer and is demultiplexed, based on its destination port #, to Port 1, Port 2 or Port 3]
An error message (ICMP “port unreachable”) is sent back if the
destination port # indicated in the datagram does not exist!
UDP Port Numbers
Well Known Port Numbers
• Universally assigned and accepted port #s providing some
designated service
• Typically, lower port numbers used for this
• Examples:
– 37 Time
– 53 Domain Name Server
– 67 DHCP Server
– 68 DHCP Client
Dynamically Assigned Port Numbers
• Ports are not globally known
• When a program needs a port, it asks for and gets one from the
network software
• Destination machine needs to be queried to find the port number
at which it may be offering the service to be accessed
• Typically higher port numbers used for this
TCP: Transmission Control Protocol
• full duplex data:
– bi-directional data flow in
same connection
– MSS: maximum segment
size
• connection-oriented:
– handshaking (exchange of
control msgs) initializes
sender, receiver state
before data exchange
• flow controlled:
– sender will not overwhelm
receiver
• point-to-point:
– one sender, one receiver
• reliable, in-order byte
stream:
– no “message boundaries”
inside!
• pipelined:
– TCP congestion and flow
control set window size
• send & receive buffers
[Figure: the application writes data through its socket door into the TCP send buffer; segments carry the data to the TCP receive buffer, from which the other application reads data through its socket door]
Remember that a TCP stream is unstructured: there are no boundary
marks in the stream itself, so the application has to create such
boundary marks if needed (e.g. to separate different fields)
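One common way for an application to add its own boundary marks is to prefix each message with its length. The sketch below assumes a 4-byte big-endian length prefix and UTF-8 text; that format and the class name `LengthPrefixedFraming` are illustrative choices, not something TCP itself defines.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Application-level framing on top of an unstructured byte stream. */
final class LengthPrefixedFraming {
    /** Writes each message as a 4-byte length followed by its UTF-8 bytes. */
    static byte[] encode(List<String> messages) {
        int total = 0;
        List<byte[]> bodies = new ArrayList<>();
        for (String m : messages) {
            byte[] body = m.getBytes(StandardCharsets.UTF_8);
            bodies.add(body);
            total += 4 + body.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total); // big-endian by default
        for (byte[] body : bodies) {
            out.putInt(body.length);
            out.put(body);
        }
        return out.array();
    }

    /** Splits a complete byte stream back into the original messages. */
    static List<String> decode(byte[] stream) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        List<String> messages = new ArrayList<>();
        while (in.hasRemaining()) {
            byte[] body = new byte[in.getInt()];
            in.get(body);
            messages.add(new String(body, StandardCharsets.UTF_8));
        }
        return messages;
    }

    public static void main(String[] args) {
        byte[] stream = encode(Arrays.asList("name=alice", "age=30"));
        System.out.println(decode(stream));
    }
}
```

The decoder here assumes it has the whole stream; a receiver reading from a live socket would first have to wait until all the bytes of a length field and its body have arrived.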
TCP Segment Format
Each TCP segment has header of 20 or more bytes + 0 or more bytes of data
TCP header layout (bit positions 0, 4, 10, 16, 24, 31 across each 32-bit row)
• Source Port | Destination Port
• Sequence Number
• Acknowledgment Number
• Header Length | Reserved | URG ACK PSH RST SYN FIN | Window Size
• Checksum | Urgent Pointer
• Options | Padding
• Data
TCP Header
Port Numbers
• A socket identifies a
connection endpoint
– IP address + port
• A connection specified by a
socket pair
• Well-known ports
– FTP 20 (data), 21 (control)
– Telnet 23
– DNS 53
– HTTP 80
Sequence Number
• Byte count
• First byte in segment
• 32 bits long
• $0 \le SN \le 2^{32}-1$
• Initial sequence number
selected during connection
setup
TCP Header
Acknowledgement Number
• SN of next byte expected by
receiver
• Acknowledges that all prior
bytes in stream have been
received correctly
• Valid if ACK flag is set
Header length
• 4 bits
• Length of header in multiples
of 32-bit words
• Minimum header length is 20
bytes
• Maximum header length is 60
bytes
TCP Header
Reserved
• 6 bits
Control
• 6 bits
• URG: urgent pointer flag
– Urgent message end = SN + urgent pointer
• ACK: ACK packet flag
• PSH: override TCP buffering
• RST: reset connection
– Upon receipt of RST, connection is
terminated and application layer notified
• SYN: establish connection
• FIN: close connection
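To make the header length and control fields concrete, the sketch below reads them out of a raw header, assuming the layout on these slides (4-bit header length, 6 reserved bits, 6 control bits, so the flags sit in the low six bits of byte 13). The class `TcpHeaderFields` is a hypothetical helper.

```java
/** Reads the header length and control bits from the first 20 bytes of a TCP header. */
final class TcpHeaderFields {
    /** Header length in bytes: the upper 4 bits of byte 12 count 32-bit words. */
    static int headerLengthBytes(byte[] header) {
        return ((header[12] & 0xF0) >> 4) * 4;
    }

    /** Names of the control bits that are set, in header order, joined by '+'. */
    static String flags(byte[] header) {
        String[] names = {"URG", "ACK", "PSH", "RST", "SYN", "FIN"};
        int bits = header[13] & 0x3F;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            if ((bits & (0x20 >> i)) != 0) { // URG is the highest of the six bits
                if (sb.length() > 0) {
                    sb.append('+');
                }
                sb.append(names[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] header = new byte[20];
        header[12] = 0x50; // header length = 5 words
        header[13] = 0x12; // ACK and SYN set
        System.out.println(headerLengthBytes(header) + " bytes, flags " + flags(header));
    }
}
```

With 4 bits for the length field, the largest value is 15 words, which is where the 60-byte maximum comes from.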
TCP Header
Window Size
• 16 bits to advertise window
size
• Used for flow control
• The host advertising the window will accept bytes with
SN from ACK to ACK +
window - 1
• Maximum window size is
65535 bytes
TCP Checksum
• Internet checksum method
• Computed over
TCP pseudo header + TCP
segment (header+ data)
(See next slide for TCP pseudo
header)
TCP Pseudo Header
(for checksum calculation)
TCP Pseudo Header layout (bit positions 0, 8, 16, 31 across each 32-bit row)
• Source IP address
• Destination IP address
• 0 0 0 0 0 0 0 0 | Protocol = 6 | TCP Segment Length
Used in checksum calculation but never actually transmitted,
nor is it included in the “Length”
Usage similar to that of the UDP Pseudoheader
TCP Header
Options
• Variable length
• NOP (No Operation) option is
used to pad TCP header to
multiple of 32 bits
• Time stamp option is used
for round trip measurements
• Maximum Segment Size
(MSS) option specifies the
largest segment a receiver
wants to receive
• Window Scale option
extends the TCP window beyond
16 bits (by a shift of up to 14,
i.e. up to 30 bits)
TCP Services
• Provides a full duplex connection-oriented and reliable byte-
stream service using a sliding-window flow control.
• User data are broken into segments not exceeding 64 kbytes
(usually about 1500 bytes) and sent to the destination by
encapsulating them in IP datagrams
– IP provides unreliable packet delivery
– packets can get lost, duplicated or delivered out of
sequence
• Receiver sends an acknowledgment back after receiving a
segment.
• Retransmission of segment if necessary
TCP Services
[Figure: the sending application writes into a transport-layer buffer and segments flow to the receiver’s buffer, from which the receiving application reads; with buffer available = B, the receiver advertises a window size < B; ACKs flow back to the sender and feed its RTT estimation]
TCP Round Trip Time and Timeout
How to set TCP timeout value?
• Must be set longer than the RTT, but the RTT also
varies
• If it is set too short, premature timeouts will
occur, leading to unnecessary retransmissions
• If it is set too long then the response to segment
loss will be too slow.
TCP Round Trip Time and Timeout
How is the RTT estimated?
• SampleRTT = measured time from segment transmission
until the ACK for that segment is received, ignoring retransmissions
• SampleRTT will fluctuate but we want the estimated RTT
to be “smoother”. This is done by taking a moving average
EstimatedRTT over recent measurements, rather than just
using the current SampleRTT
$\mathrm{EstimatedRTT} = (1-\alpha)\cdot\mathrm{EstimatedRTT} + \alpha\cdot\mathrm{SampleRTT}$
• Exponential weighted moving average
• Influence of past sample decreases exponentially fast
• Typical value: $\alpha = 0.125$
[Figure: Example measurements (SampleRTT and EstimatedRTT), RTT: gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), y-axis: RTT (milliseconds), ranging from about 100 to 350 ms]
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin
• First estimate how much SampleRTT deviates from
EstimatedRTT:
$\mathrm{DevRTT} = (1-\beta)\cdot\mathrm{DevRTT} + \beta\cdot|\mathrm{SampleRTT}-\mathrm{EstimatedRTT}|$
(typically, $\beta = 0.25$)
• Then set the timeout interval as
$\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\cdot\mathrm{DevRTT}$
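The RTT and timeout formulas on these slides can be tried out with a small estimator. This is a sketch under two assumptions that the slides do not state: the first sample seeds EstimatedRTT with DevRTT set to half of it, and DevRTT is updated before EstimatedRTT (both follow the convention of RFC 6298). The class name `RttEstimator` is hypothetical.

```java
/** Exponentially weighted moving averages for the RTT and its deviation (values in ms). */
final class RttEstimator {
    private static final double ALPHA = 0.125; // weight of the newest sample in EstimatedRTT
    private static final double BETA = 0.25;   // weight of the newest deviation in DevRTT

    private double estimatedRtt;
    private double devRtt;

    /** Seeds the estimator from the first measurement (assumed initialisation). */
    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;
        devRtt = firstSampleMs / 2;
    }

    void addSample(double sampleRttMs) {
        // The deviation is measured against the estimate from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRttMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRttMs;
    }

    double estimatedRtt() {
        return estimatedRtt;
    }

    double devRtt() {
        return devRtt;
    }

    /** EstimatedRTT plus a safety margin of four deviations. */
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator rtt = new RttEstimator(100);
        for (double sample : new double[] {200, 120, 110, 300, 105}) {
            rtt.addSample(sample);
            System.out.printf("sample=%.0f estimated=%.1f timeout=%.1f%n",
                    sample, rtt.estimatedRtt(), rtt.timeoutInterval());
        }
    }
}
```

A single outlier moves EstimatedRTT by only one eighth of its distance from the estimate, but it widens the safety margin noticeably through DevRTT.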
TCP Connection Establishment
• Three-Way Handshake
– A sends a SYN segment specifying the port number of
the other party B, the initial sequence number (ISN)
that A will use and other info (e.g. max. segment size)
– B responds with its own SYN segment containing its ISN.
B also acknowledges A’s SYN by ACKing A’s ISN plus one
– A acknowledges B’s SYN by ACKing B’s ISN plus one
• Initial Sequence Number (ISN) may be randomly chosen but
with some important considerations
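The sequence and acknowledgment numbers carried by the three segments can be written out explicitly. In this sketch the class `ThreeWayHandshake` and the use of -1 to mean “ACK flag not set” are illustrative conventions; arithmetic is done modulo 2^32 because sequence numbers are 32 bits long.

```java
/** Sequence/acknowledgment numbers in the three-way handshake. */
final class ThreeWayHandshake {
    private static final long MASK = 0xFFFFFFFFL; // 32-bit sequence space

    /**
     * Returns {seq, ack} for the SYN, the SYN+ACK and the final ACK, in that order.
     * An ack of -1 marks a segment whose ACK flag is not set.
     */
    static long[][] exchange(long isnA, long isnB) {
        long[] syn = {isnA, -1};                                   // A -> B: SYN
        long[] synAck = {isnB, (isnA + 1) & MASK};                 // B -> A: SYN, ACK = A's ISN + 1
        long[] ack = {(isnA + 1) & MASK, (isnB + 1) & MASK};       // A -> B: ACK = B's ISN + 1
        return new long[][] {syn, synAck, ack};
    }

    public static void main(String[] args) {
        String[] names = {"SYN", "SYN+ACK", "ACK"};
        long[][] segments = exchange(1000, 5000);
        for (int i = 0; i < segments.length; i++) {
            System.out.println(names[i] + " seq=" + segments[i][0] + " ack=" + segments[i][1]);
        }
    }
}
```

The “plus one” appears because a SYN consumes one sequence number even though it carries no data.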
Initial Sequence Number (ISN)
• Select initial sequence numbers (ISN) to protect against
segments from prior connections (that may circulate in the
network and arrive at a much later time)
• Select ISN to avoid overlap with sequence numbers of prior
connections
• Use local clock to select the ISN
• Time for clock to go through a full cycle should be greater than
the maximum segment lifetime (MSL); typically MSL = 120
seconds
• High bandwidth connections pose a problem: sequence numbers are
consumed quickly and can wrap around within one MSL
Three Way Handshake
(TCP Connection Setup)
[Figure: Host A and Host B exchange the SYN, SYN + ACK and ACK segments]
The handshake protects against responding falsely to old segments from prior connections
Maximum Segment Size
• Maximum Segment Size (MSS) - largest block of data that TCP
sends to other end
• Each end can announce its MSS during connection establishment
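• Default IP datagram size is 576 bytes, including 20 bytes for the IP
header and 20 bytes for the TCP header, so the default MSS is 536 bytes
• Slight difference between the MSS of Ethernet and IEEE
802.3 (whose LLC/SNAP header takes 8 bytes of the frame payload):
Ethernet MSS = 1460 bytes (1500 - 40)
IEEE 802.3 MSS = 1452 bytes (1492 - 40)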
TCP Window Flow Control
[Figure: timeline t0 to t4 between Host A and Host B, where Win = advertised window size. Blocks of 1024 bytes (and one of 128 bytes) are queued to transmit; at one step only 512 bytes are sent, as that is the advertised value of Win]
Nagle Algorithm
• Situation: User types one character at a time
– Transmitter sends TCP segment per character (41B)
– Receiver sends ACK (40B)
– Receiver echoes received character (41B)
– Transmitter ACKs echo (40 B)
– 162 bytes transmitted to transfer one character! Problem!
• Solution:
– TCP sends data & waits for ACK
– New characters buffered
– Send new characters when ACK arrives
– Algorithm adjusts to RTT as follows -
• Short RTT send frequently at low efficiency
• Long RTT send less frequently at greater efficiency
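The buffering rule in the solution above can be sketched as a tiny state machine. This is a simplified model with hypothetical names (`NagleSender`, `write`, `onAck`); it ignores the part of the real algorithm that lets a full-sized segment go out immediately.

```java
/** Simplified Nagle sender: at most one unacknowledged segment, new data is buffered. */
final class NagleSender {
    private final StringBuilder pending = new StringBuilder();
    private boolean awaitingAck = false;

    /** The application writes data. Returns the segment sent right away, or null if buffered. */
    String write(String data) {
        if (awaitingAck) {
            pending.append(data); // hold back until the outstanding segment is ACKed
            return null;
        }
        awaitingAck = true;
        return data;
    }

    /** An ACK arrives. Returns the buffered data sent as one segment, or null if none. */
    String onAck() {
        if (pending.length() == 0) {
            awaitingAck = false;
            return null;
        }
        String segment = pending.toString();
        pending.setLength(0);
        return segment; // this new segment is now the one awaiting an ACK
    }

    public static void main(String[] args) {
        NagleSender sender = new NagleSender();
        System.out.println("sent: " + sender.write("h"));
        sender.write("e");
        sender.write("l");
        System.out.println("sent on ACK: " + sender.onAck());
    }
}
```

The longer the ACK takes to come back, the more characters accumulate in the buffer, which is how the algorithm adapts to the RTT.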
Silly Window Syndrome
• Situation:
– Transmitter sends large amount of data
– Receiver’s buffer is depleted slowly, so buffer fills up
– Every time a few bytes read from buffer, a new advertisement to
transmitter is generated
– Sender immediately sends data & fills buffer
– This leads to many small, inefficient segments being transmitted
• Solution:
– Receiver does not advertise a window until it is at least ½ of the
receiver buffer or at least one maximum segment size (MSS)
– Transmitter refrains from sending small segments
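The receiver-side rule can be expressed as a one-line decision. In this sketch the class `SillyWindowAvoidance` and its method are hypothetical; advertising a window of zero stands in for “does not advertise”, and the two conditions on the slide are combined by comparing against the smaller of the two thresholds.

```java
/** Receiver-side silly window avoidance: hide small amounts of free buffer space. */
final class SillyWindowAvoidance {
    /**
     * Returns the window to advertise: the free space once it reaches half the
     * buffer or one MSS (whichever is smaller), otherwise zero.
     */
    static int advertisedWindow(int freeBytes, int bufferSize, int mss) {
        int threshold = Math.min(bufferSize / 2, mss);
        return freeBytes >= threshold ? freeBytes : 0;
    }

    public static void main(String[] args) {
        int bufferSize = 8192;
        int mss = 1460;
        for (int free : new int[] {10, 500, 1460, 4096}) {
            System.out.println("free=" + free + " advertise="
                    + advertisedWindow(free, bufferSize, mss));
        }
    }
}
```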
Sequence Number Wraparound
(Potential problem at high data rates)
• $2^{32} = 4.29 \times 10^{9}$ bytes $= 34.3 \times 10^{9}$ bits (TCP has 32-bit seq. no.)
Therefore, at 1 Gbps, sequence numbers will wrap around in just 34.3
seconds, so the transmitter can only transmit for very brief periods before sequence numbers repeat
Solution: Use Timestamp Option in TCP option field. Transmitter inserts
32-bit timestamp in transmitted segment. Receiver echoes this in ACK.
This option must be requested in the SYN segment and is negotiated
during the Connection Setup.
– Timestamp + sequence no → 64-bit seq. no (effectively a
much larger sequence number than the original 32-bit)
– Timestamp clock must:
• Tick forward at least once every $2^{31}$ bits
• Not complete cycle in less than one MSL
• Example: clock tick every 1 ms @ 8 Tbps wraps around in 25
days
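The wraparound time quoted on this slide is easy to recompute for other link rates. The helper below is a hypothetical one-liner that assumes the sender transmits continuously at the given rate.

```java
/** Time for the 32-bit TCP sequence space to wrap at a given sending rate. */
final class SequenceWrap {
    static double wrapTimeSeconds(double bitsPerSecond) {
        double sequenceSpaceBits = Math.pow(2, 32) * 8; // 2^32 bytes, 8 bits each
        return sequenceSpaceBits / bitsPerSecond;
    }

    public static void main(String[] args) {
        double[] rates = {10e6, 100e6, 1e9, 10e9};
        for (double rate : rates) {
            System.out.printf("%.0f Mbps -> wraps in %.1f s%n",
                    rate / 1e6, wrapTimeSeconds(rate));
        }
    }
}
```

Comparing the result with the 120-second MSL shows at which rates an old segment could still be in the network when its sequence number comes round again.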
Delay-BW Product & Advertised Window Size
• Suppose RTT = 100 ms, R = 2.4 Gbps then
No. of bits in pipe = 240 Mbits = 30 Mbytes
• If a single TCP process occupies the pipe, then required
advertised window size is RTT x Bit rate = 30 Mbytes
• But, normal maximum window size is only 65535 bytes which
is clearly inadequate
• Solution: Use the “Window Scale Option” which will allow the
window to be scaled upward by a factor of up to $2^{14}$. Then a Window
Size up to $65535 \times 2^{14} \approx 1$ Gbyte will be allowed. This window
scaling option must be requested in the SYN segment and is
negotiated during the Connection Setup.
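The delay-bandwidth arithmetic and the resulting scale factor can be recomputed with a short helper. The class `WindowScaling` and its methods are hypothetical; the sketch assumes the scale factor is a left shift of 0 to 14 applied to the 16-bit window field.

```java
/** Delay-bandwidth product and the window scale shift needed to cover it. */
final class WindowScaling {
    /** Bytes in flight needed to keep a pipe of the given RTT and bit rate full. */
    static long bandwidthDelayBytes(double rttSeconds, double bitsPerSecond) {
        return Math.round(rttSeconds * bitsPerSecond / 8);
    }

    /** Smallest shift (0..14) for which 65535 << shift covers the window, or -1 if none does. */
    static int requiredShift(long windowBytes) {
        for (int shift = 0; shift <= 14; shift++) {
            if ((65535L << shift) >= windowBytes) {
                return shift;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long pipe = bandwidthDelayBytes(0.100, 2.4e9);
        System.out.println("pipe size: " + pipe + " bytes");
        System.out.println("window scale shift needed: " + requiredShift(pipe));
        System.out.println("largest scaled window: " + (65535L << 14) + " bytes");
    }
}
```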
Closing a TCP Connection (Graceful Close)
[Figure: FIN and ACK exchange between Host A and Host B]
• Host A initiates the TCP connection termination, sends its FIN
• Host B sends ACK but does not yet send its own FIN
• Host B still delivers 150 bytes
• Host B now sends its own FIN
• Host A ACKs B’s FIN, closing its side of the connection
• Host B gets A’s ACK and closes its side of the connection
After sending FIN, Host A cannot send any more data but
cannot close the connection as B may still be sending something
TIME_WAIT state
TIME_WAIT state is entered by the host that closes first (e.g. Host
A in previous slide) once it has received the other side’s FIN and
sent the final ACK
• This protects future incarnations of the connection from delayed
segments
• TIME_WAIT = 2 x MSL
Maximum Segment Lifetime (MSL) is the maximum time that an
IP packet can live in the network
• Only valid segment that can arrive while in TIME_WAIT
state is a FIN retransmission. If such a segment arrives,
resend ACK & restart TIME_WAIT timer
• When timer expires, close TCP connection
TCP State Transition Diagram
[Figure: state transition diagram with states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT and LAST_ACK. Transition labels legible in the diagram: “passive open, create TCB”; “receive SYN, send ACK”; “application close, send FIN”; “application close or timeout, delete TCB”; “2MSL timeout, delete TCB”]
Congestion Control in TCP
• Advertised window size is used to ensure that receiver’s buffer will not
overflow
• However, buffers at intermediate routers between source and
destination may still overflow (i.e. because of network congestion)
[Figure: packet flows from many sources converge on a router whose outgoing link has rate R bps]
• Congestion occurs when total arrival rate from all packet flows exceeds
R over a sustained period of time
• When congestion occurs, buffers at routers will fill and packets will be
lost
Different Phases of Congestion Behavior
1. Light traffic
– Arrival Rate << R
– Low delay
– Can accommodate more
2. Knee (congestion onset)
– Arrival rate approaches R
– Delay increases rapidly
– Throughput begins to
saturate
3. Congestion collapse
– Arrival rate > R
– Large delays, packet loss
– Useful application
throughput drops
[Figure: two plots against arrival rate, one of throughput (bps) and one of delay (sec), each with the rate R marked]
Window Congestion Control
• (From previous slide) Desired operating point will be just
before knee as shown there. Sources must control their sending
rates so that aggregate arrival rate is just before knee
• TCP sender maintains a congestion window “cwnd” to control
congestion at intermediate routers
• Effective window is minimum of congestion window and
advertised window
• Problem: The source does not know what its “fair” share of
available bandwidth should be and so does not know what value
to set for cwnd
• Solution: Adjust cwnd dynamically to available BW as follows
– Sources probe the network by gradually increasing cwnd
(Initially set cwnd to a low value)
– When congestion detected, sources reduce rate
– Ideally, sources’ sending rate will stabilize near ideal point
Congestion Window
How does the TCP congestion algorithm change congestion
window dynamically according to the most up-to-date state of
the network?
• At light traffic: each segment is ACKed quickly
– Increase cwnd aggressively
• At knee: segment ACKs arrive, but more slowly
– Slow down increase in cwnd
• At congestion: segments encounter large delays (so
retransmission timeouts occur); segments get dropped in
router buffers
– Reduce transmission rate, then probe again
TCP Congestion Control: Slow Start
Slow Start: Increase congestion window size by one segment upon
receiving an ACK from receiver
– initialized at ≤ 2 segments
– used at (re)start of data transfer
– congestion window increases exponentially
[Figure: segments (Seg) and ACKs exchanged over successive RTTs; cwnd grows 1, 2, 4, 8]
TCP Congestion Control: Congestion Avoidance
• Algorithm progressively sets
a congestion threshold
– When cwnd > threshold,
slow down rate at which
cwnd is increased
• Increase congestion window
size by one segment per
round-trip-time (RTT)
– Each time an ACK
arrives, cwnd is
increased by 1/cwnd
– In one RTT, cwnd
segments are sent, so
total increase in cwnd is
cwnd x 1/cwnd = 1
– cwnd grows linearly with
time
[Figure: cwnd versus RTTs; cwnd doubles through 1, 2, 4, 8 until it reaches the threshold, then grows linearly]
TCP Congestion Control: Congestion
• Congestion is detected upon
timeout or receipt of
duplicate ACKs
• Assume current cwnd
corresponds to available
bandwidth
• Adjust congestion threshold
= ½ x current cwnd
• Reset cwnd to 1
• Go back to slow-start
• Over several cycles expect
to converge to congestion
threshold equal to about ½
the available bandwidth
[Figure: congestion window (0 to 20 segments) versus round-trip times, showing slow start up to the threshold, congestion avoidance, and the reset after a time-out]
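The interplay of slow start, congestion avoidance and the reaction to a timeout can be traced with a per-RTT simulation. This is a coarse sketch: the class `CongestionWindowSim` is hypothetical, cwnd is counted in whole segments, timeouts are injected at chosen rounds rather than produced by a network model, and slow start is capped at the threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Per-RTT trace of cwnd under slow start, congestion avoidance and timeouts. */
final class CongestionWindowSim {
    /**
     * Returns cwnd (in segments) at the start of each round trip.
     * A timeout is assumed to hit at the end of every round listed in timeoutRounds.
     */
    static List<Integer> simulate(int rounds, int initialThreshold, Set<Integer> timeoutRounds) {
        List<Integer> trace = new ArrayList<>();
        int cwnd = 1;
        int threshold = initialThreshold;
        for (int round = 0; round < rounds; round++) {
            trace.add(cwnd);
            if (timeoutRounds.contains(round)) {
                threshold = Math.max(cwnd / 2, 1); // threshold = half the current cwnd
                cwnd = 1;                          // back to slow start
            } else if (cwnd < threshold) {
                cwnd = Math.min(cwnd * 2, threshold); // slow start: doubles every RTT
            } else {
                cwnd += 1;                            // congestion avoidance: +1 per RTT
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        Set<Integer> timeouts = new HashSet<>(Arrays.asList(6));
        System.out.println(simulate(12, 8, timeouts));
    }
}
```

Plotting the returned trace gives the familiar sawtooth: exponential growth, linear growth, then a drop to one segment.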
Fast Retransmit & Fast Recovery
• Congestion causes many segments to be
dropped
• If only a single segment is dropped, then
subsequent segments trigger duplicate ACKs
before timeout (as shown)
• Can avoid large decrease in cwnd as follows:
– When three duplicate ACKs arrive,
retransmit lost segment immediately
– Reset congestion threshold to ½ cwnd
– Reset cwnd to congestion threshold + 3
to account for the three segments that
triggered duplicate ACKs
– Remain in congestion avoidance phase
– However if timeout expires, reset cwnd
to 1
– In absence of timeouts, cwnd will
oscillate around optimal value
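[Figure: the sender transmits SN=1 to SN=5; the segment with SN=2 is lost, so after the first ACK=2 each later segment triggers another duplicate ACK=2]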
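The duplicate-ACK rule can be sketched as a counter on the sender side. The class `FastRetransmit` is hypothetical, cwnd and the threshold are counted in whole segments, and the sketch stops at the point where the lost segment is resent; what happens once new data is acknowledged is left out.

```java
/** Sender-side duplicate ACK counting for fast retransmit. */
final class FastRetransmit {
    private int lastAck = -1;
    private int duplicates = 0;
    int cwnd;
    int threshold;

    FastRetransmit(int cwnd, int threshold) {
        this.cwnd = cwnd;
        this.threshold = threshold;
    }

    /**
     * Processes one cumulative ACK. Returns the sequence number to retransmit
     * when the third duplicate arrives, otherwise -1.
     */
    int onAck(int ackNumber) {
        if (ackNumber == lastAck) {
            duplicates++;
            if (duplicates == 3) {
                threshold = cwnd / 2;   // congestion threshold = half of cwnd
                cwnd = threshold + 3;   // credit for the three segments that left the network
                return ackNumber;       // the segment the receiver is still waiting for
            }
        } else {
            lastAck = ackNumber;
            duplicates = 0;
        }
        return -1;
    }

    public static void main(String[] args) {
        FastRetransmit sender = new FastRetransmit(16, 32);
        int[] acks = {2, 2, 2, 2};
        for (int ack : acks) {
            int resend = sender.onAck(ack);
            System.out.println("ACK=" + ack + " retransmit=" + resend + " cwnd=" + sender.cwnd);
        }
    }
}
```

The first ACK=2 is an ordinary acknowledgment; only the three copies that follow it count as duplicates.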
TCP Congestion Control:
Fast Retransmit & Fast Recovery
[Figure: congestion window (0 to 20 segments) versus time (in units of RTT), showing slow start, congestion avoidance, the threshold and a time-out]
More Related Content

What's hot

Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing Terral R Jordan
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersSadique Puthen
 
Troubleshooting containerized triple o deployment
Troubleshooting containerized triple o deploymentTroubleshooting containerized triple o deployment
Troubleshooting containerized triple o deploymentSadique Puthen
 
Covert timing channels using HTTP cache headers
Covert timing channels using HTTP cache headersCovert timing channels using HTTP cache headers
Covert timing channels using HTTP cache headersyalegko
 
Ns3: Newreno vs Vegas vs Veno
Ns3: Newreno vs Vegas vs VenoNs3: Newreno vs Vegas vs Veno
Ns3: Newreno vs Vegas vs VenoTCHAYE Jude
 
Covert Timing Channels using HTTP Cache Headers
Covert Timing Channels using HTTP Cache HeadersCovert Timing Channels using HTTP Cache Headers
Covert Timing Channels using HTTP Cache HeadersDenis Kolegov
 
Technical Overview of QUIC
Technical  Overview of QUICTechnical  Overview of QUIC
Technical Overview of QUICshigeki_ohtsu
 
Netty 4-based RPC System Development
Netty 4-based RPC System DevelopmentNetty 4-based RPC System Development
Netty 4-based RPC System DevelopmentAllan Huang
 
Docker Networking with New Ipvlan and Macvlan Drivers
Docker Networking with New Ipvlan and Macvlan DriversDocker Networking with New Ipvlan and Macvlan Drivers
Docker Networking with New Ipvlan and Macvlan DriversBrent Salisbury
 
Altitude SF 2017: QUIC - A low-latency secure transport for HTTP
Altitude SF 2017: QUIC - A low-latency secure transport for HTTPAltitude SF 2017: QUIC - A low-latency secure transport for HTTP
Altitude SF 2017: QUIC - A low-latency secure transport for HTTPFastly
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking ExplainedThomas Graf
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful ServicesThomas Graf
 
Network and TCP performance relationship workshop
Network and TCP performance relationship workshopNetwork and TCP performance relationship workshop
Network and TCP performance relationship workshopKae Hsu
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performanceRicky Zhu
 
Analyzing network packets Using Wireshark
Analyzing network packets Using WiresharkAnalyzing network packets Using Wireshark
Analyzing network packets Using WiresharkSmrutiRanjanBiswal9
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.Naoto MATSUMOTO
 

What's hot (20)

Full Stack Load Testing
Full Stack Load Testing Full Stack Load Testing
Full Stack Load Testing
 
Anatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoortersAnatomy of neutron from the eagle eyes of troubelshoorters
Anatomy of neutron from the eagle eyes of troubelshoorters
 
Troubleshooting containerized triple o deployment
Troubleshooting containerized triple o deploymentTroubleshooting containerized triple o deployment
Troubleshooting containerized triple o deployment
 
Covert timing channels using HTTP cache headers
Covert timing channels using HTTP cache headersCovert timing channels using HTTP cache headers
Covert timing channels using HTTP cache headers
 
Ns3: Newreno vs Vegas vs Veno
Ns3: Newreno vs Vegas vs VenoNs3: Newreno vs Vegas vs Veno
Ns3: Newreno vs Vegas vs Veno
 
Covert Timing Channels using HTTP Cache Headers
Covert Timing Channels using HTTP Cache HeadersCovert Timing Channels using HTTP Cache Headers
Covert Timing Channels using HTTP Cache Headers
 
Technical Overview of QUIC
Technical  Overview of QUICTechnical  Overview of QUIC
Technical Overview of QUIC
 
Netty 4-based RPC System Development
Netty 4-based RPC System DevelopmentNetty 4-based RPC System Development
Netty 4-based RPC System Development
 
QUIC
QUICQUIC
QUIC
 
Docker Networking with New Ipvlan and Macvlan Drivers
Docker Networking with New Ipvlan and Macvlan DriversDocker Networking with New Ipvlan and Macvlan Drivers
Docker Networking with New Ipvlan and Macvlan Drivers
 
Altitude SF 2017: QUIC - A low-latency secure transport for HTTP
Altitude SF 2017: QUIC - A low-latency secure transport for HTTPAltitude SF 2017: QUIC - A low-latency secure transport for HTTP
Altitude SF 2017: QUIC - A low-latency secure transport for HTTP
 
Linux Networking Explained
Linux Networking ExplainedLinux Networking Explained
Linux Networking Explained
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services
 
Ad Server Optimization
Ad Server OptimizationAd Server Optimization
Ad Server Optimization
 
Quic illustrated
Quic illustratedQuic illustrated
Quic illustrated
 
Network and TCP performance relationship workshop
Network and TCP performance relationship workshopNetwork and TCP performance relationship workshop
Network and TCP performance relationship workshop
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performance
 
Analyzing network packets Using Wireshark
Analyzing network packets Using WiresharkAnalyzing network packets Using Wireshark
Analyzing network packets Using Wireshark
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 

Similar to Lecture set 7

CCNA (R & S) Module 01 - Introduction to Networks - Chapter 9
CCNA (R & S) Module 01 - Introduction to Networks - Chapter 9CCNA (R & S) Module 01 - Introduction to Networks - Chapter 9
CCNA (R & S) Module 01 - Introduction to Networks - Chapter 9Waqas Ahmed Nawaz
 
Networking essentials lect3
Networking essentials lect3Networking essentials lect3
Networking essentials lect3Roman Brovko
 
Transport layer services
Transport layer servicesTransport layer services
Transport layer servicesMelvin Cabatuan
 
Byte Ordering - Unit 2.pptx
Byte Ordering - Unit 2.pptxByte Ordering - Unit 2.pptx
Byte Ordering - Unit 2.pptxRockyBhai46825
 
Socket Programming
Socket ProgrammingSocket Programming
Socket ProgrammingCEC Landran
 
tcp-140613123317-phpapp01.pptx
tcp-140613123317-phpapp01.pptxtcp-140613123317-phpapp01.pptx
tcp-140613123317-phpapp01.pptxtouseeqzulfiqar1
 
Comptia Security + Chapter 1 501
Comptia Security           + Chapter 1 501Comptia Security           + Chapter 1 501
Comptia Security + Chapter 1 501AbdulalimBhnsawy
 
LECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.pptLECTURE-Transport-Layer_lec.ppt
LECTURE-Transport-Layer_lec.pptMonirHossain707319
 
Get into Networking by Clearing Comptia Network+ Test
Get into Networking by Clearing Comptia Network+ TestGet into Networking by Clearing Comptia Network+ Test
Get into Networking by Clearing Comptia Network+ Testcertblaster
 
Unit 4-Transport Layer Protocols-3.pptx
Unit 4-Transport Layer Protocols-3.pptxUnit 4-Transport Layer Protocols-3.pptx
Unit 4-Transport Layer Protocols-3.pptxDESTROYER39
 
Unit 4-Transport Layer Protocols.pptx
Unit 4-Transport Layer Protocols.pptxUnit 4-Transport Layer Protocols.pptx
Unit 4-Transport Layer Protocols.pptxsarosh32
 
Computer networks transport layer
Computer networks  transport layerComputer networks  transport layer
Computer networks transport layerjamunaashok
 
Datacom_Section_2_-_Protocols.ppt
Datacom_Section_2_-_Protocols.pptDatacom_Section_2_-_Protocols.ppt
Datacom_Section_2_-_Protocols.pptKristopher Hefner
 
tcp ip protocols.ppt
tcp ip protocols.ppttcp ip protocols.ppt
tcp ip protocols.pptssuser3acfba
 

Lecture set 7

  • 1. Communication Networks Sanjay K. Bose Lecture Set VII Transport Layer
  • 2. TCP/IP Protocol Suite • The TCP/IP protocol stack is a layered architecture. TCP and IP are the two most important protocols of this stack. • It was originally developed by DARPA (Defense Advanced Research Projects Agency) for an experimental packet-switched network. • It was later included in the Berkeley Software Distribution of UNIX. • It maps closely to the OSI layers and it supports all standard physical and data link protocols. • It also includes specifications for such common applications as e-mail, remote login, terminal emulation, and file transfer.
  • 3. TCP/IP Protocol Suite (Transport Layer) • Distributed applications: HTTP, SMTP, RTP, DNS • Transport: TCP (reliable stream service) and UDP (user datagram service) • IP (with ICMP, ARP): best-effort connectionless packet transfer • Network Interface 1, Network Interface 2, Network Interface 3: variety of network technologies
  • 4. Transport services and protocols • Provide logical communication between application processes running on end hosts • Transport protocols run only in end systems (not in routers/switches) – Sender breaks application messages into segments, passes to network layer – Receiver reassembles segments into messages, passes to application layer • Transport protocols available are TCP and UDP • Network Layer: logical communication between hosts • Transport Layer: logical communication between processes (relies on, and enhances, network layer services) [Figure: two end hosts, each with application, transport, network, data link and physical layers]
  • 5. TCP/IP Encapsulation (example application: HTTP) • TCP header contains source & destination port numbers for identifying the application • IP header contains source and destination IP addresses and the transport protocol type (TCP or UDP) • Ethernet header contains source & destination MAC addresses • Encapsulation steps: [HTTP Request] → [TCP header | HTTP Request] → [IP header | TCP header | HTTP Request] → [Ethernet header | IP header | TCP header | HTTP Request | FCS]
  • 6. Transport Layer Multiplexing/Demultiplexing • Multiplexing at Sender: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing) • Demultiplexing at Receiver: delivering received segments to correct socket [Figure: Host 1, Host 2 and Host 3, each with application, transport, network, link and physical layers; processes P1 to P4 attached to sockets]
  • 7. Demultiplexing at the Transport Layer • Host receives IP datagrams – Each datagram has source IP address, destination IP address – Each datagram carries 1 transport-layer segment – Each segment has source, destination port number • Host uses IP addresses & port numbers to direct segment to appropriate socket • TCP/UDP segment format (32 bits wide): source port #, destination port #, other header fields, application data (message) • Port ranges: 0-255 well-known ports; 256-1023 less well-known ports; 1024-65535 ephemeral client ports
  • 8. Connectionless Demultiplexing (UDP) • Create sockets with port numbers: DatagramSocket mySocket1 = new DatagramSocket(12534); DatagramSocket mySocket2 = new DatagramSocket(12535); • UDP socket identified by two-tuple (dest IP address, dest port number) • When host receives UDP segment: – checks destination port number in segment – directs UDP segment to socket with that port number • IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket if the destination port number is the same in the destination host • Note that TCP does this differently, using a 4-tuple (S-IP, SP, D-IP, DP) to identify a socket!
  • 9. Connectionless Demultiplexing (UDP) Example: Server creates socket at 6428 to provide UDP service to some application: DatagramSocket serverSocket = new DatagramSocket(6428); [Figure: clients A and B and server C; one client sends SP: 9157, DP: 6428 and is answered with SP: 6428, DP: 9157; the other sends SP: 5775, DP: 6428 and is answered with SP: 6428, DP: 5775] • Same socket (6428) at server for both clients in this example • DP specifies the process to which data should be delivered at the Receiver • SP specifies the process from which data is coming, for the specified source IP address; acts like a return address for replies/responses if required to be sent back
  • 10. Connection-oriented Demultiplexing (TCP) • TCP socket identified by 4-tuple: – source IP address – source port number – destination IP address – destination port number • Receiver host uses all four values to direct segment to appropriate socket; socket is uniquely identified by 4-tuple (S-IP, SP, D-IP, DP) • Server host may support many simultaneous TCP sockets: – each socket identified by its own 4-tuple • Web servers have different sockets for each connecting client – non-persistent HTTP will have different socket for each request
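The difference between the UDP two-tuple and the TCP 4-tuple can be sketched with plain lookup keys. This is a minimal illustration only, with no real networking: the class `DemuxSketch` and its methods are hypothetical names, and the host labels and ports are taken from the slide examples.

```java
// Sketch only: models which header fields select the receiving socket.
class DemuxSketch {
    // UDP: only (destination IP, destination port) selects the socket;
    // the source fields are deliberately ignored.
    static String udpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return dstIp + ":" + dstPort;
    }

    // TCP: the full 4-tuple (S-IP, SP, D-IP, DP) selects the socket.
    static String tcpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    public static void main(String[] args) {
        // Two different clients reach the same UDP socket on server C.
        System.out.println(udpKey("A", 9157, "C", 6428)
                .equals(udpKey("B", 5775, "C", 6428)));
        // The same source port from two clients maps to two TCP sockets.
        System.out.println(tcpKey("A", 9157, "C", 80)
                .equals(tcpKey("B", 9157, "C", 80)));
    }
}
```

In a real stack the operating system performs this lookup; the string keys here only stand in for its socket table.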
  • 11. Connection-oriented Demultiplexing (TCP) [Figure: clients A and B send segments to port 80 of server C (D-IP: C, DP: 80): one from S-IP: A with SP: 9157, and two from S-IP: B with SP: 9157 and SP: 5775; each is delivered to a different server process] • This is a Web Server example as the segments are being sent to Port 80 of the server which corresponds to the HTTP Service • Note that in this case, the server is creating a separate process for each of the sockets. This would be inefficient (see next slide for a more efficient example with “threading”)
  • 12. Connection-oriented Demultiplexing (TCP): Threaded Web Server [Figure: clients A and B send segments to port 80 of server C (D-IP: C, DP: 80): one from S-IP: A with SP: 9157, and two from S-IP: B with SP: 9157 and SP: 5775; all are handled by a single server process] • This is also a Web Server example as the segments are being sent to Port 80 of the server which corresponds to the HTTP Service • Note that in this case, the server is creating one process for all the sockets. A new thread (kind of like a sub-process) is created for each socket
  • 13. UDP: User Datagram Protocol • “no frills,” “bare bones” Internet transport protocol • “best effort” service, UDP segments may be: – lost – delivered out of order to application • Connectionless: – no handshaking between UDP sender, receiver – each UDP segment handled independently of others Why have UDP? • No connection establishment (which can add delay) • Simple: no connection state information kept at sender or receiver • Small Segment Header • No Congestion Control: UDP can transmit as fast as it can
  • 14. UDP: User Datagram Protocol • Commonly used for streaming multimedia applications which tend to be loss tolerant but rate sensitive • UDP also used for DNS and SNMP • For reliable transfer over UDP one must add reliability at the level of the application layer, e.g. application-specific error recovery! • UDP segment format (32 bits wide): source port #, destination port #, length, checksum, application data (message) • Length: length in bytes of the UDP segment, including header • UDP Checksum: standard Internet checksum added by the sender. Used by the receiver to check for bit errors. (See next slide)
  • 15. UDP Checksum Calculation • UDP checksum covers pseudoheader followed by UDP datagram • IP addresses included to guard against misdelivery • Receiver recalculates the checksum and silently discards the datagram if errors are detected (i.e. no error message generated) • Using UDP checksums is optional but hosts are required to have checksums enabled • UDP pseudoheader (32 bits wide, bit positions 0 to 31): source IP address; destination IP address; 8 zero bits, Protocol = 17, UDP length. It is used in the checksum calculation but never actually transmitted, nor is it included in the “Length” • Note that the IP address information comes from another layer (Network Layer). Strictly speaking, this goes against the philosophy of keeping the layers separate from each other.
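The standard Internet checksum mentioned above is a 16-bit one's-complement sum. The following is a minimal sketch of that calculation over an arbitrary byte array; the class name `InternetChecksum` is hypothetical, and building the pseudoheader and UDP header bytes that would be fed into it is left out as an assumption of this example.

```java
// Sketch of the 16-bit Internet checksum (one's-complement sum of 16-bit words).
class InternetChecksum {
    /** One's-complement sum of big-endian 16-bit words, folded to 16 bits. */
    static int onesComplementSum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            // An odd trailing byte is padded with a zero byte.
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0;
            sum += (hi << 8) | lo;
        }
        // Fold any carries out of bit 15 back into the low 16 bits.
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16);
        }
        return (int) sum;
    }

    /** The checksum field is the one's complement of the folded sum. */
    static int checksum(byte[] data) {
        return ~onesComplementSum(data) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x01, (byte) 0xf2, 0x03,
                       (byte) 0xf4, (byte) 0xf5, (byte) 0xf6, (byte) 0xf7};
        System.out.println(Integer.toHexString(checksum(data))); // prints 220d
    }
}
```

A receiver can verify by summing the data together with the received checksum: an error-free result folds to 0xFFFF, so its complement is zero.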
  • 16. UDP Destination Port Usage Port 1 Port 2 Port 3 UDP Demultiplexing (based on destination port #) IP Layer Arrival of UDP Datagram Datagram demultiplexed to its appropriate port Error Message sent back if the Dest. Port # indicated in the datagram does not exist!
  • 17. UDP Port Numbers • Well-Known Port Numbers: – Universally assigned and accepted port #s providing some designated service – Typically, lower port numbers are used for this – Examples: 37 Time, 53 Domain Name Server, 67 DHCP Server, 68 DHCP Client • Dynamically Assigned Port Numbers: – Ports are not globally known – When a program needs a port, it asks for and gets one from the network software – Destination machine needs to be queried to find the port number at which it may be offering the service to be accessed – Typically higher port numbers are used for this
  • 18. TCP: Transmission Control Protocol • full duplex data: – bi-directional data flow in same connection – MSS: maximum segment size • connection-oriented: – handshaking (exchange of control msgs) initializes sender, receiver state before data exchange • flow controlled: – sender will not overwhelm receiver • point-to-point: – one sender, one receiver • reliable, in-order byte stream: – no “message boundaries” inside! • pipelined: – TCP congestion and flow control set window size • send & receive buffers [Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door] • Important to remember though that a TCP stream is unstructured, i.e. no boundary marks in the stream itself, so the application would have to create such boundary marks if needed (e.g. separating different fields)
  • 19. TCP Segment Format • Each TCP segment has a header of 20 or more bytes + 0 or more bytes of data • Header fields (32-bit rows, bit positions 0 to 31): Source Port, Destination Port; Sequence Number; Acknowledgment Number; Header Length, Reserved, flags (URG, ACK, PSH, RST, SYN, FIN), Window Size; Checksum, Urgent Pointer; Options, Padding • The header is followed by the Data
  • 20. TCP Header • Port Numbers – A socket identifies a connection endpoint: IP address + port – A connection is specified by a socket pair – Well-known ports: FTP (data) 20, Telnet 23, DNS 53, HTTP 80 • Sequence Number – Byte count – First byte in segment – 32 bits long – 0 ≤ SN ≤ 2^32 - 1 – Initial sequence number selected during connection setup
  • 21. TCP Header Acknowledgement Number • SN of next byte expected by receiver • Acknowledges that all prior bytes in stream have been received correctly • Valid if ACK flag is set Header length • 4 bits • Length of header in multiples of 32-bit words • Minimum header length is 20 bytes • Maximum header length is 60 bytes
  • 22. TCP Header Reserved • 6 bits Control • 6 bits • URG: urgent pointer flag – Urgent message end = SN + urgent pointer • ACK: ACK packet flag • PSH: override TCP buffering • RST: reset connection – Upon receipt of RST, connection is terminated and application layer notified • SYN: establish connection • FIN: close connection
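The header fields described on the last few slides can be read directly from the first 20 bytes of a segment. This is a minimal sketch under the layout given in the deck (4-bit header length, 6 reserved bits, 6 control bits); `TcpHeaderSketch` and its method names are hypothetical, and the sample bytes in `main` are made up for illustration.

```java
// Sketch: decode fixed TCP header fields from a raw byte array.
class TcpHeaderSketch {
    static final int URG = 0x20, ACK = 0x10, PSH = 0x08, RST = 0x04, SYN = 0x02, FIN = 0x01;

    static int u16(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    static long u32(byte[] b, int off) {
        return ((long) u16(b, off) << 16) | u16(b, off + 2);
    }

    static int sourcePort(byte[] h)      { return u16(h, 0); }
    static int destinationPort(byte[] h) { return u16(h, 2); }
    static long sequenceNumber(byte[] h) { return u32(h, 4); }
    static long ackNumber(byte[] h)      { return u32(h, 8); }
    static int windowSize(byte[] h)      { return u16(h, 14); }

    /** Header length field counts 32-bit words, so multiply by 4 for bytes. */
    static int headerLengthBytes(byte[] h) {
        return ((h[12] & 0xF0) >> 4) * 4;
    }

    /** Control bits sit in the low 6 bits of byte 13. */
    static boolean hasFlag(byte[] h, int mask) {
        return (h[13] & mask) != 0;
    }

    public static void main(String[] args) {
        // Made-up SYN segment header: port 9157 -> 80, SN = 1000, window 65535.
        byte[] h = {0x23, (byte) 0xC5, 0x00, 0x50, 0x00, 0x00, 0x03, (byte) 0xE8,
                    0x00, 0x00, 0x00, 0x00, 0x50, 0x02, (byte) 0xFF, (byte) 0xFF,
                    0x00, 0x00, 0x00, 0x00};
        System.out.println(sourcePort(h) + " -> " + destinationPort(h)
                + ", header " + headerLengthBytes(h) + " bytes, SYN=" + hasFlag(h, SYN));
    }
}
```

Because the header length field is 4 bits wide, its largest value of 15 words gives the 60-byte maximum stated above, and the minimum of 5 words gives 20 bytes.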
  • 23. TCP Header • Window Size – 16 bits to advertise window size – Used for flow control – The host advertising the window (i.e. the sender of this segment) will accept bytes with SN from ACK to ACK + window – Maximum window size is 65535 bytes • TCP Checksum – Internet checksum method – Computed over TCP pseudo header + TCP segment (header + data) (See next slide for TCP pseudo header)
  • 24. TCP Pseudo Header (for checksum calculation) • Layout (32 bits wide, bit positions 0 to 31): source IP address; destination IP address; 8 zero bits, Protocol = 6, TCP segment length • Used in checksum calculation but never actually transmitted, nor is it included in the “Length” • Usage similar to that of the UDP pseudoheader
  • 25. TCP Header Options • Variable length • NOP (No Operation) option is used to pad TCP header to multiple of 32 bits • Time stamp option is used for round trip measurements • Maximum Segment Size (MSS) option specifies the largest segment a receiver wants to receive • Window Scale option extends the TCP window beyond the 16-bit Window Size field (scaling by a factor of up to 2^14)
  • 26. TCP Services • Provides a full duplex, connection-oriented and reliable byte-stream service using sliding-window flow control. • User data are broken into segments not exceeding 64 kbytes (usually about 1500 bytes) and sent to the destination by encapsulating them in IP datagrams – IP provides unreliable packet delivery – packets can get lost, duplicated or delivered out of sequence • Receiver sends an acknowledgment back after receiving a segment. • Segments are retransmitted if necessary
  • 27. TCP Services [Figure: sending and receiving hosts, each with Application and Transport layers and a buffer; segments flow from sender to receiver and ACKs return; the receiver has buffer available = B (part of the buffer is used) and advertises window size < B; ACKs are also used for RTT estimation]
  • 28. TCP Round Trip Time and Timeout How to set the TCP timeout value? • It must be set longer than the RTT, but the RTT also varies • If it is set too short, premature timeouts may occur, leading to unnecessary retransmissions • If it is set too long, the response to segment loss will be too slow.
  • 29. TCP Round Trip Time and Timeout How is the RTT estimated? • SampleRTT = measured time from segment transmission until the ACK for it is received, ignoring retransmissions • SampleRTT will fluctuate but we want the estimated RTT to be “smoother”. This is done by taking a moving average EstimatedRTT over recent measurements – should not just use the current SampleRTT • EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT – Exponential weighted moving average – influence of past samples decreases exponentially fast – typical value: α = 0.125
  • 30. Example Measurements (SampleRTT and EstimatedRTT) [Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; RTT (milliseconds) versus time (seconds); curves: SampleRTT and EstimatedRTT]
  • 31. TCP Round Trip Time and Timeout Setting the timeout • EstimatedRTT plus “safety margin” – large variation in EstimatedRTT -> larger safety margin • First estimate how much SampleRTT deviates from EstimatedRTT: DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT| (typically, β = 0.25) • Then set the timeout interval as: TimeoutInterval = EstimatedRTT + 4*DevRTT
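The RTT and timeout formulas from the last slides can be combined into a small estimator. This is a sketch under stated assumptions rather than a definitive implementation: the class name `RttEstimatorSketch` is hypothetical, the first sample initializes EstimatedRTT to the sample and DevRTT to half of it, and DevRTT is updated using the previous EstimatedRTT (both conventions follow RFC 6298 and are not stated in the slides).

```java
// Sketch: exponentially weighted RTT estimate and timeout interval.
class RttEstimatorSketch {
    static final double ALPHA = 0.125; // weight of the newest sample in EstimatedRTT
    static final double BETA = 0.25;   // weight of the newest deviation in DevRTT

    double estimatedRtt;
    double devRtt;
    private boolean hasSample = false;

    void addSample(double sampleRtt) {
        if (!hasSample) {
            // Assumed initialization (RFC 6298 convention).
            estimatedRtt = sampleRtt;
            devRtt = sampleRtt / 2;
            hasSample = true;
            return;
        }
        // Deviation first, using the EstimatedRTT from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimatorSketch est = new RttEstimatorSketch();
        for (double sample : new double[] {100, 200, 110, 105}) { // milliseconds
            est.addSample(sample);
            System.out.println(est.estimatedRtt + " ms, timeout " + est.timeoutInterval() + " ms");
        }
    }
}
```

A single outlier sample (200 ms here) moves EstimatedRTT only slightly but widens the safety margin through DevRTT, which is the behaviour the slides describe.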
  • 32. TCP Connection Establishment • Three-Way Handshake – A sends a SYN segment specifying the port number of the other party B, the initial sequence number (ISN) that A will use and other info (e.g. max. segment size) – B responds with its own SYN segment containing its ISN. B also acknowledges A’s SYN by ACKing A’s ISN plus one – A acknowledges B’s SYN by ACKing B’s ISN plus one • Initial Sequence Number (ISN) may be randomly chosen but with some important considerations
  • 33. Initial Sequence Number (ISN) • Select initial sequence numbers (ISN) to protect against segments from prior connections (that may circulate in the network and arrive at a much later time) • Select ISN to avoid overlap with sequence numbers of prior connections • Use local clock to select ISN sequence number • Time for clock to go through a full cycle should be greater than the maximum lifetime of a segment (MSL); Typically MSL=120 seconds • High bandwidth connections pose a problem
  • 34. Three Way Handshake (TCP Connection Setup) Host A Host B Protects the ISN against responding falsely to old segments from prior connections
  • 35. Maximum Segment Size • Maximum Segment Size (MSS) - largest block of data that TCP sends to the other end • Each end can announce its MSS during connection establishment • Default IP datagram size is 576 bytes, including 20 bytes for the IP header and 20 bytes for the TCP header, which leaves a default MSS of 536 bytes • Slight difference between the MSS of Ethernet and IEEE 802.3: Ethernet MSS = 1460 bytes; IEEE 802.3 MSS = 1452 bytes
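The MSS values quoted above follow from subtracting the 20-byte IP header and 20-byte TCP header from the link's maximum IP datagram size. A minimal sketch of that arithmetic follows; the class `MssSketch` is a hypothetical name, and the MTU figures of 1500 bytes (Ethernet) and 1492 bytes (IEEE 802.3 with LLC/SNAP encapsulation) are assumptions not stated in the slides.

```java
// Sketch: MSS derived from the MTU, and segments needed for a message.
class MssSketch {
    static final int IP_HEADER_BYTES = 20;  // without IP options
    static final int TCP_HEADER_BYTES = 20; // without TCP options

    static int mssForMtu(int mtuBytes) {
        return mtuBytes - IP_HEADER_BYTES - TCP_HEADER_BYTES;
    }

    /** Number of segments needed to carry a message (ceiling division). */
    static int segmentsNeeded(int messageBytes, int mss) {
        return (messageBytes + mss - 1) / mss;
    }

    public static void main(String[] args) {
        System.out.println(mssForMtu(1500)); // Ethernet (assumed MTU 1500): 1460
        System.out.println(mssForMtu(1492)); // IEEE 802.3 (assumed MTU 1492): 1452
        System.out.println(mssForMtu(576));  // default datagram size: 536
        System.out.println(segmentsNeeded(3000, mssForMtu(1500))); // 3
    }
}
```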
  • 36. TCP Window Flow Control [Figure: timeline between Host A and Host B at times t0 to t4; Win = advertised window size; the sender has blocks of 1024 bytes and a final 128 bytes to transmit; at one point only 512 bytes are sent, as that is the advertised value of Win]
  • 37. Nagle Algorithm • Situation: User types one character at a time – Transmitter sends TCP segment per character (41B) – Receiver sends ACK (40B) – Receiver echoes received character (41B) – Transmitter ACKs echo (40 B) – 162 bytes transmitted to transfer one character! Problem! • Solution: – TCP sends data & waits for ACK – New characters buffered – Send new characters when ACK arrives – Algorithm adjusts to RTT as follows - • Short RTT send frequently at low efficiency • Long RTT send less frequently at greater efficiency
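The buffering rule in the solution above can be sketched as a tiny sender model. This is a simplification under stated assumptions: the class `NagleSenderSketch` is hypothetical, segments are represented as strings, and the real algorithm's additional rule of sending immediately once a full MSS of data has accumulated is omitted.

```java
// Sketch: send at once if nothing is unacknowledged, otherwise buffer until the ACK.
class NagleSenderSketch {
    private final StringBuilder buffer = new StringBuilder();
    private boolean unackedInFlight = false;
    /** Payloads of the segments sent so far, in order. */
    final java.util.List<String> sent = new java.util.ArrayList<>();

    /** Application writes one character. */
    void write(char c) {
        buffer.append(c);
        if (!unackedInFlight) {
            flush();
        }
    }

    /** ACK for the outstanding segment arrives. */
    void onAck() {
        unackedInFlight = false;
        if (buffer.length() > 0) {
            flush(); // everything typed during the RTT goes out in one segment
        }
    }

    private void flush() {
        sent.add(buffer.toString());
        buffer.setLength(0);
        unackedInFlight = true;
    }

    public static void main(String[] args) {
        NagleSenderSketch s = new NagleSenderSketch();
        s.write('a'); // sent immediately
        s.write('b'); // buffered
        s.write('c'); // buffered
        s.onAck();    // "bc" sent as one segment
        System.out.println(s.sent);
    }
}
```

With a long RTT more characters accumulate between ACKs, so fewer and larger segments are sent, which is the adjustment to RTT noted on the slide.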
  • 38. Silly Window Syndrome • Situation: – Transmitter sends large amount of data – Receiver’s buffer is depleted slowly, so buffer fills up – Every time a few bytes are read from the buffer, a new advertisement to the transmitter is generated – Sender immediately sends data & fills buffer – This leads to many small, inefficient segments being transmitted • Solution: – Receiver does not advertise a window until the window is at least ½ of the receiver buffer or is equal to the maximum segment size (MSS) – Transmitter refrains from sending small segments
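The receiver-side rule in the solution can be written as a one-line decision. This is a sketch under one reading of the slide, namely that a window is advertised once the free space reaches the smaller of half the buffer and one MSS; the class and method names are hypothetical.

```java
// Sketch: receiver-side silly window avoidance.
class WindowAdvertiserSketch {
    /**
     * Returns the window to advertise: the free space if it is large enough,
     * otherwise 0 so the sender is not invited to send a tiny segment.
     */
    static int advertisedWindow(int freeBytes, int bufferSize, int mss) {
        int threshold = Math.min(bufferSize / 2, mss);
        return freeBytes >= threshold ? freeBytes : 0;
    }

    public static void main(String[] args) {
        System.out.println(advertisedWindow(10, 8192, 1460));   // too small: 0
        System.out.println(advertisedWindow(1460, 8192, 1460)); // one MSS free: 1460
    }
}
```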
  • 39. Sequence Number Wraparound (potential problem at high data rates) • 2^32 = 4.29x10^9 bytes = 34.3x10^9 bits (TCP has a 32-bit seq. no.). Therefore, at 1 Gbps, sequence numbers will wrap around in just 34.3 seconds, so the transmitter can only transmit for very brief periods before sequence numbers repeat • Solution: Use the Timestamp Option in the TCP option field. Transmitter inserts a 32-bit timestamp in each transmitted segment. Receiver echoes this in the ACK. This option must be requested in the SYN segment and is negotiated during Connection Setup. – Timestamp + sequence no. → 64-bit seq. no. (effectively a much larger sequence number than the original 32-bit) – Timestamp clock must: • Tick forward at least once every 2^31 bits • Not complete a cycle in less than one MSL • Example: clock tick every 1 ms @ 8 Tbps wraps around in 25 days
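The wraparound time quoted above is the size of the 32-bit sequence space in bits divided by the link rate. A minimal sketch of that arithmetic, with a hypothetical class name and example rates chosen for illustration:

```java
// Sketch: time for the 32-bit TCP sequence space to wrap at a given bit rate.
class SeqWrapSketch {
    static double wrapSeconds(double bitsPerSecond) {
        double sequenceSpaceBits = Math.pow(2, 32) * 8; // 2^32 bytes, 8 bits each
        return sequenceSpaceBits / bitsPerSecond;
    }

    public static void main(String[] args) {
        System.out.println(wrapSeconds(1e9));  // about 34.4 seconds at 1 Gbps
        System.out.println(wrapSeconds(10e6)); // about 3436 seconds at 10 Mbps
    }
}
```

At 10 Mbps the wrap takes close to an hour, far longer than a typical MSL, which is why the problem only appears at high data rates.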
  • 40. Delay-BW Product & Advertised Window Size • Suppose RTT = 100 ms and R = 2.4 Gbps; then – No. of bits in pipe = RTT x R = 240 Mbits = 30 Mbytes • If a single TCP process occupies the pipe, then the required advertised window size is RTT x bit rate = 30 Mbytes • But the normal maximum window size is only 65535 bytes, which is clearly inadequate • Solution: Use the “Window Scale Option”, which allows the window to be scaled upward by a factor of up to 2^14. Then a window size of up to 65535 x 2^14 ≈ 1 Gbyte is allowed. This window scaling option must be requested in the SYN segment and is negotiated during Connection Setup.
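The delay-bandwidth product and the window scale needed to cover it can be checked with integer arithmetic. This is a sketch with hypothetical names; note that RTT x bit rate for 100 ms and 2.4 Gbps works out to 240 Mbits, i.e. 30 Mbytes, and that the scale factor is a power of two with an exponent of at most 14.

```java
// Sketch: bandwidth-delay product and the window scale shift needed to cover it.
class WindowScaleSketch {
    static final long MAX_UNSCALED_WINDOW = 65535L;
    static final int MAX_SHIFT = 14;

    /** Bytes in flight needed to fill the pipe: RTT x bit rate / 8. */
    static long bdpBytes(long rttMillis, long bitsPerSecond) {
        return bitsPerSecond / 1000 * rttMillis / 8;
    }

    /** Smallest shift s (0..14) such that 65535 * 2^s covers windowBytes. */
    static int requiredShift(long windowBytes) {
        int shift = 0;
        while ((MAX_UNSCALED_WINDOW << shift) < windowBytes && shift < MAX_SHIFT) {
            shift++;
        }
        return shift;
    }

    public static void main(String[] args) {
        long bdp = bdpBytes(100, 2_400_000_000L);
        System.out.println(bdp);                              // 30000000
        System.out.println(requiredShift(bdp));               // 9
        System.out.println(MAX_UNSCALED_WINDOW << MAX_SHIFT); // 1073725440
    }
}
```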
  • 41. Closing a TCP Connection (Graceful Close) [Figure: message exchange between Host A and Host B] • Host A initiates the TCP connection termination and sends its FIN • Host B sends an ACK but does not yet send its own FIN • After sending its FIN, Host A cannot send any more data but cannot close the connection, as B may still be sending something • Host B still delivers 150 bytes • Host B now sends its own FIN • Host A ACKs B’s FIN, closing its side of the connection • Host B gets A’s ACK and closes its side of the connection
  • 42. TIME_WAIT state • The TIME_WAIT state is entered by the host that initiated the close (e.g. Host A in the previous slide) once it has received the other side’s FIN and sent the final ACK • This protects future incarnations of the connection from delayed segments • TIME_WAIT = 2 x MSL, where Maximum Segment Lifetime (MSL) is the maximum time that an IP packet can live in the network • The only valid segment that can arrive while in the TIME_WAIT state is a FIN retransmission. If such a segment arrives, resend the ACK & restart the TIME_WAIT timer • When the timer expires, close the TCP connection
  • 43. TCP State Transition Diagram [Figure: states CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK; transition labels shown include: passive open, create TCB; receive SYN, send ACK; application close, send FIN; application close or timeout, delete TCB; 2MSL timeout, delete TCB; application close]
  • 44. Congestion Control in TCP • Advertised window size is used to ensure that receiver’s buffer will not overflow • However, buffers at intermediate routers between source and destination may still overflow (i.e. because of network congestion) [Figure: router with output rate R bps; packet flows may come in from many sources] • Congestion occurs when total arrival rate from all packet flows exceeds R over a sustained period of time • When congestion occurs, buffers at routers will fill and packets will be lost
  • 45. Different Phases of Congestion Behavior 1. Light traffic – Arrival Rate << R – Low delay – Can accommodate more 2. Knee (congestion onset) – Arrival rate approaches R – Delay increases rapidly – Throughput begins to saturate 3. Congestion collapse – Arrival rate > R – Large delays, packet loss – Useful application throughput drops [Figure: throughput (bps) versus arrival rate, and delay (sec) versus arrival rate, with R marked on each arrival-rate axis]
  • 46. Window Congestion Control • (From previous slide) Desired operating point will be just before knee as shown there. Sources must control their sending rates so that aggregate arrival rate is just before knee • TCP sender maintains a congestion window “cwnd” to control congestion at intermediate routers • Effective window is minimum of congestion window and advertised window • Problem: The source does not know what its “fair” share of available bandwidth should be and so does not know what value to set for cwnd • Solution: Adjust cwnd dynamically to available BW as follows – Sources probe the network by gradually increasing cwnd (Initially set cwnd to a low value) – When congestion detected, sources reduce rate – Ideally, sources’ sending rate will stabilize near ideal point
  • 47. Congestion Window How does the TCP congestion algorithm change congestion window dynamically according to the most up-to-date state of the network? • At light traffic: each segment is ACKed quickly – Increase cwnd aggressively • At knee: segment ACKs arrive, but more slowly – Slow down increase in cwnd • At congestion: segments encounter large delays (so retransmission timeouts occur); segments get dropped in router buffers – Reduce transmission rate, then probe again
  • 48. TCP Congestion Control: Slow Start • Slow Start: increase congestion window size by one segment upon receiving an ACK from receiver – initialized at ≤ 2 segments – used at (re)start of data transfer – congestion window increases exponentially [Figure: segments and ACKs exchanged over successive RTTs; cwnd grows 1, 2, 4, 8]
  • 49. TCP Congestion Control: Congestion Avoidance • Algorithm progressively sets a congestion threshold – When cwnd > threshold, slow down rate at which cwnd is increased • Increase congestion window size by one segment per round-trip-time (RTT) – Each time an ACK arrives, cwnd is increased by 1/cwnd – In one RTT, cwnd segments are sent, so total increase in cwnd is cwnd x 1/cwnd = 1 – cwnd grows linearly with time [Figure: cwnd versus RTTs; cwnd grows 1, 2, 4, 8 up to the threshold, then linearly]
  • 50. TCP Congestion Control: Congestion • Congestion is detected upon timeout or receipt of duplicate ACKs • Assume current cwnd corresponds to available bandwidth • Adjust congestion threshold = ½ x current cwnd • Reset cwnd to 1 • Go back to slow-start • Over several cycles expect to converge to congestion threshold equal to about ½ the available bandwidth [Figure: congestion window (0 to 20 segments) versus round-trip times, showing slow start, congestion avoidance, a time-out, and the threshold]
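The slow start, congestion avoidance and timeout rules from the last three slides can be traced with a per-RTT model. This is a coarse sketch under stated assumptions: the class `CwndSketch` is hypothetical, cwnd is counted in whole segments and updated once per RTT, and slow start is capped at the threshold so the switch to linear growth happens exactly there.

```java
// Sketch: per-RTT evolution of the congestion window (in segments).
class CwndSketch {
    int cwnd = 1;
    int threshold;

    CwndSketch(int initialThreshold) {
        this.threshold = initialThreshold;
    }

    /** One RTT in which every segment was ACKed. */
    void onRttWithoutLoss() {
        if (cwnd < threshold) {
            cwnd = Math.min(cwnd * 2, threshold); // slow start: doubles each RTT
        } else {
            cwnd += 1;                            // congestion avoidance: +1 per RTT
        }
    }

    /** Retransmission timeout: halve the threshold and restart slow start. */
    void onTimeout() {
        threshold = Math.max(cwnd / 2, 1);
        cwnd = 1;
    }

    public static void main(String[] args) {
        CwndSketch tcp = new CwndSketch(16);
        for (int rtt = 0; rtt < 8; rtt++) {
            tcp.onRttWithoutLoss();
            System.out.print(tcp.cwnd + " "); // 2 4 8 16 17 18 19 20
        }
        tcp.onTimeout();
        System.out.println("| threshold=" + tcp.threshold + " cwnd=" + tcp.cwnd);
    }
}
```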
  • 51. Fast Retransmit & Fast Recovery • Congestion causes many segments to be dropped • If only a single segment is dropped, then subsequent segments trigger duplicate ACKs before timeout (as shown) • Can avoid large decrease in cwnd as follows: – When three duplicate ACKs arrive, retransmit lost segment immediately – Reset congestion threshold to ½ cwnd – Reset cwnd to congestion threshold + 3 to account for the three segments that triggered duplicate ACKs – Remain in congestion avoidance phase – However if timeout expires, reset cwnd to 1 – In absence of timeouts, cwnd will oscillate around optimal value [Figure: sender transmits SN=1 to SN=5; the receiver returns ACK=2 four times because the segment with SN=2 is lost]
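The duplicate-ACK rule above can be sketched as a small counter. This is a simplified model with hypothetical names: it tracks only the last ACK number and the duplicate count, applies the threshold and cwnd adjustments from the slide on the third duplicate, and leaves out the later steps of fast recovery.

```java
// Sketch: detect three duplicate ACKs and adjust threshold and cwnd.
class FastRetransmitSketch {
    int cwnd;
    int threshold;
    private long lastAck = -1;
    private int duplicateCount = 0;

    FastRetransmitSketch(int cwnd, int threshold) {
        this.cwnd = cwnd;
        this.threshold = threshold;
    }

    /** Returns true when this ACK is the third duplicate, i.e. retransmit now. */
    boolean onAck(long ackNumber) {
        if (ackNumber == lastAck) {
            duplicateCount++;
            if (duplicateCount == 3) {
                threshold = Math.max(cwnd / 2, 1);
                cwnd = threshold + 3; // three segments have left the network
                return true;
            }
            return false;
        }
        lastAck = ackNumber; // new ACK: reset the duplicate counter
        duplicateCount = 0;
        return false;
    }

    public static void main(String[] args) {
        FastRetransmitSketch tcp = new FastRetransmitSketch(20, 64);
        for (long ack : new long[] {2, 2, 2, 2}) {
            System.out.println("ACK=" + ack + " retransmit=" + tcp.onAck(ack));
        }
        System.out.println("threshold=" + tcp.threshold + " cwnd=" + tcp.cwnd);
    }
}
```

In the slide's figure the first ACK=2 is the normal acknowledgment of SN=1; the following three are the duplicates that trigger the retransmission.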
  • 52. TCP Congestion Control: Fast Retransmit & Fast Recovery [Figure: congestion window (0 to 20 segments) versus time (in units of RTT), showing slow start, congestion avoidance, a time-out, and the threshold]