Set of rules governing communication between network elements (applications, hosts, routers).
Format and order of messages.
Actions taken on receipt of a message.
Protocols are hard to design
We need design guidelines!
[Figure: Protocol stack. User A and User B teleconferencing as peers; each host runs Application, Transport, Network, and Link layers]
Layering: technique to simplify complex systems.
Each layer relies on services from layer below and exports services to layer above.
Interface defines interaction.
Hides implementation: layers can change without disturbing other layers (black box).
OSI Model: 7 Protocol Layers
Physical: how to transmit bits
Data link: how to transmit frames
Network: how to route packets hop by hop
Transport: how to send packets end to end
Session: how to tie flows together
Presentation: byte ordering, security
Application: everything else!
Connection setup (handshaking)
Addressing/naming (locating peers)
Example: Transport layer
First end-to-end layer.
May provide reliability, flow and congestion control.
Example: Network Layer
Network and host addressing.
Interconnection of 2 or more networks forming an internetwork, or internet.
LANs, MANs, and WANs.
Different networks mean different protocols.
TCP/IP, IBM’s SNA, DEC’s DECnet, ATM, Novell and AppleTalk.
TCP/IP is the most widely used internetworking protocol suite
Initially funded through ARPA.
Picked up by NSF.
Used in the Internet.
Other internetworking protocols exist but are less used
Example: AppleTalk, X.25, etc.
The Internet Protocol: IP
Glues Internet together.
Common network-layer protocol spoken by all Internet participating networks.
Best effort datagram service:
No reliability guarantees.
No ordering guarantees.
IP is responsible for datagram routing.
Important: each datagram is routed independently!
Two different datagrams from same source to same destination can take different routes!
IP provides a best effort delivery mechanism
Does not guarantee to prevent duplicate datagrams, delayed and out-of-order delivery, corruption of data or datagram loss
Reliable delivery is provided by the transport layer, not the network layer (IP).
Network layer (IP) can detect and report errors without actually fixing them
The Internet Protocol
[Figure: two hosts (Application, Transport, Network layers) connected through two routers; IP runs on both hosts and both routers]
Transport layer breaks data streams into pieces that are transmitted over the Internet as IP datagrams, possibly being fragmented on the way.
When all fragments of a datagram arrive at the destination, the network layer reassembles them and delivers the datagram to the transport layer at the destination host.
IP Datagram Format
IP datagram consists of header and data (or payload).
20-byte fixed (mandatory) part.
Variable length optional part.
IPv4: IP version 4.
Current, predominant version.
32-bit long addresses.
IPv6: IP version 6.
Evolution of IPv4.
Longer addresses (16-byte long).
IP(v4) Header Format
[Figure: IPv4 datagram layout: header followed by payload]
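As a sketch of the fixed 20-byte header, the Python below decodes its fields with the standard struct module. It follows the IPv4 field layout of RFC 791 but is only illustrative: options are skipped, the checksum is not verified, and the dictionary keys are names chosen here.

```python
import socket
import struct


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the 20-byte fixed part of an IPv4 header (options are not parsed)."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes for the fixed IPv4 header")
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,        # IHL counts 32-bit words
        "total_len": total_len,                    # header + payload, in bytes
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # stored in 8-byte units
        "ttl": ttl,
        "protocol": proto,                         # e.g. 6 = TCP, 17 = UDP
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }
```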
Each datagram is encapsulated within a data link layer frame
The whole datagram is placed in the data area of the frame.
The data link layer addresses for source and destination included in the frame header.
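A minimal sketch of encapsulation, assuming an Ethernet II frame: the whole datagram becomes the frame's data area, behind a 14-byte header holding the destination MAC, the source MAC, and the EtherType 0x0800 (IPv4). The helper names are hypothetical; the preamble, the frame check sequence, and padding of short frames are left out.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value meaning "payload is an IPv4 datagram"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def encapsulate(datagram: bytes, src_mac: str, dst_mac: str) -> bytes:
    """Place a whole IP datagram in the data area of an Ethernet II frame."""
    if len(datagram) > 1500:
        raise ValueError("datagram exceeds the 1,500-byte Ethernet payload limit")
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + struct.pack("!H", ETHERTYPE_IPV4)
    return header + datagram  # preamble and frame check sequence are added by the NIC


def decapsulate(frame: bytes) -> bytes:
    """Strip the 14-byte Ethernet header and return the IP datagram."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("not an IPv4 frame")
    return frame[14:]
```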
Encapsulation - Example
Encapsulation Across Multiple Hops
Each router in the path from source to destination:
Decapsulates datagram from incoming frame.
Forwards datagram - determines next hop.
Encapsulates datagram in outgoing frame.
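The per-hop steps can be simulated with plain dictionaries. The topology, MAC labels, and forwarding tables below are hypothetical; the point is that the frame addresses change on every link while the datagram inside stays the same.

```python
# Hypothetical topology: host A -> router R1 -> router R2 -> host B.
MAC = {"A": "mac-A", "R1": "mac-R1", "R2": "mac-R2", "B": "mac-B"}
# Per-node forwarding table: destination IP address -> next-hop node.
NEXT_HOP = {
    "A": {"10.0.2.5": "R1"},
    "R1": {"10.0.2.5": "R2"},
    "R2": {"10.0.2.5": "B"},
}


def deliver(datagram: dict, source: str):
    """Carry a datagram hop by hop; return the final node and the frame used on each link."""
    node, frames = source, []
    while node in NEXT_HOP:  # B has no table entry: it is the destination
        next_node = NEXT_HOP[node][datagram["dst"]]  # forward: determine next hop
        frame = {"src_mac": MAC[node], "dst_mac": MAC[next_node],
                 "payload": datagram}                # encapsulate in outgoing frame
        frames.append(frame)
        datagram = frame["payload"]                  # next node decapsulates
        node = next_node
    return node, frames
```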
Encapsulation Across Multiple Hops - Example
Maximum Transmission Unit
Each data link layer technology specifies the maximum size of a frame.
Called the Maximum Transmission Unit (MTU).
Ethernet: 1,500 bytes.
Token Ring: 2048 or 4096 bytes.
What happens when large packet wants to travel through network with smaller MTU?
Maximum payloads (data portion of datagram) range from 48 bytes (ATM cells) to 64Kbytes (IP packets).
Another solution (used by IP): fragmentation.
Gateways break packets into fragments that fit the network's MTU; each fragment is sent as a separate datagram.
The fragments must be reassembled into the original datagram; in IP this is done by the destination host, not by the gateways along the path.
Keeping Track of Fragments
Fragments must be numbered so that original data stream can be reconstructed.
Define elementary fragment size that can pass through every network.
When packet fragmented, all pieces equal to elementary fragment size, except last one (may be smaller).
A single datagram may carry several elementary fragments.
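In IP, the elementary fragment is 8 bytes: the fragment-offset field counts 8-byte units, so every fragment except the last carries a multiple of 8 data bytes. The sketch below assumes a 20-byte header with no options and an original datagram that is not itself a fragment; the function name is hypothetical.

```python
def fragment(total_len: int, mtu: int, header_len: int = 20) -> list:
    """Split an IPv4 datagram of total_len bytes for a link with the given MTU.

    Returns (offset_in_8_byte_units, payload_bytes, more_fragments) per fragment.
    """
    payload = total_len - header_len
    # Every fragment but the last must carry a multiple of 8 payload bytes,
    # because the offset field counts 8-byte units.
    max_chunk = (mtu - header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_chunk, payload - offset)
        more = offset + size < payload  # the "more fragments" flag
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments
```

For example, a 4,000-byte datagram crossing an Ethernet link (MTU 1,500) yields fragments of 1,480, 1,480, and 1,020 data bytes at offsets 0, 185, and 370.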
Fragmentation - Example
One key aspect of internetworks is unique addresses.
Sending host puts destination internetworking address in the packet.
Destination addresses can be interpreted by any intermediate router/gateway.
Router/gateway examines address and forwards packet on to the destination.
Each machine on the Internet has a unique IP address.
The IP address is different from the “physical” /“MAC” address.
The “physical address” is the address of a computer (actually, of a NIC) in the LAN.
It is only known within the LAN.
The IP address is a universal address.
When a packet arrives in a LAN, the IP address must be converted to a MAC address (local "address resolution").
IP Addresses (cont’d)
An IP address is represented by a binary number with 32 bits (in IPv4).
That gives 2^32, or about 4.3 billion, possible addresses.
Often IP addresses are represented in “dotted decimal”, such as 188.8.131.52.
Each group of numbers can go from 0 to 255.
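Converting between dotted decimal and the underlying 32-bit number comes down to shifting 8 bits per group. The following is a hand-written sketch with hypothetical function names; Python's standard ipaddress module provides the same conversion (int(ipaddress.IPv4Address(...))).

```python
def to_int(dotted: str) -> int:
    """'128.10.0.1' -> 32-bit integer."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted-decimal IPv4 address: {dotted!r}")
    value = 0
    for p in parts:
        value = (value << 8) | p  # each group is one 8-bit byte
    return value


def to_dotted(value: int) -> str:
    """32-bit integer -> dotted-decimal string."""
    if not 0 <= value < 2**32:
        raise ValueError("IPv4 addresses are 32-bit values")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```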
IP Address Organization
Each IP address is divided into a prefix and a suffix
Prefix identifies network to which computers are attached.
Suffix identifies computers within that network.
Network and Host Numbers
Every network in a TCP/IP internet is assigned a unique network number.
Each host on a specific network is assigned a host address that is unique within that network.
Host’s IP address is the combination of the network number (prefix) and host address (suffix).
Assignment of network numbers must be coordinated globally; assignment of host addresses can be managed locally.
IP Address Format
IP addresses are 32 bits long.
There are different classes of addresses, corresponding to different subdivisions of the 32 bits into prefix and suffix.
Some address classes have a large prefix and a small suffix.
Many such networks, few hosts per network.
Other address classes have small prefix, large suffix.
Few such networks, many hosts per network.
IP Address Format (cont’d)
How can we tell which class an IP address belongs to?
Look at the first 4 bits!
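A sketch of class detection from the leading bits, using the classful patterns 0 (A), 10 (B), 110 (C), 1110 (D), and 1111 (E). The function name is hypothetical and the input is assumed to be a valid dotted-decimal address.

```python
def address_class(addr: str) -> str:
    """Return 'A'..'E' by inspecting the first 4 bits of the address."""
    first_octet = int(addr.split(".")[0])
    top4 = first_octet >> 4  # the first 4 bits of the 32-bit address
    if top4 & 0b1000 == 0:
        return "A"  # 0xxx
    if top4 & 0b0100 == 0:
        return "B"  # 10xx
    if top4 & 0b0010 == 0:
        return "C"  # 110x
    if top4 & 0b0001 == 0:
        return "D"  # 1110
    return "E"      # 1111
```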
IP Address Format (cont’d)
Class A, B and C are primary classes.
Used for ordinary addressing.
Class D is used for multicast, which is a limited form of broadcast.
Internet hosts join a multicast group.
Packets are delivered to all members of the group.
Routers manage delivery of single packets from source to all members of multicast group.
Class E is reserved.
IP Addresses (cont’d)
Another way to determine the address class is by looking at the first group of numbers in the dotted decimal notation
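The first-octet ranges follow directly from the leading bits: fix the class bits and let the remaining bits of the first octet vary. The sketch below derives the ranges instead of hard-coding them; the names are hypothetical.

```python
# Leading-bit patterns of the classful scheme.
PREFIX_BITS = {"A": "0", "B": "10", "C": "110", "D": "1110", "E": "1111"}


def first_octet_range(cls: str) -> tuple:
    """Lowest and highest first-octet value for an address class."""
    bits = PREFIX_BITS[cls]
    free = 8 - len(bits)               # remaining bits of the first octet are unconstrained
    low = int(bits, 2) << free         # free bits all 0
    high = low | ((1 << free) - 1)     # free bits all 1
    return low, high


for cls in PREFIX_BITS:
    print(cls, first_octet_range(cls))
# A (0, 127)
# B (128, 191)
# C (192, 223)
# D (224, 239)
# E (240, 255)
```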
Networks and Hosts in Each Class
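The counts for each class can be recomputed from the bit layout: the number of networks comes from the prefix bits left after the class bits, the number of hosts from the suffix bits. This sketch gives raw counts; in practice the all-0s and all-1s host values are reserved (network address and broadcast), so usable hosts per network are 2 fewer, and a few network numbers (such as 0 and 127 in class A) are also reserved.

```python
# (class bits, total prefix bits) for the primary classes; the suffix is the rest of 32.
LAYOUT = {"A": (1, 8), "B": (2, 16), "C": (3, 24)}


def class_capacity(cls: str) -> tuple:
    """Return (number of networks, addresses per network) for class A, B, or C."""
    class_bits, prefix_bits = LAYOUT[cls]
    networks = 2 ** (prefix_bits - class_bits)  # prefix bits left after the class bits
    hosts = 2 ** (32 - prefix_bits)             # every suffix value
    return networks, hosts
```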
Understanding IP Addresses
10.0.0.37 (class A): network 10, host 0.0.37
128.10.0.1 (class B): network 128.10, host 0.1
192.5.48.3 (class C): network 192.5.48, host 3
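Under the classful rules, splitting an address into prefix and suffix depends only on how many leading groups belong to the network: 1 for class A, 2 for class B, 3 for class C. A sketch with a hypothetical function name, assuming well-formed input:

```python
PREFIX_GROUPS = {"A": 1, "B": 2, "C": 3}  # dotted-decimal groups that form the prefix


def split_address(addr: str) -> tuple:
    """Return (network prefix, host suffix) of a classful A, B, or C address."""
    groups = addr.split(".")
    first = int(groups[0])
    if first < 128:
        cls = "A"
    elif first < 192:
        cls = "B"
    elif first < 224:
        cls = "C"
    else:
        raise ValueError("class D and E addresses have no network/host split")
    n = PREFIX_GROUPS[cls]
    return ".".join(groups[:n]), ".".join(groups[n:])
```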
IP addresses: how to get one?
ICANN (Internet Corporation for Assigned Names and Numbers) coordinates IP address assignment.
How does host get its IP address in the network? 2 possibilities:
1: Hard-coded by system administrator in a file inside the host.
2: DHCP : “Dynamic Host Configuration Protocol”
Dynamically get address: “plug-and-play”.
DHCP allows a computer to join a new network and automatically obtain an IP address.
The network administrator establishes a pool of addresses for DHCP to assign.
When a computer boots, it broadcasts a DHCP request to which a server sends a DHCP reply.
DHCP also allows non-mobile computers that run server software to be assigned a permanent address (one that does not change when the computer reboots).
Even this permanent address is leased: it must be renegotiated after a certain period of time.
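A minimal sketch of the server-side bookkeeping for the address pool and lease renewal described above. The class and method names are hypothetical, time is passed in explicitly to keep the sketch testable, and the actual DHCP message exchange (defined in RFC 2131) is not modeled.

```python
import ipaddress


class LeasePool:
    """Hypothetical DHCP-server-side pool: hands out addresses for a limited lease time."""

    def __init__(self, network: str, lease_seconds: int):
        self.free = list(ipaddress.ip_network(network).hosts())  # pool set up by the admin
        self.lease_seconds = lease_seconds
        self.leases = {}  # client id -> (address, expiry time)

    def _reclaim(self, now: float) -> None:
        # Expired leases go back to the pool.
        for cid, (addr, expiry) in list(self.leases.items()):
            if expiry <= now:
                del self.leases[cid]
                self.free.append(addr)

    def request(self, client_id: str, now: float):
        """Grant or renew a lease; a client with a live lease keeps its address."""
        self._reclaim(now)
        if client_id in self.leases:
            addr, _ = self.leases[client_id]
        elif self.free:
            addr = self.free.pop(0)
        else:
            raise RuntimeError("address pool exhausted")
        self.leases[client_id] = (addr, now + self.lease_seconds)
        return addr
```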
The Internet Transport Protocols: TCP and UDP
UDP: user datagram protocol (RFC 768).
TCP: transmission control protocol (RFCs 793, 1122, 1323).
UDP provides connectionless, unreliable service.
No delivery guarantees.
No ordering guarantees.
No duplicate detection.
No connection establishment/teardown.
Suitable for short-lived interactions, such as a single request and reply.
Example: client-server applications.
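The lack of connection setup is visible in the socket API: a UDP sender just calls sendto() with a destination address. A sketch using Python's standard socket module on the loopback interface; the function name is hypothetical, and the timeout is an added assumption because UDP would otherwise let the receiver wait forever for a lost datagram.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram to a local receiver and return what it received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # no delivery guarantee, so never wait forever
        # No connection establishment: the first packet sent already carries data.
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```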
TCP provides reliable end-to-end communication.
TCP transport entity:
Runs on machine that supports TCP.
Interfaces to the IP layer.
Manages TCP streams.
Accepts user data, breaks it down and sends it as separate IP datagrams.
At receiver, reconstructs original byte stream from IP datagrams.
Timeouts and retransmissions.
TCP Service Model 1
Obtained by creating TCP end points.
Example: UNIX sockets.
Socket number or address: IP address + 16-bit port number (TSAP).
Multiple connections can terminate at same socket.
Connections identified by socket ids at both ends.
Port numbers below 1024: well-known ports reserved for standard services.
List of well-known ports in RFC 1700.
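A sketch with Python's standard socket module showing that two connections can end at the same server socket (IP address + port) and are told apart by the socket at the other end. Port 0 asks the operating system for a free port, so no well-known port is needed; the function name is hypothetical.

```python
import socket


def connection_endpoints():
    """Open two loopback connections to one listening socket; return (local, peer) pairs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # OS picks an unused port
    server.listen(2)
    port = server.getsockname()[1]
    clients, conns = [], []
    try:
        for _ in range(2):
            clients.append(socket.create_connection(("127.0.0.1", port)))
            conn, _addr = server.accept()
            conns.append(conn)
        # Both connections share the server's (IP, port); the client side differs.
        return [(conn.getsockname(), conn.getpeername()) for conn in conns]
    finally:
        for s in clients + conns + [server]:
            s.close()
```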
TCP Service Model 2
TCP connections are full-duplex and point-to-point.
Byte stream (not message stream).
Message boundaries are not preserved e2e.
Example: four 512-byte segments (A, B, C, D) sent as separate IP datagrams can be delivered to the application as 2048 bytes of data in a single READ.
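A toy model of why message boundaries disappear: the receiver appends each segment's bytes to one buffer, and a READ simply takes bytes from it. The class is hypothetical and assumes segments already arrive in order.

```python
class ByteStreamReceiver:
    """Toy model of a TCP receive buffer: segment boundaries are not kept."""

    def __init__(self):
        self.buffer = bytearray()

    def on_segment(self, payload: bytes) -> None:
        self.buffer += payload  # segments are simply appended, in order

    def read(self, max_bytes: int) -> bytes:
        """Return up to max_bytes, regardless of how they were segmented."""
        data = bytes(self.buffer[:max_bytes])
        del self.buffer[:max_bytes]
        return data
```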
TCP Byte Stream
When application passes data to TCP, it may send it immediately or buffer it.
Sometimes application wants to send data immediately.
Example: interactive applications.
Use PUSH flag to force transmission.
TCP could still bundle PUSH data together (e.g., if it cannot transmit it right away).
Also forces TCP to transmit at once.
TCP Protocol Overview 1
TCP’s TPDU: segment.
20-byte header + options.
TCP entity decides the size of segment.
2 limits: 64KByte IP payload and MTU.
Segments that are too large are fragmented.
More overhead by addition of IP header.
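The two limits turn into simple arithmetic for the largest TCP payload per segment. A sketch assuming 20-byte IP and TCP headers with no options; the names are hypothetical.

```python
IP_HEADER = 20            # bytes, without options
TCP_HEADER = 20           # bytes, without options
MAX_IP_DATAGRAM = 65_535  # the IPv4 total-length field is 16 bits


def max_segment_payload(mtu: int) -> int:
    """Largest TCP payload per segment that avoids IP fragmentation on this link."""
    return min(mtu, MAX_IP_DATAGRAM) - IP_HEADER - TCP_HEADER
```

On Ethernet (MTU 1,500) this gives 1,460 bytes; if the MTU were no constraint, the IP limit alone allows 65,535 - 40 = 65,495 bytes.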
TCP Protocol Overview 2
Reliability, ordering, and flow control.
Assigned to every byte .
32-bit sequence numbers.
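Because sequence numbers are 32 bits, they wrap around, so comparisons must be done modulo 2^32. A sketch of the usual approach (treat the difference as a signed 32-bit value); the function names are hypothetical.

```python
MOD = 2 ** 32


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n bytes, wrapping at 2**32."""
    return (seq + n) % MOD


def seq_lt(a: int, b: int) -> bool:
    """True if sequence number a comes before b, allowing for wraparound.

    a precedes b when the forward distance from a to b is below 2**31.
    """
    diff = (b - a) % MOD
    return 0 < diff < 2 ** 31
```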
TCP Connection Setup
Host 1 → Host 2: SYN (SEQ=x)
Host 2 → Host 1: SYN (SEQ=y, ACK=x+1)
Host 1 → Host 2: (SEQ=x+1, ACK=y+1)
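The numbers in the handshake can be computed as below. A SYN consumes one sequence number, which is why each side acknowledges the other's initial value plus one. Random initial sequence numbers are an assumption of this sketch (the caller can also pass fixed ones), and the function name is hypothetical.

```python
import random


def three_way_handshake(isn_client=None, isn_server=None):
    """Return the three setup segments as (direction, header fields) pairs."""
    x = isn_client if isn_client is not None else random.randrange(2**32)
    y = isn_server if isn_server is not None else random.randrange(2**32)
    return [
        ("Host 1 -> Host 2", {"SYN": 1, "SEQ": x}),
        ("Host 2 -> Host 1", {"SYN": 1, "SEQ": y, "ACK": (x + 1) % 2**32}),
        ("Host 1 -> Host 2", {"SEQ": (x + 1) % 2**32, "ACK": (y + 1) % 2**32}),
    ]
```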
TCP Connection Release 1
May cause data loss.
TCP Connection Release 2
Each side of the connection released independently.
Either side sends a TCP segment with FIN=1.
When FIN acknowledged, that direction is shut down for data.
Connection released when both sides shut down.
Normally 4 segments: one FIN and one ACK for each direction; the first ACK and the second FIN may be combined into one segment, leaving 3.
TCP Connection Release 3
Timers to avoid the two-army problem.
If response to FIN not received within 2*MSL (maximum segment lifetime), FIN sender releases connection.
After connection released, TCP waits for 2*MSL (e.g., 120 sec) to ensure all old segments have aged.
Sender process initiates connection.
Once connection established, TCP can start sending data.
Sender writes bytes to TCP stream.
TCP sender breaks byte stream into segments.
Each byte assigned sequence number.
Segment sent and timer started.
TCP Transmission (cont’d)
If timer expires, retransmit segment.
After retransmitting segment for maximum number of times, assumes connection is dead and closes it.
If user aborts connection, sending TCP flushes its buffers and sends RESET segment.
Receiving TCP decides when to pass received data to upper layer.
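The timer-and-retry logic described above can be sketched as follows. The transmit and wait_for_ack callables are hypothetical hooks standing in for the real network and timer; a real TCP also backs off the timeout between attempts, which is omitted here.

```python
def send_reliably(segment, transmit, wait_for_ack, timeout: float, max_retries: int) -> bool:
    """Send one segment, retransmitting on timeout.

    transmit(segment) sends it; wait_for_ack(timeout) returns True if the ACK
    arrived before the timer expired.
    """
    for _ in range(1 + max_retries):
        transmit(segment)          # (re)send the segment and start the timer
        if wait_for_ack(timeout):  # ACK before expiry: stop the timer
            return True
    return False  # gave up: caller treats the connection as dead and closes it
```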
TCP Flow Control
Receiver’s advertised window .
Size of advertised window related to receiver’s buffer space.
Sender can send data up to receiver’s advertised window.
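TCP Flow Control: Example (receiver buffer of 4K)
Application writes 2K of data; sender sends 2K; SEQ=0. Receiver buffer: 2K free.
Receiver replies ACK=2048; WIN=2048.
Application does a 3K write; sender sends 2K; SEQ=2048 (all that the window allows). Receiver buffer: 0 free.
Receiver replies ACK=4096; WIN=0. Sender blocked.
Receiving application reads 2K of data; receiver sends ACK=4096; WIN=2048. Receiver buffer: 2K free.
Sender may send up to 2K; it sends the remaining 1K; SEQ=4096. Receiver buffer: 1K free.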
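A sketch of the sender side of this example, tracking only byte counts. The class is hypothetical; it sends whenever the application has written data and the window allows, which is one permissible policy, since TCP is not required to send immediately.

```python
from typing import Optional


class FlowControlledSender:
    """Sender side of window-based flow control (byte counts only, no real network)."""

    def __init__(self, window: int):
        self.next_seq = 0     # sequence number of the next new byte
        self.acked = 0        # highest cumulative ACK received
        self.window = window  # receiver's last advertised window
        self.pending = 0      # bytes written by the application, not yet sent

    def app_write(self, n: int) -> None:
        self.pending += n

    def on_ack(self, ack: int, win: int) -> None:
        self.acked, self.window = ack, win

    def send(self) -> Optional[tuple]:
        """Send as much pending data as the window allows; return (SEQ, length) or None."""
        allowed = self.acked + self.window - self.next_seq
        length = min(self.pending, max(allowed, 0))
        if length == 0:
            return None  # blocked by the window, or nothing to send
        segment = (self.next_seq, length)
        self.next_seq += length
        self.pending -= length
        return segment
```

Replaying the example: with a 4K window, the first send is (0, 2048); after ACK=2048, WIN=2048 and a 3K write, the sender sends (2048, 2048); after WIN=0 it is blocked; after WIN=2048 it sends (4096, 1024).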
TCP Flow Control: Observations
TCP sender not required to transmit data as soon as it comes in from application.
Example: when first 2KB of data comes in, could wait for more data since window is 4KB.
Receiver not required to send ACKs as soon as possible.
Wait for data so ACK is piggybacked.
Why do it at the transport layer?
Real fix to congestion is to slow down sender.
Use law of “conservation of packets”.
Keep number of packets in the network constant.
Don’t inject new packet until old one leaves.
Congestion indicator: packet loss.
TCP Congestion Control
Like flow control, congestion control is window-based.
Sender keeps a congestion window (cwin).
Each sender keeps 2 windows: receiver’s advertised window and congestion window.
Number of bytes that may be sent is min(advertised window, cwin).
TCP Congestion Control (cont’d)
Slow start [Jacobson 1988]:
Connection’s congestion window starts at 1 segment.
If segment ACKed before time out, cwin=cwin+1.
Since every ACK increases cwin by one segment, cwin doubles every round-trip time (exponential growth).
TCP Congestion Control (cont’d)
Third parameter: threshold .
Initially set to 64KB.
If timeout, threshold=cwin/2 and cwin=1.
Re-enters slow-start until cwin=threshold.
Then, cwin grows linearly until it reaches receiver’s advertised window.
[Figure: TCP Congestion Control example: cwin versus time, showing growth up to the threshold, a timeout, and the new threshold]
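A sketch that produces this kind of curve, counting cwin in segments and updating it once per round-trip time. The per-round granularity, the cap at the threshold during slow start, the floor of 1 on the threshold, and the choice of timeout rounds are simplifying assumptions; the function name is hypothetical.

```python
def cwin_trace(rounds: int, threshold: int, adv_window: int, timeouts: set) -> list:
    """Congestion window (in segments) at the start of each round-trip time.

    A timeout in round r sets threshold = cwin/2 and restarts slow start at cwin = 1.
    """
    cwin = 1
    trace = []
    for r in range(rounds):
        trace.append(cwin)
        if r in timeouts:
            threshold = max(cwin // 2, 1)
            cwin = 1
        elif cwin < threshold:
            cwin = min(cwin * 2, threshold, adv_window)  # slow start: doubles per RTT
        else:
            cwin = min(cwin + 1, adv_window)             # linear: +1 segment per RTT
    return trace
```

For instance, cwin_trace(12, threshold=8, adv_window=64, timeouts={6}) returns [1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]: exponential growth to 8, linear growth, a timeout at 11, then slow start again up to the new threshold 5.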
TCP Retransmission Timer
When segment sent, retransmission timer starts.
If segment ACKed, timer stops.
If time out, segment retransmitted and timer starts again.
How to set timer?
Based on round-trip time: the time between sending a segment and receiving its ACK.
If timer is too short, unnecessary retransmissions.
If timer is too long, long retransmission delay.
Jacobson’s Algorithm 1
Determining the round-trip time:
TCP keeps RTT variable.
When segment sent, TCP measures how long it takes to get ACK back ( M ).
RTT = alpha*RTT + (1-alpha)M.
alpha: smoothing factor; determines weight given to previous estimate.
Jacobson’s Algorithm 2
Determining timeout value:
Measure RTT variation, or |RTT-M|.
Keeps smoothed value of cumulative variation D=alpha*D+(1-alpha)|RTT-M|.
Alpha may or may not be the same as value used to smooth RTT.
Timeout = RTT+4*D.
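Both formulas fit in a short sketch. The defaults (alpha = 7/8 for RTT, 3/4 for D), the initial D = M/2, and updating D with the estimate from before the new sample are assumptions taken from RFC 6298, which specifies the modern version of this algorithm; RFC 6298 also adds a minimum timeout that is not modeled here. The class name is hypothetical.

```python
class RttEstimator:
    """Smoothed RTT and smoothed deviation D, combined into a retransmission timeout."""

    def __init__(self, first_sample: float, alpha: float = 7 / 8, alpha_d: float = 3 / 4):
        self.rtt = first_sample
        self.d = first_sample / 2  # initial deviation (assumption, as in RFC 6298)
        self.alpha = alpha         # weight given to the previous RTT estimate
        self.alpha_d = alpha_d     # weight given to the previous D

    def update(self, m: float) -> float:
        """Fold in a new measurement M and return the new timeout."""
        # D uses |RTT - M| with the RTT estimate from before this sample.
        self.d = self.alpha_d * self.d + (1 - self.alpha_d) * abs(self.rtt - m)
        self.rtt = self.alpha * self.rtt + (1 - self.alpha) * m
        return self.timeout()

    def timeout(self) -> float:
        return self.rtt + 4 * self.d
```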
Client-Server Model
[Figure: a client, a file server, and a printer server, each running on its own kernel]
Sharing remote files: “on-line” access versus “file transfer”.