This document provides an overview of the TCP/IP protocol. It begins with an introduction to TCP and IP, explaining that TCP provides reliable, ordered delivery of data packets over the unreliable IP network layer. It then discusses key TCP concepts like the three-way handshake for connection establishment, ACK packets for reliability, and the sliding window mechanism for efficient data transfer through pipelining of packets. The document is intended to explain the core logic and functionality of the TCP protocol at a high level.
2. Hi
Julien PAULI
Programming in PHP since early 2000s
PHP Internals hacker and trainer
PHP 5.5/5.6 Release Manager
Working at SensioLabs in Paris - Blackfire
Writing PHP tech articles and books
http://phpinternalsbook.com
@julienpauli - github.com/jpauli - jpauli@php.net
Like working on OSS such as PHP :-)
3. TOC
We can't explain "the network" in an hour or so (sorry)
4. TOC
We can't explain the network in an hour
Even though it is not that hard to understand
In fact, the whole picture is easy to grasp, but it takes a long time to cover
6. TOC (the real one)
Reminder on OSI
Brief presentation of IP : Internet Protocol
Deep dive into TCP
Conclusion
7. TOC (detailed)
Deep dive into TCP
The need for reliability on top of IP
ARQ: Automatic Repeat reQuest and acknowledgments
TCP header overview
Pipelined reliable service model
Connected model and FSM
Data flow, segmentation, MSS and sliding window mechanism
Congestion control and avoidance algorithms
Tuning TCP on Linux
Wireshark examples
9. Everything is under control
Every concept is
Logical (common sense)
Maths oriented
Fully understandable (engineering skills assumed)
TCP is not that hard, it is purely logical
TCP is not that huge, it combines several concepts
TCP is a protocol, a binary language spoken between hosts
TCP is technically really beautiful
10. Mantra
TCP/IP has been running the Internet for decades (its design dates back to the 1970s) and carries a very large part of its traffic (about 90%)
TCP runs over another protocol, usually IP
TCP/IP used to be one layer, and one protocol
They were later split into two layers, in line with the OSI model
IP is layer 3, TCP is layer 4: they are used together
Understanding TCP/IP helps a lot in designing other (layer 7) protocols
A simple TCP implementation requires only a few hundred lines of C
15. Vocabulary (only)
Every layer adds its own information (headers)
Every layer carries some data
But the unit of data has a different name at each layer
Frame
Datagram
Segment (also packet)
payload, packet, … ?
17. IP
Internet protocol
Sends a datagram from place A to place B
A may be several thousand km away from B
The datagram is routed through routers (packet-switched network)
From one router to several dozen, even hundreds
Datagram may or may not arrive at destination
Datagrams may arrive in different order
"Best effort" protocol
Do what you can to route info, as fast as possible
Do not bother to "check" anything, just route, and go
19. IP Forward
Routers analyze the destination address field
They route the IP datagram onto connected networks
Datagram may get destroyed (router overload)
Datagram may get lost (interconnection problem)
Datagram may get altered or corrupted (noise)
Datagram may arrive intact, but the host is down or can't process it
29. IP notes
IP is a best effort protocol
Many scenarios can lead to a failure
Datagrams can be destroyed / lost
Datagrams can be misordered
[Figure: datagrams 1 2 3 4 are sent; only 1, 3 and 4 arrive]
30. Remember
IP is not reliable
IP is a network protocol: it does what it can to route packets from A to B, that's all
IP offers no delivery guarantee
IP routes traffic to hosts, not to services
When a datagram reaches a host, which process should handle it?
32. UDP
Basically, UDP adds ports to IP
This allows multiplexing of data streams
When a datagram arrives, the UDP layer processes it and hands it to the right host process by reading the destination port
33. UDP
UDP is light: only 8 bytes of overhead per message
UDP does not provide
Reliability
Error correction mechanisms
Connected service
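To make the "8 bytes overhead" concrete, here is a minimal sketch that packs a UDP header with Python's struct module. The four 16-bit fields (source port, destination port, length, checksum) are the standard UDP layout; the port numbers and payload are made-up values, and the checksum is left at 0 for simplicity.

```python
import struct

def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)  # the length field covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

# Hypothetical ports and payload, for illustration only
header = build_udp_header(5353, 53, b"hello")
print(len(header))                     # 8
print(struct.unpack("!HHHH", header))  # (5353, 53, 13, 0)
```

Everything UDP adds to IP fits in those 8 bytes: two ports for multiplexing, a length and a checksum.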
35. TCP is more complex than UDP
Transmission Control Protocol
That means : control the transmission :-p
TCP is orders of magnitude more complex than UDP
That's not hard, given the simplicity of UDP
TCP brings a lot of intelligence
UDP has none at all
36. TCP overview
TCP runs on top of IP
TCP provides
Full-duplex, connection-oriented data exchanges
Bytes flow in both directions: each side is both sender and receiver
Point-to-point connection
No broadcast, no multicast, only 2 endpoints
Service multiplexing
Through ports
Stream data transfer
No need to shape the data into a specific form before sending
37. TCP overview
… TCP provides
Reliability
data sent will be checked and ACKed
Order
data will be delivered in the same order it was sent
Flow control
data will arrive whatever the processing speed at either side*
Congestion avoidance
data will arrive whatever the load on the intermediate network*
* : assuming L3 is not down
38. TCP header
Variable length :
20 bytes minimum
60 bytes maximum (with a full options field)
40. TCP header overview
Very verbose, embeds a lot of information
At least 20 bytes, i.e. at least 2.5 times the UDP header
Carries ports, like UDP (connection multiplexing)
Carries a checksum, like UDP
Carries the Sequence number and the ACK number
The most important things in TCP
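The fixed 20-byte part of the header can be decoded in a few lines. This is a sketch based on the standard TCP header layout; the function name, the returned dictionary and the sample values are illustrative assumptions, not taken from the slides.

```python
import struct

# Classic TCP flag bits (low bits of the 16-bit offset/flags word)
FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH", 0x10: "ACK", 0x20: "URG"}

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    data_offset = off_flags >> 12  # header length in 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": data_offset * 4,  # 20 to 60 bytes
        "flags": [name for bit, name in FLAGS.items() if off_flags & bit],
        "window": window,
        "checksum": checksum,
    }

# A hand-built SYN header with no options (made-up ports and sequence number)
sample = struct.pack("!HHIIHHHH", 40000, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed["header_len"], parsed["flags"])  # 20 ['SYN']
```

The data offset field is 4 bits wide, which is why the header tops out at 15 x 4 = 60 bytes.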
45. ARQ
Automatic Repeat reQuest
Sender re-sends the exact same message
… and still waits until it is ACKed
[Figure: segment 1 is sent and re-sent until ack 1 comes back]
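The retransmission loop can be sketched as a tiny deterministic simulation. This is only an illustration of stop-and-wait ARQ: the function name and the lost_attempts parameter (which attempts get no ACK) are hypothetical, and a real sender would rely on a timer rather than a lookup table.

```python
def stop_and_wait(messages, lost_attempts):
    """Send messages one at a time, re-sending each until it is ACKed.

    lost_attempts is a set of (message_index, attempt_number) pairs
    for which no ACK comes back (the message or its ACK was lost).
    """
    log = []
    for index, msg in enumerate(messages):
        attempt = 1
        while (index, attempt) in lost_attempts:
            log.append(f"send {msg} (attempt {attempt}): no ACK, timeout")
            attempt += 1  # re-send the exact same message
        log.append(f"send {msg} (attempt {attempt}): ack {msg}")
    return log

for line in stop_and_wait([1, 2], {(0, 1)}):
    print(line)
```

Here the first attempt for message 1 gets no ACK, so it is sent twice before message 2 can go out.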
46. TCP :-)
You just understood the basic idea behind TCP
TCP is a connected protocol
The sender and the receiver keep communicating with each other
The receiver ACKs the messages it receives from the sender
48. TCP connection
Before exchanging data, hosts must handshake
Like in a phone conversation
Sen ->Rec "Hello ?"
Rec ->Sen "Hello !"
Sen ->Rec "Hi, right, let's start talking now"
This is called the 3 way handshake
Because it effectively requires 3 message exchanges
Before any data exchange
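In TCP terms the three messages are a SYN, a SYN+ACK and an ACK, and each side acknowledges the other's initial sequence number plus one. The sketch below only computes the numbers carried by each message; the function name and the initial sequence numbers (100 and 300) are arbitrary illustration values.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return (direction, flags, seq, ack) for the three handshake messages."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    return [
        ("Sen->Rec", "SYN", client_isn, None),                               # "Hello ?"
        ("Rec->Sen", "SYN+ACK", server_isn, (client_isn + 1) % mod),         # "Hello !"
        ("Sen->Rec", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod), # "let's start"
    ]

for step in three_way_handshake(100, 300):
    print(step)
```

The SYN itself consumes one sequence number, which is why the acknowledgment is the initial sequence number plus one even though no data has been sent yet.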
50. Communication negotiation
During the SYN exchange, hosts also negotiate protocol details
Those are carried in the options field of the header
"Do you support feature XXX?"
"Yes I do, we will use it"
"No I don't, never use it for this connection"
51. TCP options negotiation
MSS is mandatory (derived from the IP MTU)
Prevents awful IP fragmentation (end-to-end only)
Window Scale factor is widely used as of 2017
SACK is very commonly used
52. Let's disconnect
We are now connected
Disconnection is also a matter of message exchanges
But remember the full-duplex, bidirectional nature of TCP
The connection must be closed in both directions
From A to B
And from B to A
(Or the other way round: it doesn't matter)
53. TCP full closing
One host (A or B) wishes to disconnect
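Two exchanges: FIN - ACK and FIN - ACK
[Figure: the active closer sends FIN, the passive closer answers ACK, then sends its own FIN, which the active closer ACKs; FIN is a bit in the TCP header]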
54. TCP half-closing
A connection can be half closed
Host A can't send any data to host B anymore
But host B can still send data to host A
Until host B sends a FIN from its side
This is uncommon, but it can still happen
close() : full close
shutdown() : half close
The host closing its side of the connection frees resources on its side
This can be useful in some scenarios
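A half close can be tried on the loopback interface with Python's socket module. In this sketch, host A sends its data, calls shutdown(SHUT_WR) to send its FIN, and keeps reading; host B only learns that A is done when recv() returns an empty result, and it can still answer. The function names and the upper-casing "service" are made up for the demo.

```python
import socket
import threading

def serve_once(listener, received_log):
    """Host B: read until A's FIN, then answer on the still-open direction."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: A has sent its FIN
                break
            received += chunk
        conn.sendall(received.upper())  # B can still send to A
    received_log.append(received)

def half_close_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    received_log = []
    server = threading.Thread(target=serve_once, args=(listener, received_log))
    server.start()

    a = socket.create_connection(listener.getsockname())
    a.sendall(b"hello")
    a.shutdown(socket.SHUT_WR)  # half close: A can't send anymore
    reply = b""
    while True:
        chunk = a.recv(1024)    # but A can still receive
        if not chunk:
            break
        reply += chunk
    a.close()                   # full close
    server.join()
    listener.close()
    return reply

print(half_close_demo())
```

This pattern is handy when the end of the request is signalled by the end of the stream: A says "I am done sending" without giving up the reply.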
55. TCP half-closing
One host (A or B) wishes to disconnect
Sends a FIN
Waits for the corresponding ACK
A can't send any more data to B now
But A will still ACK what B sends!
But B can still send data to A
Until B itself sends a FIN and receives the ACK
[Figure: A sends FIN to B, B answers ACK]
57. Remember
A TCP connection is uniquely identified by the quadruplet
Source IP address
Source Port
Destination IP address
Destination Port
Together, those 4 values form a unique quadruplet
A TCP connection needs a handshake to open
A TCP connection needs a closing dialogue to close
And it may be "half closed"
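One way to picture the quadruplet is as the key of a connection table: two connections that differ in a single field are distinct. The sketch below uses a plain dictionary; the class name, addresses and ports are hypothetical.

```python
from typing import NamedTuple

class ConnectionKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

connections = {}

# Same hosts, same destination port, different source ports:
# two independent TCP connections.
k1 = ConnectionKey("10.0.0.1", 40000, "10.0.0.2", 80)
k2 = ConnectionKey("10.0.0.1", 40001, "10.0.0.2", 80)
connections[k1] = "connection 1"
connections[k2] = "connection 2"

print(len(connections))  # 2
print(connections[ConnectionKey("10.0.0.1", 40000, "10.0.0.2", 80)])  # connection 1
```

This is why one server port can serve many clients at once: the source address and port make each quadruplet unique.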
58. Remember
A TCP connection is between 2 and only 2 hosts
One can be seen as the "client"
The other as the "server"
This is purely conceptual
I prefer talking about "source" and "destination"
Or "host A" and "host B"
Client/server is a layer 7 notion, built on top of TCP
The connection runs in both directions (full duplex)
Either party can initiate a close whenever it wants
64. RTT
Round Trip Time: the time for a packet
To be generated
Create the segment
Compute the checksum
Add the header to the segment
To be sent and routed
Router 1 → Router 2 → Router 3 → … → Router 42 → …
To be received by the destination
NIC will interrupt CPU
To be processed by the destination
Destination will extract the header
Verify the checksum
And for the ACK to travel back to the sender
Done
68. Sliding Window
Don't send just one packet at a time, but several!
Wait until they are ACKed
Re-send if not
Move on
This is the crucial concept of the TCP sliding window
69. Multiple packets on the way
TCP sliding window
[Figure: segments 1, 2, 3 and 4 are in flight at the same time; ack 1, ack 2, ack 3 and ack 4 come back]
70. TCP Sliding Windows Demo
The best link about Sliding Window :
http://www2.rad.com/networks/2004/sliding_window/
73. Seq numbers and ACK
TCP is a stream oriented protocol
Layer 7 send()s some bytes to a TCP socket
TCP splits those bytes into segments
The segment size depends on
MSS announced in the SYN phase
Total data volume to transfer
TCP starts sending segments
74. Seq numbers
The Sequence number indicates where we are in the sent stream, from the sender's view of the connection
The ACK number indicates what we expect next, from the receiver's view of the connection
We always ACK bytes, not segment numbers
We can use cumulative ACKs
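A cumulative ACK is simply "the next byte I expect", computed over the contiguous part of what has been received. A minimal sketch, assuming segments are described as (seq, length) pairs and the stream starts at byte 1 as in the examples of this deck; the function name is made up.

```python
def cumulative_ack(received, first_seq=1):
    """Return the ACK number: the first byte not yet received contiguously.

    received is a list of (seq, length) pairs, in any order.
    """
    expected = first_seq
    for seq, length in sorted(received):
        if seq > expected:  # hole in the stream: stop here
            break
        expected = max(expected, seq + length)
    return expected

print(cumulative_ack([(1, 100), (101, 100)]))  # 201
print(cumulative_ack([(1, 100), (201, 100)]))  # 101
```

In the second call the segment starting at 201 has arrived, but the ACK stays at 101 because bytes 101-200 are still missing.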
76. Example
A has 750 bytes to send to B
We'll assume A and B have connected successfully
Let's say the segment size is 100 bytes
Let's say the window size is 500 bytes
[Figure: A sends 750 bytes to B]
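The segmentation for this example can be worked out mechanically. A small sketch, assuming the stream starts at sequence number 1 as in the following slides; the function name is hypothetical.

```python
def segment(total_bytes, segment_size, first_seq=1):
    """Split a byte stream into (seq, length) segments."""
    segments = []
    seq = first_seq
    remaining = total_bytes
    while remaining > 0:
        length = min(segment_size, remaining)
        segments.append((seq, length))
        seq += length
        remaining -= length
    return segments

segments = segment(750, 100)
print(segments)

# With a 500-byte window, only the first 5 segments may be in flight at once
window_size = 500
print(segments[: window_size // 100])
```

The 750 bytes become seven full segments (seq = 1, 101, … 601) and a last one of 50 bytes at seq = 701.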
77. Seq and ACK in byte stream
A receives from B
ACK = 101 meaning B received bytes 1-100 and now expects 101
ACK = 201 meaning B received bytes 1-200 and now expects 201
ACK = 301 meaning B received bytes 1-300 and now expects 301
A has received ack = 301 from B so far
That means that B has received everything contiguously up to byte 300
That means that B is expecting byte 301 now
[Figure: A sends five segments of len = 100 with seq = 1, 101, 201, 301, 401; B answers ack = 101, ack = 201, ack = 301]
78. Seq and ACK in byte stream
Host A retransmits the segments that have not been ACKed
The last ACK host A got from B is ACK = 601
[Figure: segments seq = 301, 401, 501, 601 (len = 100 each) and ACKs ack = 101, 201, 301, 401, 501, 601]
79. Seq ACK
Remember: an ACK shows what the host expects next
Here, B received everything up to byte 600, but not the segment starting at 601
[Figure: ack = 401, ack = 501, ack = 601]
80. Seq ACK
[Figure: segments seq = 601 (len = 100), seq = 701 (len = 50) and seq = 701 (len = 50) again; ACKs ack = 701 (twice) and ack = 751]
81. Done !
We just transferred bytes with TCP
We managed to cope with lost segments
Remember: TCP peers keep communicating with each other
The sender retransmits what the receiver missed
83. Selective ACK (SACK)
The ACK system is nice but has a drawback
By design, TCP resends every segment from the point where the loss occurred
TCP does not ACK each received segment individually
Instead, TCP ACKs the highest contiguous byte received
SACK is about ACKing segments
To better detect holes, and only resend lost segments
84. TCP default ACK drawbacks
The answer from B to A is 101 followed by 3 times 201
That means that B has received the first 2 segments OK
And that B received a total of 4 segments
Where are the holes? Which segments have been lost?
Segment 201? 301? 401?
By default, TCP must resend from 201 to 501 (3 segments)
[Figure: A sends five segments of len = 100 with seq = 1, 101, 201, 301, 401; B answers ack = 101, then ack = 201 three times]
85. SACK in a word
SACK is an option that allows ACKs to signal a range of noncontiguous data received so far
The sender then has a better idea of which segments have been received by the receiver, or not
SACK is a TCP Option (RFC 2018)
It must be negotiated in the handshake
Both A and B must support it to use it
86. SACK example
[Figure: A sends five segments of len = 100 with seq = 1, 101, 201, 301, 401; B answers ack = 101, ack = 201, then ack = 201 with sack = 301, then ack = 201 with sack = 301-401; A re-sends only seq = 201, len = 100; B answers ack = 501]
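The benefit of SACK in this example can be checked with a few lines. The sketch below decides which sent segments still need retransmission, given the cumulative ACK and the SACK blocks. It assumes blocks are (left, right) byte ranges with the right edge exclusive, as in RFC 2018, so the figure's "sack = 301-401" (segments 301 and 401 received) becomes (301, 501). The function name is made up.

```python
def segments_to_retransmit(sent, ack, sack_blocks):
    """Return the (seq, length) segments not covered by the ACK or a SACK block."""
    missing = []
    for seq, length in sent:
        end = seq + length  # first byte after this segment
        if end <= ack:
            continue        # covered by the cumulative ACK
        if any(left <= seq and end <= right for left, right in sack_blocks):
            continue        # covered by a SACK block
        missing.append((seq, length))
    return missing

sent = [(1, 100), (101, 100), (201, 100), (301, 100), (401, 100)]

# Without SACK: everything from byte 201 must be resent
print(segments_to_retransmit(sent, ack=201, sack_blocks=[]))
# With SACK: only the real hole is resent
print(segments_to_retransmit(sent, ack=201, sack_blocks=[(301, 501)]))
```

With the SACK block, one segment is resent instead of three.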
88. Sliding Window size
The window size is how many bytes the receiving side is able to process
This is typically the size of its buffer
It is also the maximum number of bytes that can be in flight
[Figure: two segments, seq = 645 and seq = 1645, len = 1000 each: 2000 bytes in flight]
89. Sliding Window size
When host B receives those, it puts the segments back together in a buffered stream
It effectively recreates the sent stream on its side
TCP pushes the bytes up to layer 7
Now, B's layer 7 application must read that buffer
recv()
[Figure: recv() moves bytes from the TCP buffer to the layer 7 buffer]
90. Sliding Window size
What if B processes the bytes too slowly?
By reading e.g. only 10 bytes at a time from the received stream
B's TCP receive buffer will saturate
Because B doesn't process it fast enough at layer 7
If A keeps sending, B won't be able to handle the traffic
Congestion will appear
Resources will be wasted, as well as network bandwidth
Packets will be lost and will need to be re-sent
recv(10)
[Figure: recv(10) drains the TCP buffer into the layer 7 buffer only 10 bytes at a time]
91. Window size advertisement
The receiver advertises its window size to the sender, depending on how full its buffer is
[Figure: A sends 4000 bytes as seq = 42 (len = 1000), seq = 1042 (len = 1500) and seq = 2542 (len = 1500); B answers ack = 1042, win = 500; A then sends 500 bytes as five segments of len = 100 with seq = 1042, 1142, 1242, 1342, 1442]
92. Window size advertisement
The receiver advertises its receiving capabilities to the sender
ack = 1042
win = 500 (window size)
Ouch, please don't send me more than 500 bytes for now!
93. Window 0
Eventually, the receiver will completely saturate
It will advertise a 0 window
ack = 1042
win = 0
Please, stop any transfer
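The advertised window is just the free space left in the receive buffer. A minimal sketch of a receiver's bookkeeping follows; the class and method names are hypothetical, and the 4500-byte capacity is chosen so that 4000 buffered bytes leave win = 500, as in the earlier figure.

```python
class ReceiveBuffer:
    """Tracks how full a TCP receive buffer is and what window to advertise."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffered = 0  # bytes received but not yet read by layer 7

    def window(self):
        return self.capacity - self.buffered

    def on_segment(self, length):
        """Buffer an incoming segment and return the window to advertise."""
        self.buffered += min(length, self.window())  # never overfill the buffer
        return self.window()

    def app_read(self, n):
        """Layer 7 calls recv(n): free up to n bytes."""
        n = min(n, self.buffered)
        self.buffered -= n
        return n

buf = ReceiveBuffer(capacity=4500)
for length in (1000, 1500, 1500):
    win = buf.on_segment(length)
print(win)                   # 500
print(buf.on_segment(500))   # 0: "Please, stop any transfer"
buf.app_read(10)             # a slow application reads only 10 bytes
print(buf.window())          # 10
```

The window only reopens as fast as the application reads, which is how a slow layer 7 reader throttles the sender.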
98. Types of congestion
So far, we've seen how hosts can advertise their capabilities
The receiving host tells the sending host about its receiving capabilities
By adjusting the sliding window size
That is, the number of bytes allowed in flight
This way, the sender will never saturate the receiver
But how do we detect congestion in the network (in the middle), and what should we do in such cases?
99. The need to detect congestion
Remember that TCP is blind about IP traffic*
TCP must continuously guess the IP layer state from the endpoints (point to point)
TCP must measure and evaluate continuously
RTT
Dup ACKs, meaning packet loss
Advertised window size
And TCP must run specific algorithms when congestion is detected
*: congestion signals exist (IP ECN)
100. Network Congestion
[Figure: segments 1 2 3, then 2 again]
Detecting network congestion is easy
Lost packets (dup ACKs)
RTT increasing
But getting the congestion to subside smoothly, without making it worse, is a challenge
TCP must do its best to use the network smoothly
TCP must guess, and act on the unknown network state
101. Network congestion avoidance algorithms (RFC 2001)
TCP Slow Start
Congestion Avoidance mode (Reno, Tahoe, NewReno, Westwood)
Fast Retransmit
Fast Recovery
Algorithms implemented in modern TCP stacks
Compute an "image" of the underlying network
By taking some measurements
Act and adapt accordingly to reduce congestion
102. TCP Slow Start
When you start sending, send only one MSS, even if the receive window is larger
When its ACK arrives, allow one more MSS in flight: send 2 MSS
Even if the window is larger than that
When those 2 ACKs arrive, allow two more MSS in flight: send 4 MSS
Even if the window is larger than that
This is exponential growth
And so on, until congestion is detected, or until a special threshold called ssthresh is reached
This prevents a new connection from suddenly flooding the link
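The exponential growth comes from a simple rule: every ACK received lets the sender put one more MSS in flight. The sketch below simulates this round trip by round trip, assuming no loss and one ACK per full segment; the function name, the variable name cwnd (the sender-side window) and the 8000-byte ssthresh are illustrative assumptions.

```python
def slow_start(mss, ssthresh, rounds):
    """Return the number of bytes sent in each round trip during slow start."""
    cwnd = mss  # start with a single MSS
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd >= ssthresh:
            break                # slow start ends at ssthresh
        acks = cwnd // mss       # one ACK per full segment sent
        cwnd = min(cwnd + acks * mss, ssthresh)  # +1 MSS per ACK
    return history

print(slow_start(mss=1000, ssthresh=8000, rounds=10))  # [1000, 2000, 4000, 8000]
```

Adding one MSS per ACK doubles the window every round trip, so "slow" start reaches the threshold in a handful of round trips.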
103. TCP Slow Start (mss = 1000)
[Figure: A first sends 1000 bytes (seq = 1 and seq = 501, len = 500 each); B answers ack = 501 and ack = 1001, win = 5000. A then sends 2000 bytes (seq = 1001 and seq = 2001, len = 1000 each); B answers ack = 3001, win = 7000. A then sends 4000 bytes (seq = 3001, 4001, 5001, 6001, len = 1000 each)]
104. Slow Start from RFC
Beginning transmission into a network with unknown conditions requires TCP to slowly probe the network to determine the available capacity, in order to avoid congesting the network with an inappropriately large burst of data. The slow start algorithm is used for this purpose at the beginning of a transfer, or after repairing loss detected by the retransmission timer
105. Congestion Avoidance algo
When network congestion is detected, shrink the sender's window
Even if the advertised recv window is larger
Usually, it is divided by two
Then, don't slow start again, but use a smoother algorithm
It also applies after Slow Start, once ssthresh is reached
Only grow the window by one MSS per round trip (linear growth)
Instead of doubling it each round trip (exponential growth)
106. TCP Slow Start and Congestion avoidance
Start exponentially (assume bandwidth is
available)
Continue linearly (assume full bandwidth will be
reached soon)
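The two phases, plus the halving on loss, can be put together in a toy trace of the sender window per round trip. This is only a sketch under simplifying assumptions: a loss is modelled as "this round trip saw a drop", the window is halved and growth continues linearly (roughly Reno-style), and the names `cwnd_trace` and `loss_rounds` are invented for the example.

```python
def cwnd_trace(mss, ssthresh, rounds, loss_rounds=()):
    """Return the sender window (in bytes) at the start of each round trip."""
    cwnd = mss
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            # Congestion detected: halve the window and remember it as ssthresh.
            ssthresh = max(cwnd // 2, 2 * mss)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every round trip, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one more MSS per round trip.
            cwnd += mss
    return trace


print(cwnd_trace(mss=1000, ssthresh=8000, rounds=10, loss_rounds={6}))
# -> [1000, 2000, 4000, 8000, 9000, 10000, 11000, 5500, 6500, 7500]
```

The trace shows the exponential start, the linear climb past ssthresh, and the drop to half the window after the loss in round 6.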
107. TCP congestion avoidance
We send 3000 bytes on the wire
We got a duplicate ACK, meaning a packet got lost
The repeated ack = 2001 means the segment starting at seq 2001 has been lost
As we detected a packet loss, we will now send
half as many bytes: 1500
3000 bytes sent
Sender: seq = 1, len = 1000
Sender: seq = 1001, len = 1000
Receiver: ack = 1001, win = 5000
Sender: seq = 2001, len = 1000 (lost)
Receiver: ack = 2001, win = 5000
Receiver: ack = 2001, win = 5000 (duplicate)
108. TCP congestion avoidance
Only one ACK for two segments sent
Again: something got lost (ack 2501 was expected)
Even though we already divided the payload by two
We must divide it by two again: 750 bytes
Look at the advertised window
It keeps growing, meaning the receiver is up and wants more
bytes
But the network keeps dropping them: it is congested
1500 bytes sent
Sender: seq = 1001, len = 1000
Sender: seq = 2001, len = 500
Receiver: ack = 2001, win = 7000
109. TCP congestion avoidance
By sending smaller segments, TCP managed to make
the network deliver them
Remember
The more data in flight, the more capacity the network needs
It has little to do with the number of packets in flight!
Routers have finite-length buffers
If they are exceeded, routers start dropping packets
750 bytes sent
Sender: seq = 1001, len = 200
Receiver: ack = 1201, win = 9000
Sender: seq = 1201, len = 200
Receiver: ack = 1401, win = 9000
Sender: seq = 1401, len = 200
Sender: seq = 1601, len = 150
Receiver: ack = 1751, win = 9000
110. Fast Retransmit
The sender analyzes duplicate ACKs from the receiver,
and tries to guess the « holes » in the stream
Possibly helped by SACK
It then retransmits the missing segments
immediately
Without waiting for the retransmission timer of those segments to expire
This is the fast retransmit algorithm
111. Fast Retransmit
Here it is clear that segment 301 is missing
while segment 401 has been received
Don’t wait for the segment 301 timer to expire:
resend!
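Sender: seq = 1, len = 100
Receiver: ack = 101
Sender: seq = 101, len = 100
Sender: seq = 201, len = 100
Sender: seq = 301, len = 100 (lost)
Sender: seq = 401, len = 100
Receiver: ack = 201
Receiver: ack = 301
Receiver: ack = 301 (duplicate)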
112. Fast Retransmit
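Sender: seq = 1, len = 100
Receiver: ack = 101
Sender: seq = 101, len = 100
Sender: seq = 201, len = 100
Sender: seq = 301, len = 100 (lost)
Sender: seq = 401, len = 100
Receiver: ack = 201
Receiver: ack = 301
Receiver: ack = 301 (duplicate)
Sender: seq = 301, len = 100 (fast retransmit)
Receiver: ack = 501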
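The sender-side decision can be sketched as a small duplicate-ACK counter. This is an illustration, not a real TCP implementation: the function name and the `dup_threshold` parameter are invented here. Standard stacks wait for three duplicate ACKs before retransmitting (RFC 5681), while the slides react to the first duplicate to keep the drawing short.

```python
def fast_retransmit_points(acks, dup_threshold=3):
    """Return the sequence numbers to retransmit, given ACKs in arrival order."""
    last_ack = None
    dup_count = 0
    to_resend = []
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            # Retransmit once, exactly when the threshold is reached.
            if dup_count == dup_threshold:
                to_resend.append(ack)  # the ACK number is the first missing byte
        else:
            last_ack = ack
            dup_count = 0
    return to_resend


# ACK stream from the slide: ack = 301 is repeated once.
print(fast_retransmit_points([101, 201, 301, 301], dup_threshold=1))  # -> [301]
# With the standard threshold, three duplicates are needed.
print(fast_retransmit_points([101, 201, 301, 301, 301, 301]))  # -> [301]
```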
114. You can feel those algorithms every day
You have already felt such algorithms at work
Share an L2 switch with a friend, both plugged in
Plug in a router to the Internet
You start a download. You get full bandwidth
Your friend (same L2) starts a download
… What happens to you ?
What happens to him ?
115. TCP is magic
TCP continuously tries to be fair and to share the
overall Layer 3 bandwidth between all hosts,
whatever that bandwidth is
Every TCP connection auto-regulates
Leaving an equal share of traffic to its siblings
Assuming no QoS rule
As the Layer 3 link gets saturated, every Layer 4
connection slows down so as not to oversaturate the link
As the Layer 3 link frees up, every
connection re-accelerates, until full bandwidth is
reached
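Why the sharing tends toward equal parts can be seen with a toy model of two connections on one link. This is a sketch under strong assumptions: both flows add the same amount each round, both see the loss in the same round, and both halve their rate (additive increase, multiplicative decrease). The function name and the numbers are made up for the illustration.

```python
def aimd_two_flows(rate_a, rate_b, capacity, rounds, step=1.0):
    """Simulate two flows sharing one link; return the final (rate_a, rate_b)."""
    for _ in range(rounds):
        # Additive increase: each flow probes for a bit more bandwidth.
        rate_a += step
        rate_b += step
        # Assumption: when the link is overloaded, both flows detect a loss
        # in the same round and both halve their rate.
        if rate_a + rate_b > capacity:
            rate_a /= 2
            rate_b /= 2
    return rate_a, rate_b


a, b = aimd_two_flows(rate_a=1.0, rate_b=80.0, capacity=100.0, rounds=200)
print(f"flow A: {a:.1f}, flow B: {b:.1f}, gap: {abs(a - b):.2f}")
```

The increase step leaves the gap between the two flows unchanged and every halving cuts it in two, so the flows drift toward the same rate even though one started with almost the whole link.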
116. Thank you TCP
Isn’t that a terribly magical and fascinating protocol ?
Invented in the 70's
Vint Cerf - Bob Kahn
Standardized in 1981 (RFC 793)
Still under development
Networks evolve, and get faster and faster
Hence the need for newer algorithms
Lots of RFCs
Changes are pushed in kernel and Windows
versions/updates
Be up to date
117. Bandwidth calculation
Theoretical BW calculation is very easy
You have a window size of 70 KB
Your RTT is 30 ms
Then, assuming no packet loss, your max bandwidth will be
BW = WinSize / RTT
70*1024*8 / (30*10^-3) ≈ 19.1 Mbit/s (18.2 Mbit/s if 1 Mbit = 1024^2 bits)
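The same calculation as a small helper, to make the byte-to-bit conversion explicit. The function name is hypothetical; it simply applies BW = WinSize / RTT and assumes no packet loss.

```python
def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput in bit/s: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


bw = max_throughput_bps(window_bytes=70 * 1024, rtt_seconds=0.030)
print(f"{bw / 1e6:.1f} Mbit/s")  # -> 19.1 Mbit/s
```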
118. Improving Bandwidth
Reduce RTT
Get better paths (better peering)
Increase WinSize
Sysctl under Linux
BW = WinSize / RTT
119. Optimal Window size
If you have a 1 Gbps link
And your RTT is 30 ms
Then you should set the window size to
BW = WinSize / RTT
WinSize = BW * RTT
1*1024^3 / 8 * 30*10^-3 ≈ 3.84 MB (about 3.75 MB if 1 Gbps = 10^9 bit/s)
If you set less : you’ll underuse the network
If you set more : you’ll saturate the link and lose bandwidth to drops
Take care of available memory on your host
You should also enable SACK
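A short sketch of the reverse calculation, and of what "underuse" means in practice. Both function names are invented for this example, and 1 Gbps is taken as 10^9 bit/s here.

```python
def optimal_window_bytes(bandwidth_bps, rtt_seconds):
    """Window needed to keep the link full: WinSize = BW * RTT, in bytes."""
    return bandwidth_bps * rtt_seconds / 8


def link_utilisation(window_bytes, bandwidth_bps, rtt_seconds):
    """Fraction of the link a single connection can fill with this window."""
    return min(1.0, window_bytes * 8 / rtt_seconds / bandwidth_bps)


win = optimal_window_bytes(bandwidth_bps=1e9, rtt_seconds=0.030)
print(f"optimal window: {win / 1e6:.2f} MB")  # -> optimal window: 3.75 MB

# A window limited to 65535 bytes on the same path fills under 2 % of the link.
print(f"{link_utilisation(65535, 1e9, 0.030):.1%}")
```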
125. TCP performances
Computing the checksum costs a lot of CPU
Nowadays we transfer at 1 Gb/s or 10 Gb/s rates
Several GHz of processing power are needed for such rates
TCP segmentation has a cost on both Tx and Rx
Memory movements
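To give an idea of the work involved, here is a minimal sketch of the 16-bit one's complement checksum used by TCP, following RFC 1071. It is applied here to a raw byte string only; a real TCP checksum also covers a pseudo-header and the TCP header, which this example leaves out. Every byte of every segment goes through this loop, which is why the cost adds up at high rates.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Add the data as big-endian 16-bit words.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold the carries back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Example bytes from RFC 1071.
print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # -> 0x220d
```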
126. TCP offload
Offloading means delegating a part of the job to the NIC
TOE : TCP Offload Engine
A full TCP stack embedded into a chip on the NIC
Linux never accepted that
TSO
TCP Segmentation Offload : offload the segmentation at Tx
level
LRO
Large Receive Offload : offload the de-segmentation at Rx
level
Checksum offload
Compute the checksum on the NIC
127. TCP offload at the NIC level
> jpauli@840G3:~$ sudo ethtool -k enp0s31f6
Features for enp0s31f6:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on