TCP Issues in DataCenter Networks

Hemanth Kumar Mantri, Graduate Teaching Assistant
TCP Issues in Virtualized Datacenter Networks
Hemanth Kumar Mantri
Department of Computer Science
1 of 27
Selected Papers
• The TCP Outcast Problem: Exposing
Unfairness in Data Center Networks.
– NSDI’12
• vSnoop: Improving TCP Throughput in
Virtualized Environments via Ack Offload.
– ACM/IEEE SC, 2010
2 of 27
Background and Motivation
• Data center is a shared environment
– Multi Tenancy
• Virtualization: A key enabler of cloud
computing
– Amazon EC2
• Resource sharing
– CPU/Memory are strictly shared
– Network sharing largely laissez-faire
3 of 27
Data Center Networks
• Flows compete via TCP
• Ideally, TCP should achieve true fairness
– All flows get equal share of link capacity
• In practice, TCP exhibits RTT-bias
– Throughput is inversely proportional to RTT
• 2 Major Issues
– Unfairness (in general)
– Low Throughput (in virtualized environments)
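The RTT bias above can be made concrete with the widely used Mathis et al. approximation, throughput ≈ (MSS / RTT) · C / √p. This formula is not from the slides; it is brought in here only as a rough sketch, and the MSS, RTT, and loss-rate values below are assumed for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Approximate steady-state TCP throughput in bits per second.

    Uses the Mathis et al. model: (MSS / RTT) * C / sqrt(p).
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Two flows with the same loss rate; the far flow has twice the RTT (assumed values).
near = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.0002, loss_rate=0.001)
far = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.0004, loss_rate=0.001)
ratio = near / far  # doubling the RTT halves the predicted throughput
print(f"near/far throughput ratio: {ratio:.2f}")
```

Under this model the flow with the shorter RTT should win, which is why a starved short-RTT flow is a surprising result.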
4 of 27
Datacenter Topology (Hierarchical)
5 of 27
Traffic Pattern: Many to One
6 of 27
Key Finding: Unfairness
Inverse RTT Bias?
Low RTT = Low Throughput
7 of 27
Further Investigation
[Figure panels: Instantaneous, Average]
2-hop flow is consistently starved!!
TCP Outcast Problem
• Some Flows are ‘Outcast’ed and receive very low
throughput compared to others
• Almost an order of magnitude reduction in some
cases
8 of 27
Experiments
• Same RTTs
• Same Hop Length
• Unsynchronized Flows
• Introduce Background Traffic
• Vary Switch Buffer Size
• Vary TCP
– RENO, MP-TCP, BIC, Cubic + SACK
• Unfairness Persists!
9 of 27
Observation
Flow differential at input ports is the culprit!
10 of 27
Vary the number of competing flows at the
bottleneck switch
11 of 27
Reason: Port Blackout
1. Packets are roughly same size
2. Similar inter-arrival rates (Predictable Timing)
12 of 27
Port Blackout
• Can occur on any input port
• Happens for small intervals of time
• Has a more catastrophic effect on the
throughput of fewer flows!!
– Experiments showed that the same number of
packet drops hurts throughput much more when
it falls on a few flows than when it is spread
over several concurrent flows.
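A toy model of the mechanism, assuming a tail-drop output queue that starts full, drains one packet per tick, and receives one packet per tick from each input port at a fixed phase. The function names, port names, and the even split of drops across flows are assumptions made for this sketch, not the paper's experiment.

```python
def simulate_port_blackout(arrival_offsets, capacity, ticks):
    """Count tail drops per input port for a congested output queue.

    arrival_offsets maps a port name to the point within each tick at which
    that port's packet reaches the output queue. One packet departs at the
    start of every tick, and each port offers one packet per tick.
    """
    occupancy = capacity  # start with a full queue (congested output port)
    drops = {port: 0 for port in arrival_offsets}
    arrival_order = sorted(arrival_offsets, key=arrival_offsets.get)
    for _ in range(ticks):
        if occupancy > 0:
            occupancy -= 1  # one packet leaves on the output link
        for port in arrival_order:
            if occupancy < capacity:
                occupancy += 1  # room in the queue: packet accepted
            else:
                drops[port] += 1  # tail drop: queue is full on arrival
    return drops


def drops_per_flow(total_drops, flows_on_port):
    """Average drops seen by each flow if drops spread evenly over the port's flows."""
    return total_drops / flows_on_port


# Port A's packets always arrive just ahead of port B's, so A takes the freed slot.
drops = simulate_port_blackout({"A": 0.1, "B": 0.5}, capacity=8, ticks=10)
print(drops)
print(drops_per_flow(10, 1), drops_per_flow(10, 6))
```

With fixed phases the later-arriving port loses every packet in the window. The same burst of drops lands entirely on one flow if the port carries a single flow, but is diluted if the port carries several.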
13 of 27
Conditions for TCP Outcast
14 of 27
Solutions?
• Stochastic Fair Queuing (SFQ)
– Explicitly enforce fairness among flows
– Expensive for commodity switches
• Equal Length Routing
– All flows are forced to go through Core
– Better interleaving of packets, which alleviates port blackout (PB)
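A minimal sketch of the SFQ idea: hash each flow into one of a fixed number of buckets and serve the buckets round-robin. The class name, bucket count, and per-bucket limit are hypothetical, and real SFQ implementations also re-seed the hash periodically, which is omitted here.

```python
from collections import deque


class StochasticFairQueue:
    """Toy SFQ: per-bucket FIFO queues served in round-robin order."""

    def __init__(self, num_buckets, bucket_limit):
        self.buckets = [deque() for _ in range(num_buckets)]
        self.bucket_limit = bucket_limit
        self.next_bucket = 0  # where the round-robin scan resumes

    def enqueue(self, flow_id, packet):
        """Queue a packet in its flow's bucket; return False if that bucket is full."""
        bucket = self.buckets[hash(flow_id) % len(self.buckets)]
        if len(bucket) >= self.bucket_limit:
            return False
        bucket.append((flow_id, packet))
        return True

    def dequeue(self):
        """Return the next (flow_id, packet) in round-robin order, or None if empty."""
        for i in range(len(self.buckets)):
            idx = (self.next_bucket + i) % len(self.buckets)
            if self.buckets[idx]:
                self.next_bucket = (idx + 1) % len(self.buckets)
                return self.buckets[idx].popleft()
        return None


sfq = StochasticFairQueue(num_buckets=4, bucket_limit=8)
for n in range(6):
    sfq.enqueue(0, f"p{n}")  # a heavy flow queues six packets first
sfq.enqueue(1, "q0")         # a light flow queues one packet afterwards
served = [sfq.dequeue()[0] for _ in range(3)]
print(served)
```

The light flow is served second even though six packets from the heavy flow arrived ahead of it, which is the fairness property a single tail-drop FIFO lacks.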
15 of 27
VM Consolidation
• Multiple VMs hosted by one physical host
• Multiple VMs sharing the same core
– Flexibility, scalability, and economy
[Figure: VM 1, VM 2, VM 3, and VM 4 on a virtualization layer over shared hardware]
Observation: VM consolidation negatively impacts network performance!
16 of 27
Investigating the Problem
[Figure: a client (sender) and a server whose hardware and virtualization layer host VM 1, VM 2, and VM 3]
17 of 27
Effect of CPU Sharing
[Figure: RTT (ms), axis from 40 to 180, versus number of VMs (2 to 5)]
RTT increases in proportion to the VM scheduling slice (30 ms)
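A back-of-the-envelope model of why the delay grows with the number of VMs. It assumes strict round-robin scheduling in which every VM is always runnable and uses its full 30 ms slice; the function name is hypothetical and the figure's measured values are not reproduced here.

```python
def worst_case_scheduling_wait_ms(num_vms, slice_ms=30):
    """Longest time a packet may wait for its VM to be scheduled.

    Under strict round-robin, a packet that arrives just after its VM was
    descheduled waits for every other VM to run one full slice.
    """
    return (num_vms - 1) * slice_ms


waits = {n: worst_case_scheduling_wait_ms(n) for n in range(2, 6)}
print(waits)
```

Each extra VM sharing the core adds one more slice to the worst-case wait, so the added delay scales with the slice length.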
18 of 27
Exact Culprit
[Figure: sender, hardware, and the driver domain (dom0) with its device driver; VM 1, VM 2, and VM 3 each have a buffer (buf)]
19 of 27
Impact on TCP Throughput
[Figure legend: + dom0, x VM]
Connection to the VM is much slower than dom0!
20 of 27
Solution: vSnoop
• Alleviates the negative effect of VM scheduling on
TCP throughput
• Implemented within the driver domain to
accelerate TCP connections
• Does not require any modifications to the VM
• Does not violate end-to-end TCP semantics
• Applicable across a wide range of VMMs
– Xen, VMware, KVM, etc.
21 of 27
TCP Connection to a VM
[Figure: timeline with columns Sender, Driver Domain, VM1 buffer, and Scheduled VM (VM1, VM2, VM3 in rotation); labels: SYN, SYN,ACK, RTT, VM Scheduling Latency]
Sender establishes a TCP connection to VM1
22 of 27
Key Idea: Acknowledgement Offload
[Figure: timeline with columns Sender, Driver Domain, VM Shared Buffer, and Scheduled VM (VM1, VM2, VM3 in rotation); labels: SYN, SYN,ACK, w/ vSnoop]
Faster progress during TCP slow start
23 of 27
Challenges
• Challenge 1: Out-of-order/special packets (SYN, FIN packets)
– Solution: Let the VM handle these packets
• Challenge 2: Packet loss after vSnoop
– Solution: Let vSnoop acknowledge only if there is room in the buffer
• Challenge 3: ACKs generated by the VM
– Solution: Suppress/rewrite ACKs already generated by vSnoop
24 of 27
vSnoop Implementation in Xen
[Figure labels: Driver Domain (dom0), Bridge, vSnoop, Netback (one per VM, each with a buf), Netfront (in VM1, VM2, VM3), Tuning]
25 of 27
TCP Throughput Improvement
• 3 VMs consolidated, 1000 transfers of a 100KB file
• Vanilla Xen, Xen+tuning, Xen+tuning+vSnoop
[Figure legend: + Vanilla Xen, x Xen+tuning, * Xen+tuning+vSnoop]
Median: 0.192MB/s, 0.778MB/s, 6.003MB/s
30x Improvement
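The "30x" figure can be checked from the three medians on the slide. The mapping of each median to a configuration follows the order in which the slide lists them, which is an assumption of this sketch.

```python
# Median throughput in MB/s, in the order the slide lists the configurations (assumed mapping).
medians = {
    "Vanilla Xen": 0.192,
    "Xen+tuning": 0.778,
    "Xen+tuning+vSnoop": 6.003,
}


def speedup(config, baseline="Vanilla Xen"):
    """Ratio of a configuration's median throughput to the baseline's."""
    return medians[config] / medians[baseline]


tuning_gain = speedup("Xen+tuning")
vsnoop_gain = speedup("Xen+tuning+vSnoop")
vsnoop_over_tuning = speedup("Xen+tuning+vSnoop", baseline="Xen+tuning")
print(f"{tuning_gain:.1f}x, {vsnoop_gain:.1f}x, {vsnoop_over_tuning:.1f}x")
```

Under that mapping, tuning alone gives roughly 4x over vanilla Xen, and adding vSnoop gives roughly 31x overall, consistent with the slide's rounded 30x.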
26 of 27
Thank You!
• References
– http://friends.cs.purdue.edu/dokuwiki/doku.php
– https://www.usenix.org/conference/nsdi12/tech-schedule/technical-sessions
• Most animations and pictures are taken from
the authors’ original slides and NSDI’12
conference talk.
27 of 27
BACKUP SLIDES
28
Conditions for Outcast
• Switches use the tail-drop queue
management discipline
• A large set of flows and a small set of
flows arriving at two different input ports
compete for a bottleneck output port at a
switch
29
Why does Unfairness Matter?
• Multi Tenant Clouds
– Some tenants get better performance than
others
• Map Reduce Apps
– Straggler problems
– One delayed flow affects overall job
completion
30
State Machine Maintained Per-Flow
[Figure: per-flow state machine. States: Start, Active (online), No buffer (offline), Unexpected Sequence. Transition labels: packet recv, in-order pkt, out-of-order packet, buffer space available, no buffer. Actions: early acknowledgements for in-order packets; don't acknowledge; pass out-of-order pkts to VM]
31
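One plausible reading of the state machine's labels, written as a small per-flow tracker. The exact transitions in the original figure are not recoverable from the text, so the class, its fields, and the slot-based buffer accounting are assumptions for illustration rather than vSnoop's actual implementation.

```python
from enum import Enum


class State(Enum):
    START = "Start"
    ACTIVE = "Active (online)"
    NO_BUFFER = "No buffer (offline)"
    UNEXPECTED_SEQ = "Unexpected Sequence"


class Action(Enum):
    EARLY_ACK = "early acknowledgement"
    NO_ACK = "don't acknowledge"
    PASS_TO_VM = "pass to VM"


class FlowSnoop:
    """Hypothetical per-flow tracker; names and fields are illustrative."""

    def __init__(self, expected_seq, free_slots):
        self.state = State.START
        self.expected_seq = expected_seq  # next in-order sequence number
        self.free_slots = free_slots      # room left in the VM's shared buffer

    def on_packet(self, seq, length):
        """Classify an incoming data packet and pick the action for it."""
        if seq != self.expected_seq:
            # Out-of-order packet: leave it to the VM's own TCP stack.
            self.state = State.UNEXPECTED_SEQ
            return Action.PASS_TO_VM
        if self.free_slots == 0:
            # In-order but no buffer room: an early ACK could not be honoured.
            self.state = State.NO_BUFFER
            return Action.NO_ACK
        # In-order with buffer room: acknowledge on the VM's behalf.
        self.free_slots -= 1
        self.expected_seq += length
        self.state = State.ACTIVE
        return Action.EARLY_ACK

    def on_vm_drained(self, slots):
        """The VM consumed buffered packets, freeing buffer space."""
        self.free_slots += slots


flow = FlowSnoop(expected_seq=1000, free_slots=1)
first = flow.on_packet(1000, 100)   # in-order, room available
second = flow.on_packet(1100, 100)  # in-order, buffer now full
flow.on_vm_drained(1)
third = flow.on_packet(1100, 100)   # retransmission after space frees up
fourth = flow.on_packet(5000, 100)  # out-of-order
print(first, second, third, fourth)
```

Acknowledging only when buffer space is available is what keeps an early ACK from promising delivery of a packet that is later dropped before the VM sees it.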
vSnoop’s Impact on TCP Flows
• Slow Start
– Early acknowledgements help progress
connections faster
– Most significant benefit for short transfers that are
more prevalent in data centers
• Congestion Avoidance and Fast Retransmit
– Large flows in the steady state can also benefit
from vSnoop
– Benefit not as much as for Slow Start
32
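A rough sketch of why short transfers gain the most: in slow start, completion time is about the number of window-doubling rounds times the RTT, so cutting the RTT cuts the transfer time almost proportionally. The MSS, the initial window of 3 segments, and both RTT values are assumed; loss, delayed ACKs, and receiver-window limits are ignored.

```python
import math


def slow_start_rounds(transfer_bytes, mss_bytes=1460, initial_cwnd=3):
    """Round trips needed to send a transfer while cwnd doubles every RTT."""
    segments = math.ceil(transfer_bytes / mss_bytes)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd  # one window's worth of segments per round trip
        cwnd *= 2     # slow start doubles the window each round
        rounds += 1
    return rounds


rounds = slow_start_rounds(100 * 1024)  # the 100KB file used in the evaluation
slow_ms = rounds * 60  # hypothetical RTT inflated by VM scheduling
fast_ms = rounds * 1   # hypothetical RTT with early acknowledgements
print(rounds, slow_ms, fast_ms)
```

A 100KB file never leaves slow start under these assumptions, so every round pays the full scheduling-inflated RTT unless acknowledgements come back early.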