Cutting Through the Fog
of Virtualization
Bernd Bandemer
Head of Data Science at Clockwork.io
■ Accurate, scalable, and stable clock sync is
a game changer in distributed computing
■ My prior lives:
Aruba Networks, Stanford, originally from Germany
The Promise of Virtualization
■ Dynamic demand automatically drives scale-up / scale-down
■ Resource guarantees
● Every VM behaves the same, independent of location, date, time of day
● Every network link between VMs behaves the same
■ Resource isolation
● Your neighbors won't bother you
● Your own VMs won't affect each other
VM Colocation
[Figure: data center topology. Fabric switches connect to top-of-rack switches, which connect to physical machines (hosts). The hosts run virtual machines; YOUR virtual machines are highlighted, some of them colocated on the same host.]
VM Colocation vs. Shared Tenancy
■ Shared Tenancy
● Load on VMs is independent of each other
● Load spikes average out across the VMs
VM Colocation is much worse than shared tenancy
■ VM Colocation
● VMs participate in the same workload at the same time
● Simultaneous load spikes, averaging does not help
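The difference between the two cases can be illustrated with a toy simulation. This is a minimal sketch, not data from the talk: the spike probability, the load levels, and the function name `peak_to_mean` are all assumptions chosen for illustration. It compares the peak-to-mean ratio of the aggregate load on one host when VMs spike independently (shared tenancy) and when they spike together (colocation).

```python
import random

def peak_to_mean(num_vms, correlated, steps=10_000, spike_prob=0.05, seed=0):
    """Return peak / mean of the aggregate load that one host sees.

    Each VM contributes a load of 1.0 normally and 10.0 during a spike
    (hypothetical units, chosen only for illustration).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(steps):
        if correlated:
            # Colocation: the VMs serve the same workload, so they spike together.
            spike = rng.random() < spike_prob
            total = num_vms * (10.0 if spike else 1.0)
        else:
            # Shared tenancy: every VM decides independently whether it spikes.
            total = sum(10.0 if rng.random() < spike_prob else 1.0
                        for _ in range(num_vms))
        totals.append(total)
    return max(totals) / (sum(totals) / len(totals))

independent = peak_to_mean(8, correlated=False)
simultaneous = peak_to_mean(8, correlated=True)
print(f"independent spikes:  peak/mean = {independent:.2f}")
print(f"simultaneous spikes: peak/mean = {simultaneous:.2f}")
```

With independent spikes the aggregate peak stays well below the all-VMs-spiking worst case, while simultaneous spikes reach it every time the shared workload bursts.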
Potential Effects of VM Colocation
■ CPU and memory are well isolated between VMs
■ Networking is not well isolated
● Bandwidth: colocated VMs share a physical network interface (NIC)
● Latency: packets between colocated VMs don't travel through the network
● Packet drops: packets between colocated VMs can't get dropped in the network
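The bandwidth concern can be made concrete with a line of arithmetic. The sketch below is an idealized bound, not a model of any provider: it assumes a hypothetical NIC capacity and an equal split among VMs that transmit at the same time. Real clouds apply their own per-VM rate limits, so measured behavior can differ.

```python
def fair_share_gbps(nic_capacity_gbps, active_vms):
    """Idealized per-VM bandwidth when active_vms share one NIC equally."""
    if active_vms < 1:
        raise ValueError("need at least one active VM")
    return nic_capacity_gbps / active_vms

# Hypothetical 10 Gbps NIC shared by 1, 2, 4, or 8 simultaneously busy VMs.
for k in (1, 2, 4, 8):
    print(f"{k} colocated VMs -> at most {fair_share_gbps(10.0, k):.2f} Gbps each")
```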
How do these effects play out
in practice?
Real-world Data
■ ~3,000 test cluster instances
■ Amazon EKS, Google GKE, Azure AKS
■ Three geographic regions:
Eastern US (Virginia), Western Europe (London), Southeast Asia (Singapore)
■ Each cluster
● Bring up 50 virtual machine instances
● Instrumented with Clockwork's clock sync system
● Latency Sensei Audit, which includes several phases of network load
Determining VM Colocation
■ Clock sync system measures relative
clock offsets and clock drifts
● This is done by exchanging small UDP
packets in a probe mesh and measuring
their transit times
■ Colocated VMs share a physical
system clock
● This can be detected and used to
reverse-engineer the colocation
● Validated by many experiments, including
sole-tenant hosts
[Figure: example colocation structure]
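One way to picture the detection step: if two VMs share a physical clock, their relative clock drift should be close to zero, so VMs can be grouped by linking every pair whose relative drift falls below a tolerance. The sketch below is an assumption-laden illustration, not Clockwork's algorithm. The input format (a dict of pairwise relative drifts in ppm), the tolerance `tol_ppm`, and the sample values are all hypothetical.

```python
def colocation_groups(rel_drift_ppm, num_vms, tol_ppm=0.01):
    """Group VMs whose pairwise relative clock drift is within tol_ppm.

    rel_drift_ppm maps (vm_a, vm_b) -> measured relative drift in ppm.
    Uses a small union-find so that linked pairs merge into groups.
    """
    parent = list(range(num_vms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), drift in rel_drift_ppm.items():
        if abs(drift) <= tol_ppm:
            parent[find(a)] = find(b)

    groups = {}
    for vm in range(num_vms):
        groups.setdefault(find(vm), []).append(vm)
    return sorted(groups.values())

# Hypothetical measurements for five VMs.
drifts = {
    (0, 1): 0.001,   # near-zero drift: likely the same physical clock
    (0, 2): 3.2,
    (1, 2): 3.2,
    (2, 3): 0.002,   # near-zero drift: likely the same physical clock
    (0, 4): -7.5,
    (3, 4): -10.7,
}
groups = colocation_groups(drifts, num_vms=5)
print(groups)
```

A production system would also need to handle measurement noise and drift that changes over time; a fixed tolerance is the simplest possible choice.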
Network Bandwidth
■ We measure network bandwidth by sending long TCP flows between the VMs
[Figure: colocation structure alongside egress bandwidth, axis from 0 to 10 Gbps]
Colocated VMs have severely reduced network bandwidth
Network Bandwidth on Google Cloud
[Figure panels: n1-standard-4, n2-standard-4]
Bandwidth is impacted when 3 or more VMs are colocated
Network Bandwidth on AWS
[Figure panels: m4.xlarge, m5.xlarge]
Colocation is purposely limited; no bandwidth impact
Network Bandwidth on Azure
[Figure panels: Standard_D4s_v3, Standard_D4s_v4]
Bandwidth impact appears when more than 4 VMs are colocated
Network Bandwidth on Azure
During low-load times, Azure lifts the speed limit
[Figure: bandwidth axis from 0 to 20 Gbps]
Network Latency
■ We measure two-way delay between any two VMs with high accuracy
● Two-way delay is the sum of two one-way delays
● Excludes the effect of ACK turn-around time and sender/receiver stack delays
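The definition above can be written out with four timestamps per probe exchange. This is a minimal sketch with assumed names (`t1` to `t4`), not the measurement code behind the talk. `t1` and `t4` are read on the sender's clock, `t2` and `t3` on the receiver's clock, and all four are assumed to be taken close enough to the wire that stack delays are not included.

```python
def two_way_delay(t1, t2, t3, t4):
    """Sum of the two one-way delays, in the same unit as the timestamps.

    t1: probe leaves the sender        (sender clock)
    t2: probe arrives at the receiver  (receiver clock)
    t3: reply leaves the receiver      (receiver clock)
    t4: reply arrives at the sender    (sender clock)
    The turn-around time t3 - t2 at the receiver is not counted.
    """
    forward = t2 - t1   # one-way delay, sender -> receiver
    reverse = t4 - t3   # one-way delay, receiver -> sender
    return forward + reverse

# Hypothetical timestamps in nanoseconds with perfectly synced clocks.
synced = two_way_delay(0, 50_000, 250_000, 290_000)

# The same exchange with the receiver clock 1 ms ahead: each one-way
# delay is now wrong, but the error cancels in the sum.
offset = 1_000_000
skewed = two_way_delay(0, 50_000 + offset, 250_000 + offset, 290_000)

print(synced, skewed)
```

The second call shows why the two-way delay is robust to a constant clock offset, whereas the individual one-way delays are only meaningful when the clocks are accurately synchronized.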
Network Latency on AWS
AWS virtual networking hides any potential latency benefit
■ No visible difference between
colocated and non-colocated
pairs of VMs
■ Two distinct modes, probably
explained by different generations
of networking implementation
Network Latency on Azure
In Azure, colocated VMs have higher latency
■ Azure's accelerated networking
optimizes the typical case
■ VMs on the same physical host raise an
exception that is handled in software
Network Latency on Google Cloud
In Google Cloud, each region/instance type behaves differently
Packet Drops
Network Packet Drops
■ In each measurement run, we send tens of millions of probe packets
■ UDP packets may get lost on the way
         Non-colocated VMs   Colocated VMs
Azure    68 ppm              60 ppm
AWS      220 ppm             213 ppm
Google   60 ppm              62 ppm
Packet drop rates are independent of VM colocation
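For readers unfamiliar with the unit, ppm here means lost packets per million packets sent. The sketch below shows the arithmetic. The packet counts are hypothetical and were picked only so that the result matches the order of magnitude in the table. They are not the measured counts.

```python
def drop_rate_ppm(packets_sent, packets_lost):
    """Packet drop rate in parts per million (lost packets per 1e6 sent)."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return packets_lost * 1_000_000 / packets_sent

# Hypothetical run: 20 million probes sent, 1,360 never arrived.
rate = drop_rate_ppm(20_000_000, 1_360)
print(f"{rate:.0f} ppm")
```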
Conclusion
■ VM colocation has a performance impact and no upside
● Highly colocated VMs have lower network bandwidth
● Colocation has no latency or reliability benefit
■ For optimal cloud system performance, VM colocation should be avoided
■ Clockwork Latency Sensei provides visibility into your cloud system
● Accurate clock sync makes VM colocation visible
● Latency Sensei audit reports highlight the impact on YOUR cloud system
Bernd Bandemer
bernd@clockwork.io
www.clockwork.io
Thank you!
