Confidential + Proprietary
Taking the Edge off with Espresso
Scale, Reliability and Programmability for Global Internet Peering
KK Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines,
Taeeun Kim, Ashok Narayanan, Ankur Jain, Victor Lin, Colin Rice, Brian Rogan, Arjun Singh, Bert Tanaka,
Manish Verma, Puneet Sood, Mukarram Tariq, Matt Tierney, Dzevad Trumic, Vytautas Valancius, Calvin Ying,
Mahesh Kallahalla, Bikash Koley, Amin Vahdat and many others.
Presented by: Piotr Marecki (bubu@google.com)
Problem Statement
Egress Terabits/sec of traffic to our Internet peers
● High-def video, cloud traffic, etc.
1. Optimize traffic per-customer and per-application
● e.g., optimal video quality, or differentiated service for cloud
Figure: traffic from Google to a user; is there an alternate path with better user experience?
● Problem: Constrained by BGP shortest path and lack of application awareness
2. Deliver new features quickly
Figure: vendor feature cycle (e.g., for a novel L2 VPN): request to vendor → commit to feature → implement → vendor testing → integration testing @ Google → deploy
● Problem: router-vendor feature cycles and qualification take many years
Espresso: Google’s SDN Peering Edge
Our previous experience with SDN
● B4 [SIGCOMM 2013] and Jupiter [SIGCOMM 2015]
● Enable flexible traffic engineering
● Increase feature velocity
SDN is only suited for walled gardens?
Peering edge requires interoperability with heterogeneous peers.
Agenda
● Problem Statement
● Espresso in Context
● Design Principles
● Architecture Overview
● Results
● Conclusion
Espresso in Context
Figure: B4 [SIGCOMM 2013] connects Google's Jupiter data centers [SIGCOMM 2015] to the metros/points of presence (PoPs); Espresso connects the metro/PoP to the Internet and the user.
Global Edge Footprint, > 100 PoPs
Map: points of presence (>100) and network fiber
Espresso’s Design Principles
1. Hierarchical control plane
○ Global optimization, while the local control plane provides fast reaction.
2. Fail static
○ Local control plane continues to function if the global controller fails.
3. Software programmability
○ Externalize features into software to exploit commodity servers for scale.
4. Testability
○ Loosely coupled control plane, automated testing and release process.
5. Manageability
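The fail-static principle can be illustrated with a minimal sketch. It assumes a hypothetical `LocalController` class that keeps the last program it received and simply goes on serving it when the global controller is unreachable; the class, its fields and the peer names are illustrative and do not come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalController:
    """Hypothetical local control plane that keeps the last good program."""

    # prefix -> egress choice (illustrative names, not from the talk)
    programmed: Dict[str, str] = field(default_factory=dict)
    stale: bool = False

    def on_global_update(self, update: Optional[Dict[str, str]]) -> None:
        """Apply a new program, or fail static when no update is available."""
        if update is None:
            # Global controller unreachable: keep forwarding on the last
            # good state and only mark it as stale.
            self.stale = True
            return
        self.programmed = dict(update)
        self.stale = False

    def egress_for(self, prefix: str) -> Optional[str]:
        """Look up the currently programmed egress for a prefix."""
        return self.programmed.get(prefix)


lc = LocalController()
lc.on_global_update({"203.0.113.0/24": "peer-a"})
lc.on_global_update(None)  # simulated global controller outage
print(lc.egress_for("203.0.113.0/24"), lc.stale)
```

The point of the sketch is that an outage changes only the `stale` flag, never the forwarding state itself.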
Architecture: Externalizing BGP
Figure: a traditional peering router (Internet-size routing/forwarding table, large ACL, eBGP peering with the external peer, host servers in the metro) versus Espresso (label-switched peering fabric facing the external peer, with the BGP speaker running on host servers in the metro).
Architecture: Reliability and Scale of BGP
Figure: a traditional peering router (Internet-size RIB/FIB, large TCAM, eBGP peering with external peers) versus Espresso, where multiple BGP speakers run on host servers in the metro behind the label-switched peering fabric.
Architecture: Externalize Packet Processing
Figure: external peer with eBGP peering, label-switched peering fabric, and hosts running the BGP speaker and the packet processor; labeled packets specify egress; ingress ACL and DoS sink are handled in the packet processor.
Host-based packet processor allows flexible packet processing, including ACL and handling of DoS.
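As a rough illustration of host-based packet processing, the sketch below performs a longest-prefix match to pick an egress label for outbound packets and a simple source check for inbound packets. The tables, label values and function names are hypothetical; the talk does not describe the actual data structures.

```python
import ipaddress
from typing import Optional

# Hypothetical tables; in a real system the control plane would program them.
EGRESS_LABELS = {
    ipaddress.ip_network("203.0.113.0/24"): 1001,
    ipaddress.ip_network("203.0.113.128/25"): 1002,
}
DENIED_SOURCES = [ipaddress.ip_network("198.51.100.0/24")]


def label_for_egress(dst: str) -> Optional[int]:
    """Longest-prefix match on the destination; the label selects the egress."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in EGRESS_LABELS if addr in net]
    if not matches:
        return None  # no route known for this destination
    best = max(matches, key=lambda net: net.prefixlen)
    return EGRESS_LABELS[best]


def ingress_allowed(src: str) -> bool:
    """Ingress ACL: reject packets whose source is in a denied prefix."""
    addr = ipaddress.ip_address(src)
    return not any(addr in net for net in DENIED_SOURCES)


print(label_for_egress("203.0.113.200"), ingress_allowed("198.51.100.7"))
```

Because both tables live in host software, growing them is a matter of server memory rather than router TCAM, which is the trade-off the slide points at.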
Architecture: Hierarchical Control
Figure: a global controller above the Espresso metro; inside the metro, Location Control and Peering Fabric Control sit above the hosts (packet processor, BGP speaker) and the label-switched peering fabric, which holds the eBGP peering with the external peer.
Architecture: Fail Static
Figure: the global controller, and the Espresso metro with Location Control, Peering Fabric Control, hosts (packet processor, BGP speaker) and the label-switched peering fabric with eBGP peering to the external peer.
Architecture: Application Aware Routing
Figure: the global controller and the Espresso metro (Location Control, Peering Fabric Control, hosts with packet processor and BGP speaker, label-switched peering fabric with eBGP peering to the external peer), annotated with RIB, FIB, ACL and application signals.
Using User’s Best Path, not BGP’s
● Serves 13% more traffic than BGP best path, in an application-aware manner.
● Helps capacity-constrained ISPs by overflowing demand to alternate paths within the local metro and also via remote metros.
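The overflow idea can be sketched as a greedy allocation: fill the preferred path first and spill the remainder onto alternates. This is only an illustration under assumed path names and capacities; it is not the optimization Espresso actually runs, which the talk does not detail.

```python
from typing import Dict, List, Tuple


def allocate(demand_gbps: float, paths: List[Tuple[str, float]]) -> Dict[str, float]:
    """Place demand on paths in preference order, overflowing when one is full."""
    allocation: Dict[str, float] = {}
    remaining = demand_gbps
    for name, capacity_gbps in paths:
        if remaining <= 0:
            break
        placed = min(remaining, capacity_gbps)
        if placed > 0:
            allocation[name] = placed
        remaining -= placed
    if remaining > 0:
        # Demand that no listed path could carry.
        allocation["unserved"] = remaining
    return allocation


# Hypothetical paths toward one ISP, in preference order.
paths = [
    ("bgp-best-path", 100.0),
    ("alternate-path-local-metro", 40.0),
    ("remote-metro", 60.0),
]
print(allocate(120.0, paths))
```

With 120 Gbps of demand, the first 100 Gbps stay on the BGP best path and the remaining 20 Gbps overflow to the alternate path in the local metro.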
Improvements in End User Experience
Client ISP   Change in mean time between rebuffers (MTBR)   Change in mean goodput
A            10 → 20 min                                    2.25 → 4.5 Mbps
B            4.6 → 12.5 min                                 2.75 → 4.9 Mbps
C            14 → 19 min                                    3.2 → 4.2 Mbps
Espresso provides significant improvements to end-user experience.
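To put the before/after figures for the three client ISPs on a common scale, the relative change can be computed as (after - before) / before. The snippet below is a small worked calculation on the numbers from the slide; the dictionary layout and the `pct_change` helper are illustrative.

```python
# (before, after) pairs taken from the slide's table.
results = {
    "A": {"mtbr_min": (10.0, 20.0), "goodput_mbps": (2.25, 4.5)},
    "B": {"mtbr_min": (4.6, 12.5), "goodput_mbps": (2.75, 4.9)},
    "C": {"mtbr_min": (14.0, 19.0), "goodput_mbps": (3.2, 4.2)},
}


def pct_change(before: float, after: float) -> float:
    """Relative change in percent."""
    return (after - before) / before * 100.0


for isp, metrics in results.items():
    mtbr = pct_change(*metrics["mtbr_min"])
    goodput = pct_change(*metrics["goodput_mbps"])
    print(f"ISP {isp}: MTBR {mtbr:+.0f}%, goodput {goodput:+.0f}%")
```

For ISP A both metrics double; ISP B shows the largest relative MTBR gain, and ISP C the smallest gains of the three.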
Release Velocity
Component                   Average velocity (days)
Local Controller            11.2
BGP speaker                 12.6
Peering Fabric Controller   15.8
Releases ship > 50× more frequently than with traditional peering routers.
Novel L2 VPN delivered 6× faster via incremental rollout.
Manageability
● Espresso supports fully automated configuration and upgrade through an intent-driven configuration and management stack
○ To change configuration, an operator or system changes the intent
○ Committing the change triggers the management system to generate, version and statically verify the configuration before pushing it to all relevant software components and devices
● Change canarying
● “Impact radius”
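The commit flow described above (generate, version, statically verify, then push with a canary) might look roughly like the following sketch. The intent schema, the checks and the function names are all assumptions made for illustration; the talk does not show the real management stack.

```python
import hashlib
import json
from typing import Dict, List


def generate(intent: Dict) -> Dict[str, Dict]:
    """Expand one intent into per-target configuration (hypothetical schema)."""
    return {
        target: {"peer_asn": intent["peer_asn"], "max_prefixes": intent["max_prefixes"]}
        for target in intent["targets"]
    }


def version(configs: Dict[str, Dict]) -> str:
    """Derive a content-based version identifier for the generated configs."""
    blob = json.dumps(configs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def verify(configs: Dict[str, Dict]) -> List[str]:
    """Static checks that run before anything is pushed."""
    errors = []
    for target, cfg in configs.items():
        if not 1 <= cfg["peer_asn"] <= 4294967295:
            errors.append(f"{target}: peer_asn out of range")
        if cfg["max_prefixes"] <= 0:
            errors.append(f"{target}: max_prefixes must be positive")
    return errors


def commit(intent: Dict, canary_count: int = 1) -> Dict:
    """Generate, verify and version, then plan a canary-first push."""
    configs = generate(intent)
    errors = verify(configs)
    if errors:
        raise ValueError("; ".join(errors))
    targets = sorted(configs)
    return {
        "version": version(configs),
        "canary": targets[:canary_count],  # limits the impact radius
        "rest": targets[canary_count:],
    }


plan = commit({
    "peer_asn": 64500,
    "max_prefixes": 1000,
    "targets": ["bgp-speaker-2", "bgp-speaker-1", "pf-controller-1"],
})
print(plan["canary"], plan["rest"])
```

Pushing to a small canary set first keeps the impact radius of a bad change small, and the static verification step rejects invalid intent before any device sees it.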
Operational aspects
● Project development model - DevOps
○ Developers and the operations team work together as one team with a common goal
○ Ops not only provide requirements but actively participate in design, development and deployment
○ Ops actively develop software tools for debugging and monitoring
○ Developers participate in operational activities and procedures, effectively reducing “abstraction bias”
○ Entire team shares “oncall” duties
Operational aspects - teams involved
● Traditional operations - distinction between system, network and multiple ops groups that usually use different methods and tools and develop different “work cultures”
● Without proper training and participation in the DevOps model, support for distributed SDN on the network edge can cause confusion
○ Where do I change BGP policy?
○ Device is connected to remote peer but BGP is down - how do I troubleshoot?
● Most importantly - establish who is responsible for different parts of the system and engage early
Operational aspects - configuration and deployment
● An SDN edge system is no longer the sum of “Network” and “Host” configuration and provisioning that can be run by separate teams
○ The deployment procedure must be a coherent process that efficiently combines different teams and different provisioning systems
○ Not only Ops and Dev but also Deployment teams must be “in the loop”
● Intent-driven configuration is a hard requirement
○ Different config systems may complicate the process (synchronisation, intent consistency)
Operational aspects - monitoring
● Data and control planes are fully distributed
○ Eventual consistency
○ Fail open (data, control and management planes)
● Measure the state of the system
○ Data plane is no longer contained in a single device
○ Streaming telemetry (OC)
○ Black box approach
○ Control plane pipeline monitoring and anomaly detection
Operational aspects - lessons learned
● Interesting failures
○ Configuration state reporting library bug - thread exhaustion caused ALL Espresso control element jobs to lock (integration testing failure)
○ Erroneous configuration push on GC draining all PF nodes
○ Slow propagation of new routing changes from GC to host (inspired development of a local BGP-derived forwarding map)
○ Ingress traffic blackholing
Conclusion
SDN is only suited for walled gardens?
Espresso demonstrates that
● traditional peering architecture can evolve to exploit SDN (incremental changes while maintaining full interoperability)
● SDN’s value is in flexibility and feature velocity (cost savings secondary)
Cloud 1.0 (router-centric protocols)    Espresso (SDN peering)
Local view                              Global view
Connectivity-based optimization         Application signals-based optimization
Slow evolution                          Rapid deploy-and-iterate
Costly                                  75% cheaper
Q&A
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 

PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge

  • 1. Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering
    KK Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines, Taeeun Kim, Ashok Narayanan, Ankur Jain, Victor Lin, Colin Rice, Brian Rogan, Arjun Singh, Bert Tanaka, Manish Verma, Puneet Sood, Mukarram Tariq, Matt Tierney, Dzevad Trumic, Vytautas Valancius, Calvin Ying, Mahesh Kallahalla, Bikash Koley, Amin Vahdat and many others.
    Presented by: Piotr Marecki (bubu@google.com)
  • 2. Problem Statement
    Egress Terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
  • 3. Problem Statement
    Egress Terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
    1. Optimize traffic per-customer and per-application
    ● e.g., optimal video quality, or differentiated service for cloud
    ● Problem: Constrained by BGP shortest path and lack of application awareness
    [Diagram: Google network, annotated "Alternate path with better user experience?"]
  • 4. Problem Statement
    Egress Terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
    2. Deliver new features quickly
    ● Problem: router-vendor feature cycles and qualification take many years
    [Diagram: feature cycle for "Novel L2 VPN?": Request to vendor → Commit to Feature → Implement → Vendor Testing → Integration Testing @ Google → Deploy]
  • 5. Espresso: Google’s SDN Peering Edge
    Our previous experience with SDN
    ● B4 [SIGCOMM 2013] and Jupiter [SIGCOMM 2015]
    ● Enable flexible traffic engineering
    ● Increase feature velocity
    SDN is only suited for walled gardens? Peering edge requires interoperability with heterogeneous peers.
  • 6. Agenda
    ● Problem Statement
    ● Espresso in Context
    ● Design Principles
    ● Architecture Overview
    ● Results
    ● Conclusion
  • 7. Espresso in Context
    [Diagram: Google's Jupiter data center [SIGCOMM 2015] and B4 [SIGCOMM 2013]]
  • 8. Espresso in Context
    [Diagram: Jupiter data center, B4, and metro/points-of-presence (PoP)]
  • 9. Espresso in Context
    [Diagram: Jupiter data center, B4, Espresso, metro/PoP, Internet, user]
  • 10. Global Edge Footprint, > 100 PoPs
    [Map legend: points of presence (>100), network fiber]
  • 11. Agenda
    ● Problem Statement
    ● Espresso in Context
    ● Design Principles
    ● Architecture Overview
    ● Results
    ● Conclusion
  • 12. Espresso’s Design Principles
    1. Hierarchical control plane
       ○ Global optimization, while the local control plane provides fast reaction.
    2. Fail static
       ○ The local control plane continues to function when the global controller fails.
    3. Software programmability
       ○ Externalize features into software to exploit commodity servers for scale.
    4. Testability
    5. Manageability
  • 13. Espresso’s Design Principles
    1. Hierarchical control plane
       ○ Global optimization, while the local control plane provides fast reaction.
    2. Fail static
       ○ The local control plane continues to function when the global controller fails.
    3. Software programmability
       ○ Externalize features into software to exploit commodity servers for scale.
    4. Testability
       ○ Loosely coupled control plane, automated testing and release process.
    5. Manageability
  • 14. Architecture: Externalizing BGP
    [Diagram labels. Traditional peering router: eBGP peering with external peer, Internet-size routing/forwarding table, large ACL. Espresso: peering fabric (label-switched fabric), BGP speaker, hosts (servers in metro), external peer. Principles shown: hierarchical control plane, fail static, software programmability.]
  • 15. Architecture: Reliability and Scale of BGP
    [Diagram labels. Traditional peering router: eBGP peering with external peer, Internet-size RIB/FIB, large TCAM. Espresso: peering fabric (label-switched fabric), multiple BGP speakers, hosts (servers in metro), external peer. Principles shown: hierarchical control plane, fail static, software programmability.]
  • 16. Architecture: Externalize Packet Processing
    ● Labeled packets specify egress.
    ● Host-based packet processor allows flexible packet processing, including ACL and handling of DoS.
    [Diagram labels: hosts with packet processor, peering fabric (label-switched fabric), BGP speaker, eBGP peering with external peer, ingress ACL, sink DoS. Principles shown: hierarchical control plane, fail static, software programmability.]
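The host-side processing described on slide 16 can be sketched roughly as follows. This is a minimal illustration, not Espresso's implementation: the table contents, the label values, and the function name `process_packet` are all hypothetical. The host filters the packet against an ACL, performs a longest-prefix match, and returns a label naming the egress, so the fabric only has to switch on labels.

```python
import ipaddress

# Hypothetical host-side tables; in a real system the controllers would program these.
ACL_DENY = [ipaddress.ip_network("203.0.113.0/24")]
FIB = {
    ipaddress.ip_network("198.51.100.0/24"): 1001,  # label for one egress port
    ipaddress.ip_network("198.51.0.0/16"): 1002,    # label for another egress port
}

def process_packet(dst):
    """Return the egress label for a destination, or None if the packet is dropped."""
    addr = ipaddress.ip_address(dst)
    # ACL check happens on the host, not on the fabric switch.
    if any(addr in net for net in ACL_DENY):
        return None
    # Longest-prefix match over the host's forwarding table.
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]

print(process_packet("198.51.100.7"))
print(process_packet("203.0.113.5"))
```

Keeping the Internet-size table and the ACL on the host is what lets the fabric itself get by with a small label table.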
  • 17. Architecture: Hierarchical Control
    [Diagram labels: Global Controller; Espresso metro containing Location Control, Peering Fabric Control, hosts with packet processor, BGP speaker, and peering fabric (label-switched fabric); eBGP peering with external peer. Principles shown: hierarchical control plane, fail static, software programmability.]
  • 18. Architecture: Fail Static
    [Diagram labels: Global Controller; Espresso metro containing Location Control, Peering Fabric Control, hosts with packet processor, BGP speaker, and peering fabric (label-switched fabric); eBGP peering with external peer. Principles shown: hierarchical control plane, fail static, software programmability.]
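One way to picture the fail-static principle is a local controller that keeps its last programmed state whenever the global controller cannot be reached. The sketch below assumes a hypothetical `LocalController` class and a `fetch_global` callable; neither name comes from the slides.

```python
class LocalController:
    """Holds the forwarding state programmed on the local hosts (hypothetical)."""

    def __init__(self):
        self.forwarding = {}   # prefix -> egress, as last programmed
        self.stale = False     # True while running on last-known-good state

    def sync(self, fetch_global):
        """Try to pull new state from the global controller.

        On failure, keep the existing state untouched (fail static)
        instead of withdrawing routes.
        """
        try:
            update = fetch_global()
        except ConnectionError:
            self.stale = True
            return False
        self.forwarding = dict(update)
        self.stale = False
        return True
```

The design choice illustrated here is that an unreachable global controller degrades optimization quality, not forwarding: traffic keeps flowing on the last known state.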
  • 19. Architecture: Application Aware Routing
    [Diagram labels: Global Controller; application signals; RIB, FIB, ACL; Espresso metro containing Location Control, Peering Fabric Control, hosts with packet processor, BGP speaker, and peering fabric (label-switched fabric); eBGP peering with external peer. Principles shown: hierarchical control plane, fail static, software programmability.]
  • 20. Using User’s Best Path, not BGP’s
    ● Serves 13% more traffic than BGP best path, in an application-aware manner.
    ● Helps capacity-constrained ISPs by overflowing demand to alternate paths within the local metro and also via remote metros.
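The overflow behaviour on slide 20 can be illustrated with a simple greedy placement. This is a sketch under stated assumptions, not the controller's actual algorithm: the function name `place_demand`, the path names, and the capacities are made up, and the real system also weighs application signals that are not modelled here. Paths are listed in preference order, and demand spills to the next path only once the preferred one is full.

```python
def place_demand(demand, paths):
    """Greedily place demand on paths given as (name, capacity) in preference order.

    Returns (placement, unserved), where placement maps path name to the
    amount carried and unserved is whatever did not fit anywhere.
    """
    placement = {}
    remaining = demand
    for name, capacity in paths:
        if remaining <= 0:
            break
        share = min(remaining, capacity)
        if share > 0:
            placement[name] = share
            remaining -= share
    return placement, remaining

# Hypothetical example: the direct peering link is capacity-constrained,
# so the excess overflows within the local metro, then to a remote metro.
paths = [("direct-peer", 8), ("local-metro-alternate", 3), ("remote-metro", 10)]
print(place_demand(12, paths))
```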
  • 21. Improvements in End User Experience
    Client ISP | Change in mean time between rebuffers (MTBR) | Change in mean goodput
    A          | 10 → 20 min                                  | 2.25 → 4.5 Mbps
    B          | 4.6 → 12.5 min                               | 2.75 → 4.9 Mbps
    C          | 14 → 19 min                                  | 3.2 → 4.2 Mbps
    Provides significant improvements to end-user experience.
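To make the size of these changes easier to compare across ISPs, the before/after pairs from slide 21 can be turned into improvement factors. The figures are the ones on the slide; the dictionary layout and the helper name `improvement` are only for illustration.

```python
# Before/after values as reported on slide 21.
results = {
    "A": {"mtbr_min": (10, 20), "goodput_mbps": (2.25, 4.5)},
    "B": {"mtbr_min": (4.6, 12.5), "goodput_mbps": (2.75, 4.9)},
    "C": {"mtbr_min": (14, 19), "goodput_mbps": (3.2, 4.2)},
}

def improvement(before, after):
    """Ratio of the new value to the old one (2.0 means doubled)."""
    return after / before

for isp, metrics in results.items():
    mtbr = improvement(*metrics["mtbr_min"])
    goodput = improvement(*metrics["goodput_mbps"])
    print(f"ISP {isp}: MTBR x{mtbr:.2f}, goodput x{goodput:.2f}")
```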
  • 22. Release Velocity
    Component                 | Average velocity (days)
    Local Controller          | 11.2
    BGP speaker               | 12.6
    Peering Fabric Controller | 15.8
    > 50× more frequently than with traditional peering routers. Novel L2 VPN delivered 6× faster via incremental rollout.
  • 23. Manageability
    ● Espresso supports fully automated configuration and upgrade through an intent-driven configuration and management stack
      ○ To change config, an operator or system changes the intent
      ○ Committing the change triggers the management system to generate, version and statically verify the configuration before pushing it to all relevant software components and devices
    ● Change canarying
    ● “Impact radius”
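The commit flow on slide 23 (generate, version, statically verify, then push) can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the intent schema, the ASN range check, the hash-based version tag, and the function names are hypothetical and not taken from Espresso's management stack.

```python
import hashlib
import json

def generate(intent):
    """Render per-peer configuration from a high-level intent (hypothetical schema)."""
    limit = intent.get("max_prefixes", 1000)
    return {peer: {"peer_asn": asn, "max_prefixes": limit}
            for peer, asn in intent["peers"].items()}

def verify(config):
    """Static checks run before anything is pushed; returns a list of errors."""
    return [f"{name}: invalid ASN {c['peer_asn']}"
            for name, c in config.items()
            if not 0 < c["peer_asn"] < 2**32]

def version(config):
    """Derive a deterministic version tag from the generated configuration."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def commit(intent, push):
    """Generate, verify and version the config, then hand it to `push`."""
    config = generate(intent)
    errors = verify(config)
    if errors:
        raise ValueError("; ".join(errors))  # nothing is pushed on failure
    tag = version(config)
    push(tag, config)
    return tag
```

A canary stage would sit between `verify` and the full `push`, applying the versioned config to a small subset first; it is left out to keep the sketch short.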
  • 24. Operational aspects
    ● Project development model: DevOps
      ○ Developers and the operations team work together as one team with a common goal
      ○ Ops not only provide requirements but actively participate in design, development and deployment
      ○ Ops actively develop software tools for debugging and monitoring
      ○ Developers participate in operational activities and procedures, effectively reducing “abstraction bias”
      ○ The entire team shares “oncall” duties
  • 25. Operational aspects - teams involved
    ● Traditional operations: a distinction between system, network and multiple ops groups that usually use different methods and tools and develop different “work cultures”
    ● Without proper training and participation in the DevOps model, support for distributed SDN on the network edge can cause confusion
      ○ Where do I change BGP policy?
      ○ The device is connected to the remote peer but BGP is down. How do I troubleshoot?
    ● Most importantly: establish who is responsible for different parts of the system and engage early
  • 26. Operational aspects - configuration and deployment
    ● An SDN edge system is no longer the sum of “Network” and “Host” configuration and provisioning that can be run by separate teams
      ○ The deployment procedure must be a coherent process that efficiently combines different teams and different provisioning systems
      ○ Not only Ops and Dev but also Deployment teams must be “in the loop”
    ● Intent-driven configuration is a hard requirement
      ○ Different config systems may complicate the process (synchronisation, intent consistency)
  • 27. Operational aspects - monitoring
    ● Data and control planes are fully distributed
      ○ Eventual consistency
      ○ Fail open (data, control and management planes)
    ● Measure the state of the system
      ○ Data plane is no longer confined to a single device
      ○ Streaming telemetry (OC)
      ○ Black box approach
      ○ Control plane pipeline monitoring and anomaly detection
  • 28. Operational aspects - lessons learned
    ● Interesting failures
      ○ Configuration state reporting library bug: thread exhaustion caused ALL Espresso control element jobs to lock (integration testing failure)
      ○ Erroneous configuration push on the GC (Global Controller) draining all PF (Peering Fabric) nodes
      ○ Slow propagation of new routing changes from the GC to the host (inspired development of a local BGP-derived forwarding map)
      ○ Ingress traffic blackholing
  • 29. Conclusion
    SDN is only suited for walled gardens? Espresso demonstrates that
    ● traditional peering architecture can evolve to exploit SDN (incremental changes while maintaining full interoperability)
    ● SDN’s value is in flexibility and feature velocity (cost savings are secondary)
  • 30. Conclusion
    Cloud 1.0                     | Espresso
    Router-centric protocols      | SDN peering
    Local view                    | Global view
    Connectivity-based optimization | Application signals-based optimization
    Slow evolution                | Rapid deploy-and-iterate
    Costly                        | 75% cheaper