Confidential + Proprietary
Taking the Edge off with Espresso
Scale, Reliability and Programmability for Global Internet Peering
KK Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines,
Taeeun Kim, Ashok Narayanan, Ankur Jain, Victor Lin, Colin Rice, Brian Rogan, Arjun Singh, Bert Tanaka,
Manish Verma, Puneet Sood, Mukarram Tariq, Matt Tierney, Dzevad Trumic, Vytautas Valancius, Calvin Ying,
Mahesh Kallahalla, Bikash Koley, Amin Vahdat and many others.
Presented by: Piotr Marecki (bubu@google.com)
Problem Statement
Egress Terabits/sec of traffic to our Internet peers
● High-def video, cloud traffic, etc.
1. Optimize traffic per-customer and per-application
● e.g., optimal video quality, or differentiated service for cloud
[Figure: traffic from Google to users; is there an alternate path with better user experience?]
● Problem: Constrained by BGP shortest path and lack of application awareness
2. Deliver new features quickly
[Figure: typical vendor feature cycle, e.g. for a novel L2 VPN: Request to vendor → Commit to feature → Implement → Vendor testing → Integration testing @ Google → Deploy]
● Problem: router-vendor feature cycles and qualification take many years
Espresso: Google’s SDN Peering Edge
Our previous experience with SDN
● B4 [SIGCOMM 2013] and Jupiter [SIGCOMM 2015]
● Enable flexible traffic engineering
● Increase feature velocity
SDN is only suited for walled gardens?
Peering edge requires interoperability with heterogeneous peers.
Agenda
● Problem Statement
● Espresso in Context
● Design Principles
● Architecture Overview
● Results
● Conclusion
Espresso in Context
[Figure: Google's network from the inside out: the Jupiter data center fabric [SIGCOMM 2015], the B4 WAN [SIGCOMM 2013], metro points of presence (PoPs), and Espresso, which connects the metro/PoP to the Internet and its users.]
Global Edge Footprint, > 100 PoPs
[Figure: map of points of presence (>100) and network fiber]
Espresso’s Design Principles
1. Hierarchical control plane
○ Global optimization, while the local control plane provides fast reaction.
2. Fail static
○ The local control plane continues to function even if the global controller fails.
3. Software programmability
○ Externalize features into software to exploit commodity servers for scale.
4. Testability
○ Loosely coupled control plane, automated testing and release process.
5. Manageability
Architecture: Externalizing BGP
[Figure: a traditional peering router speaks eBGP to external peers and must hold an Internet-size routing/forwarding table and a large ACL. In Espresso, eBGP peering passes through a label-switched peering fabric to a BGP speaker running on host servers in the metro.]
Architecture: Reliability and Scale of BGP
[Figure: instead of one traditional peering router with an Internet-size RIB/FIB and a large TCAM, Espresso runs multiple BGP speakers across host servers in the metro, reached from external peers via eBGP peering over the label-switched peering fabric.]
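Externalizing the Internet-size RIB/FIB means a host, not router TCAM, performs the longest-prefix match. A minimal sketch, assuming one dictionary per prefix length searched from most to least specific; `HostFib`, the egress names and the documentation prefixes are hypothetical, and a production FIB would more likely use a trie or a similar compressed structure.

```python
import ipaddress
from typing import Dict, Optional


class HostFib:
    """Longest-prefix-match forwarding table held in server memory (illustrative)."""

    def __init__(self) -> None:
        # IP version -> prefix length -> {network: egress}
        self._tables: Dict[int, Dict[int, dict]] = {4: {}, 6: {}}

    def add_route(self, prefix: str, egress: str) -> None:
        net = ipaddress.ip_network(prefix)
        self._tables[net.version].setdefault(net.prefixlen, {})[net] = egress

    def lookup(self, address: str) -> Optional[str]:
        addr = ipaddress.ip_address(address)
        tables = self._tables[addr.version]
        # Most specific prefix length first, so the longest match wins.
        for length in sorted(tables, reverse=True):
            candidate = ipaddress.ip_network(f"{addr}/{length}", strict=False)
            egress = tables[length].get(candidate)
            if egress is not None:
                return egress
        return None


fib = HostFib()
fib.add_route("203.0.113.0/24", "peer-a")
fib.add_route("203.0.113.128/25", "peer-b")
print(fib.lookup("203.0.113.200"), fib.lookup("203.0.113.5"))
```

Because the table lives in ordinary memory, its size is bounded by server RAM rather than by TCAM capacity.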
Architecture: Externalize Packet Processing
[Figure: eBGP peering with an external peer is handled by a BGP speaker on the hosts; a host packet processor applies the ingress ACL and sinks DoS traffic, and sends labeled packets that specify the egress across the label-switched peering fabric.]
A host-based packet processor allows flexible packet processing, including ACLs and DoS handling.
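The host pipeline on this slide works in two directions. Ingress traffic from peers passes an ACL that sinks denied (e.g. DoS) sources. Egress traffic gets a label telling the label-switched fabric which port to leave on. This simplified sketch assumes packets are Python objects rather than raw frames. `PacketProcessor`, the label values and the prefixes are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Packet:
    src: str
    dst: str


@dataclass
class LabeledPacket:
    label: int  # hypothetical label naming the egress port on the peering fabric
    packet: Packet


class PacketProcessor:
    """Host-side packet processing: ingress ACL / DoS sink, egress labeling."""

    def __init__(self, denied_sources: Iterable[str], egress_labels: Dict[str, int]) -> None:
        self.denied = [ipaddress.ip_network(p) for p in denied_sources]
        self.egress_labels = {ipaddress.ip_network(p): lbl for p, lbl in egress_labels.items()}
        self.sunk = 0  # packets dropped by the ingress ACL

    def ingress(self, pkt: Packet) -> Optional[Packet]:
        """Traffic arriving from a peer: drop (sink) anything the ACL denies."""
        src = ipaddress.ip_address(pkt.src)
        if any(src in net for net in self.denied):
            self.sunk += 1
            return None
        return pkt

    def egress(self, pkt: Packet) -> Optional[LabeledPacket]:
        """Traffic leaving Google: attach the label for the chosen egress port."""
        dst = ipaddress.ip_address(pkt.dst)
        matches = [net for net in self.egress_labels if dst in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
        return LabeledPacket(self.egress_labels[best], pkt)
```

Because the egress decision is carried in the label, the fabric only needs to map labels to ports and does not need the full routing table.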
Architecture: Hierarchical Control
[Figure: a Global Controller above each Espresso Metro, where Location Control and Peering Fabric Control manage the hosts (BGP speaker, packet processor) and the label-switched peering fabric that carries eBGP peering with external peers.]
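One way to read the split between the Global Controller and per-metro Location Control is as a division of timescales. The global layer hands down egress weights. The local layer reacts to a failed peering port at once by renormalizing over the ports that are still up, without waiting for a new global computation. Everything here is a hypothetical sketch, not Espresso's interface: the weight-map format, `LocalController` and the port names.

```python
from typing import Dict, Set


class LocalController:
    """Per-metro control layer under a global controller (illustrative only)."""

    def __init__(self) -> None:
        # prefix -> {egress port: weight}, as last received from the global layer
        self.global_weights: Dict[str, Dict[str, float]] = {}
        self.down_ports: Set[str] = set()

    def apply_global(self, weights: Dict[str, Dict[str, float]]) -> None:
        self.global_weights = weights

    def port_down(self, port: str) -> None:
        # Fast local reaction: no round trip to the global controller.
        self.down_ports.add(port)

    def port_up(self, port: str) -> None:
        self.down_ports.discard(port)

    def effective_weights(self, prefix: str) -> Dict[str, float]:
        live = {p: w for p, w in self.global_weights.get(prefix, {}).items()
                if p not in self.down_ports and w > 0}
        total = sum(live.values())
        if total == 0:
            return {}  # nothing usable locally; a real system needs a fallback here
        # Renormalize the surviving global weights so they sum to 1 again.
        return {p: w / total for p, w in live.items()}
```

The global layer stays responsible for the overall split. The local layer only preserves the relative proportions among surviving ports until the next global update arrives.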
Architecture: Fail Static
[Figure: the same Espresso Metro components: Global Controller, Location Control, Peering Fabric Control, BGP speaker and packet processors on hosts, label-switched peering fabric, and the external peer.]
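Under fail static, when the controller above a host disappears, the host keeps forwarding on the last state it was given instead of flushing it. This minimal sketch contrasts that behaviour with a fail-closed host. `FailStaticHost`, the version numbers and the map format are assumptions for illustration.

```python
from typing import Dict, Optional


class FailStaticHost:
    """Forwarding state on a host that outlives its controller (illustrative)."""

    def __init__(self, fail_static: bool = True) -> None:
        self.fail_static = fail_static
        self.programmed: Dict[str, str] = {}   # prefix -> egress, last-known-good
        self.version: Optional[int] = None     # version of the last applied update
        self.controller_reachable = False

    def on_update(self, version: int, egress_map: Dict[str, str]) -> None:
        # Replace state only with a complete map that is newer than what we have.
        if self.version is None or version > self.version:
            self.programmed = dict(egress_map)
            self.version = version
        self.controller_reachable = True

    def on_controller_lost(self) -> None:
        self.controller_reachable = False
        if not self.fail_static:
            self.programmed.clear()  # fail-closed behaviour, shown only for contrast
        # Fail static: keep forwarding on the last-known-good state.

    def egress_for(self, prefix: str) -> Optional[str]:
        return self.programmed.get(prefix)
```

The trade-off is that a fail-static host may forward on stale decisions. Monitoring how long a host has been running without its controller becomes important.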
Architecture: Application Aware Routing
[Figure: the Espresso Metro components annotated with RIB, FIB and ACL state, and application signals feeding the Global Controller.]
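One illustrative reading of "application signals" is per-path measurements that let the controller pick a different egress for different kinds of traffic, instead of always using the BGP best path. The signal fields, application classes and selection rules below are assumptions for the sketch, not Espresso's actual objective.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathSignal:
    egress: str
    is_bgp_best: bool
    goodput_mbps: float  # hypothetical per-path measurement from application telemetry
    rtt_ms: float        # hypothetical per-path round-trip time


def choose_egress(app_class: str, candidates: List[PathSignal]) -> str:
    if not candidates:
        raise ValueError("no candidate paths")
    if app_class == "video":
        # Throughput-bound traffic: prefer the best measured goodput.
        best = max(candidates, key=lambda c: c.goodput_mbps)
    elif app_class == "interactive":
        # Latency-bound traffic: prefer the lowest round-trip time.
        best = min(candidates, key=lambda c: c.rtt_ms)
    else:
        # No application preference: keep BGP's choice (or the first candidate).
        best = next((c for c in candidates if c.is_bgp_best), candidates[0])
    return best.egress
```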
Using User’s Best Path, not BGP’s
● Serves 13% more traffic than the BGP best path would, in an application-aware manner.
● Helps capacity-constrained ISPs by overflowing demand to alternate paths within the local metro and also via remote metros.
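The overflow behaviour can be illustrated with a greedy allocator. Each prefix fills its preferred egress first (e.g. the BGP best path), then spills the remainder onto alternates in order, such as other links in the local metro and then remote metros. This simplified sketch uses hypothetical names and numbers and is not Espresso's actual optimization. A single greedy pass is order-dependent and not globally optimal.

```python
from typing import Dict, List, Tuple


def allocate(demand_gbps: Dict[str, float],
             preferences: Dict[str, List[str]],
             capacity_gbps: Dict[str, float]
             ) -> Tuple[Dict[str, Dict[str, float]], Dict[str, float]]:
    """Greedy overflow allocation.

    Returns (allocation, unserved), where allocation[prefix][egress] is in Gbps.
    """
    remaining = dict(capacity_gbps)
    allocation: Dict[str, Dict[str, float]] = {}
    unserved: Dict[str, float] = {}
    for prefix, demand in demand_gbps.items():
        left = demand
        allocation[prefix] = {}
        for egress in preferences[prefix]:  # best path first, then alternates
            if left <= 0:
                break
            take = min(left, remaining.get(egress, 0.0))
            if take > 0:
                allocation[prefix][egress] = take
                remaining[egress] -= take
                left -= take
        if left > 0:
            unserved[prefix] = left
    return allocation, unserved


alloc, unserved = allocate(
    {"p1": 10.0},
    {"p1": ["bgp-best", "local-alt", "remote-metro"]},
    {"bgp-best": 6.0, "local-alt": 3.0, "remote-metro": 5.0},
)
print(alloc, unserved)
```

In this example, using only the BGP best path would leave 4 of the 10 Gbps unserved. With overflow, all of it is carried.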
Improvements in End User Experience
Client ISP | Change in mean time between rebuffers (MTBR) | Change in mean goodput
A | 10 → 20 min | 2.25 → 4.5 Mbps
B | 4.6 → 12.5 min | 2.75 → 4.9 Mbps
C | 14 → 19 min | 3.2 → 4.2 Mbps
Provides significant improvements to end-user experience.
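The table gives absolute before/after values. Converting them to relative changes makes the three ISPs easier to compare. This small worked calculation uses only the numbers on the slide:

```python
# (before, after) pairs copied from the table: MTBR in minutes, goodput in Mbps.
MTBR_MIN = {"A": (10.0, 20.0), "B": (4.6, 12.5), "C": (14.0, 19.0)}
GOODPUT_MBPS = {"A": (2.25, 4.5), "B": (2.75, 4.9), "C": (3.2, 4.2)}


def pct_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for isp in MTBR_MIN:
    mtbr = pct_change(*MTBR_MIN[isp])
    goodput = pct_change(*GOODPUT_MBPS[isp])
    print(f"ISP {isp}: MTBR {mtbr:+.0f}%, goodput {goodput:+.0f}%")
```

For example, ISP B's MTBR rises by about 172% and its mean goodput by about 78%.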
Release Velocity
Component | Average velocity (days between releases)
Local Controller | 11.2
BGP speaker | 12.6
Peering Fabric Controller | 15.8
Releases > 50× more frequently than with traditional peering routers.
Novel L2 VPN delivered 6× faster via incremental rollout.
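If the table's day values are read as the average interval between releases, the cadence converts to approximate releases per year. This is a worked calculation on the slide's numbers only. The 365-day year is the only added assumption.

```python
# Average days between releases, copied from the table.
AVG_DAYS_BETWEEN_RELEASES = {
    "Local Controller": 11.2,
    "BGP speaker": 12.6,
    "Peering Fabric Controller": 15.8,
}


def releases_per_year(interval_days: float, days_per_year: float = 365.0) -> float:
    """Approximate number of releases per year for a given average interval."""
    return days_per_year / interval_days


for component, days in AVG_DAYS_BETWEEN_RELEASES.items():
    print(f"{component}: ~{releases_per_year(days):.1f} releases/year")
```

That is roughly 23 to 33 releases per year per component.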
Manageability
● Espresso supports fully automated configuration and upgrades through an intent-driven configuration and management stack
○ To change the configuration, an operator or system changes the intent
○ Committing a change triggers the management system to generate, version and statically verify the configuration before pushing it to all relevant software components and devices
● Change canarying
● “Impact radius”
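The commit flow described here is: change the intent, generate, version and statically verify the configuration, then push it with canarying. It can be sketched as a small pipeline. Everything below is hypothetical and stands in for whatever Espresso's management stack actually uses: the intent schema, the target naming, the two verification rules and the `push` callback.

```python
import hashlib
import json
from typing import Callable, Dict, List


def generate(intent: dict) -> Dict[str, dict]:
    """Derive per-target config from a hypothetical intent schema:
    {"peers": [{"name": ..., "asn": ..., "metro": ...}]}"""
    configs: Dict[str, dict] = {}
    for peer in intent["peers"]:
        target = f"bgp-speaker/{peer['metro']}"
        configs.setdefault(target, {"neighbors": []})["neighbors"].append(
            {"name": peer["name"], "asn": peer["asn"]})
    return configs


def version_of(configs: Dict[str, dict]) -> str:
    # Content hash: identical generated config always yields the same version id.
    blob = json.dumps(configs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def verify(configs: Dict[str, dict]) -> List[str]:
    """Static checks run before anything is pushed."""
    errors = []
    for target, cfg in configs.items():
        if any(not 1 <= n["asn"] <= 4294967295 for n in cfg["neighbors"]):
            errors.append(f"{target}: ASN out of range")  # 32-bit ASNs; 0 is reserved
        names = [n["name"] for n in cfg["neighbors"]]
        if len(names) != len(set(names)):
            errors.append(f"{target}: duplicate neighbor name")
    return errors


def commit(intent: dict, push: Callable[[str, dict, str], None], canary: List[str]) -> str:
    configs = generate(intent)
    errors = verify(configs)
    if errors:
        raise ValueError("; ".join(errors))
    version = version_of(configs)
    # Canary targets first; a real system would check health before continuing.
    ordered = [t for t in configs if t in canary] + [t for t in configs if t not in canary]
    for target in ordered:
        push(target, configs[target], version)
    return version
```

Hashing the generated config gives every push a reproducible version identifier. This makes it easy to tell which intent a device is actually running.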
Operational aspects
● Project development model - DevOps
○ Developers and the operations team work together as one team with a common goal
○ Ops not only provide requirements but also actively participate in design, development and deployment
○ Ops actively develop software tools for debugging and monitoring
○ Developers participate in operational activities and procedures, effectively reducing “abstraction bias”
○ The entire team shares on-call duties
Operational aspects - teams involved
● Traditional operations: separate systems, network and other ops groups that usually use different methods and tools and develop different “work cultures”
● Without proper training and participation in the DevOps model, supporting a distributed SDN network edge can cause confusion
○ Where do I change BGP policy?
○ The device is connected to the remote peer but BGP is down: how do I troubleshoot?
● Most importantly: establish who is responsible for each part of the system, and engage them early
Operational aspects - configuration and deployment
● An SDN edge system is no longer the sum of “network” and “host” configuration and provisioning that separate teams can run
○ The deployment procedure must be a coherent process that efficiently combines different teams and different provisioning systems
○ Not only Ops and Dev but also Deployment teams must be “in the loop”
● Intent-driven configuration is a hard requirement
○ Multiple configuration systems may complicate the process (synchronization, intent consistency)
Operational aspects - monitoring
● Data and control planes are fully distributed
○ Eventual consistency
○ Fail open (data, control and management planes)
● Measure the state of the system
○ The data plane is no longer contained in a single device
○ Streaming telemetry (OpenConfig)
○ Black-box approach
○ Control plane pipeline monitoring and anomaly detection
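With eventual consistency, a node that is briefly behind is normal. A pipeline monitor should therefore alert only when a node misses a convergence window, or when its telemetry goes silent. This minimal sketch assumes each node streams the version it has programmed and a report timestamp. The field names and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReport:
    applied_version: int  # last config/route version the node reports as programmed
    reported_at: float    # seconds, timestamp of the node's latest telemetry report


def lagging_nodes(intended_version: int,
                  intended_since: float,
                  reports: Dict[str, NodeReport],
                  now: float,
                  max_convergence_s: float = 30.0,
                  max_silence_s: float = 60.0) -> List[str]:
    """Flag nodes that missed the convergence window or stopped reporting."""
    flagged = []
    for node, r in sorted(reports.items()):
        silent = now - r.reported_at > max_silence_s
        behind = (r.applied_version < intended_version
                  and now - intended_since > max_convergence_s)
        if silent or behind:
            flagged.append(node)
    return flagged
```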
Operational aspects - lessons learned
● Interesting failures
○ Configuration state reporting library bug: thread exhaustion caused ALL Espresso control element jobs to lock up (integration testing failure)
○ Erroneous configuration push on the Global Controller (GC) drained all peering fabric (PF) nodes
○ Slow propagation of new routing changes from the GC to hosts (inspired development of a local BGP-derived forwarding map)
○ Ingress traffic blackholing
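A push that drains every peering fabric node is the kind of change an impact-radius check is meant to stop. This is a minimal pre-push guard. It assumes drain state is part of the generated configuration. The 25% threshold, the node names and `ImpactRadiusExceeded` are hypothetical.

```python
from typing import Dict


class ImpactRadiusExceeded(Exception):
    pass


def check_drain_change(current: Dict[str, bool], proposed: Dict[str, bool],
                       max_drained_fraction: float = 0.25) -> None:
    """Refuse a push that would leave too many nodes drained.

    current: node -> drained? as deployed now.
    proposed: drain changes in the push, applied on top of current.
    """
    nodes = set(current) | set(proposed)
    if not nodes:
        return
    drained_after = sum(1 for n in nodes if proposed.get(n, current.get(n, False)))
    fraction = drained_after / len(nodes)
    if fraction > max_drained_fraction:
        raise ImpactRadiusExceeded(
            f"push would drain {drained_after}/{len(nodes)} nodes ({fraction:.0%})")
```

The check looks at the resulting state, not only at the diff. A series of small pushes therefore cannot add up to draining the whole fabric without tripping it.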
Conclusion
SDN is only suited for walled gardens? No.
Espresso demonstrates that
● traditional peering architecture can evolve to exploit SDN (incremental changes while maintaining full interoperability)
● SDN’s value is in flexibility and feature velocity (cost savings are secondary)
Conclusion
Cloud 1.0 | Espresso
Router-centric protocols | SDN peering
Local view | Global view
Connectivity-based optimization | Application-signals-based optimization
Slow evolution | Rapid deploy-and-iterate
Costly | 75% cheaper
Q&A
PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

PLNOG19 - Piotr Marecki - Espresso: Scalable and Programmable Peering Edge

  • 1. Confidential + Proprietary
    Taking the Edge off with Espresso
    Scale, Reliability and Programmability for Global Internet Peering
    KK Yap, Murtaza Motiwala, Jeremy Rahe, Steve Padgett, Matthew Holliman, Gary Baldus, Marcus Hines, Taeeun Kim, Ashok Narayanan, Ankur Jain, Victor Lin, Colin Rice, Brian Rogan, Arjun Singh, Bert Tanaka, Manish Verma, Puneet Sood, Mukarram Tariq, Matt Tierney, Dzevad Trumic, Vytautas Valancius, Calvin Ying, Mahesh Kallahalla, Bikash Koley, Amin Vahdat and many others.
    Presented by: Piotr Marecki (bubu@google.com)
  • 2. Problem Statement
    Egress terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
  • 3. Problem Statement
    Egress terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
    1. Optimize traffic per customer and per application
    ● e.g., optimal video quality, or differentiated service for cloud
    Diagram: traffic leaving Google. Is there an alternate path with a better user experience?
    ● Problem: constrained by BGP shortest path and lack of application awareness
  • 4. Problem Statement
    Egress terabits/sec of traffic to our Internet peers
    ● High-def video, cloud traffic, etc.
    2. Deliver new features quickly
    Diagram: Request to vendor → Commit to feature → Implement → Vendor testing → Integration testing @ Google → Deploy (example: a novel L2 VPN)
    ● Problem: router-vendor feature cycles and qualification take many years
  • 5. Espresso: Google’s SDN Peering Edge
    Our previous experience with SDN
    ● B4 [SIGCOMM 2013] and Jupiter [SIGCOMM 2015]
    ● Enable flexible traffic engineering
    ● Increase feature velocity
    SDN is only suited for walled gardens?
    Peering edge requires interoperability with heterogeneous peers.
  • 6. Agenda
    ● Problem Statement
    ● Espresso in Context
    ● Design Principles
    ● Architecture Overview
    ● Results
    ● Conclusion
  • 7. Espresso in Context
    Diagram: Google data centers (Jupiter [SIGCOMM 2015]) interconnected by B4 [SIGCOMM 2013]
  • 8. Espresso in Context
    Diagram: adds metro/points of presence (PoPs) attached to B4, alongside the Jupiter data centers
  • 9. Espresso in Context
    Diagram: Espresso at the metro/PoP connects Google's network (B4, Jupiter data centers) to the Internet and its users
  • 10. Global Edge Footprint, > 100 PoPs
    Map legend: points of presence (> 100), network fiber
  • 11. Agenda
    ● Problem Statement
    ● Espresso in Context
    ● Design Principles
    ● Architecture Overview
    ● Results
    ● Conclusion
  • 12. Espresso’s Design Principles
    1. Hierarchical control plane
    ○ Global optimization, while the local control plane provides fast reaction.
    2. Fail static
    ○ The local control plane continues to function if the global controller fails.
    3. Software programmability
    ○ Externalize features into software to exploit commodity servers for scale.
    4. Testability
    5. Manageability
  • 13. Espresso’s Design Principles
    1. Hierarchical control plane
    ○ Global optimization, while the local control plane provides fast reaction.
    2. Fail static
    ○ The local control plane continues to function if the global controller fails.
    3. Software programmability
    ○ Externalize features into software to exploit commodity servers for scale.
    4. Testability
    ○ Loosely coupled control plane, automated testing and release process
    5. Manageability
  • 14. Architecture: Externalizing BGP
    Diagram, traditional peering router: eBGP peering with the external peer; Internet-size routing/forwarding table; large ACL; servers in metro behind the router
    Diagram, Espresso: the external peer connects to the peering fabric; eBGP is handled by a BGP speaker; a label-switched fabric connects to the servers in metro
    Principles: hierarchical control plane · fail static · software programmability
  • 15. Architecture: Reliability and Scale of BGP
    Diagram, traditional peering router: eBGP peering with external peers; Internet-size RIB/FIB; large TCAM
    Diagram, Espresso: peering fabric and label-switched fabric, with multiple BGP speakers running among the servers in metro
    Principles: hierarchical control plane · fail static · software programmability
  • 16. Architecture: Externalize Packet Processing
    Diagram: external peer (eBGP peering) → peering fabric → label-switched fabric → hosts running the BGP speaker and packet processor
    ● Labeled packets specify egress.
    ● Host-based packet processor allows flexible packet processing, including ACL and handling of DoS (ingress ACL, sink DoS).
    Principles: hierarchical control plane · fail static · software programmability
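The host-side processing described above can be sketched as two lookups. For egress, a longest-prefix match on the destination picks the label that tells the label-switched fabric which peering port to use. For ingress, a source check stands in for the ACL and DoS sink. This is a minimal sketch that assumes a plain prefix-to-label table and a flat list of denied source prefixes. The class, the method names (`HostPacketProcessor`, `egress_label`, `admit_ingress`) and the label values are hypothetical, and the slides do not describe Espresso's packet processor at this level.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass
class HostPacketProcessor:
    """Hypothetical host-side processor: egress labeling plus ingress filtering."""

    # Destination prefix -> label telling the label-switched fabric
    # which peering port to use as egress (label values are illustrative).
    fib: Dict[Network, int] = field(default_factory=dict)
    # Source prefixes dropped on ingress (ACL entries or DoS sink).
    denied_sources: List[Network] = field(default_factory=list)

    def add_route(self, prefix: str, label: int) -> None:
        self.fib[ipaddress.ip_network(prefix)] = label

    def deny_source(self, prefix: str) -> None:
        self.denied_sources.append(ipaddress.ip_network(prefix))

    def egress_label(self, dst: str) -> Optional[int]:
        """Longest-prefix match on the destination; None if there is no route."""
        addr = ipaddress.ip_address(dst)
        best: Optional[Network] = None
        for net in self.fib:
            # `in` is False across IPv4/IPv6, so mixed tables are safe.
            if addr in net and (best is None or net.prefixlen > best.prefixlen):
                best = net
        return None if best is None else self.fib[best]

    def admit_ingress(self, src: str) -> bool:
        """False if the source falls inside any denied prefix."""
        addr = ipaddress.ip_address(src)
        return not any(addr in net for net in self.denied_sources)


if __name__ == "__main__":
    pp = HostPacketProcessor()
    pp.add_route("203.0.113.0/24", 100)
    pp.add_route("203.0.113.128/25", 200)
    pp.deny_source("192.0.2.0/24")
    print(pp.egress_label("203.0.113.200"))  # 200: the /25 is more specific
    print(pp.admit_ingress("192.0.2.7"))     # False: source is denied
```

The linear scan keeps the sketch short. A host holding an Internet-size table would use a trie or a similar LPM structure instead.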
  • 17. Architecture: Hierarchical Control
    Diagram components: global controller; Espresso metro with location control, peering fabric control, peering fabric, label-switched fabric, BGP speaker and host packet processor; eBGP peering with an external peer
    Principles: hierarchical control plane · fail static · software programmability
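One way to read "global optimization while the local control plane provides fast reaction" is this: the global controller ranks egress peers per prefix, and the local layer filters that ranking against its own up-to-date BGP view. A peer that goes down locally is then skipped at once, without waiting for the global controller to recompute. The sketch below assumes exactly that split. The function name `program_egress`, the input shapes and the deterministic fallback rule are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Optional, Set


def program_egress(
    global_prefs: Dict[str, List[str]],
    live_routes: Dict[str, Set[str]],
) -> Dict[str, Optional[str]]:
    """Combine the global controller's ranked egress choices with the local BGP view.

    global_prefs: prefix -> egress peers, most preferred first
        (computed globally; may be out of date).
    live_routes: prefix -> peers whose sessions are up and that currently
        advertise the prefix (maintained locally, so it reacts quickly).
    """
    programmed: Dict[str, Optional[str]] = {}
    for prefix in sorted(set(global_prefs) | set(live_routes)):
        usable = live_routes.get(prefix, set())
        # First globally preferred peer that is still usable locally.
        choice = next((p for p in global_prefs.get(prefix, []) if p in usable), None)
        if choice is None and usable:
            # No usable global choice: fall back to a locally learned route
            # (sorted only to make the sketch deterministic).
            choice = sorted(usable)[0]
        programmed[prefix] = choice  # None means no usable egress
    return programmed


if __name__ == "__main__":
    prefs = {"198.51.100.0/24": ["peer-a", "peer-b"]}
    live = {"198.51.100.0/24": {"peer-b", "peer-c"}}  # peer-a's session is down
    print(program_egress(prefs, live))  # {'198.51.100.0/24': 'peer-b'}
```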
  • 18. Architecture: Fail Static
    Diagram components: global controller; Espresso metro with location control, peering fabric control, peering fabric, label-switched fabric, BGP speaker and host packet processor; eBGP peering with an external peer
    Principles: hierarchical control plane · fail static · software programmability
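"Fail static" means the local control plane keeps working when the global controller is unavailable. A minimal sketch of that behavior: when no update arrives, the last installed state is kept rather than withdrawn, and the controller reports how stale that state is so the outage can be alerted on. The `LocalController` class, the `sync`/`staleness` methods and the use of `None` to signal an unreachable global controller are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LocalController:
    """Hypothetical local controller illustrating fail-static behavior."""

    installed: Dict[str, str] = field(default_factory=dict)  # prefix -> egress
    last_update: Optional[float] = None  # time of the last global update

    def sync(self, global_map: Optional[Dict[str, str]], now: float) -> Dict[str, str]:
        """Install the global controller's map if one arrived.

        global_map is None when the global controller is unreachable. In
        that case the previously installed state is kept (fail static)
        rather than withdrawn, so forwarding continues.
        """
        if global_map is not None:
            self.installed = dict(global_map)
            self.last_update = now
        return self.installed

    def staleness(self, now: float) -> Optional[float]:
        """Seconds since the last global update (for alerting); None if never updated."""
        return None if self.last_update is None else now - self.last_update


if __name__ == "__main__":
    lc = LocalController()
    lc.sync({"198.51.100.0/24": "peer-a"}, now=10.0)
    print(lc.sync(None, now=70.0))   # state kept: {'198.51.100.0/24': 'peer-a'}
    print(lc.staleness(now=70.0))    # 60.0
```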
  • 19. Architecture: Application Aware Routing
    Diagram components: global controller; Espresso metro with location control, peering fabric control, peering fabric, label-switched fabric, BGP speaker and host packet processor; eBGP peering with an external peer
    Diagram labels added: RIB, FIB, ACL, application signals
    Principles: hierarchical control plane · fail static · software programmability
  • 20. Using the User’s Best Path, not BGP’s
    ● Serves 13% more traffic than the BGP best path, in an application-aware manner.
    ● Helps capacity-constrained ISPs by overflowing demand to alternate paths within the local metro and also via remote metros.
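Overflowing demand from a capacity-constrained path can be illustrated with a simple greedy placement. Each prefix's demand fills its preferred egress first, and whatever does not fit spills onto the next candidate (another local path, then a remote metro). The slides do not describe Espresso's actual allocation algorithm, so this greedy, largest-demand-first loop is a swapped-in illustration. The function name, units and inputs are hypothetical, and the sketch does not reproduce the 13% figure.

```python
from typing import Dict, List, Tuple


def place_demand(
    demand_gbps: Dict[str, float],
    candidates: Dict[str, List[str]],
    capacity_gbps: Dict[str, float],
) -> Tuple[Dict[str, Dict[str, float]], Dict[str, float]]:
    """Greedy overflow placement (illustrative, not Espresso's algorithm).

    candidates[prefix] lists egresses in preference order, e.g. the best
    local peer first, then alternates in the local metro, then remote metros.
    Returns (placement, unserved): placement[prefix][egress] = Gbps placed,
    unserved[prefix] = Gbps that fit on no candidate.
    """
    remaining = dict(capacity_gbps)
    placement: Dict[str, Dict[str, float]] = {}
    unserved: Dict[str, float] = {}
    # Largest demands first, a common heuristic for greedy packing.
    for prefix in sorted(demand_gbps, key=lambda p: demand_gbps[p], reverse=True):
        need = demand_gbps[prefix]
        placement[prefix] = {}
        for egress in candidates.get(prefix, []):
            if need <= 0:
                break
            take = min(need, remaining.get(egress, 0.0))
            if take > 0:
                placement[prefix][egress] = take
                remaining[egress] -= take
                need -= take
        if need > 0:
            unserved[prefix] = need
    return placement, unserved


if __name__ == "__main__":
    placed, left = place_demand(
        {"p1": 50.0, "p2": 10.0},
        {"p1": ["local-isp", "remote-metro"], "p2": ["local-isp", "remote-metro"]},
        {"local-isp": 35.0, "remote-metro": 100.0},
    )
    print(placed)  # p1 overflows 15 Gbps to remote-metro; p2 lands entirely there
    print(left)    # {}
```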
  • 21. Improvements in End User Experience
    Client ISP | Change in mean time between rebuffers (MTBR) | Change in mean goodput
    A | 10 → 20 min | 2.25 → 4.5 Mbps
    B | 4.6 → 12.5 min | 2.75 → 4.9 Mbps
    C | 14 → 19 min | 3.2 → 4.2 Mbps
    Provides significant improvements to end-user experience.
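The table's before/after pairs can be turned into improvement factors by simple division. Longer MTBR and higher goodput are both better, so a factor above 1 is an improvement. ISP A, for example, doubles both metrics: 20 / 10 = 2 for MTBR and 4.5 / 2.25 = 2 for goodput. The short sketch below computes the same ratio for every row. The values are transcribed from the table, and the variable and function names are illustrative.

```python
# Before/after values transcribed from the table: (before, after).
RESULTS = {
    "A": {"mtbr_min": (10, 20), "goodput_mbps": (2.25, 4.5)},
    "B": {"mtbr_min": (4.6, 12.5), "goodput_mbps": (2.75, 4.9)},
    "C": {"mtbr_min": (14, 19), "goodput_mbps": (3.2, 4.2)},
}


def ratios(results):
    """Return {isp: {metric: after / before}}; > 1 is an improvement for both metrics."""
    return {
        isp: {metric: after / before for metric, (before, after) in metrics.items()}
        for isp, metrics in results.items()
    }


if __name__ == "__main__":
    for isp, r in ratios(RESULTS).items():
        print(f"ISP {isp}: MTBR x{r['mtbr_min']:.2f}, goodput x{r['goodput_mbps']:.2f}")
```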
  • 22. Release Velocity
    Component | Average velocity (days between releases)
    Local Controller | 11.2
    BGP speaker | 12.6
    Peering Fabric Controller | 15.8
    Releases > 50× more frequently than with traditional peering routers.
    Novel L2 VPN delivered 6× faster via incremental rollout.
  • 23. Manageability
    ● Espresso supports fully automated configuration and upgrade through an intent-driven configuration and management stack
    ○ To change configuration, an operator or system changes the intent
    ○ Committing a change triggers the management system to generate, version and statically verify the configuration before pushing it to all relevant software components and devices
    ● Change canarying
    ● “Impact radius”
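The management flow above (change the intent, then generate, version and statically verify configuration, then push with canarying and a bounded impact radius) can be sketched as three small steps. This is a minimal sketch under stated assumptions. The intent format, the config text layout, the check signature and the batch-based rollout (`generate`, `verify`, `staged_push`, `max_batch`) are hypothetical. They are not Espresso's management stack.

```python
from typing import Callable, Dict, List, Sequence

Intent = Dict[str, Dict[str, str]]             # component -> {setting: value}
Check = Callable[[Dict[str, str]], List[str]]  # rendered configs -> error strings


def generate(intent: Intent, version: int) -> Dict[str, str]:
    """Render one versioned config text per component from the intent."""
    return {
        component: "\n".join(
            [f"# version {version}"]
            + [f"{key} {value}" for key, value in sorted(settings.items())]
        )
        for component, settings in intent.items()
    }


def verify(configs: Dict[str, str], checks: Sequence[Check]) -> List[str]:
    """Run static checks; an empty list means the push may proceed."""
    return [error for check in checks for error in check(configs)]


def staged_push(
    targets: Sequence[str],
    push: Callable[[str], None],
    healthy: Callable[[str], bool],
    max_batch: int,
) -> List[str]:
    """Canary on one target, then widen in batches of at most max_batch.

    Stops at the first batch containing an unhealthy target, so a bad
    change reaches at most one batch beyond the targets already verified
    healthy (a simple stand-in for an "impact radius" limit).
    """
    if max_batch < 1:
        raise ValueError("max_batch must be >= 1")
    done: List[str] = []
    i, batch_size = 0, 1  # the first batch is the single-target canary
    while i < len(targets):
        batch = list(targets[i:i + batch_size])
        for target in batch:
            push(target)
            done.append(target)
        if not all(healthy(target) for target in batch):
            break  # stop widening the rollout
        i += len(batch)
        batch_size = max_batch
    return done
```

In use, `verify` gates the push: only a config set with no reported errors is handed to `staged_push`. Raising `max_batch` speeds up the rollout at the cost of a larger blast radius per step.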
  • 24. Operational aspects
    ● Project development model: DevOps
    ○ Developers and the operations team work together as one team with a common goal
    ○ Ops not only provide requirements but actively participate in design, development and deployment
    ○ Ops actively develop software tools for debugging and monitoring
    ○ Developers participate in operational activities and procedures, effectively reducing “abstraction bias”
    ○ The entire team shares on-call duties
  • 25. Operational aspects: teams involved
    ● Traditional operations: a split between systems, network and multiple other ops groups that usually use different methods and tools and develop different “work cultures”
    ● Without proper training and participation in the DevOps model, supporting a distributed SDN on the network edge can cause confusion
    ○ Where do I change BGP policy?
    ○ The device is connected to the remote peer but BGP is down: how do I troubleshoot?
    ● Most importantly: establish who is responsible for different parts of the system and engage early
  • 26. Operational aspects: configuration and deployment
    ● An SDN edge system is no longer the sum of “network” and “host” configuration and provisioning that can be run by separate teams
    ○ The deployment procedure must be a coherent process that efficiently combines different teams and different provisioning systems
    ○ Not only Ops and Dev but also deployment teams must be “in the loop”
    ● Intent-driven configuration is a hard requirement
    ○ Different config systems may complicate the process (synchronization, intent consistency)
  • 27. Operational aspects: monitoring
    ● Data and control planes are fully distributed
    ○ Eventual consistency
    ○ Fail open (data, control and management planes)
    ● Measure the state of the system
    ○ The data plane is no longer confined to a single device
    ○ Streaming telemetry (OC)
    ○ Black-box approach
    ○ Control plane pipeline monitoring and anomaly detection
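With an eventually consistent, distributed control plane, one simple pipeline health signal is how long each host takes to apply a published update. The sketch below classifies hosts as late (applied the update, but after a deadline) or missing (still not applied once the deadline has passed). It is a minimal sketch that assumes each host reports the time it converged. The function name, the input shape and the fixed deadline are hypothetical, and real anomaly detection would track these signals over time rather than against one threshold.

```python
from typing import Dict, List, Optional


def convergence_report(
    published_at: float,
    converged_at: Dict[str, Optional[float]],
    now: float,
    deadline_s: float,
) -> Dict[str, List[str]]:
    """Classify hosts by how quickly they applied a control-plane update.

    converged_at maps host -> time it reported the new state, or None if it
    has not reported yet. Hosts that converged after the deadline are
    'late'; hosts still unreported once the deadline has passed are 'missing'.
    """
    late: List[str] = []
    missing: List[str] = []
    for host, t in sorted(converged_at.items()):
        if t is None:
            if now - published_at > deadline_s:
                missing.append(host)
        elif t - published_at > deadline_s:
            late.append(host)
    return {"late": late, "missing": missing}


if __name__ == "__main__":
    print(convergence_report(
        published_at=100.0,
        converged_at={"host-1": 101.0, "host-2": 130.0, "host-3": None},
        now=200.0,
        deadline_s=10.0,
    ))  # {'late': ['host-2'], 'missing': ['host-3']}
```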
  • 28. Operational aspects: lessons learned
    ● Interesting failures
    ○ Configuration-state reporting library bug: thread exhaustion caused ALL Espresso control-element jobs to lock up (an integration testing failure)
    ○ Erroneous configuration push on the global controller (GC) drained all peering fabric (PF) nodes
    ○ Slow propagation of new routing changes from the GC to hosts (inspired development of a local BGP-derived forwarding map)
    ○ Ingress traffic blackholing
  • 29. Conclusion
    SDN is only suited for walled gardens?
    Espresso demonstrates that
    ● traditional peering architecture can evolve to exploit SDN (incremental changes while maintaining full interoperability)
    ● SDN’s value is in flexibility and feature velocity (cost savings are secondary)
  • 30. Conclusion
    Cloud 1.0 | Espresso
    Router-centric protocols | SDN peering
    Local view | Global view
    Connectivity-based optimization | Application-signals-based optimization
    Slow evolution | Rapid deploy-and-iterate
    Costly | 75% cheaper