SlideShare a Scribd company logo
Switches, Penguins and
One Bad Cable
Alex Balk
Back on AUGUST 13th 2015
ADI NAVEH
ALEX BALK
ALEX KARASIK
CHEN SHABI
DAFNA FRANK
DORI SHMUEL
GAI RADZI
GERARDO LARACUENTE
GUY MAZUZ
RYAN MCQUILLAN
SHAHAF SAGES
YAFIT MELES
We Are
Core
OUR PURPOSE
OUTBRAIN helps people discover
content that they find interesting.
250 Billion
Content Recommendations
Every Month
½ Billion
People Worldwide
OUTBRAIN
BY THE NUMBERS
THE BEGINNING
AVAILABILITYMANAGEMENT
Two Main Networking Challenges
AVAILABILITYMANAGEMENT
4
racks
80
nodes
320
nodes
1
switch
4
racks
80
nodes
320
nodes
1
switch
AVAILABILITYMANAGEMENT
Node
Stack A Stack B
Backbone
NodeNode
SCALE IS ABOUT
DOING MORE WITH LESS
6 Million
Metrics generated every minute
150 Releases
To production every day
OUTBRAIN
BY THE NUMBERS
SCALE IS ABOUT
TURNING THE LIGHTS ON
Network (gasp!)
was 100% Manual!
• Every change =
risk
• Switching stack
proprietary
• Debugging = fight
or just a hit-n-miss
• Lead time to set-up
new stack
measured in
weeks!
• No way to scale to
the next 10X
June 2017
OUTBRAIN OFFICES
New Data Center = Clos Fabric — running BGP end-to-end
Node
Leaf A Leaf B
Spine
NodeNode
SpineSpine
No bonding.
No backbone.
Everything is just a router!
All possible paths to all possible destinations constructed —
hop-by-hop
Node
Leaf A
Spine
NodeNode
SpineSpine
Leaf B
ECMP = “Send it down any available path, they’re all the same”.
SIMPLE
PREDICTABLE
SCALABLE
A Network That is Now
DEVICE
MANAGEMENT
CABLE
MANAGEMENT
SETUP TIME
MONITORING
TESTING
SCALE IS ABOUT
BUILDING THE RIGHT CULTURE
SCALE IS ABOUT
CHOOSING THE RIGHT TOOLS
To bootstrap the new datacenter5 DAYS
99% Of code worked as expected
1 Bad cable...out of 3,000
END SOLUTION MODULES
DRAFT 051116
Thank
You
Hardcore Tech Stuff
Slides shamelessly “borrowed”
from Adi Naveh’s internal tech talk
Infranet Team
Gai
Adi Yafit
Chen
Traditional Network Topology
Aggregation
Core
Access
Traditional Network Topology
Access
Aggregation
Core
Services
Clients
North-South Traffic
Load Balancers Load Balancers
ISP
Traditional Network Topology in Data Center
Access
Aggregation
Core
Services
North-South Traffic
East-West Traffic

More Related Content

More from Cumulus Networks

Building Scalable Data Center Networks
Building Scalable Data Center NetworksBuilding Scalable Data Center Networks
Building Scalable Data Center Networks
Cumulus Networks
 
Network Architecture for Containers
Network Architecture for ContainersNetwork Architecture for Containers
Network Architecture for Containers
Cumulus Networks
 
Webinar: Network Automation [Tips & Tricks]
Webinar: Network Automation [Tips & Tricks]Webinar: Network Automation [Tips & Tricks]
Webinar: Network Automation [Tips & Tricks]
Cumulus Networks
 
July NYC Open Networking Meeup
July NYC Open Networking MeeupJuly NYC Open Networking Meeup
July NYC Open Networking Meeup
Cumulus Networks
 
Demystifying Networking Webinar Series- Routing on the Host
Demystifying Networking Webinar Series- Routing on the HostDemystifying Networking Webinar Series- Routing on the Host
Demystifying Networking Webinar Series- Routing on the Host
Cumulus Networks
 
Ifupdown2: Network Interface Manager
Ifupdown2: Network Interface ManagerIfupdown2: Network Interface Manager
Ifupdown2: Network Interface Manager
Cumulus Networks
 
Operationalizing VRF in the Data Center
Operationalizing VRF in the Data CenterOperationalizing VRF in the Data Center
Operationalizing VRF in the Data Center
Cumulus Networks
 
Microservices Network Architecture 101
Microservices Network Architecture 101Microservices Network Architecture 101
Microservices Network Architecture 101
Cumulus Networks
 
Linux networking is Awesome!
Linux networking is Awesome!Linux networking is Awesome!
Linux networking is Awesome!
Cumulus Networks
 
Webinar-Linux Networking is Awesome
Webinar-Linux Networking is AwesomeWebinar-Linux Networking is Awesome
Webinar-Linux Networking is Awesome
Cumulus Networks
 
Webinar- Tea for the Tillerman
Webinar- Tea for the TillermanWebinar- Tea for the Tillerman
Webinar- Tea for the Tillerman
Cumulus Networks
 
Dreamhost deploying dreamcompute at scale
Dreamhost deploying dreamcompute at scaleDreamhost deploying dreamcompute at scale
Dreamhost deploying dreamcompute at scale
Cumulus Networks
 
Operationalizing BGP in the SDDC
Operationalizing BGP in the SDDCOperationalizing BGP in the SDDC
Operationalizing BGP in the SDDC
Cumulus Networks
 
Manage your switches like servers
Manage your switches like serversManage your switches like servers
Manage your switches like servers
Cumulus Networks
 
Cumulus Linux 2.5.5 What's New
Cumulus Linux 2.5.5 What's NewCumulus Linux 2.5.5 What's New
Cumulus Linux 2.5.5 What's New
Cumulus Networks
 
Cumulus Linux 2.5.4
Cumulus Linux 2.5.4Cumulus Linux 2.5.4
Cumulus Linux 2.5.4
Cumulus Networks
 
Cumulus Linux 2.5.3
Cumulus Linux 2.5.3Cumulus Linux 2.5.3
Cumulus Linux 2.5.3
Cumulus Networks
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStack
Cumulus Networks
 
Big data, better networks
Big data, better networksBig data, better networks
Big data, better networks
Cumulus Networks
 
Mlag invisibile layer 2 redundancy
Mlag invisibile layer 2 redundancyMlag invisibile layer 2 redundancy
Mlag invisibile layer 2 redundancy
Cumulus Networks
 

More from Cumulus Networks (20)

Building Scalable Data Center Networks
Building Scalable Data Center NetworksBuilding Scalable Data Center Networks
Building Scalable Data Center Networks
 
Network Architecture for Containers
Network Architecture for ContainersNetwork Architecture for Containers
Network Architecture for Containers
 
Webinar: Network Automation [Tips & Tricks]
Webinar: Network Automation [Tips & Tricks]Webinar: Network Automation [Tips & Tricks]
Webinar: Network Automation [Tips & Tricks]
 
July NYC Open Networking Meeup
July NYC Open Networking MeeupJuly NYC Open Networking Meeup
July NYC Open Networking Meeup
 
Demystifying Networking Webinar Series- Routing on the Host
Demystifying Networking Webinar Series- Routing on the HostDemystifying Networking Webinar Series- Routing on the Host
Demystifying Networking Webinar Series- Routing on the Host
 
Ifupdown2: Network Interface Manager
Ifupdown2: Network Interface ManagerIfupdown2: Network Interface Manager
Ifupdown2: Network Interface Manager
 
Operationalizing VRF in the Data Center
Operationalizing VRF in the Data CenterOperationalizing VRF in the Data Center
Operationalizing VRF in the Data Center
 
Microservices Network Architecture 101
Microservices Network Architecture 101Microservices Network Architecture 101
Microservices Network Architecture 101
 
Linux networking is Awesome!
Linux networking is Awesome!Linux networking is Awesome!
Linux networking is Awesome!
 
Webinar-Linux Networking is Awesome
Webinar-Linux Networking is AwesomeWebinar-Linux Networking is Awesome
Webinar-Linux Networking is Awesome
 
Webinar- Tea for the Tillerman
Webinar- Tea for the TillermanWebinar- Tea for the Tillerman
Webinar- Tea for the Tillerman
 
Dreamhost deploying dreamcompute at scale
Dreamhost deploying dreamcompute at scaleDreamhost deploying dreamcompute at scale
Dreamhost deploying dreamcompute at scale
 
Operationalizing BGP in the SDDC
Operationalizing BGP in the SDDCOperationalizing BGP in the SDDC
Operationalizing BGP in the SDDC
 
Manage your switches like servers
Manage your switches like serversManage your switches like servers
Manage your switches like servers
 
Cumulus Linux 2.5.5 What's New
Cumulus Linux 2.5.5 What's NewCumulus Linux 2.5.5 What's New
Cumulus Linux 2.5.5 What's New
 
Cumulus Linux 2.5.4
Cumulus Linux 2.5.4Cumulus Linux 2.5.4
Cumulus Linux 2.5.4
 
Cumulus Linux 2.5.3
Cumulus Linux 2.5.3Cumulus Linux 2.5.3
Cumulus Linux 2.5.3
 
Open Networking for Your OpenStack
Open Networking for Your OpenStackOpen Networking for Your OpenStack
Open Networking for Your OpenStack
 
Big data, better networks
Big data, better networksBig data, better networks
Big data, better networks
 
Mlag invisibile layer 2 redundancy
Mlag invisibile layer 2 redundancyMlag invisibile layer 2 redundancy
Mlag invisibile layer 2 redundancy
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 

Building a Layer 3 network with Cumulus Linux

Editor's Notes

  1. Publishers from all around the world
  2. In fact, when we designed the new datacenter, We wrote Chef cookbooks to automate provisioning and config We wrote unit and integration tests using Chef’s toolchain. And setup a CI pipeline for the code. We even simulated the entire datacenter, switches, servers and all, Using Vagrant. It worked so well, that bootstrapping the new datacenter took us just 5 days. Think about it. The first time we ever saw a real Dell switch running Linux, was when we arrived onsite for the buildout. And yet, 99% of our code worked as expected. In 5 days, we were able to setup a LAN, VPN, server provisioning, DNS, LDAP, and dealt with some quirky BIOS configs. On the servers, mind you, not the switches. We even hooked Cumulus’ built-in cabling validation, To our Prometheus based monitoring system. So that right after we turned monitoring on, we got an alert. On one bad cable. Out of 3000.