MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
Dmitry Afanasiev, fl0w@yandex-team.ru
Daniel Ginsburg, dbg@yandex-team.ru
Network Architects
About Us
What is Yandex
• Founded in 1993
• NASDAQ:YNDX, market cap ~$12.5B
• One of Europe's largest internet companies and the leading search provider in Russia
• Over 60% of the local search market
• Monthly user audience of over 90 million worldwide
• Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads...
Our Infrastructure
• We're a rather typical MS-DC (massively scalable data center)
• Several DCs in Russia and abroad, plus an MPLS backbone connecting them
• About 100k servers, and growing fast
• Mostly IPv6 internally, but we need to serve external IPv4
• The network architecture is a bit outdated and needs rethinking
In Search of New Arch
• It needs to be:
– Scalable
– Flexible
– Programmable
• Lots of approaches out there; some get many things right…
• But none combines all the right pieces in the right way
• That is really surprising, because the right combination seems almost inevitable
Ideas to Build Upon
• Many of the ideas have been around for years (or even decades)
• Interconnection network topology – folded Clos
• Let the edge handle complexity
• Core just delivers packets edge to edge
• Overlay/underlay logical split
• Control: mix of centralized and distributed. Needs a nice way to combine both
• Simple commodity network elements
• Hierarchy and automation to scale the network
What’s missing
• All these ideas are well known, well understood and almost universally accepted in the industry
• People are trying to implement them using a wild mix of data plane mechanisms
• That introduces enormous complexity
• What's missing? A unified forwarding mechanism
Missing… or overlooked?
• Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms
• Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism upon which we can implement all those ideas without compromising their value
• It has well-defined, standardized mappings to many other popular forwarding planes
• It's known as MPLS
Unified Forwarding: Why and How
Flexibility
• Different data plane mechanisms, different features
• The unified data plane should be able to support all useful features and produce their combinations
• MPLS is very flexible:
– forwarding over a pre-signalled virtual circuit à la ATM: this is what RSVP-TE does
– source routing over a previously discovered topology à la Token Ring networks: see the Segment Routing proposal
– hierarchical LPM à la IP: just split the address over several labels and allow routers to act on the topmost one (not that we suggest it is practical, but it is definitely possible)
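A minimal sketch of the last bullet, hierarchical LPM over labels, under stated assumptions: an IPv4 address is split into two 16-bit halves, each carried as one label, and a router consults only the topmost label. The 16-bit split, the offset of 16 (to stay clear of the reserved label values 0 to 15), and the "region" table are all hypothetical choices for illustration.

```python
import ipaddress

RESERVED = 16  # label values 0-15 are reserved (RFC 3032), so shift past them


def address_to_labels(addr):
    """Split an IPv4 address into a two-label stack, top first: [high 16 bits, low 16 bits]."""
    value = int(ipaddress.IPv4Address(addr))
    return [(value >> 16) + RESERVED, (value & 0xFFFF) + RESERVED]


def labels_to_address(stack):
    """Inverse of address_to_labels."""
    high, low = (label - RESERVED for label in stack)
    return str(ipaddress.IPv4Address((high << 16) | low))


# Hypothetical "region" router: it only knows where each high half lives.
REGION_TABLE = {address_to_labels("10.1.0.0")[0]: "port-to-10.1.0.0/16"}


def region_forward(stack):
    """Look up the top label only, pop it, and hand the rest to the next router."""
    return REGION_TABLE[stack[0]], stack[1:]
```

Each half fits easily in the 20-bit label field, and the region router needs one entry per /16 rather than per host, which is the "hierarchical" part. As the slide says, this shows possibility, not practicality.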
An Abstract Note on Semantics
• The best way to implement arbitrary semantics is to get rid of any semantics in protocol headers and assign them externally
• Hardware works with protocol headers
• Control software defines the semantics
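The MPLS header shows this split directly. Per RFC 3032, each label stack entry is a 32-bit word: a 20-bit label, 3 traffic-class bits (originally called EXP), a bottom-of-stack (S) bit and an 8-bit TTL. Nothing in the entry says what a label means; that is decided by whoever programs the tables. A small sketch of packing and parsing entries (the function names and default TTL are illustrative choices):

```python
import struct


def pack_stack(labels, tc=0, ttl=64):
    """Encode a label stack (top first) as RFC 3032 label stack entries."""
    out = bytearray()
    for i, label in enumerate(labels):
        if not 0 <= label < 2**20:
            raise ValueError("label must fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0  # S bit is set only on the last entry
        word = (label << 12) | ((tc & 0x7) << 9) | (bottom << 8) | (ttl & 0xFF)
        out += struct.pack("!I", word)  # 4 bytes per entry, network byte order
    return bytes(out)


def parse_stack(data):
    """Decode entries up to the bottom-of-stack bit into (label, tc, s, ttl) tuples."""
    entries = []
    offset = 0
    while offset + 4 <= len(data):
        (word,) = struct.unpack_from("!I", data, offset)
        offset += 4
        entry = (word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF)
        entries.append(entry)
        if entry[2]:  # S bit: this was the last label, payload follows
            break
    return entries
```

The parser stops at the S bit, so any payload after the stack is left untouched.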
Combining Data Planes
• Why combine? To have the right features in the right place, or to produce useful combinations of features
• There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other
Stitching Data Planes
• It’s painful
• Might be done only for a subset of protocol features
• Need to translate between protocols (complex, never perfect, loses information)
• Need to provision interworking points: fragile, an operational nightmare, they create bottlenecks
• Seems nobody really does this anymore… Or maybe we still have to sometimes?
Overlaying Data Planes
• We overlay to scale, to virtualize, or to augment one data plane with properties of another
• Overlaying is building hierarchy
• But with multiple data planes it is limited and ad hoc
• Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
• MPLS is intrinsically hierarchical (overlayable, if you will)
Hierarchy is your friend
• Many hierarchical structures are already in the network: topology, addressing, management and control
• Hierarchy is the most important and the most reliable way to scale things
Overlay/underlay split is a metaphor
• The ability to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary
• In a stack of DC-label, ToR-label, port-label, slice-label, vm-label, where's the overlay/underlay boundary? Not in the packet
• Placement of the boundary depends only on how you structure your control
FEC is hierarchical
• Can be as fine- or coarse-grained as one wishes. There's no network-imposed limitation
• Easy behavior aggregation: just add an extra label on top
• Easy behavior disaggregation: one can expose additional granularity by adding an extra label at the bottom
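Since a label stack is just an ordered list, the aggregation and disaggregation above reduce to list operations. The sketch reuses the DC/ToR/port/slice/VM stack from the previous slide; the label values are made up.

```python
# Label stack, top first: DC, ToR, port, slice, VM (values are hypothetical).
stack = [1001, 2002, 3003, 4004, 5005]


def aggregate(stack, label):
    """Fold everything underneath into one coarser FEC by pushing a label on top."""
    return [label] + stack


def disaggregate(stack, label):
    """Expose finer granularity by appending a label at the bottom."""
    return stack + [label]


def top_label(stack):
    """The only label a transit LSR acts on."""
    return stack[0]
```

Devices that key on the top label never notice disaggregation, while aggregation lets them carry a single entry for everything beneath the new label.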
How to Control MPLS
MPLS is complex?
• The MPLS control plane is notoriously complex
• Good news: you don't have to use all of it; you can pick the good parts
• Classical distributed control is OK for transport
• Centralized control seems better for higher-level artifacts on the edge, sometimes called services
• Both styles can (and should) be combined
Enabling combinability
• The device has to be a bit smarter than in OpenFlow (OF)
• It gets parts of the label stack from different control plane components
• It assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
• Assembly instructions come in the form of references by “name”
• Assembly uses late binding
Enabling combinability – example
• MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B) using next-hop resolution
• The resolution happens on the device itself, and the two control plane entities are loosely coupled: MPLS tunnels can change their paths, their assigned labels, etc. without MP-BGP caring about it
• The VPN abstraction refers to the tunnel abstraction using next-hops. The next-hop is the name by which one control plane abstraction refers to another
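A minimal sketch of this late binding, assuming two independent control plane components writing to the same device: one installs tunnel labels keyed by next-hop address, the other installs VPN routes that carry only a next-hop name and a VPN label. The `Device` class, its method names and all addresses and labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Written by the transport component: next-hop -> tunnel label stack (top first).
    tunnels: dict = field(default_factory=dict)
    # Written by the service component (e.g. MP-BGP): (vrf, prefix) -> (next-hop, VPN label).
    vpn_routes: dict = field(default_factory=dict)

    def set_tunnel(self, next_hop, labels):
        self.tunnels[next_hop] = list(labels)

    def set_vpn_route(self, vrf, prefix, next_hop, vpn_label):
        self.vpn_routes[(vrf, prefix)] = (next_hop, vpn_label)

    def stack_for(self, vrf, prefix):
        """Resolve the next-hop name at lookup time (late binding) and assemble the stack."""
        next_hop, vpn_label = self.vpn_routes[(vrf, prefix)]
        return self.tunnels[next_hop] + [vpn_label]


dev = Device()
dev.set_vpn_route("blue", "192.0.2.0/24", "198.51.100.1", 30001)
dev.set_tunnel("198.51.100.1", [16005])
```

Calling `dev.set_tunnel("198.51.100.1", [16099])` changes the assembled stack to `[16099, 30001]` without touching the VPN route, which is the loose coupling the slide describes.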
Enabling Combinability – BGP 3107
• Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another
• It can express almost anything we currently want. Still, a more expressive mechanism is desirable
• BGP 3107 is the way to interact with fully classically controlled MPLS networks
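Recursive resolution generalizes the previous example: a labeled route's next-hop may itself be reachable only through another labeled route, and each level contributes its labels. The sketch walks such a chain until it reaches a directly connected next-hop. The table format is an assumption, and exact-match dictionary lookups stand in for the longest-prefix match a real RIB would do. Each route carries a list of labels because RFC 3107 allows more than one label per NLRI.

```python
def resolve(routes, connected, prefix, max_depth=8):
    """Return (egress interface, label stack top first) for a labeled route.

    routes: prefix -> (next_hop, [labels]) learned via labeled BGP (hypothetical format)
    connected: next_hop -> interface for directly attached neighbours
    """
    stack = []
    current = prefix
    for _ in range(max_depth):
        next_hop, labels = routes[current]
        stack = labels + stack  # the outer (transport) level goes on top
        if next_hop in connected:
            return connected[next_hop], stack
        current = next_hop  # next-hop is reachable only via another labeled route
    raise RuntimeError("resolution too deep or looping")


routes = {
    "203.0.113.5/32": ("192.0.2.7", [40005]),  # service route, next-hop is a remote switch
    "192.0.2.7": ("192.0.2.1", [16007]),       # transport route towards that switch
}
connected = {"192.0.2.1": "eth0"}
```

Here `resolve(routes, connected, "203.0.113.5/32")` yields `("eth0", [16007, 40005])`: the transport label on top, the service label beneath it.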
Static Configuration
• If you can ensure that the labels at some point of the network always stay the same (because you assigned them that way), you can use static configuration on the other side
• This is the way to go when one wants to avoid any signaling dependencies
• Static configuration can be calculated and disseminated automatically
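One possible way to make labels "always stay the same" (an assumption, not necessarily what is done at Yandex) is to derive them deterministically from inventory data, so every party can compute them independently. The label block base, the `port_towards` input and both function names are hypothetical.

```python
LABEL_BASE = 100000    # hypothetical start of a statically managed label block
MAX_LABEL = 2**20 - 1  # labels are 20-bit values


def switch_label(switch_index):
    """Label every fabric element uses to reach switch number switch_index."""
    label = LABEL_BASE + switch_index
    if label > MAX_LABEL:
        raise ValueError("switch index outside the static label block")
    return label


def static_lfib(switch_ids, local_index, port_towards):
    """Compute one switch's label table from inventory data alone, with no signaling.

    port_towards: switch index -> local egress port on a path to that switch
    (would come from the cabling plan; hypothetical input).
    """
    table = {}
    for idx in switch_ids:
        if idx == local_index:
            table[switch_label(idx)] = ("pop", None)  # addressed to this switch
        else:
            # swap to the same label and send toward the destination switch
            table[switch_label(idx)] = ("swap", port_towards[idx])
    return table
```

Because a neighbour can compute `switch_label(...)` on its own, a legacy router or a vRouter can be configured statically against these values.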
Where should MPLS start?
• On the host! Or even right from the application
• The hypervisor switch is the easiest point: software only, very flexible
• Naturally fits centralized control
• Helps to scale: lots of RAM, and each element keeps only the state it needs
• Modern CPUs can forward tens of Gbps without breaking a sweat
Looks SDNish
• A simple forwarding plane (3 simple ops)
• A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
• Centralized and distributed control, or anything in between
• Combinability of different control plane components with late binding via names, which the device resolves
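The three simple operations are the standard MPLS label operations: push, pop and swap. A sketch of an LSR that applies only these, keyed on the top label; the LFIB layout and its contents are hypothetical.

```python
def apply_op(stack, op, arg=None):
    """Apply one MPLS label operation to a stack (top first); returns a new list."""
    if op == "push":
        return [arg] + stack
    if op == "pop":
        return stack[1:]
    if op == "swap":
        return [arg] + stack[1:]
    raise ValueError(f"unknown op {op!r}")


def lsr_forward(lfib, stack):
    """Look up the top label only, apply its operation, return (out_port, new_stack)."""
    op, arg, port = lfib[stack[0]]
    return port, apply_op(stack, op, arg)


lfib = {
    100005: ("swap", 100005, "uplink-2"),  # transit: keep the switch label
    200003: ("pop", None, "port-3"),       # egress port label: pop and deliver
}
```

Everything else the deck describes (assembly, late binding, control) lives in the software agent and the control plane, not in this loop.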
What SDN is
• “Modularity based on abstraction is the way things get done” --Liskov
• “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” --Shenker
• “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” --Shenker
• “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker
How Ideas Map to MPLS
• Let the edge handle complexity – do it on the host
• Core just delivers packets edge to edge – hierarchy lets the devices stay agnostic to changes on the edge
• Overlay/underlay logical split – just a way to implement hierarchy
• Control: mix of centralized and distributed. Needs a nice way to combine both – yeah!
• Simple commodity network elements – cheap MPLS-capable silicon is finally there
NFV and Seamless MPLS
• The key point of Seamless MPLS (S-MPLS) was to extend MPLS to the access network and separate transport and service in the MPLS network
• NFV describes how to host service nodes in the DC. If you don't have MPLS in the DC, it's no longer seamless
• The fix is obvious – extend MPLS into the DC
• Labels can carry additional metadata if one wants them to
Case Study: New Yandex DC
What do we need?
• Cheap and abundant bandwidth
• Scalable forwarding with minimal state
• Multitenancy (=> network virtualization)
• Efficient resource pooling
• Inter-DC traffic engineering
• Function chaining: load balancing, FW, etc.
• Interconnection with existing infrastructure
• Means to integrate all of the above
• Local response to some events, e.g. failures
• Automation at scale
What we don’t need
We are trying to keep the design really simple. We don't need many functions often perceived as desirable:
• L2 (neither real nor emulated)
• VM mobility
– In scale-out applications, nodes coming and going is the norm; there is no need to move them around while preserving state and identity
– VM mobility increases complexity, as it depends on other features
• Multicast
• We don't have many topology changes
Components
• Host with vLER (MPLS-capable vRouter)
• Fabric switching elements – LSRs
• Centralized controller
• Legacy routers: need to interwork with the fabric LSRs and the controller. BGP 3107 is the tool
Forwarding model
• 3-label stack: topmost for the egress switch, next for the egress port, bottom for the VM
• The vRouter uses {dst prefix, VRF} to impose the label stack
• The bottom label is processed by the destination vLER
• Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
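A minimal sketch of the imposition side of this model, assuming the vRouter keeps one longest-prefix-match table per VRF whose entries hold the full three-label stack. The VRF name, IPv6 documentation prefixes and label values are invented for illustration.

```python
import ipaddress

# Per-VRF tables: prefix -> [egress-switch label, egress-port label, VM label]
vrf_tables = {
    "tenant-a": {
        ipaddress.ip_network("2001:db8:a::/48"): [100007, 200012, 300001],
        ipaddress.ip_network("2001:db8:a:1::/64"): [100009, 200003, 300042],
    },
}


def impose(vrf, dst):
    """Longest-prefix match on dst within the VRF; return the stack to push (top first)."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in vrf_tables[vrf] if addr in net]
    if not matches:
        return None  # no route in this VRF
    best = max(matches, key=lambda net: net.prefixlen)
    return vrf_tables[vrf][best]


def fabric_switch_state(num_switches, local_access_ports):
    # One entry per switch label plus one per local port label;
    # VM labels are only looked up by the destination vLER.
    return num_switches + local_access_ports
```

For a hypothetical fabric of 500 switches and 48 local access ports, `fabric_switch_state(500, 48)` gives 548 entries, independent of how many VMs the fabric carries.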
Control plane
• iBGP 3107 (in-path route reflectors with next-hop-self) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
• iBGP 3107 to interwork with legacy routers
– Session with the connected network element, with NHS for the switch label
– Session with the controller for the remaining labels, which bind to the switch label via next hop
• Label mappings on the edge of the fabric are stable, so they can be provisioned rather than signaled
• Internal fabric failures are handled locally
• Label mappings on vRouters are distributed centrally
Why Now and What’s Next?
Why Now?
“The world is changed… I smell it in the air”
• A lot of similar ideas in the industry
• Thinking seems to be converging on something
• But... a lot of ugly ad-hoc solutions are popping up here and there
• Better to implement a good solution before the bad ones become entrenched
• It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could have MPLS instead
What’s Ready?
• Merchant silicon is finally MPLS-capable, and the price is almost right
• Modern CPUs can process tens of Mpps in software, making host-based switching feasible
• Several open source MPLS data plane implementations are emerging
• Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time
Gaps
• All RFC 3107 implementations are broken (multiple labels). Talk to your vendor
• Silicon is not perfect. Talk to your vendor
• A more expressive way than BGP 3107 to control late binding of control plane artifacts
• Perception of MPLS as a complex technology. It's the current MPLS control plane that is complex
• Perception of MPLS as a WAN or metro technology
Thank you!
Questions?

Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 

MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale

  • 1. MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
    Dmitry Afanasiev, fl0w@yandex-team.ru
    Daniel Ginsburg, dbg@yandex-team.ru
    Network Architects
  • 3. What is Yandex
    • Founded in 1993
    • NASDAQ:YNDX, Mkt Cap ~$12.5B
    • One of Europe's largest internet companies and the leading search provider in Russia
    • Over 60% of the local search market
    • Monthly user audience of over 90 million worldwide
    • Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads ...
  • 4. Our Infrastructure
    • We're a rather typical MS-DC (massively scalable data center)
    • Several DCs in Russia and abroad, plus an MPLS backbone to connect them
    • About 100k servers and growing fast
    • Mostly IPv6 internally, but we need to serve external IPv4
    • The network architecture is a bit outdated and needs rethinking
  • 5. In Search of New Arch
  • 6. In Search of New Arch
    • It needs to be:
      – Scalable
      – Flexible
      – Programmable
    • There are lots of approaches out there, and some get many things right…
    • But none combines all the right pieces in the right way
    • That is really surprising, because the right combination seems almost inevitable
  • 7. Ideas to Build Upon
    • Many of the ideas have been around for years (or even decades)
    • Interconnection network topology – folded Clos
    • Let the edge handle complexity
    • Core just delivers packets edge to edge
    • Overlay/underlay logical split
    • Control: a mix of centralized and distributed. Needs a nice way to combine both
    • Simple commodity network elements
    • Hierarchy and automation to scale the network
  • 8. What’s missing
    • All these ideas are well known, well understood and almost universally accepted in the industry
    • People are trying to implement them using a wild mix of data plane mechanisms
    • And that introduces enormous complexity
    • What's missing? A unified forwarding mechanism
  • 9. Missing… or overlooked?
    • Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms
    • Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism upon which we can implement all those ideas without compromising their value
    • It has a well-defined, standardized mapping to many other popular forwarding planes
    • It's known as MPLS
  • 11. Flexibility
    • Different data plane mechanisms offer different features
    • A unified data plane should be able to support all useful features and produce their combinations
    • MPLS is very flexible:
      – forwarding over a pre-signalled virtual circuit à la ATM: this is what RSVP-TE does
      – source routing over a previously discovered topology à la Token Ring networks: see the Segment Routing proposal
      – hierarchical LPM à la IP: just split the address over several labels and let routers act on the topmost one (not that we suggest it is practical, but it is definitely possible)
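The source-routing mode in the second sub-bullet can be modelled in a few lines. This is a toy sketch under stated assumptions, not the Segment Routing specification: each label is assumed to name an outgoing link of the node currently holding the packet, and that node pops it. The topology, node names, label values and the `source_route` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical adjacency table: (current node, adjacency label) -> next node.
Adjacency = Dict[Tuple[str, int], str]


def source_route(stack: List[int], adjacency: Adjacency, start: str) -> List[str]:
    """Follow a label stack (top label first) hop by hop and return the path taken.

    Each node pops the top label and forwards over the link that label names,
    so the whole path is chosen by whoever built the stack at the ingress.
    """
    node = start
    path = [node]
    remaining = list(stack)
    while remaining:
        label = remaining.pop(0)  # the current node consumes its own label
        try:
            node = adjacency[(node, label)]
        except KeyError:
            raise ValueError(f"node {node!r} has no adjacency with label {label}") from None
        path.append(node)
    return path


# A tiny hypothetical topology: two spines between two leaves.
ADJ: Adjacency = {
    ("leaf1", 101): "spine1",
    ("leaf1", 102): "spine2",
    ("spine1", 201): "leaf2",
    ("spine2", 201): "leaf2",
}

print(source_route([102, 201], ADJ, "leaf1"))  # ['leaf1', 'spine2', 'leaf2']
```

The transit nodes keep no per-flow state; steering traffic via the other spine only requires the ingress to push `[101, 201]` instead.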
  • 12. An Abstract Note on Semantics
    • The best way to implement arbitrary semantics is to strip all semantics from protocol headers and assign them externally
    • Hardware works with protocol headers
    • Control software defines the semantics
  • 13. Combining Data Planes
    • Why combine? To have the right features in the right place, or to produce useful combinations of features
    • There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other
  • 14. Stitching Data Planes
    • It’s painful
    • Might be done for a subset of protocol features
    • Need to translate between protocols (complex, never perfect, loses information)
    • Need to provision interworking points: fragile, an operational nightmare, they create bottlenecks
    • It seems nobody really does this anymore… Or maybe we still have to sometimes?
  • 15. Overlaying Data Planes
    • Overlay to scale, to virtualize, or to augment one data plane with properties of another
    • Overlaying is building hierarchy
    • But with multiple data planes it is limited and ad hoc
    • Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
    • MPLS is intrinsically hierarchical (overlayable, if you will)
  • 16. Hierarchy is your friend
    • Many hierarchical structures are already in the network: topology, addressing, management and control
    • Hierarchy is the most important and the most reliable way to scale things
  • 17. Overlay/underlay split is a metaphor
    • The ability to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary
    • In a stack of DC-label, ToR-label, port-label, slice-label, vm-label, where's the overlay/underlay boundary? Not in the packet
    • Placement of the boundary depends only on how you structure your control
  • 18. FEC is hierarchical
    • It can be as granular or as coarse-grained as one wishes; there's no network-imposed limitation
    • Easy behavior aggregation: just add an extra label on top
    • Easy behavior disaggregation: expose additional granularity by adding an extra label at the bottom
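A toy model of the aggregation and disaggregation points above, assuming a label stack is a Python list with the top label first and that a transit LSR classifies packets by the top label only. The helpers and all label values (egress switch 1001, port 17, VMs 42 and 43, DC label 5000, an extra bottom label 7) are hypothetical.

```python
from typing import List

Stack = List[int]  # label stack, top label first


def transit_fec(stack: Stack) -> int:
    """A transit LSR classifies a packet by its topmost label only."""
    return stack[0]


def aggregate(stack: Stack, outer_label: int) -> Stack:
    """Aggregate behavior: push one extra label on top of an existing stack."""
    return [outer_label] + stack


def disaggregate(stack: Stack, inner_label: int) -> Stack:
    """Expose finer granularity: append one extra label at the bottom."""
    return stack + [inner_label]


# Hypothetical stacks for two VMs behind the same egress switch and port.
vm_a = [1001, 17, 42]
vm_b = [1001, 17, 43]
# In transit they belong to the same FEC: only the top label is examined.
assert transit_fec(vm_a) == transit_fec(vm_b) == 1001

# Wrapping both in a (hypothetical) DC label gives one FEC for the whole DC.
in_dc_a, in_dc_b = aggregate(vm_a, 5000), aggregate(vm_b, 5000)
print(transit_fec(in_dc_a), transit_fec(in_dc_b))  # 5000 5000

# A label added at the bottom splits one VM's traffic further,
# with no change needed on any transit device.
print(disaggregate(vm_a, 7))  # [1001, 17, 42, 7]
```

Which labels count as "overlay" and which as "underlay" is not visible here at all; it only depends on which control component assigns each label.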
  • 20. MPLS is complex?
    • The MPLS control plane is notoriously complex
    • Good news: you don’t have to use all of it; you can pick the good parts
    • Classical distributed control is OK for transport
    • Centralized control seems better for higher-level artifacts on the edge, sometimes called services
    • Both styles can (and should) be combined
  • 21. Enabling combinability
    • The device has to be a bit smarter than in OpenFlow
    • It gets parts of the label stack from different control plane components
    • It assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
    • Assembly instructions come in the form of references by “name”
    • Assembly uses late binding
  • 22. Enabling combinability – example
    • MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B) using next-hop resolution
    • The resolution happens on the device itself, and the two control plane entities are loosely coupled: MPLS tunnels can change their paths, assigned labels, etc. without MP-BGP caring about it
    • The VPN abstraction refers to the tunnel abstraction using next hops. A next hop is the name by which one control plane abstraction refers to another
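The loose coupling described on this slide can be sketched as two tables that are joined only at lookup time. This is a minimal illustration, not any vendor's implementation: `DeviceAgent`, `VpnRoute`, the next-hop name `pe2` and all labels are hypothetical, and next hops are modelled as opaque strings rather than BGP next-hop addresses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VpnRoute:
    next_hop: str   # the "name" through which the VPN refers to a tunnel
    vpn_label: int  # inner label assigned by the VPN control plane


class DeviceAgent:
    """Hypothetical on-device agent that assembles label stacks with late binding.

    Tunnel state and VPN state arrive from different control plane components
    and are stored separately; they are joined only when a stack is requested.
    """

    def __init__(self) -> None:
        self.tunnels: Dict[str, List[int]] = {}  # next hop -> transport labels, top first
        self.vpn_routes: Dict[Tuple[str, str], VpnRoute] = {}  # (vrf, prefix) -> route

    def set_tunnel(self, next_hop: str, labels: List[int]) -> None:
        self.tunnels[next_hop] = list(labels)

    def set_vpn_route(self, vrf: str, prefix: str, route: VpnRoute) -> None:
        self.vpn_routes[(vrf, prefix)] = route

    def stack_for(self, vrf: str, prefix: str) -> Optional[List[int]]:
        route = self.vpn_routes.get((vrf, prefix))
        if route is None:
            return None
        transport = self.tunnels.get(route.next_hop)  # resolved by name, at lookup time
        if transport is None:
            return None  # next hop unresolved: the VPN route is not usable
        return transport + [route.vpn_label]


agent = DeviceAgent()
agent.set_vpn_route("red", "2001:db8:100::/48", VpnRoute("pe2", 300))
agent.set_tunnel("pe2", [16002])
print(agent.stack_for("red", "2001:db8:100::/48"))  # [16002, 300]

# The tunnel moves to a different path with different labels.
# The VPN route is untouched, yet the next lookup picks up the change.
agent.set_tunnel("pe2", [24001, 16002])
print(agent.stack_for("red", "2001:db8:100::/48"))  # [24001, 16002, 300]
```

The second lookup needed no VPN update: only the tunnel entry changed, which is the loose coupling the slide describes.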
  • 23. Enabling Combinability – BGP 3107
    • Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another
    • It is able to express almost anything we currently want. Still, a more expressive way is desired
    • BGP 3107 is the way to interact with all classically controlled MPLS networks
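Recursive next-hop resolution can be sketched as repeated longest-prefix lookups, pushing each route's label on top of the labels already collected. The RIB layout, the `resolve` helper, the addresses (from the 2001:db8::/32 documentation range) and the labels are hypothetical; a real BGP implementation also handles ECMP, best-path selection and many details omitted here.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical RIB: prefix -> (next hop or None if directly connected,
#                             label or None, outgoing interface or None).
Rib = Dict[str, Tuple[Optional[str], Optional[int], Optional[str]]]


def longest_match(rib: Rib, address: str):
    """Return the RIB entry with the longest prefix covering address, or None."""
    ip = ipaddress.ip_address(address)
    best_net, best_entry = None, None
    for prefix, entry in rib.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_entry = net, entry
    return best_entry


def resolve(rib: Rib, destination: str, max_depth: int = 8) -> Optional[Tuple[List[int], str]]:
    """Recursively resolve destination to (label stack top first, interface).

    Each labeled route's next hop is looked up again; a label found deeper in
    the recursion belongs to an outer layer, so it goes on top of the stack.
    """
    labels: List[int] = []
    address = destination
    for _ in range(max_depth):
        entry = longest_match(rib, address)
        if entry is None:
            return None  # unreachable next hop: the route is not usable
        next_hop, label, interface = entry
        if label is not None:
            labels.insert(0, label)
        if next_hop is None:  # directly connected: recursion ends here
            return labels, interface
        address = next_hop
    raise RuntimeError("next-hop recursion too deep (possible loop)")


RIB: Rib = {
    # service route learned with a label; next hop is a remote loopback
    "2001:db8:100::/48": ("2001:db8:ffff::2", 300, None),
    # labeled route (RFC 3107 style) to that loopback via a connected neighbour
    "2001:db8:ffff::2/128": ("2001:db8:1::1", 2001, None),
    # directly connected subnet
    "2001:db8:1::/64": (None, None, "eth0"),
}

print(resolve(RIB, "2001:db8:100::5"))  # ([2001, 300], 'eth0')
```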
  • 24. Static Configuration
    • If you can ensure that the labels at some point of the network always stay the same (because you assigned them to be so), you can use static configuration on the other side
    • This is the way to go when one wants to avoid any signaling dependencies
    • Static configuration can be calculated and disseminated automatically
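One way to keep statically provisioned labels stable is to derive them from data both sides already share, such as a switch inventory. The sketch assumes each switch has a stable numeric ID and uses a hypothetical base of 100000; the range check reflects the 20-bit MPLS label field with values 0-15 reserved. `static_switch_labels` and the inventory are hypothetical.

```python
from typing import Dict

MIN_LABEL = 16              # label values 0-15 are reserved
MAX_LABEL = (1 << 20) - 1   # an MPLS label is a 20-bit field


def static_switch_labels(inventory: Dict[str, int], base: int = 100_000) -> Dict[str, int]:
    """Derive a fixed label for every switch from its stable inventory ID.

    The mapping is a pure function of data both sides already have, so either
    side can configure it statically and no signaling is needed to agree on it.
    """
    labels: Dict[str, int] = {}
    owner: Dict[int, str] = {}  # reverse map, to catch two switches sharing a label
    for name, switch_id in inventory.items():
        label = base + switch_id
        if not MIN_LABEL <= label <= MAX_LABEL:
            raise ValueError(f"label {label} for {name} is outside {MIN_LABEL}..{MAX_LABEL}")
        if label in owner:
            raise ValueError(f"{name} and {owner[label]} would share label {label}")
        owner[label] = name
        labels[name] = label
    return labels


inventory = {"leaf-a1": 1, "leaf-a2": 2, "spine-1": 501}  # hypothetical inventory
config = static_switch_labels(inventory)
print(config)  # {'leaf-a1': 100001, 'leaf-a2': 100002, 'spine-1': 100501}
```

Deriving labels from stable IDs rather than from, say, a position in a sorted list means adding a switch never renumbers the existing ones.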
  • 25. Where should MPLS start?
    • On the host! Or even right from the application
    • The hypervisor switch is the easiest point: software only, very flexible
    • Naturally fits centralized control
    • Helps to scale: lots of RAM, and each element keeps only the state it needs
    • Modern CPUs can forward tens of Gbps without breaking a sweat
  • 26. Looks SDNish
    • A simple forwarding plane (3 simple ops: push, swap, pop)
    • A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
    • Centralized and distributed control, or anything in between
    • Combinability of different control plane components with late binding via names, which the device resolves
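The "3 simple ops" can be illustrated with a toy label switching router. This is a simplification under stated assumptions: a real LSR may combine operations in one entry (for example swap then push) and handles unlabeled input differently, which this sketch ignores. `IlmEntry`, `forward`, the labels and the next-hop names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IlmEntry:
    """One incoming-label-map entry of a toy LSR."""
    op: str                        # "push", "swap" or "pop"
    labels: Tuple[int, ...] = ()   # swap: (new_label,); push: labels to add on top
    next_hop: Optional[str] = None


def forward(ilm: Dict[int, IlmEntry], stack: List[int]) -> Tuple[List[int], Optional[str]]:
    """Apply the single operation selected by the top label; return (new stack, next hop)."""
    if not stack:
        raise ValueError("empty label stack")
    entry = ilm[stack[0]]
    if entry.op == "swap":
        new_stack = [entry.labels[0]] + stack[1:]
    elif entry.op == "pop":
        new_stack = stack[1:]
    elif entry.op == "push":
        new_stack = list(entry.labels) + stack
    else:
        raise ValueError(f"unknown op {entry.op!r}")
    return new_stack, entry.next_hop


# Hypothetical ILM: all label values and next-hop names are made up.
ILM = {
    1001: IlmEntry("swap", (1002,), "spine-1"),
    17: IlmEntry("pop", (), "port-17"),
    500: IlmEntry("push", (9000,), "dci-router"),
}
print(forward(ILM, [1001, 17, 42]))  # ([1002, 17, 42], 'spine-1')
print(forward(ILM, [17, 42]))        # ([42], 'port-17')
print(forward(ILM, [500, 42]))       # ([9000, 500, 42], 'dci-router')
```

Every decision depends only on the top label and one table entry, which is what lets the same primitive serve transport, service and anything in between.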
  • 27. What SDN is
    • “Modularity based on abstraction is the way things get done” --Liskov
    • “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” -- Shenker
    • “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” -- Shenker
    • “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker
  • 28. How Ideas Map to MPLS
    • Let the edge handle complexity – do it on the host
    • Core just delivers packets edge to edge – hierarchy lets the devices stay agnostic to changes on the edge
    • Overlay/underlay logical split – just a way to implement hierarchy
    • Control: a mix of centralized and distributed; needs a nice way to combine both – yeah!
    • Simple commodity network elements – cheap MPLS-capable silicon is finally there
  • 29. NFV and Seamless MPLS
    • The key point of Seamless MPLS (S-MPLS) was to extend MPLS to the access network and to separate transport and service in the MPLS network
    • NFV describes how to host service nodes in the DC. If you don’t have MPLS in the DC, it’s no longer seamless
    • The fix is obvious: extend MPLS into the DC
    • Labels can carry additional metadata if one wants them to
  • 30. Case Study: New Yandex DC
  • 31. What do we need?
    • Cheap and abundant bandwidth
    • Scalable forwarding with minimal state
    • Multitenancy (=> network virtualization)
    • Efficient resource pooling
    • Inter-DC traffic engineering
    • Function chaining: load balancing, FW, etc.
    • Interconnection with existing infrastructure
    • Means to integrate all of the above
    • Local response to some events, e.g. failures
    • Automation at scale
  • 32. What we don’t need
    We are trying to keep the design really simple. We don’t need many functions often perceived as desirable:
    • L2 (neither real nor emulated)
    • VM mobility
      – In scale-out applications nodes coming and going is the norm; there is no need to move them around while preserving state and identity
      – VM mobility increases complexity, as it depends on other features
    • Multicast
    • We don't have too many changes in topology
  • 33. Components
    • Host with vLER (MPLS-capable vRouter)
    • Fabric switching elements – LSRs
    • Centralized controller
    • Legacy routers. They need to interwork with the fabric LSRs and the controller. BGP 3107 is the tool
  • 34. Forwarding model
    • 3-label stack: topmost for the egress switch, next for the egress port, bottom for the VM
    • The vRouter uses {dst prefix, VRF} to impose the label stack
    • The bottom label is processed by the destination vLER
    • Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
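A sketch of this forwarding model, assuming IPv6 tenant prefixes and hypothetical label values. The vRouter does a longest-prefix match within the VRF and imposes the three labels. A fabric switch needs one entry per fabric switch (to act on the topmost label) plus one per local access port (to act on the port label), regardless of how many VMs sit behind it. `VRouter`, `Egress` and `fabric_switch_state` are illustrative names, and 500 switches with 48 ports are made-up numbers.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Egress:
    switch_label: int  # topmost: selects the egress fabric switch
    port_label: int    # middle: selects the egress port on that switch
    vm_label: int      # bottom: processed by the destination vLER


class VRouter:
    """Toy vLER: {dst prefix, VRF} -> 3-label stack, via longest-prefix match."""

    def __init__(self) -> None:
        self.routes: List[Tuple[str, ipaddress.IPv6Network, Egress]] = []

    def add_route(self, vrf: str, prefix: str, egress: Egress) -> None:
        self.routes.append((vrf, ipaddress.IPv6Network(prefix), egress))

    def impose(self, vrf: str, dst: str) -> Optional[List[int]]:
        ip = ipaddress.IPv6Address(dst)
        matches = [(net, eg) for v, net, eg in self.routes if v == vrf and ip in net]
        if not matches:
            return None
        _, eg = max(matches, key=lambda m: m[0].prefixlen)
        return [eg.switch_label, eg.port_label, eg.vm_label]


def fabric_switch_state(switches_in_fabric: int, local_access_ports: int) -> int:
    """Label entries a fabric switch needs in this model; independent of VM count."""
    return switches_in_fabric + local_access_ports


vr = VRouter()
# Two tenants reuse the same prefix; the VRF keeps them apart (hypothetical labels).
vr.add_route("tenant-1", "2001:db8:a::/64", Egress(1001, 17, 42))
vr.add_route("tenant-2", "2001:db8:a::/64", Egress(1003, 5, 42))
print(vr.impose("tenant-1", "2001:db8:a::10"))  # [1001, 17, 42]
print(vr.impose("tenant-2", "2001:db8:a::10"))  # [1003, 5, 42]
print(fabric_switch_state(500, 48))             # 548
```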
  • 35. Control plane
    • iBGP 3107 (in-path RR with next-hop-self, NHS) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
    • iBGP 3107 to interwork with legacy routers
      – Session with the connected network element, with NHS, for the switch label
      – Session with the controller for the remaining labels, which bind to the switch label via next hop
    • Label mappings on the edge of the fabric are stable and can be provisioned rather than signaled
    • Internal fabric failures are handled locally
    • Label mappings on vRouters are distributed centrally
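Why next-hop-self matters for labels: a speaker that rewrites the next hop to itself must also allocate a local label and install a swap to the label it received, so each in-path RR becomes a label-switching hop. The sketch assumes one new label per re-advertised route and a hypothetical label pool starting at 200000; real implementations offer other allocation policies. The class name, addresses and labels are hypothetical.

```python
import itertools
from typing import Dict, Iterator, Tuple


class LabeledNhsSpeaker:
    """Toy BGP labeled-unicast speaker that re-advertises routes with next-hop-self.

    Rewriting the next hop to itself means the speaker must also allocate a
    local label and install a swap from that label to the label it received.
    """

    def __init__(self, address: str, first_label: int = 200_000) -> None:
        self.address = address
        self._free_labels: Iterator[int] = itertools.count(first_label)
        # local label -> (label to swap to, next hop to forward to)
        self.swap_table: Dict[int, Tuple[int, str]] = {}

    def readvertise(self, prefix: str, next_hop: str, label: int) -> Tuple[str, str, int]:
        local_label = next(self._free_labels)  # hypothetical per-route allocation
        self.swap_table[local_label] = (label, next_hop)
        return prefix, self.address, local_label


spine = LabeledNhsSpeaker("2001:db8:ffff::100")
# A leaf advertises its loopback with a (hypothetical) label 100001.
adv = spine.readvertise("2001:db8:ffff::1/128", "2001:db8:1::1", 100001)
print(adv)               # ('2001:db8:ffff::1/128', '2001:db8:ffff::100', 200000)
print(spine.swap_table)  # {200000: (100001, '2001:db8:1::1')}
```

Peers of the spine now see the spine as next hop with label 200000, and the spine's swap entry carries traffic on to the leaf.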
  • 36. Why Now and What’s Next?
  • 37. Why Now?
    “The world is changed… I smell it in the air”
    • A lot of similar ideas in the industry
    • It seems that thinking converges on something
    • But... a lot of ugly ad hoc solutions are popping up here and there
    • Better to implement a good solution before bad ones become entrenched
    • It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could get MPLS instead
  • 38. What’s Ready?
    • Merchant silicon is finally MPLS-capable, and the price is almost right
    • Modern CPUs can process tens of Mpps in SW, making host-based switching feasible
    • Several open source MPLS data plane implementations are emerging
    • Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time
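To put "tens of Mpps" next to the "tens of Gbps" claimed for the host on slide 25: the packet rate needed to fill a link depends heavily on frame size. The sketch assumes standard Ethernet framing, where each frame also costs 20 bytes of wire time (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap), and ignores MPLS label bytes. The `line_rate_pps` helper is hypothetical.

```python
def line_rate_pps(gbps: float, frame_bytes: int) -> float:
    """Packets per second needed to fill a link of `gbps` with frames of `frame_bytes`.

    Each Ethernet frame also occupies 20 bytes of wire time: 7-byte preamble,
    1-byte start-of-frame delimiter and a 12-byte inter-frame gap.
    """
    wire_bits = (frame_bytes + 20) * 8
    return gbps * 1e9 / wire_bits


for size in (64, 512, 1500):
    print(size, round(line_rate_pps(10, size) / 1e6, 2), "Mpps")
# 64 14.88 Mpps
# 512 2.35 Mpps
# 1500 0.82 Mpps
```

At minimum-size frames a single 10 Gbps port already needs about 14.88 Mpps, which is why packet rate, rather than raw bandwidth, tends to be the figure that matters for software forwarding.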
  • 39. Gaps
    • All RFC 3107 implementations are broken (multiple labels). Talk to your vendor
    • Silicon is not perfect. Talk to your vendor
    • A more expressive way than BGP 3107 to control late binding of control plane artifacts
    • The perception of MPLS as a complex technology. It's the current MPLS control plane that is complex
    • The perception of MPLS as a WAN or metro technology