MPLS in DC and inter-DC networks: the unified forwarding mechanism for network programmability at scale
Dmitry Afanasiev, fl0w@yandex-team.ru
Daniel Ginsburg, dbg@yandex-team.ru
Network Architects
About Us
What is Yandex
• Founded in 1993
• NASDAQ: YNDX, market cap ~$12.5B
• One of Europe's largest internet companies and the leading search provider in Russia
• Over 60% of the local search market
• Monthly user audience of over 90 million worldwide
• Services: search, music, video, cloud storage, news, weather, maps, traffic, email, ads, ...
Our Infrastructure
• We're a fairly typical MS-DC (massively scalable data center)
• Several DCs in Russia and abroad, plus an MPLS backbone connecting them
• About 100k servers, and growing fast
• Mostly IPv6 internally, but we need to serve external IPv4 users
• The network architecture is a bit outdated and needs rethinking
In Search of New Arch
• It needs to be:
– Scalable
– Flexible
– Programmable
• There are lots of approaches out there, and some get many things right…
• But none combines all the right pieces in the right way
• That is really surprising, because the right combination seems almost inevitable
Ideas to Build Upon
• Many of the ideas have been around for years (or even decades)
• Interconnection network topology: folded Clos
• Let the edge handle complexity
• Core just delivers packets edge to edge
• Overlay/underlay logical split
• Control: mix of centralized and distributed. Needs a nice way to combine both
• Simple commodity network elements
• Hierarchy and automation to scale the network
What’s missing
• All these ideas are well known, well understood, and almost universally accepted in the industry
• People are trying to implement them using a wild mix of data plane mechanisms
• And that introduces enormous complexity
• What's missing? A unified forwarding mechanism
Missing… or overlooked?
• Life is much easier when we don't have to deal with a multitude of data planes and forwarding mechanisms
• Fortunately, there is already a well-known, well-understood, standardized forwarding plane mechanism on which we can implement all those ideas without compromising their value
• It has well-defined, standardized mappings to many other popular forwarding planes
• It's called MPLS
Unified Forwarding: Why and How
Flexibility
• Different data plane mechanisms offer different features
• A unified data plane should support all useful features and be able to combine them
• MPLS is very flexible:
– forwarding over a pre-signaled virtual circuit, à la ATM: this is what RSVP-TE does
– source routing over a previously discovered topology, à la Token Ring networks: see the Segment Routing proposal
– hierarchical LPM à la IP: just split the address over several labels and let routers act on the topmost one (not that we suggest this is practical, but it is definitely possible)
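The last bullet can be illustrated with a toy sketch. It assumes, purely for illustration, that an IPv4 address is split into two fixed 16-bit halves. Each half travels as one label, offset by 16 so it stays clear of the reserved label values 0 to 15. Each router looks up only the topmost label. Because the split is fixed, each router does an exact match at its own level rather than a true longest-prefix match. All table contents and function names are hypothetical.

```python
import ipaddress

RESERVED = 16  # MPLS label values 0..15 are reserved (RFC 3032)

def address_to_labels(addr: str) -> list[int]:
    """Split an IPv4 address into two labels: high 16 bits on top, low 16 bits below."""
    value = int(ipaddress.IPv4Address(addr))
    high, low = value >> 16, value & 0xFFFF
    return [high + RESERVED, low + RESERVED]  # index 0 is the top of the stack

# Hypothetical tables: each router matches only the topmost label it sees.
core_table = {
    address_to_labels("10.1.0.0")[0]: "edge-A",
    address_to_labels("10.2.0.0")[0]: "edge-B",
}
edge_a_table = {address_to_labels("10.1.2.3")[1]: "host-10.1.2.3"}

def core_forward(stack: list[int]) -> tuple[str, list[int]]:
    """Core router acts on the top label (the /16 part), then pops it."""
    return core_table[stack[0]], stack[1:]

def edge_forward(table: dict[int, str], stack: list[int]) -> str:
    """Edge router acts on the remaining label (the host part)."""
    return table[stack[0]]

stack = address_to_labels("10.1.2.3")
next_router, rest = core_forward(stack)
print(next_router, edge_forward(edge_a_table, rest))
```

Deeper hierarchies would simply use more, shorter chunks. Every router still touches only one label.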
An Abstract Note on Semantics
• The best way to implement arbitrary semantics is to remove all semantics from protocol headers and assign them externally
• Hardware works with protocol headers
• Control software defines the semantics
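A minimal sketch of this split, under assumed names. The "hardware" table maps a bare label number to an action. What each label means lives only in a separate table kept by control software. Label values, port names and semantics strings are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Action:
    op: str                          # "swap" or "pop" in this sketch
    out_port: str                    # hypothetical egress interface name
    new_label: Optional[int] = None  # only used by "swap"

# Data plane: matches only on the 20-bit label value; no semantics attached.
hw_table: dict[int, Action] = {
    1001: Action("pop", "eth1"),
    1002: Action("swap", "eth2", new_label=2002),
}

# Control software: the only place where a label acquires meaning.
semantics: dict[int, str] = {
    1001: "egress port 1 of ToR 7",
    1002: "path toward DC 2",
}

def forward(label: int) -> Action:
    """What the hardware does: a pure integer lookup."""
    return hw_table[label]

def describe(label: int) -> str:
    """What the label means: known only to control software."""
    return semantics.get(label, "unassigned")

# Reinterpreting a label touches only the control-software table.
semantics[1001] = "VM 42 in tenant blue"
print(forward(1001), describe(1001))
```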
Combining Data Planes
• Why combine them? To have the right features in the right place, or to produce useful combinations of features
• There are basically two ways to combine different data planes: stitch (interwork) them, or overlay them on top of each other
Stitching Data Planes
• It’s a pain
• Usually possible only for a subset of protocol features
• Requires translating between protocols (complex, never perfect, loses information)
• Requires provisioning interworking points: fragile, an operational nightmare, and a source of bottlenecks
• It seems nobody really does this anymore… Or maybe we still have to sometimes?
Overlaying Data Planes
• We overlay to scale, to virtualize, or to augment one data plane with properties of another
• Overlaying is building a hierarchy
• But with multiple data planes, the hierarchy is limited and ad hoc
• Often ugly: IP over Ethernet over VXLAN over IP over Ethernet
• MPLS is intrinsically hierarchical (overlayable, if you will)
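A rough byte count shows one concrete cost of heterogeneous layering. The slide's real point is structural, not bytes. The sketch assumes the following header sizes:
- untagged Ethernet: 14 bytes, FCS not counted
- option-free IPv4: 20 bytes; fixed IPv6 header: 40 bytes
- UDP: 8 bytes; VXLAN: 8 bytes
- MPLS: 4 bytes per label stack entry

It counts only the bytes in front of the inner IP packet.

```python
HEADER_BYTES = {
    "eth": 14,    # untagged Ethernet II header, FCS excluded
    "ipv4": 20,   # IPv4 header without options
    "ipv6": 40,   # fixed IPv6 header, no extension headers
    "udp": 8,
    "vxlan": 8,
    "mpls": 4,    # one label stack entry (RFC 3032)
}

def overhead(headers: list[str]) -> int:
    """Bytes of encapsulation in front of the inner IP packet."""
    return sum(HEADER_BYTES[h] for h in headers)

# IP over Ethernet over VXLAN over IP over Ethernet, listed outer to inner
vxlan_v4 = ["eth", "ipv4", "udp", "vxlan", "eth"]
vxlan_v6 = ["eth", "ipv6", "udp", "vxlan", "eth"]
# A three-label MPLS stack over Ethernet, for comparison
mpls3 = ["eth", "mpls", "mpls", "mpls"]

for name, hdrs in [("VXLAN/IPv4", vxlan_v4), ("VXLAN/IPv6", vxlan_v6), ("MPLS x3", mpls3)]:
    print(f"{name}: {overhead(hdrs)} bytes")
```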
Hierarchy is your friend
• Many hierarchical structures already exist in the network: topology, addressing, management, and control
• Hierarchy is the most important and most reliable way to scale things
Overlay/underlay split is a metaphor
• Being able to implement hierarchy natively lets us ditch the notion of a hard overlay/underlay boundary
• In a stack of DC label, ToR label, port label, slice label, and VM label, where is the overlay/underlay boundary? Not in the packet
• Where the boundary lies depends only on how you structure your control plane
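The "not in the packet" point is visible in the encoding itself. The sketch below packs the five-label stack from this slide into RFC 3032 label stack entries. Each entry holds a 20-bit label, a 3-bit traffic class, a 1-bit bottom-of-stack flag and an 8-bit TTL. The label values are hypothetical. Only the last entry differs structurally, because its S bit is set. Nothing marks where an "overlay" ends and an "underlay" begins.

```python
import struct

def encode_stack(labels: list[int], tc: int = 0, ttl: int = 64) -> bytes:
    """Encode labels (top of stack first) as 4-byte RFC 3032 entries."""
    if not (0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    out = bytearray()
    for i, label in enumerate(labels):
        if not 0 <= label < 2**20:
            raise ValueError(f"label {label} does not fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0  # S bit set only on the last entry
        entry = (label << 12) | (tc << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", entry)  # network byte order
    return bytes(out)

def decode_stack(data: bytes) -> list[tuple[int, int, int, int]]:
    """Return (label, tc, s, ttl) tuples, stopping at the bottom-of-stack entry."""
    entries = []
    for (entry,) in struct.iter_unpack("!I", data):
        label, tc, s, ttl = entry >> 12, (entry >> 9) & 0x7, (entry >> 8) & 0x1, entry & 0xFF
        entries.append((label, tc, s, ttl))
        if s:
            break
    return entries

# Hypothetical values for the DC, ToR, port, slice and VM labels.
stack = {"dc": 100, "tor": 2007, "port": 3012, "slice": 4001, "vm": 50042}
wire = encode_stack(list(stack.values()))
print(len(wire), decode_stack(wire))
```

Suppose one controller owns the top two labels and another owns the rest. That choice exists only in the control plane. The 20 bytes on the wire look the same either way.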
FEC is hierarchical
• A forwarding equivalence class (FEC) can be as fine- or coarse-grained as one wishes; there is no network-imposed limitation
• Easy behavior aggregation: just add an extra label on top
• Easy behavior disaggregation: expose additional granularity by adding an extra label at the bottom
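A toy count shows why pushing an extra label on top aggregates behavior. A transit LSR that matches only the topmost label needs one entry per aggregate, however many finer-grained FECs ride underneath. The FEC names, label numbers and the extra "class" label are hypothetical.

```python
# Fine-grained FECs at the edge: one label per (egress ToR, VM) pair.
fine_fecs = {("tor1", f"vm{i}"): 5000 + i for i in range(100)}
fine_fecs.update({("tor2", f"vm{i}"): 6000 + i for i in range(100)})

# Aggregation: push one extra label on top that identifies only the egress ToR.
tor_label = {"tor1": 101, "tor2": 102}

def impose(tor: str, vm: str) -> list[int]:
    """Top of stack first: aggregate ToR label, then the fine-grained VM label."""
    return [tor_label[tor], fine_fecs[(tor, vm)]]

# A transit LSR keys its table on the top label only.
transit_entries = {impose(tor, vm)[0] for (tor, vm) in fine_fecs}
print(len(fine_fecs), "FECs ->", len(transit_entries), "transit entries")

# Disaggregation: expose more granularity by adding a label at the bottom,
# e.g. a hypothetical per-traffic-class label; transit state is unchanged.
def impose_with_class(tor: str, vm: str, cls: int) -> list[int]:
    return impose(tor, vm) + [7000 + cls]
```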
How to Control MPLS
MPLS is complex?
• The MPLS control plane is notoriously complex
• Good news: you don’t have to use all of it; you can pick the good parts
• Classical distributed control is OK for transport
• Centralized control seems better for higher-level artifacts at the edge, sometimes called services
• Both styles can (and should) be combined
Enabling combinability
• The device has to be a bit smarter than in OpenFlow (OF)
• It gets parts of the label stack from different control plane components
• It assembles the full stack from those parts, using local logic to follow assembly instructions provided by the control plane
• Assembly instructions take the form of references by “name”
• Assembly uses late binding
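The assembly step can be sketched as a small on-device agent. Control plane components publish label-stack fragments under names. A recipe lists only the names to use. The stack is resolved only when the agent assembles it, which is the late binding. The class, its methods and the fragment names are hypothetical and do not correspond to any existing agent's API.

```python
class StackAgent:
    """Hypothetical on-device agent that assembles label stacks from named parts."""

    def __init__(self) -> None:
        self.parts: dict[str, list[int]] = {}    # name -> label fragment (top first)
        self.recipes: dict[str, list[str]] = {}  # route -> ordered part names

    def publish(self, name: str, labels: list[int]) -> None:
        """Called by any control plane component; replaces any previous fragment."""
        self.parts[name] = labels

    def add_route(self, route: str, part_names: list[str]) -> None:
        """Assembly instructions: names only, no label values."""
        self.recipes[route] = part_names

    def assemble(self, route: str) -> list[int]:
        """Late binding: names are resolved now, not when the route was added."""
        stack: list[int] = []
        for name in self.recipes[route]:
            if name not in self.parts:
                raise LookupError(f"unresolved part {name!r} for {route}")
            stack.extend(self.parts[name])
        return stack

agent = StackAgent()
agent.add_route("10.0.0.0/24@vrf-red", ["tunnel:dc2", "service:vrf-red"])
agent.publish("tunnel:dc2", [300, 17])    # e.g. from a distributed protocol
agent.publish("service:vrf-red", [9001])  # e.g. from a central controller
print(agent.assemble("10.0.0.0/24@vrf-red"))

agent.publish("tunnel:dc2", [310])        # tunnel rerouted; the recipe is untouched
print(agent.assemble("10.0.0.0/24@vrf-red"))
```

The route is added before any of its parts exist. That ordering works only because resolution is deferred.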
Enabling combinability: example
• MPLS VPN (abstraction A) refers to MPLS tunnels (abstraction B) using next-hop resolution
• The resolution happens on the device itself, and the two control plane entities are loosely coupled: MPLS tunnels can change their paths, assigned labels, etc., without MP-BGP caring about it
• The VPN abstraction refers to the tunnel abstraction via next hops. A next hop is the name by which one control plane abstraction refers to another
Enabling Combinability: BGP 3107
• Recursive next-hop resolution with labeled routes (RFC 3107) is a powerful way to overlay one control plane abstraction over another
• It can express almost anything we currently want; still, a more expressive mechanism is desirable
• BGP 3107 is the way to interact with all classically controlled MPLS networks
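Recursive resolution can be sketched as a loop. It follows next hops through labeled routes until it reaches a directly connected interface, collecting labels along the way. The route keys, label values and interface names are hypothetical. A real BGP implementation also handles best-path selection and ECMP; this sketch reduces loop protection to a simple depth limit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledRoute:
    labels: tuple[int, ...]  # labels contributed by this route (top of stack first)
    next_hop: str            # a name: another route's key, or "if:<interface>"

# Hypothetical RIB with three layers of abstraction.
rib = {
    # VPN route (e.g. from MP-BGP): VPN label, next hop = remote PE loopback
    "vrf-red:10.0.0.0/24": LabeledRoute((9001,), "pe2-loopback"),
    # BGP labeled unicast (RFC 3107) route to that PE, next hop = border router
    "pe2-loopback": LabeledRoute((24010,), "border1-loopback"),
    # Transport route to the border router, next hop = a local interface
    "border1-loopback": LabeledRoute((16001,), "if:eth0"),
}

def resolve(key: str, max_depth: int = 8) -> tuple[list[int], str]:
    """Follow next hops recursively; return (label stack top-first, egress interface)."""
    inner_to_outer: list[int] = []
    for _ in range(max_depth):
        route = rib[key]
        # Each deeper (more "transport") layer must sit above the previous one,
        # so collect inner-first and reverse once at the end.
        inner_to_outer.extend(reversed(route.labels))
        if route.next_hop.startswith("if:"):
            return list(reversed(inner_to_outer)), route.next_hop[3:]
        key = route.next_hop
    raise RuntimeError("resolution too deep (possible loop)")

print(resolve("vrf-red:10.0.0.0/24"))
```

A route that contributes more than one label, as RFC 3107 permits, is handled the same way. Its labels keep their relative order inside the final stack.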
Static Configuration
• If you can ensure that the labels at some point in the network always stay the same (because you assigned them that way), you can use static configuration on the other side
• This is the way to go when one wants to avoid any signaling dependencies
• Static configuration can be calculated and disseminated automatically
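One way to make labels "always stay the same" is to derive them deterministically from inventory data. Every party can then compute the same mapping offline. The sketch assumes a hypothetical scheme in which each fabric switch has a unique numeric ID and its label is a fixed base plus that ID. The base value, switch names and the rendered line format are illustrative only and are not any vendor's configuration syntax.

```python
LABEL_BASE = 100_000   # hypothetical start of a statically managed label block
LABEL_MAX = 2**20 - 1  # largest 20-bit label value

def static_switch_labels(switch_ids: dict[str, int]) -> dict[str, int]:
    """Deterministic switch -> label mapping: the same inventory always yields the same labels."""
    if len(set(switch_ids.values())) != len(switch_ids):
        raise ValueError("switch ids must be unique")
    labels = {}
    for name, sid in sorted(switch_ids.items()):
        label = LABEL_BASE + sid
        if label > LABEL_MAX:
            raise ValueError(f"switch id {sid} does not fit in the label block")
        labels[name] = label
    return labels

def render_static_config(labels: dict[str, int]) -> list[str]:
    """Render generic 'destination -> label' lines (made-up format) to push to the other side."""
    return [f"static-lsp {name} label {label}" for name, label in sorted(labels.items())]

inventory = {"spine-1": 1, "spine-2": 2, "tor-17": 17}
for line in render_static_config(static_switch_labels(inventory)):
    print(line)
```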
Where should MPLS start?
• On the host! Or even right from the application
• The hypervisor switch is the easiest point: software only, very flexible
• Naturally fits centralized control
• Helps to scale: lots of RAM, and each element keeps only the state it needs
• Modern CPUs can forward tens of Gbps without breaking a sweat
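To relate "tens of Gbps" to packet rates, the usual back-of-the-envelope calculation adds 20 bytes of per-frame Ethernet wire overhead: a 7-byte preamble, a 1-byte start-of-frame delimiter and a 12-byte inter-frame gap. The link speeds and frame sizes below are just examples.

```python
WIRE_OVERHEAD = 20  # preamble (7) + SFD (1) + inter-frame gap (12), in bytes

def line_rate_mpps(gbps: float, frame_bytes: int) -> float:
    """Millions of packets per second needed to fill a link with frames of this size (FCS included)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD) * 8
    return gbps * 1e9 / bits_per_frame / 1e6

for frame in (64, 512, 1518):
    print(f"10 Gb/s at {frame}-byte frames: {line_rate_mpps(10, frame):.2f} Mpps")
```

With minimum-size frames, each 10 Gb/s of forwarding corresponds to roughly 15 Mpps. Tens of Gbps of small packets therefore means tens of Mpps in software.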
Looks SDNish
• A simple forwarding plane (3 simple ops: push, swap, pop)
• A simple software agent on the device (receives parts of the label stack from different control plane components, assembles the full stack, and programs the HW)
• Centralized and distributed control, or anything in between
• Combinability of different control plane components through late binding via names, which the device resolves
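The three operations are the standard MPLS label operations: push, swap and pop. The minimal sketch below runs them over a Python list with the top of the stack at index 0. The LFIB contents and the action encoding are hypothetical.

```python
def push(stack: list[int], label: int) -> list[int]:
    return [label] + stack

def swap(stack: list[int], label: int) -> list[int]:
    if not stack:
        raise ValueError("cannot swap on an empty stack")
    return [label] + stack[1:]

def pop(stack: list[int]) -> list[int]:
    if not stack:
        raise ValueError("cannot pop an empty stack")
    return stack[1:]

# Hypothetical LFIB: incoming top label -> (operation, new label, egress port)
lfib = {
    16001: ("swap", 16002, "eth1"),
    16002: ("pop", None, "eth2"),
}

def lsr_forward(stack: list[int]) -> tuple[list[int], str]:
    """One LSR hop: look up the top label only and apply a single operation."""
    op, arg, port = lfib[stack[0]]
    if op == "swap":
        return swap(stack, arg), port
    if op == "pop":
        return pop(stack), port
    raise ValueError(f"unsupported op {op}")

# The ingress (e.g. a vRouter) pushes; transit LSRs swap or pop.
pkt = push(push([], 9001), 16001)  # [16001, 9001]
pkt, port = lsr_forward(pkt)       # swap -> [16002, 9001] out eth1
pkt, port = lsr_forward(pkt)       # pop  -> [9001] out eth2
print(pkt, port)
```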
What SDN is
• “Modularity based on abstraction is the way things get done” --Liskov
• “SDN ...Not a revolutionary technology... ...just a way of organizing network functionality” --Shenker
• “SDN is merely set of abstractions for control plane, not a specific set of mechanisms.” --Shenker
• “Most lasting legacy of SDN is not better datacenters - But better ways of reasoning about network control” --Shenker
How Ideas Map to MPLS
• Let the edge handle complexity: do it on the host
• Core just delivers packets edge to edge: hierarchy lets core devices stay agnostic to changes at the edge
• Overlay/underlay logical split: just a way to implement hierarchy
• Control: mix of centralized and distributed. Needs a nice way to combine both: yeah!
• Simple commodity network elements: cheap MPLS-capable silicon is finally here
NFV and Seamless MPLS
• The key point of Seamless MPLS (S-MPLS) was to extend MPLS into the access network and to separate transport from services in the MPLS network
• NFV describes how to host service nodes in the DC. If you don’t have MPLS in the DC, it’s no longer seamless
• The fix is obvious: extend MPLS into the DC
• Labels can carry additional metadata if one wants them to
Case Study: New Yandex DC
What do we need?
• Cheap and abundant bandwidth
• Scalable forwarding with minimal state
• Multitenancy (and therefore network virtualization)
• Efficient resource pooling
• Inter-DC traffic engineering
• Function chaining: load balancing, FW, etc.
• Interconnection with existing infrastructure
• Means to integrate all of the above
• Local response to some events, e.g. failures
• Automation at scale
What we don’t need
We are trying to keep the design really simple. We don’t need many functions often perceived as desirable:
• L2 (neither real nor emulated)
• VM mobility
– In scale-out applications, nodes coming and going is the norm; there is no need to move them around while preserving state and identity
– VM mobility increases complexity, as it depends on other features
• Multicast
• We don't have many changes in topology
Components
• Hosts with a vLER (MPLS-capable vRouter)
• Fabric switching elements: LSRs
• A centralized controller
• Legacy routers, which need to interwork with the fabric LSRs and the controller; BGP 3107 is the tool
Forwarding model
• 3-label stack: topmost for the egress switch, next for the egress port, bottom for the VM
• The vRouter uses {dst prefix, VRF} to impose the label stack
• The bottom label is processed by the destination vLER
• Expected state on a fabric switch: #switches_in_the_fabric + #local_access_ports
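A sketch of the vRouter side and of the per-switch state estimate follows. The vRouter does a longest-prefix match on the destination within the VRF's table and imposes the three-label stack. It reads the state formula as one entry per fabric switch plus one per local access port. The IPv6 documentation prefixes, label values and fabric sizes are hypothetical.

```python
import ipaddress

# Hypothetical per-VRF tables: prefix -> (egress switch label, egress port label, VM label)
vrf_tables = {
    "red": {
        ipaddress.ip_network("2001:db8:1::/48"): (101, 1005, 70001),
        ipaddress.ip_network("2001:db8:1:2::/64"): (102, 1003, 70002),
    },
}

def impose_stack(dst: str, vrf: str) -> list[int]:
    """Longest-prefix match on dst inside the VRF; return the label stack, top first."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in vrf_tables[vrf] if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst} in VRF {vrf}")
    best = max(matches, key=lambda net: net.prefixlen)
    switch_label, port_label, vm_label = vrf_tables[vrf][best]
    return [switch_label, port_label, vm_label]

def fabric_switch_state(num_switches: int, local_access_ports: int) -> int:
    """Expected entries on one fabric switch: one per switch plus one per local access port."""
    return num_switches + local_access_ports

print(impose_stack("2001:db8:1:2::10", "red"))
print(fabric_switch_state(num_switches=1000, local_access_ports=48))
```

The fabric switch state does not grow with the number of VMs or tenants. Those labels sit below the top two and are only examined by the destination vLER.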
Control plane
• iBGP 3107 (in-path route reflector with next-hop-self, NHS) inside the fabric for reachability and label distribution (draft-lapukhov…, but with iBGP and labels)
• iBGP 3107 to interwork with legacy routers:
– Session with the connected network element, with NHS, for the switch label
– Session with the controller for the remaining labels, which bind to the switch label via the next hop
• Label mappings at the edge of the fabric are stable and can be provisioned rather than signaled
• Internal fabric failures are handled locally
• Label mappings on vRouters are distributed centrally
Why Now and What’s Next?
Why Now?
“The world is changed… I smell it in the air”
• A lot of similar ideas in the industry
• It seems that thinking is converging on something
• But... a lot of ugly ad hoc solutions are popping up here and there
• Better to implement a good solution before bad ones become entrenched
• It would be a shame and a missed opportunity to stick with VXLAN/… for years when we could have MPLS instead
What’s Ready?
• Merchant silicon is finally MPLS capable, and the price is almost right
• Modern CPUs can process tens of Mpps in software, making host-based switching feasible
• Several open source MPLS data plane implementations are emerging
• Several "classical" MPLS control plane components, such as BGP 3107, are very useful and have been around for quite a long time
Gaps
• All RFC 3107 implementations are broken with respect to multiple labels per route. Talk to your vendor
• Silicon is not perfect. Talk to your vendor
• A more expressive way than BGP 3107 to control late binding of control plane artifacts
• The perception of MPLS as a complex technology; it's the current MPLS control plane that is complex
• The perception of MPLS as a WAN or metro technology
Thank you!
Questions?
