
RIFT: A New Approach to Building DC Fabrics


Nitin Vig, Juniper Networks



  1. RIFT: A new approach to building DC fabrics
     Nitin Vig, Chief Architect, Juniper Networks
     © 2018 Juniper Networks
  2. AGENDA
     • Datacenter fabric trends
     • Introduction to RIFT
     • RIFT key features
     • Industry status
     • Summary
  3. DATACENTER FABRIC: TRENDS
     Hybrid clouds are here to stay
     • Hybrid cloud for many reasons, one of them being to keep real estate from the hyperscalers
     • Customers are hosting their content and critical business processes, so they need to build their own fabrics
     • Proprietary OPEX efforts are impossible to sustain
     Fabrics are becoming uniform, local and regular
     • A vast amount of bandwidth is needed close to the producer and consumer
     • Fabric architectures are being adopted outside the conventional DC (metro, PoP)
     • WAN-style traffic engineering and protection are being replaced by wide fan-out and distributed-systems redundancy
     Fabric is the new “RAM chip”
     • No one configures RAM banks manually in every laptop
     • IP fabric hardware is largely commodity already
     • IP fabrics will “OPEX commoditize” (consume bandwidth)
  4. DATACENTER FABRIC: TECHNOLOGY EVOLUTION
     Tree to Clos topology
     • Tree: core/aggregation/access layers
     • Folded Clos or fat trees: spine and leaf
     Layer 2 switching to Layer 3 routing
     • Layer 3 routing underlay with Layer 2/3 overlay
     Layer 3 underlay routing options: IGP to eBGP
     • For scaling, convergence and OPEX considerations
     [Figure: original fat tree (based on Clos) and folded fat tree]
  5. DATACENTER FABRIC: ROUTING PROTOCOL CHALLENGES
     • Routing protocols are complex (to deal with irregular topologies)
     • Routing protocols are EITHER fast but not scalable to 100k nodes (link state), OR scalable to 100k nodes but slow (distance vector)
     Current routing protocols vs. datacenter fabrics: NOT A PERFECT MATCH
     • Current routing protocols: built for irregular network topologies with a low degree of connectivity
     • Datacenter fabrics: uniform topology (Clos, folded fat tree) with a high degree of connectivity (hyperscale DCs)
  6. DATACENTER FABRIC: KEY REQUIREMENTS
     Requirements, rated for BGP (modified for DC) and IS-IS (modified for DC):
     01 Close to zero-touch provisioning
     02 Link discovery, automatic forming of trees, preventing cabling violations ⚠ ⚠
     03 Minimal amount of routes/information on ToRs (cost-optimized)
     04 High degree of ECMP (BGP needs lots of knobs and memory, and has own-AS-path violations) ⚠
     05 Traffic engineering by next hops, prefix modification
     06 See all links in the topology to support PCE/SR ⚠
     07 Carry opaque configuration data (key-value) efficiently ⚠
     08 Take a node out of production quickly and without disruption (overload)
     09 Automatic disaggregation on failures to prevent black-holing
     10 Minimal blast radius on failures
     11 Fastest possible convergence on failures
  7. LET’S TAKE A FRESH LOOK: routing protocols in our network
     • Distance vector (RIP): vectors of destination and distance. “Tell your neighbors about the rest of the network”
     • Link state (IS-IS, OSPF): each router announces its links into the LSDB; Dijkstra. “Tell the rest of the network about your neighbors”
     • Path vector (BGP): full paths announced. “Paths described by a sequence of ASs”
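To make the contrast concrete, here is a minimal sketch (plain Python, not RIFT code) that computes the same shortest-path distances two ways: link-state style, where a node holds the whole topology and runs Dijkstra locally, and distance-vector style, where nodes only exchange their distance vectors with neighbors, round by round (synchronous Bellman-Ford). The four-node topology and link costs are hypothetical.

```python
import heapq

# Hypothetical topology: undirected links with costs.
LINKS = {("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 4, ("C", "D"): 1}

def build_adjacency(links):
    """Build {node: {neighbor: cost}} from undirected links."""
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj

def link_state(adj, source):
    """Link state: the source knows the full topology (its LSDB) and runs Dijkstra."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in adj[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                heapq.heappush(heap, (d + cost, nbr))
    return dist

def distance_vector(adj):
    """Distance vector: each node only sees its neighbors' vectors from the last round."""
    vectors = {n: {n: 0} for n in adj}
    changed = True
    while changed:
        changed = False
        snapshot = {n: dict(v) for n, v in vectors.items()}  # vectors "received" this round
        for node in adj:
            for nbr, cost in adj[node].items():
                for dest, d in snapshot[nbr].items():
                    if cost + d < vectors[node].get(dest, float("inf")):
                        vectors[node][dest] = cost + d
                        changed = True
    return vectors

adj = build_adjacency(LINKS)
print(link_state(adj, "A"))  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
print(distance_vector(adj)["A"])
```

Both methods converge on the same distances; the difference the slides care about is what each node must store (full topology vs. neighbor vectors) and how many exchange rounds it takes before the answer settles.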
  8. LINK STATE vs. DISTANCE/PATH VECTOR
     Link state
     • Topology view → TE enabler
     • Fast propagation
     Distance/path vector
     • Granular policy control and traffic engineering
     [Figure: link-state vs. distance/path-vector convergence over time across nodes 0 to 5, showing update transmission and computation]
     Both protocol types (link state and distance/path vector) are frequently used in today’s networks.
  9. RIFT: ROUTING IN FAT TREES
     A clean-slate approach to building DC fabrics
     Market requirements
     • Clos-optimized routing protocol
     • Full bandwidth utilization
     • Built-in fabric provisioning
     • Fast convergence
     Juniper invention
     • Link state (north) + distance vector (south)
     • Simplest leaf implementation
     • Failure domain containment
     • Support for all DC applications
  10. RIFT AT A GLANCE
      1. Topological sort
         • Uses the concept of directionality
      2. Link-state flooding up (north)
         • Full topology and all prefixes at the top spine only
      3. Distance vector down (south)
         • 0/0 is sufficient to send traffic up
         • More-specific prefixes are advertised only in specific scenarios (link failures, traffic engineering)
      4. Bounce
         • Flood reduction
         • Automatic disaggregation
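The "directionality" behind the topological sort follows from levels: relative to a node, a neighbor at a higher level is northbound, one at a lower level is southbound, and one at the same level is east-west. A minimal sketch, assuming each node's level is already known (configured or derived); the node names and levels are hypothetical, and a real RIFT adjacency carries much more state.

```python
from enum import Enum

class Direction(Enum):
    NORTH = "north"          # neighbor is at a higher level (toward the top spine)
    SOUTH = "south"          # neighbor is at a lower level (toward the leaves)
    EAST_WEST = "east-west"  # neighbor is at the same level

def direction(my_level: int, neighbor_level: int) -> Direction:
    """Classify an adjacency purely by comparing levels."""
    if neighbor_level > my_level:
        return Direction.NORTH   # per the slide: link-state information is flooded this way
    if neighbor_level < my_level:
        return Direction.SOUTH   # per the slide: a distance vector (0/0) is sent this way
    return Direction.EAST_WEST

# Hypothetical 3-level fabric: leaf (0), spines (1), top spine (2).
levels = {"leaf1": 0, "spine1": 1, "spine2": 1, "top1": 2}
for peer in ("leaf1", "spine2", "top1"):
    print("spine1 ->", peer, direction(levels["spine1"], levels[peer]).value)
```

Because every adjacency gets a direction from levels alone, each node can decide what to send on a link (link state north, distance vector south) without any per-link policy configuration.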
  11. RIFT IN STEADY STATE: BASICS
      • Leaf (level 0): learns 0/0 from spine (level 1)
      • Spine (level 1): learns 0/0 from spine (level 2); learns Pfx A, B, C, D from leaf (level 0)
      • Spine (level 2): learns Pfx A, B, C, D from spine (level 1)
      [Figure: three-level fabric annotated “Aggregation” and “Localization”, showing specific prefixes (Pfx W, X, Y, Z) advertised northbound and 0/0 advertised southbound]
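The steady state above (specific prefixes travel north, only the default route travels south) can be sketched as a toy routing-table computation. Assumptions: a hypothetical fabric of two leaves, two level-1 spines and one top spine with full mesh between adjacent levels, prefix names A to D taken from the slide, next hops represented as neighbor names, and no failures (so no disaggregation).

```python
# Hypothetical fabric: node -> level, links between adjacent levels, prefixes at the leaves.
LEVELS = {"leaf1": 0, "leaf2": 0, "spine1": 1, "spine2": 1, "top": 2}
LINKS = [("leaf1", "spine1"), ("leaf1", "spine2"), ("leaf2", "spine1"),
         ("leaf2", "spine2"), ("spine1", "top"), ("spine2", "top")]
ORIGINATED = {"leaf1": {"Pfx A", "Pfx B"}, "leaf2": {"Pfx C", "Pfx D"}}

ADJ = {n: set() for n in LEVELS}
for a, b in LINKS:
    ADJ[a].add(b)
    ADJ[b].add(a)

def south(node):
    return sorted(n for n in ADJ[node] if LEVELS[n] < LEVELS[node])

def north(node):
    return sorted(n for n in ADJ[node] if LEVELS[n] > LEVELS[node])

def reachable_prefixes(node):
    """What a node's northbound (link-state) advertisements describe: prefixes at or below it."""
    prefixes = set(ORIGINATED.get(node, set()))
    for child in south(node):
        prefixes |= reachable_prefixes(child)
    return prefixes

def rib(node):
    """Steady-state routes: specific prefixes via southbound neighbors, 0/0 via northbound ones."""
    routes = {}
    for child in south(node):
        for pfx in reachable_prefixes(child):
            routes.setdefault(pfx, []).append(child)
    if north(node):
        routes["0/0"] = north(node)  # default route learned from the level above
    return routes

print(rib("leaf1"))  # {'0/0': ['spine1', 'spine2']}
print(rib("spine1"))
print(rib("top"))
```

The leaf ends up with a single default route regardless of fabric size, which is the "minimal routes on ToRs" property; only the top spine holds every prefix.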
  12. RIFT FEATURES: DETECTING CABLING MISCONFIGURATION
      Problem statement: the fabric should automatically detect and block wrong cabling.
      Automatic rejection of adjacencies based on minimal configuration:
      • A1 to B1: forbidden due to POD mismatch
      • A0 to B1: forbidden due to POD mismatch (A0 has already formed A0-A1, so it has taken on A1’s POD even if no POD is configured on A0)
      • B0 to C0: forbidden due to level mismatch
      [Figure: fabric with POD 0 and POD 1, spines at levels 2 and 1, leaves at level 0 (Pfx A to D); nodes A0, A1, B0, B1, C0]
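The adjacency checks can be sketched as a small decision function. This is a simplified model, not the exact rules of the RIFT specification. It assumes that a node without a configured POD adopts the POD of the first neighbor it forms an adjacency with (as the A0 example describes), that PODs are compared only when both sides have one, and that a "level mismatch" means the levels differ by more than one or both ends are leaves. The node attributes in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    level: int
    pod: Optional[int] = None  # None = no POD configured or learned yet

def check_adjacency(a: Node, b: Node) -> Optional[str]:
    """Return a rejection reason, or None if the adjacency may form (simplified rules)."""
    if abs(a.level - b.level) > 1:
        return "level mismatch"
    if a.level == 0 and b.level == 0:
        return "level mismatch (leaf to leaf)"
    if a.pod is not None and b.pod is not None and a.pod != b.pod:
        return "POD mismatch"
    return None

def form_adjacency(a: Node, b: Node) -> Optional[str]:
    """Check the adjacency; on success a node without a POD adopts its neighbor's POD."""
    reason = check_adjacency(a, b)
    if reason is None:
        if a.pod is None:
            a.pod = b.pod
        elif b.pod is None:
            b.pod = a.pod
    return reason

# Hypothetical attributes for the nodes named on the slide.
a0 = Node("A0", level=0)          # no POD configured
a1 = Node("A1", level=1, pod=0)
b1 = Node("B1", level=1, pod=1)
b0 = Node("B0", level=0, pod=1)
c0 = Node("C0", level=0, pod=0)

print(form_adjacency(a0, a1))  # None (A0 adopts POD 0)
print(form_adjacency(a1, b1))  # POD mismatch
print(form_adjacency(a0, b1))  # POD mismatch
print(form_adjacency(b0, c0))  # level mismatch (leaf to leaf)
```

The point of the design is that only a handful of values (level, POD) need to be configured or learned for miscabled links to be refused rather than silently carrying traffic.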
  13. RIFT FEATURES: (NEAR) ZERO-TOUCH PROVISIONING
      Problem statement: the fabric should auto-configure with close to zero touch.
      Automatic SystemID derivation
      • The RIFT SystemID (64 bits) is automatically derived from the node’s EUI-64
      Top-level (superspine) switches must be manually configured
      • Either with flag=SUPERSPINE (default level 16)
      • Or with an explicit level (e.g., level 3 in the example)
      A node without a configured level derives its level from its neighbors (highest neighbor level - 1)
      • E, F → derived level 2
      • I, J → derived level 1
      A node with flag=LEAF_ONLY always has derived level 0
      [Figure: A (level=3, manual) at level 3; E, F at level 2; I, J at level 1; M, N at level 0 with flag=LEAF_ONLY]
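The level-derivation rule can be sketched as a fixed-point iteration. Assumptions: the topology is a hypothetical reading of the figure (A on top, E and F below it, I and J below those, leaf-only M and N at the bottom), and a node ignores level-0 neighbors when deriving its own level, since a leaf's level says nothing about the nodes above it. The real RIFT ZTP procedure (level offers, hold-down timers and so on) is considerably more involved.

```python
# Hypothetical topology read from the figure.
LINKS = [("A", "E"), ("A", "F"),
         ("E", "I"), ("E", "J"), ("F", "I"), ("F", "J"),
         ("I", "M"), ("I", "N"), ("J", "M"), ("J", "N")]
CONFIGURED = {"A": 3}   # manually configured top-level node
LEAF_ONLY = {"M", "N"}  # flag = LEAF_ONLY, always level 0

def derive_levels(links, configured, leaf_only):
    """Repeat 'level = highest non-leaf neighbor level - 1' until nothing changes."""
    nodes = {n for link in links for n in link}
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    levels = dict(configured)
    levels.update({n: 0 for n in leaf_only})
    changed = True
    while changed:
        changed = False
        for node in sorted(nodes - set(configured) - set(leaf_only)):
            # Only neighbors with a known level above 0 count (assumption, see lead-in).
            offers = [levels[n] for n in adj[node] if levels.get(n, 0) > 0]
            if offers and max(offers) - 1 != levels.get(node):
                levels[node] = max(offers) - 1
                changed = True
    return levels

print(sorted(derive_levels(LINKS, CONFIGURED, LEAF_ONLY).items()))
# [('A', 3), ('E', 2), ('F', 2), ('I', 1), ('J', 1), ('M', 0), ('N', 0)]
```

Only the top node needs a level; everything in between falls out of the topology, matching the slide's E, F → 2 and I, J → 1.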
  14. RIFT FEATURES: ROUTING IN FAILURE (AUTOMATIC DISAGGREGATION)
      Problem statement: avoid any traffic black-holing due to link failures.
      1) Link C2-B1 breaks. C2 loses reachability to Pfx Y and Z.
      2) C2 sends updates listing only one neighbor (A1).
      3) D2 receives the update from C2:
         • Our neighbors don’t match (B1 is missing)
         • C2 has no reachability to Pfx Y and Z
         • Lower-level nodes use 0/0, so there is a risk of a traffic black hole
      4) D2 originates a new update with disaggregated prefixes (Y, Z).
      Note:
      • Nodes on the lower level (A1, B1) get the more-specific route.
      • Nodes further down (level 0) can still use 0/0 only.
      [Figure: level-2 nodes C2 and D2, level-1 nodes A1 and B1, leaf A0; after the failure the level-1 routes are 0/0 via C2 and D2 plus Pfx Y, Z via D2, while the leaf keeps only 0/0]
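The decision D2 makes in steps 3 and 4 can be sketched as a set comparison. Assumptions: each level-2 node knows the southbound neighbor set of every same-level peer (presumably via the "bounce" mechanism listed under RIFT at a glance), B1 is the node behind Pfx Y and Z, and A1 is assumed to be behind Pfx W and X; the dictionaries are hypothetical stand-ins for protocol state.

```python
# Hypothetical state held by D2 after the C2-B1 link fails.
SOUTH_NEIGHBORS = {"C2": {"A1"}, "D2": {"A1", "B1"}}
PREFIXES_BEHIND = {"A1": {"Pfx W", "Pfx X"}, "B1": {"Pfx Y", "Pfx Z"}}

def reachable(node, south_neighbors, prefixes_behind):
    """Prefixes a node can reach through its southbound neighbors."""
    result = set()
    for nbr in south_neighbors[node]:
        result |= prefixes_behind[nbr]
    return result

def prefixes_to_disaggregate(me, south_neighbors, prefixes_behind):
    """Prefixes I can reach that at least one same-level peer cannot.

    Lower-level nodes spread 0/0 traffic over all level-2 nodes, so any prefix a
    peer has lost must be advertised by me as a more-specific route.
    """
    mine = reachable(me, south_neighbors, prefixes_behind)
    result = set()
    for peer in south_neighbors:
        if peer != me:
            result |= mine - reachable(peer, south_neighbors, prefixes_behind)
    return result

print(sorted(prefixes_to_disaggregate("D2", SOUTH_NEIGHBORS, PREFIXES_BEHIND)))
# ['Pfx Y', 'Pfx Z']
```

With longest-prefix match, A1 then sends traffic for Y and Z only to D2 while everything else still follows 0/0 across both C2 and D2.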
  15. RIFT FEATURES: FLOODING REDUCTION FOR HIGHLY MESHED DC TOPOLOGIES
      Problem statement: avoid redundant information in highly meshed topologies.
      With N-port spine switches:
      • Level 2 spine: all N ports are southbound
      • Level 1 spine: N/2 ports are southbound and N/2 ports are northbound
      Link-state flooding becomes overkill (a known problem in link-state protocols).
  16. RIFT FEATURES: FLOODING REDUCTION HAPPENS IN THE NORTH DIRECTION
      • Each ‘L’ node knows which ‘L+2’ nodes are reachable via particular ‘L+1’ nodes
      • A single ‘L+1’ node can flood updates from an ‘L’ node to a given set of ‘L+2’ nodes: a flood repeater (FR) node
      • For redundancy, in RIFT an ‘L’ node selects at least two ‘L+1’ nodes as FRs (using a selection algorithm)
      • Updates sent to non-FRs are marked with a ‘do-not-reflect’ flag
      • A similar algorithm is executed at each level
      [Figure: levels L, L+1 and L+2 with redundant flooding paths crossed out]
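The slide leaves the selection algorithm unspecified. Purely as an illustration, the sketch below swaps in a simple greedy cover: an ‘L’ node repeatedly picks the ‘L+1’ neighbor that reaches the most still under-covered ‘L+2’ nodes, until every ‘L+2’ node is reachable through at least two chosen repeaters. This is not the RIFT draft's algorithm, and the function name, the `redundancy` parameter and the topologies are hypothetical.

```python
def select_flood_repeaters(north_adj, redundancy=2):
    """Greedily choose L+1 flood repeaters so each L+2 node is covered `redundancy` times.

    north_adj maps each L+1 neighbor to the set of L+2 nodes it connects to.
    """
    coverage = {t: 0 for t in set().union(*north_adj.values())}
    candidates = dict(north_adj)
    chosen = []
    while candidates and any(c < redundancy for c in coverage.values()):
        def gain(n):
            return sum(coverage[t] < redundancy for t in candidates[n])
        best = max(sorted(candidates), key=gain)  # ties go to the first name
        if gain(best) == 0:
            break  # no remaining candidate improves coverage
        chosen.append(best)
        for t in candidates.pop(best):
            coverage[t] += 1
    return chosen

# Hypothetical full mesh: four L+1 nodes, each reaching the same three L+2 nodes.
full_mesh = {s: {"T1", "T2", "T3"} for s in ("S1", "S2", "S3", "S4")}
frs = select_flood_repeaters(full_mesh)
print(frs)                               # ['S1', 'S2']
print(sorted(set(full_mesh) - set(frs)))  # ['S3', 'S4'] would get 'do-not-reflect'
```

In this full mesh, flooding through every ‘L+1’ node would deliver each update to every ‘L+2’ node four times; with two repeaters it arrives twice, which keeps the redundancy the slide asks for while cutting the rest.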
  17. RIFT FEATURES: WEIGHTED BANDWIDTH LOAD-BALANCING
      Problem statement: load-balance traffic across links based on link capacity.
      Weighted bandwidth load-balancing example:
      1. Each upstream node gets a value based on available bandwidth
         • Upstream node BW = BW to the upstream node + uplink BW of the upstream node
         • On X, upstream nodes I and J: 2 x 10G + 4 x 40G = 180G
         • The upstream node BW is converted to the next exponent of 2
         • On X, upstream nodes I and J: 180G → 8 (since 2^7 < 180 < 2^8)
         • Exponent for I and J = 8
      2. The received route’s metric is adjusted based on this value (BAD: Bandwidth Adjusted Distance)
         • BAD = original D * (1 + Max_Upstream_Exp - Current_Upstream_Exp)
         • On X, upstream node I: BAD = D * (1 + 8 - 8) = D
         • On X, upstream node J: BAD = D * (1 + 8 - 8) = D
         • Equal-bandwidth load-balancing: the distance (metric) is not adjusted
      [Figure: example topology with nodes A, E, F, I, J, X, Y and 10G, 40G and 100G links]
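The two steps of the worked example translate directly into code. A minimal sketch, assuming bandwidths in Gbit/s and reading "next exponent of 2" as the smallest n with bandwidth ≤ 2^n (which gives 8 for 180G, as on the slide). The second upstream node with less uplink capacity is hypothetical, added only to show a metric that does get adjusted; how unequal BADs are then turned into forwarding weights is not covered by the slide.

```python
import math

def upstream_bw(links_to_node, node_uplinks):
    """Upstream node BW = BW to the upstream node + that node's own uplink BW (Gbit/s)."""
    return sum(links_to_node) + sum(node_uplinks)

def bw_exponent(bw_gbps):
    """Round the bandwidth up to a power of two and return the exponent."""
    return math.ceil(math.log2(bw_gbps))

def bandwidth_adjusted_distance(distance, max_exp, current_exp):
    """BAD = D * (1 + Max_Upstream_Exp - Current_Upstream_Exp)."""
    return distance * (1 + max_exp - current_exp)

# Slide example on X: 2 x 10G to the upstream node, 4 x 40G uplinks on that node.
bw_i = upstream_bw([10, 10], [40, 40, 40, 40])
exp_i = bw_exponent(bw_i)
print(bw_i, exp_i)  # 180 8

# Hypothetical second upstream node with only 2 x 40G uplinks: 20 + 80 = 100G -> 7.
exp_j = bw_exponent(upstream_bw([10, 10], [40, 40]))
max_exp = max(exp_i, exp_j)
print(bandwidth_adjusted_distance(10, max_exp, exp_i))  # 10
print(bandwidth_adjusted_distance(10, max_exp, exp_j))  # 20
```

Each exponent step below the best upstream node adds one more multiple of D, so a node with roughly half the bandwidth ends up with double the metric.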
  18. RIFT FEATURES SUMMARY: DATACENTER FABRIC KEY REQUIREMENTS
      Requirements, rated for BGP (modified for DC), IS-IS (modified for DC) and RIFT:
      01 Close to zero-touch provisioning
      02 Link discovery, automatic forming of trees, preventing cabling violations ⚠ ⚠
      03 Minimal amount of routes/information on ToRs (cost-optimized)
      04 High degree of ECMP (BGP needs lots of knobs and memory, and has own-AS-path violations) ⚠
      05 Traffic engineering by next hops, prefix modification
      06 See all links in the topology to support PCE/SR ⚠
      07 Carry opaque configuration data (key-value) efficiently ⚠
      08 Take a node out of production quickly and without disruption (overload)
      09 Automatic disaggregation on failures to prevent black-holing
      10 Minimal blast radius on failures
      11 Fastest possible convergence on failures
  19. INDUSTRY STATUS
      Standardization
      • Initiated by Antoni Przygienda (Juniper Networks)
      • Standards-track working group Internet-Draft (I-D)
      • Basis for further work toward an RFC
      • https://tools.ietf.org/html/draft-ietf-rift-rift-06
      [Figure: standardization path: individual draft → I-D → RFC → STD]
      Cooperation
      • Joint work in the IETF WG (JNPR, CSCO, Nokia, Comcast)
      • Contact the authors, share your opinion
      • The packet data structures are public (defined as a Thrift schema)
      Availability
      • RIFT in Python: https://github.com/brunorijsman/rift-python
      • RIFT trial code available from Juniper: https://www.juniper.net/us/en/dm/free-rift-trial/
      • Production-ready Juniper code: Q4 2019
      Relevant drafts
      • Policy-guided prefixes with RIFT: https://tools.ietf.org/html/draft-atlas-rift-pgp-01
      • RIFT YANG model: https://tools.ietf.org/html/draft-ietf-rift-yang-00
      • Segment Routing in Fat Trees (SRIFT): https://tools.ietf.org/html/draft-zzhang-rift-sr-01
  20. SUMMARY: RIFT PROTOCOL ADVANTAGES
      From link state and distance vector, take the ‘best of both’:
      • Fastest possible convergence
      • Automatic topology detection
      • Minimal routes on ToRs
      • High degree of ECMP
      • Fast decommissioning of nodes
      Leave the ‘not-so-good’:
      • Excessive flooding
      • Manual neighbor detection
      Unique RIFT additions:
      • Zero-touch provisioning
      • Automatic disaggregation on failure
      • Minimal blast radius on failures
      • Utilize all fabric paths without loops
      • Support for non-ECMP paths
      • Key-value store
  21. THANK YOU
      nitinvig@juniper.net
