July NYC Open Networking Meetup

Learn about the history of L2, routing in the datacenter, and BGP on servers.

Published in: Technology

  1. BGP in the Datacenter. Pete Lumbis – @PeteCCDE, Datacenter Architect, CCIE #28677, CCDE 2012::3. cumulusnetworks.com
  2. Pete Who? CCIE R&S #28677, CCDE 2012::3. Former Cisco TAC Routing Escalation. Current Cumulus Networks SE. DC Automation and Architecture.
  3. Agenda: The history of L2; Routing in the datacenter; BGP in the datacenter; Troubleshooting improvements; BGP on Servers
  4. In the Beginning… There was L2…
  5. In the Beginning… …but it had problems: 50% bandwidth loss due to STP
  6. In the Beginning… …but it had problems: Unexpected Root change
  7. In the Beginning… …but it had problems: STP Brownout. Flooding! Temporary loops! STP Block on TCN!
  8. Agenda: The history of L2; Routing in the datacenter; BGP in the datacenter; Troubleshooting improvements; BGP on Servers
  9. Layer 3 Clos: the server gateway is the attached leaf; routing (OSPF or BGP) runs between spine and leafs. Subnets shown: 10.1.1.0/24, 10.2.2.0/24, 10.3.3.0/24.
  10. Layer 3 – Spine and Leaf: Full ECMP
  11. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription. 48 x 10Gig = 480 Gigs; 2 x 40Gig = 80 Gigs; 6:1 oversubscription.
  12. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust. 48 x 10Gig = 480 Gigs; 2 x 40Gig = 80 Gigs; 6:1 oversubscription.
  13. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust. 48 x 10Gig = 480 Gigs; 3 x 40Gig = 120 Gigs; 4:1 oversubscription.
  14. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust. 48 x 10Gig = 480 Gigs; 3 x 40Gig = 120 Gigs; 4:1 oversubscription.
  15. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust; Massive Scale. 48 x 10Gig = 480 Gigs; 3 x 40Gig = 120 Gigs; 4:1 oversubscription.
  16. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust; Massive Scale; Controlled Failures. Leaf failure reduces compute.
  17. Layer 3 – Spine and Leaf: Full ECMP; Manageable Oversubscription; Easy to Adjust; Massive Scale; Controlled Failures. Spine failure increases oversubscription.
  18. Agenda: The history of L2; Routing in the datacenter; BGP in the datacenter; Troubleshooting improvements; BGP on Servers
  19. BGP as an IGP: RFC draft submitted 2014; Microsoft and Facebook; targeting DC; all the hows and whys
  20. But I thought BGP was…
       • …slow: Nope. Not with BFD and timer tuning. Just as fast as OSPF.
       • …hard to configure: We'll get to that one later, but it can be easy.
       • …only for service providers: SPs build for scale and stability. You should too.
       • …hard to troubleshoot: Nice and easy when everything is defined + recent advances.
  21. BGP Datacenter Design: Single ASN for Spines; Unique ASN for Leafs; Use Private ASN range
       • 2-byte (1023): 64512 – 65534
       • 4-byte (94 million): 4200000000 – 4294967294
       • ASNs in diagram: 65534, 65534, 64512, 64513, 64514
  22. Reducing BGP Configuration Complexity: Classically lots to manage. ASNs in diagram: 65534, 65534, 64512, 64513, 64514.
       router bgp 65534
        router-id 10.0.0.1
        neighbor 10.1.1.1 remote-as 64512
        neighbor 10.1.1.2 remote-as 64513
        neighbor 10.1.1.3 remote-as 64514
        neighbor 10.1.1.1 timers 1 3
        neighbor 10.1.1.2 timers 1 3
        neighbor 10.1.1.3 timers 1 3
        neighbor 10.1.1.1 timers connect 3
        neighbor 10.1.1.2 timers connect 3
        neighbor 10.1.1.3 timers connect 3
  23. Reducing BGP Configuration Complexity: First – Simplify Remote AS (same configuration as slide 22)
  24. Reducing BGP Configuration Complexity: First – Simplify Remote AS
       router bgp 65534
        router-id 10.0.0.1
        neighbor 10.1.1.1 remote-as external
        neighbor 10.1.1.2 remote-as external
        neighbor 10.1.1.3 remote-as external
        neighbor 10.1.1.1 timers 1 3
        neighbor 10.1.1.2 timers 1 3
        neighbor 10.1.1.3 timers 1 3
        neighbor 10.1.1.1 timers connect 3
        neighbor 10.1.1.2 timers connect 3
        neighbor 10.1.1.3 timers connect 3
  25. Reducing BGP Configuration Complexity: First – Simplify Remote AS (same configuration as slide 24). Note on slide: remote-as internal as well.
  26. Reducing BGP Configuration Complexity: Next – Use Peer Groups (same configuration as slide 24)
  27. Reducing BGP Configuration Complexity: Next – Use Peer Groups
       router bgp 65534
        router-id 10.0.0.1
        neighbor 10.1.1.1 peer-group leafs
        neighbor 10.1.1.2 peer-group leafs
        neighbor 10.1.1.3 peer-group leafs
        neighbor leafs remote-as external
        neighbor leafs timers 1 3
        neighbor leafs timers connect 3
  28. Reducing BGP Configuration Complexity: Finally – BGP Unnumbered (same configuration as slide 27)
  29. Reducing BGP Configuration Complexity: Finally – BGP Unnumbered
       router bgp 65534
        router-id 10.0.0.1
        neighbor swp1 peer-group leafs
        neighbor swp2 peer-group leafs
        neighbor swp3 peer-group leafs
        neighbor leafs remote-as external
        neighbor leafs timers 1 3
        neighbor leafs timers connect 3
  30. BGP Unnumbered
       • Uses IPv6 link-local addresses: automatically assigned, no address management
       • No need for infrastructure IPs: only need loopbacks
       • Advertises both IPv4 and IPv6 routes: RFC 5549. Full interop with Cisco, Arista, Juniper
  31. Agenda: The history of L2; Routing in the datacenter; BGP in the datacenter; Troubleshooting improvements; BGP on Servers
  32. BGP Troubleshooting Improvements – Traceroute: How do you troubleshoot links without IPs? Traceroute improvements: report back loopback IP
  33. BGP Troubleshooting Improvements – Hostnames: Who is the peer? Hostname BGP extension: draft-walton-bgp-hostname-capability
  34. Comparing BGP Configurations
       Traditional Config:
       router bgp 65534
        router-id 10.0.0.1
        maximum-paths 64
        bgp bestpath as-path multipath-relax
        neighbor 10.1.1.1 remote-as 64512
        neighbor 10.1.1.2 remote-as 64513
        neighbor 10.1.1.3 remote-as 64514
        neighbor 10.1.1.1 timers 1 3
        neighbor 10.1.1.2 timers 1 3
        neighbor 10.1.1.3 timers 1 3
        neighbor 10.1.1.1 timers connect 3
        neighbor 10.1.1.2 timers connect 3
        neighbor 10.1.1.3 timers connect 3
       Cumulus Config:
       router bgp 65534
        router-id 10.0.0.1
        neighbor swp1 peer-group leafs
        neighbor swp2 peer-group leafs
        neighbor swp3 peer-group leafs
        neighbor leafs remote-as external
  35. Agenda: The history of L2; Routing in the datacenter; BGP in the datacenter; Troubleshooting improvements; BGP on Servers
  36. BGP to the Server: Why stop at the top of rack? BGP to the Server!
       • Cumulus Quagga, GoBGP, Bird: just Linux apps!
       • No L2, no MLAG, no infrastructure IPs: use BGP Unnumbered
       • Same troubleshooting and monitoring
  37. Summary
       • L3 > L2: at least 1 better; routing provides better scale and stability
       • Easy to configure, automate, troubleshoot
       • BGP all the way to the server!
       • Smart defaults and configuration simplifications
  38. © 2014 Cumulus Networks. Cumulus Networks, the Cumulus Networks Logo, and Cumulus Linux are trademarks or registered trademarks of Cumulus Networks, Inc. or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a world-wide basis. Thank You!
  39. 25GbE Technology Update. Asaf Wachtel, Sr. Director Enterprise. July 2016.
  40. © 2016 Mellanox Technologies - Mellanox Confidential - Open Composable Networks: Open APIs; Automation; End-to-End Interconnect; Network OS Choice; SONiC
  41. Open Networking is Real: OCP Summit March 2016
  42. 25/50/100GbE: The Future is Here!
       • Compute nodes: 10GbE to 25GbE, 150% higher bandwidth
       • Storage nodes: 40GbE to 50GbE, 25% higher bandwidth
       • Network: 40GbE to 100GbE, 150% higher bandwidth
       • Similar connectors, similar infrastructure, similar cost / power
  43. Who needs more than 10GbE?
       • Latest multi-core Intel CPUs can easily drive more than 10Gb/s
       • Cloud (public or private): multi-tenancy; need to deliver higher SLAs with lower predictability
       • Hyperconverged / Software Defined Storage / NVMe: network & storage on the same wire; faster & cheaper storage media
       • Database / Big Data: increasing volumes; moving from batch to real-time
       • Network Function Virtualization (NFV): I/O intensive data plane
  44. Why 25GbE? Do the Math! 2.5X bandwidth with single-lane technology.
       • Best match for current PCI technology: PCIe3 x8 = ~52Gb/s; 2 x 25 = 50Gb/s
       • Most efficient switch silicon design: maximizes both ports and bandwidth; 40GbE requires 4 lanes per port == cost + power
       • Unmatched price-performance / best price per Gb/s: 25G = 2.5X BW at 1.5X the price
       • Lower OPEX & TCO: cut number of NICs, cables, switch ports in half; lower power & cooling
       • Better switch port density: fewer uplinks needed to maintain 1:1 subscription
       • Uses existing fiber infrastructure (single lane)
       • Fully backward compatible: mix/match new 25GbE components and existing 10GbE
       • Future proof + economies of scale (50/100GbE): 50G is 2 x 25G, 100G is 4 x 25G
  45. 25GbE Industry Timeline
       • March 2014: Microsoft presents proposal for 25GbE to IEEE, leveraging existing activities, such as 25G PHY (100GbE) & SFP28 (32G FC)
       • July 2014: Open Industry Consortium to Bring 25 and 50 Gigabit Ethernet to Cloud-Scale Networks
       • August 2015: First products ship to end customers
       • September 2015: The 25G Ethernet Consortium specification draft completed
       • December 2015: Multi-vendor interoperability validated by multiple customers
       • Q4 2015 – Q2 2016: Ecosystem grows and matures
       • June 2016: IEEE 802.3by standard approved by the IEEE-SA Standards Board
  46. 25GbE vs 10GbE
       Attribute                       | 25GbE                        | 10GbE
       Standard                        | SFP28                        | SFP+
       Physical form factor            | SFP                          | SFP
       Number of lanes                 | 1                            | 1
       Lane speed                      | 25Gbps                       | 10Gbps
       Encoding                        | 64b/66b                      | 64b/66b
       Backward/forward compatibility  | Fully interoperable @ 10Gb/s | Fully interoperable @ 10Gb/s
       Max copper reach                | 5m                           | 7m
       MM fiber reach                  | 100m                         | 300m
       SM fiber reach                  | 10km                         | 10km
  47. 3 Types of Connectivity Products
       • Direct Attach Copper (DAC): a "transceiver" with 4 transmit channels and 4 receive channels over copper wires; directly attaches one system to another. Key feature: lowest-priced link, <3m reaches.
       • Optical Transceiver: converts electrical signals to optical; transmits blinking laser light over optical fiber. Key feature: long reach, up to 10km.
       • Active Optical Cable: 2 transceivers with optical fiber bonded in. Key feature: lowest-priced optical link, 100m/200m reaches.
       • Pictured: SFP28 LC transceiver, QSFP28 LC transceiver, QSFP28 MPO transceiver
  48. As Data Rates Increase, Distances Decrease: Favoring Silicon Photonics + Single-mode Fiber
       [Chart: data rate per lane (Gb/s) against link length (m), with regions for copper, multi-mode fiber (OM3, OM4), single-mode fiber and silicon photonics; reach labels 3-5m, 70m, 100m, 2km/10km]
       • Direct Attach Copper: zero power; demo'd 8m at 100G; best fit 3m DACs
       • Active Optical Cables: VCSEL 100m; Silicon Photonics 200m; best fit for 5-20m
       • SR/SR4 VCSEL Transceivers: reaches to 100m; best fit for MMF; structured cabling
       • Silicon Photonics Transceivers: reaches to 2km; best fit for SMF; parallel PSM4 or WDM4
       • MMF = multi-mode fiber; SMF = single-mode fiber
  49. Webscale IT Innovation: QSFP TOR for 4x Density and Lower COGS
       Break-out cabling vs standard cabling:
       • Break-out: single cable, est. $166
       • Standard: 4 cables @ $54 = $216
       • Benefits: easier cable management; fewer cables; 23% lower cost; $1000 less cable cost per rack
       Ideal port density and configuration deployment options:
       • Mellanox: 16 QSFP28 ports (32 in 1 RU); up to 128 10/25GbE ports in 1 RU; logical configuration options, e.g. redundant "48 + 4" in 1 RU
       • Competition, to achieve equivalent bandwidth: 4 SFP+ plus 4 QSFP+ ports; up to 128 ports of 10GbE in 2 RU; illogical configuration with wasted ports
       • Benefits: flexible configuration options; highest port density; lowest power consumption; half-width deployment option
       • RU = rack unit
  50. Summary: 25/50/100GbE is Here!
       • 100GbE Adapter: 150 million messages per second; 10 / 25 / 40 / 50 / 56 / 100GbE
       • 32 100GbE ports, 64 25/50GbE ports: 10 / 25 / 40 / 50 / 56 / 100GbE; throughput of 6.4Tb/s
       • Transceivers, Active Optical and Copper Cables: 10 / 25 / 40 / 50 / 56 / 100GbE; VCSELs, Silicon Photonics and Copper
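
The arithmetic quoted on several slides (leaf oversubscription on slides 11-15, private ASN range sizes on slide 21, bandwidth gains on slide 42, and the cable and price comparisons on slides 44 and 49) can be checked with a short Python sketch. The helper names below are made up for illustration and do not come from either deck; all input numbers are copied from the slides.

```python
from fractions import Fraction


def oversubscription(down_ports, down_gbps, up_ports, up_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth on one leaf."""
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def inclusive_count(first, last):
    """Number of values in the inclusive range first..last."""
    return last - first + 1


def percent_higher(old, new):
    """How much higher new is than old, in percent."""
    return (new - old) / old * 100


# Slides 11-15: 48 x 10Gig down, 2 or 3 x 40Gig up.
two_uplinks = oversubscription(48, 10, 2, 40)
three_uplinks = oversubscription(48, 10, 3, 40)

# Slide 21: private ASN ranges.
private_2byte = inclusive_count(64512, 65534)
private_4byte = inclusive_count(4200000000, 4294967294)

# Slide 42: bandwidth gains when moving to 25/50/100GbE.
compute_gain = percent_higher(10, 25)
storage_gain = percent_higher(40, 50)
network_gain = percent_higher(40, 100)

# Slide 44: 2.5X bandwidth at 1.5X the price, so relative price per Gb/s.
relative_price_per_gb = 1.5 / 2.5

# Slide 49: one break-out cable at $166 vs four cables at $54 each.
breakout_saving = 1 - 166 / (4 * 54)

print(f"2 uplinks: {two_uplinks}:1, 3 uplinks: {three_uplinks}:1")
print(f"private ASNs: {private_2byte} (2-byte), {private_4byte} (4-byte)")
print(f"gains: {compute_gain}%, {storage_gain}%, {network_gain}%")
print(f"price per Gb/s vs 10GbE: {relative_price_per_gb:.2f}")
print(f"break-out cable saving: {breakout_saving:.0%}")
```

Adding a third 40Gig uplink is the "Easy to Adjust" step on the slides: the same 480 Gigs of server-facing bandwidth is spread over 120 Gigs of uplink instead of 80.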

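The peer-group and unnumbered steps on slides 27-29 lend themselves to templating, because each additional leaf contributes only one line. The following is a minimal Python sketch under that assumption; `render_unnumbered_bgp` is a hypothetical helper, and the lines it emits simply mirror the text on slide 29 rather than a syntax verified against any particular Quagga or Cumulus Linux release.

```python
def render_unnumbered_bgp(asn, router_id, interfaces, group="leafs",
                          keepalive=1, hold=3, connect=3):
    """Return config lines in the shape shown on slide 29 (hypothetical helper)."""
    lines = [f"router bgp {asn}", f" router-id {router_id}"]
    # One line per interface-based neighbor; no per-link IP addresses needed.
    lines += [f" neighbor {ifname} peer-group {group}" for ifname in interfaces]
    # Shared settings are stated once on the peer group.
    lines += [
        f" neighbor {group} remote-as external",
        f" neighbor {group} timers {keepalive} {hold}",
        f" neighbor {group} timers connect {connect}",
    ]
    return lines


# Values taken from slide 29: spine AS 65534, three leaf-facing ports.
config = render_unnumbered_bgp(65534, "10.0.0.1", ["swp1", "swp2", "swp3"])
print("\n".join(config))
```

With this layout a fourth leaf adds one `neighbor ... peer-group` line, whereas the per-neighbor layout on slide 22 needs three lines for it (remote-as, timers, and timers connect).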