Approaches to Designing a High-Performance Switch Router

I delivered this talk/tutorial to multiple organizations, ranging from semiconductor houses and start-up system vendors to research and academic institutions, in the 2002 time frame. As the abstract below illustrates, it captures the key principles behind the designs of two of the most popular landmark switch/routers in our industry: the Cisco...


  1. Metanoia, Inc. Critical Systems Thinking™
     Approaches to Designing a High-Performance Switch Router
     Dr. Vishal Sharma, Principal Consultant, Metanoia, Inc.
     Phone: +1 408-955-0910
     Email: v.sharma@ieee.org
     Web: http://www.metanoia-inc.com
     © Copyright 2002, All Rights Reserved
  2. Classification of Switch Architectures
     • 1st gen.: shared-bus based
       • Bus-based with central memory, centralized processing
     • 2nd gen.: advanced shared-bus based
       • Bus-based with local memory, distributed processing
     • 3rd gen.: interconnection fabric with multiple parallel paths
       • Crossbar or cross-point switch, rings, …
     • 4th gen.: distributed switch
       • Interconnect smaller, ASIC-based 1st, 2nd, or 3rd generation switches in a regular topology
       • Centralized, high-perf. switch core, with distributed line cards
  3. Switch Architectures: Shared-bus with Central Memory
     [Figure; recovered labels: Backplane; Line Card 1 and Line Card 2, each with DMA; CPU; Memory; steps 1-4; LC1 ... LCN, each marked R.]
     Without DMA, a packet crosses the bus 4 times (2 times with DMA).
  4. Switch Architectures: Shared-bus with Central Memory
     • Blocking if bus b/w or CPU processing < 4·N·R (2·N·R with DMA)
     • Delay: function of memory I/O speed and CPU processing
     • Throughput: upper-bounded by min(bus speed, CPU power)
       • Most commercial Ethernet switching platforms: 1-2 Gb/s backplane
       • The most expensive backplanes today could yield up to 20 Gb/s
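The blocking condition on this slide can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming N is the number of line cards and R the line rate per card (the slides use the symbols without defining them). It checks only the bus side of the condition, not CPU processing, and the function names and example numbers are hypothetical.

```python
def required_bus_bps(n_cards: int, rate_bps: int, dma: bool) -> int:
    """Bus bandwidth needed by a central-memory, shared-bus switch.

    Per the slides, a packet crosses the bus 4 times without DMA and
    2 times with DMA, so the bus must carry 4*N*R or 2*N*R.
    """
    crossings = 2 if dma else 4
    return crossings * n_cards * rate_bps


def is_blocking(bus_bps: int, n_cards: int, rate_bps: int, dma: bool) -> bool:
    """True when the bus cannot keep up with all cards running at line rate."""
    return bus_bps < required_bus_bps(n_cards, rate_bps, dma)


# Illustrative numbers (assumptions, not taken from the slides):
# 48 cards at 10 Mb/s each on a 1 Gb/s bus.
print(is_blocking(1_000_000_000, 48, 10_000_000, dma=False))  # True: needs 1.92 Gb/s
print(is_blocking(1_000_000_000, 48, 10_000_000, dma=True))   # False: needs 0.96 Gb/s
```

Halving the number of bus crossings is what lets the same bus serve twice as many line cards before it blocks.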
  5. Switch Architectures: Shared-bus with Central Memory
     Example: Cisco Catalyst 2820 Ethernet Switch (also 1900 family)
     • 24 10BaseT and 2 100BaseT full-duplex ports (on 2820) ⇒ 440 Mbps x 2 = 880 Mbps min. bus throughput required
     • Bus bandwidth: 1 Gb/s
     • CPU: Intel 486 with 1 MB of flash
     • Central memory: 3 MB of RAM
     Observations:
     • 10 Mbps ports ⇒ require 20 Kpps/port for 64B packets
       • Available: 14.8 Kpps per port
     • Require: 880 Kpps aggregate forwarding perf. (Ethernet + Fast Eth.)
       • Available: 450 Kpps ⇒ performance is CPU limited (not bus bandwidth limited)
     • Latency: ~70 us
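The Catalyst 2820 figures can be reproduced with simple arithmetic. The sketch below is an illustration only: it counts 64-byte packets as 512 bits with no preamble or inter-frame gap (an assumption, which is why the exact per-port figure comes out slightly below the slide's rounded 20 Kpps), and all variable names are hypothetical.

```python
PACKET_BITS = 64 * 8  # 64-byte packets, counting frame bits only (assumption)


def pps(rate_bps: float, packet_bits: int = PACKET_BITS) -> float:
    """Packets per second needed to fill a link with fixed-size packets."""
    return rate_bps / packet_bits


# Port mix from the slide: 24 x 10BaseT and 2 x 100BaseT.
aggregate_bps = 24 * 10_000_000 + 2 * 100_000_000  # 440 Mb/s
bus_required_bps = 2 * aggregate_bps               # the slide's factor of 2: 880 Mb/s
bus_available_bps = 1_000_000_000                  # 1 Gb/s bus (slide)

per_port_pps = pps(10_000_000)                     # 19531.25; the slide rounds to 20 Kpps
aggregate_required_pps = 880_000                   # slide figure; equals 24*20K + 2*200K
cpu_available_pps = 450_000                        # slide figure

bus_ok = bus_available_bps >= bus_required_bps
cpu_ok = cpu_available_pps >= aggregate_required_pps
print(f"bus sufficient: {bus_ok}, CPU sufficient: {cpu_ok}")
```

The bus has headroom while the CPU falls well short of the required packet rate, which is the slide's conclusion that the design is CPU limited.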
  6. Switch Architectures: Shared-bus, Distributed Memory & Processing
     [Figure; recovered labels: Backplane; Line Card 1 and Line Card 2, each with DMA, Memory, CPU, Buffering, and Routing/Lookup; Fast Path (steps 1-2); Slow Path via a CPU with Full Routing Function; LC1 ... LCN, each marked R.]
  7. Switch Architectures: Shared-bus with Distributed Memory
     • Blocking if bus b/w or CPU processing < 2·N·R (N·R with DMA)
     • Delay: function of memory I/O speed and CPU processing
     • Packet forwarding via dedicated engines, one per line card (LC)
       • Allows line rate forwarding, even with small packets
       • Enables design parameter adjustment based on LC type
     • Throughput: upper-bounded by min(bus speed, forwarding engine)
  8. Switch Architectures: Shared-bus with Distributed Memory
     Example: 3Com CoreBuilder 5000 Switching System
     • 17 slots/chassis, 24 10BaseT's/slot or 4 100BaseT's/slot (or port) ⇒ 17 x 24 x 10 Mb/s = 4.08 Gb/s minimum bus throughput required!
     • Bus bandwidth: 2 Gb/s ⇒ max. 3.9 Mpps @ 64B/packet
     • CPU + 18MB DRAM: for address learning, fragmentation, SPT algorithm
     • Packet switching: custom ASIC + 4MB DRAM per slot: for forwarding, filtering
     Observations:
     • Require: 480 Kpps/slot (Eth.) or 800 Kpps/slot (Fast Eth.)
       • Available: 650 Kpps per switching ASIC ⇒ performance here is bus-bandwidth limited (not forwarding limited)
     • Latency: ~45-100 us
     • Jitter: ~5 us
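The "throughput is upper-bounded by min(bus speed, forwarding engine)" rule can be applied to the CoreBuilder numbers as a rough check. This sketch assumes one switching ASIC per slot (the slide lists a custom ASIC per slot) and 512-bit packets with no framing overhead; the variable names are hypothetical.

```python
PACKET_BITS = 64 * 8  # 64-byte packets, frame bits only (assumption)

slots = 17
bus_bps = 2_000_000_000  # 2 Gb/s bus (slide)
asic_pps = 650_000       # per switching ASIC (slide)

required_bps = slots * 24 * 10_000_000  # 4.08 Gb/s with every slot full of 10BaseT
bus_pps = bus_bps / PACKET_BITS         # 3,906,250, the slide's "3.9 Mpps"
engines_pps = slots * asic_pps          # 11,050,000, assuming one ASIC per slot

# The system-wide bound is the smaller of the two capacities.
limits = {"bus": bus_pps, "forwarding engines": engines_pps}
bottleneck = min(limits, key=limits.get)
print(bottleneck, limits[bottleneck])
```

Under these assumptions the bus, not the aggregate forwarding capacity, is the binding limit, which agrees with the slide's observation.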
  9. Switch Architectures: Interconnect Fabric with Multiple Parallel Paths
     [Figure, shown in backplane and midplane variants; recovered labels: CPU and Memory with Full Routing Function; Line Card 1 ... Line Card N, each with Forwarding CPU, Local Memory, MAC, and Interconnect I/F; Switch Interconnect; LC1 ... LCN.]
  10. Switch Architectures: Interconnect Fabric with Multiple Parallel Paths
      • Non-blocking (for unicast) if crossbar or shared memory with adequate bandwidth (2NR)
      • Delay: 10s of us (in an unloaded system)
      • Throughput: full line rate, subject to queueing discipline
        • Provided LC processing & interconnect scheduling keep up
        • Note that this is not always the case!
      • Applicability: state of the art for many current switches/routers
        • Cisco GSR 12000 family (high-end, core router, 98-99), Ascend GRF (mid-end router, 96-97), Cisco Catalyst 8500 (low-end, enterprise router, 97-98)
  11. Switch Architectures: Distributed Switch
      • Interconnect smaller switches, each with the architecture of a 1st, 2nd, or 3rd generation switch
      • The smaller switches are usually ASIC based
      • Connected in a specific topology, such as a hypercube or mesh (more on this ahead)
      [Figure; recovered labels: Route Processor with Memory (RP, Mem); 1st, 2nd, or 3rd gen. switch; Distributed interconnect.]
  12. Switch Architectures: Distributed Switch
      • Centralized, high-performance switch core, with distributed line cards
      • Switch core and line cards may be in different chassis
      • Interconnect composed of optical or electronic links
      [Figure; recovered labels: Line Card 1 ... Line Card N on each side of a Switch Core; RP, Mem; Electrical or Optical Connections.]
  13. Functional Map of Processing in a Typical IP Router
      [Figure; recovered labels: Physical Layer (O/E, E/O); Input Framing; Lookup Engine with Lookup Tables; Traffic Manager with Buffer/State Memory; Fabric I/F (To Fabric); uP (To Route Processor); Packet Processing; From Fabric: Fabric I/F, Link Scheduler with Buffer/State Memory, Output Framing.]
  14. A Canonical Realization of the Functional Map
      [Figure; recovered labels: Transceiver; SFI-4; Input Framer; SPI-4; Network Proc. with Lookup Table and Co-Processor; Traffic Manager with Buffer Memory (SDRAM); Fabric I/F; 3.125 Gb/s SERDES; Switch Fabric; LCP Proc. with DRAM, PCI, To Route Processor; Packet Processing; Output Framer; output Traffic Manager with Buffer/State Memory.]
  15. Juniper M40 and M160: A Comparison
                                      M40                  M160
      Throughput                      20 Gb/s              80 Gb/s
      Processing @ 64B packets        40 Mpps              160 Mpps
                                      (1 pkt. proc.)       (4 pkt. procs.)
      Back/mid-plane (full duplex)    25.6 Gb/s            102.4 Gb/s
      Data Slots                      8 (4 ports/slot)     8 (4 ports/slot)
      Data Ports (max.)               8 OC-48              8 OC-192
      Power (max.)                    1.7 KW               3.4 KW
      Weight                          280 lb               370 lb
      Size                            Half telco rack      Half telco rack
      Dimensions (HxWxD in.)          35x19x23.5           35x19x29
  16. Juniper M-Series System Architecture
      [Figure; recovered labels: Routing Engine (CPU-based) running the JUNOS Router OS (routing & signaling protocols, system management), with Routing Process, User Interface, Chassis Mgmt., Interface Mgmt., Routing Table; Forwarding Engine (ASIC-based), a computer-scale ASIC-based centralized packet processor, with Packet Processing and Forwarding Table; Line Cards (Packets In, Packets Out); Switch Fabric.]
  17. Juniper M-Series Functional System Operation
      [Figure; recovered labels: Input Port and Output Port; PIC; I/O Manager ASIC; FPC; Controller ASIC; Shared Memory (distributed on FPCs); 64B Blocks; Distributed Buffer Manager ASIC; Backplane or Midplane; Notification; Internet Processor II ASIC; Forwarding Table; steps 1-3, 4a, 4b, 5-10.]
  18. Juniper M-Series Module Organization
      [Figure; recovered labels: Control Plane: Routing Engine with JUNOS Internet S/W, 100 Mb/s Ethernet, Misc. Control Subsys.; Data Plane: FPC #1 ... #8, each with PIC #1 ... #4, I/O Manager, Packet Director, 3.2 Gb/s full duplex; Switching & Forwarding Module with Distributed Buffer Mgr., Internet Proc. II, uP, 128 MB, PCI, Cntlr., FT, 12.8 Gb/s full duplex; M160 Midplane (204.8 Gb/s); M40 Backplane (51.2 Gb/s).]
  19. Cisco Catalyst 6000 Family: A Comparison
                                      6009                 6513
      Throughput (non-blocking)       32 Gb/s              128 Gb/s
      Processing @ 256B packets       15 Mpps              100 Mpps (?)
      Back/mid-plane                  32 Gb/s (bus)        128 Gb/s (switch), 32 Gb/s (bus)
      Data Slots†                     8                    10
      Data Ports (max.)               128 GbE††            128 GbE
      Power (max.)                    ~1.3 KW              > 2.5 KW
      Weight                          ~166 lb              240 lb
      Size                            >1/3 telco rack      ~Half telco rack
      Dimensions (HxWxD in.)          25.2x17.2x18.1       33.3x17.2x18.1
      † Only includes usable data slots
      †† This number of max. ports means an oversubscription of 4x (so not non-blocking!)
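The 4x oversubscription in the second footnote follows from dividing aggregate port capacity by the non-blocking throughput. The helper below is a minimal sketch with a hypothetical function name; it assumes each GbE port contributes 1 Gb/s and that the 32 Gb/s and 128 Gb/s throughput figures belong to the 6009 and 6513 respectively.

```python
def oversubscription(n_ports: int, port_gbps: float, throughput_gbps: float) -> float:
    """Ratio of total port capacity to the switch's non-blocking throughput."""
    return (n_ports * port_gbps) / throughput_gbps


# 6009: 128 GbE ports against 32 Gb/s of throughput.
print(oversubscription(128, 1.0, 32.0))   # 4.0
# 6513: 128 GbE ports against 128 Gb/s of throughput.
print(oversubscription(128, 1.0, 128.0))  # 1.0
```

A ratio above 1 means the ports can offer more traffic than the switch can carry, so the configuration is not non-blocking.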
  20. Cisco Catalyst Family System Architecture
      First Generation of Catalyst: Catalyst 6000
      [Figure; recovered labels: Supervisor Engine; Management Engine (Network Management); Control Plane: Routing Engine, MSFC (CPU-based), Routing Table; Data Plane: Forwarding Engine, PFC (CPU-based), Forwarding Table; Bus; Line Cards (Packets In, Packets Out).]
  21. Cisco Catalyst Family Functional System Operation
      [Figure; recovered labels: Supervisor Engine with PFC, Fabric Arbitration, MSFC, Network Management; Data Bus 32 Gb/s; Results Bus; Control Bus; line cards with Controller ASICs #1 ... #4, 64KB and 448KB; steps 1-6.]
  22. Cisco Catalyst 6500 System Architecture
      Second Generation of Catalyst: Catalyst 6500
      [Figure; recovered labels: Supervisor Engine; Management Engine (Network Management); Control Plane: Routing Engine, MSFC (CPU-based), Routing Table; Data Plane: Forwarding Engine, PFC (ASIC-based), Forwarding Table; Bus (Headers); Switching Fabric (Data); Line Cards (Packets In, Packets Out).]
  23. Cisco Catalyst 6500 Functional System Operation
      Second Generation of Catalyst: Catalyst 6500
      [Figure; recovered labels: Supervisor Engine with PFC, Fabric Arb., MSFC, Network Mgt.; Data Bus 32 Gb/s; Results Bus; Control Bus; Line Cards with ASICs #1 ... #4, 512KB, Fabric I/F; 8 Gb/s; 16 Gb/s; Switching Fabric; steps 1-9.]
  24. Cisco Catalyst 6500 System Architecture
      Third Generation of Catalyst: Catalyst 6500+
      [Figure; recovered labels: Supervisor Engine; Management Engine (Network Management); Control Plane: Routing Engine, MSFC, Routing Table; Data Plane: Forwarding Engines (PFC) at the Line Cards; Packets In, Packets Out; Switching Fabric.]
  25. Cisco Catalyst Family Functional System Operation
      Third Generation of Catalyst: Catalyst 6500+
      [Figure; recovered labels: Line Cards with ASICs #1 ... #4, 512KB, Fabric I/F, DFC #1 ... #4; Supervisor Engine with Network Mgt., MSFC, Fabric Arb.; Switching Fabric; steps 1-9.]
  26. Building Very High-Speed Switches from Low-speed Components
      [Figure; recovered labels: Input Links 1 ... N; Input Queues holding Virtual Output Queues VOQ 1,1 ... VOQ N,N; Scheduler; Switch Fabric; Output Queues OQ 1 ... OQ N; Output Links 1 ... N.]
      • Problem: scale this architecture to handle higher link speeds
        • Emulate output queueing
        • Provide some measure of perf., such as bounded delay
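The virtual-output-queue arrangement in the figure can be sketched in a few lines. This is a minimal illustration, not the scheduler from any of the cited papers: the class name, the greedy matching rule, and the one-cell-per-slot fabric model are all assumptions made for the example.

```python
from collections import deque


class VOQSwitch:
    """N x N input-queued switch with one virtual output queue per (input, output)."""

    def __init__(self, n: int):
        self.n = n
        # voq[i][j] holds cells that arrived at input i and are destined for output j.
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]

    def enqueue(self, inp: int, out: int, cell) -> None:
        self.voq[inp][out].append(cell)

    def schedule(self) -> dict:
        """Greedy matching: each input and each output is used at most once per slot."""
        taken_outputs = set()
        match = {}
        for i in range(self.n):
            for j in range(self.n):
                if j not in taken_outputs and self.voq[i][j]:
                    match[i] = j
                    taken_outputs.add(j)
                    break
        return match

    def step(self) -> list:
        """Move one cell across the fabric for every matched (input, output) pair."""
        return [(j, self.voq[i][j].popleft()) for i, j in self.schedule().items()]


sw = VOQSwitch(2)
sw.enqueue(0, 0, "a")
sw.enqueue(1, 0, "b")  # contends with "a" for output 0
sw.enqueue(1, 1, "c")  # queued behind "b" at input 1, but for a different output
first = sw.step()
second = sw.step()
print(first)   # [(0, 'a'), (1, 'c')]
print(second)  # [(0, 'b')]
```

In the first slot cell "c" is delivered even though "b" arrived earlier at the same input and is still waiting for output 0; keeping a separate queue per output is what avoids that head-of-line blocking.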
  27. Building Very High-Speed Switches from Low-speed Components
      • Operate parallel switches s.t. they collectively mimic an OQ switch
        • Requires
          • Speedup in the system
          • Emulation of shadow OQ switch
        • Iyer, Awadallah & McKeown
      • Operate parallel switch system under control of a global scheduler
        • Requires
          • No speedup in the system
          • No reordering at outputs
        • Mneimneh, Sharma & Siu
      [Figure; recovered labels: Global Scheduler; Input Demultiplexers D1 ... Di ... DN; Parallel Switches S1 ... Sk; Output Multiplexers M1 ... Mj ... MN.]
  28. Building Very High-Speed Switches: References
      [SAN00] S. Iyer, A. Awadallah, N. McKeown, “Analysis of a packet switch with memories running slower than the line rate,” Proc. IEEE Infocom’00, March 2000.
      [Sun00] S. Iyer, “Analysis of a packet switch with memories running slower than the line rate,” MS Thesis, Stanford University, May 2000.
      [SuM03] S. Iyer, N. McKeown, “Analysis of the parallel packet switch architecture,” to appear IEEE/ACM Trans. on Networking, April 2003.
      [MSS01] S. Mneimneh, V. Sharma, K. Y. Siu, “On scheduling using parallel input-output queued crossbar switches with no speedup,” Proc. IEEE Workshop on High Performance Switching & Routing (HPSR’01), May 2001.
      [MSS02] S. Mneimneh, V. Sharma, K. Y. Siu, “Switching using parallel input-output queued switches with no speedup,” IEEE/ACM Trans. on Networking, vol. 10, no. 5, Oct. 2002.
      [Mne02] S. Mneimneh, “Algorithms for high-speed switching and routing,” Ph.D. Thesis, MIT, June 2002.
