Cisco, Liu Yang: From “Routing” Back to “Switching”


Published in: Technology


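  1. From “Routing” Back to “Switching”: Exploring the Evolution of Data Center Networks. Liu Yang, Technical Director, Internet Service Provider Business Unit, Cisco China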
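  2. The troubles of “switching” • Physical connection layers • Transparent bridging and spanning tree, Layer 2 multipathing, network convergence • Unicast flooding, loops, broadcast storms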
  3. The happy life after “routing” • ECMP (Equal Cost Multi Path) • Smooth scaling • Fast convergence • Protection against broadcast storms
  4. Troubles • Cluster size • Subnet address planning • Routing control plane • Virtual machines • Open platforms, cloud computing • Price • Dumb Big Flat
  5. From “Routing” Back to “Switching”: Switching Networks for Large Data Centers. FabricPath • Turn your network into a Fabric! • Key technology: FabricPath / TRILL
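  6. FabricPath innovations for Layer 2 switching • Traffic is forwarded over multiple paths between switches at the same time using ECMP (Equal Cost Multi Path); spanning tree is removed • Smooth scaling, as in a routed network • Fast convergence • Broadcast storm prevention (TTL) • The existing Layer 2 network is preserved • Conversational MAC address learning • Lower cost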
  7. FabricPath design goals (the slide places FabricPath between Switching and Routing). Switching: • Minimal Configuration • Plug & Play • Auto Discovery • Auto Learning • Flat Addressing • Spanning Tree Protocol (STP) • Slow Convergence • Single Path • Edge-to-Root • Rigid Design • Single Multicast Tree • Constrained Scalability. Routing: • Configuration Intense • Configured Learning • Configured Discovery • Plan & Play • Fast Convergence • Multiple Paths • Load Balancing • Multiple Multicast Trees • Hierarchical Forwarding • Any-to-any • Flexible Design • Highly Scalable
  8. FabricPath encapsulation: a 16-byte MAC-in-MAC header. Classical Ethernet frame: DMAC, SMAC, 802.1Q, Etype, Payload, CRC. Cisco FabricPath frame: Outer DA (48 bits), Outer SA (48 bits), FP Tag (32 bits), then the original CE frame (DMAC, SMAC, 802.1Q, Etype, Payload), then a new CRC. Outer DA/SA fields: Endnode ID (5:0) 6 bits, U/L 1 bit, I/G 1 bit, Endnode ID (7:6) 2 bits, RSVD 1 bit, OOO/DL 1 bit, Switch ID 12 bits, Sub-Switch ID 8 bits, Port ID 16 bits. FP Tag fields: Etype 16 bits, Ftag 10 bits, TTL 6 bits. • Switch ID: unique number identifying each FabricPath switch • Sub-Switch ID: identifies devices/hosts connected via vPC+ • Port ID: identifies the destination or source interface • Ftag (Forwarding tag): unique number identifying topology and/or multidestination distribution tree • TTL: decremented at each switch hop to prevent frames looping infinitely
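The field widths on this slide can be checked with a small bit-packing sketch. It assumes the fields are packed most-significant-first in the left-to-right order the slide lists them; the helper names and the sample values (including the Etype value) are illustrative assumptions, not taken from the deck.

```python
# Field widths in bits, in the order listed on the slide (assumed MSB-first).
OUTER_ADDR_FIELDS = [
    ("endnode_id_lo", 6),   # Endnode ID (5:0)
    ("ul", 1),              # U/L bit
    ("ig", 1),              # I/G bit
    ("endnode_id_hi", 2),   # Endnode ID (7:6)
    ("rsvd", 1),
    ("ooo_dl", 1),
    ("switch_id", 12),
    ("sub_switch_id", 8),
    ("port_id", 16),
]
FP_TAG_FIELDS = [("etype", 16), ("ftag", 10), ("ttl", 6)]


def total_bits(fields):
    """Sum of the field widths, in bits."""
    return sum(width for _, width in fields)


def pack(fields, values):
    """Pack named values into one integer; missing fields default to 0."""
    word = 0
    for name, width in fields:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(fields, word):
    """Inverse of pack(): split an integer back into named fields."""
    out = {}
    for name, width in reversed(fields):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


# Illustrative values only: Switch ID 100, Ftag 1, TTL 32, Etype placeholder.
outer_sa = pack(OUTER_ADDR_FIELDS, {"switch_id": 100, "port_id": 1})
fp_tag = pack(FP_TAG_FIELDS, {"etype": 0x8903, "ftag": 1, "ttl": 32})
header_bytes = (2 * total_bits(OUTER_ADDR_FIELDS) + total_bits(FP_TAG_FIELDS)) // 8
```

Two 48-bit outer addresses plus the 32-bit tag add up to the 16 bytes named in the slide title, which is a quick consistency check on the reconstructed field list.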
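  9. FabricPath control plane: L2 IS-IS • L2 IS-IS replaces STP as the control plane protocol • A link-state protocol is introduced to support ECMP in a Layer 2 environment • It exchanges the reachability of Switch IDs and builds the forwarding topology • It improves failure detection, network convergence, and high availability • Minimal IS-IS knowledge required: no manual configuration by the user, so Layer 2 plug-and-play behavior is preserved. [Diagram: FabricPath IS-IS runs inside the FabricPath domain; STP BPDUs stay in the attached STP domains]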
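  10. A few key reasons: • It maintains only reachability information between devices, and the L2 fabric needs no IP address information: it is not an L3 protocol but a new protocol for carrying MAC address information in an L2 environment • Easy to extend: custom TLVs can be used to carry information • SPF capability: strong topology building and convergence. [Diagram legend: FabricPath port, CE port]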
  11. FabricPath data plane • The ingress FabricPath switch determines the destination Switch ID and imposes the FabricPath header • The destination Switch ID is used for the routing decision • No end-host MAC learning or lookup is needed inside the core • The egress FabricPath switch removes the FabricPath header and forwards the frame to the CE device. [Diagram: host MAC A sends a CE frame (DMAC→B, SMAC→A, Payload) into ingress FabricPath switch S10 over a CE interface; across the FabricPath core the frame carries DSID→20, SSID→10, DMAC→B, SMAC→A, Payload over FabricPath interfaces; egress FabricPath switch S20 delivers the original CE frame to host MAC B]
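The four data-plane steps above can be sketched as three small functions. This is a minimal model under stated assumptions: frames are plain dictionaries, the function names are hypothetical, the initial TTL of 32 is an arbitrary choice, and a core hop simply drops a frame whose TTL would reach zero.

```python
def ingress_encap(ce_frame, dmac_to_switch_id, source_switch_id, ttl=32):
    """Ingress edge: look up the destination Switch ID and add the outer header."""
    dsid = dmac_to_switch_id[ce_frame["dmac"]]
    return {"dsid": dsid, "ssid": source_switch_id, "ttl": ttl, "inner": ce_frame}


def core_forward(fp_frame):
    """Core hop: uses only the outer header and never reads the inner MACs."""
    if fp_frame["ttl"] <= 1:
        return None  # frame dropped instead of looping forever
    return {**fp_frame, "ttl": fp_frame["ttl"] - 1}


def egress_decap(fp_frame, my_switch_id):
    """Egress edge: strip the outer header and hand back the original CE frame."""
    if fp_frame["dsid"] != my_switch_id:
        raise ValueError("frame is not addressed to this switch")
    return fp_frame["inner"]


# Walk the slide's example: host A behind S10 sends to host B behind S20.
ce_frame = {"dmac": "B", "smac": "A", "payload": b"hello"}
fp_frame = ingress_encap(ce_frame, {"B": 20}, source_switch_id=10)
in_core = core_forward(fp_frame)
delivered = egress_decap(in_core, my_switch_id=20)
```

Only `ingress_encap` and `egress_decap` touch the inner MAC addresses, which mirrors the point that the core needs no end-host MAC state.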
  12. FabricPath MAC forwarding table • Edge switches maintain both MAC address table and Switch ID table • Ingress switch uses MAC table to determine destination Switch ID • Egress switch uses MAC table to determine output switchport. FabricPath MAC table on S100 (MAC → IF/SID): A → e1/1, B → e1/2 (local MACs point to switchports); C → S101, D → S200 (remote MACs point to Switch IDs). [Diagram: S10, S20, S30, S40 above edge switches S100, S101, S200 in the FabricPath domain; hosts MAC A, MAC B, MAC C, MAC D]
  13. FabricPath routing table • FabricPath IS-IS manages Switch ID (routing) table • All FabricPath-enabled switches automatically assigned Switch ID (no user configuration required) • Algorithm computes shortest (best) paths to each Switch ID based on link metrics • Equal-cost paths supported between FabricPath switches. FabricPath routing table on S100 (Switch → IF): S10 → L1, S20 → L2, S30 → L3, S40 → L4, S101 → L1, L2, L3, L4, …, S200 → L1, L2, L3, L4. There is one “best” path to S10 (via L1) and four equal-cost paths to S101. [Diagram: S10, S20, S30, S40 above S100, S101, S200; S100 uplinks L1, L2, L3, L4]
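The shortest-path computation with equal-cost paths can be illustrated with a Dijkstra variant that keeps every first-hop link tied for the best cost. This is a sketch, not FabricPath IS-IS itself: the topology (L1-L4 from S100, L5-L8 from S101, L9-L12 from S200, each set reaching S10-S40 in order) is inferred from the routing tables on the slides, and a metric of 1 per link is an assumption.

```python
import heapq

EDGES = ["S100", "S101", "S200"]
SPINES = ["S10", "S20", "S30", "S40"]

# Assumed wiring: link L(4*i + j + 1) joins edge i to spine j.
LINKS = {
    f"L{4 * i + j + 1}": (edge, spine)
    for i, edge in enumerate(EDGES)
    for j, spine in enumerate(SPINES)
}


def build_adjacency(links, metric=1):
    """Turn the link list into node -> [(neighbor, link name, cost)]."""
    adjacency = {}
    for name, (a, b) in links.items():
        adjacency.setdefault(a, []).append((b, name, metric))
        adjacency.setdefault(b, []).append((a, name, metric))
    return adjacency


def ecmp_routes(adjacency, source):
    """Return Switch ID -> sorted list of local links on a shortest path."""
    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, link, cost in adjacency[node]:
            nd = d + cost
            hops = {link} if node == source else first_hops[node]
            if neighbor not in dist or nd < dist[neighbor]:
                dist[neighbor] = nd
                first_hops[neighbor] = set(hops)
                heapq.heappush(heap, (nd, neighbor))
            elif nd == dist[neighbor]:
                first_hops[neighbor] |= hops  # keep every equal-cost path
    del first_hops[source]
    return {
        switch: sorted(links, key=lambda name: int(name[1:]))
        for switch, links in first_hops.items()
    }


ADJACENCY = build_adjacency(LINKS)
routes_s100 = ecmp_routes(ADJACENCY, "S100")
routes_s10 = ecmp_routes(ADJACENCY, "S10")
```

With this wiring, `routes_s100` has a single link toward each spine and four links toward the other edge switches, which is the shape of the S100 table on the slide.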
  14. Building FabricPath routing table entries. Links: L1-L4 connect S100 to S10-S40, L5-L8 connect S101 to S10-S40, L9-L12 connect S200 to S10-S40. Routing table on S10 (Switch → IF): S20 → L1,L5,L9; S30 → L1,L5,L9; S40 → L1,L5,L9; S100 → L1; S101 → L5; …; S200 → L9. Routing table on S40: S10 → L4,L8,L12; S20 → L4,L8,L12; S30 → L4,L8,L12; S100 → L4; S101 → L8; …; S200 → L12. Routing table on S100: S10 → L1; S20 → L2; S30 → L3; S40 → L4; S101 → L1, L2, L3, L4; …; S200 → L1, L2, L3, L4. Routing table on S200: S10 → L9; S20 → L10; S30 → L11; S40 → L12; S100 → L9, L10, L11, L12; S101 → L9, L10, L11, L12; … [Diagram: hosts MAC A, MAC B, MAC C, MAC D below the edge switches]
  15. Putting It All Together: Host A to Host B (1), Broadcast ARP Request. Frame from A: DMAC→FF, SMAC→A, Payload. Frame inside the fabric: DSID→FF, Ftag→1, SSID→100, DMAC→FF, SMAC→A, Payload. Multidestination trees (Tree → IF): on Switch 10, Tree 1 → L1,L5,L9 and Tree 2 → L9; on Switch 100, Tree 1 → L1,L2,L3,L4 and Tree 2 → L4; on Switch 200, Tree 1 → L9 and Tree 2 → L9,L10,L11,L12. The broadcast is sent on Tree 1 (Ftag 1). FabricPath MAC table on S100: A → e1/1 (local). FabricPath MAC table on S200: empty. • Learn MACs of directly-connected devices unconditionally • Don’t learn MACs in flood frames. [Diagram: S10, S20, S30, S40 with the roots for Tree 1 and Tree 2 marked; S100, S101, S200; hosts MAC A and MAC B]
  16. Putting It All Together: Host A to Host B (2), Unicast ARP Reply. Frame from B: DMAC→A, SMAC→B, Payload. Frame inside the fabric: DSID→MC1, Ftag→1, SSID→200, DMAC→A, SMAC→B, Payload. On S200 the destination is unknown, so the reply is sent on Tree 1 (L9). Multidestination trees (Tree → IF): on Switch 10, Tree 1 → L1,L5,L9 and Tree 2 → L9; on Switch 100, Tree 1 → L1,L2,L3,L4 and Tree 2 → L4; on Switch 200, Tree 1 → L9 and Tree 2 → L9,L10,L11,L12. FabricPath MAC table on S100: A → e1/1 (local), B → S200 (remote). FabricPath MAC table on S200: B → e12/2 (local). • If DMAC is known, then learn remote MAC. [Diagram: S10, S20, S30, S40; S100, S101, S200; hosts MAC A and MAC B]
  17. Putting It All Together: Host A to Host B (3), Unicast Data. Frame from A: DMAC→B, SMAC→A, Payload. Frame inside the fabric: DSID→200, Ftag→1, SSID→100, DMAC→B, SMAC→A, Payload. FabricPath routing table on S100 (Switch → IF): S10 → L1; S20 → L2; S30 → L3; S40 → L4; S101 → L1, L2, L3, L4; …; S200 → L1, L2, L3, L4, with a hash selecting one of the four links. FabricPath routing table on S30: …; S200 → L11. A further routing table, also labelled S30 on the slide, lists S200 → –. FabricPath MAC table on S100: A → e1/1 (local), B → S200 (remote). FabricPath MAC table on S200: A → S100 (remote), B → e12/2 (local). [Diagram: S10, S20, S30, S40; S100, S101, S200; hosts MAC A and MAC B]
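The “Hash” step that picks one of the four equal-cost links on S100 can be sketched as follows. The slide does not say which header fields or hash function the hardware uses, so the flow key (source and destination MAC) and the use of CRC32 are assumptions made purely for illustration.

```python
import zlib


def pick_link(equal_cost_links, smac, dmac):
    """Pick one link per flow so that a flow always takes the same path."""
    # Assumed flow key: the inner source and destination MAC addresses.
    key = f"{smac}>{dmac}".encode()
    return equal_cost_links[zlib.crc32(key) % len(equal_cost_links)]


links_to_s200 = ["L1", "L2", "L3", "L4"]
chosen = pick_link(links_to_s200, "A", "B")
chosen_again = pick_link(links_to_s200, "A", "B")
```

A deterministic per-flow hash keeps the frames of one conversation in order on a single path while different conversations spread across the available links.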
  18. Conversational MAC learning. FabricPath MAC table on S100 (MAC → IF/SID): A → e1/1 (local), B → S200 (remote). FabricPath MAC table on S200: A → S100 (remote), B → e12/1 (local), C → S300 (remote). FabricPath MAC table on S300: B → S200 (remote), C → e7/10 (local). [Diagram: host MAC A on S100, host MAC B on S200, host MAC C on S300, joined by the FabricPath core]
  19. Conversational MAC Learning optimizes resource utilization: learning only the MAC addresses required. STP domain: • ALL MACs need to be learned on EVERY switch • Large L2 domains and virtualization present challenges to MAC table scalability. L2 fabric: • Local MAC: source-MAC learning only happens for traffic received on CE ports • Remote MAC: source MACs of traffic received on FabricPath ports are only learned if the destination MAC is already known as local. [Diagram: an STP domain beside an L2 fabric, with per-switch counts of 250 and 500 MACs and sample MAC tables for hosts A, B, and C]
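The two learning rules can be replayed against the Host A to Host B walkthrough from the earlier slides. This is a minimal sketch: the `EdgeSwitch` class and its method names are hypothetical, and the MAC table is a plain dictionary.

```python
BROADCAST = "FF"


class EdgeSwitch:
    """Toy edge switch that applies the two conversational-learning rules."""

    def __init__(self, switch_id):
        self.switch_id = switch_id
        self.mac_table = {}  # MAC -> ("local", port) or ("remote", switch ID)

    def receive_on_ce_port(self, smac, port):
        # Rule 1: source MACs seen on CE ports are always learned as local.
        self.mac_table[smac] = ("local", port)

    def receive_on_fabricpath_port(self, smac, dmac, source_switch_id):
        # Rule 2: learn the remote source only if the destination is local.
        entry = self.mac_table.get(dmac)
        if entry is not None and entry[0] == "local":
            self.mac_table[smac] = ("remote", source_switch_id)


s100, s200 = EdgeSwitch("S100"), EdgeSwitch("S200")

# (1) A's broadcast ARP request: S100 learns A, S200 ignores the flood frame.
s100.receive_on_ce_port("A", "e1/1")
s200.receive_on_fabricpath_port("A", BROADCAST, "S100")
s200_after_request = dict(s200.mac_table)

# (2) B's unicast ARP reply: S200 learns B locally, S100 learns B as remote.
s200.receive_on_ce_port("B", "e12/2")
s100.receive_on_fabricpath_port("B", "A", "S200")

# (3) A's unicast data: S200 now learns A as remote because B is local.
s100.receive_on_ce_port("A", "e1/1")
s200.receive_on_fabricpath_port("A", "B", "S100")
```

After the three steps both tables hold exactly the two hosts in the conversation, and a third edge switch that only saw the flood would hold neither.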
  20. Architectural Approach for MSDC. CLOS: • Same node type used in all roles (Spine and Edge) • Fine Grain Redundancy • Additional density provided through density of node or additional layers. Scale-Up Spine, Scale-Out Leaf: • High density spine node • Smaller fixed leaf • Fewer control planes than pure Clos. Lean Core, Smart Edge: • Layer-1.5 Spine (Dumb Core) • Intelligent Edge
  21. FabricPath builds a common network switching platform. POD 1: VLANs 100-199. POD 2: VLANs 200-299. POD 3: VLANs 300-399. PODs 1-3 combined: VLANs 100-399
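  22. A common network switching platform for large-scale data centers: how the network supports flexible service deployment • Modular and easy to extend • Consistent network bandwidth and latency, independent of where a server is located • Rapid service deployment: compute resources can be moved and allocated flexibly. Any service on any server, at any time! • Scalability: service and cluster growth is no longer constrained by the network • Server utilization: servers can be reused • Manageability: plug and play, minimal configuration, little manual intervention • Reliability: the impact of a single point of failure on the overall service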
  23. From “Routing” Back to “Switching”: Switching Networks for Small and Medium Data Centers. Nexus 5000 + Nexus 2000 Fabric Extender = Nexus 7000/5000 virtualized chassis • Turn your network into a Switch • Key technology: remote extension modules, FEX as TOR
  24. FEX Terminology. FEX can be connected to a parent switch in three ways: • single attached without any vPC running on the parent switch • single attached with vPC running on the parent switch • dual attached in vPC mode. [Diagram labels: parent switch, fabric links, NIFs, HIFs, vPC primary, vPC secondary, vPC 1, vPC 2]
  25. FEX Inner Functioning: inband management model (software image, configuration) • Fabric Extender is discovered by the switch using an L2 Satellite Discovery Protocol (SDP) that runs on the uplink port of the Fabric Extender • Core switch checks software image • Core switch pushes programming data to Fabric Extender. [Diagram: switch N5k01 with fabric uplinks 1, 2, 3, 4; Fabric Extender host ports 1-48 GigE]
  26. Data Center-Wide Scalability at Layer 2 • Flattened architecture • Flexible application deployment across a larger area • Line-rate network
  27. Thank you