Evolving Data Center Switching<br />Part 1.  Setting the Stage for Layer 2 Multi-pathing (TRILL)<br />v 1.0<br />Brad Hedl...
 - About the Author -<br />Brad Hedlund<br />Technical Solutions Architect, Data Center<br />Cisco Systems, Inc.<br />CCIE...
Why Evolve Data Center Switching?<br />Transformational Paradigm Shifts<br />From Connectivity to Virtualization<br />New ...
Revisiting Classic Ethernet<br />What works? What needs to be improved?<br />SW1<br />SW2<br />SW3<br />INTERNETWORK EXPER...
Flooding Behavior<br />Plug & Play<br />Unknown UNICAST<br />BROADCAST<br />B<br />B<br />MAC Table<br />A   Eth 1/1<br />...
Assists in MAC Learning
Foundational Behavior
Assists in Discovery
Assists in MAC Learning
Plug & Play</li></ul>A L L<br />A L L<br />A L L<br />C<br />SW1<br />SW1<br />D<br />D<br />D<br />C<br />A<br />A<br />I...
Classical Ethernet MAC Learning<br />All switches learn all MACs<br /><ul><li>Plug & Play
Inefficient</li></ul>Improvement Area (TRILL*)<br />SW1  MAC Table<br />A  -- Eth 1<br />B  -- Eth 1<br />C  -- Eth 2<br /...
Flooding with Multiple Paths<br />Plug & Play loop prevention is needed…<br />SW1<br />SW2<br />SW3<br />SW4<br />No Ether...
Classical Ethernet Loop Prevention<br />Spanning Tree Protocol (STP)<br />STP Enforced Tree Topology<br />SW2<br />SW1<br ...
Loop Free for flooded traffic
Single Topology for all traffic
Single Path for Unicast
Single Path for Multicast
50% Bandwidth Unused
Timer based recovery</li></ul>Improvement Areas (TRILL*)<br />A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
Scaling the data center network<br />Challenges, Approaches, <br />SW1<br />SW2<br />SW3<br />SW4<br />SW5<br />SW6<br />I...
Scaling out Tier 1<br />SW1<br />SW2<br />SW5<br />SW6<br />Why scale out Tier 1?<br /><ul><li>More Bandwidth at Tier 2
Larger Tier 2 Scalability
Smaller Tier 1 switches
Spread Risk (RAID)
Pay as you grow
Upcoming SlideShare
Loading in …5
×

Evolving Data Center switching with TRILL

1,726 views

Published on

Setting the stage for TRILL, rethinking data center switching.
By Brad Hedlund

Published in: Technology
1 Comment
2 Likes
Statistics
Notes
No Downloads
Views
Total views
1,726
On SlideShare
0
From Embeds
0
Number of Embeds
105
Actions
Shares
0
Downloads
0
Comments
1
Likes
2
Embeds 0
No embeds

No notes for slide

Evolving Data Center switching with TRILL

  1. 1. Evolving Data Center Switching<br />Part 1. Setting the Stage for Layer 2 Multi-pathing (TRILL)<br />v 1.0<br />Brad Hedlund<br />Cisco Systems, Inc.<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  2. 2. - About the Author -<br />Brad Hedlund<br />Technical Solutions Architect, Data Center<br />Cisco Systems, Inc.<br />CCIE #5530<br />http://www.internetworkexpert.org/about/<br />Blog: http://www.internetworkexpert.org<br />Twitter: http://twitter.com/bradhedlund<br />E-mail: bhedlund@cisco.com<br />Comments welcome.<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  3. 3. Why Evolve Data Center Switching?<br />Transformational Paradigm Shifts<br />From Connectivity to Virtualization<br />New Requirements<br />Large, Flat, Scalable L2 fabrics<br />Simpler, Smarter Networks<br />Ultra High Availability<br />Storage/Ethernet consolidation<br />More bandwidth<br />The Server is a fluid object<br />The virtual machine is the new Server<br />The physical machine is the new Network<br />Miniaturization & Scale <br />Any server, any VLAN, anywhere, anytime<br />What’s Next?<br />The “Data Center” becomes a fluid object<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  4. 4. Revisiting Classic Ethernet<br />What works? What needs to be improved?<br />SW1<br />SW2<br />SW3<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />Narrative of this section located at: http://www.internetworkexpert.org/2010/05/07/setting-the-stage-for-trill/<br />
  5. 5. Flooding Behavior<br />Plug & Play<br />Unknown UNICAST<br />BROADCAST<br />B<br />B<br />MAC Table<br />A Eth 1/1<br />B Eth 1/2<br />C Eth 1/3<br /><ul><li>Allows Plug & Play
  6. 6. Assists in MAC Learning
  7. 7. Foundational Behavior
  8. 8. Assists in Discovery
  9. 9. Assists in MAC Learning
  10. 10. Plug & Play</li></ul>A L L<br />A L L<br />A L L<br />C<br />SW1<br />SW1<br />D<br />D<br />D<br />C<br />A<br />A<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  11. 11. Classical Ethernet MAC Learning<br />All switches learn all MACs<br /><ul><li>Plug & Play
  12. 12. Inefficient</li></ul>Improvement Area (TRILL*)<br />SW1 MAC Table<br />A -- Eth 1<br />B -- Eth 1<br />C -- Eth 2<br />D -- Eth 2<br />B<br />SW1<br />Unnecessary<br />2<br />1<br />A L L<br /> A<br />A L L<br /> A<br />A L L<br /> A<br />A L L<br /> A<br />A L L<br /> A<br />A L L<br /> A<br />A L L<br /> D<br />A L L<br /> C<br />A L L<br />A L L<br /> B<br />A L L<br />A L L<br />A L L<br />A L L<br />A L L<br />A L L<br />SW3 MAC Table<br />A -- Eth 1<br />B -- Eth 2<br />C -- Eth 3<br />D -- Eth 3<br />SW4 MAC Table<br />A -- Eth 3<br />B -- Eth 3<br />C -- Eth 1<br />D -- Eth 2<br />SW4<br />SW3<br />3<br />3<br />1<br />2<br />2<br />1<br />C<br />D<br />Useful<br />A<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  13. 13. Flooding with Multiple Paths<br />Plug & Play loop prevention is needed…<br />SW1<br />SW2<br />SW3<br />SW4<br />No Ethernet TTL<br />Infinite Loop<br />Broadcast<br />Unknown Unicast<br />Flooding Required<br />A<br />B<br />Improvement Area (TRILL)<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  14. 14. Classical Ethernet Loop Prevention<br />Spanning Tree Protocol (STP)<br />STP Enforced Tree Topology<br />SW2<br />SW1<br />SW1<br />R<br />R<br />SW3<br />SW2<br />SW4<br />SW3<br />SW4<br /><ul><li>Plug & Play
  15. 15. Loop Free for flooded traffic
  16. 16. Single Topology for all traffic
  17. 17. Single Path for Unicast
  18. 18. Single Path for Multicast
  19. 19. 50% Bandwidth Unused
  20. 20. Timer based recovery</li></ul>Improvement Areas (TRILL*)<br />A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  21. 21. Scaling the data center network<br />Challenges, Approaches, <br />SW1<br />SW2<br />SW3<br />SW4<br />SW5<br />SW6<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />Narrative of this section located at: http://www.internetworkexpert.org/2010/05/07/setting-the-stage-for-trill/<br />
  22. 22. Scaling out Tier 1<br />SW1<br />SW2<br />SW5<br />SW6<br />Why scale out Tier 1?<br /><ul><li>More Bandwidth at Tier 2
  23. 23. Larger Tier 2 Scalability
  24. 24. Smaller Tier 1 switches
  25. 25. Spread Risk (RAID)
  26. 26. Pay as you grow
  27. 27. Lower latency</li></ul>Tier 2<br />SW3<br />SW4<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  28. 28. Scaling out Tier 1 with Classic Ethernet<br />with Spanning Tree Protocol (STP)<br />SW1<br />SW2<br />SW5<br />SW6<br />R<br />Tier 1<br />SW3<br />SW4<br />Improvement Areas (TRILL)<br />Tier 2<br />Why scale out Tier 1?<br /><ul><li>More Bandwidth at Tier 2
  29. 29. Larger Tier 2 Scalability
  30. 30. Smaller Tier 1 switches
  31. 31. Spread Risk
  32. 32. Pay as you grow
  33. 33. Low latency
  34. 34. Spanning Tree still required
  35. 35. Blocking on all but one path
  36. 36. Did not gain scalability
  37. 37. Did not gain bandwidth</li></ul>A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  38. 38. Scaling out Tier 1 with Classic Ethernet<br />with Spanning Tree Protocol (STP)<br />SW1<br />SW2<br />SW5<br />SW6<br />Tier 1<br />R<br />SW3<br />SW4<br />Tier 2<br />Improvement Area (TRILL)<br />Why Scale Tier 1?<br /><ul><li>More Bandwidth at Tier 2
  39. 39. Larger Tier 2 Scalability
  40. 40. Smaller Tier 1 switches
  41. 41. Pay as you grow
  42. 42. Low latency
  43. 43. Spanning Tree still required
  44. 44. Blocking on all but one path
  45. 45. Did not gain scalability
  46. 46. Did not gain bandwidth</li></ul>A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  47. 47. Scaling out Tier 1 with Classic Ethernet<br />with Spanning Tree Protocol (STP)<br />SW1<br />SW2<br />SW5<br />SW6<br />Tier 1<br />R<br />SW3<br />SW4<br />Tier 2<br />Alternate Topology<br />Why Scale Tier 1?<br /><ul><li>More Bandwidth at Tier 2
  48. 48. Larger Tier 2 Scalability
  49. 49. Smaller Tier 1 switches
  50. 50. Pay as you grow
  51. 51. Low latency
  52. 52. Unbalanced at Tier 2
  53. 53. Only 1 switch with all paths
  54. 54. Did not gain scalability
  55. 55. Did not gain bandwidth</li></ul>A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  56. 56. Scaling UP Tier 1 with Classic Ethernet<br />Rigid (2) Switch Tier 1 design constraint forces Scaling UP – not OUT<br />SW2<br />SW1<br />Improvement Area (TRILL)<br />SW1<br />SW2<br />SW3<br />SW4<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  57. 57. Multi Path with Classic Ethernet<br />Multi Chassis Ether Channel (MCEC) aka Virtual Port Channel (vPC)<br />STP finds a Loop Free Topology<br />No Blocking Paths, All links active<br />State Synchronization at Tier 1<br />SW2<br />SW1<br />SW1<br />R<br />R<br />SW3<br />SW3<br />SW4<br />SW3<br /><ul><li>STP Roles
  58. 58. MAC learning
  59. 59. LACP
  60. 60. Interface states/config
  61. 61. Split Brain detection</li></ul>Not a trivial accomplishment!<br />Several different states must be synced<br />Improvement area (TRILL)<br />Multi path with minimal state/sync complexity<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  62. 62. Scaling Tier 2 with Classical Ethernet<br />Scale Bandwidth or Size? – Pick one, you can’t have both!<br />SCALING BANDWITH<br />SCALING SIZE<br />OR<br />SW1<br />SW2<br />SW2<br />SW1<br />Tier 1<br />SW4<br />SW3<br />Tier 2<br />SW5<br />SW6<br />SW4<br />SW3<br />Trade-off between Bandwidth or Size<br />Tier 1 switch density key scaling factor<br />Improvement Area (TRILL)<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  63. 63. Scaling out Tier 1 with MCEC<br />Increasingly Complex State Synchronization across (4) or more Tier 1 switches <br />SW1<br />SW2<br />SW6<br />SW5<br />R<br />SW1<br />R<br />States & Role Sync X 4<br /><ul><li>STP Roles
  64. 64. MAC learning
  65. 65. LACP
  66. 66. Interface states/config
  67. 67. Split Brain detection</li></ul>SW3<br />SW4<br />SW3<br />SW4<br />STP still present<br />Increased Complexity = Difficult & Fragile scaling<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  68. 68. Scaling out Tier 1 with MPLS<br />Replacing L2 switching with L3 + MPLS and L2 pseudo wire full mesh Overlay via VPLS <br />Sound complicated? That’s because it is.<br />SW1<br />SW2<br />SW6<br />SW5<br />IP + MPLS +VPLS<br />L2 pseudo wires<br />(full mesh required)<br />SW3<br />SW4<br />SW7<br />SW8<br />SW9<br />SW10<br />N * (N-1) / 2<br />Complex overlay of L2 services over L3<br />MPLS skill sets?<br />Configuration Intense<br />NOT Plug & Play!<br />Increased Complexity = Difficult & Fragile scaling<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  69. 69. TRILL - Layer 2 multi Pathing<br />An Introduction<br />SW1<br />SW2<br />SW3<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />Narrative of this section to be posted at: http://www.internetworkexpert.org/topic/trill/<br />
  70. 70. Design Goals for TRILL<br />TRILL<br />Switching<br />Routing<br />Minimal Configuration<br />Plug & Play<br />Auto Discovery<br />Auto Learning<br />Flat Addressing<br />Spanning Tree Protocol (STP)<br />Slow Convergence<br />Single Path<br />Edge-to-Root Rigid Design<br />Single Multicast Tree<br />Constrained Scaleability<br />Configuration Intense<br />Configured Learning<br />Configured Discovery<br />Plan & Play<br />Fast Convergence<br />Multiple Paths<br />Load Balancing<br />Multiple Multicast Trees<br />Hierarchical Forwarding<br />Any-to-any Flexible Design<br />Highly Scalable<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  71. 71. MAC Learning -- Evolved<br />All switches learn all MACs<br /><ul><li>Plug & Play
  72. 72. Flat & Inefficient</li></ul>Hierarchical<br />Converstation Based Learning<br /><ul><li>Plug & Play
  73. 73. Hierarchical, Efficient
  74. 74. Scalable</li></ul>Improvement Area (TRILL*)<br />SW1 MAC Table<br />A -- Eth 1<br />B -- Eth 1<br />C -- Eth 2<br />D -- Eth 2<br />B<br />SW3 -- Eth 1<br />SW4 -- Eth 2<br />Conversation<br />Learning<br />SW1<br />Unnecessary<br />2<br />1<br />SW3 MAC Table<br />A -- Eth 1<br />B -- Eth 2<br />C -- Eth 3<br />D -- Eth 3<br />SW4 MAC Table<br />A -- Eth 3<br />B -- Eth 3<br />C -- Eth 1<br />D -- Eth 2<br />B – SW3<br />AFTER<br />SW4<br />SW3<br />3<br />3<br />BEFORE<br />C -- SW4<br />1<br />2<br />2<br />1<br />C<br />D<br />Useful<br />A<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  75. 75. MAC Learning -- Evolved<br />All switches learn all MACs<br /><ul><li>Plug & Play
  76. 76. Flat & Inefficient</li></ul>Improvement Area (TRILL*)<br />SW1 MAC Table<br />A -- Eth 1<br />B -- Eth 1<br />C -- Eth 2<br />D -- Eth 2<br />B<br />SW1<br />Unnecessary<br />2<br />1<br />SW3 MAC Table<br />A -- Eth 1<br />B -- Eth 2<br />C -- Eth 3<br />D -- Eth 3<br />SW4 MAC Table<br />A -- Eth 3<br />B -- Eth 3<br />C -- Eth 1<br />D -- Eth 2<br />SW4<br />SW3<br />3<br />3<br />BEFORE<br />C -- SW4<br />1<br />2<br />2<br />1<br />C<br />D<br />Useful<br />A<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  77. 77. MAC Learning -- Evolved<br />Hierarchical<br />Converstation Based Learning<br /><ul><li>Plug & Play
  78. 78. Hierarchical, Efficient
  79. 79. Scalable</li></ul>SW1 MAC Table<br />B<br />SW3 -- Eth 1<br />SW4 -- Eth 2<br />Conversation<br />Learning<br />SW1<br />2<br />1<br />SW3 MAC Table<br />A -- Eth 1<br />B -- Eth 2<br />SW4 MAC Table<br />C -- Eth 1<br />D -- Eth 2<br />B – SW3<br />AFTER<br />SW4<br />SW3<br />3<br />3<br />C -- SW4<br />1<br />2<br />2<br />1<br />C<br />D<br />Useful<br />A<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  80. 80. Multi Pathing -- Evolved<br />16-way Equal Cost Multi Path (ECMP) Layer 2 Forwarding<br />SW5<br />SW1<br />SW2<br />SW6<br />16<br />SW3<br />SW4<br />2<br />1<br />3<br />4<br />2<br />1<br />3<br />4<br />MAC Table<br />B -- SW4<br />A -- Eth5<br />MAC Table<br />A -- SW3<br />B -- Eth5<br />5<br />5<br />Per Flow L2-L4 hashing<br />SW Table<br />SW4 -- Eth 1,2,3,4<br />SW Table<br />SW3 -- Eth 1,2,3,4<br />A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  81. 81. Loop Free Flooding -- Evolved<br />A unique loop free forwarding topology for Broadcast & Unknown UnicastONLY<br />SW5<br />SW1<br />SW2<br />SW6<br />16<br />SW3<br />2<br />1<br />3<br />4<br />5<br />A<br />B<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />SW4<br />2<br />1<br />3<br />4<br />Does not punish path availability for known unicast & multicast conversations <br />5<br />
  82. 82. Multicast -- Evolved<br />More Bandwidth for Multicast – All possible Topologies Used <br />SW5<br />SW1<br />SW2<br />SW6<br />16<br />SW3<br />SW4<br />Per (S,G) Topologies<br />S1<br />S2<br />R1<br />R2<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  83. 83. Scaling out Tier 1 with TRILL<br />Plug & Play L2 fabric with L3 Scalability and Robustness <br />SW5<br />SW1<br />SW2<br />SW6<br />16<br />Tier 1<br />SW3<br />SW4<br />Why scale out Tier 1?<br /><ul><li>More Bandwidth at Tier 2
  84. 84. Larger Tier 2 Scalability
  85. 85. Smaller Tier 1 switches
  86. 86. Spread Risk (RAID)
  87. 87. Pay as you grow
  88. 88. Low latency</li></ul>Tier 2<br /><ul><li> No complicated MCEC state sync
  89. 89. No Spanning Tree
  90. 90. No complicated MPLS overlay
  91. 91. Simple Configuration
  92. 92. Flexible any-to-any design</li></ul>INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  93. 93. Scaling Tier 2 with TRILL<br />Flexible design can scale both Bandwidth and Size<br />SCALING BANDWITH<br />SCALING SIZE<br />AND<br />SW2<br />SW1<br />SW2<br />SW3<br />SW4<br />SW1<br />Tier 1<br />SW4<br />SW3<br />Tier 2<br />SW5<br />SW8<br />SW7<br />SW6<br />Minimal Trade-off between Bandwidth or Size<br />Tier 1 switch density less of a scaling factor<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />
  94. 94. Part 1. Summary<br />INTERNETWORK EXPERT .ORG<br />Plug and Play Layer 2 with Layer 3 Scalability and Robustness<br />Addressing scalability<br /><ul><li>Conversational MAC Learning*
  95. 95. Hierarchical MAC Forwarding*</li></ul>Robust L3 characteristics<br /><ul><li>Link State Topology Awareness
  96. 96. Fast Convergence</li></ul>Bandwidth scalability<br /><ul><li>16 active paths
  97. 97. Multiple Multicast Topologies*</li></ul>Plug & Play<br /><ul><li>Auto Learning
  98. 98. Simple Configuration**</li></ul>Domain Size scalability<br /><ul><li>Flexible 16-switch Tier 1 design options</li></ul>* Not present in current RFC 5556 (TRILL)<br />* Present in Cisco specific enhancements, Cisco driving PARs to improve current RFC<br />** (3) NX-OS commands per Nexus switch, subject to change <br />5/7/2010<br />
  99. 99. Stay tuned for more details…<br />http://www.internetworkexpert.org/topics/trill/<br />SW1<br />SW2<br />SW3<br />INTERNETWORK EXPERT .ORG<br />5/7/2010<br />

×