Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers

Traditionally, local area networks, including data center networks, have been constructed using switched Ethernet. A typical 10G Ethernet switch uses 12.5 W per port and costs upwards of $500 per port. There is a cheaper source of bandwidth: optical circuit switching. This presentation describes how we constructed a data center network using an optical circuit switch and what we had to do to achieve good performance.


  1. Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers. Nathan Farrington, George Porter, Sivasankar Radhakrishnan, Hamid Hajabdolali Bazzaz, Vikram Subramanya, Yeshaiahu Fainman, George Papen, and Amin Vahdat
  2. Electrical Packet Switch: $500/port; 10 Gb/s fixed rate; 12 W/port; requires transceivers; per-packet switching; for bursty, uniform traffic. Optical Circuit Switch: $500/port; rate free; 240 mW/port; no transceivers; 12 ms switching time; for stable, pair-wise traffic. (2010-09-02, SIGCOMM, Nathan Farrington)
  3. Outline: Intro, Technology, Analysis, Data Plane, Control Plane, Experimental Setup, Evaluation, Related Work, Conclusion
  4. Optical Circuit Switch: mirrors on motors. Full crossbar switch; does not decode packets; needs external scheduler. Diagram labels: Input 1, Glass Fiber Bundle, Lenses, Fixed Mirror, Rotate Mirror, Output 1, Output 2.
  5. Wavelength Division Multiplexing. Diagram labels: Electrical Packet Switch; 10G WDM Optical Transceivers (1 to 8); WDM MUX; WDM DEMUX; 80G Superlink; Optical Circuit Switch (No Transceivers Required).
  6. Stability Increases with Aggregation. Where is the sweet spot? Levels shown: Inter-Data Center, Inter-Pod, Inter-Rack, Inter-Server, Inter-Process, Inter-Thread. Annotations: Enough Stability, Enough Traffic.
  8. k switches, N ports each; N pods, k ports each. Example: N=64 pods * k=1024 hosts/pod = 64K hosts total; 8 wavelengths
  10. Fewer Core Switches: fewer than k switches, N ports each; N pods, k ports each. Example: N=64 pods * k=1024 hosts/pod = 64K hosts total; 8 wavelengths
  12. Set Up a Circuit. Pod 1 -> 2: Capacity = 10G, Demand = 10G, Throughput = 10G. Pod 1 -> 3: Capacity = 80G, Demand = 80G, Throughput = 80G. (Diagram: Pod 1, Pod 2, and Pod 3 connected to the EPS and the OCS by 10G and 80G links.)
  13. Traffic Patterns Change. Pod 1 -> 2: Capacity = 10G, Demand = 10G, Throughput = 10G. Pod 1 -> 3: Capacity = 80G, Demand = 80G, Throughput = 80G.
  14. Traffic Patterns Change. Pod 1 -> 2: Capacity = 10G, Demand = 80G (was 10G), Throughput = 10G. Pod 1 -> 3: Capacity = 80G, Demand = 10G (was 80G), Throughput = 10G.
  15. Break a Circuit. Pod 1 -> 2: Capacity = 10G, Demand = 80G (was 10G), Throughput = 10G. Pod 1 -> 3: Capacity = 80G, Demand = 10G (was 80G), Throughput = 10G.
  16. Set Up a Circuit. Pod 1 -> 2: Capacity = 10G, Demand = 80G (was 10G), Throughput = 10G. Pod 1 -> 3: Capacity = 80G, Demand = 10G (was 80G), Throughput = 10G.
  17. Pod 1 -> 2: Capacity = 80G, Demand = 80G, Throughput = 80G. Pod 1 -> 3: Capacity = 80G, Demand = 10G (was 80G), Throughput = 10G.
  18. Pod 1 -> 2: Capacity = 80G, Demand = 80G, Throughput = 80G. Pod 1 -> 3: Capacity = 10G, Demand = 10G, Throughput = 10G.
  20. Diagram labels: Topology Manager; Circuit Switch Manager; Pod Switch Manager (x3); EPS; OCS; Pod 1, Pod 2, Pod 3; 10G and 80G links.
  21. Outline of Control Loop: (1) Estimate traffic demand. (2) Compute optimal topology for maximum throughput. (3) Program the pod switches and circuit switches.
  22. Step 1: Estimate Traffic Demand. Question: Will this flow use more bandwidth if we give it more capacity? Identify elephant flows (mice don’t grow). Problem: Measurements are biased by current topology. Pretend all hosts are connected to an ideal crossbar switch. Compute the max-min fair bandwidth fixpoint. Reference: Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, and Amin Vahdat. Hedera: Dynamic Flow Scheduling for Data Center Networks. In NSDI ’10.
  23. Step 2: Compute Optimal Topology. Formulate as an instance of the max-weight perfect matching problem on a bipartite graph. Solve with Edmonds’ algorithm. Graph: source pods 1 to 4 on one side, destination pods 1 to 4 on the other. Pods do not send traffic to themselves. Edge weights represent interpod demand. Algorithm is run iteratively for each circuit switch, making use of the previous results.
  24. Example: Compute Optimal Topology
  28. Traditional Network; Helios Network; 100% bisection bandwidth (240 Gb/s)
  29. Hardware. 24 servers: HP DL380; 2 socket (E5520) Nehalem; Dual Myricom 10G NICs. 7 switches: One Dell 1G 48-port; Three Fulcrum 10G 24-port; One Glimmerglass 64-port optical circuit switch; Two Cisco Nexus 5020 10G 52-port.
  32. Traditional Network: 190 Gb/s peak, 171 Gb/s avg. Annotations: Hash Collisions, TCP/IP Overhead.
  33. Helios Network (Baseline): 160 Gb/s peak, 43 Gb/s avg.
  34. Port Debouncing. Timeline (0.0 to 2.0 s): Layer 1 PHY signal locked (bits are detected); switch thread wakes up and polls for PHY status; makes note to enable link after 2 seconds; switch thread enables Layer 2 link.
  35. Without Debouncing: 160 Gb/s peak, 87 Gb/s avg.
  36. Without EDC: 160 Gb/s peak, 142 Gb/s avg. Annotations: Software Limitation, 27 ms Gaps.
  37. Bidirectional Circuits. Diagram: three pod switches, each with RX and TX ports, connected to the optical circuit switch.
  38. Unidirectional Circuits. Diagram: three pod switches, each with RX and TX ports, connected to the optical circuit switch.
  39. Unidirectional Circuits. Unidirectional scheduler: 142 Gb/s avg. Bidirectional scheduler: 100 Gb/s avg. Daisy chain needed for good performance for arbitrary traffic patterns.
  40. Traffic Stability and Throughput
  44. “Why Packet Switching?” “The conventional wisdom [of 1985 is] that packet switching is poorly suited to the needs of telephony . . .” Jonathan Turner. “Design of an Integrated Services Packet Network”. IEEE J. on Selected Areas in Communications, SAC-4 (8), Nov 1986.
  45. Conclusion. Helios: a scalable, energy-efficient network architecture for modular data centers. Large cost, power, and cabling complexity savings. Dynamically and automatically provisions bisection bandwidth at runtime. Does not require end-host modifications or switch hardware modifications. Deployable today using commercial components. Uses the strengths of circuit switching to compensate for the weaknesses of packet switching, and vice versa.
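
The topology step on slide 23 picks one destination pod for each source pod so that the total inter-pod demand carried by circuits is as large as possible, with no pod paired to itself. The sketch below illustrates that objective only. It is not the authors' implementation: it swaps in a brute-force search over permutations in place of Edmonds' algorithm, it handles a single circuit switch, and the function name `best_circuit_assignment` and the demand values are invented for illustration.

```python
from itertools import permutations


def best_circuit_assignment(demand):
    """Brute-force max-weight perfect matching between source and destination pods.

    demand[src][dst] is the estimated inter-pod demand (hypothetical units).
    Returns (assignment, total), where assignment[src] is the destination pod
    chosen for source pod src. Assumes at least two pods.
    """
    n = len(demand)
    best, best_weight = None, -1
    for perm in permutations(range(n)):
        # Pods do not send traffic to themselves, so skip any self-pairing.
        if any(src == dst for src, dst in enumerate(perm)):
            continue
        weight = sum(demand[src][dst] for src, dst in enumerate(perm))
        if weight > best_weight:
            best, best_weight = perm, weight
    return best, best_weight


# Made-up 4-pod demand matrix; rows are source pods, columns are destinations.
demand = [
    [0, 1, 9, 2],
    [3, 0, 1, 8],
    [7, 2, 0, 1],
    [1, 6, 2, 0],
]
assignment, total = best_circuit_assignment(demand)
print(assignment, total)
```

Brute force examines every permutation, so it is only practical for a handful of pods; a polynomial-time matching algorithm, such as the one named on the slide, would be needed at the scale of the 64-pod example.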

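
The walkthrough on slides 12 to 18 can be replayed numerically. The sketch below assumes that the throughput of each pod pair is simply the smaller of its link capacity and its demand, which matches the figures on those slides but ignores protocol overhead and switching time. The dictionary keys and helper names are hypothetical.

```python
def throughput(capacity_gbps, demand_gbps):
    """Simplified model: a pod pair carries the smaller of capacity and demand."""
    return min(capacity_gbps, demand_gbps)


def total_throughput(capacity, demand):
    """Sum the modeled throughput over every pod pair in `demand`."""
    return sum(throughput(capacity[pair], demand[pair]) for pair in demand)


# Slide 12: the 80G circuit serves Pod 1 -> 3; Pod 1 -> 2 uses the 10G packet path.
capacity = {"pod1->pod2": 10, "pod1->pod3": 80}
demand = {"pod1->pod2": 10, "pod1->pod3": 80}
before_change = total_throughput(capacity, demand)

# Slide 14: demand shifts, but the circuit still points at Pod 3.
demand = {"pod1->pod2": 80, "pod1->pod3": 10}
stale_topology = total_throughput(capacity, demand)

# Slide 18: the circuit has been moved to Pod 1 -> 2.
capacity = {"pod1->pod2": 80, "pod1->pod3": 10}
after_reconfig = total_throughput(capacity, demand)

print(before_change, stale_topology, after_reconfig)
```

Under this model the stale topology carries 20G in total, against 90G before the traffic shift and again after the circuit is moved, which is the gap the control loop is meant to close.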