Data Center Network Multipathing

Internet Research Lab at NTU, Taiwan.
A survey of routing in data center networks and of the IEEE 802.1Qau Congestion Notification standard from the Data Center Bridging Task Group.

  1. Data Center Network Multipathing
     Peregrine: An All-Layer-2 Container Computer Network
     Tzi-cker Chiueh*§, Cheng-Chun Tu*§, Yu-Cheng Wang§, Pai-Wei Wang§, Kai-Wen Li§, Yu-Ming Huang§
     *Industrial Technology Research Institute, Taiwan; §Computer Science Department, Stony Brook University
     IEEE Cloud 2012
     Leveraging Performance of Multiroot Data Center Networks by Reactive Reroute
     Adrian S.-W. Tam, Kang Xi, H. Jonathan Chao
     Department of Electrical and Computer Engineering, Polytechnic Institute of New York University
     2010 18th IEEE Symposium on High Performance Interconnects
     Presenter: Jason, Tsung-Cheng HOU. Advisor: Wanjiun Liao. May 17th, 2012
  2. Motivation
     • Summarize features of the popular multi-root Clos / fat-tree data center topology, taking ITRI’s prototype as an example
     • Survey solutions to multipathing
     • Recap Jin-Jia Chang’s presentation on QCN
     • Present another solution to multipathing
     • Compare several multipathing methods
  3. Agenda
     • Multi-Root Clos / Fat-Tree Topology
     • Surveyed Solutions to Multipathing
     • 802.1Qau – QCN
     • QCN and Reactive Reroute
     • Comparison of Multipathing Methods
     Paper for this part: Tzi-cker Chiueh et al., “Peregrine: An All-Layer-2 Container Computer Network”, IEEE Cloud 2012
  4. Multi-Root Clos / Fat-Tree
     • Adopted by various publications
       – VL2, PortLand, BCube, ElasticTree, Peregrine
     • Scale-out design built from cheap commodity switches
     • Every path crosses a fixed maximum number of switches / hops
       – If packets never bounce back up, there is no routing loop
     • Nearly full bisection bandwidth, multipathing, symmetric
     • Possibly a tremendous number of routing table entries
     • Upward and downward paths are handled differently
     • High rate but limited capability: buffer, CPU, etc.
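To make the "scale-out" and "multipathing" points concrete, here is a minimal sizing sketch. It assumes the textbook k-ary fat-tree built entirely from identical k-port switches; this is an assumption for illustration and is not Peregrine's actual folded-Clos layout, which the slides only show as figures. The function name `fat_tree_size` is hypothetical.

```python
def fat_tree_size(k: int) -> dict:
    """Element counts for a textbook k-ary fat-tree of k-port switches.

    Assumption: k pods, each with k/2 edge and k/2 aggregation switches,
    and (k/2)^2 core switches. Not taken from the Peregrine paper.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": k * half * half,           # k^3 / 4
        # One shortest path per core switch between hosts in different pods.
        "inter_pod_paths": half * half,
        # edge -> aggregation -> core -> aggregation -> edge
        "max_switches_on_path": 5,
    }


if __name__ == "__main__":
    for ports in (4, 48):
        print(ports, fat_tree_size(ports))
```

Under this assumption, path diversity grows quadratically with port count, which is why per-flow path selection matters so much in the later slides.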
  5. High rate but limited capability
     • All-L2 Ethernet switches
     • 1 GE or up to 10 GE links, dozens of ports
     • Limited buffer: hundreds of KB
     • Limited CPU ability: a processing bottleneck
     • Limited flow table entries: at most tens of thousands
     • Optimized for fast table lookups
     • Take Peregrine as an example
       – ITRI’s industrial, commodity production prototype
       – Others are mostly experimental or high-end
  6. Topology: Folded Clos (figure; labels: a rack, 12 racks, a container, cross-container)
  7. Within One Rack (figure)
  8. Within One Container (figure; annotation: 5-to-5 per rack, but only 4 ports)
  9. DS and RAS
     • Directory Server
       – Address association, management, and reuse
       – Performs IP-MAC lookups and mappings
       – Pushes mapping updates to end hosts
     • Route Algorithm Server
       – Collects entries of the traffic matrix (TM)
       – Runs load-balancing algorithms based on the TM
       – Distributes routing entries to switches and updates the DS
     • Described within one container; cross-container operation is unclear
     • Scalability is unclear; VM mobility is unclear (the paper only refers to something like mobile IP)
  10. Routing, Balancing, and Tolerance (figure)
  11. Logical Architecture (figure)
  12. Dual-Mode Forwarding (figure)
  13. Switching to Backup (figure)
  14. ITRI Container Computer Prototype
     • 6.096 m shipping container
     • 12 server racks, 12 storage racks
     • All-L2 network, commodity switches
     • “Folded” Clos topology
     • Directory Server, Route Algorithm Server
     • Unclear: load-balancing algorithm, VM mobility, DS-RAS scalability, cross-container operation
     • In the future: OpenFlow, OpenStack (currently not using OpenFlow to connect switches; how they are connected is unclear)
  15. Discussions
     • Spanning tree for multipathing and load-balancing: simple but limited flexibility
     • How to plug and play? Is it scalable?
       – A new switch leads to reconfiguration
       – Does VM migration affect the TM and direct routes?
     • DS-RAS: a simple version of a controller, but its mechanism and performance are unclear
     • Seems to be trying to combine various advantages: address mapping, ST multipathing, converged network, folded Clos
  16. Agenda
     • Multi-Root Clos / Fat-Tree Topology
     • Surveyed Solutions to Multipathing
     • 802.1Qau – QCN
     • QCN and Reactive Reroute
     • Comparison of Multipathing Methods
  17. Multipathing
     • VLB:
       – Splits traffic across intermediate points
       – Automatically balances load
       – Ideally great, but subject to packet reordering
     • ECMP hashing:
       – The choice of hash function makes a big difference
       – A flow always sticks to one path for its whole lifetime
     • Hedera:
       – Flow-to-core mapping, flow scheduling
       – Requires global information, higher complexity
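The contrast between VLB and hashed ECMP on the slide above can be sketched as follows. This is an illustrative sketch, not code from VL2 or any switch: the helper names, the SHA-256 hash, and the choice to apply VLB per packet (which is what makes it subject to reordering) are assumptions.

```python
import hashlib
import random


def ecmp_next_hop(flow, uplinks):
    """Hashed ECMP: hash the flow's header fields so that every packet
    of the same flow is sent over the same uplink (no reordering)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return uplinks[int.from_bytes(digest[:4], "big") % len(uplinks)]


def vlb_next_hop(uplinks, rng=random):
    """Per-packet VLB (assumed variant): bounce each packet off a randomly
    chosen intermediate. Load spreads evenly, but packets of one flow can
    take different paths and arrive out of order."""
    return rng.choice(uplinks)


# Hypothetical uplinks and a (src IP, dst IP, protocol, src port, dst port) flow.
uplinks = ["core0", "core1", "core2", "core3"]
flow = ("10.0.0.1", "10.0.1.2", 6, 40000, 80)

if __name__ == "__main__":
    print("ECMP:", [ecmp_next_hop(flow, uplinks) for _ in range(3)])
    print("VLB: ", [vlb_next_hop(uplinks) for _ in range(3)])
```

The ECMP line always prints the same uplink three times, while the VLB line generally does not. Two large flows that happen to hash to the same uplink stay collided under ECMP, which is the weakness Hedera and reactive reroute target.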
  18. Multipathing
     • Spanning Tree / VLAN (SPAIN):
       – Near-static and simple, but requires pre-computation
       – Re-computes when the topology changes
       – Segments resources, so flexibility is limited
     • Multipath TCP:
       – One flow, many parallel paths
       – The publication uses VLAN-based routing (like SPAIN)
       – Shifts traffic to less congested paths
       – A new, adaptive transport mechanism
       – Still segments resources
  19. Multipathing References
     • M. Kodialam, T. V. Lakshman, S. Sengupta, “Efficient and Robust Routing of Highly Variable Traffic”, HotNets, 2004.
     • R. Zhang-Shen and N. McKeown, “Designing a Predictable Internet Backbone Network”, Third Workshop on Hot Topics in Networks (HotNets-III), November 2004.
     • A. Greenberg et al., “VL2: A Scalable and Flexible Data Center Network”, ACM SIGCOMM 2009.
     • R. N. Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, “PortLand: A Scalable, Fault-Tolerant Layer 2 Data Center Network Fabric”, In Proceedings of ACM SIGCOMM, 2009.
     • M. Al-Fares et al., “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010.
     • J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. C. Mogul, “SPAIN: COTS Data-Center Ethernet for Multipathing over Arbitrary Topologies”, In USENIX NSDI, April 2010.
     • C. Raiciu, C. Pluntke, S. Barre, A. Greenhalgh, D. Wischik, and M. Handley, “Data center networking with multipath TCP”, In HotNets, 2010.
  20. Agenda
     • Multi-Root Clos / Fat-Tree Topology
     • Surveyed Solutions to Multipathing
     • 802.1Qau – QCN
     • QCN and Reactive Reroute
     • Comparison of Multipathing Methods
     Papers for this part:
     • M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, and M. Seaman, “Data center transport mechanisms: Congestion control theory and IEEE standardization”, 46th Annual Allerton Conference on Communication, Control, and Computing, 2008
     • A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar, “AF-QCN: Approximate fairness with quantized congestion notification for multitenanted data centers”, IEEE 18th Annual Symposium on High Performance Interconnects (HOTI), 2010
  21. Data Center Bridging Task Group
     • Converged network
       – LAN: no priority control. Qbb: Priority-based Flow Control
       – FCoE (SAN): no congestion control. Qau: Quantized Congestion Notification
     • Need to survey more on converged networks
       – Respective features and requirements
       – Could be a very important trend
  22. QCN
     • CP: Congestion Point
       – A switch that monitors its queue: Q, Qeq, Qold
       – Samples packets and sends Fb messages to the RP
       – Fb combines the queue excess and the rate excess
       – Targets zero packet loss
     • RP: Reaction Point
       – A host with a Rate Limiter (RL), a Counter, and a Timer
       – Probes for more bandwidth, similar to AIMD
       – Decreases its rate according to the Fb message
       – The Counter and the Timer both control the RL
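The CP/RP interaction above can be sketched in a few lines. The feedback form Fb = -(Qoff + w·Qdelta), with Qoff = Q - Qeq and Qdelta = Q - Qold, follows the QCN description in the Alizadeh et al. paper listed in this deck; the constants `W`, `GD`, and `R_AI`, the unquantized Fb, and the class layout are illustrative assumptions rather than values from the standard.

```python
from dataclasses import dataclass

W = 2.0          # weight on the queue-growth term (illustrative)
GD = 1.0 / 128   # multiplicative-decrease gain (illustrative)
R_AI = 5.0       # active-increase step, in Mb/s (illustrative)


def cp_feedback(q, q_old, q_eq, w=W):
    """Congestion Point: combine queue excess and queue growth.
    A negative value signals congestion and is sent to the RP."""
    q_off = q - q_eq        # how far the queue is above its target
    q_delta = q - q_old     # growth since the last sample (rate excess)
    return -(q_off + w * q_delta)


@dataclass
class ReactionPoint:
    current_rate: float   # rate enforced by the rate limiter
    target_rate: float    # rate to recover towards

    def on_feedback(self, fb):
        """Multiplicative decrease, proportional to |Fb|, capped at 50%."""
        if fb >= 0:
            return
        self.target_rate = self.current_rate
        self.current_rate *= 1.0 - min(GD * abs(fb), 0.5)

    def on_increase_event(self, active=False):
        """Called when the byte counter or the timer expires.
        Recovery averages towards the target; active increase also
        raises the target to probe for more bandwidth."""
        if active:
            self.target_rate += R_AI
        self.current_rate = (self.current_rate + self.target_rate) / 2.0


if __name__ == "__main__":
    rp = ReactionPoint(current_rate=1000.0, target_rate=1000.0)
    rp.on_feedback(cp_feedback(q=40, q_old=30, q_eq=26))
    print(rp)
    rp.on_increase_event()
    print(rp)
```

The decrease is driven by explicit feedback while the increase is self-clocked at the sender, which is why the slide compares the RP to AIMD.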
  23. QCN (figure)
  24. QCN (figure)
  25. AF-QCN (figure)
  26. Modify Fb Msg to Imply More (figure)
  27. Agenda
     • Multi-Root Clos / Fat-Tree Topology
     • Surveyed Solutions to Multipathing
     • 802.1Qau – QCN
     • QCN and Reactive Reroute
     • Comparison of Multipathing Methods
     Paper for this part: Adrian S.-W. Tam, Kang Xi, H. Jonathan Chao, “Leveraging Performance of Multiroot Data Center Networks by Reactive Reroute”, Department of Electrical and Computer Engineering, Polytechnic Institute of New York University
  28. Exploit Multipath Property
     • Use QCN to further leverage redundancy
       – Per-flow CN adjusts bandwidth: Spectral
       – Relocating flows among paths: Spatial
       – Both mitigate congestion
     • Multiroot, Clos / fat-tree topology
       – Upward: could be randomized or rerouted
       – Downward: destination based, deterministic
     • Hashed ECMP: distributes the flow population
     • Flow reroute: balances congested links
  29. Reactive Reroute
     • Edge switches count the QCNs received per port
       – Only edge switches reroute; this is considered sufficient
       – Only for upward packets, not for downward ones
     • Reroutes flows that are both elephant and congested, detected by counting QCNs within a short period
     • Three reroute methods:
       – Uniform random
       – Minimum probability of congestion (conditional probability)
       – A weighted combination of the above two
     • Freezes a rerouted flow to avoid flapping
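A minimal sketch of the edge-switch logic described above, under stated assumptions: the class and parameter names (`EdgeSwitch`, `threshold`, `window`, `freeze`) are hypothetical, and the "min" method picks the uplink with the fewest QCNs seen so far as a simplified stand-in for the paper's conditional-probability rule, which the slides do not spell out.

```python
import random
from collections import defaultdict


class EdgeSwitch:
    """Counts QCNs per flow and per uplink; reroutes upward traffic only."""

    def __init__(self, uplinks, threshold=3, window=0.001, freeze=0.01, rng=None):
        self.uplinks = list(uplinks)
        self.threshold = threshold          # QCNs within `window` that trigger a reroute
        self.window = window                # the "short period", in seconds
        self.freeze = freeze                # hold-down after a reroute, in seconds
        self.rng = rng or random.Random()
        self.route = {}                     # flow -> current uplink
        self.qcn_times = defaultdict(list)  # flow -> recent QCN timestamps
        self.port_qcns = defaultdict(int)   # uplink -> QCNs attributed to it
        self.frozen_until = {}              # flow -> time the freeze ends

    def assign(self, flow, uplink):
        """Initial placement, e.g. from hashed ECMP."""
        self.route[flow] = uplink

    def on_qcn(self, flow, now, method="uniform"):
        """Handle one QCN for `flow`; return the uplink the flow uses afterwards."""
        port = self.route[flow]
        self.port_qcns[port] += 1
        recent = [t for t in self.qcn_times[flow] if now - t <= self.window]
        recent.append(now)
        self.qcn_times[flow] = recent

        # Many QCNs in a short period marks the flow as elephant and congested.
        if len(recent) < self.threshold:
            return port
        # A recently rerouted flow stays put to avoid flapping.
        if now < self.frozen_until.get(flow, 0.0):
            return port

        others = [p for p in self.uplinks if p != port]
        if not others:
            return port
        if method == "uniform":
            new_port = self.rng.choice(others)
        else:  # "min": simplified stand-in, fewest QCNs seen so far
            new_port = min(others, key=lambda p: self.port_qcns[p])

        self.route[flow] = new_port
        self.qcn_times[flow] = []
        self.frozen_until[flow] = now + self.freeze
        return new_port


if __name__ == "__main__":
    sw = EdgeSwitch(["up0", "up1", "up2"], rng=random.Random(0))
    sw.assign("flowA", "up0")
    for t in (0.0000, 0.0002, 0.0004, 0.0006):
        print(t, sw.on_qcn("flowA", t))
```

Every decision here uses only state local to one edge switch, which is the property the later discussion slide blames for synchronized reroutes under the "min" method.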
  30. Algorithm Pseudo Code (figure; annotation: only when within a short period)
  31. NS-3 Simulation
     • Simulated time: 1 second
     • A TCP simulation is also included
  32. Throughput and Latency (figure)
  33. Outlier Latency
     • Very large flows are throttled by L2 congestion control and therefore see large latency
     • 60% of flows finish within 1 ms, but the average is 15 ms!
  34. Discussion
     • Why is Min. reroute always worse?
       – Some flows’ paths overlap in the beginning
       – Edge switches have no global information
       – They receive QCNs from the same (port, agg), which leads to synchronized reroutes
     • Operate a centralized controller?
       – The authors argue that the gain is very small
       – But they do not present more on the “outliers”
       – The flows with the longest latencies are the larger ones
       – The larger flows could be vital connections
  35. Discussion
     • L2 congestion control protects TCP from UDP
     • No packet loss, almost no incast problem
     • The out-of-order problem is more severe for UDP
     • However, because the switch buffer is tightly monitored, the number of out-of-order packets is bounded by 5nr/s (n: buffer size, r: sending rate, s: link rate)
     • Freezing a rerouted flow also limits reordering
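A quick worked example of the 5nr/s bound quoted above. The units are an assumption, since the slide does not state them: n is taken as the buffer size in packets, and r and s are in the same rate unit, so the result is a packet count. The function name is hypothetical.

```python
def max_out_of_order(n, r, s):
    """Upper bound 5 * n * r / s on out-of-order packets after a reroute.

    Assumed units (not stated on the slide): n in packets, r and s in the
    same rate unit, so the result is a number of packets.
    """
    if s <= 0:
        raise ValueError("link rate must be positive")
    return 5 * n * r / s


if __name__ == "__main__":
    # Hypothetical numbers: 100-packet buffer, 1 Gb/s flow on a 10 Gb/s link.
    print(max_out_of_order(n=100, r=1, s=10))
```

With these hypothetical numbers the bound is 50 packets. It scales with the flow's share r/s of the link, so a flow that has already been throttled by QCN has less data in flight to be reordered.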
  36. Agenda
     • Multi-Root Clos / Fat-Tree Topology
     • Surveyed Solutions to Multipathing
     • 802.1Qau – QCN
     • QCN and Reactive Reroute
     • Comparison of Multipathing Methods
     Paper for this part: Daniel Crisan, Mitch Gusat, Cyriel Minkenberg, “Comparative Evaluation of CEE-based Switch Adaptive Routing”, 2nd Workshop on Data Center - Converged and Virtual Ethernet Switching (DC CAVES), 2010
  37. Multipathing Methods
     • Deterministic, static, or preconfigured
       – Single fixed path
       – VLAN-based, multiple fixed paths, ST-per-VLAN
     • Oblivious, randomized
       – Hashed by headers
       – Split to intermediaries
     • Reactive, switch adaptive routing
     • Controller-enabled centralized scheduling
  38. Comparison
     • Deterministic, static, or preconfigured
       – Simple, no reordering
     • Oblivious, randomized: good for
       – Single priority, symmetric traffic
     • Reactive, switch adaptive routing: suits realistic cases
       – Multiple priorities, asymmetric traffic
     • Controller-enabled centralized scheduling
       – Large input set, higher complexity
       – Controller is hard to implement; high cost for low gain?
     • Convergence and virtualization are trends
  39. Discussion
     • Data center traffic patterns are evolving and in many cases unknown a priori
     • This justifies multiple routing / balancing schemes; currently there is no single killer solution
     • A network should be able to switch between modes: Reactive-Adaptive and Randomized
     • The role of the controller is still to be optimized
       – Could be useful for critical flows / situations
       – Detects and reacts in a slower manner
       – Not ideal for fast, dynamic reaction
  40. Reference
     • Tzi-cker Chiueh, Cheng-Chun Tu, Yu-Cheng Wang, Pai-Wei Wang, Kai-Wen Li, Yu-Ming Huang, “Peregrine: An All-Layer-2 Container Computer Network”, IEEE Cloud 2012
     • M. Alizadeh, B. Atikoglu, A. Kabbani, A. Lakshmikantha, R. Pan, B. Prabhakar, and M. Seaman, “Data center transport mechanisms: Congestion control theory and IEEE standardization”, 46th Annual Allerton Conference on Communication, Control, and Computing, 2008
     • A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar, “AF-QCN: Approximate fairness with quantized congestion notification for multitenanted data centers”, IEEE 18th Annual Symposium on High Performance Interconnects (HOTI), 2010
     • Adrian S.-W. Tam, Kang Xi, H. Jonathan Chao, “Leveraging Performance of Multiroot Data Center Networks by Reactive Reroute”, 2010 18th IEEE Symposium on High Performance Interconnects
     • Daniel Crisan, Mitch Gusat, Cyriel Minkenberg, “Comparative Evaluation of CEE-based Switch Adaptive Routing”, 2nd Workshop on Data Center - Converged and Virtual Ethernet Switching (DC CAVES), 2010