Data Center Networking Research: Opportunities and Challenges

       Chuanxiong Guo

   Microsoft Research Asia (MSRA)
      2011.04.15
                    1
Outline
•   DCN background
•   Opportunities
•   Research challenges
•   A modular DCN design




                             2
Background: personal experience
• Bandwidth is a scarce resource
 Network   Memory        Disk     CPU                       Year
 10Mb/s    2MB           10MB     386/20MHz                 1994
 100Mb/s   128MB         2GB      Pentium II/233MHz         1998
 100Mb/s   256MB         40GB     Pentium III/800MHz        2002
 1Gb/s     2GB           160GB    Core 2/2GHz               2007
 1Gb/s     4GB           500GB    Core 2 Quad/3GHz          2011
 x100      x2000, but    x50000   x150 x4, but multi-core   17 years
           slow access            and instruction-level
                                  progress
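
The growth factors in the last row of the table follow from its 1994 and 2011 rows. A minimal sketch of that arithmetic, assuming decimal unit prefixes (1GB = 1000MB) and treating the CPU entries as clock rates in MHz; the variable names are illustrative:

```python
# First (1994) and last (2011) rows of the table, converted to common units.
# Assumption: decimal prefixes (1 GB = 1000 MB, 1 Gb/s = 1000 Mb/s).
first = {"network_mbps": 10, "memory_mb": 2, "disk_mb": 10, "cpu_mhz": 20}
last = {"network_mbps": 1000, "memory_mb": 4000, "disk_mb": 500_000, "cpu_mhz": 3000}

# Growth factor per resource over the 17 years.
growth = {name: last[name] // first[name] for name in first}

for name, factor in growth.items():
    print(f"{name}: x{factor}")
```

Network bandwidth grows by the smallest factor of the four resources. The extra x4 for the CPU comes from the four cores of the 2011 machine and is not part of the clock-rate ratio.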

                                                                5
Background: technology trends
– Disk is cheap (TB and PB are common)
   • 500 RMB for 1TB
– Memory is cheap (32GB per PC is not uncommon)
   • 150 RMB for 2GB DRAM
– CPU is powerful yet inexpensive (multi-core)
   • 2000 RMB for an Intel Core i7 with 4 cores
– But network bandwidth is a scarce resource
   • Intra-DC: replication everywhere for fault tolerance
   • Inter-DC: input and output need bandwidth
   • $50 per 1G port, $500 per 10G port
– $0.1 = 1GB of bandwidth = 1 CPU hour = 1GB of storage per
  month
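
The price equivalence on this slide can be read as a conversion table. A rough sketch, assuming the slide's rule-of-thumb prices hold exactly; the function and variable names are illustrative, and prices are kept in cents to avoid rounding noise:

```python
# Rule-of-thumb prices from the slide, in US cents.
CENTS_PER_GB_TRANSFERRED = 10
CENTS_PER_CPU_HOUR = 10
CENTS_PER_GB_MONTH_STORED = 10

def transfer_cost_equivalents(gigabytes):
    """Express the cost of moving `gigabytes` of data in other resources."""
    cents = gigabytes * CENTS_PER_GB_TRANSFERRED
    return {
        "dollars": cents / 100,
        "cpu_hours": cents / CENTS_PER_CPU_HOUR,
        "gb_months_of_storage": cents / CENTS_PER_GB_MONTH_STORED,
    }

# Port prices from the slide, expressed as dollars per Gb/s of port capacity.
dollars_per_gbps = {"1G port": 50 / 1, "10G port": 500 / 10}

print(transfer_cost_equivalents(1000))
print(dollars_per_gbps)
```

Under these numbers, moving 1TB costs as much as 1000 CPU hours, and a 10G port costs the same per Gb/s as a 1G port.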
                                                            6
DCN building blocks




Server   Rack   Container   Data Center
                                              7
DCN reference design
              •   Does not scale
              •   Low bandwidth
              •   Single point of failure
              •   High cost




                                       8
Outline
•   DCN background
•   Opportunities
•   Research challenges
•   A modular DCN design




                             9
Right time for DCN research
• It is a real problem
• It is an important problem
  – DCN as the infrastructure for cloud computing
• The assumptions are different
  – Data centers are owned by a single organization
  – We can innovate at both end-hosts and network
    devices
  – Security is easier (closed environment and trusted
    people)

                                                     10
DCN research: opportunities
• Full of research problems
  – Scalability: tens of thousands to millions of servers
  – Performance
  – Fault tolerance
  – Cost saving
  – Feel free to suggest new “TCP” protocols
• You can invent your own DCN!


                                                         11
Outline
•   DCN background
•   Opportunities
•   Research challenges
•   A modular DCN design




                             12
Research challenges
Applications                       Architectures

•   Search                         •   Topology design
•   Distributed execution engine   •   Network virtualization
•   Distributed file systems       •   Electrical/optical switching
•   Online social networking       •   Commodity vs. special system
•   HPC applications



Technologies                       Protocols

• DCN management                   • DCN routing
• DCN platform                     • TCP incast congestion control
• Energy efficiency                • Multicast




                                                                      13
Architecture design
•   Scaling: from thousands to millions of servers
•   High capacity: support various traffic patterns
•   Fault tolerance
•   Cost efficient
•   Easy to deploy and manage




                                                      14
Fat-tree (ucsd-sigcomm08)
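
A fat-tree network is built entirely from identical k-port commodity switches. As a rough sketch of how it scales, assuming the standard three-layer k-ary fat-tree (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches); the function name is illustrative:

```python
def fat_tree_size(k):
    """Component counts of a k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2:
        raise ValueError("k must be an even number >= 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        # k pods, k/2 edge switches per pod, k/2 servers per edge switch
        "servers": k * half * half,
    }

for ports in (4, 24, 48):
    print(ports, fat_tree_size(ports))
```

With 48-port switches this gives 27,648 servers, which shows how commodity switches alone can reach data center scale.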




                            15
VL2 (msr-sigcomm09)

               OSPF+ECMP


                           10G


                           10G

                           1G
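
ECMP spreads flows over several equal-cost next hops while keeping every packet of one flow on the same path. A minimal sketch of that general idea, not of VL2's implementation: the CRC32 hash, the 5-tuple layout, and the switch names are all assumptions made for illustration (real switches use their own hardware hash functions).

```python
import zlib

def ecmp_next_hop(flow, next_hops):
    """Pick one of several equal-cost next hops by hashing the flow 5-tuple."""
    key = "|".join(str(field) for field in flow).encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]

# Hypothetical flow: (source IP, destination IP, protocol, source port, destination port)
flow = ("10.0.0.1", "10.0.1.2", 6, 34567, 80)
uplinks = ["agg-1", "agg-2", "agg-3", "agg-4"]  # hypothetical switch names

print(ecmp_next_hop(flow, uplinks))
```

Because the choice depends only on the flow identifier, packets of one TCP flow are not reordered across paths, while different flows are distributed over the uplinks.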




                                16
DCell/BCube (msra-sigcomm08,09)

             • Put intelligence at servers
             • Use Ethernet switches as crossbar
             • Innovations in topology design and routing




  DCell                          BCube
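
Both structures are defined recursively, so their sizes follow from simple recurrences. A small sketch of the server counts, assuming the definitions in the cited papers (a DCell_k is built from t_{k-1} + 1 copies of DCell_{k-1}, and a BCube_k has n^(k+1) servers); the function names are illustrative:

```python
def dcell_servers(n, k):
    """Number of servers in a DCell_k built from n-port mini-switches."""
    t = n  # DCell_0: n servers attached to one switch
    for _ in range(k):
        t = t * (t + 1)  # t_k = t_{k-1} * (t_{k-1} + 1)
    return t

def bcube_servers(n, k):
    """Number of servers in a BCube_k built from n-port switches."""
    return n ** (k + 1)

print(dcell_servers(6, 3))
print(bcube_servers(8, 3))
```

DCell grows doubly exponentially with the level k, while BCube grows exponentially and is aimed at container-sized deployments.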
                                                      17
Architecture: optical/electrical switching
(ucsd-sigcomm10, rice-sigcomm10)
                    • A hybrid architecture
                       • Optical circuit switching
                       • Electrical packet switching




                                              18
Protocols: TCP incast congestion control

[Figure: senders S1, S2, ..., Sn all transmitting to one receiver R]


cmu-sigcomm09, msra-conext10
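
Incast arises when many senders answer one receiver at the same time and their bursts meet at a single switch port. A toy model of why this causes loss, under simplifying assumptions that are not taken from the cited papers: all bursts arrive at once, and the port can queue a fixed number of packets before it starts dropping. The function and parameter names are hypothetical.

```python
def incast_drops(num_senders, burst_packets, buffer_packets):
    """Packets dropped when synchronized bursts hit one switch port.

    Toy model: every sender delivers `burst_packets` at the same instant
    and the port can queue at most `buffer_packets` packets.
    """
    arriving = num_senders * burst_packets
    return max(0, arriving - buffer_packets)

for senders in (4, 8, 16, 32):
    print(senders, incast_drops(senders, burst_packets=10, buffer_packets=64))
```

In this model the number of drops rises with the number of senders once the combined burst exceeds the buffer. Real incast behaviour also depends on TCP retransmission timeouts, which the model leaves out.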


                                       19
Technologies: research platform
• A DCN research platform
  – High performance: comparable to ASIC
  – Easy to program: comparable to commodity server
  – Rich functions
     • Programmable packet forwarding
     • Experiment with various control/management functions
     • Can implement various routing/congestion control
       designs
• ServerSwitch (msra-nsdi11)
                                                          20
Applications
• A unified network for both data center and
  HPC applications?
                      Data center                HPC
Topology              Tree-based                 Torus/mesh, fat-tree
Routing               Single path routing        Deterministic routing
                      L2 spanning tree           Per-packet adaptive
                      L3 shortest path routing   routing to exploit path
                                                 diversity
Flow control          Packets can be dropped     No packet drop
                      End-to-end                 Hop by hop
Application support   Search, e-commerce,        Scientific applications
                      cloud computing
Programming API       TCP/IP socket              MPI/RDMA
                                                                           21
Outline
•   DCN background
•   Opportunities
•   Research challenges
•   A modular DCN design




                             22
Team
• Chuanxiong Guo, Guohan Lu, Haitao Wu,
  Yongqiang Xiong
• Interns: Zhiqiang Zhou, Jiaxin Cao, Jiabo Ju, Qin
  Jia, Jun Li
• Alumni/Alumnae
  – members: Songwu Lu, Dan Li
  – interns: Lei Shi, Yunfeng Shi, Danfeng Zhang, Xuan Zhang,
    Byunchul Park, Nan Hua, Chen Tian, Min-Chen Zhao, Chao
    Kong, Kai Chen, Wenfei Wu, Shuang Yang, Peng Su, Bruce
    Chen, Zhenqian Feng, Min-Jeong Shi, Yibo Zhu…
                                                                23
Modular, mega-data center networking




                            24
Modular, mega-data center networking

BCube       BCube        BCube


BCube      MDCube        BCube


BCube       BCube        BCube
                                 25
BCube: Server centric network
BCube1


      <1,0>               <1,1>               <1,2>               <1,3>



BCube0
      <0,0>               <0,1>               <0,2>               <0,3>



 00   01   02   03   10   11   12   13   20   21   22   23   30   31   32        33
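
The wiring in this figure follows a simple rule: a server with address a_1 a_0 attaches to the level-0 switch <0, a_1> and to the level-1 switch <1, a_0>. A minimal sketch that generalizes the rule to BCube_k, assuming the addressing used in the figure; the function name and data layout are illustrative:

```python
from itertools import product

def bcube_links(n, k):
    """Map each switch (level, remaining digits) to its attached servers.

    Servers are addressed by k+1 base-n digits, written a_k ... a_0.
    A server attaches to one switch per level; the level-l switch is
    identified by the server address with digit a_l removed.
    """
    links = {}
    for addr in product(range(n), repeat=k + 1):  # addr[0] is a_k, addr[-1] is a_0
        for level in range(k + 1):
            pos = k - level  # index of digit a_level inside addr
            switch = (level, addr[:pos] + addr[pos + 1:])
            links.setdefault(switch, []).append("".join(map(str, addr)))
    return links

links = bcube_links(4, 1)
print(links[(0, (0,))])  # servers on switch <0,0>
print(links[(1, (0,))])  # servers on switch <1,0>
```

For n = 4 and k = 1 this yields 16 servers and 8 switches with 4 ports in use on each, the same counts as the figure.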




                                                                            26
2-D MDCube
             MDCube structure




                                27
Problem: Servers for packet forwarding?
BCube1


      <1,0>                <1,1>               <1,2>               <1,3>



BCube0
      <0,0>                <0,1>               <0,2>               <0,3>



 00   01    02   03   10   11   12   13   20   21   22   23   30   31   32        33



                                      Forwarding node
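
Two servers whose addresses differ in more than one digit share no switch, so an intermediate server has to relay their packets. A minimal sketch of the digit-correcting idea behind single-path routing in BCube, assuming string addresses as in the figure; the function name is illustrative and this is not the authors' implementation:

```python
def bcube_route(src, dst):
    """Single path from src to dst, correcting one address digit per hop.

    Addresses are equal-length strings of digits (a_k ... a_0). Digits are
    corrected from the highest level down; other orders give other paths.
    """
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    path = [src]
    current = list(src)
    for i, digit in enumerate(dst):
        if current[i] != digit:
            current[i] = digit  # one hop through the switch of this level
            path.append("".join(current))
    return path

print(bcube_route("00", "11"))
print(bcube_route("00", "03"))
```

For servers 00 and 11 the path passes through server 10, which acts as the forwarding node. Servers 00 and 03 share switch <0,0> and need no relay.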
                                                                             28
Solution: ServerSwitch

Software
   • Full programmability at server CPU
      – Kernel module for low latency processing
      – User space for easy-to-use programmability

PCI-E
   • Low latency and high throughput interconnection

Hardware
   • Packet forwarding in commodity switching ASIC
      – High performance and limited programmability
                                                           29
Testbed
• A BCube testbed
  – 16 servers (Dell Precision 490 workstations with
    Intel 2.00GHz dual-core CPU, 4GB DRAM, 160GB
    disk)
  – Eight 8-port mini-switches (D-Link DGS-1008D
    8-port Gigabit switch)
• NIC
  – Intel Pro/1000 PT quad-port Ethernet NIC
  – NetFPGA
                                                      30
Summary
• DCN is an area full of opportunities and
  challenges
• The best is yet to come!
• Further information
  • http://research.microsoft.com/en-us/projects/msradcn/default.aspx




                                             31
