Jeong Wook-jae
wjjung11@gmail.com
Data Center Network Architecture:
Towards a Cloud Data Center
Contents
- The Conventional Architecture & Problem
- The New Architecture
- The Monsoon Architecture
- The VL2 Architecture
- The SEATTLE Architecture
- The PortLand Architecture
- The TRILL
- Related Works
- Summary
- The CDCN (Cloud Data Center Network) Architecture Proposal
- Trend
Confidential
The Conventional Architecture
The conventional architecture for data centers (adapted from figure by Cisco_2004)
The Problems of a Conventional DC
Ethernet is hard to scale out
- STP
- Broadcast (ARP, RARP, DHCP…)
- Packet flooding in switches (for MAC learning)
Fragmentation of resources
No Performance Isolation
Poor server to server connectivity
Need very high reliability near top of the tree (Single Point of Failure)
The Problems of a Conventional DC
Fragmentation of Resources
- VLANs used to isolate properties from each other
- IP addresses topologically determined by ARs
- Reconfiguration of IPs and VLAN trunks
• painful, error-prone, slow, often manual
The Problems of a Conventional DC
No Performance Isolation
- VLANs typically provide only reachability isolation
- One service sending/receiving too much traffic hurts all services sharing its
subtree
The Problems of a Conventional DC
Poor server to server connectivity
- Data centers run two kinds of applications:
• Outward facing (serving web pages to users)
• Internal computation
- 70~80% of the packets stay inside the data center
The Problems of a Conventional DC
Monsoon
Albert Greenberg and 4 other persons
(Microsoft Research)
The Monsoon Architecture
Monsoon
- A new network architecture, which scales and commoditizes data center networking.
Abstract
- Scale-out instead of Scale-up
- A single large Layer 2 domain
- Using programmable commodity layer 2 switches and servers.
- Two-level switch hierarchy:
• TOR(Top-Of-Rack) Switch => Access Switch
• LB(Load Balancing) Switch => Core Switch
- Scale to 100,000 servers or more.
The Monsoon Architecture
Objectives
- Low-Cost & Scale-out
- Uniform high capacity
• Capacity between two servers limited only by their NICs
• No need to consider topology when adding servers
- Performance isolation
• Traffic of one service should be unaffected by others
- Layer-2 semantics
• Flat addressing, so any server can have any IP address
• Server configuration is the same as in a LAN
• Legacy applications depending on broadcast must work
The Monsoon Architecture
Server-to-Server Forwarding
- An Example Monsoon Topology (Clos Network)
• A scale-out design with broad layers
- Same bisection BW at each layer -> no oversubscription
- Extensive path diversity -> Graceful degradation under failure
SWITCH       Up-link ports    Down-link ports   # of switches
Inter. SW    N/A              144 x 10 Gbps     72
Aggr. SW     72 x 10 Gbps     72 x 10 Gbps      144
TOR SW       2 x 10 Gbps      20 x 1 Gbps       5,184
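The "no oversubscription" claim can be cross-checked against the table with a few lines of arithmetic. The sketch below uses only the port counts and speeds listed above; the variable names are illustrative and not part of the Monsoon design.

```python
# Port counts and link speeds taken from the example Monsoon topology table.
GBPS_10, GBPS_1 = 10, 1

tor = {"count": 5184, "up": 2, "down": 20}    # 2 x 10G up, 20 x 1G down
aggr = {"count": 144, "up": 72, "down": 72}   # all ports 10G
inter = {"count": 72, "down": 144}            # all ports 10G

# Every TOR down-link port hosts one server.
servers = tor["count"] * tor["down"]

# Aggregate capacity (Gbps) crossing each layer boundary.
server_to_tor = servers * GBPS_1
tor_to_aggr = tor["count"] * tor["up"] * GBPS_10
aggr_to_inter = aggr["count"] * aggr["up"] * GBPS_10

# Each up-link must terminate on a down-link port of the layer above.
tor_links_match = tor["count"] * tor["up"] == aggr["count"] * aggr["down"]
aggr_links_match = aggr["count"] * aggr["up"] == inter["count"] * inter["down"]

print(servers, server_to_tor, tor_to_aggr, aggr_to_inter)
print(tor_links_match, aggr_links_match)
```

The three capacities come out equal, which is what "same bisection BW at each layer" means, and the server count is consistent with the stated goal of scaling to 100,000 servers or more.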
The Monsoon Architecture
Clos Network Topology
- A multistage (e.g., 3-stage) switching network.
- The advantage
• The connection between a large number of input and output ports can be made by
using only small-sized switches.
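• With n inputs per ingress switch and k middle-stage switches, it can be shown that
k ≥ n makes the Clos network rearrangeably non-blocking: any new connection can be
set up, although existing connections may have to be re-routed.
- Clos theorem: if k ≥ 2n-1, a new connection can always be added
without rearrangement (strict-sense non-blocking, like a crossbar switch)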
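The two conditions on k can be made concrete with a small helper. This is a minimal sketch for a symmetric 3-stage Clos network; the parameter r (number of ingress and egress switches) and the function name are assumptions introduced for illustration.

```python
def clos_properties(n: int, k: int, r: int) -> dict:
    """Classify a symmetric 3-stage Clos network.

    n: input ports per ingress switch
    k: number of middle-stage switches
    r: number of ingress switches (and of egress switches); illustrative symbol
    """
    ports = n * r
    # Ingress stage: r switches of size n x k.
    # Middle stage:  k switches of size r x r.
    # Egress stage:  r switches of size k x n.
    crosspoints = 2 * r * n * k + k * r * r
    return {
        "ports": ports,
        "rearrangeably_nonblocking": k >= n,
        "strict_sense_nonblocking": k >= 2 * n - 1,
        "crosspoints": crosspoints,
        "crossbar_crosspoints": ports * ports,
    }


if __name__ == "__main__":
    # 64 ports built from 8 ingress switches with 8 inputs each and 15 middle switches.
    print(clos_properties(n=8, k=15, r=8))
```

For this 64-port example the Clos fabric needs 2,880 crosspoints against 4,096 for a single crossbar, while staying strict-sense non-blocking because 15 = 2·8 - 1.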
The Monsoon Architecture
Server-to-Server Forwarding
Valiant Load Balancing
• Every flow “bounced” off a random intermediate switch
• Provably hotspot free for any admissible traffic matrix
• Servers could randomize flow-lets if needed
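The "bounce" step can be sketched as follows. This is an illustration only: the hash function, the switch names, and the flow tuple layout are assumptions, not the mechanism used by Monsoon switches or servers. Hashing the flow identifier keeps all packets of one flow on the same path while spreading different flows across the intermediate layer.

```python
import hashlib
from collections import Counter


def pick_intermediate(flow: tuple, intermediates: list) -> str:
    """Choose the bounce switch for a flow by hashing its identifier."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(intermediates)
    return intermediates[index]


def vlb_path(flow: tuple, src_tor: str, dst_tor: str, intermediates: list) -> list:
    """Return the switch-level path: source TOR, random intermediate, destination TOR."""
    return [src_tor, pick_intermediate(flow, intermediates), dst_tor]


if __name__ == "__main__":
    # Hypothetical names; 72 intermediate switches as in the example topology.
    intermediates = [f"int-{i}" for i in range(72)]
    # 7,200 distinct flows that differ only in source port.
    flows = [("10.0.0.1", "10.0.0.2", 6, 1024 + i, 80) for i in range(7200)]
    load = Counter(pick_intermediate(f, intermediates) for f in flows)
    print("flows per intermediate: min", min(load.values()), "max", max(load.values()))
```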
The Monsoon Architecture
Valiant Load Balancing
The Monsoon Architecture
Server-to-Server Forwarding
- Encapsulation used to transfer complexity to servers
• Commodity switches have simple forwarding primitives
• Complexity moved to computing the headers
- Encapsulation available
• IEEE 802.1ah defines MAC-in-MAC encapsulation
Frame processing when packets go from one server to another in the same data center.
The Monsoon Architecture
Server-to-Server Forwarding
- Data center OSes already heavily modified for VMs, storage, etc.
• A thin shim for network support is no big deal
- Applications work with Application Addresses
• AAs are flat names; infrastructure addresses are invisible to apps
- No change to applications or clients outside DC
[Figure: The networking stack of a host. The Monsoon Agent looks up remote IPs in the central directory.]
The Monsoon Architecture
External Connection & Full Topology(Example)
- Routers do not support the Monsoon functions
- Ingress Server with each Access Router
• Implements the Monsoon functionality and acts as a GW to the DC.
• Two interfaces: one to the AR and one to a TOR switch
• Default GW
[Figure: each Access Router (AR) attached to an Ingress Server]
The Monsoon Architecture
Directory System Performance
- Key issues:
• Lookup latency
• How many servers needed to handle a DC’s lookup traffic?
• Update latency
• Convergence latency
VL2
Albert Greenberg, Changhoon Kim and 7 other persons
(Microsoft Research)
The VL2 Architecture
VL2 uses
- flat addressing to allow service instances to be placed anywhere in the network
- Valiant Load Balancing to spread traffic uniformly across network paths
- end system-based address resolution to scale to large server pools without introducing
complexity to the network control plane.
Objectives
- Uniform high capacity
- Performance isolation
- Layer-2 semantics
Topology
- Low-cost switches arranged into a Clos topology.
Traffic Engineering
- Valiant Load Balancing
The VL2 Architecture
Building on proven networking technology
- Link-state routing
• To maintain the Switch-level topology
• Not end hosts’ information
- ECMP to enable VLB
Separating names from locators
- Hosting any service on any server.
- Addressing scheme
• AAs(Application-specific Addresses) & LAs(Location-specific Addresses)
• Directory system: mapping between names and locators.
• VL2 agent (in host): a layer-2.5 shim that invokes the directory system’s resolution service.
Embracing end-system
- VL2 agent in host
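The separation of names from locators can be illustrated with a toy agent. This is a minimal sketch under stated assumptions: the class names, the in-memory dictionary standing in for the directory system, and the example addresses are all hypothetical. It shows only the idea that the host agent resolves a destination AA to an LA and wraps the original packet in an outer header addressed to that LA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: object


class Directory:
    """Toy stand-in for the directory system: maps an AA to the LA serving it."""

    def __init__(self, mapping: dict):
        self._mapping = dict(mapping)

    def resolve(self, aa: str) -> str:
        return self._mapping[aa]


class Vl2Agent:
    """Hypothetical host agent: resolves AAs and encapsulates outgoing packets."""

    def __init__(self, directory: Directory, my_la: str):
        self.directory = directory
        self.my_la = my_la
        self.cache = {}  # AA -> LA, filled on first lookup

    def send(self, packet: Packet) -> Packet:
        la = self.cache.get(packet.dst)
        if la is None:
            la = self.directory.resolve(packet.dst)
            self.cache[packet.dst] = la
        # Outer header uses locators; the application's packet rides inside unchanged.
        return Packet(src=self.my_la, dst=la, payload=packet)


if __name__ == "__main__":
    directory = Directory({"20.0.0.55": "10.0.1.1"})  # example AA -> LA entry
    agent = Vl2Agent(directory, my_la="10.0.0.1")
    inner = Packet(src="20.0.0.7", dst="20.0.0.55", payload=b"hello")
    outer = agent.send(inner)
    print(outer.dst, outer.payload.dst)
```

Because applications only ever see AAs, a service instance can move to another rack by updating its directory entry, without renumbering the server.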
The VL2 Architecture
Addressing
The VL2 Architecture
Routing
The VL2 Architecture
Potential issue for both ECMP and VLB
- Transient congestion on some links.
- To mitigate it, the agent can change the hash used to create the source address,
either periodically or whenever TCP detects a severe congestion event (e.g., a full
window loss) or receives an Explicit Congestion Notification.
- Switches today only support up to 16-way ECMP, with 256-way ECMP being
released by some vendors this year.
- Some inexpensive switches cannot correctly retrieve the five-tuple values when
a packet is encapsulated with multiple IP headers. Thus, the agent at the source
computes a hash of the five-tuple values and writes that value into the source
IP address field, which all switches do use in making ECMP forwarding
decisions.
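The source-address trick described above can be sketched as follows. This is an illustration under assumptions: the hash function, the `salt` parameter used to model re-hashing, and the `10.255.0.0/16` prefix are hypothetical choices, not values from VL2.

```python
import hashlib
import ipaddress


def flow_hash_source_ip(five_tuple: tuple, salt: int = 0,
                        prefix: str = "10.255.0.0/16") -> str:
    """Map a flow's five-tuple to an address that is written into the outer source IP field.

    Switches that hash only on outer IP addresses will then spread flows across
    ECMP paths even though they cannot parse the inner headers.
    """
    net = ipaddress.ip_network(prefix)
    data = repr((five_tuple, salt)).encode()
    value = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return str(net.network_address + (value % net.num_addresses))


if __name__ == "__main__":
    flow = ("20.0.0.7", "20.0.0.55", 6, 40000, 80)  # src, dst, protocol, sport, dport
    print(flow_hash_source_ip(flow))
    # Changing the salt models re-hashing after a congestion event.
    print(flow_hash_source_ip(flow, salt=1))
```

Keeping the salt fixed keeps a flow on one path, which avoids packet reordering; bumping it moves the flow to a (most likely) different path.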
The VL2 Architecture
Discussion
- Cost & Scale
• the VL2 topology can scale to create networks with no oversubscription.
• switches with 144 ports (D = 144) are available today for $150K.
• switches with 24 ports (D = 24) are available today for $8K.
• Building a conventional network with no oversubscription would cost roughly 14× the
cost of an equivalent VL2 network with no oversubscription.
SEATTLE
Changhoon Kim and 2 other persons
(Princeton University)
The SEATTLE Architecture
Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises.
- In SIGCOMM, 2008.
Flat addressing of end-hosts
- Switches use hosts’ MAC addresses for routing
- Ensures zero-configuration and backwards-compatibility
Automated host discovery at the edge
- Switches detect the arrival/departure of hosts
- Obviates flooding and ensures scalability
Hash-based on-demand resolution
- Hash deterministically maps a host to a switch
- Switches resolve end-hosts’ location and address via hashing
- Ensures scalability
Shortest-path forwarding between switches
- Switches run link-state routing to maintain only switch-level topology (i.e., do
not disseminate end-host information)
- Ensures data-plane efficiency
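Hash-based resolution can be sketched with a consistent-hash ring over the switches. This is a simplified illustration, not the SEATTLE implementation: the class names, the use of SHA-256, and the in-memory tables are assumptions. The point is that every switch can compute, without flooding, which switch stores the location of a given host MAC.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ResolverRing:
    """Consistent-hash ring: deterministically maps a host MAC to a resolver switch."""

    def __init__(self, switch_ids):
        self._ring = sorted((_h(s), s) for s in switch_ids)
        self._points = [point for point, _ in self._ring]

    def resolver_for(self, host_mac: str) -> str:
        # First switch clockwise from the host's hash, wrapping around the ring.
        i = bisect.bisect_right(self._points, _h(host_mac)) % len(self._ring)
        return self._ring[i][1]


class Fabric:
    """Toy network: each switch holds the location entries it is responsible for."""

    def __init__(self, switch_ids):
        self.ring = ResolverRing(switch_ids)
        self.tables = {s: {} for s in switch_ids}

    def host_arrived(self, mac: str, edge_switch: str) -> None:
        # The edge switch that detects the host publishes (MAC -> location).
        self.tables[self.ring.resolver_for(mac)][mac] = edge_switch

    def locate(self, mac: str):
        # Any switch can find the resolver by hashing; no broadcast is needed.
        return self.tables[self.ring.resolver_for(mac)].get(mac)


if __name__ == "__main__":
    fabric = Fabric([f"sw-{i}" for i in range(8)])  # hypothetical switch names
    fabric.host_arrived("00:1b:21:aa:bb:cc", edge_switch="sw-3")
    print(fabric.locate("00:1b:21:aa:bb:cc"))
```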
The SEATTLE Architecture
Packet forwarding & Lookup
The SEATTLE Architecture
Packet forwarding & Lookup
PortLand
R.N. Mysore and 7 other persons
(Univ. of California San Diego)
The PortLand Architecture
Add a new host
Transfer a packet
Key features
- Layer 2 protocol based on a tree topology
- The PMAC (pseudo MAC) encodes the position information
- Data forwarding proceeds based on the PMAC
- The edge switch is responsible for mapping between
PMAC and AMAC (actual MAC) by rewriting
- The fabric manager is responsible for address resolution
- The edge switch makes the PMAC invisible to the end host
- Each switch node can identify its position by itself
- The fabric manager keeps information on the overall topology.
When a fault occurs, it notifies the affected nodes.
- PMAC (48 bits): pod(16).position(8).port(8).vmid(16)
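The PMAC layout can be made concrete with a pair of helpers. This is a minimal sketch that follows the 16/8/8/16-bit split given above; the function names and the colon-separated output format are illustrative choices.

```python
def encode_pmac(pod: int, position: int, port: int, vmid: int) -> str:
    """Pack pod(16).position(8).port(8).vmid(16) into a 48-bit MAC string."""
    for name, value, bits in (("pod", pod, 16), ("position", position, 8),
                              ("port", port, 8), ("vmid", vmid, 16)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{b:02x}" for b in value.to_bytes(6, "big"))


def decode_pmac(pmac: str) -> tuple:
    """Recover (pod, position, port, vmid) from a PMAC string."""
    value = int(pmac.replace(":", ""), 16)
    return ((value >> 32) & 0xFFFF, (value >> 24) & 0xFF,
            (value >> 16) & 0xFF, value & 0xFFFF)


if __name__ == "__main__":
    pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
    print(pmac)               # 00:02:01:00:00:01
    print(decode_pmac(pmac))  # (2, 1, 0, 1)
```

Because the pod and position occupy the leading bits, a switch can forward on a prefix of the PMAC instead of keeping one entry per host.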
TRILL (RFC 5556)
Radia Perlman
(Sun Microsystems)
The TRILL
TRILL: Transparent Interconnection of Lots of Links
- TRILL is a new standard protocol to perform Layer 2 bridging with IS-IS link state routing
technology.
A simple idea
- Encapsulate native frames in a transport header providing a hop count.
- Route the encapsulated frames using IS-IS.
- Decapsulate the native frame before delivery.
Definitions
- RBridge - Routing Bridge
• A device which implements TRILL
- RBridge Campus
• A network of RBridges, links, and any intervening bridges, bounded by end stations and
Layer 3 routers.
The TRILL
Encapsulation & Header
TRILL Header – 64 bits
Nicknames - auto-configured 16-bit campus local names for RBridges
V = Version (2 bits)
R = Reserved (2 bits)
M = Multi-Destination (1 bit)
OpLng = Length of TRILL Options (5 bits)
Hop = Hop Limit (6 bits)
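The field sizes can be checked by packing them into bytes. The sketch below is an illustration under assumptions not stated on the slide: the field order (V, R, M, OpLng, Hop, then egress and ingress nicknames), the 5-bit OpLng width, and the TRILL Ethertype 0x22F3 follow the TRILL base protocol specification (RFC 6325), and the 64-bit total is read here as Ethertype plus the three 16-bit words that follow it.

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # assumption: value assigned to TRILL in RFC 6325


def pack_trill(version: int, multi_dest: bool, op_len: int, hop_limit: int,
               egress_nickname: int, ingress_nickname: int) -> bytes:
    """Pack Ethertype + V(2) R(2) M(1) OpLng(5) Hop(6) + two 16-bit nicknames."""
    flags = ((version & 0x3) << 14      # V: version
             | (0 & 0x3) << 12          # R: reserved, sent as zero
             | (int(multi_dest) & 0x1) << 11
             | (op_len & 0x1F) << 6
             | (hop_limit & 0x3F))
    return struct.pack("!HHHH", TRILL_ETHERTYPE, flags,
                       egress_nickname, ingress_nickname)


def unpack_trill(data: bytes) -> dict:
    """Inverse of pack_trill for the first 8 bytes."""
    ethertype, flags, egress, ingress = struct.unpack("!HHHH", data[:8])
    return {
        "ethertype": ethertype,
        "version": (flags >> 14) & 0x3,
        "multi_dest": bool((flags >> 11) & 0x1),
        "op_len": (flags >> 6) & 0x1F,
        "hop_limit": flags & 0x3F,
        "egress_nickname": egress,
        "ingress_nickname": ingress,
    }


if __name__ == "__main__":
    header = pack_trill(0, False, 0, 32, egress_nickname=0x1234, ingress_nickname=0x5678)
    print(header.hex(), len(header) * 8, "bits")
```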
The TRILL
Packet Routing
- ESADI (End Station Address Distribution Information protocol)
Related Works & Summary
Related Works
OpenFlow
- Shares idea of simple switches controlled by external SW
- Monsoon & VL2 are philosophies for how to use the switches
Brocade: Brocade One (TRILL, Clos Net, DCB)
Cisco: FabricPath (TRILL)
Juniper: QFabric (HW & FC)
Summary
Comparison of Data Center Network Architectures
Architectures: Monsoon; VL2; SEATTLE; Fat-Tree; PortLand; SPAIN; MOOSE; TRILL; DCell; BCube; MDCube
Org.: MS Research (Monsoon, VL2); Univ. of Princeton (SEATTLE); Univ. of California San Diego (Fat-Tree, PortLand); HP (SPAIN); Univ. of Cambridge (MOOSE); MS Research Asia (DCell, BCube, MDCube)
Publishing: SIGCOMM 2008; SIGCOMM 2009; SIGCOMM 2008; SIGCOMM 2008; SIGCOMM 2009; NSDI 2010; DC CAVES Workshop 2009; RFC 5556 2009; SIGCOMM 2008; SIGCOMM 2009; CoNEXT 2009
Authors: Albert Greenberg…; Albert Greenberg, Changhoon Kim…; Changhoon Kim…; M. Al-Fares…; R.N. Mysore…; J. Mudigonda, M. Al-Fares…; M. Scott…; Radia Perlman; C. Guo…; C. Guo…; H. Wu, C. Guo…
Topology: Clos Network; Clos Network; N/A; Fat-Tree; Fat-Tree; N/A; N/A; N/A; BCube Topology
Packetizing: MAC-in-MAC (802.1ah PBB); IP-in-IP; IP-in-IP(?); IP rewriting; MAC rewriting (PMAC); MAC rewriting; TRILL Hdr
Load Spreading: MAC-Rotation; ECMP; ECMP; ECMP; ECMP
Multi-path: O; O; X; O; O; O; X; O
Mod. of End-Host?: O; O; X; X; X; O; X; X; O
Mod. of switches?: O; X; O; O (Special HW); O (Special HW); X; O (RBridge); △
ARP: Directory Server; Directory Server; DHT on the switches; Fabric Manager; ESADI
Traffic Engineering is …
Thank you.

Data center network architectures v1.3

Editor's Notes

  • #20 RSM : Replication Server Manager