15: Datacenter Design and Networking
Zubair Nabi
zubair.nabi@itu.edu.pk
April 21, 2013
Zubair Nabi 15: Datacenter Design and Networking April 21, 2013 1 / 27
Outline
1 Datacenter Topologies
2 Transport Protocols
3 Network Sharing
4 Wrapping Up
1 Datacenter Topologies
Introduction
Datacenters are traditionally designed in the form of a 2/3-level tree
Switching elements become more specialized and faster when we go
up the tree structure
A three-level tree has a core switch at the root, aggregation switches
in the middle, and edge switches at the leaves of the tree
Edge switches have a large number of 1Gbps ports and a small
number of 10Gbps ports
The 1Gbps ports connect end-hosts while 10Gbps ports connect to
aggregation switches
Aggregation and core switches have 10Gbps ports
The network can become partitioned if switches higher up the tree fail
Oversubscription
Ideal value of 1:1 – All hosts may potentially communicate with others
at full bandwidth of their interface
5:1 – Only 20% of the host bandwidth is available (200Mbps on a 1Gbps interface)
Typical datacenter designs are oversubscribed by a factor of 2.5:1
(400Mbps) to 8:1 (125Mbps)
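The figures above follow from dividing the host link speed by the oversubscription ratio. A minimal sketch, assuming 1Gbps host interfaces as on the edge switches described earlier (the function name is illustrative):

```python
def per_host_bandwidth_mbps(ratio, link_mbps=1000):
    """Worst-case bandwidth per host when the network is oversubscribed ratio:1.

    link_mbps is the host interface speed; 1000 (1Gbps) is an assumption
    that matches the edge-switch ports described earlier.
    """
    if ratio < 1:
        raise ValueError("oversubscription ratio must be at least 1")
    return link_mbps / ratio


if __name__ == "__main__":
    for ratio in (1, 2.5, 5, 8):
        print(f"{ratio}:1 -> {per_host_bandwidth_mbps(ratio):.0f} Mbps")
```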
Fat-tree Topology
k-ary fat-tree has k pods
Each pod contains two layers of k/2 switches
Each k-port switch in the lower layer is directly connected to k/2 hosts
Each of the remaining k/2 ports is connected to k/2 of the k ports of the
aggregation switches
(k/2)^2 core switches
Each core switch has one port connected to each of the k pods
The ith port of any core switch is connected to pod i
A k-ary fat-tree supports k^3/4 hosts
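The counts on this slide can be derived from k alone. A small sketch, assuming k is a positive even number so that k/2 is an integer (the function and key names are illustrative):

```python
def fat_tree_sizes(k):
    """Element counts for a k-ary fat-tree built from k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        # Each pod has two layers of k/2 switches.
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half ** 2,
        # k pods * k/2 edge switches per pod * k/2 hosts per edge switch = k^3/4.
        "hosts": k * half * half,
    }


if __name__ == "__main__":
    for k in (4, 48):
        print(k, fat_tree_sizes(k))
```

With k = 48, for example, this gives 27,648 hosts and 576 core switches.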
DCell
Uses a recursively defined structure to interconnect servers
Each server connects to different levels of DCells through multiple links
High-level DCells are built recursively from many low-level ones
Fault tolerant as there is no single point of failure
Structure
Uses servers with multiple network ports and mini-switches to
construct its recursive structure
DCell_0 is the building block to construct larger DCells
Consists of n servers and a mini-switch
High-level DCells are built recursively from many low-level ones
DCell_1 is constructed using n + 1 DCell_0s
The same applies to DCell_k, which is built from t + 1 DCell_(k-1)s,
where t is the number of servers in one DCell_(k-1)
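The recursive construction implies very fast growth in the number of servers. A minimal sketch, assuming the recurrence from the DCell paper in which a level-k DCell is built from t + 1 level-(k-1) DCells, with t the server count of one level-(k-1) DCell (the function name is illustrative):

```python
def dcell_servers(n, k):
    """Number of servers in a level-k DCell whose DCell_0 holds n servers."""
    if n < 1 or k < 0:
        raise ValueError("n must be >= 1 and k must be >= 0")
    servers = n  # a DCell_0 is n servers attached to one mini-switch
    for _ in range(k):
        # The next level combines (servers + 1) copies of the current level.
        servers = servers * (servers + 1)
    return servers


if __name__ == "__main__":
    for level in range(4):
        print(f"n=4, level {level}: {dcell_servers(4, level)} servers")
```

With n = 4, three levels of recursion already reach 176,820 servers.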
2 Transport Protocols
TCP and UDP
TCP: Connection-oriented with reliability, ordering, and congestion
control
UDP: Connectionless with no ordering, reliability, or congestion control
TCP and Datacenter Networks
Communication between different nodes is thought of as just opening a
TCP connection between them
Common sockets API
But TCP was designed for a wide-area network
Clearly, a datacenter is not a wide-area network
Significantly different bandwidth-delay product, round-trip time (RTT),
and retransmission timeout (RTO)
For example, due to the low RTT, the congestion window for each flow
is very small
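As a result, fast retransmit often cannot recover lost packets because
too few segments are in flight to generate three duplicate ACKs; flows
wait for a retransmission timeout instead, leading to poor net throughput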
More problems for TCP
In production data centers, due to the widely-varying mix of
applications, congestion in the network can last from tens to hundreds
of seconds
In commodity switches the buffer pool is shared by all interfaces
If long flows hog the memory, queues can build up for the short flows
Many-to-one communication patterns can lead to TCP throughput
collapse, also known as incast
This can cause overall application throughput to decrease by up to 90%
In virtualized environments, the time sharing of resources increases
the latency faced by the VMs
This latency can be orders of magnitude higher than the RTT between
hosts inside a datacenter, leading to slow progress of TCP connections
Reaction
Some large-scale deployments have abandoned TCP altogether
For instance, Facebook now uses a custom UDP transport
TCP might be a “kitchen-sink” solution but it is sub-optimal in a
datacenter environment
Over the years, a number of alternatives have been proposed
Datacenter TCP (DCTCP)
Uses Explicit Congestion Notifications (ECN) from switches to perform
active queue management-based congestion control
Switches set the congestion experienced flag in packets whenever the
buffer occupancy exceeds a small threshold
DCTCP uses this information to reduce the size of the window in
proportion to the fraction of marked packets
This enables it to react quickly to queue build-up and avoid buffer pressure
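The window reduction can be sketched as follows. This assumes the update rule from the DCTCP paper: the sender keeps a running estimate alpha of the fraction of marked packets and, once per window, cuts cwnd by a factor of alpha/2. The function name and the g = 1/16 default are illustrative choices, and window growth (the usual TCP increase) is left out of the sketch.

```python
def dctcp_update(cwnd, alpha, marked, total, g=1 / 16):
    """One per-window DCTCP update; returns the new (cwnd, alpha).

    marked / total is the fraction of ACKed packets in the last window
    that carried a congestion mark. g weights new samples in the moving
    average (an assumed default).
    """
    fraction = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        # Mild congestion (small alpha) gives a small cut; alpha = 1
        # halves the window, like standard TCP.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return cwnd, alpha


if __name__ == "__main__":
    cwnd, alpha = 100.0, 0.0
    for marked in (0, 2, 5, 10):
        cwnd, alpha = dctcp_update(cwnd, alpha, marked, total=10)
        print(f"marked={marked}/10 alpha={alpha:.3f} cwnd={cwnd:.1f}")
```

The proportional cut is the design point: a flow that sees only a few marks backs off slightly instead of halving its window.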
Multipath TCP (MPTCP)
Establishes multiple subflows over different paths between a pair of
end-hosts
These subflows operate under a single TCP connection
Each subflow's fraction of the total congestion window is determined
by its speed
Moves traffic away from the most congested paths
tcpcrypt
Backwards compatible enhancement to TCP that aims to efficiently
and transparently provide encrypted communication to applications
Uses a custom key exchange protocol that leverages the TCP options
field
Like SSL, to reduce the cost of connection setup for short-lived flows, it
enables cryptographic state from one TCP connection to bootstrap
subsequent ones
Applications can also be made aware of the presence of tcpcrypt to
avoid redundant encryption
Deadline-Driven Delivery (D^3)
Targets applications with distributed workflow and latency targets
Such applications associate a deadline with each network flow and the
flow is only useful if the deadline is met
Applications expose flow deadline and size information which is
exploited by end hosts to request rates from routers along the data path
3 Network Sharing
Introduction
Network resources are shared amongst the tenants, which can lead to
contention and other undesired behaviour
Network performance isolation between tenants can be an important
tool for:
Minimizing disruption from legitimate tenants that run network-intensive
workloads
Protecting against malicious tenants that launch DoS attacks
The standard methodology to ensure isolation is to use VLANs
Virtual LAN
Acts like an ordinary LAN but end-hosts do not necessarily have to be
physically connected to the same segment
Nodes are grouped together by the VLAN
Broadcasts can also be sent within the same VLAN
VLAN membership information is inserted into Ethernet frames
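One common way membership is carried is the IEEE 802.1Q tag: four bytes inserted after the source MAC address, holding a type identifier (0x8100), a 3-bit priority, a 1-bit drop-eligible flag, and a 12-bit VLAN ID. The slide does not name the tagging scheme, so treating it as 802.1Q is an assumption, and the helper names below are illustrative.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def add_vlan_tag(frame, vlan_id, priority=0):
    """Return a copy of an untagged Ethernet frame with an 802.1Q tag inserted."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Tag control information: priority (3 bits), DEI (1 bit, left 0), VLAN ID (12 bits).
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # The tag sits between the source MAC (bytes 6..11) and the original EtherType.
    return frame[:12] + tag + frame[12:]


def vlan_id_of(frame):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None


if __name__ == "__main__":
    untagged = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
    tagged = add_vlan_tag(untagged, vlan_id=100, priority=3)
    print(tagged.hex(), vlan_id_of(tagged))
```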
Rate-limiting End-hosts
In Xen the network bandwidth available to each domU can be rate
limited
Can be used to implement basic QoS
The virtual interface is simply rate-limited
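To illustrate what rate-limiting an interface means, here is a token-bucket sketch. It is a generic technique assumed for illustration, not a description of how Xen implements its limit; the class name, parameters, and the explicit `now` argument are all hypothetical.

```python
class TokenBucket:
    """Allow traffic at a sustained rate with a bounded burst."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0

    def allow(self, nbytes, now):
        """Return True if a packet of nbytes may be sent at time now (seconds)."""
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # over the limit: the caller would queue or drop the packet


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
    for t in (0.0, 0.5, 1.5):
        print(t, bucket.allow(1500, now=t))
```

Passing the clock in as `now` keeps the sketch deterministic; a real limiter would read a monotonic clock.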
4 Wrapping Up
The End
In reverse order:
1 Cloud stacks can be used to turn clusters and datacenters into private and
public clouds
2 Virtualization of computation, storage, and networking can allow many
tenants to co-exist
3 Most data does not fit the relational model and is more suited for
NoSQL stores
4 Data-intensive, task-parallel frameworks abstract away the details of
distribution, work allocation, synchronization, concurrency, and
communication; a perfect match for the cloud
5 The future is Big Data and Cloud Computing!
References
1 Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A
scalable, commodity data center network architecture. In Proceedings
of the ACM SIGCOMM 2008 conference on Data communication
(SIGCOMM ’08). ACM, New York, NY, USA, 63-74.
2 Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, and
Songwu Lu. 2008. Dcell: a scalable and fault-tolerant network
structure for data centers. In Proceedings of the ACM SIGCOMM 2008
conference on Data communication (SIGCOMM ’08). ACM, New York,
NY, USA, 75-86.
Zubair Nabi 15: Datacenter Design and Networking April 21, 2013 27 / 27

More Related Content

Viewers also liked

AOS Lab 8: Interrupts and Device Drivers
AOS Lab 8: Interrupts and Device DriversAOS Lab 8: Interrupts and Device Drivers
AOS Lab 8: Interrupts and Device DriversZubair Nabi
 
AOS Lab 5: System calls
AOS Lab 5: System callsAOS Lab 5: System calls
AOS Lab 5: System callsZubair Nabi
 
AOS Lab 6: Scheduling
AOS Lab 6: SchedulingAOS Lab 6: Scheduling
AOS Lab 6: SchedulingZubair Nabi
 
AOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocksAOS Lab 9: File system -- Of buffers, logs, and blocks
AOS Lab 9: File system -- Of buffers, logs, and blocksZubair Nabi
 
AOS Lab 2: Hello, xv6!
AOS Lab 2: Hello, xv6!AOS Lab 2: Hello, xv6!
AOS Lab 2: Hello, xv6!Zubair Nabi
 
Topic 14: Operating Systems and Virtualization
Topic 14: Operating Systems and VirtualizationTopic 14: Operating Systems and Virtualization
Topic 14: Operating Systems and Virtualization
Zubair Nabi
 
AOS Lab 12: Network Communication
AOS Lab 12: Network CommunicationAOS Lab 12: Network Communication
AOS Lab 12: Network CommunicationZubair Nabi
 
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
COMPUTEX TAIPEI
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
Paco Nathan
 
Data center Building & General Specification
Data center Building & General Specification Data center Building & General Specification
Data center Building & General Specification
Ali Mirfallah
 
Network soft layer(20141222-2)
Network soft layer(20141222-2)Network soft layer(20141222-2)
Network soft layer(20141222-2)
Yasuhiro Arai
 
Cisco at VMworld 2015 - Cisco UCS as the Foundation for Software-Defined Data...
Cisco at VMworld 2015 - Cisco UCS as the Foundation for Software-Defined Data...Cisco at VMworld 2015 - Cisco UCS as the Foundation for Software-Defined Data...
Cisco at VMworld 2015 - Cisco UCS as the Foundation for Software-Defined Data...
ldangelo0772
 
Datacenter event - green it amsterdam - maikel bouricius - 15-09-14
Datacenter event -  green it amsterdam - maikel bouricius - 15-09-14Datacenter event -  green it amsterdam - maikel bouricius - 15-09-14
Datacenter event - green it amsterdam - maikel bouricius - 15-09-14
Karim Network
 
OpenStack in Action 4! Jean-Louis Lezaun - Re-architecturing the datacenter :...
OpenStack in Action 4! Jean-Louis Lezaun - Re-architecturing the datacenter :...OpenStack in Action 4! Jean-Louis Lezaun - Re-architecturing the datacenter :...
OpenStack in Action 4! Jean-Louis Lezaun - Re-architecturing the datacenter :...
eNovance
 
Datacenter Revolution Dean Nelson, Sun
Datacenter  Revolution    Dean  Nelson,  SunDatacenter  Revolution    Dean  Nelson,  Sun
Datacenter Revolution Dean Nelson, SunNiklas Johnsson
 
Datacenter Design - DP Air
Datacenter Design - DP AirDatacenter Design - DP Air
Datacenter Design - DP Air
dpsir
 
TECHNOLOGY ACCELERATING INFRASTRUCTURE DEVELOPMENT FOR ATTAINING THE NIGERIAN...
TECHNOLOGY ACCELERATING INFRASTRUCTURE DEVELOPMENT FOR ATTAINING THE NIGERIAN...TECHNOLOGY ACCELERATING INFRASTRUCTURE DEVELOPMENT FOR ATTAINING THE NIGERIAN...
TECHNOLOGY ACCELERATING INFRASTRUCTURE DEVELOPMENT FOR ATTAINING THE NIGERIAN...
itnewsafrica
 
Welcome to Hybrid Cloud Innovation Tour 2016
Welcome to Hybrid Cloud Innovation Tour 2016Welcome to Hybrid Cloud Innovation Tour 2016
Welcome to Hybrid Cloud Innovation Tour 2016
LaurenWendler
 
Network Repairs
Network RepairsNetwork Repairs
Network Repairs
Networkrepairs
 

Topic 15: Datacenter Design and Networking

  • 1. 15: Datacenter Design and Networking Zubair Nabi zubair.nabi@itu.edu.pk April 21, 2013 Zubair Nabi 15: Datacenter Design and Networking April 21, 2013 1 / 27
  • 2. Outline
      1. Datacenter Topologies
      2. Transport Protocols
      3. Network Sharing
      4. Wrapping Up
  • 4. Introduction
      • Datacenters are traditionally designed as a two- or three-level tree
      • Switching elements become more specialized and faster toward the top of the tree
      • A three-level tree has a core switch at the root, aggregation switches in the middle, and edge switches at the leaves
      • Edge switches have a large number of 1Gbps ports and a small number of 10Gbps ports
      • The 1Gbps ports connect to end-hosts, while the 10Gbps ports connect to aggregation switches
      • Aggregation and core switches have 10Gbps ports
      • The network can become partitioned if switches higher up the tree go down
  • 12. Oversubscription
      • Ideal value of 1:1: all hosts may potentially communicate with any other host at the full bandwidth of their network interface
      • 5:1: only 20% of the host bandwidth is available (200Mbps on a 1Gbps interface)
      • Typical datacenter designs are oversubscribed by a factor of 2.5:1 (400Mbps) to 8:1 (125Mbps)
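The oversubscription figures above follow from dividing the host link speed by the oversubscription ratio. A minimal Python sketch of that arithmetic is given here; the function name `per_host_bandwidth_mbps` and the 1Gbps (1000Mbps) host link are illustrative assumptions, the latter chosen because it matches the 200Mbps, 400Mbps and 125Mbps values quoted on the slide.

```python
def per_host_bandwidth_mbps(link_mbps: float, oversubscription: float) -> float:
    """Worst-case bandwidth available to each host, in Mbps.

    link_mbps        -- speed of the host's network interface (assumed 1000 here)
    oversubscription -- the N in an N:1 oversubscription ratio
    """
    if oversubscription < 1:
        raise ValueError("an oversubscription ratio cannot be below 1:1")
    return link_mbps / oversubscription


if __name__ == "__main__":
    # Ratios taken from the slide: ideal, typical low end, example, typical high end.
    for ratio in (1, 2.5, 5, 8):
        mbps = per_host_bandwidth_mbps(1000, ratio)
        print(f"{ratio}:1 -> {mbps:.0f} Mbps per host")
```

This is a worst-case figure: it applies when every host sends across the oversubscribed layer at the same time, and hosts can still reach full interface speed when the upper layers are lightly loaded.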
  • 22. Fat-tree Topology
      • A k-ary fat-tree has k pods
      • Each pod contains two layers of k/2 switches
      • Each k-port switch in the lower layer is directly connected to k/2 hosts
      • Each of the remaining k/2 ports is connected to k/2 of the k ports of the aggregation switches
      • There are (k/2)^2 core switches
      • Each core switch has one port connected to each of the k pods
      • The i-th port of any core switch is connected to pod i
      • A k-ary fat-tree supports k^3/4 hosts
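The counts on this slide can be checked with a few lines of arithmetic. The sketch below derives the switch and host totals from k alone; the function and key names are illustrative and not taken from the slides.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts of a k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number of ports, at least 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 lower-layer switches per pod
        "aggregation_switches": k * half,  # k/2 upper-layer switches per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k * half * half,          # k pods * k/2 switches * k/2 hosts = k^3/4
    }


print(fat_tree_sizes(4))
print(fat_tree_sizes(48)["hosts"])
```

With 48-port switches, for example, the host count is 48^3/4 = 27,648.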
  • 27. DCell
      • Uses a recursively defined structure to interconnect servers
      • Each server connects to different levels of DCells through multiple links
      • High-level DCells are built recursively from many low-level ones
      • Fault tolerant, as there is no single point of failure
  • 33. Structure
      • DCell uses servers with multiple network ports and mini-switches to construct its recursive structure
      • DCell_0 is the building block for larger DCells
      • A DCell_0 consists of n servers and a mini-switch
      • High-level DCells are built recursively from many low-level ones
      • A DCell_1 is constructed using n + 1 DCell_0s
      • The same construction applies at every level: a DCell_k is built from DCell_(k-1)s
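To see how quickly the recursive construction grows, the sketch below counts servers per level. It assumes the recurrence from the DCell paper (reference 2), which the slide states only for DCell_1: a DCell_k is built from t + 1 DCell_(k-1)s, where t is the number of servers in one DCell_(k-1). The function name is illustrative.

```python
def dcell_servers(n: int, k: int) -> int:
    """Number of servers in a DCell_k whose DCell_0 holds n servers."""
    if n < 1 or k < 0:
        raise ValueError("need n >= 1 and k >= 0")
    servers = n  # a DCell_0 is n servers on one mini-switch
    for _ in range(k):
        # Assumed recurrence: (servers + 1) copies of the previous level.
        servers = servers * (servers + 1)
    return servers


for level in range(4):
    print(level, dcell_servers(4, level))
```

With n = 4, the counts are 4, 20, 420, and 176,820 servers for levels 0 to 3, so a handful of levels already covers a very large datacenter.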
  • 35. Outline: 1 Datacenter Topologies, 2 Transport Protocols, 3 Network Sharing, 4 Wrapping Up
  • 37. TCP and UDP
      • TCP: connection-oriented, with reliability, ordering, and congestion control
      • UDP: connectionless, with no ordering, reliability, or congestion control
  • 44. TCP and Datacenter Networks
      • Communication between nodes is usually thought of as simply opening a TCP connection between them
      • Common sockets API
      • But TCP was designed for a wide-area network
      • Clearly, a datacenter is not a wide-area network
      • Significantly different bandwidth-delay product, round-trip time (RTT), and retransmission timeout (RTO)
      • For example, due to the low RTT, the congestion window for each flow is very small
      • As a result, flow recovery through TCP fast retransmit is often impossible, leading to poor net throughput
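The link between a low RTT and a failed fast retransmit can be made concrete. Fast retransmit needs three duplicate ACKs, so at least three segments must be in flight behind the lost one. The sketch below uses assumed, illustrative numbers (a 1 Gbps link, a 100 µs datacenter RTT, a 100 ms wide-area RTT, 1460-byte segments); none of these values come from the slides.

```python
DUP_ACKS_FOR_FAST_RETRANSMIT = 3  # standard TCP duplicate-ACK threshold


def bdp_segments(bandwidth_bps: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """Bandwidth-delay product expressed in full-sized segments."""
    return bandwidth_bps * rtt_s / 8 / mss_bytes


def can_fast_retransmit(cwnd_segments: int, lost_index: int = 0) -> bool:
    """True if enough segments follow the lost one to produce three duplicate ACKs."""
    segments_after_loss = cwnd_segments - 1 - lost_index
    return segments_after_loss >= DUP_ACKS_FOR_FAST_RETRANSMIT


# Assumed example values, for illustration only.
print(f"wide-area BDP:  {bdp_segments(1e9, 100e-3):.0f} segments")
print(f"datacenter BDP: {bdp_segments(1e9, 100e-6):.1f} segments")
print(can_fast_retransmit(2))  # window of 2 segments: the sender must wait for the RTO
print(can_fast_retransmit(4))  # window of 4 segments: three duplicate ACKs are possible
```

Under these assumptions the whole datacenter path holds fewer than nine segments, and that budget is split across every flow sharing the link, so a per-flow window of two or three segments is plausible and a single loss falls back to the much slower retransmission timeout.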
  • 51. More problems for TCP
      • In production datacenters, due to the widely varying mix of applications, congestion in the network can last from tens to hundreds of seconds
      • In commodity switches the buffer pool is shared by all interfaces
      • If long flows hog the memory, queues can build up for the short flows
      • Many-to-one communication patterns can lead to TCP throughput collapse, also known as incast
      • This can cause overall application throughput to decrease by up to 90%
      • In virtualized environments, the time sharing of resources increases the latency faced by the VMs
      • This latency can be orders of magnitude higher than the RTT between hosts inside a datacenter, leading to slow progress of TCP connections
  • 55. Reaction
      • Some large-scale deployments have abandoned TCP altogether
      • For instance, Facebook now uses a custom UDP transport
      • TCP might be a “kitchen-sink” solution, but it is sub-optimal in a datacenter environment
      • Over the years, a number of alternatives have been proposed
  • 59. Datacenter TCP (DCTCP)
      • Uses Explicit Congestion Notification (ECN) marks from switches to perform active queue management-based congestion control
      • Switches set the Congestion Experienced flag in packets whenever the buffer occupancy exceeds a small threshold
      • DCTCP uses this information to reduce the window in proportion to the fraction of marked packets
      • This enables it to react quickly to queue build-up and avoid buffer pressure
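The window reduction can be sketched as follows. The update rule and the gain g = 1/16 are taken from the DCTCP paper (Alizadeh et al., SIGCOMM 2010), not from these slides, and the function and variable names are illustrative. Window growth, which follows ordinary TCP, is left out.

```python
def dctcp_update(alpha: float, cwnd: float, marked: int, total: int,
                 g: float = 1 / 16) -> tuple:
    """Apply one window's worth of ECN feedback to the sender state.

    alpha is the running estimate of the fraction of marked packets;
    cwnd is the congestion window in segments.
    """
    fraction = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * fraction  # moving average of the marked fraction
    if marked:
        # Cut in proportion to congestion: alpha = 1 halves the window like
        # standard TCP, while a small alpha trims it only slightly.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd


print(dctcp_update(alpha=1.0, cwnd=10.0, marked=10, total=10))
print(dctcp_update(alpha=0.0, cwnd=10.0, marked=1, total=10))
```

The second call shows the design point: a single marked packet in ten shrinks the window by well under 1%, where standard TCP would have halved it.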
  • 63. Multipath TCP (MPTCP)
      • Establishes multiple subflows over different paths between a pair of end-hosts
      • These subflows operate under a single TCP connection
      • The fraction of the total congestion window given to each subflow is determined by its speed
      • Moves traffic away from the most congested paths
  • 67. tcpcrypt
      • Backwards-compatible enhancement to TCP that aims to efficiently and transparently provide encrypted communication to applications
      • Uses a custom key exchange protocol that leverages the TCP options field
      • Like SSL, it lets cryptographic state from one TCP connection bootstrap subsequent ones, reducing the cost of connection setup for short-lived flows
      • Applications can also be made aware of the presence of tcpcrypt to avoid redundant encryption
  • 70. Deadline-Driven Delivery (D^3)
      • Targets applications with distributed workflows and latency targets
      • Such applications associate a deadline with each network flow, and the flow is only useful if the deadline is met
      • Applications expose flow deadline and size information, which end hosts exploit to request rates from routers along the data path
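The rate an end host asks for follows directly from the two values the application exposes. A minimal sketch, assuming the request is the remaining flow size divided by the time left until the deadline, as described in the D^3 paper; the function name is hypothetical and router-side allocation is not modelled.

```python
def requested_rate_bps(remaining_bytes: int, time_to_deadline_s: float) -> float:
    """Rate a sender would request so the flow finishes exactly at its deadline."""
    if time_to_deadline_s <= 0:
        # The deadline has passed, so the flow is no longer useful to the application.
        return 0.0
    return remaining_bytes * 8 / time_to_deadline_s


# Assumed example: 125,000 bytes left with half a second to go.
print(requested_rate_bps(125_000, 0.5))
```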
  • 71. Outline: 1 Datacenter Topologies, 2 Transport Protocols, 3 Network Sharing, 4 Wrapping Up
  • 75. Introduction
      • Network resources are shared amongst the tenants, which can lead to contention and other undesired behaviour
      • Network performance isolation between tenants can be an important tool for:
          • Minimizing disruption from legitimate tenants that run network-intensive workloads
          • Protecting against malicious tenants that launch DoS attacks
      • The standard methodology to ensure isolation is to use VLANs
  • 79. Virtual LAN
      • Acts like an ordinary LAN, but end-hosts do not necessarily have to be physically connected to the same segment
      • Nodes are grouped together by the VLAN
      • Broadcasts can also be sent within the same VLAN
      • VLAN membership information is inserted into Ethernet frames
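The last bullet can be illustrated with IEEE 802.1Q tagging, the common way of carrying VLAN membership in a frame. The slides do not name the tagging standard, so treating it as 802.1Q is an assumption here, and the helper names are illustrative. The 4-byte tag sits between the source MAC address and the EtherType.

```python
import struct

TPID_8021Q = 0x8100  # Tag Protocol Identifier that marks an 802.1Q tag


def add_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert an 802.1Q tag after the destination and source MAC addresses."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs are 1..4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field")
    tci = (priority << 13) | vlan_id  # 3-bit priority, 1-bit DEI (0), 12-bit VLAN ID
    tag = struct.pack("!HH", TPID_8021Q, tci)
    return frame[:12] + tag + frame[12:]


def vlan_id_of(frame: bytes):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None


# Dummy frame: zeroed MAC addresses, IPv4 EtherType, placeholder payload.
untagged = bytes(6) + bytes(6) + b"\x08\x00" + b"payload"
tagged = add_vlan_tag(untagged, vlan_id=42)
print(vlan_id_of(tagged), vlan_id_of(untagged))
```

A switch that understands the tag forwards the frame, including broadcasts, only to ports that belong to the same VLAN ID.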
  • 82. Rate-limiting End-hosts
      • In Xen, the network bandwidth available to each domU can be rate-limited
      • Can be used to implement basic QoS
      • The virtual interface is simply rate-limited
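One common way to rate-limit an interface is a token bucket. The sketch below is a generic illustration of that idea and is not a description of how Xen implements its limit; the class name, parameters, and example values are all assumptions.

```python
class TokenBucket:
    """Admit packets as long as the sender stays within rate and burst limits."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0            # timestamp of the previous call, in seconds

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if the packet fits."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


# Assumed limits: 1000 bytes per second with a 1500-byte burst.
bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
print(bucket.allow(1500, now=0.0))  # the initial burst is admitted
print(bucket.allow(1500, now=0.5))  # only 500 bytes of credit have accumulated
print(bucket.allow(1500, now=1.5))  # enough credit again after one more second
```

Timestamps are passed in explicitly so the behaviour is deterministic; a real limiter would read a monotonic clock and queue or drop the rejected packets.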
  • 83. Outline: 1 Datacenter Topologies, 2 Transport Protocols, 3 Network Sharing, 4 Wrapping Up
  • 88. The End
      In reverse order:
      1 Cloud stacks can be used to turn clusters and datacenters into private and public clouds
      2 Virtualization of computation, storage, and networking can allow many tenants to co-exist
      3 Most data does not fit the relational model and is better suited to NoSQL stores
      4 Data-intensive, task-parallel frameworks abstract away the details of distribution, work allocation, synchronization, concurrency, and communication, which makes them a perfect match for the cloud
      5 The future is Big Data and Cloud Computing!
  • 89. References
      1 Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. 2008. A scalable, commodity data center network architecture. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication (SIGCOMM ’08). ACM, New York, NY, USA, 63-74.
      2 Chuanxiong Guo, Haitao Wu, Kun Tan, Lei Shi, Yongguang Zhang, and Songwu Lu. 2008. DCell: a scalable and fault-tolerant network structure for data centers. In Proceedings of the ACM SIGCOMM 2008 Conference on Data Communication (SIGCOMM ’08). ACM, New York, NY, USA, 75-86.