Assuming that the majority of in-cloud networking is Ethernet-based, at least at departure and entry points, it is widely recognized that TCP/UDP communications fail to achieve the necessary throughput during bulk transfers. While modern switches can reach their maximum achievable throughput in the cut-through mode of operation, the practical benefit of this mode diminishes when multiple communicating parties contend for the network. This research removes this problem by implementing circuits-over-packets emulation. Circuits are simply optimal schedules for communication sessions in which each session gets exclusive access to the network. Transfers of chunks of Big Data, pieces of storage, VM images, etc. all fall under the category of bulk transfers.
Circuit Emulation for Bulk Transfers in Distributed Storage and Clouds
1.
2. .
Setting the Mood
It's time to get rid of TCP/UDP protocols in DCs
DCs/Clouds are closed worlds, so brand new technologies are OK
with bulk transfers (BigData, ...), the business value of a TCP/UDP alternative is high
circuits are an alternative to packets
M.Zhanikeev -- maratishe@gmail.com -- Circuit Emulation for Bulk Transfers in Dist. Storage and Clouds -- http://bit.do/marat140903 2/32
3. .
Ethernet is the Best
Ethernet...
... is the cheapest and most available technology with e2e support
Fibre Channel (FC), SATA, etc. require expensive hardware and have low compatibility and no e2e support
FCoE = Ethernet, same problems: expensive hardware, no e2e support
network virtualization is the best fit for Ethernet
disclaimer: one of the proposed models will work with optical networks as well
4. .
Ethernet is the Worst
Ethernet...
... is the worst technology in terms of throughput
CSMA/CD is the biggest throughput limitation
not in modern switches, but carrier sensing (CSMA/CA) is still a major problem in wireless
the contention problem cannot be easily resolved
the same applies to OBS/OPS optical technologies
5. .
Ethernet Contention
6. .
Ethernet and Contention
whatever you do, Ethernet L2 domains cannot avoid contention
[Figure: two switch setups, labeled "Qualitatively Identical"]
7. .
Parallel vs Sequential (2 flows)
[Figure: plot with x-axis "Transfer time in contention (s)" and y-axis "Transfer time by exclusive circuits (s)", both ranging from 20 to 40]
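A back-of-the-envelope sketch of the parallel-versus-sequential comparison for two flows. It assumes an idealized link that is shared fairly among active flows; the `contention_penalty` parameter is a hypothetical knob (not from the slides) for the fraction of capacity lost while flows contend. The numbers are illustrative only and are not the lab measurements.

```python
def parallel_times(sizes, capacity, contention_penalty=0.0):
    """Completion times when all flows share the link at once (fair share).

    contention_penalty is a hypothetical fraction of capacity lost
    whenever more than one flow is active.
    """
    remaining = {i: float(v) for i, v in enumerate(sizes)}
    done, now = {}, 0.0
    while remaining:
        n = len(remaining)
        usable = capacity * (1.0 - contention_penalty) if n > 1 else capacity
        rate = usable / n                        # fair share per active flow
        first = min(remaining, key=remaining.get)
        dt = remaining[first] / rate             # time until the smallest flow ends
        now += dt
        for i in remaining:
            remaining[i] = max(0.0, remaining[i] - rate * dt)
        done[first] = now
        del remaining[first]
    return [done[i] for i in range(len(sizes))]


def sequential_times(sizes, capacity):
    """Completion times when flows run one after another at full rate."""
    now, out = 0.0, []
    for v in sizes:
        now += v / capacity
        out.append(now)
    return out


sizes = [10, 10]
par = parallel_times(sizes, capacity=1.0)
seq = sequential_times(sizes, capacity=1.0)
print(par, seq)
```

With two equal flows and no penalty, both parallel flows finish at t=20, while the sequential flows finish at t=10 and t=20. The mean completion time drops from 20 to 15 before any contention loss is counted; a non-zero penalty only widens the gap.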
8. .
Ethernet Switches : Basic Facts
cut-through versus store-and-forward
cut-through is 10..15x better
Cisco has advanced cut-through : +bytes versus routing decision tradeoff
store-and-forward is subject to QoS classes
L3 DSCP versus L2 CoS, AF, EF, BE, SBE models
9. .
Switches : Modeling
[Figure: switch model with QoS classes, checks, etc. Legend: Q = Queue, D = Drop, C = Cut Through]
10. .
Proposal
11. .
Proposal : Circuits
Circuits
... are emulations which allow exclusive access to an L2 domain by individual parties
circuits-over-packets emulation
cut-through mode for each circuit is guaranteed
highest possible throughput
NOTE: will work with the cheapest switches
NOTE2: applies to optical networks as well (L2=lightpaths)
12. .
Implementation : 2 cases
left: book-then-send, right: separate control layer
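[Figure, left: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: book session, Step 2: transfer bulk. Right: Storage Node A and Storage Node B connected through two SWITCHes, one for the booking segment and one for the bulk segment]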
13. .
Impl.: Centralized Case
[Figure: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: book session, Step 2: transfer bulk]
same network for booking and circuits
inefficient but still valid/practical
legacy-compatible, partial implementation, etc.
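The book-then-send steps of the centralized case can be sketched as follows. This is a minimal illustration, not the deployed system: the `Noc` class, its `book` method, and the first-come-first-served policy are all assumptions made for the example.

```python
class Noc:
    """Hypothetical booking service: grants non-overlapping slots on one L2 domain."""

    def __init__(self):
        self.free_at = 0.0      # time at which the domain becomes free again
        self.bookings = []      # list of (session_id, start, end)

    def book(self, session_id, now, duration):
        """Step 1: reserve the earliest exclusive slot of the given duration."""
        start = max(now, self.free_at)
        end = start + duration
        self.free_at = end
        self.bookings.append((session_id, start, end))
        return start, end


noc = Noc()
a = noc.book("A->B", now=0.0, duration=8.0)   # domain is idle, starts at once
b = noc.book("C->D", now=2.0, duration=4.0)   # must wait until A->B finishes
print(a, b)
# Step 2: each node transfers its bulk only inside its own (start, end) slot.
```

The second request arrives while the first circuit is active, so it is granted the slot (8.0, 12.0) and the two bulk transfers never overlap.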
14. .
Impl.: Distributed Case
[Figure: Storage Node A and Storage Node B connected through two SWITCHes, one for the booking segment and one for the bulk segment]
book on one network, send on another
legacy-incompatible
contention-sensing possible → fully distributed models
can also use sensing and contention control
15. .
Optimization
16. .
Optimization : Basics
same for distributed and centralized models
does not matter, optimization shows the overall utility of a heuristic
practical optimization = formulation + heuristic
given: demand matrix
expected result: a routing table mapping demand to topology
17. .
Optimization : Basics
18. .
Optim. : OSPF → tuple notation
OSPF is traditional in such optimizations, but too rigid for many practical cases
too complex for lightpaths in optical networks
no good heuristics for complex topologies
OSPF notation is not very convenient
1. capacity constraints
2. flow preservation
3. contention/congestion metrics
alternative: tuples ... for example, ⟨s; d; v; t⟩ defines a demand of traffic volume v at time t from source s to destination d
this notation is much more flexible for the several formulations that follow
19. .
Optim. : Basic Tuple Notation
nodes: source s, destination d, and others a; b; c
individual demand tuple Ti = ⟨s; d; v; t⟩
λ: lightpath for optical networks
time t can be a start time, the start and end of a period, etc.
we do not care about utility so far, just the notation, but utility is obvious in most cases
→ means results in... or leads to...
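One way to carry the tuple notation into code is sketched here. The `Demand` type, the optional `t2` field (for the "start and end of a period" reading of t), and the `demand_matrix` helper are illustrative assumptions; units for v and t are left unspecified, as in the slides.

```python
from typing import NamedTuple, Optional


class Demand(NamedTuple):
    s: str                       # source node
    d: str                       # destination node
    v: float                     # traffic volume
    t: float                     # start time (or start of a period)
    t2: Optional[float] = None   # optional end of the period


demands = [
    Demand("A", "B", v=40.0, t=0.0),
    Demand("A", "C", v=10.0, t=5.0, t2=20.0),
]


def demand_matrix(items):
    """Aggregate demand tuples into a {(s, d): total volume} matrix."""
    matrix = {}
    for item in items:
        key = (item.s, item.d)
        matrix[key] = matrix.get(key, 0.0) + item.v
    return matrix


print(demand_matrix(demands))
```

A list of tuples keeps the time dimension that a plain demand matrix discards, which is what the later formulations rely on.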
20. .
tOSPF : Traditional OSPF
Ti = ⟨s; d; v; t⟩ → ⟨s; a; b; ...; d⟩
Externals
Using the demand matrix, creates a set of per-link weights, which define a unique route for each demand item.
Internals
Per-link capacity constraint, in/out flow conservation constraint; unstable for large topologies and demand matrices
s: source
d: destination
a; b; c; ...: intermediate nodes on e2e paths/routes
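The mapping from a demand item to a route ⟨s; a; b; ...; d⟩ can be illustrated with a shortest-path lookup over per-link weights. This sketch covers only the route-lookup half, using Dijkstra's algorithm; it does not attempt the weight-setting optimization itself. The topology and weights are made up for the example, and routes are unique only when the weights produce no ties.

```python
import heapq


def route(weights, s, d):
    """Shortest path by per-link weight (Dijkstra); weights is {u: {v: w}}."""
    dist = {s: 0.0}
    prev = {}
    heap = [(0.0, s)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == d:
            break
        if cost > dist.get(u, float("inf")):
            continue                          # stale heap entry, skip it
        for v, w in weights.get(u, {}).items():
            new_cost = cost + w
            if new_cost < dist.get(v, float("inf")):
                dist[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if d not in dist:
        return None                           # destination unreachable
    path = [d]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical per-link weights for a four-node topology.
weights = {
    "s": {"a": 1, "b": 4},
    "a": {"b": 1, "d": 5},
    "b": {"d": 1},
}
print(route(weights, "s", "d"))
```

Here the route s, a, b, d (total weight 3) wins over the shorter-looking s, b, d (total weight 5), showing how the weights alone decide the path for each demand item.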
21. .
oOSPF : Optical OSPF w/out Switching
Ti = ⟨s; d; v; t⟩ → ⟨s; λ⟩
Externals
Using the demand matrix, maps each demand item onto an isolated lightpath
Internals
Simple but inefficient because the number of e2e lightpaths is small
s: source
d: destination
λ: a wavelength for a fixed e2e lightpath from source s to destination d
22. .
oOSPFs : Optical OSPF with Switching
Ti = ⟨s; d; v; t⟩ → ⟨s; λs; λa; λb; ...⟩
Externals
Using the demand matrix, maps each demand item onto a route of wavelengths
Internals
Efficient, but suffers from the same problems as traditional OSPF
s: source
d: destination
λx: an exit wavelength at a given node x
23. .
Proposal : Sensing Formulation
Ti = ⟨s; d; v; t1; t2⟩ → ⟨s; λ; t⟩
Externals
Using a matrix of loosely scheduled demand, creates a schedule of sequential sessions with exclusive access to paths
Internals
Same approach for Ethernet (one wavelength) and optical networks
s: source
d: destination
t1 and t2: the user-preferred range for the start of a session; a value t is picked between them
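A minimal sketch of turning loosely scheduled demand into sequential exclusive sessions on a single path (the Ethernet, one-wavelength case). The greedy rule used here, ordering by the end of the preferred range t2, is an assumption chosen for simplicity rather than the heuristic from the talk, and it is not guaranteed to be optimal.

```python
def schedule(demands, capacity):
    """Greedy sketch: order by t2, then pack sessions back to back.

    demands: list of (s, d, v, t1, t2) tuples.
    Returns a list of (s, d, t, in_window), where t is the chosen start
    time and in_window tells whether t fell inside [t1, t2].
    """
    free_at = 0.0
    plan = []
    for s, d, v, t1, t2 in sorted(demands, key=lambda x: x[4]):
        t = max(t1, free_at)                 # earliest start with exclusive access
        plan.append((s, d, t, t <= t2))      # flag sessions that miss their window
        free_at = t + v / capacity           # path is busy until the transfer ends
    return plan


# Illustrative demand only: volumes and times are in arbitrary units.
demands = [
    ("A", "B", 100, 0, 10),
    ("C", "D", 50, 0, 5),
    ("E", "F", 20, 30, 40),
]
plan = schedule(demands, capacity=10)
for row in plan:
    print(row)
```

With these inputs the C-to-D session starts at t=0, A-to-B follows at t=5 once the path is free, and E-to-F starts at t=30; all three land inside their preferred ranges.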
24. .
Heuristics
25. .
Centralized Case
[Figure: Storage Node A and Storage Node B connected through a SWITCH, with a NOC; Step 1: book session, Step 2: transfer bulk]
all optimization formulations except sensing
very close to traditional OSPF
same problems as in OSPF
the biggest problem is knowing the demand matrix in advance
26. .
Distributed Case
[Figure: Storage Node A and Storage Node B connected through two SWITCHes, one for the booking segment and one for the bulk segment]
can be used for all formulations
perfectly suited for the Sensing formulation
27. .
The Sensing Model
contention methods in wireless and OBS will work
in practice: sensing can be SNMP-like feedback on a gate's status
no sync among users is necessary
same model for Ethernet (+virtual nets) and optical networks
main advantage: the offload, no need to implement funny OSPF heuristics
28. .
Realistic Gate/Sensing Model
an approximate view of the JGN topology
two way = one way + ring
gates are created at the optical/Ethernet border
NOTE: already working for Ethernet
29. .
Wrapup
circuit emulation is necessary for effective bulk transfers
up to 40% faster in our lab tests
intra-DC, DC-DC, federations, etc. -- all can benefit from circuits
circuits formulated as OSPF are bad -- a Gate/Sensing model is better
validity: the worst case is the existing technology, but the upper performance bound is very high
30. .
That’s all, thank you ...
33. .
Wait-n-Send Model
[Figure: response curve(s); axes: bulk size per transmission and goodput]
2 potential distributions in practice
34. .
Utility of Waiting (curve)
I called it the Wait-n-See Curve
the source waits for some time for exclusive access -- sensing and accumulating bulk
on timeout, the current bulk is released at best effort (fallback)
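The waiting behaviour described above can be sketched as a small discrete-time simulation. Everything here is an assumption made for illustration: the `gate_is_free` callback stands in for whatever sensing feedback is available, time advances in abstract steps, and `timeout` counts the steps a source is willing to wait.

```python
def wait_n_see(gate_is_free, arrivals, timeout):
    """Simulate one source over discrete time steps.

    gate_is_free(step) -> bool is a hypothetical sensing callback.
    arrivals[step] is the volume of new data arriving at that step.
    Returns a list of (step, volume, mode) releases.
    """
    releases = []
    bulk, waited = 0.0, 0
    for step, volume in enumerate(arrivals):
        bulk += volume                       # keep accumulating while waiting
        if bulk == 0:
            continue                         # nothing to send yet
        if gate_is_free(step):
            releases.append((step, bulk, "exclusive"))
            bulk, waited = 0.0, 0
        elif waited >= timeout:
            releases.append((step, bulk, "best-effort"))   # fallback on timeout
            bulk, waited = 0.0, 0
        else:
            waited += 1
    return releases


# Illustrative run: the gate is sensed free only at steps 2 and 9.
free_steps = {2, 9}
releases = wait_n_see(lambda step: step in free_steps, [5] * 10, timeout=3)
for r in releases:
    print(r)
```

In this run the source sends 15 units exclusively at step 2, falls back to best effort with 20 units at step 6 after waiting three steps, and sends another 15 units exclusively at step 9. A longer timeout trades delay for a larger share of exclusive transfers.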