The document discusses Takuya Asada's work implementing BCube, a new network architecture designed for modular data centers, for Linux. Some key points:
1) BCube uses servers as both end hosts and relay nodes, allowing for high network capacity and graceful performance degradation as servers/switches fail.
2) Asada leveraged existing Linux bridge code as a starting point, modifying it to support BCube addressing and routing.
3) Current implementation supports basic routing but not more advanced features like source routing or failure handling; evaluation on a testbed is planned.
2. I worked at an embedded software company,
on SMP support for router firmware
Ph.D. student at Tokyo University of Technology,
researching improvements to the network I/O
architecture of modern x86 servers
Interested in: SMP, Network, Virtualization
GSoC ’11(FreeBSD) Multithread support for BPF
GSoC ’12(FreeBSD) BIOS support for BHyVe
Research assistant at IIJ research laboratory,
implementing BCube for Linux
Today’s topic!
3. BCube is a new network architecture
Designed for shipping-container based
modular data centers
Server-centric network structure
◦ Servers act as
End hosts
Relay nodes for each other
The paper was published at ACM SIGCOMM ’09 by
Microsoft Research Asia
4. Each server has one connection to each layer
Switches never connect to other switches
Servers relay traffic for each other
[Figure: example BCube topology. Eight servers (000 to 111) attach to three layers of switches; the legend marks switches, servers, and the nested BCube0, BCube1 and BCube2 units.]
5. BCube_k has k + 1 layers
BCube_x contains n copies of BCube_(x-1)
BCube_0 contains n servers
Total servers = n^(k+1)
[Figure: the same example BCube topology, with servers 000 to 111 and the BCube0, BCube1 and BCube2 units marked.]
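A minimal sketch of the size arithmetic, for checking a planned n and k. The server count follows the formula on this slide; the switch count (n^k switches in each of the k + 1 layers) comes from the construction in the BCube paper rather than from the slide. The function name bcube_size is only illustrative.

```python
def bcube_size(n, k):
    """Return basic size figures for a BCube_k built from n-port switches."""
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    return {
        "layers": k + 1,                 # BCube_k has k + 1 layers
        "servers": n ** (k + 1),         # total servers = n^(k+1)
        "ports_per_server": k + 1,       # one connection to each layer
        "switches_per_layer": n ** k,    # per the BCube paper's construction
        "switches": (k + 1) * n ** k,
    }


if __name__ == "__main__":
    # The example topology in the figure corresponds to n = 2, k = 2.
    print(bcube_size(2, 2))
```

With n = 2 and k = 2 this gives 8 servers, matching the eight addresses 000 to 111 in the figure.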
6. High network capacity for various traffic
patterns
◦ one-to-one
◦ one-to-all
◦ one-to-several
◦ all-to-all
Performance degrades gracefully as
server/switch failures increase
Needs no special hardware, only
commodity switches
7. Each server has a unique BCube address
Each digit gives the switch port number in
the corresponding layer
[Figure: the same example BCube topology, with servers 000 to 111 and the BCube0, BCube1 and BCube2 units marked.]
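The addressing rule can be sketched as follows: two servers share a switch in a given layer when their addresses differ only in that layer's digit. This sketch assumes the rightmost digit belongs to layer 0 and that n is at most 10, so each digit is a single character; layer_neighbors is an illustrative name, not part of bcube.ko.

```python
def layer_neighbors(addr, n):
    """Map each layer to the servers reachable through that layer's switch.

    addr is a digit string such as "010"; the rightmost digit is taken
    to be layer 0 (an assumption of this sketch).
    """
    k = len(addr) - 1
    result = {}
    for layer in range(k + 1):
        pos = k - layer  # string index of the digit for this layer
        peers = []
        for digit in range(n):
            if str(digit) != addr[pos]:
                # Same address except for this layer's digit.
                peers.append(addr[:pos] + str(digit) + addr[pos + 1:])
        result[layer] = peers
    return result


if __name__ == "__main__":
    for layer, peers in layer_neighbors("000", 2).items():
        print("layer", layer, "->", peers)
```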
9. There are alternate routes between any two nodes
Can bypass failed servers and switches
Can also be used to accelerate throughput by
parallelizing traffic
[Figure: the same example BCube topology, with servers 000 to 111 and the BCube0, BCube1 and BCube2 units marked.]
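One way to see where the alternate routes come from: a path can be built by correcting one address digit per hop, and correcting the digits in a different order yields a different path. The sketch below is a simplified illustration of that idea, not the full path-set algorithm from the BCube paper; it assumes the rightmost digit is layer 0, and the function names are made up for this example.

```python
def bcube_path(src, dst, order):
    """Build a path from src to dst, fixing one digit per hop.

    order lists the layers in the sequence their digits are corrected.
    """
    k = len(src) - 1
    cur = list(src)
    path = [src]
    for layer in order:
        pos = k - layer  # rightmost digit is layer 0
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]
            path.append("".join(cur))
    return path


def rotated_paths(src, dst):
    """One path per layer whose digit differs, each starting at that layer."""
    k = len(src) - 1
    paths = []
    for start in range(k, -1, -1):
        if src[k - start] == dst[k - start]:
            continue  # this sketch skips layers that need a detour
        order = [(start - i) % (k + 1) for i in range(k + 1)]
        paths.append(bcube_path(src, dst, order))
    return paths


if __name__ == "__main__":
    for p in rotated_paths("000", "111"):
        print(" -> ".join(p))
```

For 000 to 111 the three paths share no intermediate server, so traffic can be spread over them or moved off a path whose relay has failed.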
10. Source server decides the best path for a flow
Bypasses failed paths
To propagate the routing path, the source server
writes routing path information in the packet
header
11. Add BCube header between Ethernet header
and IP header
Has src/dst addresses and also routing path
information in the “Next Hop Index Array”
[Figure: packet layout]
Ethernet Header
BCube Header
◦ BCube dest address
◦ BCube source address
◦ Protocol type
◦ Next Hop Index Array
IP Header
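As an illustration of what a Next Hop Index Array might carry, the sketch below encodes each hop as one byte: the layer whose digit changes in the top 2 bits and the new digit value in the low 6 bits. That byte layout is an assumption loosely based on the description in the BCube paper; the slides do not give field sizes, and this is not the layout used by bcube.ko. Function names are illustrative, and digits are assumed to be single characters (n at most 10).

```python
def encode_nha(path):
    """Encode a hop-by-hop path as one byte per hop (assumed layout)."""
    k = len(path[0]) - 1
    out = bytearray()
    for cur, nxt in zip(path, path[1:]):
        diff = [i for i in range(len(cur)) if cur[i] != nxt[i]]
        if len(diff) != 1:
            raise ValueError("consecutive hops must differ in one digit")
        layer = k - diff[0]          # rightmost digit is layer 0
        value = int(nxt[diff[0]])
        if layer > 3 or value > 63:
            raise ValueError("does not fit the assumed 2-bit/6-bit layout")
        out.append((layer << 6) | value)
    return bytes(out)


def apply_nhi(cur, nhi):
    """Return the next-hop address for the current address and one index."""
    k = len(cur) - 1
    layer, value = nhi >> 6, nhi & 0x3F
    pos = k - layer
    return cur[:pos] + str(value) + cur[pos + 1:]


if __name__ == "__main__":
    nha = encode_nha(["000", "100", "110", "111"])
    node = "000"
    for nhi in nha:
        node = apply_nhi(node, nhi)
        print(hex(nhi), "->", node)
```

Each relay only needs its own address and the current index to find the next hop, which is why the source can fix the whole path in the header.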
12. Evaluating various "Data Center Network"
technologies, especially for container-based
modular data center architectures.
BCube is one of the candidates.
13. Try to use existing code as much as possible
Minimal implementation at first
BCube binds multiple interfaces and
assigns a BCube address and an IP address
What is the most similar function that
already exists on Linux? → Bridge!
◦ Forked bridge.ko and the brctl command,
naming them bcube.ko and bcctl
14. brctl addbr <bridge>
brctl delbr <bridge>
↓
bcctl addbc <bcube> <bcaddr> <N> <K>
bcctl delbc <bcube>
Modified addbr/delbr, adding 3 args
◦ BCube address
◦ n and k parameters
Use the MAC address format/size for the BCube address
101 → 00:00:00:01:00:01
Use the BCube address as the HW address of the BCube
device
◦ It acts as a fake MAC address in the Linux network stack
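A small sketch of the address mapping, assuming one BCube digit per octet, right-aligned in a six-octet MAC-style address. That packing is inferred from the 101 example and is not confirmed by the slides; both function names are hypothetical.

```python
def bcaddr_to_mac(addr):
    """Pack a BCube digit string into a MAC-style string, one digit per octet."""
    digits = [int(c) for c in addr]
    if not 1 <= len(digits) <= 6:
        raise ValueError("this sketch supports 1 to 6 digits")
    octets = [0] * (6 - len(digits)) + digits  # left-pad with zero octets
    return ":".join("%02x" % o for o in octets)


def mac_to_bcaddr(mac, num_digits):
    """Recover the last num_digits BCube digits from a MAC-style string."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected six octets")
    return "".join(str(o) for o in octets[6 - num_digits:])


if __name__ == "__main__":
    mac = bcaddr_to_mac("101")
    print(mac, "->", mac_to_bcaddr(mac, 3))
```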
16. Need to reconsider address resolution
Normal Ethernet
◦ IP Address → MAC Address (ARP)
BCube network
◦ IP Address → BCube Address
→ ARP?
◦ (Neighbor) BCube address → MAC Address
→ Need additional neighbor discovery protocol
17. Once broadcast works in the BCube
implementation, ARP should work on it
But I haven’t implemented it yet, so I decided to
configure it manually with the following command:
arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
18. Need an ARP-like protocol
Decided to configure this manually too,
implemented the following commands:
bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
bcctl delneighbour <bcube> <layer> <bcaddr>
bcube.ko maintains a neighbor table and uses it
when transmitting/forwarding packets
19. bridge.ko maintains an FDB (forwarding
database), a hash table used to look up
destination MAC address → output port
Deleted the FDB and implemented a function to decide
the next-hop BCube address, output port, and
MAC address of the next hop
Haven’t implemented source routing yet, just
default routing for now
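The next-hop decision can be sketched in user space as below. This is only a model of the idea, not the kernel code: it assumes "default routing" means correcting the highest differing digit first, that the output port is identified by the layer number, and that the neighbor table is keyed by (layer, BCube address) in the spirit of bcctl addneighbour. The class name and the MAC addresses in the usage part are made up.

```python
class BCubeNode:
    """Toy model of a node's neighbor table and next-hop decision."""

    def __init__(self, addr):
        self.addr = addr
        self.neighbors = {}  # (layer, bcaddr) -> MAC address string

    def add_neighbour(self, layer, bcaddr, mac):
        self.neighbors[(layer, bcaddr)] = mac

    def del_neighbour(self, layer, bcaddr):
        self.neighbors.pop((layer, bcaddr), None)

    def next_hop(self, dst):
        """Return (next-hop BCube address, output layer, next-hop MAC).

        Returns None when dst is this node; raises KeyError when the
        required neighbor has not been configured.
        """
        k = len(self.addr) - 1
        for layer in range(k, -1, -1):   # highest layer first (assumption)
            pos = k - layer              # rightmost digit is layer 0
            if self.addr[pos] != dst[pos]:
                hop = self.addr[:pos] + dst[pos] + self.addr[pos + 1:]
                return hop, layer, self.neighbors[(layer, hop)]
        return None


if __name__ == "__main__":
    node = BCubeNode("000")
    node.add_neighbour(2, "100", "02:00:00:00:00:04")  # made-up MAC
    node.add_neighbour(0, "001", "02:00:00:00:00:01")  # made-up MAC
    print(node.next_hop("110"))
    print(node.next_hop("001"))
```

Unlike the bridge FDB, nothing here is learned from traffic: the table is filled in manually and the next hop is computed from the addresses alone.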
21. To add the BCube header between the Ethernet header
and the IP header, I forked net/ethernet/eth.c
ETH_HLEN (14 bytes)
→ BCUBE_HLEN (24 bytes)
struct ethhdr (MAC header)
→ struct bcubehdr (MAC & BCube header)
eth_header_ops → bc_header_ops
to handle the BCube header
Unfortunately, GRO accesses the Ethernet header
directly and runs before BCube handles a
packet, so it needs to be disabled
22. Found a way to implement a new L2 framework
using the existing bridge implementation
◦ Much easier than implementing it from scratch
Development Status
◦ Implemented basic features, debugging now
◦ Will consider adding more features:
broadcast / multicast
intermediate node/switch failure detection and
rerouting
source routing
address resolution protocol
Planning a more detailed evaluation on our data center
testbed
Any comments and suggestions are welcome
23. This work was done as part of my research
assistant work at IIJ research laboratory.