Implementing a layer 2 framework on the Linux network stack
Takuya ASADA <firstname.lastname@example.org> @syuu1228
Previously worked at an embedded software company on SMP support for router firmware.
Ph.D. student at Tokyo University of Technology, researching improvements to network I/O architecture on modern x86 servers.
Interested in: SMP, networking, virtualization.
GSoC ’11 (FreeBSD): multithread support for BPF.
GSoC ’12 (FreeBSD): BIOS support for BHyVe.
Research assistant at IIJ research laboratory, implementing BCube for Linux (today’s topic!).
BCube is a new network architecture designed for shipping-container-based modular data centers.
It has a server-centric network structure:
◦ Servers act both as end hosts and as relay nodes for each other.
The paper was published at ACM SIGCOMM ’09 by Microsoft Research Asia.
Each server has one connection to each layer. Switches never connect to other switches. Servers relay traffic for each other.
[Figure: example BCube2 topology with servers 000 to 111 and switches labeled by layer (2,x / 1,x / 0,x), showing the nested BCube0, BCube1, and BCube2 units.]
$BCube_k$ has $k + 1$ layers. $BCube_x$ contains $n$ $BCube_{x-1}$ units. $BCube_0$ contains $n$ servers. Total servers $= n^{k+1}$.
[Figure: example BCube2 topology, servers 000 to 111.]
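As a quick check of the counting rule, here is a minimal Python sketch. The function names are hypothetical, and it assumes addresses are written as digit strings with n ≤ 10, as in the figure. With n = 2 and k = 2 it yields the eight servers 000 to 111 shown in the example topology.

```python
from itertools import product


def total_servers(n, k):
    """Number of servers in a BCube_k built from n-port switches."""
    return n ** (k + 1)


def all_addresses(n, k):
    """Enumerate every BCube address as a string of k + 1 digits in base n."""
    return ["".join(str(d) for d in digits)
            for digits in product(range(n), repeat=k + 1)]


if __name__ == "__main__":
    print(total_servers(2, 2))
    print(all_addresses(2, 2))
```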
High network capacity for various traffic patterns:
◦ one-to-one
◦ one-to-all
◦ one-to-several
◦ all-to-all
Performance degrades gracefully as server and switch failures increase.
No special hardware is needed; only commodity switches are used.
Each server has a unique BCube address. Each digit indicates the port number on the switch in the corresponding layer.
[Figure: example BCube2 topology, servers 000 to 111.]
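The addressing rule can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the rightmost digit is assumed to belong to layer 0, and identifying a switch by the remaining digits follows my reading of the BCube paper rather than anything in bcube.ko.

```python
def switch_and_port(addr, layer):
    """Return (switch id, port) for the layer-`layer` link of server `addr`.

    Assumption: the rightmost digit is layer 0, the digit at `layer` is the
    switch port number, and the remaining digits identify the switch.
    """
    pos = len(addr) - 1 - layer
    return addr[:pos] + addr[pos + 1:], int(addr[pos])


def layer_neighbors(addr, layer, n):
    """Servers on the same layer-`layer` switch: they differ only in that digit."""
    pos = len(addr) - 1 - layer
    return [addr[:pos] + str(d) + addr[pos + 1:]
            for d in range(n) if str(d) != addr[pos]]


if __name__ == "__main__":
    for layer in range(3):
        print(layer, switch_and_port("101", layer), layer_neighbors("101", layer, 2))
```

In the n = 2 example, server 101 reaches 100 through layer 0, 111 through layer 1, and 001 through layer 2.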
There are alternate routes between any two nodes. They can be used to bypass failed servers and switches, and also to accelerate throughput by parallelizing traffic.
[Figure: example BCube2 topology, servers 000 to 111.]
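One way to see where the alternate routes come from: a route can be built by correcting one address digit per hop, and changing the order in which digits are corrected gives a different route. The Python sketch below illustrates that idea only; the function names are hypothetical, and it is not the path-selection algorithm from the paper or from this implementation.

```python
def route(src, dst, order):
    """Correct the digits of `src` towards `dst` in the given layer order."""
    path = [src]
    cur = src
    for layer in order:
        pos = len(src) - 1 - layer  # rightmost digit is layer 0
        if cur[pos] != dst[pos]:
            cur = cur[:pos] + dst[pos] + cur[pos + 1:]
            path.append(cur)
    return path


def rotated_routes(src, dst):
    """Build one route per starting layer by rotating the correction order."""
    count = len(src)
    routes = []
    for start in range(count - 1, -1, -1):
        order = [(start - i) % count for i in range(count)]
        candidate = route(src, dst, order)
        if candidate not in routes:
            routes.append(candidate)
    return routes


if __name__ == "__main__":
    for r in rotated_routes("000", "111"):
        print(" -> ".join(r))
```

For 000 to 111 this produces three routes whose intermediate servers do not overlap, so one failed relay leaves the other two usable.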
The source server decides the best path for a flow, bypassing failed paths. To propagate the routing path, the source server writes the path information into the packet header.
A BCube header is added between the Ethernet header and the IP header. It holds the src/dst addresses and the routing path information in the “Next Hop Index Array”.
[Figure: packet layout with Ethernet header, BCube header (BCube dest address, BCube source address, protocol type, Next Hop Index Array), and IP header.]
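To make the role of the Next Hop Index Array concrete, here is a hedged Python sketch. It assumes each entry records which digit changes at a hop and its new value, which is how I understand the BCube paper; the actual field layout in bcube.ko may differ, and the function names are hypothetical.

```python
def path_to_next_hop_indexes(path):
    """Turn a list of BCube addresses into (layer, digit) entries, one per hop."""
    entries = []
    for cur, nxt in zip(path, path[1:]):
        diff = [i for i in range(len(cur)) if cur[i] != nxt[i]]
        if len(diff) != 1:
            raise ValueError("consecutive hops must differ in exactly one digit")
        pos = diff[0]
        layer = len(cur) - 1 - pos  # rightmost digit is layer 0
        entries.append((layer, int(nxt[pos])))
    return entries


def apply_next_hop_index(addr, entry):
    """A relay server derives the next-hop address from its own address and one entry."""
    layer, digit = entry
    pos = len(addr) - 1 - layer
    return addr[:pos] + str(digit) + addr[pos + 1:]


if __name__ == "__main__":
    entries = path_to_next_hop_indexes(["000", "100", "110", "111"])
    print(entries)
    print(apply_next_hop_index("100", entries[1]))
```

The point of such an encoding is that a relay does not need a routing table: its own address plus the current entry is enough to find the next hop.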
We are evaluating various "Data Center Network" technologies, especially for container-based modular data center architectures. BCube is one of the candidates.
Try to use existing code as much as possible, with a minimal implementation at first.
A BCube device binds multiple interfaces and is assigned a BCube address and an IP address.
What is the most similar function that already exists on Linux? → Bridge!
◦ Forked bridge.ko and the brctl command, naming them bcube.ko and bcctl.
brctl addbr <bridge>
brctl delbr <bridge>
↓
bcctl addbc <bcube> <bcaddr> <N> <K>
bcctl delbc <bcube>
Modified addbr/delbr, adding 3 arguments:
◦ BCube address
◦ n and k parameters
The BCube address uses the MAC address format/size: 101 → 00:00:01:00:01
The BCube address is used as the HW address of the BCube device.
◦ It works like a fake MAC address in the Linux network stack.
Address resolution needs to be reconsidered.
Normal Ethernet:
◦ IP address → MAC address (ARP)
BCube network:
◦ IP address → BCube address → ARP?
◦ (Neighbor) BCube address → MAC address → needs an additional neighbor discovery protocol
Once broadcast works in the BCube implementation, ARP should work on it. I haven’t implemented that yet, so I decided to configure entries manually with the following command:
arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
An ARP-like protocol is needed. I decided to configure this manually too, and implemented the following commands:
bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
bcctl delneighbour <bcube> <layer> <bcaddr>
bcube.ko maintains a neighbor table and uses it when transmitting and forwarding packets.
bridge.ko maintains an FDB (forwarding database), a hash table used to look up the output port for a destination MAC address.
I deleted the FDB and implemented a function that decides the next-hop BCube address, the output port, and the MAC address of the next hop.
Source routing is not implemented yet; only default routing for now.
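The next-hop decision can be sketched in user-space Python as below. This is an illustration under assumptions, not the kernel code: the function names, the rule of correcting the highest differing digit first, the convention that port number equals layer number, and the MAC addresses are all hypothetical.

```python
# Neighbor table keyed by (layer, BCube address), mirroring the arguments of
# "bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>".
neighbor_table = {}


def add_neighbour(layer, bcaddr, macaddr):
    neighbor_table[(layer, bcaddr)] = macaddr


def del_neighbour(layer, bcaddr):
    neighbor_table.pop((layer, bcaddr), None)


def next_hop(own, dst):
    """Return (next-hop BCube address, output port, next-hop MAC), or None.

    Assumed default routing: fix the highest layer digit that still differs,
    and send on the port attached to that layer. The MAC is None when the
    neighbor has not been registered.
    """
    for layer in range(len(own) - 1, -1, -1):
        pos = len(own) - 1 - layer  # rightmost digit is layer 0
        if own[pos] != dst[pos]:
            hop = own[:pos] + dst[pos] + own[pos + 1:]
            return hop, layer, neighbor_table.get((layer, hop))
    return None  # own == dst: the packet is for this server


if __name__ == "__main__":
    add_neighbour(2, "100", "02:00:00:00:00:04")  # hypothetical MAC address
    add_neighbour(0, "001", "02:00:00:00:00:01")  # hypothetical MAC address
    print(next_hop("000", "111"))
    print(next_hop("000", "001"))
    print(next_hop("000", "000"))
```

Because every relay applies the same rule to the destination address, no per-destination state is needed beyond the small neighbor table.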
To add the BCube header between the Ethernet header and the IP header, I forked net/ethernet/eth.c to handle the BCube header:
◦ ETH_HLEN (14 bytes) → BCUBE_HLEN (24 bytes)
◦ struct ethhdr (MAC header) → struct bcubehdr (MAC & BCube header)
◦ eth_header_ops → bc_header_ops
Unfortunately, GRO accesses the Ethernet header directly, and it runs before BCube handles a packet, so it needs to be disabled.
Found a way to implement a new L2 framework using the existing bridge implementation.
◦ Much easier than implementing it from scratch
Development status:
◦ Implemented basic features; debugging now
◦ Will consider adding more features: broadcast/multicast; intermediate node/switch failure detection and rerouting; source routing; address resolution protocol
Planning a more detailed evaluation in our data center testbed.
Any comments and suggestions are welcome.
This work was done as part of research assistance work at IIJ research laboratory.