Implementing a layer 2 framework on linux network

Published in: Technology
  1. Takuya ASADA <syuu@dokukino.com> @syuu1228
  2. ◦ Worked at an embedded software company on SMP support for router firmware
     ◦ Ph.D. student at Tokyo University of Technology, researching improvements to network I/O architecture on modern x86 servers
     ◦ Interested in: SMP, networking, virtualization
     ◦ GSoC ’11 (FreeBSD): multithread support for BPF
     ◦ GSoC ’12 (FreeBSD): BIOS support for BHyVe
     ◦ Research assistant at IIJ research laboratory, implementing BCube for Linux (today’s topic!)
  3. ◦ BCube is a new network architecture
     ◦ Designed for shipping-container-based modular data centers
     ◦ Server-centric network structure: servers act as both end hosts and relay nodes for each other
     ◦ The paper was published at ACM SIGCOMM ’09 by Microsoft Research Asia
  4. ◦ Each server has one connection to each layer
     ◦ Switches never connect to other switches
     ◦ Servers relay traffic for each other
     [Figure: BCube topology diagram showing servers 000 through 111, the switches of each layer, and the nested BCube0, BCube1, and BCube2 units]
  5. ◦ $BCube_k$ has $k + 1$ layers
     ◦ $BCube_x$ contains $n$ $BCube_{x-1}$
     ◦ $BCube_0$ contains $n$ servers
     ◦ Total servers = $n^{k+1}$
     [Figure: BCube topology diagram showing servers 000 through 111, the switches of each layer, and the nested BCube0, BCube1, and BCube2 units]
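The sizing rules above can be checked with a few lines of Python. This is only a sketch: the function name `bcube_size` is made up for illustration, and the switch counts are an assumption derived from the slides (every server has one link per layer and every switch has n ports, all in use), not a figure stated in the talk.

```python
def bcube_size(n: int, k: int) -> dict:
    """Return basic size figures for a BCube_k built from n-port switches."""
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    servers = n ** (k + 1)            # total servers = n^(k+1), as on the slide
    switches_per_layer = servers // n  # assumption: each switch serves n servers
    return {
        "layers": k + 1,
        "servers": servers,
        "switches_per_layer": switches_per_layer,
        "switches": (k + 1) * switches_per_layer,
    }


if __name__ == "__main__":
    # The topology drawn in the slides uses n = 2, k = 2 (servers 000 to 111).
    print(bcube_size(2, 2))
```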
  6. ◦ High network capacity for various traffic patterns
       ◦ one-to-one
       ◦ one-to-all
       ◦ one-to-several
       ◦ all-to-all
     ◦ Performance degrades gracefully as server and switch failures increase
     ◦ Doesn’t need special hardware; uses only commodity switches
  7. ◦ Each server has a unique BCube address
     ◦ Each digit gives the port number on the switch the server connects to in the corresponding layer
     [Figure: BCube topology diagram showing servers 000 through 111, the switches of each layer, and the nested BCube0, BCube1, and BCube2 units]
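To make the digit-to-port relationship concrete, here is a small Python sketch. It assumes the leftmost digit belongs to the top layer (consistent with the routing example in the slides) and that each digit is a single character. The switch identifier, built from the remaining digits, is a hypothetical naming chosen for illustration and is not the labelling used in the figure.

```python
def switch_attachment(addr: str, layer: int) -> tuple:
    """Return (layer, switch_id, port) for a server's link in the given layer.

    Assumptions: addr is a string of single-character digits, leftmost digit
    is the top layer, and switch_id is a made-up identifier formed from the
    digits that do NOT belong to this layer.
    """
    k = len(addr) - 1
    if not 0 <= layer <= k:
        raise ValueError("layer out of range")
    idx = k - layer                      # string position of this layer's digit
    port = int(addr[idx])                # the digit is the switch port number
    switch_id = addr[:idx] + addr[idx + 1:]
    return (layer, switch_id, port)


if __name__ == "__main__":
    for layer in (2, 1, 0):
        print(switch_attachment("101", layer))
```

Servers that differ only in the digit of one layer end up with the same `switch_id` in that layer, which is what lets them reach each other through a single switch.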
  8. ◦ Default routing rule
       ◦ Top layer → bottom layer
       ◦ Ex: Route from 000 to 111: 000 → 100 → 110 → 111
     [Figure: BCube topology diagram with the route from 000 to 111 across BCube0, BCube1, and BCube2]
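The default rule can be sketched in Python as "fix one differing digit per hop, starting from the top layer". The function name `default_route` is hypothetical, and the sketch assumes addresses are equal-length strings whose leftmost digit is the top layer, as in the 000 to 111 example.

```python
def default_route(src: str, dst: str) -> list:
    """Return the list of BCube addresses visited from src to dst.

    Digits are corrected from the top layer (leftmost) to the bottom layer
    (rightmost); digits that already match need no hop.
    """
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    path = [src]
    current = list(src)
    for idx in range(len(src)):
        if current[idx] != dst[idx]:
            current[idx] = dst[idx]      # one hop through this layer's switch
            path.append("".join(current))
    return path


if __name__ == "__main__":
    print(" -> ".join(default_route("000", "111")))
```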
  9. ◦ There are alternate routes between any two nodes
     ◦ Can bypass failed servers and switches
     ◦ Can also be used to increase throughput by sending traffic over parallel routes
     [Figure: BCube topology diagram showing alternate routes across BCube0, BCube1, and BCube2]
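One simple way to obtain alternate routes is to correct the address digits in a different order. The Python sketch below rotates the starting layer; it is an illustration under that assumption, not the path-selection logic of the implementation described in this talk, and both function names are hypothetical.

```python
def route_with_order(src: str, dst: str, order: list) -> list:
    """Correct differing digits in the given order of string positions."""
    path = [src]
    current = list(src)
    for idx in order:
        if current[idx] != dst[idx]:
            current[idx] = dst[idx]
            path.append("".join(current))
    return path


def rotated_routes(src: str, dst: str) -> list:
    """Return one route per starting layer, wrapping around the digits."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    width = len(src)
    routes = []
    for start in range(width):
        order = [(start + j) % width for j in range(width)]
        routes.append(route_with_order(src, dst, order))
    return routes


if __name__ == "__main__":
    for route in rotated_routes("000", "111"):
        print(" -> ".join(route))
```

When every digit differs, as in 000 to 111, the rotated routes share no intermediate server, so a failed relay on one route can be bypassed by another. When some digits already match, several rotations collapse to the same route; this sketch does not try to build detours for that case.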
  10. ◦ The source server decides the best path for a flow
      ◦ Bypasses failed paths
      ◦ To propagate the routing path, the source server writes routing path information into the packet header
  11. ◦ Adds a BCube header between the Ethernet header and the IP header
      ◦ It carries the src/dst addresses and also routing path information in the “Next Hop Index Array”
      [Figure: packet layout]
        Ethernet Header
        BCube Header: BCube dest address, BCube source address, Protocol type, Next Hop Index Array
        IP Header
  12. ◦ Evaluating various "Data Center Network" technologies, especially for container-based modular data center architectures
      ◦ BCube is one of the candidates
  13. ◦ Try to use existing code as much as possible
      ◦ Minimum implementation at first
      ◦ A BCube device binds multiple interfaces and is assigned a BCube address and an IP address
      ◦ Which existing Linux feature is the most similar? → Bridge!
        ◦ Forked bridge.ko and the brctl command, naming them bcube.ko and bcctl
  14. ◦ Original bridge commands:
          brctl addbr <bridge>
          brctl delbr <bridge>
      ◦ BCube equivalents:
          bcctl addbc <bcube> <bcaddr> <N> <K>
          bcctl delbc <bcube>
      ◦ Modified addbr/delbr, adding 3 arguments
        ◦ BCube address
        ◦ n and k parameters
      ◦ Uses the MAC address format/size for the BCube address: 101 → 00:00:00:01:00:01
      ◦ Uses the BCube address as the HW address of the BCube device
        ◦ It works like a fake MAC address in the Linux network stack
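A possible encoding of a BCube address in MAC address format is sketched below in Python. It assumes a six-octet address with one BCube digit per octet, right-aligned and zero-padded on the left; that layout is a guess based on the 101 example, and the function name `bcaddr_to_mac` is hypothetical.

```python
def bcaddr_to_mac(bcaddr: str) -> str:
    """Encode a BCube address as a MAC-format string.

    Assumption: one decimal digit per octet, stored in the low-order octets
    of a six-octet address, with the unused leading octets set to zero.
    """
    if not bcaddr.isdigit() or len(bcaddr) > 6:
        raise ValueError("expected 1 to 6 decimal digits")
    octets = [0] * (6 - len(bcaddr)) + [int(d) for d in bcaddr]
    return ":".join(f"{octet:02x}" for octet in octets)


if __name__ == "__main__":
    print(bcaddr_to_mac("101"))
```

Reusing the MAC format means the rest of the network stack can treat the BCube address as an ordinary hardware address without any change to address lengths.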
  15. ◦ Original bridge commands:
          brctl addif <bridge> <device>
          brctl delif <bridge> <device>
      ◦ BCube equivalents:
          bcctl assignif <bcube> <layer> <device>
          bcctl unassignif <bcube> <layer> <device>
      ◦ Modified addif/delif into assignif/unassignif, adding a layer number argument
  16. ◦ Need to reconsider address resolution
      ◦ Normal Ethernet
        ◦ IP address → MAC address (ARP)
      ◦ BCube network
        ◦ IP address → BCube address → ARP?
        ◦ (Neighbor) BCube address → MAC address → needs an additional neighbor discovery protocol
  17. ◦ Once broadcast works in the BCube implementation, ARP should work on it
      ◦ But I haven’t implemented it yet, so I decided to configure entries manually with the following command:
          arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
  18. ◦ Needs an ARP-like protocol
      ◦ Decided to configure this manually too, and implemented the following commands:
          bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
          bcctl delneighbour <bcube> <layer> <bcaddr>
      ◦ bcube.ko maintains a neighbor table and uses it when transmitting and forwarding packets
  19. ◦ bridge.ko maintains an FDB (forwarding database), a hash table used to look up destination MAC address → output port
      ◦ Deleted the FDB and implemented a function that decides the next hop BCube address, the output port, and the MAC address of the next hop
      ◦ Haven’t implemented source routing; just default routing for now
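The next-hop decision can be modelled in user space with a short Python sketch. Everything here is illustrative: the class `NeighborTable` and the function `next_hop` are hypothetical names, the table mirrors the arguments of the addneighbour/delneighbour commands, and the layer number stands in for the output port on the assumption that one interface is assigned per layer and that the top layer has the highest number.

```python
class NeighborTable:
    """Maps (layer, neighbor BCube address) to the neighbor's MAC address."""

    def __init__(self):
        self._entries = {}

    def add(self, layer: int, bcaddr: str, mac: str) -> None:
        self._entries[(layer, bcaddr)] = mac

    def delete(self, layer: int, bcaddr: str) -> None:
        self._entries.pop((layer, bcaddr), None)

    def lookup(self, layer: int, bcaddr: str):
        # Returns None when no entry has been configured.
        return self._entries.get((layer, bcaddr))


def next_hop(self_addr: str, dst_addr: str, neighbors: NeighborTable):
    """Default routing: fix the highest differing digit first.

    Returns (layer, next_hop_bcaddr, next_hop_mac), or None when the packet
    is already at its destination.
    """
    k = len(self_addr) - 1
    for idx in range(len(self_addr)):
        if self_addr[idx] != dst_addr[idx]:
            layer = k - idx
            hop = self_addr[:idx] + dst_addr[idx] + self_addr[idx + 1:]
            return (layer, hop, neighbors.lookup(layer, hop))
    return None


if __name__ == "__main__":
    table = NeighborTable()
    table.add(2, "100", "02:00:00:00:00:01")  # example MAC, made up
    print(next_hop("000", "111", table))
```

Because each relay only needs its own address and the destination address, the same function serves both locally generated packets and forwarded ones.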
  20. ◦ Top layer → bottom layer
      ◦ Ex: Route from 000 to 111: 000 → 100 → 110 → 111
      [Figure: BCube topology diagram with the route from 000 to 111 across BCube0, BCube1, and BCube2]
  21. ◦ To add the BCube header between the Ethernet header and the IP header, I forked net/ethernet/eth.c
        ◦ ETH_HLEN (14 bytes) → BCUBE_HLEN (24 bytes)
        ◦ struct ethhdr (MAC header) → struct bcubehdr (MAC & BCube header)
        ◦ eth_header_ops → bc_header_ops, to handle the BCube header
      ◦ Unfortunately GRO accesses the Ethernet header directly and runs before BCube handles a packet, so it needs to be disabled
  22. ◦ Found a way to implement a new L2 framework using the existing bridge implementation
        ◦ Much easier than implementing it from scratch
      ◦ Development status
        ◦ Implemented basic features, debugging now
        ◦ Considering adding more features
          ◦ broadcast / multicast
          ◦ intermediate node/switch failure detection and route changes
          ◦ source routing
          ◦ address resolution protocol
      ◦ Planning a more detailed evaluation in our data center testbed
      ◦ Any comments and suggestions are welcome
  23. This work was done as part of research assistant work at IIJ research laboratory.