Implementing a layer 2 framework on linux network
    Presentation Transcript

    • Takuya ASADA <syuu@dokukino.com> @syuu1228
    • I worked at an embedded software company on SMP support for router firmware. I am a Ph.D. student at Tokyo University of Technology, researching improvements to network I/O architecture on modern x86 servers. Interested in: SMP, networking, virtualization.
      ◦ GSoC '11 (FreeBSD): multithread support for BPF
      ◦ GSoC '12 (FreeBSD): BIOS support for BHyVe
      ◦ Research assistant at IIJ research laboratory, implementing BCube for Linux (today's topic!)
    • BCube is a new network architecture designed for shipping-container-based modular data centers. It has a server-centric network structure in which servers act both as end hosts and as relay nodes for each other. The paper was published at ACM SIGCOMM '09 by Microsoft Research Asia.
    • Each server has one connection to each layer. Switches never connect to other switches. Servers relay traffic for each other.
      [Figure: BCube topology with eight servers (000 to 111) connected through three layers of switches, grouped into nested BCube0, BCube1, and BCube2 units.]
    • BCube_k has k + 1 layers. BCube_x contains n BCube_(x-1) units. BCube_0 contains n servers. Total servers = n^(k+1).
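The sizing rule above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, the switch count (n^k switches per layer) comes from the BCube paper rather than from this slide, and addresses are written as digit strings, which only works for n up to 10.

```python
from itertools import product


def bcube_size(n: int, k: int) -> dict:
    """Return element counts for a BCube_k built from n-port switches."""
    layers = k + 1
    servers = n ** (k + 1)        # from the slide: total servers = n^(k+1)
    switches_per_layer = n ** k   # assumption taken from the BCube paper
    return {
        "layers": layers,
        "servers": servers,
        "switches": layers * switches_per_layer,
    }


def bcube_addresses(n: int, k: int) -> list:
    """Enumerate every server address, highest-layer digit first."""
    return ["".join(str(d) for d in digits)
            for digits in product(range(n), repeat=k + 1)]


# The eight-server example used throughout the slides is n = 2, k = 2.
print(bcube_size(2, 2))
print(bcube_addresses(2, 2))
```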
    • High network capacity for various traffic patterns: one-to-one, one-to-all, one-to-several, and all-to-all. Performance degrades gracefully as server/switch failures increase. No special hardware is needed; only commodity switches are used.
    • Each server has a unique BCube address. Each digit gives the port number on the switch that the server connects to in the corresponding layer.
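To make the digit-to-port relationship concrete, here is a small sketch. It assumes the convention from the BCube paper that a server's switch in layer i is identified by the server address with digit i removed; the function name and the tuple used as a switch identifier are hypothetical.

```python
def switch_for(addr: str, layer: int) -> tuple:
    """Return (switch_id, port) used by server `addr` at `layer`.

    `addr` is written highest layer first, so layer i is digit addr[-1 - i].
    The switch id is (layer, remaining digits), an assumed naming scheme.
    """
    if not 0 <= layer < len(addr):
        raise ValueError("layer out of range for this address")
    pos = len(addr) - 1 - layer
    port = int(addr[pos])                              # the digit is the port number
    switch_id = (layer, addr[:pos] + addr[pos + 1:])   # same for all servers on that switch
    return switch_id, port


# Servers 000 and 100 differ only in the layer-2 digit,
# so they sit on the same layer-2 switch at ports 0 and 1.
print(switch_for("000", 2))
print(switch_for("100", 2))
```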
    • Default routing rule: top layer → bottom layer. Example: the route from 000 to 111 is 000 → 100 → 110 → 111.
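The default rule amounts to correcting one differing address digit per hop, starting from the top layer (the leftmost digit). A minimal sketch, with a hypothetical function name and addresses as digit strings:

```python
def default_route(src: str, dst: str) -> list:
    """Correct one differing digit per hop, from the top layer down."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    path = [src]
    current = list(src)
    for pos in range(len(src)):          # pos 0 is the top layer
        if current[pos] != dst[pos]:
            current[pos] = dst[pos]      # one hop through the switch of that layer
            path.append("".join(current))
    return path


print(" -> ".join(default_route("000", "111")))  # 000 -> 100 -> 110 -> 111
```

Layers whose digits already match are skipped, so the hop count equals the number of differing digits.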
    • There are alternate routes between any two nodes. They can bypass failed servers and switches, and can also accelerate throughput by parallelizing traffic.
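One way to see where alternate routes come from is to vary the order in which address digits are corrected. The sketch below rotates the starting layer; it is a simplification for illustration, not the path-set algorithm from the BCube paper (which also detours through neighbours when digits are equal), and the function name is hypothetical.

```python
def rotated_routes(src: str, dst: str) -> list:
    """Build one route per starting layer by rotating the digit-correction order."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same number of digits")
    width = len(src)
    routes = []
    for start in range(width):
        order = [(start + i) % width for i in range(width)]
        current = list(src)
        path = [src]
        for pos in order:
            if current[pos] != dst[pos]:
                current[pos] = dst[pos]
                path.append("".join(current))
        if path not in routes:           # identical rotations collapse into one route
            routes.append(path)
    return routes


for route in rotated_routes("000", "111"):
    print(" -> ".join(route))
```

For 000 to 111 the three routes share no intermediate server, so a failure on one of them leaves the others usable.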
    • The source server decides the best path for a flow, bypassing failed paths. To propagate the routing path, the source server writes the path information into the packet header.
    • A BCube header is added between the Ethernet header and the IP header. It carries the source/destination addresses and also the routing path information, in the "Next Hop Index Array".
      [Figure: packet layout. Ethernet header, then BCube header (BCube dest address, BCube source address, protocol type, Next Hop Index Array), then IP header.]
    • Evaluating various "data center network" technologies, especially for container-based modular data center architectures. BCube is one of the candidates.
    • Try to use existing code as much as possible, with a minimal implementation at first. A BCube device binds multiple interfaces and is assigned a BCube address and an IP address. What is the most similar function that already exists on Linux? → Bridge!
      ◦ Forked bridge.ko and the brctl command, naming them bcube.ko and bcctl
    • Modified addbr/delbr to take 3 extra arguments: the BCube address, and the n and k parameters.
          brctl addbr <bridge>
          brctl delbr <bridge>
          ↓
          bcctl addbc <bcube> <bcaddr> <N> <K>
          bcctl delbc <bcube>
      ◦ The BCube address uses the MAC address format/size: 101 → 00:00:00:01:00:01
      ◦ The BCube address is used as the HW address of the BCube device, so it works like a fake MAC address in the Linux network stack
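The address mapping can be sketched as follows. The exact encoding is an assumption: one BCube digit per octet, right-aligned in a six-octet MAC-style address, which is one reading of the slide's 101 example. Both function names are hypothetical.

```python
def bcaddr_to_mac(digits: str) -> str:
    """Format a BCube address (digit string) as a MAC-style address.

    Assumed encoding: one digit per octet, right-aligned in six octets.
    """
    if not (digits.isascii() and digits.isdigit()) or not 1 <= len(digits) <= 6:
        raise ValueError("expected 1 to 6 decimal digits")
    octets = [0] * (6 - len(digits)) + [int(d) for d in digits]
    return ":".join(f"{o:02x}" for o in octets)


def mac_to_bcaddr(mac: str, width: int) -> str:
    """Recover the last `width` digits from a MAC-style BCube address."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6 or any(o > 9 for o in octets[-width:]):
        raise ValueError("not a BCube address in the assumed encoding")
    return "".join(str(o) for o in octets[-width:])


print(bcaddr_to_mac("101"))
```

Reusing the MAC format means the existing hardware-address fields in the network stack can carry a BCube address without resizing.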
    • Modified addif/delif into assignif/unassignif, adding a layer number to the arguments.
          brctl addif <bridge> <device>
          brctl delif <bridge> <device>
          ↓
          bcctl assignif <bcube> <layer> <device>
          bcctl unassignif <bcube> <layer> <device>
    • Address resolution needs to be reconsidered.
      ◦ Normal Ethernet: IP address → MAC address (ARP)
      ◦ BCube network: IP address → BCube address → ARP?
      ◦ BCube network: (neighbor) BCube address → MAC address → needs an additional neighbor discovery protocol
    • Once broadcast works in the BCube implementation, ARP should work on it. I haven't implemented that yet, so I decided to configure it manually with the following command:
          arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
    • An ARP-like protocol is needed. I decided to configure this manually too, and implemented the following commands:
          bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
          bcctl delneighbour <bcube> <layer> <bcaddr>
      bcube.ko maintains a neighbor table and uses it when transmitting/forwarding packets.
    • bridge.ko maintains an FDB (forwarding database), a hash table used to look up destination MAC address → output port. I deleted the FDB and implemented a function that decides the next-hop BCube address, the output port, and the MAC address of the next hop. Source routing is not implemented yet; only default routing for now.
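The forwarding decision described above can be modelled in user space. This is only a sketch of the idea, not the bcube.ko code: the class and method names are hypothetical, the neighbour table is keyed by (layer, BCube address) to mirror the bcctl addneighbour arguments, and the MAC value in the usage line is made up.

```python
class BCubeNode:
    """Toy model of default-routing next-hop selection with a static neighbour table."""

    def __init__(self, addr: str):
        self.addr = addr
        self.neighbours = {}  # (layer, bcaddr) -> MAC address string

    def add_neighbour(self, layer: int, bcaddr: str, mac: str) -> None:
        self.neighbours[(layer, bcaddr)] = mac

    def del_neighbour(self, layer: int, bcaddr: str) -> None:
        self.neighbours.pop((layer, bcaddr), None)

    def next_hop(self, dst: str):
        """Return (next-hop BCube address, output layer, next-hop MAC), or None if local."""
        for pos, (mine, theirs) in enumerate(zip(self.addr, dst)):
            if mine != theirs:
                layer = len(self.addr) - 1 - pos          # leftmost digit is the top layer
                nxt = self.addr[:pos] + theirs + self.addr[pos + 1:]
                mac = self.neighbours.get((layer, nxt))
                if mac is None:
                    raise LookupError(f"no neighbour entry for {nxt} on layer {layer}")
                return nxt, layer, mac
        return None  # destination is this node


node = BCubeNode("000")
node.add_neighbour(2, "100", "02:00:00:00:01:00")  # hypothetical MAC
print(node.next_hop("111"))
```

The output layer stands in for the output port: it selects the device that was assigned to that layer.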
    • Default routing: top layer → bottom layer. Example: the route from 000 to 111 is 000 → 100 → 110 → 111.
    • To add the BCube header between the Ethernet header and the IP header, I forked net/ethernet/eth.c so that it handles the BCube header:
      ◦ ETH_HLEN (14 bytes) → BCUBE_HLEN (24 bytes)
      ◦ struct ethhdr (MAC header) → struct bcubehdr (MAC & BCube header)
      ◦ eth_header_ops → bc_header_ops
      Unfortunately, GRO accesses the Ethernet header directly and runs before BCube handles a packet, so it needs to be disabled.
    • Found a way to implement a new L2 framework using the existing bridge implementation.
      ◦ Much easier than implementing it from scratch
      Development status:
      ◦ Basic features are implemented; debugging now
      ◦ Considering more features: broadcast/multicast; intermediate node/switch failure detection and rerouting; source routing; an address resolution protocol
      Planning a more detailed evaluation in our data center testbed. Any comments and suggestions are welcome.
    • This work was done as part of research assistant work at IIJ research laboratory.