Takuya ASADA <syuu@dokukino.com>
                     @syuu1228
  • Previously at an embedded software company,
    working on SMP support for router firmware
  • Ph.D. student at Tokyo University of Technology,
    researching improvements to network I/O
    architecture on modern x86 servers
  • Interested in: SMP, Network, Virtualization
  • GSoC ’11 (FreeBSD): Multithread support for BPF
  • GSoC ’12 (FreeBSD): BIOS support for BHyVe
  • Research assistant at IIJ research laboratory,
    implementing BCube for Linux (today’s topic!)
  • BCube is a new network architecture
  • Designed for shipping-container-based
    modular data centers
  • Server-centric network structure
    ◦ Servers act as
      ▪ End hosts
      ▪ Relay nodes for each other
  • The paper was published at ACM SIGCOMM ’09 by
    Microsoft Research Asia
  • Each server has one connection to each layer
  • Switches never connect to other switches
  • Servers relay traffic for each other
    [Figure: BCube topology with eight servers (000 to 111) at the bottom and three layers of switches (layers 0 to 2) above them; the nested Bcube0, Bcube1 and Bcube2 groupings are marked]
  • $BCube_k$ has $k + 1$ layers
  • $BCube_x$ contains $n$ $BCube_{x-1}$s
  • $BCube_0$ contains $n$ servers
  • Total servers = $n^{k+1}$
    [Figure: BCube topology (servers 000 to 111, switch layers 0 to 2, nested Bcube0, Bcube1 and Bcube2)]
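
The counts on this slide can be checked with a few lines of Python. This is a minimal sketch and not part of the talk: `bcube_size` is a hypothetical helper, and the switches-per-layer figure assumes that every switch has n ports and that all of them connect to servers (each layer has to offer one port to each of the n^(k+1) servers).

```python
def bcube_size(n, k):
    """Return basic counts for a BCube_k built from n-port switches.

    Hypothetical helper for illustration; assumes all n ports of every
    switch are connected to servers.
    """
    if n < 2 or k < 0:
        raise ValueError("need n >= 2 and k >= 0")
    layers = k + 1
    servers = n ** (k + 1)
    # Each layer gives every server one port, so it needs servers / n switches.
    switches_per_layer = servers // n
    return {
        "layers": layers,
        "servers": servers,
        "switches_per_layer": switches_per_layer,
        "switches": layers * switches_per_layer,
    }


if __name__ == "__main__":
    # The topology drawn on the slides uses n = 2 and k = 2.
    print(bcube_size(2, 2))
```

For n = 2 and k = 2 this gives 3 layers and 8 servers, which matches the eight server addresses 000 to 111 used throughout the slides.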
  • High network capacity for various traffic
    patterns
    ◦ one-to-one
    ◦ one-to-all
    ◦ one-to-several
    ◦ all-to-all
  • Performance degrades gracefully as
    server/switch failures increase
  • Needs no special hardware, only
    commodity switches
  • Each server has a unique BCube address
  • Each digit is the server's port number on the
    switch in the corresponding layer
    [Figure: BCube topology (servers 000 to 111, switch layers 0 to 2, nested Bcube0, Bcube1 and Bcube2)]
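
One way to read the addressing rule: digit i of a server address (counting from the right, starting at 0) is the port that server uses on its layer-i switch, and the remaining digits select which layer-i switch that is. The sketch below spells this out. The switch identifier built from the remaining digits is an assumed labelling for illustration, not the labelling used in the figure, and addresses are handled as strings of decimal digits, so it only covers n ≤ 10.

```python
def attachments(addr):
    """List (layer, switch_id, port) for each layer a server connects to."""
    k = len(addr) - 1
    result = []
    for layer in range(k + 1):
        pos = k - layer                          # layer 0 is the rightmost digit
        port = int(addr[pos])
        switch_id = addr[:pos] + addr[pos + 1:]  # assumed labelling
        result.append((layer, switch_id, port))
    return result


def shared_layer(a, b):
    """Return the layer whose switch connects a and b directly, or None."""
    if len(a) != len(b):
        raise ValueError("addresses must have the same length")
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 1:
        return None                              # same server, or more than one hop apart
    return len(a) - 1 - diff[0]
```

Under this reading, two servers sit on the same switch exactly when their addresses differ in a single digit, for example 000 and 100 on a layer-2 switch.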
  • Default routing rule
    ◦ Top layer → bottom layer, correcting one address digit per hop
    ◦ Ex: Route from 000 to 111
      000 → 100 → 110 → 111
    [Figure: BCube topology (servers 000 to 111, switch layers 0 to 2, nested Bcube0, Bcube1 and Bcube2)]
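
The default rule can be sketched as a loop that walks the address from the leftmost digit (top layer) to the rightmost (bottom layer) and fixes each digit that differs from the destination. `default_route` is a hypothetical name; addresses are plain digit strings, as on the slide.

```python
def default_route(src, dst):
    """Return the list of servers visited, correcting digits top layer first."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same length")
    path = [src]
    cur = list(src)
    for pos in range(len(cur)):      # position 0 is the top layer
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]      # one hop through the switch of that layer
            path.append("".join(cur))
    return path


if __name__ == "__main__":
    print(" -> ".join(default_route("000", "111")))
```

Digits that already match are skipped, so the number of hops equals the number of differing digits.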
  • There are alternate routes between any two nodes
  • They can bypass failed servers and switches
  • They can also increase throughput by
    carrying traffic in parallel
    [Figure: BCube topology (servers 000 to 111, switch layers 0 to 2, nested Bcube0, Bcube1 and Bcube2)]
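
A simple way to see where alternate routes come from is to change the order in which address digits are corrected. The sketch below starts the correction at each layer in turn and wraps around. It is only an illustration with hypothetical function names; it is not the path-selection algorithm from the BCube paper, and it finds fewer routes when some digits of the two addresses already match.

```python
def route_with_order(src, dst, order):
    """Correct digits in the given order of positions; return visited servers."""
    path = [src]
    cur = list(src)
    for pos in order:
        if cur[pos] != dst[pos]:
            cur[pos] = dst[pos]
            path.append("".join(cur))
    return path


def rotated_routes(src, dst):
    """Routes obtained by starting the digit correction at each position."""
    if len(src) != len(dst):
        raise ValueError("addresses must have the same length")
    m = len(src)
    routes = []
    for start in range(m):
        order = [(start + i) % m for i in range(m)]
        route = route_with_order(src, dst, order)
        if route not in routes:      # drop duplicates when digits already match
            routes.append(route)
    return routes


def intermediates(route):
    """Servers strictly between the source and the destination."""
    return set(route[1:-1])
```

For 000 to 111 this yields three routes (via 100 and 110, via 010 and 011, via 001 and 101) that share no intermediate server, so one can carry traffic while another is broken, or all three can be used in parallel.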
  • The source server decides the best path for a flow
  • Failed paths are bypassed
  • To propagate the routing path, the source server
    writes the path information into the packet
    header
  • A BCube header is added between the Ethernet header
    and the IP header
  • It holds the src/dst addresses and the routing path
    information in the “Next Hop Index Array”

          Ethernet Header
          BCube Header:   BCube dest address
                          BCube source address
                          Protocol type
                          Next Hop Index Array
          IP Header
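
The placement of the header can be sketched with Python's `struct` module. The slides do not give field widths, so everything about the layout below is an assumption: 4-byte BCube addresses and a 2-byte protocol type are chosen only so that the Ethernet header (14 bytes) plus the BCube fields add up to the 24 bytes quoted for BCUBE_HLEN later in the talk, the Next Hop Index Array is left out, and 0x88B5 (an EtherType reserved for local experimental use) is a placeholder rather than the value the real module uses.

```python
import struct

ETH_HLEN = 14
BCUBE_HLEN = 24          # Ethernet + BCube fields, as quoted in the talk
ETH_P_BCUBE = 0x88B5     # placeholder EtherType (local experimental range)
ETH_P_IP = 0x0800


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError("MAC address must have 6 octets")
    return bytes(int(p, 16) for p in parts)


def build_frame(dst_mac, src_mac, bc_dst, bc_src, payload):
    """Ethernet header, then the assumed BCube fields, then the IP packet."""
    eth = struct.pack("!6s6sH", mac_bytes(dst_mac), mac_bytes(src_mac), ETH_P_BCUBE)
    bcube = struct.pack("!IIH", bc_dst, bc_src, ETH_P_IP)
    return eth + bcube + payload


def parse_frame(frame):
    """Split a frame built by build_frame back into its fields."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:ETH_HLEN])
    bc_dst, bc_src, proto = struct.unpack("!IIH", frame[ETH_HLEN:BCUBE_HLEN])
    return {
        "ethertype": ethertype,
        "bc_dst": bc_dst,
        "bc_src": bc_src,
        "proto": proto,
        "payload": frame[BCUBE_HLEN:],
    }
```

The Ethernet addresses here are those of the next hop, while the BCube addresses identify the end-to-end source and destination, which is why both layers of addressing appear in one frame.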
  • Evaluating various "Data Center Network"
    technologies, especially for container-based
    modular data center architectures.
    BCube is one of the candidates.
  • Reuse existing code as much as possible
  • Minimal implementation at first

  • BCube binds multiple interfaces into one device,
    which gets a BCube address and an IP address
  • Which existing Linux function is most similar? → Bridge!
    ◦ Forked bridge.ko and the brctl command,
      naming them bcube.ko and bcctl
    brctl addbr <bridge>
    brctl delbr <bridge>
                        ↓
    bcctl addbc <bcube> <bcaddr> <N> <K>
    bcctl delbc <bcube>
  • Modified addbr/delbr; addbc takes 3 more args
    ◦ BCube address
    ◦ n and k parameters
  • BCube addresses use the MAC address format/size
                 101   → 00:00:00:01:00:01
  • The BCube address is used as the HW address of the BCube
    device
    ◦ To the Linux network stack it looks like a (fake) MAC address
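
The mapping from a BCube address to a MAC-formatted string can be sketched as follows, reading the slide's example as one BCube digit per octet, right-aligned in a six-octet MAC-sized field. That reading is an interpretation, not something the slides state, and the real bcctl encoding may differ; both function names are hypothetical.

```python
DIGITS = "0123456789"


def bcaddr_to_mac(bcaddr):
    """Encode a BCube address such as '101' as a MAC-formatted string."""
    if not bcaddr or len(bcaddr) > 6 or any(c not in DIGITS for c in bcaddr):
        raise ValueError("expected 1 to 6 decimal digits")
    octets = [0] * (6 - len(bcaddr)) + [int(c) for c in bcaddr]
    return ":".join("%02x" % o for o in octets)


def mac_to_bcaddr(mac, length):
    """Recover the last `length` digits of a BCube address from its MAC form."""
    octets = [int(p, 16) for p in mac.split(":")]
    if len(octets) != 6 or not 1 <= length <= 6:
        raise ValueError("expected a 6-octet MAC and a length of 1 to 6")
    if any(o > 9 for o in octets[-length:]):
        raise ValueError("octet does not hold a single decimal digit")
    return "".join(str(o) for o in octets[-length:])
```

Because the result has the size and format of a MAC address, it can be stored wherever the network stack expects a hardware address.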
    brctl addif <bridge> <device>
    brctl delif <bridge> <device>
                         ↓
    bcctl assignif <bcube> <layer> <device>
    bcctl unassignif <bcube> <layer> <device>
  • Modified addif/delif into assignif/unassignif, adding
    a layer number argument
  • Address resolution needs to be reconsidered
  • Normal Ethernet
    ◦ IP address → MAC address (ARP)
  • BCube network
    ◦ IP address → BCube address
      → ARP?
    ◦ (Neighbor) BCube address → MAC address
      → Needs an additional neighbor discovery protocol
  • Once broadcast works in the BCube
    implementation, ARP should work on top of it
  • Broadcast is not implemented yet, so for now entries are
    configured manually with the following command:
    arp -i bc0 -s 10.0.0.6 00:00:00:01:00:10
  • (Neighbor) BCube address → MAC address needs an ARP-like protocol
  • This is also configured manually for now,
    using the following new commands:
    bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
    bcctl delneighbour <bcube> <layer> <bcaddr>
  • bcube.ko maintains a neighbor table and uses it
    when transmitting/forwarding packets
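
The two manual tables can be pictured as two lookups: IP address to BCube address (the static `arp -s` entries) and, per layer, neighbour BCube address to real MAC address (the `addneighbour` entries). The class below is a user-space sketch of that idea only; the names are hypothetical, the MAC address in the usage note is made up, and the real bcube.ko data structure is not shown in the slides.

```python
class NeighbourTable:
    """Maps (layer, neighbour BCube address) to the neighbour's MAC address."""

    def __init__(self):
        self._entries = {}

    def add(self, layer, bcaddr, mac):
        # Mirrors: bcctl addneighbour <bcube> <layer> <bcaddr> <macaddr>
        self._entries[(layer, bcaddr)] = mac.lower()

    def delete(self, layer, bcaddr):
        # Mirrors: bcctl delneighbour <bcube> <layer> <bcaddr>
        self._entries.pop((layer, bcaddr), None)

    def lookup(self, layer, bcaddr):
        """Return the MAC address, or None if the neighbour is unknown."""
        return self._entries.get((layer, bcaddr))


def resolve_ip(static_arp, ip):
    """First stage: IP address to BCube address, from static entries."""
    return static_arp.get(ip)
```

A usage example: after `table.add(2, "100", "52:54:00:AA:BB:CC")`, the call `table.lookup(2, "100")` returns the stored MAC in lower case, and it returns `None` again once `table.delete(2, "100")` has run.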
  • bridge.ko maintains an FDB (forwarding
    database), a hash table used to look up
    destination MAC address → output port
  • Deleted the FDB and implemented a function that decides
    the next-hop BCube address, the output port, and
    the MAC address of the next hop
  • Source routing is not implemented yet; only
    default routing for now
  • Top layer → bottom layer
  • Ex: Route from 000 to 111
    000 → 100 → 110 → 111

    [Figure: BCube topology (servers 000 to 111, switch layers 0 to 2, nested Bcube0, Bcube1 and Bcube2)]
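
The per-packet decision that replaces the FDB lookup can be sketched as follows: find the highest layer whose digit still differs from the destination, take the neighbour with that digit corrected as the next hop, send on the device assigned to that layer, and address the frame to that neighbour's MAC. This is a user-space illustration of default routing only, not the kernel code; the function names, device names and MAC address are hypothetical.

```python
def next_hop(cur, dst):
    """Return (layer, next-hop BCube address), or None if cur is the destination."""
    if len(cur) != len(dst):
        raise ValueError("addresses must have the same length")
    k = len(cur) - 1
    for pos in range(len(cur)):          # leftmost digit belongs to the top layer
        if cur[pos] != dst[pos]:
            return k - pos, cur[:pos] + dst[pos] + cur[pos + 1:]
    return None


def forward_decision(cur, dst, ports, neighbours):
    """Combine routing with the per-layer device and neighbour MAC lookups.

    ports:      layer -> device name (as set up with bcctl assignif)
    neighbours: (layer, BCube address) -> MAC (as set up with bcctl addneighbour)
    """
    hop = next_hop(cur, dst)
    if hop is None:
        return None                      # packet is for this server
    layer, addr = hop
    return {"next_hop": addr, "device": ports[layer], "mac": neighbours[(layer, addr)]}
```

Each relay server runs the same decision on its own address, so a packet from 000 to 111 leaves on the layer-2 device toward 100, then the layer-1 device toward 110, then the layer-0 device toward 111.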
  • To add the BCube header between the Ethernet header
    and the IP header, I forked net/ethernet/eth.c
  • ETH_HLEN (14 bytes)
    → BCUBE_HLEN (24 bytes)
  • struct ethhdr (MAC header)
    → struct bcubehdr (MAC & BCube header)
  • eth_header_ops → bc_header_ops,
    to handle the BCube header
  • Unfortunately, GRO accesses the Ethernet header
    directly and runs before BCube handles a
    packet, so it needs to be disabled
  • Found a way to implement a new L2 framework
    on top of the existing bridge implementation
    ◦ Much easier than implementing it from scratch
  • Development status
    ◦ Basic features implemented; debugging now
    ◦ Considering more features
      ▪ broadcast / multicast
      ▪ intermediate node/switch failure detection and
        rerouting
      ▪ source routing
      ▪ address resolution protocol
  • Planning a more detailed evaluation in our data center
    testbed
  • Any comments and suggestions are welcome :)
This work was done as part of research
assistance work at IIJ research laboratory.

Implementing a layer 2 framework on linux network
