Accelerate networking innovation through programmable data plane 
Removing switches from datacenters with TRILL/VNT and smartNIC 
Ahmed Amamou, ahmed@gandi.net 
Benoît Ganne, bganne@kalray.eu
•Gandi has been a domain name registrar since 1999 and a cloud provider since 2008
•We provide both
–IaaS: Infrastructure as a Service
–PaaS: Platform as a Service
•We support the open source community:
–Provide open source code: https://github.com/Gandi
–Support open source projects: VLC, Debian, … *
* Check http://www.gandi.net/supports/ for an exhaustive list
Who is Gandi? 
2
New network challenges for IaaS
3 
•Cisco Forecast report*: 
–Cloud traffic was about 3.3 zettabytes (10^21 bytes) in 2013
–Cloud traffic will reach 6.6 zettabytes in 2016
–76% of cloud traffic is East-West (within the same datacenter)
→ A high density of links within a datacenter is needed
•Customers need full network access
–Should be isolated
–VM network configuration should not be restrictive
→ Overlaying tenant traffic should be considered
* Cisco Global Cloud Index Forecast and Methodology, 2011-2016.
•New protocols are proposed to solve these problems (TRILL, VXLAN, 802.1ad, STT …) but:
–Hardware integration is slow
–Protocol extensions are hard to integrate
•We believe the OpenCompute community can help us
–Define an open, vendor-neutral API for a programmable data plane
–Bring open hardware that fulfills those needs
Why OpenCompute? 
4
•Switch from a classic datacenter architecture to a full-mesh one
•Upgrade hardware to improve performance
New datacenter architecture 
5
TRILL @Gandi 
6 
•Gandi has used commodity hardware as TRILL RBridges since 2013
•We have not yet found hardware that suits our needs.
•Layer 2 routing protocol
•Uses a control plane and a data plane
•Control plane: based on IS-IS, which computes all routing information
•Data plane: forwards packets using the information provided by the control plane
•Uses MAC-in-MAC encapsulation
TRILL: TRansparent Interconnection of Lots of Links
7 
Original payload 
TRILL Header
TRILL benefits 
8 
Feature | Switching (L2) | Routing (L3) | TRILL
Configuration | Minimal | Intense | Minimal
Plug & play | Yes | No | Yes
Discovery | Automatic | Configured | Automatic
Learning | Automatic | Configured | Automatic
Multipath | No | Yes | Yes
Convergence | Slow | Fast | Fast
Connectivity | Inflexible | Flexible | Flexible
Scale | Limited | Large | Large
Control Plane: Forwarding database 
9
Multitenancy: Virtual Network over TRILL (VNT) 
10 
New cloud architectures have to take multitenancy into account
TRILL does not provide mechanisms for handling multitenancy
→ We need to extend it
•Update both control and data planes
–Control plane: prune the multicast tree to limit multicast traffic
–Data plane: forwarding is conditioned by VNI support
VNT vs TRILL 
11 
VNT Encapsulation (frame layout, outermost field first)
–Outer Destination MAC Address
–Outer Source MAC Address
–Optional Outer IEEE 802.1Q
–TRILL Header: Egress RBridge Nickname, Ingress RBridge Nickname (L2 routing information)
–VNT Header Extensions: Options description TLV, VNI Tag (24 bits) (tenant identification)
–Original Packet Payload (original Ethernet frame)
Publication: 
Amamou, A., Haddadou, K., & Pujolle, G. (2014). 
A TRILL-based multi-tenant data center network. Computer Networks.
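The encapsulation above can be sketched as a small packing routine. This is only an illustration under stated assumptions: the first six bytes follow the TRILL header layout as we understand it from RFC 6325 (version, reserved, multi-destination bit, option length, hop count, then the egress and ingress nicknames), while the 4-byte VNT option word (8 zero bits followed by the 24-bit VNI) and the function names `pack_trill_vnt` / `unpack_trill_vnt` are hypothetical and not taken from the VNT publication. The outer Ethernet header and the inner frame are left out.

```python
import struct


def pack_trill_vnt(egress, ingress, vni, hop_count=32, multi_dest=False):
    """Pack a TRILL header plus a hypothetical 4-byte VNT option word."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= hop_count < (1 << 6):
        raise ValueError("hop count must fit in 6 bits")
    op_length = 1  # options length in 4-byte words: one word for the VNI
    # Version (2 bits) = 0, reserved (2 bits) = 0, then M, Op-Length, Hop Count.
    first16 = (int(multi_dest) << 11) | (op_length << 6) | hop_count
    trill = struct.pack("!HHH", first16, egress, ingress)
    vnt = struct.pack("!I", vni)  # assumed layout: 8 zero bits + 24-bit VNI
    return trill + vnt


def unpack_trill_vnt(data):
    """Inverse of pack_trill_vnt for the same assumed layout."""
    first16, egress, ingress = struct.unpack("!HHH", data[:6])
    (word,) = struct.unpack("!I", data[6:10])
    return {
        "multi_dest": bool((first16 >> 11) & 0x1),
        "hop_count": first16 & 0x3F,
        "egress": egress,
        "ingress": ingress,
        "vni": word & 0xFFFFFF,
    }
```

Keeping the VNI in the TRILL option area means an RBridge that ignores options can still forward on the nicknames alone, which is one plausible reason to place tenant identification there.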
VNT: Multicast tree pruning 
12 
[Figure: two panels, "Topology" and "Multicast tree", showing RBridges n1 to n8 with interfaces i1 to i3; end hosts A and B are attached to Vni1.]
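Multicast tree pruning can be sketched as follows. This is a minimal illustration, not the VNT control plane itself: the tree is given as a parent-to-children mapping, each RBridge advertises the set of VNIs it serves, and a branch is kept only if it leads to at least one RBridge serving the requested VNI. The function name `prune_tree` and the small example tree used to exercise it are assumptions; they are not the topology from the slide.

```python
def prune_tree(children, vni_members, root, vni):
    """Return the part of the tree that leads to RBridges serving `vni`.

    children:    dict mapping an RBridge to the list of its child RBridges
    vni_members: dict mapping an RBridge to the set of VNIs it serves
    """
    pruned = {}

    def visit(node):
        # Visit every child; keep those whose subtree reaches a member.
        kept = [child for child in children.get(node, []) if visit(child)]
        if kept:
            pruned[node] = kept
        return bool(kept) or vni in vni_members.get(node, set())

    visit(root)
    return pruned


# Hypothetical tree rooted at n1.
children = {"n1": ["n2", "n3"], "n2": ["n4", "n5"], "n3": ["n6"]}
vni_members = {"n4": {1}, "n5": {2}, "n6": {2}}
tree_for_vni1 = prune_tree(children, vni_members, "n1", 1)
```

For VNI 1 only the branch n1, n2, n4 survives, so multicast traffic for that tenant never reaches n3, n5 or n6.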
Current VNT implementation on Linux 
13 
Control plane: Quagga daemon
Data plane: Linux Bridge Module
https://github.com/Gandi/
•Throughput is affected by the additional processing
•Processing for a single packet is not affected 
Data plane: performance 
15 
Throughput 
Delay
•Shift the data plane from the host to a smartNIC
–Increase performance
–Free the x86 for other uses
•e.g. customer workloads
Improving performance 
16 
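[Figure: two hosts side by side. With a plain NIC, the host runs both the control plane and the data plane; with a smartNIC, the host keeps the control plane and the data plane runs on the smartNIC.]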
•Founded in 2008, fabless semiconductor company 
•Kalray has developed the disruptive MPPA® (Multi-Purpose Processing Array) programmable architecture 
–Leading Performance / Energy Ratio Worldwide 
–Time predictability and low latency 
–Heterogeneous applications on the same chip 
–High programmability 
•Working with industry-leading partners and customers 
•55 employees 
•Offices in France and US 
KALRAY deterministic supercomputing on a chip 
17 
First MPPA®-256 Chips with CMOS 28nm TSMC Leading Performance / Energy Ratio Worldwide
Software Defined NIC 
Smart packet classification/dispatching 
256 cores for packet processing
Standard C/C++ with GCC-4.9 
Advanced debugging and profiling 
Low latency 
Zero-copy Ethernet ↔ PCIe
< 1μs port-to-port transparent mode 
< 1μs port to system memory 
System integration 
Linux support 
Virtualization support 
Low power 
High throughput / Line rate 
80 Gbps full-duplex line-rate (2x120MPPS) 
3400 instructions per packet @64B 
AES, SHA-1, SHA-2, CRC accelerators
2 x PCIe Gen3 8-lanes 
MPPA®-256 Bostan Networking Strengths 
18
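As a sanity check on the line-rate figure, the packet rate for minimum-size frames can be worked out from standard Ethernet framing: each 64-byte frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap on the wire. The helper name `max_packet_rate` is ours; the 80 Gbps link speed comes from the slide.

```python
PREAMBLE_SFD_BYTES = 8      # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # minimum idle time between frames, in byte times


def max_packet_rate(link_bps, frame_bytes=64):
    """Packets per second at line rate for frames of `frame_bytes` bytes."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


rate_80g = max_packet_rate(80e9)   # about 119 million packets per second
rate_10g = max_packet_rate(10e9)   # about 14.88 million packets per second
```

About 119 Mpps in one direction appears consistent with the quoted 2x120 MPPS, if that figure is read as roughly 120 Mpps in each direction of a full-duplex link.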
MPPA®-256 Bostan 
•64-bit processor 
•Up to 800MHz 
•High Performance 
–845 GFLOPS SP / 422 GFLOPS DP 
–1 TOPS 
•High Bandwidth Network On a Chip 
–2 x 12.8 GB/s 
•High Speed Ethernet 
–Up to 2x40 Gbps / 2x120 MPPS @ 64B 
•DDR3 Memory interfaces 
–2 x 64-bit + ECC @2133MT/s / 2 x 17GB/s 
•PCIe Gen3 interface 
–2 x 8-lanes / 2 x 8 GB/s full duplex 
–End Point / Root Complex 
•NoCX extension 
–2 x 40 Gbps + 2 x 80 Gbps ILK 
•Flash controller, GPIOs… 
19
MPPA®-256 Processor Hierarchical Architecture 256 Processing Engine cores + 32 Resource Management cores 
20 
Manycore Processor 
Compute Cluster 
VLIW Core 
Instruction Level Parallelism 
Thread Level Parallelism 
Process Level Parallelism
High Speed Ethernet Packet processing 
•Ethernet Rx dispatcher 
–8 classification tables 
•Classify 
•Extract fields 
•Smart Dispatch 
–Round-robin dispatch
–Flexible core allocation
•Round-robin vs. classification
•Per 10G port
• Ethernet Tx 
–64 Tx FIFOs 
–QoS between the FIFOs 
–Flow Control between clusters and Tx FIFOs 
21 
Patent pending
VNT on a programmable data plane Multicast forwarding example 
22 
MPPA Linux ethernet driver 
Linux networking stack 
TRILL controller 
Kalray Bostan smartNIC 
x86 
Hypervisor 
MPPA Linux ethernet driver 
Linux networking stack 
Userspace application 
•Ongoing work between Gandi and Kalray
–Explore programmable data plane opportunities
–Study the feasibility and architecture of a VNT smartNIC
•Multicast forwarding puts a high load on each node
IO ethernet driver 
8x10GbE
•Dispatch the packet based on the Egress RBridge
–In case of multicast, the Egress RBridge is set to the tree root
–Each cluster “owns” a subset of the possible Egress RBridges (i.e. a FIB subset)
if (Packet[Ethertype] == TRILL) { 
send to cluster #HASH(Egress RBridge) 
} 
<Ethertype=TRILL, Egress=DTROOT, VNI=VNI-1>
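The dispatch rule above can be sketched in a few lines. This is an illustration only: the slide does not say which hash is used, so a plain modulo on the 16-bit egress nickname stands in for HASH(), the cluster count of 16 is an assumed parameter, and `select_cluster` is a hypothetical name. 0x22F3 is the Ethertype assigned to TRILL.

```python
TRILL_ETHERTYPE = 0x22F3  # Ethertype assigned to TRILL frames


def select_cluster(ethertype, egress_nickname, num_clusters=16):
    """Pick the cluster that owns the FIB entry for this egress RBridge.

    Returns None for non-TRILL frames, which this sketch does not handle.
    """
    if ethertype != TRILL_ETHERTYPE:
        return None
    # Placeholder hash: any deterministic function of the nickname works,
    # as long as every frame for a given RBridge lands on the same cluster.
    return egress_nickname % num_clusters


cluster = select_cluster(TRILL_ETHERTYPE, 0x1234)
```

Because the mapping is deterministic, each cluster only needs the FIB entries for the nicknames that hash to it, which is what allows the FIB to be split across clusters.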
•Look up the list of next-hop RBridges for this multicast tree
–RBridge owner clusters can be local or remote
•Look up the LIB for local ports, if any
FIB[Egress RBridge] = { 
Egress RBridge MAC; 
Egress RBridge Interface; 
MCTree = [ RBx, RBy, … ]; 
VNI = [ VNI-1, VNI-2, … ]; 
} 
LIB = { 
(Local MACx, Local Portx, VNI-1); 
… 
}
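The two tables above could be modeled roughly as follows. This is a sketch under assumptions: the field names mirror the slide's pseudocode, but the Python types, the `FibEntry` and `lookup_multicast` names, and the sample MAC addresses and port names are ours and purely illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class FibEntry:
    mac: str                                     # Egress RBridge MAC
    interface: str                               # Egress RBridge interface
    mc_tree: list = field(default_factory=list)  # next-hop RBridges of the tree
    vnis: set = field(default_factory=set)       # VNIs served by this RBridge


def lookup_multicast(fib, lib, tree_root, vni):
    """Return (next-hop RBridges, local ports) for a multicast frame."""
    entry = fib.get(tree_root)
    next_hops = list(entry.mc_tree) if entry else []
    # The LIB holds (local MAC, local port, VNI) tuples for attached VMs.
    local_ports = [port for (_mac, port, entry_vni) in lib if entry_vni == vni]
    return next_hops, local_ports


# Hypothetical table contents.
fib = {"DTROOT": FibEntry("02:00:00:00:00:01", "eth0", ["RBx", "RBy"], {1, 2})}
lib = [
    ("52:54:00:00:00:0a", "vm-port-1", 1),
    ("52:54:00:00:00:0b", "vm-port-2", 2),
]
next_hops, local_ports = lookup_multicast(fib, lib, "DTROOT", 1)
```

With these sample tables, a frame for VNI 1 on the tree rooted at DTROOT is replicated towards RBx and RBy and also delivered to the one local port in that VNI.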
•Forward the frame
–Remote
•Forward to the clusters owning the next-hop RBridges
–Local
•Decapsulate the inner frame
•Forward it to the local VM
•Check whether the RBridge supports the appropriate VNI
–If yes, forward to the RBridge
–If not, stop here
FIB[Egress RBridge] = { 
Egress RBridge MAC; 
Egress RBridge Interface; 
MCTree = [ RBx, RBy, … ]; 
VNI = [ VNI-1, VNI-2, … ]; 
}
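The VNI check can be sketched as a filter over the next-hop list. This is a minimal illustration, assuming the per-RBridge VNI lists from the FIB are available as a plain mapping; `filter_next_hops` and the sample nicknames are hypothetical.

```python
def filter_next_hops(rbridge_vnis, next_hops, vni):
    """Keep only the next-hop RBridges that support `vni`.

    rbridge_vnis: dict mapping an RBridge to the collection of VNIs it supports
    """
    return [rb for rb in next_hops if vni in rbridge_vnis.get(rb, ())]


# Hypothetical FIB contents: RBx serves VNI 1 and 2, RBy serves only VNI 2.
rbridge_vnis = {"RBx": {1, 2}, "RBy": {2}}
forward_to = filter_next_hops(rbridge_vnis, ["RBx", "RBy", "RBz"], 1)
```

Here the frame for VNI 1 is forwarded to RBx only: RBy does not support that VNI, and RBz is unknown to the FIB, so forwarding stops for both.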
•Solving SDN and network virtualization challenges requires new protocols
–e.g. VXLAN, NVGRE, TRILL/VNT…
•Efficiency generally means hardware support
…But hardware development cannot keep up with software, which slows down innovation
•Gandi and Kalray think a programmable data plane can reconcile efficiency and innovation
…But we need open ecosystems, standards and APIs
Innovation and efficiency 
29
Thank you for your attention! 
Questions? 
Ahmed Amamou, ahmed@gandi.net 
Benoît Ganne, bganne@kalray.eu
