Recent advance in netmap/VALE (mSwitch)

Michio Honda, Felipe Huici (NEC Europe Ltd.)
Giuseppe Lettieri and Luigi Rizzo (Università di Pisa)

Kernel/VM@Jimbo-cho, Japan, Dec. 8, 2013
michio.honda@neclab.eu / @michioh
Outline
•  netmap API basics
  –  Architecture
  –  How to write apps
•  VALE (mSwitch)
  –  Architecture
  –  System design
  –  Evaluation
  –  Use cases
NETMAP API BASICS
netmap overview
•  A fast packet I/O mechanism between the NIC and user space
  –  Removes unnecessary metadata (e.g., sk_buff) allocation
  –  Amortized system call costs, reduced/removed data copies

From http://info.iet.unipi.it/~luigi/netmap/
Performance
•  Saturates a 10 Gbps pipe even at low CPU frequencies

From http://info.iet.unipi.it/~luigi/netmap/
netmap API (initialization)
•  open("/dev/netmap") returns a file descriptor
•  ioctl(fd, NIOCREG, arg) puts an interface in netmap mode
•  mmap(…, fd, 0) maps buffers and rings

From http://info.iet.unipi.it/~luigi/netmap/
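As a rough illustration, the three calls above map onto the classic netmap user API roughly as follows (a minimal sketch assuming the 2013-era net/netmap_user.h, where the registration ioctl is spelled NIOCREGIF; "ix0" is just an example interface name and error checks are omitted):

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/netmap_user.h>   /* NIOCREGIF, NETMAP_IF, NETMAP_TXRING, ... */

int
main(void)
{
    int fd = open("/dev/netmap", O_RDWR);              /* 1. file descriptor */

    struct nmreq req;
    memset(&req, 0, sizeof(req));
    strncpy(req.nr_name, "ix0", sizeof(req.nr_name));  /* example NIC */
    req.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &req);                        /* 2. netmap mode */

    /* 3. map the shared region holding the rings and packet buffers */
    char *mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
    struct netmap_ring *txring = NETMAP_TXRING(nifp, 0);  /* first TX ring */
    (void)txring;
    return 0;
}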
netmap API (TX)
•  TX
  –  Fill up to avail buffers, starting from slot cur
  –  ioctl(fd, NIOCTXSYNC) queues the packets
•  poll() can be used for blocking I/O

From http://info.iet.unipi.it/~luigi/netmap/
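Continuing the sketch above, a TX loop under the 2013-era ring layout (cur/avail fields, NETMAP_BUF and NETMAP_RING_NEXT helpers) might look like this; fill_packet() is a placeholder that writes a frame into the buffer and returns its length:

struct netmap_ring *ring = NETMAP_TXRING(nifp, 0);

while (ring->avail > 0) {
    struct netmap_slot *slot = &ring->slot[ring->cur];
    char *buf = NETMAP_BUF(ring, slot->buf_idx);

    slot->len = fill_packet(buf);                    /* placeholder */
    ring->cur = NETMAP_RING_NEXT(ring, ring->cur);
    ring->avail--;
}
ioctl(fd, NIOCTXSYNC, NULL);     /* hand the filled slots to the NIC */
/* A poll() on fd with POLLOUT can be used instead, to block until
 * TX slots become available again. */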
netmap API (RX)
•  RX
  –  ioctl(fd, NIOCRXSYNC) reports newly received packets
  –  Process up to avail buffers, starting from slot cur
•  poll() can be used for blocking I/O

From http://info.iet.unipi.it/~luigi/netmap/
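The mirror-image RX loop, again a hedged sketch using the 2013-era fields; handle_packet() is a placeholder:

ioctl(fd, NIOCRXSYNC, NULL);     /* or poll() on fd with POLLIN to block */

struct netmap_ring *rxring = NETMAP_RXRING(nifp, 0);
while (rxring->avail > 0) {
    struct netmap_slot *slot = &rxring->slot[rxring->cur];
    char *buf = NETMAP_BUF(rxring, slot->buf_idx);

    handle_packet(buf, slot->len);                   /* placeholder */
    rxring->cur = NETMAP_RING_NEXT(rxring, rxring->cur);
    rxring->avail--;             /* slot goes back to the kernel at the next sync */
}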
Other features
•  Multi-queue support
  –  One netmap ring is initialized for each physical ring
    •  e.g., different pthreads can be assigned to different netmap/physical rings
•  Host stack support
  –  The NIC is put into netmap mode, resetting its PHY
  –  The host stack still sees the interface, and packets can be sent to/from the NIC via "software rings"
    •  Either implicitly by the kernel or explicitly by the app

From http://info.iet.unipi.it/~luigi/netmap/
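For reference, a thread can restrict itself to one hardware ring, or attach to the host-stack ("software") rings, by setting nr_ringid before NIOCREGIF; a minimal sketch assuming the 2013-era NETMAP_HW_RING / NETMAP_SW_RING flags:

struct nmreq req;
memset(&req, 0, sizeof(req));
strncpy(req.nr_name, "ix0", sizeof(req.nr_name));
req.nr_version = NETMAP_API;

req.nr_ringid = NETMAP_HW_RING | 2;   /* this thread uses hardware ring 2 only */
/* req.nr_ringid = NETMAP_SW_RING; */ /* or: the software rings to the host stack */

ioctl(fd, NIOCREGIF, &req);           /* fd from open("/dev/netmap") as before */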
Implementation
•  Available for FreeBSD and Linux
  –  The Linux code is glued onto the FreeBSD code
•  Common code
  –  Control path, system call backends, memory allocator, etc.
•  Device-specific code
  –  Each supported driver implements a few functions
    •  nm_register(struct ifnet *ifp, int onoff)
      –  Put the NIC into netmap mode, allocate netmap rings and slots
    •  nm_txsync(struct ifnet *ifp, u_int ring_nr)
      –  Flush out the packets the user has placed in the netmap ring
    •  nm_rxsync(struct ifnet *ifp, u_int ring_nr)
      –  Refill the netmap ring with newly received packets
Implementation (cont.)
•  Small modifications in device drivers

From http://info.iet.unipi.it/~luigi/netmap/
VALE (MSWITCH): A NETMAP-BASED, VERY FAST SOFTWARE SWITCH
Software switch
•  Switching packets between network interfaces
  –  General-purpose OS and processor
  –  Virtual and hardware interfaces
•  20 years ago
  –  Prototyping, low-performance alternative
•  Now and in the near future
  –  Replacement for hardware switches
  –  Hosting virtual machines (incl. virtual network functions)

[Figure: a software switch connecting virtual machines and NICs]
Performance of today's software switch
•  Forwarding packets from one 10 Gbps NIC to another
  –  Xeon E5-1650 (3.8 GHz), 1 CPU core used
  –  Below 1 Gbps for minimum-sized packets

[Figure: throughput (Gbps) vs. packet size (64–1024 bytes) for the FreeBSD bridge and Open vSwitch]
Problems
1.  Inefficient packet I/O mechanism
  –  Today's software switches use dynamically allocated, large metadata (e.g., sk_buff) designed for end systems
  –  This should be simplified, because the packet is just being forwarded
  –  For switches it is more important to process small packets efficiently
2.  Inefficient packet switching algorithm
  –  How to move packets from the source to the destination(s) efficiently?
  –  Traditional way
    •  Lock a destination, send a single packet, then unlock the destination
    •  Inefficient due to locking cost/contention
Problems (cont.)
3.  Lack of flexibility in packet processing
  –  How to decide a packet's destination?
  –  One could use a layer 2 learning bridge to decide a packet's destination
  –  One could use OpenFlow packet matching to do so
Solutions
1.  "Inefficient packet I/O mechanisms"
  –  Simple, minimalistic packet representation (netmap API*)
    •  No metadata allocation cost
    •  Reduced cache pollution
2.  "Inefficient packet switching"
  –  Group multiple packets going to the same destination
  –  Lock the destination only once for a group of packets

* Netmap – a novel framework for fast packet I/O, Luigi Rizzo, Università di Pisa
  http://info.iet.unipi.it/~luigi/netmap/
Bitmap-based forwarding algorithm
•  Algorithm in the original VALE
•  Supports unicast, multicast and broadcast
  –  Get pointers to a batch of packets
  –  Identify the destination(s) of each packet and represent them as a bitmap (one bit per destination port)
  –  Lock each destination in turn and send all the packets going there, scanning the corresponding column of the bitmap
•  Problem
  –  Scalability issue in the presence of many destinations

[Figure 3: bitmap-based packet forwarding: packets p0–p4 are each labeled with a destination bitmap; the forwarder scans each port's column of the bitmap to find the packets bound to it]

VALE, a Virtual Local Ethernet – Luigi Rizzo, Giuseppe Lettieri, Università di Pisa
http://info.iet.unipi.it/~luigi/vale/
List-based forwarding algorithm
•  Algorithm in the current VALE (mSwitch)
•  Supports unicast and broadcast
  –  Build a linked list of packets for each destination
  –  Broadcast packets are mapped to destination index 254
  –  Scan each destination list; broadcast packets are inserted in order

[Figure 4: list-based packet forwarding: packets p0–p4 are appended to per-destination lists (one head/tail pair per port d0–d254); broadcast packets go into the d254 list]
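A hedged sketch of the grouping step described above, in plain C: packets carrying a destination index are chained into per-destination lists with one head/tail pair per port. The names (struct fwd_pkt, dst_list, BROADCAST_IDX) are illustrative, not the actual mSwitch symbols.

#include <stdint.h>

#define MAX_PORTS      255
#define BROADCAST_IDX  254        /* per the slide: broadcast maps to index 254 */
#define NO_PKT         0xffff

struct fwd_pkt {
    uint8_t  dst;                 /* destination port index (or BROADCAST_IDX) */
    uint16_t next;                /* next packet bound to the same destination */
};

struct dst_list {
    uint16_t head, tail;          /* per-destination list of packet indices */
};

/* Group a batch of n packets into per-destination lists in a single pass. */
static void
group_by_destination(struct fwd_pkt *pkts, uint16_t n, struct dst_list *dsts)
{
    for (unsigned d = 0; d < MAX_PORTS; d++)
        dsts[d].head = dsts[d].tail = NO_PKT;

    for (uint16_t i = 0; i < n; i++) {
        struct dst_list *l = &dsts[pkts[i].dst];

        pkts[i].next = NO_PKT;
        if (l->head == NO_PKT)
            l->head = i;              /* first packet for this destination */
        else
            pkts[l->tail].next = i;   /* append, preserving arrival order */
        l->tail = i;
    }
    /* The forwarder then locks each destination once, walks its list, and
     * interleaves the BROADCAST_IDX list so packets stay in order. */
}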
Solutions (cont.)
3.  "Lack of flexibility in packet processing"
  –  Separate the switching fabric from packet processing
  –  Switching fabric
    •  Moves packets quickly
  –  Packet processing
    •  Decides each packet's destination and tells the switching fabric

typedef u_int (*BDG_LOOKUP_T)(char *buf, u_int len, uint8_t *ring_nr,
                              struct netmap_adapter *srcif);

  –  Return value
    •  The index of the destination port for unicast
    •  NM_BDG_BROADCAST for broadcast
    •  NM_BDG_NOPORT to drop the packet
  –  By default an L2 learning function is set
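A toy packet-processing module conforming to the BDG_LOOKUP_T prototype might look as follows (a sketch only: it assumes the kernel headers that define NM_BDG_BROADCAST/NM_BDG_NOPORT, and the port choice from the MAC's low bits is purely illustrative; the default module does L2 learning instead):

#include <sys/types.h>            /* u_int */
#include <stdint.h>

struct netmap_adapter;            /* opaque to the plug-in */

static u_int
toy_lookup(char *buf, u_int len, uint8_t *ring_nr, struct netmap_adapter *srcif)
{
    const uint8_t *dmac = (const uint8_t *)buf;   /* frame starts with dst MAC */

    (void)ring_nr;                /* output ring selection: unused here */
    (void)srcif;                  /* source port: unused here */

    if (len < 14)                 /* shorter than an Ethernet header */
        return NM_BDG_NOPORT;     /* drop */
    if (dmac[0] & 0x01)           /* multicast/broadcast bit */
        return NM_BDG_BROADCAST;  /* flood to all other ports */
    return dmac[5] & 0x0f;        /* unicast: pick a port from the MAC's low bits */
}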
  
VALE (mSwitch) architecture

[Figure: apps/VMs attach to virtual interfaces through the netmap API, the OS stack attaches through the socket API, and NICs attach directly; packet processing modules sit on top of the switching fabric in the kernel]

•  Packet forwarding (identifying each packet's destination (packet processing) and copying packets to the destination ring) takes place in the sender's context
  –  The receiver just consumes the packets
Other features
•  Indirect buffer support
  –  A netmap slot can contain a pointer to the actual buffer
  –  Useful to eliminate the data copy from a VM's backend into a netmap slot
•  Support for large packets
  –  Multiple netmap slots (by default 2048 bytes each) can be used to hold a single packet
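A very rough sketch of how these two features surface in the slot layout, continuing the TX sketch above. It assumes the NS_MOREFRAG and NS_INDIRECT slot flags and the 64-bit ptr field of struct netmap_slot from this era of netmap; pkt_len and guest_buf are placeholders, and in practice indirect buffers are mainly used by in-kernel VM backends:

/* Large packet: spread one frame over two 2048-byte slots. */
struct netmap_slot *s0 = &ring->slot[ring->cur];
uint32_t second = NETMAP_RING_NEXT(ring, ring->cur);
struct netmap_slot *s1 = &ring->slot[second];

s0->len   = 2048;                       /* first chunk fills a whole slot */
s0->flags = NS_MOREFRAG;                /* more fragments of this packet follow */

s1->len   = pkt_len - 2048;             /* remainder of the frame */
s1->flags = NS_INDIRECT;                /* data is not in the netmap buffer ... */
s1->ptr   = (uintptr_t)guest_buf;       /* ... but behind this pointer (e.g. a
                                           VM backend's buffer), avoiding a copy */

ring->cur   = NETMAP_RING_NEXT(ring, second);
ring->avail -= 2;
ioctl(fd, NIOCTXSYNC, NULL);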
Bare mSwitch performance
•  NIC to NIC (10 Gbps)
•  "Dummy" processing module

[Figure 5: throughput (Gbps) between 10 Gb/s NICs vs. packet size (64–1518 bytes), reaching the 10 Gbps line rate, and vs. CPU clock frequency (1.3–3.8 GHz) for 64/128/256-byte packets]

Experiments are done with a Xeon E5-1650 CPU (6 cores, 3.8 GHz with Turbo Boost), 16 GB DDR3 RAM (quad channel) and Intel X520-T2 10 Gbps NICs

Bare mSwitch performance
•  Virtual port to virtual port
•  "Dummy" processing module

[Figure 6: forwarding throughput (Gbps) between two virtual ports vs. packet size (60 B–64 KB), with 1, 2 and 3 CPU cores]
Bare mSwitch performance
•  N virtual ports to N virtual ports
•  "Dummy" packet processing module

[Figure 7: switching capacity with an increasing number of virtual ports (2–8) for 64 B, 1514 B and 64 KB packets: (a) experiment topologies, (b) unicast throughput, (c) broadcast throughput. For unicast, each src/dst port pair is assigned a single CPU core; for broadcast, each port is given a core; for setups with more than 6 ports (the system has 6 cores), cores are assigned round-robin]
mSwitch's Scalability
•  A single virtual port to many virtual ports
  –  Bitmap- vs. list-based algorithm
  –  The list-based algorithm scales very well

[Figure 9: aggregate forwarding throughput (Gbps) from a single sender to 1–250 active destination ports, minimum-sized packets, single CPU core; mSwitch's list-based algorithm vs. VALE's bitmap-based algorithm]
Learning bridge performance
•  mSwitch-learn: pure learning bridge processing
  –  Adds the cost of MAC-address hashing to packet processing

[Figure: throughput (Gbps) vs. packet size (64–1024 bytes) for the FreeBSD bridge, mSwitch-learn (layer 2 learning bridge) and mSwitch-3-tuple (3-tuple filter, user-space network stack support)]
Open vSwitch acceleration
•  mSwitch-OVS: mSwitch with Open vSwitch's packet processing

[Figure: throughput (Gbps) vs. packet size (64–1024 bytes) for stock Open vSwitch (OVS) and mSwitch-OVS]
Conclusion
•  Our contribution
  –  VALE (mSwitch): a fast, modular software switch
  –  Very fast packet forwarding on bare metal
    •  200 Gbps between virtual ports (with 1500-byte packets and 3 CPU cores)
    •  Almost line rate using 1 CPU core and two 10 Gbps NICs
  –  Useful for implementing various systems
    •  Very fast learning bridge
    •  Accelerates Open vSwitch by up to 2.6x
      –  Small modifications, preserving the control interface
    •  Fast protocol multiplexer/demultiplexer for user-space protocol stacks
•  Code (Linux, FreeBSD) is available at:
  –  http://info.iet.unipi.it/~luigi/netmap/
