SlideShare a Scribd company logo
Network Stack in 
Userspace (NUSE) 
! 
Hajime Tazaki 
Ryo Nakamura 
(University of Tokyo) 
! 
New Directions in Operating Systems 
London, 2014
Motivation 
Implementation of the Internet 
is not finished yet 
! 
! 
Faster evolution of OSes (network 
stack) 
OS personalization 
2
I have a new Layer-3/4 
protocol! Yay! 
I have new, great Layer-3/4 protocol ! It 
will change the WORLD ! 
Replace network stack ? 
No: destroy my life ?! 
(experimental ? not tested ?) 
Yes: I wanna be your slave. 
Slow evolution of network stack ? 
VM on personal device ? 
3
Virtual Machine ? 
Poll: “When you download and run software, how often do you use a virtual machine (to reduce 
security risks)?” 
Jon Howell, Galen Hunt, David Molnar, and Donald E. Porter, Living Dangerously: A Survey of Software Download 
Practices, no. MSR-TR-2010-51, May 2010 
4
costin.raiciu@cs.pub.ro, j.araujo@ucl.ac.uk, rizzo@iet.unipi.it 
Internet paths 
that it is still 
despite the 
the blame 
extensions taking 
placed on end 
moving protocols 
deployment 
optimizations. 
support for user-level 
commodity 
number of 
host stack, 
s. 
our mux/de-mux 
line rate (up 
Slow evolution of network stack 
Honda et al., Rekindling Network Protocol Innovation with User-Level Stacks, ACM 
SIGCOMM CCR, Vol.44, Num. 2, April 2014 
cores, and 
over a basic 
same server 
1.00 
0.75 
0.50 
0.25 
0.00 
2007 2008 2009 2010 2011 2012 
Date 
Ratio of flows 
Option 
SACK 
Timestamp 
Windowscale 
Direction 
Inbound 
Outbound 
Figure 1: TCP options deployment over time. 
pen infrequently not only because of slow release cycles, but 
also due to their cost and potential disruption to existing 
setups. If protocol stacks were embedded into applications, 
they could be updated on a case-by-case basis, and deploy-ment 
would be a lot more timely. 
For example, Mac OS, Windows XP and FreeBSD still 
use a traditional Additive Increase Multiplicative Decrease 
(AIMD) algorithm for TCP congestion control, while Linux
Meanwhile in 
Filesystem world.. 
There is, 
Filesystem in Userspace 
(FUSE) 
Userspace code can host 
new filesystem (sshfs, 
GmailFS, etc) 
Performance is bad, 
but doesn’t matter 
Flexibility and 
functionality do matter 
6 
http://fuse.sourceforge.net/
Alternatives 
Container (LXC, OpenVZ, vimage) 
share kernel with host operating system (no 
flexibility) 
Library OS 
full scratch: mtcp, Mirage, lwIP 
Porting: OSv, Sandstorm, libuinet (FreeBSD), 
Arrakis (lwIP), OpenOnload (lwIP?) 
Glue-layer: LKL (Linux-2.6), rumpkernel (NetBSD) 
7
Network Stack 
in Userspace
What’s NUSE ? 
Network stack in Userspace 
A library operating system 
Library version of network 
stack (of monolithic kernel) 
Linux (latest), FreeBSD (plan) 
(UNIX) Process-based 
virtualization 
9 
nuse example 
kernel bypassed 
TCP/IP 
ARP/ 
ndisc 
libnuse 
glibc 
NIC 
userspace 
kernel 
raw sock 
netmap 
DPDK (etc)
Why NUSE ? 
minimized porting effort 
Linux (net-next) changes frequently 
! 
full functional network stack for 
netmap 
DPDK 
(any kernel-bypass technology) 
10
How it works 
Application 
POSIX glue 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Qdisc 
Netfilter Bridging 
Netlink 
IPSec Tunneling 
Kernel layer 
NUSE core 
bottom halves/ 
rcu/timer/ 
interrupt 
struct 
net_device 
RAW DPDK netmap ... 
NIC 
petit-scheduler 
1. (monolithic) kernel 
source 
2. scheduler 
3. POSIX glue 
redirect system calls 
4. network I/O 
raw socket, DPDK, 
netmap, etc.. 
11
1) kernel build 
Application 
POSIX glue 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Qdisc 
Netfilter Bridging 
Netlink 
IPSec Tunneling 
Kernel layer 
NUSE core 
bottom halves/ 
rcu/timer/ 
interrupt 
struct 
net_device 
RAW DPDK netmap ... 
NIC 
petit-scheduler 
patch to kernel tree 
with new (hw independent) 
arch (arch/sim) 
robust to (frequent) 
mainstream changes 
12
2) scheduler 
Application 
POSIX glue 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Qdisc 
Netfilter Bridging 
Netlink 
IPSec Tunneling 
Kernel layer 
NUSE core 
bottom halves/ 
rcu/timer/ 
interrupt 
struct 
net_device 
RAW DPDK netmap ... 
NIC 
petit-scheduler 
offer alternate 
context primitives 
interrupts, timer, 
thread, bottom halves 
(tasklet, workqueue, 
waiter, etc) 
wrap with POSIX thread 
easily debuggable 
ucontext fiber for low 
overhead (not yet) 
13
3) POSIX glue code 
Application 
POSIX glue 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Qdisc 
Netfilter Bridging 
Netlink 
IPSec Tunneling 
Kernel layer 
NUSE core 
bottom halves/ 
rcu/timer/ 
interrupt 
struct 
net_device 
RAW DPDK netmap ... 
NIC 
petit-scheduler 
Hijack function calls 
socket => nuse_socket 
read => nuse_read 
apps not aware of 
LD_PRELOAD=libnuse.so .. 
14
4) network I/O 
Application 
POSIX glue 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Qdisc 
Netfilter Bridging 
Netlink 
IPSec Tunneling 
Kernel layer 
NUSE core 
bottom halves/ 
rcu/timer/ 
interrupt 
struct 
net_device 
RAW DPDK netmap ... 
NIC 
petit-scheduler 
connect NUSE to NIC 
options 
raw socket (default) 
DPDK (if available) 
netmap (if available) 
Tap 
15
Usage 
download 
git clone git://github.com/libos-nuse/net-next- 
nuse 
compile 
make library ARCH=sim (NETMAP=yes) (DPDK=yes) 
execute 
sudo NUSECONF=nuse.conf ./nuse (application) 
16
configs 
17 
# Interface definition.! 
interface eth0! 
address 192.168.0.10! 
netmask 255.255.255.0! 
macaddr 00:01:01:01:01:01! 
viftype TAP! 
! 
interface p1p1! 
address 172.16.0.1! 
netmask 255.255.255.0! 
macaddr 00:01:01:01:01:02! 
! 
# route entry definition.! 
route! 
network 0.0.0.0! 
netmask 0.0.0.0! 
gateway 192.168.0.1
(possible) use cases 
New protocol deployment 
Chrome + Linux mptcp (on NUSE) 
Process-level virtual instance 
% NUSE-linux-ovs | NUSE-freebsd-NAT | 
NUSE-router | NUSE-nginx! 
VM chaining via UNIX command line 
18
Limitation (ongoings) 
no fork(2)/exec(2) support 
no multi-processes 
no sysctl/proc 
(inefficient) thread scheduling 
19
Experiments 
1. Can we benefit with OS personalization? 
present a custom (NUSE) kernel with an 
application (OS personalization) 
2. How much overhead does NUSE add? 
Simple performance measurements 
20
Tested applications 
working 
ping, iperf, nginx (partially), sleep, 
need patches 
nc, wget, dig, host 
21
Setup: Performance 
measurement 
NUSE node Tx/Rx nodes 
CPU Xeon E5-2650v2 @ 
ping! 
flowgen 
10G 10G 
22 
2.60GHz 
(16 core) 
Xeon L3426 @ 1.87GHz 
(8 core) 
Memory 32GB 4GB 
NIC Intel X520 Intel X520 
ping! 
flowgen 
vnstat! 
(packet count) 
Tx NUSE Rx
Host Tx 
(NUSE->Receiver) 
NUSE Rx 
23 
avg max min 
dpdk! 2.610 8.000 0.156 
netmap 0.370 0.494 0.252 
raw 0.396 0.501 0.290 
tap 0.397 0.538 0.303 
500 
450 
400 
350 
300 
250 
200 
150 
100 
50 
0 
dpdk netmap raw tap 
Throughput (Mbps) 
ping (RTT) throughput 
(1024byte,UDP) 
8 
7 
6 
5 
4 
3 
2 
1 
0 
dpdk netmap raw tap 
RTT (ms)
L3 Routing 
Sender->NUSE->Receiver 
Tx NUSE Rx 
24 
avg max min 
dpdk! 11.998 27.700 0.252 
netmap 0.664 0.741 0.556 
raw 0.663 0.761 0.575 
tap 0.694 0.749 0.602 
ping (RTT) 
500 
450 
400 
350 
300 
250 
200 
150 
100 
50 
0 
netmap raw tap 
Throughput (Mbps) 
throughput 
(1024byte,UDP) 
30 
25 
20 
15 
10 
5 
0 
dpdk netmap raw tap 
RTT (ms)
Discussions 
not so bad performance 
we don’t care much about performance 
network stack is full functional 
but supplemental tools are not sufficient 
25
Network Simulator 
Integration (ns-3) 
network stack +ns-3 network simulator 
! 
Direct Code Execution (DCE) 
Established by Mathieu Lacage (2006) 
part of ns-3 project 
! 
Features 
reproducible (deterministic clock) 
controllable (simulator’s facility) 
http://www.nsnam.org/overview/projects/direct-code-execution/ 
26
27
NUSE vs DCE 
NUSE DCE 
shared 
kernel library ARCH=sim ARCH=sim 
scheduler (host) pthread simulator’s scheduler! 
28 
(deterministic) 
POSIX hijack hijack 
network I/O raw/netmap/DPDK/tap ns3:NetDevice 
execution LD_PRELOAD 
dlmopen(3)! 
single proc/multi-instances
DCE Architecture 
Application 
(ip, iptables, quagga) 
POSIX layer 
TCP UDP DCCP SCTP 
ICMP ARP 
IPv6 IPv4 
Netfilter Bridging 
struct net_device 
29 
Qdisc 
Netlink 
IPSec Tunneling 
bottom halves/rcu/ 
timer/interrupt 
Kernel layer 
Heap Stack 
memory 
Virtualization Core 
layer 
ns-3 (network simulation core) 
DCE 
ns-3 
applicati 
on 
ns-3 
TCP/IP 
stack 
3) POSIX! 
Layer 
1) Core! 
Layer 
2) Kernel! 
Layer
Bug reproducibility 
Home Agent 
AP1 AP2 
30 
Wi-Fi Wi-Fi 
handoff 
ping6 
correspondent 
node 
mobile node 
(gdb) b mip6_mh_filter if dce_debug_nodeid()==0 
Breakpoint 1 at 0x7ffff287c569: file net/ipv6/mip6.c, line 88. 
<continue> 
(gdb) bt 4 
#0 mip6_mh_filter 
(sk=0x7ffff7f69e10, skb=0x7ffff7cde8b0) 
at net/ipv6/mip6.c:109 
#1 0x00007ffff2831418 in ipv6_raw_deliver 
(skb=0x7ffff7cde8b0, nexthdr=135) 
at net/ipv6/raw.c:199 
#2 0x00007ffff2831697 in raw6_local_deliver 
(skb=0x7ffff7cde8b0, nexthdr=135) 
at net/ipv6/raw.c:232 
#3 0x00007ffff27e6068 in ip6_input_finish 
(skb=0x7ffff7cde8b0) 
at net/ipv6/ip6_input.c:197
Debugging 
==5864== Memcheck, a memory error detector 
==5864== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al. 
==5864== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info 
==5864== Command: ../build/bin/ns3test-dce-vdl --verbose 
==5864== 
==5864== Conditional jump or move depends on uninitialised value(s) 
==5864== at 0x7D5AE32: tcp_parse_options (tcp_input.c:3782) 
==5864== by 0x7D65DCB: tcp_check_req (tcp_minisocks.c:532) 
==5864== by 0x7D63B09: tcp_v4_hnd_req (tcp_ipv4.c:1496) 
==5864== by 0x7D63CB4: tcp_v4_do_rcv (tcp_ipv4.c:1576) 
==5864== by 0x7D6439C: tcp_v4_rcv (tcp_ipv4.c:1696) 
==5864== by 0x7D447CC: ip_local_deliver_finish (ip_input.c:226) 
==5864== by 0x7D442E4: ip_rcv_finish (dst.h:318) 
==5864== by 0x7D2313F: process_backlog (dev.c:3368) 
==5864== by 0x7D23455: net_rx_action (dev.c:3526) 
==5864== by 0x7CF2477: do_softirq (softirq.c:65) 
==5864== by 0x7CF2544: softirq_task_function (softirq.c:21) 
==5864== by 0x4FA2BE1: ns3::TaskManager::Trampoline(void*) (task-manager.==5864== Uninitialised value was created by a stack allocation 
==5864== at 0x7D65B30: tcp_check_req (tcp_minisocks.c:522) 
==5864== 
Memory error detection 
among distributed nodes 
in a single process 
using Valgrind 
! 
! 
31
Fine-grained parameter coverage 
Code coverage measurement with DCE 
With fine-grained network, node, protocol parameters 
32
Continuous integration 
http://ns-3-dce.cloud.wide.ad.jp/jenkins/job/daily-net-next-sim/ 
33
Summary 
NUSE (Network Stack in Userspace) 
OS personalization (fast evolution, easy 
deployment) 
DCE (Direct Code Execution) 
Flexible network experiment/test with 
deterministic clock 
34
github: https://github.com/libos-nuse/net-next- 
nuse 
DCE: http://bit.ly/ns-3-dce 
twitter: @thehajime 
35
Backups
Potentials of Userspace 
Networking 
High-performance networking 
Useful debugging facilities 
Operating system personalization 
37
1) kernel build 
build kernel source tree w/ the patch 
make menuconfig ARCH=sim 
make library ARCH=sim 
➔ libnuse-linux-3.17-rc1.so 
38
Example: How timer 
works 
39 
add_timer() 
TIMER_SOFTIRQ 
timer_list 
run_timer_softirq () 
timer handler 
timer thread 
(timer_create (2))
3) POSIX glue code 
extern int sim_sock_socket (int,int,int, struct socket **); 
int socket (int family, int type, int proto) 
{ 
sim_update_jiffies (); 
struct socket *kernel_socket = 
sim_malloc (sizeof (struct socket)); 
memset (kernel_socket, 0, sizeof (struct socket)); 
int ret = sim_sock_socket (family, type, proto, &kernel_socket); 
g_fd_table[curfd++] = kernel_socket; 
sim_softirq_wakeup (); 
return curfd - 1; 
} 
https://github.com/libos-nuse/net-next-nuse/blob/nuse/arch/sim/nuse-glue.c 
40
Tx callgraph 
sendmsg () (socket API) 
sim_sock_sendmsg () (NUSE) 
sock_sendmsg () 
ip_send_skb () 
ip_finish_output2 () 
dst_neigh_output () (ex-kernel) 
neigh_resolve_output () 
arp_solicit () 
dev_queue_xmit () 
sim_dev_xmit () (NUSE) 
nuse_vif_raw_write () 
41
Rx callgraph 
start_thread () (pthread) 
nuse_netdev_rx_trampoline () 
nuse_vif_raw_read () (NUSE) 
sim_dev_rx () 
netif_rx () (ex-kernel) 
start_thread () (pthread) 
do_softirq () (NUSE) 
net_rx_action () 
process_backlog () (ex-kernel) 
__netif_receive_skb_core () 
ip_rcv () 
42 
vNIC! 
rx 
softirq! 
rx

More Related Content

What's hot

Linux-Internals-and-Networking
Linux-Internals-and-NetworkingLinux-Internals-and-Networking
Linux-Internals-and-Networking
Emertxe Information Technologies Pvt Ltd
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
Thomas Graf
 
Linux Kernel and Driver Development Training
Linux Kernel and Driver Development TrainingLinux Kernel and Driver Development Training
Linux Kernel and Driver Development Training
Stephan Cadene
 
[ko] Kernel Networking Stack 진입 장벽 허물기
[ko] Kernel Networking Stack 진입 장벽 허물기[ko] Kernel Networking Stack 진입 장벽 허물기
[ko] Kernel Networking Stack 진입 장벽 허물기
Juhee Kang
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
Kernel TLV
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
SUSE Labs Taipei
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
SUSE Labs Taipei
 
OpenWRT guide and memo
OpenWRT guide and memoOpenWRT guide and memo
OpenWRT guide and memo
家榮 吳
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
Adrian Huang
 
VyOS Users Meeting #2, VyOSのVXLANの話
VyOS Users Meeting #2, VyOSのVXLANの話VyOS Users Meeting #2, VyOSのVXLANの話
VyOS Users Meeting #2, VyOSのVXLANの話
upaa
 
Dpdk pmd
Dpdk pmdDpdk pmd
Dpdk pmd
Masaru Oki
 
Introduction to yocto
Introduction to yoctoIntroduction to yocto
Introduction to yocto
Alex Gonzalez
 
Linux kernel
Linux kernelLinux kernel
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
Netronome
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
netLec5.pdf
netLec5.pdfnetLec5.pdf
netLec5.pdf
MuthuramanElangovan
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
Divye Kapoor
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_maps
lcplcp1
 

What's hot (20)

Linux-Internals-and-Networking
Linux-Internals-and-NetworkingLinux-Internals-and-Networking
Linux-Internals-and-Networking
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
 
Linux Kernel and Driver Development Training
Linux Kernel and Driver Development TrainingLinux Kernel and Driver Development Training
Linux Kernel and Driver Development Training
 
[ko] Kernel Networking Stack 진입 장벽 허물기
[ko] Kernel Networking Stack 진입 장벽 허물기[ko] Kernel Networking Stack 진입 장벽 허물기
[ko] Kernel Networking Stack 진입 장벽 허물기
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
UM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of SoftwareUM2019 Extended BPF: A New Type of Software
UM2019 Extended BPF: A New Type of Software
 
eBPF maps 101
eBPF maps 101eBPF maps 101
eBPF maps 101
 
eBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to UserspaceeBPF Trace from Kernel to Userspace
eBPF Trace from Kernel to Userspace
 
OpenWRT guide and memo
OpenWRT guide and memoOpenWRT guide and memo
OpenWRT guide and memo
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
 
VyOS Users Meeting #2, VyOSのVXLANの話
VyOS Users Meeting #2, VyOSのVXLANの話VyOS Users Meeting #2, VyOSのVXLANの話
VyOS Users Meeting #2, VyOSのVXLANの話
 
Dpdk pmd
Dpdk pmdDpdk pmd
Dpdk pmd
 
Introduction to yocto
Introduction to yoctoIntroduction to yocto
Introduction to yocto
 
Linux kernel
Linux kernelLinux kernel
Linux kernel
 
eBPF/XDP
eBPF/XDP eBPF/XDP
eBPF/XDP
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
 
netLec5.pdf
netLec5.pdfnetLec5.pdf
netLec5.pdf
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_maps
 

Similar to NUSE (Network Stack in Userspace) at #osio

Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Hajime Tazaki
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
Hajime Tazaki
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
Hajime Tazaki
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PROIDEA
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
Adrien Mahieux
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
Andriy Berestovskyy
 
FD.io - The Universal Dataplane
FD.io - The Universal DataplaneFD.io - The Universal Dataplane
FD.io - The Universal Dataplane
Open Networking Summit
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
OPNFV
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
Igalia
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
Hajime Tazaki
 
Skills Summary for GASteele
Skills Summary for GASteeleSkills Summary for GASteele
Skills Summary for GASteele
Greg A. Steele
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
Denys Haryachyy
 
Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Anil Madhavapeddy
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Michelle Holley
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
Redge Technologies
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
James Denton
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
brouer
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration Summit
Don Marti
 

Similar to NUSE (Network Stack in Userspace) at #osio (20)

Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014Direct Code Execution - LinuxCon Japan 2014
Direct Code Execution - LinuxCon Japan 2014
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...PLNOG16: Obsługa 100M pps na platformie PC, Przemysław Frasunek, Paweł Mała...
PLNOG16: Obsługa 100M pps na platformie PC , Przemysław Frasunek, Paweł Mała...
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
FD.io - The Universal Dataplane
FD.io - The Universal DataplaneFD.io - The Universal Dataplane
FD.io - The Universal Dataplane
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
 
Playing BBR with a userspace network stack
Playing BBR with a userspace network stackPlaying BBR with a userspace network stack
Playing BBR with a userspace network stack
 
Skills Summary for GASteele
Skills Summary for GASteeleSkills Summary for GASteele
Skills Summary for GASteele
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)Mirage: ML kernels in the cloud (ML Workshop 2010)
Mirage: ML kernels in the cloud (ML Workshop 2010)
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
100 M pps on PC.
100 M pps on PC.100 M pps on PC.
100 M pps on PC.
 
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack UpPushing Packets - How do the ML2 Mechanism Drivers Stack Up
Pushing Packets - How do the ML2 Mechanism Drivers Stack Up
 
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running LinuxLinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linux
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration Summit
 

More from Hajime Tazaki

Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Hajime Tazaki
 
Network stack personality in Android phone - netdev 2.2
Network stack personality in Android phone - netdev 2.2Network stack personality in Android phone - netdev 2.2
Network stack personality in Android phone - netdev 2.2
Hajime Tazaki
 
Linux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic KernelLinux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic Kernel
Hajime Tazaki
 
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
Hajime Tazaki
 
Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013
Hajime Tazaki
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopenHajime Tazaki
 

More from Hajime Tazaki (6)

Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
 
Network stack personality in Android phone - netdev 2.2
Network stack personality in Android phone - netdev 2.2Network stack personality in Android phone - netdev 2.2
Network stack personality in Android phone - netdev 2.2
 
Linux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic KernelLinux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic Kernel
 
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
 
Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopen
 

Recently uploaded

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

NUSE (Network Stack in Userspace) at #osio

  • 1. Network Stack in Userspace (NUSE) ! Hajime Tazaki Ryo Nakamura (University of Tokyo) ! New Directions in Operating Systems London, 2014
  • 2. Motivation Implementation of the Internet is not finished yet ! ! Faster evolution of OSes (network stack) OS personalization 2
  • 3. I have a new Layer-3/4 protocol! Yay! I have new, great Layer-3/4 protocol ! It will change the WORLD ! Replace network stack ? No: destroy my life ?! (experimental ? not tested ?) Yes: I wanna be your slave. Slow evolution of network stack ? VM on personal device ? 3
  • 4. Virtual Machine ? Poll: “When you download and run software, how often do you use a virtual machine (to reduce security risks)?” Jon Howell, Galen Hunt, David Molnar, and Donald E. Porter, Living Dangerously: A Survey of Software Download Practices, no. MSR-TR-2010-51, May 2010 4
  • 5. costin.raiciu@cs.pub.ro, j.araujo@ucl.ac.uk, rizzo@iet.unipi.it Internet paths that it is still despite the the blame extensions taking placed on end moving protocols deployment optimizations. support for user-level commodity number of host stack, s. our mux/de-mux line rate (up Slow evolution of network stack Honda et al., Rekindling Network Protocol Innovation with User-Level Stacks, ACM SIGCOMM CCR, Vol.44, Num. 2, April 2014 cores, and over a basic same server 1.00 0.75 0.50 0.25 0.00 2007 2008 2009 2010 2011 2012 Date Ratio of flows Option SACK Timestamp Windowscale Direction Inbound Outbound Figure 1: TCP options deployment over time. pen infrequently not only because of slow release cycles, but also due to their cost and potential disruption to existing setups. If protocol stacks were embedded into applications, they could be updated on a case-by-case basis, and deploy-ment would be a lot more timely. For example, Mac OS, Windows XP and FreeBSD still use a traditional Additive Increase Multiplicative Decrease (AIMD) algorithm for TCP congestion control, while Linux
  • 6. Meanwhile in Filesystem world.. There is, Filesystem in Userspace (FUSE) Userspace code can host new filesystem (sshfs, GmailFS, etc) Performance is bad, but doesn’t matter Flexibility and functionality do matter 6 http://fuse.sourceforge.net/
  • 7. Alternatives Container (LXC, OpenVZ, vimage) share kernel with host operating system (no flexibility) Library OS full scratch: mtcp, Mirage, lwIP Porting: OSv, Sandstorm, libuinet (FreeBSD), Arrakis (lwIP), OpenOnload (lwIP?) Glue-layer: LKL (Linux-2.6), rumpkernel (NetBSD) 7
  • 8. Network Stack in Userspace
  • 9. What’s NUSE ? Network stack in Userspace A library operating system Library version of network stack (of monolithic kernel) Linux (latest), FreeBSD (plan) (UNIX) Process-based virtualization 9 nuse example kernel bypassed TCP/IP ARP/ ndisc libnuse glibc NIC userspace kernel raw sock netmap DPDK (etc)
  • 10. Why NUSE ? minimized porting effort Linux (net-next) changes frequently ! full functional network stack for netmap DPDK (any kernel-bypass technology) 10
  • 11. How it works Application POSIX glue TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Qdisc Netfilter Bridging Netlink IPSec Tunneling Kernel layer NUSE core bottom halves/ rcu/timer/ interrupt struct net_device RAW DPDK netmap ... NIC petit-scheduler 1. (monolithic) kernel source 2. scheduler 3. POSIX glue redirect system calls 4. network I/O raw socket, DPDK, netmap, etc.. 11
  • 12. 1) kernel build Application POSIX glue TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Qdisc Netfilter Bridging Netlink IPSec Tunneling Kernel layer NUSE core bottom halves/ rcu/timer/ interrupt struct net_device RAW DPDK netmap ... NIC petit-scheduler patch to kernel tree with new (hw independent) arch (arch/sim) robust to (frequent) mainstream changes 12
  • 13. 2) scheduler Application POSIX glue TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Qdisc Netfilter Bridging Netlink IPSec Tunneling Kernel layer NUSE core bottom halves/ rcu/timer/ interrupt struct net_device RAW DPDK netmap ... NIC petit-scheduler offer alternate context primitives interrupts, timer, thread, bottom halves (tasklet, workqueue, waiter, etc) wrap with POSIX thread easily debuggable ucontext fiber for low overhead (not yet) 13
  • 14. 3) POSIX glue code Application POSIX glue TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Qdisc Netfilter Bridging Netlink IPSec Tunneling Kernel layer NUSE core bottom halves/ rcu/timer/ interrupt struct net_device RAW DPDK netmap ... NIC petit-scheduler Hijack function calls socket => nuse_socket read => nuse_read apps not aware of LD_PRELOAD=libnuse.so .. 14
  • 15. 4) network I/O Application POSIX glue TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Qdisc Netfilter Bridging Netlink IPSec Tunneling Kernel layer NUSE core bottom halves/ rcu/timer/ interrupt struct net_device RAW DPDK netmap ... NIC petit-scheduler connect NUSE to NIC options raw socket (default) DPDK (if available) netmap (if available) Tap 15
  • 16. Usage download git clone git://github.com/libos-nuse/net-next- nuse compile make library ARCH=sim (NETMAP=yes) (DPDK=yes) execute sudo NUSECONF=nuse.conf ./nuse (application) 16
  • 17. configs 17 # Interface definition.! interface eth0! address 192.168.0.10! netmask 255.255.255.0! macaddr 00:01:01:01:01:01! viftype TAP! ! interface p1p1! address 172.16.0.1! netmask 255.255.255.0! macaddr 00:01:01:01:01:02! ! # route entry definition.! route! network 0.0.0.0! netmask 0.0.0.0! gateway 192.168.0.1
  • 18. (possible) use cases New protocol deployment Chrome + Linux mptcp (on NUSE) Process-level virtual instance % NUSE-linux-ovs | NUSE-freebsd-NAT | NUSE-router | NUSE-nginx! VM chaining via UNIX command line 18
  • 19. Limitation (ongoings) no fork(2)/exec(2) support no multi-processes no sysctl/proc (inefficient) thread scheduling 19
  • 20. Experiments 1. Can we benefit with OS personalization? present a custom (NUSE) kernel with an application (OS personalization) 2. How much overhead does NUSE add? Simple performance measurements 20
  • 21. Tested applications working ping, iperf, nginx (partially), sleep, need patches nc, wget, dig, host 21
  • 22. Setup: Performance measurement NUSE node Tx/Rx nodes CPU Xeon E5-2650v2 @ ping! flowgen 10G 10G 22 2.60GHz (16 core) Xeon L3426 @ 1.87GHz (8 core) Memory 32GB 4GB NIC Intel X520 Intel X520 ping! flowgen vnstat! (packet count) Tx NUSE Rx
  • 23. Host Tx (NUSE->Receiver) NUSE Rx 23 avg max min dpdk! 2.610 8.000 0.156 netmap 0.370 0.494 0.252 raw 0.396 0.501 0.290 tap 0.397 0.538 0.303 500 450 400 350 300 250 200 150 100 50 0 dpdk netmap raw tap Throughput (Mbps) ping (RTT) throughput (1024byte,UDP) 8 7 6 5 4 3 2 1 0 dpdk netmap raw tap RTT (ms)
  • 24. L3 Routing Sender->NUSE->Receiver Tx NUSE Rx 24 avg max min dpdk! 11.998 27.700 0.252 netmap 0.664 0.741 0.556 raw 0.663 0.761 0.575 tap 0.694 0.749 0.602 ping (RTT) 500 450 400 350 300 250 200 150 100 50 0 netmap raw tap Throughput (Mbps) throughput (1024byte,UDP) 30 25 20 15 10 5 0 dpdk netmap raw tap RTT (ms)
  • 25. Discussions not so bad performance we don’t care much about performance network stack is full functional but supplemental tools are not sufficient 25
  • 26. Network Simulator Integration (ns-3) network stack +ns-3 network simulator ! Direct Code Execution (DCE) Established by Mathieu Lacage (2006) part of ns-3 project ! Features reproducible (deterministic clock) controllable (simulator’s facility) http://www.nsnam.org/overview/projects/direct-code-execution/ 26
  • 27. 27
  • 28. NUSE vs DCE NUSE DCE shared kernel library ARCH=sim ARCH=sim scheduler (host) pthread simulator’s scheduler! 28 (deterministic) POSIX hijack hijack network I/O raw/netmap/DPDK/tap ns3:NetDevice execution LD_PRELOAD dlmopen(3)! single proc/multi-instances
  • 29. DCE Architecture Application (ip, iptables, quagga) POSIX layer TCP UDP DCCP SCTP ICMP ARP IPv6 IPv4 Netfilter Bridging struct net_device 29 Qdisc Netlink IPSec Tunneling bottom halves/rcu/ timer/interrupt Kernel layer Heap Stack memory Virtualization Core layer ns-3 (network simulation core) DCE ns-3 applicati on ns-3 TCP/IP stack 3) POSIX! Layer 1) Core! Layer 2) Kernel! Layer
  • 30. Bug reproducibility Home Agent AP1 AP2 30 Wi-Fi Wi-Fi handoff ping6 correspondent node mobile node (gdb) b mip6_mh_filter if dce_debug_nodeid()==0 Breakpoint 1 at 0x7ffff287c569: file net/ipv6/mip6.c, line 88. <continue> (gdb) bt 4 #0 mip6_mh_filter (sk=0x7ffff7f69e10, skb=0x7ffff7cde8b0) at net/ipv6/mip6.c:109 #1 0x00007ffff2831418 in ipv6_raw_deliver (skb=0x7ffff7cde8b0, nexthdr=135) at net/ipv6/raw.c:199 #2 0x00007ffff2831697 in raw6_local_deliver (skb=0x7ffff7cde8b0, nexthdr=135) at net/ipv6/raw.c:232 #3 0x00007ffff27e6068 in ip6_input_finish (skb=0x7ffff7cde8b0) at net/ipv6/ip6_input.c:197
  • 31. Debugging ==5864== Memcheck, a memory error detector ==5864== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al. ==5864== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info ==5864== Command: ../build/bin/ns3test-dce-vdl --verbose ==5864== ==5864== Conditional jump or move depends on uninitialised value(s) ==5864== at 0x7D5AE32: tcp_parse_options (tcp_input.c:3782) ==5864== by 0x7D65DCB: tcp_check_req (tcp_minisocks.c:532) ==5864== by 0x7D63B09: tcp_v4_hnd_req (tcp_ipv4.c:1496) ==5864== by 0x7D63CB4: tcp_v4_do_rcv (tcp_ipv4.c:1576) ==5864== by 0x7D6439C: tcp_v4_rcv (tcp_ipv4.c:1696) ==5864== by 0x7D447CC: ip_local_deliver_finish (ip_input.c:226) ==5864== by 0x7D442E4: ip_rcv_finish (dst.h:318) ==5864== by 0x7D2313F: process_backlog (dev.c:3368) ==5864== by 0x7D23455: net_rx_action (dev.c:3526) ==5864== by 0x7CF2477: do_softirq (softirq.c:65) ==5864== by 0x7CF2544: softirq_task_function (softirq.c:21) ==5864== by 0x4FA2BE1: ns3::TaskManager::Trampoline(void*) (task-manager.==5864== Uninitialised value was created by a stack allocation ==5864== at 0x7D65B30: tcp_check_req (tcp_minisocks.c:522) ==5864== Memory error detection among distributed nodes in a single process using Valgrind ! ! 31
  • 32. Fine-grained parameter coverage Code coverage measurement with DCE With fine-grained network, node, protocol parameters 32
  • 34. Summary NUSE (Network Stack in Userspace) OS personalization (fast evolution, easy deployment) DCE (Direct Code Execution) Flexible network experiment/test with deterministic clock 34
  • 35. github: https://github.com/libos-nuse/net-next- nuse DCE: http://bit.ly/ns-3-dce twitter: @thehajime 35
  • 37. Potentials of Userspace Networking High-performance networking Useful debugging facilities Operating system personalization 37
  • 38. 1) kernel build build kernel source tree w/ the patch make menuconfig ARCH=sim make library ARCH=sim ➔ libnuse-linux-3.17-rc1.so 38
  • 39. Example: How timer works 39 add_timer() TIMER_SOFTIRQ timer_list run_timer_softirq () timer handler timer thread (timer_create (2))
  • 40. 3) POSIX glue code extern int sim_sock_socket (int,int,int, struct socket **); int socket (int family, int type, int proto) { sim_update_jiffies (); struct socket *kernel_socket = sim_malloc (sizeof (struct socket)); memset (kernel_socket, 0, sizeof (struct socket)); int ret = sim_sock_socket (family, type, proto, &kernel_socket); g_fd_table[curfd++] = kernel_socket; sim_softirq_wakeup (); return curfd - 1; } https://github.com/libos-nuse/net-next-nuse/blob/nuse/arch/sim/nuse-glue.c 40
  • 41. Tx callgraph sendmsg () (socket API) sim_sock_sendmsg () (NUSE) sock_sendmsg () ip_send_skb () ip_finish_output2 () dst_neigh_output () (ex-kernel) neigh_resolve_output () arp_solicit () dev_queue_xmit () sim_dev_xmit () (NUSE) nuse_vif_raw_write () 41
  • 42. Rx callgraph start_thread () (pthread) nuse_netdev_rx_trampoline () nuse_vif_raw_read () (NUSE) sim_dev_rx () netif_rx () (ex-kernel) start_thread () (pthread) do_softirq () (NUSE) net_rx_action () process_backlog () (ex-kernel) __netif_receive_skb_core () ip_rcv () 42 vNIC! rx softirq! rx