Network Stack in Userspace (NUSE)

Hajime Tazaki
High-Speed PC Router Study Group (高速PCルーター研究会), 2014/9/29

Today’s talk
• Userspace version of the (Linux) network stack
  • not intended to be high-speed in itself
  • but useful together with high-speed network I/O

I have a new Layer-3/4 protocol! Yay!
• I have a new, great Layer-3/4 protocol! It will change the WORLD!
• Do you want to replace your network stack?
  • No: your code will destroy my life?! (experimental? not tested?)
  • Yes: I wanna be your slave.
• VM cloud = OK, not many users/services to interfere with
• Multi-user server, PC, phone = nightmare, my life will be in trouble…

I have a new Layer-3/4 protocol! Yay! (cont’d)
• Kernel programming sucks
  • LKM (loadable kernel module)? Can cause a panic anyway..
  • Click? Only for routers/middleboxes, not for end hosts
• Slow evolution
• VM? Hmm, I’m a lazy guy..

Slow evolution of network stack
[Figure: "TCP options deployment over time", ratio of flows using the SACK, Timestamp, and Window scale options, inbound and outbound, 2007-2012]
"... happen infrequently not only because of slow release cycles, but also due to their cost and potential disruption to existing setups. If protocol stacks were embedded into applications, they could be updated on a case-by-case basis, and deployment would be a lot more timely. For example, Mac OS, Windows XP and FreeBSD still use a traditional Additive Increase Multiplicative Decrease (AIMD) algorithm for TCP congestion control, while Linux ..."
Honda et al., Rekindling Network Protocol Innovation with User-Level Stacks, ACM SIGCOMM CCR, Vol. 44, No. 2, April 2014

Virtual Machine?
Poll: “When you download and run software, how often do you use a virtual machine (to reduce security risks)?”
Jon Howell, Galen Hunt, David Molnar, and Donald E. Porter, Living Dangerously: A Survey of Software Download Practices, Microsoft Research Tech. Rep. MSR-TR-2010-51, May 2010

Meanwhile in the Filesystem world..
• There is Filesystem in Userspace (FUSE)
  • userspace code can host a new filesystem (sshfs, GmailFS, etc.)
  • performance is bad, but that doesn’t matter
  • flexibility and functionality do matter
http://fuse.sourceforge.net/

Problem Statements
• Slow evolution of the network stack
• Interference with the host OS (which is untouchable)
• VMs are too heavy a workload

What’s NUSE?
• Network stack in Userspace
  • userspace as much as possible
  • like FUSE (Filesystem in Userspace)
• Library version of the network stack (of a monolithic kernel)
  • kernel bypassed
• (UNIX) process-based virtualization

What can you do with NUSE?
• Host operating system
  • Linux (for the moment)
• Guest operating systems
  • Linux (3.17-rc1 based)
  • FreeBSD (ongoing)
• Suitable for kernel-bypass technologies
  • DPDK/netmap with a (full) network stack + (existing) applications
• Applications
  • ping, iperf, nginx (partially working)

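FUSE vs NUSE
[Figure: left, the FUSE example: "ls -l /tmp/fuse" calls into glibc, crosses into the kernel VFS (next to ext3, NFS, ...), and the FUSE module passes the request back up to the userspace "example" filesystem built on libfuse and glibc. Right, the nuse example: the application calls into libnuse, which runs TCP/IP and ARP/ndisc in userspace on top of glibc and reaches the NIC through a raw socket, netmap, or DPDK (etc.), bypassing the kernel network stack.]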

Design Goals
• No modification to userspace apps
• No modification to kernel space either
• Transparent
  • LD_PRELOADable
• x1 performance of the native OS (on par with the native stack)

Recipe
[Figure: NUSE architecture, top to bottom:
  Application
  POSIX glue
  Kernel layer: TCP, UDP, DCCP, SCTP / ICMP, ARP / IPv6, IPv4 / Qdisc / Netfilter, Bridging / Netlink / IPSec, Tunneling
  NUSE core: bottom halves / rcu / timer / interrupt, petit-scheduler, struct net_device
  RAW, DPDK, netmap, ...
  NIC]
1. (monolithic) kernel source
2. petit-scheduler
3. POSIX glue
  • redirect system calls (at the libc level)
4. network I/O
  • raw socket, DPDK, netmap, etc.

1) kernel build
• Patch the kernel tree
  • adds a new, hardware-independent architecture (arch/sim)
  • robust to (frequent) mainline changes
• Build the kernel source tree with the patch
  • make menuconfig ARCH=sim
  • make library ARCH=sim
  • ➔ libnuse-linux-3.17-rc1.so

2) petit scheduler
• Offers alternate context primitives
  • interrupts, timers, threads, bottom halves (tasklet, workqueue, waiter, etc.)
• Implemented with POSIX threads
  • easily debuggable
  • ucontext fibers for lower overhead (not yet)
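The petit-scheduler code itself is not shown in these slides. As a rough illustration of what "bottom halves implemented with POSIX threads" can look like, here is a minimal sketch of a deferred-work queue drained by one dedicated pthread. The names bh_start, bh_schedule, bh_stop, and bh_completed are hypothetical, not NUSE's API; the real scheduler also has to emulate timers, RCU, and interrupts.

```c
/* Hypothetical sketch: a bottom-half queue serviced by one POSIX thread. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct bh_item {
	void (*fn)(void *arg);
	void *arg;
	struct bh_item *next;
};

static struct bh_item *bh_head, *bh_tail;   /* FIFO of pending work */
static pthread_mutex_t bh_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t bh_cond = PTHREAD_COND_INITIALIZER;
static bool bh_stopping;
static unsigned long bh_completed;          /* items run so far */
static pthread_t bh_thread;

/* Worker: sleep until work arrives, run each item without holding the lock. */
static void *bh_worker(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&bh_lock);
	for (;;) {
		while (bh_head == NULL && !bh_stopping)
			pthread_cond_wait(&bh_cond, &bh_lock);
		if (bh_head == NULL)            /* stop requested, queue drained */
			break;
		struct bh_item *it = bh_head;
		bh_head = it->next;
		if (bh_head == NULL)
			bh_tail = NULL;
		pthread_mutex_unlock(&bh_lock);
		it->fn(it->arg);                /* the deferred work runs here */
		free(it);
		pthread_mutex_lock(&bh_lock);
		bh_completed++;
	}
	pthread_mutex_unlock(&bh_lock);
	return NULL;
}

int bh_start(void)
{
	bh_stopping = false;
	return pthread_create(&bh_thread, NULL, bh_worker, NULL);
}

/* Queue fn(arg) for the worker thread; returns 0 on success, -1 on ENOMEM. */
int bh_schedule(void (*fn)(void *), void *arg)
{
	assert(fn != NULL);
	struct bh_item *it = malloc(sizeof(*it));
	if (it == NULL)
		return -1;
	it->fn = fn;
	it->arg = arg;
	it->next = NULL;
	pthread_mutex_lock(&bh_lock);
	if (bh_tail != NULL)
		bh_tail->next = it;
	else
		bh_head = it;
	bh_tail = it;
	pthread_cond_signal(&bh_cond);
	pthread_mutex_unlock(&bh_lock);
	return 0;
}

/* Let the worker drain the queue, then join it. */
void bh_stop(void)
{
	pthread_mutex_lock(&bh_lock);
	bh_stopping = true;
	pthread_cond_signal(&bh_cond);
	pthread_mutex_unlock(&bh_lock);
	pthread_join(bh_thread, NULL);
}
```

A NIC receive callback could call bh_schedule() and return at once, much as a kernel interrupt handler defers work to a softirq. A ucontext-based variant would replace the worker thread with fibers switched on a single OS thread, trading debuggability for lower overhead.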

3) POSIX glue code
• Hijack function calls
  • socket => nuse_socket
  • read => nuse_read
• libc-level hijack
  • apps are not aware of it
  • LD_PRELOAD=libnuse.so ..
  • can’t catch system calls issued directly (int 0x80)

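#include <errno.h>
#include <string.h>

/* Implemented by the in-process Linux kernel (arch/sim). */
extern int sim_sock_socket (int, int, int, struct socket **);

/* socket(2) as seen by the application, hijacked at the libc level */
int socket (int family, int type, int proto)
{
	sim_update_jiffies ();          /* bring kernel time up to date */
	struct socket *kernel_socket =
		sim_malloc (sizeof (struct socket));
	memset (kernel_socket, 0, sizeof (struct socket));
	int ret = sim_sock_socket (family, type, proto, &kernel_socket);
	sim_softirq_wakeup ();          /* let deferred kernel work run */
	if (ret < 0) {                  /* kernel returns -errno on failure */
		errno = -ret;
		return -1;
	}
	/* map the kernel socket to a NUSE-private descriptor */
	g_fd_table[curfd] = kernel_socket;
	return curfd++;
}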
https://github.com/thehajime/net-next-nuse/blob/nuse/arch/sim/nuse-glue.c
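In the excerpt above, socket() hands out descriptors from a NUSE-private table. Unless curfd starts above every descriptor the host kernel may return, a hijacked read() cannot tell a NUSE socket from, say, stdin or an open file. One way to keep them apart is sketched below, assuming host descriptors stay below a fixed base: NUSE descriptors are allocated from a high range and every hijacked call dispatches on the range. NUSE_FD_BASE, nuse_fd_alloc, nuse_fd_lookup, nuse_fd_free, glue_read, and the nuse_sock_read stub are all hypothetical names, not NUSE's actual code.

```c
/* Hypothetical fd-dispatch sketch; not NUSE's actual layout. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define NUSE_FD_BASE 1024   /* assumption: host descriptors stay below this */
#define NUSE_MAX_FDS 256

/* Slot i holds the kernel socket behind descriptor NUSE_FD_BASE + i. */
static void *nuse_fd_table[NUSE_MAX_FDS];

/* Register an in-process kernel socket; returns its NUSE descriptor. */
int nuse_fd_alloc(void *kernel_socket)
{
	assert(kernel_socket != NULL);
	for (int i = 0; i < NUSE_MAX_FDS; i++) {
		if (nuse_fd_table[i] == NULL) {
			nuse_fd_table[i] = kernel_socket;
			return NUSE_FD_BASE + i;
		}
	}
	errno = EMFILE;                 /* table full */
	return -1;
}

/* Kernel socket behind fd, or NULL when fd belongs to the host kernel. */
void *nuse_fd_lookup(int fd)
{
	if (fd < NUSE_FD_BASE || fd >= NUSE_FD_BASE + NUSE_MAX_FDS)
		return NULL;
	return nuse_fd_table[fd - NUSE_FD_BASE];
}

/* Forget a NUSE descriptor (the socket itself is released elsewhere). */
void nuse_fd_free(int fd)
{
	if (nuse_fd_lookup(fd) != NULL)
		nuse_fd_table[fd - NUSE_FD_BASE] = NULL;
}

/* Stub standing in for the in-process receive path; not wired up here. */
static ssize_t nuse_sock_read(void *kernel_socket, void *buf, size_t len)
{
	(void)kernel_socket;
	(void)buf;
	(void)len;
	errno = ENOSYS;
	return -1;
}

/* read(2) dispatch: NUSE sockets go to the library, the rest to the host. */
ssize_t glue_read(int fd, void *buf, size_t len)
{
	void *sk = nuse_fd_lookup(fd);
	if (sk != NULL)
		return nuse_sock_read(sk, buf, len);
	return read(fd, buf, len);
}
```

In a real LD_PRELOAD shim the wrapper would itself be named read, so the host fallback would have to fetch libc's version with dlsym(RTLD_NEXT, "read") instead of calling read directly, which would recurse.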

4) network I/O
• Connect NUSE to the NIC
• Options
  • raw socket (general)
  • DPDK (if available)
  • netmap (if available)
  • tap?

#include <linux/netdevice.h>
#include <linux/etherdevice.h>

/* Tx: the kernel hands over a fully built frame; pass it to the host side. */
static netdev_tx_t
kernel_dev_xmit(struct sk_buff *skb,
		struct net_device *dev)
{
	netif_stop_queue(dev);
	sim_dev_xmit ((struct SimDevice *)dev, skb->data, skb->len);
	dev_kfree_skb(skb);     /* the skb is no longer needed after sim_dev_xmit() */
	netif_wake_queue(dev);
	return NETDEV_TX_OK;
}

static const struct net_device_ops sim_dev_ops = {
	.ndo_start_xmit = kernel_dev_xmit,
	.ndo_set_mac_address = eth_mac_addr,
};

/* Rx: inject a frame received on the host side into the kernel stack. */
void sim_dev_rx (struct SimDevice *device, struct SimDevicePacket packet)
{
	struct sk_buff *skb = packet.token;
	struct net_device *dev = &device->dev;
	skb->protocol = eth_type_trans(skb, dev);
	/* Do the TCP checksum (FIXME: should be configurable) */
	skb->ip_summed = CHECKSUM_PARTIAL;
	netif_rx (skb);
}
https://github.com/thehajime/net-next-nuse/blob/nuse/arch/sim/sim-device.c
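On the host side, sim_dev_xmit() has to hand each frame to one of the I/O back ends, and a receive loop has to feed frames into sim_dev_rx(). The sketch below covers only the raw-socket option, assuming Linux AF_PACKET sockets; raw_open, raw_send, and raw_recv are hypothetical helpers, not NUSE functions, and opening the socket normally requires root or CAP_NET_RAW.

```c
/* Raw-socket back-end sketch (Linux AF_PACKET); hypothetical helper names. */
#include <arpa/inet.h>
#include <assert.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <netpacket/packet.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a raw socket bound to ifname that sends and receives whole
 * Ethernet frames. Returns the descriptor, or -1 on error. */
int raw_open(const char *ifname)
{
	assert(ifname != NULL);
	unsigned int ifindex = if_nametoindex(ifname);
	if (ifindex == 0)
		return -1;                      /* no such interface */

	int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0)
		return -1;                      /* e.g. EPERM without CAP_NET_RAW */

	struct sockaddr_ll sll;
	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);   /* every EtherType */
	sll.sll_ifindex = (int)ifindex;
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Tx: what sim_dev_xmit() could do with skb->data and skb->len. */
ssize_t raw_send(int fd, const void *frame, size_t len)
{
	return send(fd, frame, len, 0);
}

/* Rx: read one frame; the caller would wrap it in an skb for sim_dev_rx(). */
ssize_t raw_recv(int fd, void *buf, size_t len)
{
	return recv(fd, buf, len, 0);
}
```

This is the simplest and slowest option: one system call per frame. A userspace stack with its own MAC address would likely also need the interface in promiscuous mode, and batched I/O (PACKET_MMAP, netmap, or DPDK) is where the speed comes from.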

How to use NUSE?
• Download
  • git clone git://github.com/thehajime/net-next-nuse
• Compile
  • make library ARCH=sim NETMAP=yes
• Execute
  • sudo ./nuse (application)
  • success? Lucky guy!
  • failure? Add the missing hijack calls

Alternatives
• Containers (LXC, OpenVZ, vimage)
  • share the kernel with the host operating system (no flexibility)
• Virtual machines (KVM, Xen, UML)
  • flexible/functional, but heavy bootstrap
• Library OS
  • written from scratch: mTCP, Mirage, lwIP
  • porting: OSv, Sandstorm, libuinet (FreeBSD), Arrakis (lwIP), OpenOnload (lwIP?)
  • glue layer: LKL (Linux 2.6), Rump (NetBSD)

Alternatives (cont’d)
Rumpkernel
• https://github.com/rumpkernel/wiki/wiki
• One binary runs everywhere
  • hosts: Linux, xBSD, Solaris, Cygwin
  • Xen Dom-U
  • bare metal (hardware, KVM, VirtualBox)
• Well-defined API (hypercalls)
• Only the NetBSD network stack is available

Evaluation
• Performance?
  • not good so far..
• Generality
  • can it run all applications? Only as far as its POSIX coverage goes

Ongoing work
• (efficient) thread scheduling
• batch Tx/Rx
• fork(2)/exec(2)
• multiple processes
• => migrate to rumpkernel?

Summary
• Network Stack in Userspace (NUSE)
  • network stack library
  • light virtualization
  • fast evolution, easy deployment
https://github.com/thehajime/net-next-nuse

GASPP: A GPU-Accelerated Stateful Packet Processing Framework
Giorgos Vasiliadis, Lazaros Koromilas, Michalis Polychronakis, and Sotiris Ioannidis, GASPP: A GPU-Accelerated Stateful Packet Processing Framework, USENIX ATC 2014, June 2014
