Network Stack in
(University of Tokyo)
New Directions in Operating Systems
Implementation of the Internet
is not finished yet
Faster evolution of OSes (network
I have a new, great Layer-3/4 protocol! It
will change the WORLD!
Replace the network stack?
No: destroy my life?!
(experimental? not tested?)
Yes: I wanna be your slave.
Slow evolution of the network stack?
VM on a personal device?
Virtual machine?
Poll: “When you download and run software, how often do you use a virtual machine (to reduce
Jon Howell, Galen Hunt, David Molnar, and Donald E. Porter, Living Dangerously: A Survey of Software Download
Practices, no. MSR-TR-2010-51, May 2010
Slow evolution of network stack
Honda et al., Rekindling Network Protocol Innovation with User-Level Stacks, ACM
SIGCOMM CCR, Vol. 44, No. 2, April 2014
Figure 1: TCP options deployment over time (x-axis: year, 2007-2012; y-axis: ratio of flows).
[...] happen infrequently not only because of slow release cycles, but
also due to their cost and potential disruption to existing
setups. If protocol stacks were embedded into applications,
they could be updated on a case-by-case basis, and deployment
would be a lot more timely.
For example, Mac OS, Windows XP and FreeBSD still
use a traditional Additive Increase Multiplicative Decrease
(AIMD) algorithm for TCP congestion control, while Linux
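The AIMD rule named in the excerpt above can be illustrated with a small sketch. It models only the additive-increase / multiplicative-decrease step per round trip; slow start, timeouts and fast recovery are deliberately left out, and the function names and default constants (`increase=1.0`, `decrease=0.5`) are assumptions chosen for illustration, not taken from any particular OS.

```python
def aimd_step(cwnd, loss, increase=1.0, decrease=0.5, min_cwnd=1.0):
    """Return the congestion window (in segments) after one round trip.

    No loss: grow the window additively.
    Loss:    shrink the window multiplicatively, never below min_cwnd.
    """
    if loss:
        return max(min_cwnd, cwnd * decrease)
    return cwnd + increase


def simulate(loss_events, start=1.0):
    """Apply aimd_step once per round trip and record the window trace."""
    cwnd = start
    trace = []
    for loss in loss_events:
        cwnd = aimd_step(cwnd, loss)
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    # Three loss-free round trips, one loss, then one more loss-free round trip.
    print(simulate([False, False, False, True, False]))
```

The resulting sawtooth (steady growth, then halving on loss) is the behaviour the excerpt calls "traditional".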
Filesystem in Userspace
Userspace code can host
new filesystem (sshfs,
Performance is bad,
but that doesn’t matter;
functionality does matter
Container (LXC, OpenVZ, vimage)
share kernel with host operating system (no
From scratch: mTCP, Mirage, lwIP
Porting: OSv, Sandstorm, libuinet (FreeBSD),
Arrakis (lwIP), OpenOnload (lwIP?)
Glue layer: LKL (Linux-2.6), rump kernel (NetBSD)
What’s NUSE?
Network Stack in Userspace
A library operating system
Library version of the network
stack (of a monolithic kernel)
Linux (latest), FreeBSD (planned)
Why NUSE?
Minimized porting effort
Linux (net-next) changes frequently
Fully functional network stack for
(any kernel-bypass technology)
How it works
TCP UDP DCCP SCTP
RAW DPDK netmap ...
1. (monolithic) kernel
3. POSIX glue
redirect system calls
4. network I/O
raw socket, DPDK,
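The "POSIX glue" step above (redirecting system calls to the library stack) can be pictured with a toy model. This is only a sketch under stated assumptions: the slides do not show how NUSE performs the redirection, and `UserspaceStack`, `PosixGlue`, and the call table below are hypothetical names invented for illustration. The point is that the application issues the same socket-style calls, but they land in a stack living inside the process rather than in the host kernel.

```python
import errno


class UserspaceStack:
    """Hypothetical stand-in for a network stack linked into the application."""

    def __init__(self):
        self._next_fd = 3      # 0-2 are conventionally stdin/stdout/stderr
        self._sockets = {}     # fd -> (family, socket type)
        self.tx_log = []       # payloads handed to the network I/O backend

    def socket(self, family, sock_type):
        fd = self._next_fd
        self._next_fd += 1
        self._sockets[fd] = (family, sock_type)
        return fd

    def sendto(self, fd, data, addr):
        if fd not in self._sockets:
            raise OSError(errno.EBADF, "bad file descriptor")
        self.tx_log.append((addr, bytes(data)))
        return len(data)


class PosixGlue:
    """Dispatches POSIX-style call names to the installed stack."""

    def __init__(self, stack):
        self._table = {"socket": stack.socket, "sendto": stack.sendto}

    def call(self, name, *args):
        try:
            handler = self._table[name]
        except KeyError:
            # Calls the glue does not cover are reported as unimplemented.
            raise OSError(errno.ENOSYS, f"{name} is not redirected") from None
        return handler(*args)


if __name__ == "__main__":
    stack = UserspaceStack()
    glue = PosixGlue(stack)
    fd = glue.call("socket", "AF_INET", "SOCK_DGRAM")
    sent = glue.call("sendto", fd, b"ping", ("192.0.2.1", 9))
    print(fd, sent, stack.tx_log)
```

In this model, swapping the stack (for example, one built from a newer kernel tree) only means passing a different object to `PosixGlue`; the calling code does not change.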
1) kernel build
patch to kernel tree
with new (hw independent)
robust to (frequent)
(possible) use cases
New protocol deployment
Chrome + Linux mptcp (on NUSE)
Process-level virtual instance
% NUSE-linux-ovs | NUSE-freebsd-NAT |
NUSE-router | NUSE-nginx!
VM chaining via UNIX command line
no fork(2)/exec(2) support
(inefficient) thread scheduling
1. Can we benefit from OS personalization?
Present a custom (NUSE) kernel with an
application (OS personalization)
2. How much overhead does NUSE add?
Simple performance measurements
ping, iperf, nginx (partially), sleep,
nc, wget, dig, host
(gdb) b mip6_mh_filter if dce_debug_nodeid()==0
Breakpoint 1 at 0x7ffff287c569: file net/ipv6/mip6.c, line 88.
(gdb) bt 4
#1 0x00007ffff2831418 in ipv6_raw_deliver
#2 0x00007ffff2831697 in raw6_local_deliver
#3 0x00007ffff27e6068 in ip6_input_finish
==5864== Memcheck, a memory error detector
==5864== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==5864== Using Valgrind-3.6.0.SVN and LibVEX; rerun with -h for copyright info
==5864== Command: ../build/bin/ns3test-dce-vdl --verbose
==5864== Conditional jump or move depends on uninitialised value(s)
==5864== at 0x7D5AE32: tcp_parse_options (tcp_input.c:3782)
==5864== by 0x7D65DCB: tcp_check_req (tcp_minisocks.c:532)
==5864== by 0x7D63B09: tcp_v4_hnd_req (tcp_ipv4.c:1496)
==5864== by 0x7D63CB4: tcp_v4_do_rcv (tcp_ipv4.c:1576)
==5864== by 0x7D6439C: tcp_v4_rcv (tcp_ipv4.c:1696)
==5864== by 0x7D447CC: ip_local_deliver_finish (ip_input.c:226)
==5864== by 0x7D442E4: ip_rcv_finish (dst.h:318)
==5864== by 0x7D2313F: process_backlog (dev.c:3368)
==5864== by 0x7D23455: net_rx_action (dev.c:3526)
==5864== by 0x7CF2477: do_softirq (softirq.c:65)
==5864== by 0x7CF2544: softirq_task_function (softirq.c:21)
==5864== by 0x4FA2BE1: ns3::TaskManager::Trampoline(void*) (task-manager.
==5864== Uninitialised value was created by a stack allocation
==5864== at 0x7D65B30: tcp_check_req (tcp_minisocks.c:522)
Memory error detection
among distributed nodes
in a single process
Fine-grained parameter coverage
Code coverage measurement with DCE
With fine-grained network, node, protocol parameters
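One way to picture "fine-grained parameter coverage" is as a sweep over every combination of network, node, and protocol settings, with one test run per combination. The sketch below only enumerates the combinations. The parameter names and values are hypothetical, and how each configuration would be handed to DCE is not shown in the slides and is not modelled here.

```python
import itertools


def parameter_grid(space):
    """Yield one dict per combination of the values in `space`.

    `space` maps a parameter name to the list of values to try.
    """
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    # Hypothetical sweep: node count, link delay, transport protocol.
    space = {
        "nodes": [2, 4],
        "link_delay_ms": [1, 10, 100],
        "protocol": ["tcp", "sctp"],
    }
    for config in parameter_grid(space):
        print(config)
```

Each yielded dict would describe one run whose code coverage is collected; merging the coverage over all runs shows which protocol paths the sweep actually reached.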