Linux Kernel Library: Reusing Monolithic Kernel
Hajime Tazaki
IIJ Innovation Institute
2016/07
AIST seminar vol.2
LKL in a nutshell
Linux kernel library
Linux built as a library
work by Octavian Purdila (Intel) (since 2007?)
Proposed on LKML (Nov. 2015)
2809 LoC (as of Apr. 2016)
https://lwn.net/Articles/662953/
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
LKL (cont'd)
hardware-independent architecture (arch/lkl)
provides an interface to the underlying environment
outsources dependencies
clock, memory allocation, scheduler
runs on Windows, Linux, FreeBSD
simplifies device I/O
virtio host implementation
can reuse the (virtio) drivers in Linux
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
Benefits
less ossification of new features
operating system personality
a userspace library has lower deployment cost
well-matured code base
(e.g.) Linux kernel running in userspace
small kernel, a bunch of libraries
but in a different shape
Any problem in computer science can be solved with
another level of indirection.
(Wheeler and/or Lampson)
img src: https://www.flickr.com/photos/thomasclaveirole/305073153
What is reusing a monolithic kernel?
Anykernel: originally from the NetBSD rump kernel
We define an anykernel to be an organization of kernel code which allows the kernel's unmodified drivers to be run in various configurations such as application libraries and microkernel style servers, and also as part of a monolithic kernel. -- Kantee 2012.
Using the (unmodified) high-quality code base of a monolithic kernel
in a different environment, in a different shape
by gluing on additional pieces
(A bit of) History
rump: 2007 (NetBSD)
LKL: 2007 (Linux)
DCE/LibOS: 2008 (Linux/FreeBSD)
LibOS/LKL revival: 2015
LibOS merged to LKL
http://news.mynavi.jp/news/2015/03/25/285/
https://news.ycombinator.com/item?id=9259292
http://www.phoronix.com/scan.php?page=news_item&px=Linux-Library-LibOS
http://lwn.net/Articles/639333/
LKL vs. LibOS
[Figure: LKL and LibOS architectures, side by side]
LKL vs. LibOS (cont'd)
LoC:
arch/lkl (LKL) < arch/lib (LibOS)
diff: the amount of stub code
in common
no modification to the original Linux code
kernel context described by POSIX threads
outsourced resources (clock, memory, scheduler)
CPU-independent architecture
differences
LibOS: implements the higher-level APIs (timer, irq, kthread) with pthreads
LKL: implements IRQ, kthread, and timer with pthreads in a lower layer
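To make the "lower layer" idea concrete, here is a minimal sketch, assuming POSIX threads, of a one-shot timer whose expiry is delivered as an emulated interrupt from an ordinary host pthread. The names (`host_timer`, `host_timer_start`, `count_irq`) are hypothetical and are not LKL's actual host_ops interface; link with `-pthread` on older glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical interrupt-handler type: the "kernel" registers it, and the
 * host timer thread calls it when the timer expires. */
typedef void (*irq_handler_t)(void *arg);

struct host_timer {
    pthread_t thread;
    long delay_ns;          /* one-shot delay before the "interrupt" */
    irq_handler_t handler;  /* runs on the timer thread */
    void *arg;
};

static void *timer_thread(void *p)
{
    struct host_timer *t = p;
    struct timespec ts;

    ts.tv_sec = t->delay_ns / 1000000000L;
    ts.tv_nsec = t->delay_ns % 1000000000L;
    nanosleep(&ts, NULL);   /* wait on the host clock */
    t->handler(t->arg);     /* deliver the emulated timer interrupt */
    return NULL;
}

/* Arm a one-shot timer; returns 0 or a pthread error number. */
static int host_timer_start(struct host_timer *t, long delay_ns,
                            irq_handler_t handler, void *arg)
{
    t->delay_ns = delay_ns;
    t->handler = handler;
    t->arg = arg;
    return pthread_create(&t->thread, NULL, timer_thread, t);
}

/* Wait for the timer thread to finish (after its handler has run). */
static int host_timer_wait(struct host_timer *t)
{
    return pthread_join(t->thread, NULL);
}

/* Example handler: counts how many "interrupts" were delivered. */
static void count_irq(void *arg)
{
    ++*(int *)arg;
}
```

The kernel-facing side only sees "a handler gets called later"; whether that is backed by a pthread, a signal, or a green thread is a host decision.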
Implementation
Internals
1. Host backend (host_ops)
2. CPU-independent arch. (arch/lkl)
3. Application interface
1. Host backend
environment-dependent part
unifies the interface across different platforms
(rump-hypercall like)
device interface with Virtio
block device <=> disk image
networking <=> TAP, raw socket, DPDK, VDE
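A minimal sketch of a host backend table, assuming a POSIX host: portable code calls only through function pointers, and each platform supplies its own table. The struct and function names are hypothetical, loosely inspired by the host_ops idea rather than copied from LKL's headers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical backend table: portable "kernel" code calls only through
 * these pointers; each host platform provides its own instance. */
struct host_backend {
    void *(*mem_alloc)(size_t size);
    void (*mem_free)(void *ptr);
    unsigned long long (*time_ns)(void);
    void (*print)(const char *msg);
};

/* One possible POSIX implementation of the clock and console hooks. */
static unsigned long long posix_time_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL +
           (unsigned long long)ts.tv_nsec;
}

static void posix_print(const char *msg)
{
    fputs(msg, stderr);
}

static const struct host_backend posix_backend = {
    .mem_alloc = malloc,
    .mem_free = free,
    .time_ns = posix_time_ns,
    .print = posix_print,
};

/* "Kernel-side" helper: it obtains memory only through the backend, so
 * swapping the table is enough to move it to another host. */
static char *kernel_strdup(const struct host_backend *ops, const char *s)
{
    size_t n = strlen(s);
    char *copy = ops->mem_alloc(n + 1);

    if (copy != NULL)
        memcpy(copy, s, n + 1);
    return copy;
}
```

A Windows or bare-metal port would provide a different `struct host_backend` instance and leave `kernel_strdup()` untouched.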
2. CPU-independent architecture
architecture (arch/lkl)
binds in transparently (as a CPU arch)
requires no modification to the rest of the kernel
2800 LoC
thread information (struct thread_info)
irq, timer, syscall handler
accesses the underlying layer via host_ops
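In a library kernel there is no trap instruction, so a "syscall" is an ordinary function call into an entry point that dispatches on the syscall number. The sketch below assumes an entry that takes a number plus a parameter array, roughly the shape of LKL's `lkl_syscall(no, params)` (an assumption; check arch/lkl for the real signature). The syscall numbers and handlers here are made up for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical syscall numbers, for illustration only. */
enum { SYS_DEMO_GETPID = 0, SYS_DEMO_ADD = 1, NR_DEMO_SYSCALLS = 2 };

typedef long (*syscall_fn)(long *params);

static long demo_getpid(long *params)
{
    (void)params;
    return 1;
}

static long demo_add(long *params)
{
    return params[0] + params[1];
}

/* Table indexed by syscall number, like a kernel's sys_call_table. */
static const syscall_fn syscall_table[NR_DEMO_SYSCALLS] = {
    [SYS_DEMO_GETPID] = demo_getpid,
    [SYS_DEMO_ADD] = demo_add,
};

/* Single entry point: look up the handler and pass the parameter array.
 * Unknown numbers are reported as -ENOSYS (negative errno, kernel style). */
static long demo_syscall(long no, long *params)
{
    if (no < 0 || no >= NR_DEMO_SYSCALLS || syscall_table[no] == NULL)
        return -ENOSYS;
    return syscall_table[no](params);
}
```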
3. Application interface
1. use the exposed API (LKL syscalls)
2. use the host libc (LD_PRELOAD)
3. extend an (alternative) libc
API 1: use the exposed API (LKL syscalls)
call entry points of the LKL kernel
lkl_sys_open(), lkl_sys_socket()
almost the same as ordinary syscalls
but return values and errno notification differ
can use LKL syscalls and host syscalls simultaneously
e.g., read an ext4 file with lkl_sys_read() => write it to the host (Windows) with write()
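The practical difference from libc is the error convention: like the in-kernel syscalls, the lkl_sys_*() calls return a negative error number rather than returning -1 and setting errno. A minimal sketch of bridging the two conventions, using a hypothetical stand-in for `lkl_sys_open()`:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for an lkl_sys_*() call: it reports failure by
 * returning a negative error number and never touches errno. */
static long fake_lkl_sys_open(const char *path)
{
    if (strcmp(path, "/etc/hostname") == 0)
        return 3;           /* pretend descriptor inside the LKL kernel */
    return -ENOENT;
}

/* Convert the kernel-style result into the libc style (-1 plus errno),
 * roughly what a hijack layer or libc port has to do for every call. */
static long to_libc_convention(long ret)
{
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```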
API 2: hijack the host standard library
dynamically replace the symbols of host syscalls (in libc)
LD_PRELOAD
socket() => lkl_sys_socket()
can run host binaries (executables) as-is
limited to replaceable symbols
needs syscall translation on non-Linux hosts
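A minimal sketch of the hijack technique: a file that defines `socket()` with libc's signature, so the dynamic linker resolves the application's calls there first. `fake_lkl_sys_socket()` is a hypothetical stand-in for the real LKL call. One way to build and use it: `cc -shared -fPIC -o libhijack.so hijack.c`, then `LD_PRELOAD=./libhijack.so ./app`.

```c
#include <sys/socket.h>

/* Counts how many socket() calls were redirected (for demonstration). */
static int hijacked_calls = 0;

/* Hypothetical stand-in for lkl_sys_socket(): a real shim would call into
 * the LKL kernel here. It hands out descriptors from a private range so
 * they are easy to tell apart from host descriptors. */
static int fake_lkl_sys_socket(int domain, int type, int protocol)
{
    (void)domain;
    (void)type;
    (void)protocol;
    return 1000 + hijacked_calls++;
}

/* Same name and signature as libc's socket(). When this file is built as a
 * shared object and loaded with LD_PRELOAD (or linked into the program),
 * calls to socket() resolve here instead of in libc. */
int socket(int domain, int type, int protocol)
{
    return fake_lkl_sys_socket(domain, type, protocol);
}
```

Calls that bypass dynamic symbol resolution (statically linked binaries, raw syscall instructions) are not caught, which is the "limited to replaceable symbols" caveat. A complete shim would also have to route read(), write(), close() and friends for the descriptors it hands out.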
API 3: extend an (alternative) libc
our own libc calls only LKL syscalls
introduced as a virtual CPU architecture
a program links against it instead of the host libc
cannot access (underlying) host resources directly via these LKL syscalls
implemented as a patch to musl libc
Use cases (applications)
Use Case 1: instant kernel bypass
Use Case 2: programs reusing kernel code in userspace
Use Case 3: unikernel
Use Case 1: instant kernel bypass
syscall redirection by LD_PRELOAD
can use both LKL and host syscalls
new features without touching the host kernel
LD_PRELOAD=liblkl-super-tcp++.so firefox
Use Case 2: programs reusing kernel code in userspace
use kernel code without porting
mount a filesystem without root privilege
can use both LKL and host syscalls
e.g., access an ext4 disk image on Windows
1. open the disk image (CreateFile())
2. mount it (lkl_sys_mount())
3. read a file in the disk image (lkl_sys_read())
4. write the file to the Windows side (WriteFile())
Use Case 3: Unikernel
a single application bundled with LKL
python + LKL, nginx + LKL
only LKL syscalls available
musl libc extension
rump hypercall (frankenlibc)
running on non-OS environments
(on Xen Mini-OS via rumprun)
Work in progress
- http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-systems-the-next-big-thing-
Demos with the Linux kernel library
Unikernel on Linux (ping6 command with the kernel library embedded)
Unikernel on qemu-arm (hello world)
Kernel bypass/userspace networking
Network Stack
Why in kernel space?
packet processing was costly in that era ('70s)
now much cheaper
getting fat (matured) after decades
longer code path (and slower)
hard to add new features
runs into unknown issues
img src: http://www.makelinux.net/kernel_map/
Alternative network stacks
lwip (2002~)
Arrakis [OSDI '14]
IX [OSDI '14]
MegaPipe [OSDI '12]
mTCP [NSDI '14]
SandStorm [SIGCOMM '14]
uTCP [CCR '14]
rumpkernel [ATC '09]
FastSocket [ASPLOS '16]
SolarFlare (2007~?)
StackMap [ATC '16]
libuinet (2013~)
SeaStar (2014~)
Snabb Switch (2012~)
Motivations
Socket API sucks
StackMap, MegaPipe, uTCP, SandStorm, IX
New API: no benefit for existing applications
Network stack in kernel space sucks
FastSocket, mTCP, lwip (SolarFlare?)
Compatibility is (also) important
rumpkernel, libuinet, Arrakis, IX, SolarFlare
Existing programming model sucks
SeaStar
Techniques
batching (syscall/NIC access)
Arrakis, IX, MegaPipe, mTCP, SandStorm, uTCP
Utilize feature-rich kernel stack
rumpkernel, FastSocket, StackMap
Porting to userspace stack
libuinet, SandStorm
Kernel bypass (userspace network stack)
mTCP, SandStorm, uTCP, rumpkernel, libuinet, lwip, SeaStar
bypass technique itself
netmap, PF_RING, raw socket, Intel DPDK
Connection locality (multi-core scalability)
SeaStar, MegaPipe, mTCP, FastSocket, ...
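As one concrete example of syscall batching, Linux's `sendmmsg(2)` hands several datagrams to the kernel in a single call (the benchmark in this deck also drives netperf with sendmmsg). A Linux-only sketch over loopback; the batch size and payload length are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH 4
#define PAYLOAD_LEN 16

/* Bind a UDP socket to 127.0.0.1 on a kernel-chosen port and report the
 * address in *addr. Returns the descriptor, or -1 on error. */
static int open_loopback_receiver(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;  /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Queue BATCH datagrams to dst with one sendmmsg() call instead of BATCH
 * sendto() calls. Returns how many messages were sent, or -1 on error. */
static int send_batch(int fd, const struct sockaddr_in *dst)
{
    static char payload[BATCH][PAYLOAD_LEN];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        payload[i][0] = (char)('a' + i);  /* tag each datagram */
        iov[i].iov_base = payload[i];
        iov[i].iov_len = PAYLOAD_LEN;
        msgs[i].msg_hdr.msg_name = (void *)dst;
        msgs[i].msg_hdr.msg_namelen = sizeof(*dst);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, BATCH, 0);
}
```

The kernel crossing is paid once per batch rather than once per packet; the stacks listed above apply the same idea to NIC access as well.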
Implementation
Written from scratch
lwip (Arrakis, IX, SolarFlare?), mTCP, uTCP, SeaStar
Porting based
libuinet, SandStorm
New API
MegaPipe, StackMap
Anykernel
rumpkernel, (LKL)
What's still missing?
some solve problems by specialization
avoiding the generality tax
performance w/ specialization vs. more features w/ generalization
e.g., fewer TCP stack features; new APIs break support for existing applications
specialized vs. generalized
generalization often involves indirection
indirection usually introduces complexity (Wheeler/Lampson)
performant and generalized?
Performance study
Conditions
ThinkStation P310 x2
CPU: Intel Core i7-6700 CPU @ 3.40GHz (4 cores, 8 threads)
Memory: 32GB
NIC: X540-T2
Linux 4.4.6-301 (x86_64) on Fedora 23
Linux bridge (X540 + tap/raw socket)
no DPDK (cannot be used with hijack, etc.)
netperf (git ~v2.7.0)
netserver (native)
netperf (varied)
Conditions (cont'd)
combinations
netperf (sendmmsg) + host stack (native)
+ hijack library, native thread (hijack)
+ frankenlibc/lkl, green thread (lkl-musl)
netperf (sendmmsg) + lkl extension + frankenlibc (lkl-musl (skb prealloc))
pinned to a processor using the taskset command
disabled all offload features (tso/gso/gro, rx/tx cksum)
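`taskset` works by setting a CPU affinity mask; inside a program the same effect comes from `sched_setaffinity(2)`. A Linux-specific sketch that pins the calling thread (rather than a whole command, as `taskset` does) to the first CPU it is currently allowed to use:

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling thread to the first CPU in its current affinity mask.
 * Returns the chosen CPU number, or -1 on failure. */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);  /* mask containing only this CPU */
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            return -1;
        return cpu;
    }
    return -1;
}
```

Pinning keeps the benchmark from being migrated between cores mid-run, so native-thread and green-thread variants are compared on equal footing.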
TCP_RR (netperf)
UDP_STREAM (netperf)
UDP_STREAM (pps, netperf)
TCP_STREAM (netperf)
(ref.) LibOS results (as of Feb. 2015)
1024-byte UDP, custom-built tool
throughput: <10% of Linux native
Observations (from the benchmark)
native thread vs. green thread
better TCP_RR w/ native thread (pthread)
better TCP_STREAM/UDP_STREAM w/ green thread
(why? not yet understood)
avoiding dynamic allocation contributes a lot
payloads larger than the MTU are penalized on the host stack (?)
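One generic way to avoid per-packet dynamic allocation is a fixed pool of buffers handed out from a free list. This is a hypothetical, single-threaded illustration of the idea, not LKL's skb preallocation code; the pool and buffer sizes are arbitrary.

```c
#include <stddef.h>

#define POOL_SIZE 8
#define BUF_BYTES 2048  /* room for one MTU-sized frame plus headroom */

/* Buffers are carved out once, up front; get/put are O(1) operations on a
 * singly linked free list, so the per-packet path never calls malloc(). */
struct pkt_buf {
    struct pkt_buf *next;  /* free-list link while the buffer is idle */
    unsigned char data[BUF_BYTES];
};

struct pkt_pool {
    struct pkt_buf bufs[POOL_SIZE];
    struct pkt_buf *free_list;
};

static void pool_init(struct pkt_pool *p)
{
    p->free_list = NULL;
    /* Push in reverse so bufs[0] ends up at the head of the list. */
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        p->bufs[i].next = p->free_list;
        p->free_list = &p->bufs[i];
    }
}

/* Take a buffer; NULL means the pool is exhausted. */
static struct pkt_buf *pool_get(struct pkt_pool *p)
{
    struct pkt_buf *b = p->free_list;

    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Return a buffer to the pool for reuse. */
static void pool_put(struct pkt_pool *p, struct pkt_buf *b)
{
    b->next = p->free_list;
    p->free_list = b;
}
```

The trade-off is a fixed memory footprint and an explicit "pool exhausted" case that the caller must handle, in exchange for no allocator calls on the hot path.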
Summary
Morphing monolithic kernel into an Anykernel
Various use cases
Userspace network stack (kernel bypass)
Unikernel
Performance study in progress
https://github.com/lkl/linux
References
Linux Kernel Library
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
Rumpkernel (dissertation)
Kantee, Flexible Operating System Internals: The Design and Implementation of the Anykernel and Rump Kernels, Ph.D. thesis, 2012.
Linux LibOS in general
Tazaki et al. Direct Code Execution: Revisiting Library OS
Architecture for Reproducible Network Experiments, CoNEXT 2013
(LibOS in general)
https://github.com/lkl/linux
http://libos-nuse.github.io/
https://lwn.net/Articles/637658/
Backups
Recent Updates
Updates (diff to lkl)
(musl) libc integration
rump hypercall interface
via frankenlibc tools (for POSIX environments)
via rumprun framework (for bare-metal/xen/kvm environments)
more applications
netperf (signal handling, etc)
nginx
ghc (Haskell runtime)
performance study
libc integration
standard library for LKL
all syscalls go directly to LKL
applications can use LKL transparently
no special modifications or hijacking needed
based on musl libc
introduces a new (sub) architecture, lkl
rump hypercall interface
replacement of LKL host_ops
or yet another new host environment (rump)
has two thread primitives
pthread-based (as LKL does)
ucontext-based (more efficient on non-MP)
can reduce
the effort of host_ops maintenance
the complexity of a tall stack of abstraction turtles
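A minimal sketch of what a ucontext-based primitive looks like: two cooperative contexts on one host thread, switched explicitly with `swapcontext(3)`. Names are hypothetical (this is not frankenlibc's code). The ucontext functions were removed from POSIX.1-2008 but remain available in glibc.

```c
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int trace[4];     /* records the order in which steps ran */
static int trace_len = 0;

/* A "green thread": runs until it yields, then resumes where it left off.
 * No locks are needed because only one context runs at a time and switches
 * happen only at explicit swapcontext() calls. */
static void worker(void)
{
    trace[trace_len++] = 1;
    swapcontext(&worker_ctx, &main_ctx);  /* yield back to main */
    trace[trace_len++] = 3;
    /* returning here continues at uc_link (main_ctx) */
}

/* Run main and worker interleaved; returns the number of recorded steps. */
static int run_green_threads(void)
{
    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);  /* run worker until it yields */
    trace[trace_len++] = 2;
    swapcontext(&main_ctx, &worker_ctx);  /* resume worker until it ends */
    trace[trace_len++] = 4;
    return trace_len;
}
```

A context switch here is a user-space register save/restore with no kernel scheduler involvement, which is one plausible reason such primitives are cheaper than pthreads on a single processor.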
rump hypercall (cont'd)
integration of
libc (musl for LKL, netbsd libc for rumpkernel)
rump hypercall (on linux, freebsd, netbsd, qemu-arm, spike)
host (platform) support code
frankenlibc
has two namespaced libc(s)
the hypercall implementation can use libc
provides
a libc.a
cross-build toolchains (rumprun-cc, etc)
Usage
build
% ./configure CC=rumprun-cc ; make
execution (with rexec launcher)
% rexec ./nginx disk-nginx.img tap:tap0 -- -c nginx.conf
rexec executable [disk image file] [NIC] -- [executable-specific options]
Code
https://github.com/libos-nuse/lkl-linux
https://github.com/libos-nuse/musl
https://github.com/libos-nuse/frankenlibc
https://github.com/libos-nuse/rumprun
https://github.com/libos-nuse/nginx
https://github.com/libos-nuse/ghc
