1
Linux Kernel Library: Reusing Monolithic Kernel
Hajime Tazaki
IIJ Innovation Institute
2016/07
AIST seminar vol.2
2 . 1
LKL in a nutshell
Linux kernel library
a library of Linux
Octavian Purdila (Intel)'s
work (since 2007?)
Proposed on LKML (Nov. 2015)
2809 LoC (as of Apr. 2016)
https://lwn.net/Articles/662953/
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
2 . 2
LKL (cont'd)
hardware-independent architecture (arch/lkl)
provides an interface to the underlying environment
outsources dependencies
clock, memory allocation, scheduler
runs on Windows, Linux, FreeBSD
simplifies I/O operations of devices
virtio host implementation
can use the (virtio) drivers in Linux
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
2 . 3
Benefit
less ossification of new features
operating system personality
userspace library has less deployment cost
Well-matured code base
(e.g.) Linux kernel running in userspace
small kernel, a bunch of library
but in a different shape
Any problem in computer science can be solved with
another level of indirection.
(Wheeler and/or Lampson)
img src: https://www.flickr.com/photos/thomasclaveirole/305073153
2 . 4
2 . 5
What is reusing monolithic kernel ?
Anykernel: originally from the NetBSD rump kernel
We define an anykernel to be an organization of
kernel code which allows the kernel's unmodified
drivers to be run in various configurations such as
application libraries and microkernel style servers,
and also as part of a monolithic kernel. -- Kantee
2012.
Using the (unmodified) high-quality code base of a monolithic kernel
in a different environment, in a different shape
by gluing on additional stuff
2 . 6
2 . 7
(A bit of) History
rump: 2007 (NetBSD)
LKL: 2007 (Linux)
DCE/LibOS: 2008 (Linux/FreeBSD)
LibOS/LKL revival: 2015
LibOS merged to LKL
http://news.mynavi.jp/news/2015/03/25/285/
https://news.ycombinator.com/item?id=9259292
http://www.phoronix.com/scan.php?page=news_item&px=Linux-Library-LibOS
http://lwn.net/Articles/639333/
2 . 8
2 . 9
LKL vs. LibOS
(figure: LKL and LibOS diagrams)
LKL vs. LibOS (cont'd)
LoC:
arch/lkl (LKL) < arch/lib (LibOS)
diff: the amount of stub code
common
no modification to the original Linux code
description of kernel context (by POSIX threads)
outsourced resources (clock, memory, scheduler)
CPU-independent architecture
differences
LibOS: implements the higher-level APIs (timer, IRQ, kthread) with pthreads
LKL: implements IRQ, kthread and timer with pthreads in a lower layer
3 . 1
Implementation
2 . 10
3 . 2
Internals
1. Host backend (host_ops)
2. CPU independent arch. (arch/lkl)
3. Application interface
1. host backend
environment-dependent part
unifies the interface across different platforms
(rump-hypercall like)
device interface with virtio
block device <=> disk image
networking <=> TAP, raw socket, DPDK, VDE
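The host backend can be pictured as a table of function pointers: kernel-side code never calls the platform directly, only the slots it was booted with, so porting means filling the table for a new platform. A minimal sketch, assuming a POSIX host; the struct and field names (`host_ops_model`, `time_ns`, `mem_alloc`, `mem_free`) are hypothetical and do not reproduce LKL's actual host_ops definition, which also covers threads, synchronization and timers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical host-operations table: each environment-dependent
 * service the library kernel needs is reached through a pointer. */
struct host_ops_model {
    unsigned long long (*time_ns)(void);  /* monotonic clock */
    void *(*mem_alloc)(size_t size);      /* memory allocation */
    void (*mem_free)(void *ptr);
};

/* POSIX clock for the table; another platform would supply its own. */
static unsigned long long posix_time_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (unsigned long long)ts.tv_sec * 1000000000ULL +
           (unsigned long long)ts.tv_nsec;
}

const struct host_ops_model posix_host_ops = {
    .time_ns = posix_time_ns,
    .mem_alloc = malloc,
    .mem_free = free,
};

/* "Kernel-side" code: uses only the table, never malloc() or
 * clock_gettime() directly. Returns 14 on success, -1 on failure. */
int kernel_side_selftest(const struct host_ops_model *ops)
{
    unsigned long long t0 = ops->time_ns();
    int *buf = ops->mem_alloc(4 * sizeof *buf);
    if (buf == NULL)
        return -1;
    for (int i = 0; i < 4; i++)
        buf[i] = i * i;
    int sum = buf[0] + buf[1] + buf[2] + buf[3];  /* 0 + 1 + 4 + 9 */
    ops->mem_free(buf);
    return (ops->time_ns() >= t0) ? sum : -1;
}
```

Calling `kernel_side_selftest(&posix_host_ops)` exercises the kernel side against the POSIX table; a Windows backend would pass a different table to the same code.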
3 . 4
2. CPU-independent architecture
architecture (arch/lkl)
transparent architecture binding
(as a CPU arch)
requires no modification to
the rest of the kernel
2800 LoC
thread information (struct thread_info)
irq, timer, syscall handler
access to the underlying layer
via host_ops
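The syscall handler's job is to turn a syscall number plus an argument array into a call to the kernel's own implementation. A minimal model of that dispatch, assuming a single entry point that takes the number and a `long` parameter array; the table, the numbers and the handlers (`model_syscall`, `model_sys_getpid`, `model_sys_add`) are hypothetical stand-ins, not LKL's generated syscall table.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_getpid 0   /* hypothetical syscall numbers */
#define MODEL_NR_add    1
#define MODEL_NR_MAX    2

typedef long (*syscall_fn)(long *params);

/* The model has a single "process", so its pid is always 1. */
static long model_sys_getpid(long *params) { (void)params; return 1; }
static long model_sys_add(long *params) { return params[0] + params[1]; }

static const syscall_fn model_syscall_table[MODEL_NR_MAX] = {
    [MODEL_NR_getpid] = model_sys_getpid,
    [MODEL_NR_add] = model_sys_add,
};

/* Single entry point; failures use the kernel convention of
 * returning a negative errno value. */
long model_syscall(long no, long *params)
{
    if (no < 0 || no >= MODEL_NR_MAX || model_syscall_table[no] == NULL)
        return -ENOSYS;
    return model_syscall_table[no](params);
}
```

Unknown numbers fall through to `-ENOSYS`, the same answer a native kernel gives for an unimplemented syscall.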
3 . 3
3 . 5
3. Application interface
1. use exposed API (LKL syscall)
2. use host libc (LD_PRELOAD)
3. extend (alternative) libc
3 . 6
API 1: use exposed API (LKL syscall)
call entry points of the LKL kernel
lkl_sys_open(), lkl_sys_socket()
almost the same as ordinary syscalls
return value and errno notification differ
can use LKL syscalls and host syscalls
simultaneously
read an ext4 file with lkl_sys_read() =>
write it into the host (Windows) with write()
3 . 7
API 2: hijack host standard library
dynamically replace symbols
of host syscalls (in libc)
LD_PRELOAD
socket() => lkl_sys_socket()
can use host binaries (executables) as-is
limited to the symbols that can be replaced
needs syscall translation on non-Linux hosts
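LD_PRELOAD works because the dynamic linker searches a preloaded object ahead of libc, so a `socket` defined there wins. The sketch shows only that interposition idea: it defines its own `socket()` and routes calls to a hypothetical backend, `fake_lkl_socket()`, standing in for the library kernel. A real preload library would be built with `-shared -fPIC`, would cover many more symbols, and would typically fetch the host's original function with `dlsym(RTLD_NEXT, "socket")` for calls it does not redirect; none of that is shown here.

```c
#include <sys/socket.h>

/* Hypothetical backend standing in for the library kernel's socket call.
 * Its descriptors start at 1000 to keep them distinct from host fds. */
static int next_lkl_fd = 1000;
int redirected_calls = 0;

static int fake_lkl_socket(int domain, int type, int protocol)
{
    (void)domain;
    (void)type;
    (void)protocol;
    redirected_calls++;
    return next_lkl_fd++;
}

/* Same name and prototype as libc's socket(): when this object is
 * preloaded (or linked into the program), it takes precedence. */
int socket(int domain, int type, int protocol)
{
    return fake_lkl_socket(domain, type, protocol);
}
```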
3 . 8
API 3: extend (alternative) libc
our own libc calls only LKL syscalls
introduced as a new virtual CPU architecture (in libc)
a program can link against it instead of the host libc
cannot access (underlying) host resources
directly; only LKL syscalls are available
as a patch for musl libc
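In the alternative-libc approach every libc wrapper marshals its arguments and enters the library kernel, so there is no path to a host syscall at all. A minimal sketch, assuming a single in-process entry point that takes a syscall number and a `long` argument array; `lkl_entry_model`, `altlibc_write`, `MODEL_NR_write` and the in-memory "console" are hypothetical and are not the musl patch itself.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NR_write 1   /* hypothetical syscall number */

/* Hypothetical kernel entry point. Its "write" only appends to an
 * in-memory console; it never touches a host resource. */
static char kernel_console[64];
static size_t kernel_console_len;

static long lkl_entry_model(long no, long *params)
{
    if (no != MODEL_NR_write)
        return -ENOSYS;
    if (params[0] != 1)
        return -EBADF;                     /* only fd 1 exists here */
    const char *buf = (const char *)params[1];
    size_t len = (size_t)params[2];
    if (len > sizeof kernel_console - kernel_console_len)
        return -ENOSPC;
    memcpy(kernel_console + kernel_console_len, buf, len);
    kernel_console_len += len;
    return (long)len;
}

/* libc side: marshal arguments, enter the library kernel, then map a
 * negative result to -1 plus errno as libc callers expect. */
long altlibc_write(int fd, const void *buf, size_t len)
{
    long params[6] = { fd, (long)buf, (long)len };
    long ret = lkl_entry_model(MODEL_NR_write, params);
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```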
3 . 9
Use cases (applications)
Use Case 1: instant kernel bypass
Use Case 2: programs reusing kernel code in userspace
Use Case 3: unikernel
3 . 10
Use Case 1: instant kernel bypass
syscall redirection by LD_PRELOAD
can use both LKL and host syscalls
new feature without touching host kernel
LD_PRELOAD=liblkl-super-tcp++.so firefox
3 . 11
Use Case 2: programs reusing
kernel code in userspace
use kernel code without porting
mount a filesystem w/o root privilege
can use both LKL and host syscalls
e.g., access an ext4 disk image on Windows
1. open the disk image (CreateFile())
2. mount it (lkl_sys_mount())
3. read a file in the disk image (lkl_sys_read())
4. write a file to the Windows side (WriteFile())
3 . 12
Use Case 3: Unikernel
single-application contained LKL
python + LKL, nginx + LKL
only LKL syscalls available
musl libc extension
rump hypercall (frankenlibc)
running on non-OS environment
(on Xen Mini-OS via rumprun)
Work in progress
- http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-systems-the-next-big-thing-
3 . 13
Demos with Linux Kernel Library
Unikernel on Linux (ping6 command with embedded kernel library)
Unikernel on qemu-arm (hello world)
4 . 1
Kernel bypass/userspace
networking
4 . 2
Network Stack
Why in kernel space ?
per-packet cost was high
in that era ('70s)
now much cheaper
Getting fat (matured)
after decades
code path is longer
(and slower)
hard to add new features
faced unknown issues
img src: http://www.makelinux.net/kernel_map/
4 . 3
Alternate network stacks
lwip (2002~)
Arrakis [OSDI '14]
IX [OSDI '14]
MegaPipe [OSDI '12]
mTCP [NSDI '14]
SandStorm [SIGCOMM '14]
uTCP [CCR '14]
rumpkernel [ATC '09]
FastSocket [ASPLOS '16]
SolarFlare (2007~?)
StackMap [ATC '16]
libuinet (2013~)
SeaStar (2014~)
Snabb Switch (2012~)
4 . 4
Motivations
Socket API sucks
StackMap, MegaPipe, uTCP, SandStorm, IX
New API: no benefit for existing applications
Network stack in kernel space sucks
FastSocket, mTCP, lwip (SolarFlare?)
Compatibility is (also) important
rumpkernel, libuinet, Arrakis, IX, SolarFlare
Existing programming model sucks
SeaStar
4 . 5
Techniques
batching (syscall/NIC access)
Arrakis, IX, MegaPipe, mTCP, SandStorm, uTCP
Utilize feature-rich kernel stack
rumpkernel, fastsocket, StackMap
Porting to userspace stack
libuinet, SandStorm
Kernel bypass (userspace network stack)
mTCP, SandStorm, uTCP, rumpkernel, libuinet, lwip, SeaStar
bypass technique itself
netmap, PF_RING, raw socket, Intel DPDK
Connection locality (multi-core scalability)
SeaStar, MegaPipe, mTCP, fastsocket, .....
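Syscall batching amortizes one user/kernel crossing over many packets. On Linux, `sendmmsg()` and `recvmmsg()` are one stock-kernel form of it (the benchmarks below drive netperf with sendmmsg). A minimal sketch using an AF_UNIX datagram socket pair so it needs no network configuration; it illustrates syscall batching only, not the NIC-level batching of the systems listed above.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 8

/* Sends BATCH datagrams with ONE sendmmsg() call and drains them with
 * ONE recvmmsg() call. Stores the sent count in *sent_out and returns
 * the received count, or -1 if the socket pair cannot be created. */
int batched_roundtrip(int *sent_out)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    char out[BATCH][16], in[BATCH][16];
    struct iovec out_iov[BATCH], in_iov[BATCH];
    struct mmsghdr out_msgs[BATCH], in_msgs[BATCH];
    memset(out_msgs, 0, sizeof out_msgs);
    memset(in_msgs, 0, sizeof in_msgs);

    for (int i = 0; i < BATCH; i++) {
        int n = snprintf(out[i], sizeof out[i], "pkt %d", i);
        out_iov[i].iov_base = out[i];
        out_iov[i].iov_len = (size_t)n;
        out_msgs[i].msg_hdr.msg_iov = &out_iov[i];
        out_msgs[i].msg_hdr.msg_iovlen = 1;

        in_iov[i].iov_base = in[i];
        in_iov[i].iov_len = sizeof in[i];
        in_msgs[i].msg_hdr.msg_iov = &in_iov[i];
        in_msgs[i].msg_hdr.msg_iovlen = 1;
    }

    int sent = sendmmsg(sv[0], out_msgs, BATCH, 0);  /* one syscall */
    int got = recvmmsg(sv[1], in_msgs, BATCH, MSG_DONTWAIT, NULL);

    close(sv[0]);
    close(sv[1]);
    if (sent_out != NULL)
        *sent_out = sent;
    return got;
}
```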
4 . 6
Implementation
From scratch
lwip (Arrakis, IX, SolarFlare?), mTCP, uTCP, SeaStar
Porting based
libuinet, SandStorm
New API
MegaPipe, StackMap
Anykernel
rumpkernel, (LKL)
4 . 7
What's still missing ?
some solve problems by specialization
avoiding the generality tax
performance w/ specialization vs. more features w/ generalization
e.g., fewer TCP stack features; a new API breaks support for existing applications
specialized vs. generalized
generalization often involves indirection
indirection usually introduces complexity (Wheeler/Lampson)
performant and generalized ?
5 . 1
Performance study
5 . 2
Conditions
ThinkStation P310 x2
CPU: Intel Core i7-6700 CPU @ 3.40GHz (4 cores, 8 threads)
Memory: 32GB
NIC: X540-T2
Linux 4.4.6-301 (x86_64) on Fedora 23
Linux bridge (X540 + tap/raw socket)
no DPDK (can't be combined with hijack, etc.)
netperf (git ~v2.7.0)
netserver (native)
netperf (varied)
5 . 3
Conditions (cont'd)
combinations
netperf (sendmmsg) + host stack (native)
+ hijack library, native thread (hijack)
+ frankenlibc/lkl, green thread (lkl-musl)
netperf (sendmmsg) + lkl extension + frankenlibc (lkl-musl (skb pre-alloc))
pinned to a processor
using the taskset command
disable all offload features (tso/gso/gro, rx/tx cksum)
TCP_RR (netperf)
5 . 4
UDP_STREAM (netperf)
5 . 5
UDP_STREAM (pps, netperf)
5 . 6
TCP_STREAM (netperf)
5 . 7
5 . 8
(ref.) LibOS results (as of Feb.
2015)
1024 bytes UDP, own-crafted tool
throughput: <10% of Linux native
5 . 9
Observations (of benchmark)
Native thread vs Green thread
better TCP_RR w/ native thread (pthread)
better TCP_STREAM/UDP_STREAM w/ green thread
???
avoiding dynamic allocation contributes a lot
penalty for payloads larger than the MTU on the host stack (?)
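The "avoiding dynamic allocation" observation (the skb pre-alloc variant) can be pictured as a fixed pool of packet buffers recycled through a free list instead of a malloc()/free() pair per packet. A generic, single-threaded sketch of that technique; the names and sizes (`pkt_buf`, `POOL_SIZE`, `BUF_BYTES`) are hypothetical, and this is not LKL's actual skb pre-allocation code.

```c
#include <stddef.h>

#define POOL_SIZE 4
#define BUF_BYTES 2048            /* room for one MTU-sized frame */

struct pkt_buf {
    struct pkt_buf *next;         /* free-list link while unused */
    size_t len;
    unsigned char data[BUF_BYTES];
};

static struct pkt_buf pool[POOL_SIZE];   /* reserved once, up front */
static struct pkt_buf *free_list;

void pool_init(void)
{
    free_list = NULL;
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1) and no system allocator on the fast path; NULL when exhausted. */
struct pkt_buf *pkt_get(void)
{
    struct pkt_buf *b = free_list;
    if (b != NULL) {
        free_list = b->next;
        b->len = 0;
    }
    return b;
}

/* Return a buffer to the front of the free list (LIFO reuse). */
void pkt_put(struct pkt_buf *b)
{
    b->next = free_list;
    free_list = b;
}
```

LIFO reuse hands back the most recently released buffer, which is likely still warm in the cache; a multi-core stack would need per-core pools or locking.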
6 . 1
Summary
Morphing monolithic kernel into an Anykernel
Various use cases
Userspace network stack (kernel bypass)
Unikernel
Performance study in progress
https://github.com/lkl/linux
6 . 2
Reference
Linux Kernel Library
Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
Rumpkernel (dissertation)
Kantee, Flexible Operating System Internals: The Design and
Implementation of the Anykernel and Rump Kernels, Ph.D Thesis,
2012
Linux LibOS in general
Tazaki et al. Direct Code Execution: Revisiting Library OS
Architecture for Reproducible Network Experiments, CoNEXT 2013
(LibOS in general)
https://github.com/lkl/linux
http://libos-nuse.github.io/
https://lwn.net/Articles/637658/
7 . 1
Backups
7 . 4
Recent Updates
7 . 5
Updates (diff to lkl)
(musl) libc integration
rump hypercall interface
via frankenlibc tools (for POSIX environment)
via rumprun framework (for bare-metal/Xen/KVM environments)
more applications
netperf (signal handling, etc)
nginx
ghc (Haskell runtime)
performance study
7 . 6
libc integration
standard lib for LKL
all syscall direct to LKL
application can use LKL transparently
no special modifications or hijacking needed
based on musl libc
introduce new (sub) architecture lkl
rump hypercall interface
replacement of LKL host_ops
or yet-another new host environment (rump)
has two thread primitives
pthread-based (as LKL does)
ucontext-based (more efficient on non-MP)
can reduce
the effort of host_ops maintenance
the complexity of the tall abstraction turtle
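The ucontext-based primitive is a green (user-level) thread: switching saves and restores registers in user space without a host thread or the kernel scheduler, which is why it can be cheaper on a single processor. A minimal sketch with `getcontext`/`makecontext`/`swapcontext` (dropped from POSIX.1-2008 but still provided by glibc); it shows only the switching mechanism, not the rump hypercall layer's own thread code, and `run_green_thread_demo`/`green_trace` are hypothetical names.

```c
#include <ucontext.h>

static ucontext_t main_ctx, green_ctx;
static char green_stack[64 * 1024];   /* the green thread's own stack */
int green_trace[8];                   /* records the execution order */
int green_trace_len;

static void green_body(void)
{
    green_trace[green_trace_len++] = 2;
    swapcontext(&green_ctx, &main_ctx);   /* yield back to main */
    green_trace[green_trace_len++] = 4;
}   /* returning resumes uc_link, i.e. main_ctx */

/* Runs one green thread that yields once; returns the trace length. */
int run_green_thread_demo(void)
{
    green_trace_len = 0;
    if (getcontext(&green_ctx) == -1)
        return -1;
    green_ctx.uc_stack.ss_sp = green_stack;
    green_ctx.uc_stack.ss_size = sizeof green_stack;
    green_ctx.uc_link = &main_ctx;
    makecontext(&green_ctx, green_body, 0);

    green_trace[green_trace_len++] = 1;
    swapcontext(&main_ctx, &green_ctx);   /* run until it yields */
    green_trace[green_trace_len++] = 3;
    swapcontext(&main_ctx, &green_ctx);   /* resume until it returns */
    green_trace[green_trace_len++] = 5;
    return green_trace_len;
}
```

The trace records 1 through 5 in order: control ping-pongs between the two contexts on one host thread.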
7 . 8
rump hypercall (cont'd)
integration of
libc (musl for LKL, NetBSD libc for rumpkernel)
rump hypercall (on Linux, FreeBSD, NetBSD, qemu-arm, Spike)
host (platform) support code
frankenlibc
has two namespaced libc(s)
hypercall implementation can use libc
provides
a libc.a
cross-build toolchains (rumprun-cc, etc)
7 . 7
7 . 9
Usage
build
% ./configure CC=rumprun-cc ; make
execution (with rexec launcher)
% rexec ./nginx disk-nginx.img tap:tap0 -- -c nginx.conf
rexec executable [disk image file] [NIC] -- [executable specific options]
7 . 10
Codes
https://github.com/libos-nuse/lkl-linux
https://github.com/libos-nuse/musl
https://github.com/libos-nuse/frankenlibc
https://github.com/libos-nuse/rumprun
https://github.com/libos-nuse/nginx
https://github.com/libos-nuse/ghc

More Related Content

What's hot

Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopen
Hajime Tazaki
 
Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
Jiannan Ouyang, PhD
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
IO Visor Project
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Jiannan Ouyang, PhD
 
Recent advance in netmap/VALE(mSwitch)
Recent advance in netmap/VALE(mSwitch)Recent advance in netmap/VALE(mSwitch)
Recent advance in netmap/VALE(mSwitch)
micchie
 

What's hot (20)

NUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osioNUSE (Network Stack in Userspace) at #osio
NUSE (Network Stack in Userspace) at #osio
 
mTCP使ってみた
mTCP使ってみたmTCP使ってみた
mTCP使ってみた
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopen
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)
 
Achieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-KernelsAchieving Performance Isolation with Lightweight Co-Kernels
Achieving Performance Isolation with Lightweight Co-Kernels
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016] CETH for XDP [Linux Meetup Santa Clara | July 2016]
CETH for XDP [Linux Meetup Santa Clara | July 2016]
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Linux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use CasesLinux Kernel Cryptographic API and Use Cases
Linux Kernel Cryptographic API and Use Cases
 
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsShoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs
 
Introduction to RCU
Introduction to RCUIntroduction to RCU
Introduction to RCU
 
VLANs in the Linux Kernel
VLANs in the Linux KernelVLANs in the Linux Kernel
VLANs in the Linux Kernel
 
Recent advance in netmap/VALE(mSwitch)
Recent advance in netmap/VALE(mSwitch)Recent advance in netmap/VALE(mSwitch)
Recent advance in netmap/VALE(mSwitch)
 
DPDK KNI interface
DPDK KNI interfaceDPDK KNI interface
DPDK KNI interface
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPF
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentation
 
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce RichardsonThe 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
The 7 Deadly Sins of Packet Processing - Venky Venkatesan and Bruce Richardson
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 

Viewers also liked

Janogia20120921 tsuchiyashishio
Janogia20120921 tsuchiyashishioJanogia20120921 tsuchiyashishio
Janogia20120921 tsuchiyashishio
Keisuke Ishibashi
 
Janogia20120921 yoshinotakeshi
Janogia20120921 yoshinotakeshiJanogia20120921 yoshinotakeshi
Janogia20120921 yoshinotakeshi
Keisuke Ishibashi
 
horiyo-talk-CfS-20150527
horiyo-talk-CfS-20150527horiyo-talk-CfS-20150527
horiyo-talk-CfS-20150527
Saga University
 

Viewers also liked (20)

IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
IIJlab seminar - Linux Kernel Library: Reusable monolithic kernel (in Japanese)
 
Kernel Recipes 2015: Kernel packet capture technologies
Kernel Recipes 2015: Kernel packet capture technologiesKernel Recipes 2015: Kernel packet capture technologies
Kernel Recipes 2015: Kernel packet capture technologies
 
信学会IA研(広島市立大,2011年12月)招待講演発表資料,小川晃通,「2011年インターネット関連ニュース総括」
信学会IA研(広島市立大,2011年12月)招待講演発表資料,小川晃通,「2011年インターネット関連ニュース総括」信学会IA研(広島市立大,2011年12月)招待講演発表資料,小川晃通,「2011年インターネット関連ニュース総括」
信学会IA研(広島市立大,2011年12月)招待講演発表資料,小川晃通,「2011年インターネット関連ニュース総括」
 
Fablab baisc
Fablab baiscFablab baisc
Fablab baisc
 
Lively Walk-Through: A Lightweight Formal Method in UI/UX design
Lively Walk-Through: A Lightweight Formal Method in UI/UX designLively Walk-Through: A Lightweight Formal Method in UI/UX design
Lively Walk-Through: A Lightweight Formal Method in UI/UX design
 
2016.03.04 NetOpsCoding#2
2016.03.04 NetOpsCoding#22016.03.04 NetOpsCoding#2
2016.03.04 NetOpsCoding#2
 
ドメイン名の ライフサイクルマネージメント
ドメイン名の ライフサイクルマネージメントドメイン名の ライフサイクルマネージメント
ドメイン名の ライフサイクルマネージメント
 
昨今のトラフィック状況
昨今のトラフィック状況昨今のトラフィック状況
昨今のトラフィック状況
 
Debian tokyo-20150224-01
Debian tokyo-20150224-01Debian tokyo-20150224-01
Debian tokyo-20150224-01
 
Janogia20120921 tsuchiyashishio
Janogia20120921 tsuchiyashishioJanogia20120921 tsuchiyashishio
Janogia20120921 tsuchiyashishio
 
Janogia20120921 yoshinotakeshi
Janogia20120921 yoshinotakeshiJanogia20120921 yoshinotakeshi
Janogia20120921 yoshinotakeshi
 
horiyo-talk-CfS-20150527
horiyo-talk-CfS-20150527horiyo-talk-CfS-20150527
horiyo-talk-CfS-20150527
 
キメチャッテ
キメチャッテキメチャッテ
キメチャッテ
 
Capturando pacotes de rede no kernelspace
Capturando pacotes de rede no kernelspaceCapturando pacotes de rede no kernelspace
Capturando pacotes de rede no kernelspace
 
仮想通貨テストベッドネットワークの構築
仮想通貨テストベッドネットワークの構築仮想通貨テストベッドネットワークの構築
仮想通貨テストベッドネットワークの構築
 
Linux下Poll和Epoll内核源码剖析
Linux下Poll和Epoll内核源码剖析Linux下Poll和Epoll内核源码剖析
Linux下Poll和Epoll内核源码剖析
 
Benchmarkspec
BenchmarkspecBenchmarkspec
Benchmarkspec
 
Data Structures used in Linux kernel
Data Structures used in Linux kernel Data Structures used in Linux kernel
Data Structures used in Linux kernel
 
ASAMAP 開発秘話
ASAMAP 開発秘話ASAMAP 開発秘話
ASAMAP 開発秘話
 
運用自動化に向けての現場からの課題
運用自動化に向けての現場からの課題運用自動化に向けての現場からの課題
運用自動化に向けての現場からの課題
 

Similar to Linux Kernel Library - Reusing Monolithic Kernel

Linux26 New Features
Linux26 New FeaturesLinux26 New Features
Linux26 New Features
guest491c69
 
Network & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copyNetwork & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copy
Scaleway
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into Kernel
Alexey Smirnov
 
Evolution of Linux Containerization
Evolution of Linux Containerization Evolution of Linux Containerization
Evolution of Linux Containerization
WSO2
 

Similar to Linux Kernel Library - Reusing Monolithic Kernel (20)

Evolution of containers to kubernetes
Evolution of containers to kubernetesEvolution of containers to kubernetes
Evolution of containers to kubernetes
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
CRIU: are we there yet?
CRIU: are we there yet?CRIU: are we there yet?
CRIU: are we there yet?
 
UniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtimeUniK - a unikernel compiler and runtime
UniK - a unikernel compiler and runtime
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf
 
Linux26 New Features
Linux26 New FeaturesLinux26 New Features
Linux26 New Features
 
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
Journal Seminar: Is Singularity-based Container Technology Ready for Running ...
 
Network & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copyNetwork & Filesystem: Doing less cross rings memory copy
Network & Filesystem: Doing less cross rings memory copy
 
Linux Container Brief for IEEE WG P2302
Linux Container Brief for IEEE WG P2302Linux Container Brief for IEEE WG P2302
Linux Container Brief for IEEE WG P2302
 
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copyLinux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
Linux containers – next gen virtualization for cloud (atl summit) ar4 3 - copy
 
OCP Engineering Workshop at UNH
OCP Engineering Workshop at UNH OCP Engineering Workshop at UNH
OCP Engineering Workshop at UNH
 
Intel Briefing Notes
Intel Briefing NotesIntel Briefing Notes
Intel Briefing Notes
 
Rclex: A Library for Robotics meet Elixir
Rclex: A Library for Robotics meet ElixirRclex: A Library for Robotics meet Elixir
Rclex: A Library for Robotics meet Elixir
 
Docker London: Container Security
Docker London: Container SecurityDocker London: Container Security
Docker London: Container Security
 
DUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into KernelDUSK - Develop at Userland Install into Kernel
DUSK - Develop at Userland Install into Kernel
 
Lua and its Ecosystem
Lua and its EcosystemLua and its Ecosystem
Lua and its Ecosystem
 
Evolution of Linux Containerization
Evolution of Linux Containerization Evolution of Linux Containerization
Evolution of Linux Containerization
 
Evoluation of Linux Container Virtualization
Evoluation of Linux Container VirtualizationEvoluation of Linux Container Virtualization
Evoluation of Linux Container Virtualization
 
Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.Containerize! Between Docker and Jube.
Containerize! Between Docker and Jube.
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 

Linux Kernel Library - Reusing Monolithic Kernel

  • 1. 1 Linux Kernel Library: Reusing Monolithic Kernel Hajime Tazaki IIJ Innovation Institute 2016/07 AIST seminar vol.2
  • 2. 2 . 1 LKL in a nutshell Linux kernel library a library of Linux Octavian Purdila (Intel)'s work (since 2007?) Proposed on LKML (Nov. 2015) 2809 LoC (as of Apr. 2016) https://lwn.net/Articles/662953/ Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
  • 3. 2 . 2 LKL (cont'd) hardware-independent architecture (arch/lkl) provide an interface underlying environment outsource dependencies clock, memory allocation, scheduler running on Windows, Linux, FreeBSD simplify I/O operation of devices virtio host implementation could use the driver (of virtio) in Linux Purdila et al., LKL: The Linux kernel library, RoEduNet 2010.
  • 4. 2 . 3 Benefit less ossi cation of new features operating system personality userspace library has less deployment cost Well-matured code base (e.g.) Linux kernel running in userspace small kernel, a bunch of library but in a di erent shape
  • 5. Any problem in computer science can be solved with another level of indirection. (Wheeler and/or Lampson) img src: https://www. ickr.com/photos/thomasclaveirole/305073153
  • 6. 2 . 4 2 . 5 What is reusing monolithic kernel ? Anykernel: originally in NetBSD rump kernel We de ne an anykernel to be an organization of kernel code which allows the kernel's unmodi ed drivers to be run in various con gurations such as application libraries and microkernel style servers, and also as part of a monolithic kernel. -- Kantee 2012. Using (unmodi ed) high-quality code base of monolithic kernel on di erent environment in di erent shape by gluing additional stu s
  • 7. 2 . 62 . 7 (a bit of) History rump: 2007 (NetBSD) LKL: 2007 (Linux) DCE/LibOS: 2008 (Linux/FreeBSD) LibOS/LKL revival: 2015 LibOS merged to LKL
  • 9. 2 . 8 2 . 9 LKL v.s. LibOS LKL LibOS
  • 10. LKL v.s. LibOS (cont'd) LoC: arch/lkl (LKL) < arch/lib (LibOS) di : the amount of stub code commons no modi cation to the original Linux code description of kernel context (by POSIX thread) outsourced resources (clock, memory, scheduler) CPU independent architecture di s LibOS: implemented with higher API (timer, irq, kthread) by pthread LKL: implement IRQ, kthread, timer with pthread in lower layer
  • 12. 2 . 10 3 . 2 Internals 1. Host backend (host_ops) 2. CPU independent arch. (arch/lkl) 3. Application interface
  • 13. 1. host backend environment dependent part unify an interface across di erent platforms (rump-hypercall like) device interface with Virtio block device <=> disk image networking <=> TAP, raw socket, DPDK, VDE
  • 14. 3 . 4 2. CPU independent architecture architecture (arch/lkl) transparent architecture bind (as CPU arch) require no modi cation to the other 2800 LoC thread information (struct thread_info) irq, timer, syscall handler access to underlying layer by host_ops
  • 15. 3 . 3 3 . 5 3. Application interface 1. use exposed API (LKL syscall) 2. use host libc (LD_PRELOAD) 3. extend (alternative) libc
  • 16. 3 . 6 API 1: use exposed API (LKL syscall) call entry points of LKL kernel lkl_sys_open(), lkl_sys_socket() almost same as ordinal syscalls return value, errno noti cation are di erent can use LKL syscall and host syscall simultaneously read ext4 le by lkl_sys_read() => write into host (Windows) by write()
  • 17. 3 . 7 API 2: hijack host standard library dynamically replace symbols of host syscalls (of libc) LD_PRELOAD socket() => lkl_sys_socket() can use host binary (executable) as-is limitation of replaceable symbols needs syscall translation on non-linux host
  • 18. 3 . 8 API 3: extend (alternative) libc only call LKL syscall with our own libc also introduce as a virtual CPU architecture a program can link this instead of host libc can't access to (underlying) host resource directly via this lkl syscall as a patch for musl libc
  • 19. Use cases (applications): Use Case 1: instant kernel bypass; Use Case 2: programs reusing kernel code in userspace; Use Case 3: unikernel.
  • 20. Use Case 1: instant kernel bypass. Syscall redirection by LD_PRELOAD; both LKL and host syscalls can be used; new features without touching the host kernel, e.g.:
    LD_PRELOAD=liblkl-super-tcp++.so firefox
  • 21. Use Case 2: programs reusing kernel code in userspace. Use kernel code without porting, e.g., mount a filesystem without root privilege; both LKL and host syscalls can be used. Example: accessing an ext4 disk image on Windows: 1. open the disk image (CreateFile()); 2. mount it (lkl_sys_mount()); 3. read a file in the disk image (lkl_sys_read()); 4. write the file to the Windows side (WriteFile()).
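Steps 3 and 4 amount to a copy loop that reads through the library kernel and writes through the host OS. In this sketch `fake_guest_read()` is a hypothetical stand-in for an `lkl_sys_read()` on a file inside the mounted image (it serves bytes from memory and would return `-errno` on failure), and POSIX `write()` stands in for `WriteFile()` on the Windows side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical contents of a file inside the mounted disk image. */
static const char guest_file[] = "hello from ext4\n";
static size_t guest_off;

/* Stand-in for a library-kernel read: returns bytes read, 0 at EOF. */
static long fake_guest_read(void *buf, size_t len)
{
    size_t left = sizeof(guest_file) - 1 - guest_off;
    size_t n = len < left ? len : left;

    memcpy(buf, guest_file + guest_off, n);
    guest_off += n;
    return (long)n;
}

/* Read via the library kernel, write via the host; returns bytes copied
 * or a negative errno. */
static long copy_guest_to_host(int host_fd)
{
    char buf[8];
    long total = 0;

    for (;;) {
        long n = fake_guest_read(buf, sizeof(buf));

        if (n < 0)
            return n;              /* library-kernel error, already -errno */
        if (n == 0)
            return total;          /* end of file */
        for (long off = 0; off < n;) {
            ssize_t w = write(host_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;     /* host error */
            }
            off += w;
        }
        total += n;
    }
}
```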
  • 22. Use Case 3: unikernel. A single application contained with LKL (python + LKL, nginx + LKL); only LKL syscalls are available (musl libc extension). With the rump hypercall interface (frankenlibc), it runs in non-OS environments (on Xen Mini-OS via rumprun). Work in progress. http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-systems-the-next-big-thing-
  • 23. Demos with Linux Kernel Library: unikernel on Linux (ping6 command with the kernel library embedded); unikernel on qemu-arm (hello world).
  • 24. Kernel bypass/userspace networking
  • 25. Network stack: why in kernel space? The cost of a packet was expensive in that era ('70s); it is much cheaper now. The stack got fat (matured) after decades: the code path is longer (and slower), new features are hard to add, and it faces unknown issues. img src: http://www.makelinux.net/kernel_map/
  • 26. Alternate network stacks: lwip (2002~), Arrakis [OSDI '14], IX [OSDI '14], MegaPipe [OSDI '12], mTCP [NSDI '14], SandStorm [SIGCOMM '14], uTCP [CCR '14], rumpkernel [ATC '09], FastSocket [ASPLOS '16], SolarFlare (2007~?), StackMap [ATC '16], libuinet (2013~), SeaStar (2014~), Snabb Switch (2012~).
  • 27. Motivations: Socket API sucks: StackMap, MegaPipe, uTCP, SandStorm, IX (but a new API brings no benefit to existing applications). Network stack in kernel space sucks: FastSocket, mTCP, lwip (SolarFlare?). Compatibility is (also) important: rumpkernel, libuinet, Arrakis, IX, SolarFlare. Existing programming model sucks: SeaStar.
  • 28. Techniques: batching (syscall/NIC access): Arrakis, IX, MegaPipe, mTCP, SandStorm, uTCP. Utilizing the feature-rich kernel stack: rumpkernel, FastSocket, StackMap. Porting to a userspace stack: libuinet, SandStorm. Kernel bypass (userspace network stack): mTCP, SandStorm, uTCP, rumpkernel, libuinet, lwip, SeaStar; the bypass techniques themselves: netmap, PF_RING, raw socket, Intel DPDK. Connection locality (multi-core scalability): SeaStar, MegaPipe, mTCP, FastSocket, ...
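Syscall batching is available on stock Linux through `sendmmsg(2)`, the call used by the netperf variant in the benchmark: one syscall submits several datagrams instead of one `sendto()` each. The helper names `send_batch()` and `bound_udp_socket()`, the 32-byte payloads, and the batch size of 4 are illustrative choices.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH 4

/* Send BATCH UDP datagrams to dst with a single sendmmsg() call.
 * Returns the number of messages sent, or -1 on error. */
static int send_batch(int fd, const struct sockaddr_in *dst)
{
    static char payload[BATCH][32];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        memset(payload[i], 'a' + i, sizeof(payload[i]));
        iov[i].iov_base = payload[i];
        iov[i].iov_len = sizeof(payload[i]);
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
        msgs[i].msg_hdr.msg_name = (void *)dst;
        msgs[i].msg_hdr.msg_namelen = sizeof(*dst);
    }
    return sendmmsg(fd, msgs, BATCH, 0);
}

/* Example helper: a UDP socket bound to an ephemeral loopback port;
 * the chosen address is written back into *addr. */
static int bound_udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```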
  • 29. Implementation: from scratch: lwip (Arrakis, IX, SolarFlare?), mTCP, uTCP, SeaStar. Porting-based: libuinet, SandStorm. New API: MegaPipe, StackMap. Anykernel: rumpkernel, (LKL).
  • 30. What's still missing? Some solve problems by specialization, avoiding the generality tax: performance with specialization vs. more features with generalization, e.g., fewer TCP stack features, or a new API that breaks support for existing applications. Specialized vs. generalized: generalization often involves indirection, and indirection usually introduces complexity (Wheeler/Lampson). Can a stack be both performant and generalized?
  • 32. Conditions: ThinkStation P310 x2; CPU: Intel Core i7-6700 CPU @ 3.40GHz (8 logical cores); memory: 32GB; NIC: X540-T2; Linux 4.4.6-301 (x86_64) on Fedora 23; Linux bridge (X540 + tap/raw socket); no DPDK (it can't be used with hijack, etc.); netperf (git ~v2.7.0): netserver (native), netperf (varied).
  • 33. Conditions (cont'd): combinations: netperf (sendmmsg) + host stack (native); + hijack library, native thread (hijack); + frankenlibc/lkl, green thread (lkl-musl); netperf (sendmmsg) + LKL extension + frankenlibc (lkl-musl (skb prealloc)). The process is pinned to a processor using the taskset command; all offload features are disabled (tso/gso/gro, rx/tx cksum).
  • 35. UDP_STREAM (netperf)
  • 36. UDP_STREAM (pps, netperf)
  • 37. TCP_STREAM (netperf)
  • 38. (ref.) LibOS results (as of Feb. 2015): 1024-byte UDP, own-crafted tool; throughput: <10% of Linux native.
  • 39. Observations (of benchmark): native thread vs. green thread: better TCP_RR with native threads (pthread), better TCP_STREAM/UDP_STREAM with green threads (???); avoiding dynamic allocation contributes a lot; payloads larger than the MTU are penalized on the host stack (?).
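The observation that avoiding dynamic allocation helps (the skb pre-allocation variant in the benchmark) follows a general pattern: allocate packet buffers once, then recycle them through a free list on the hot path. This is a generic, single-threaded sketch with assumed sizes (8 buffers of 2048 bytes), not the actual change made to LKL's skb handling.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BUFS 8                /* assumed pool size */
#define BUF_SIZE 2048              /* assumed buffer size (> 1500-byte MTU) */

/* A free buffer stores the free-list link; an allocated one holds data. */
union pool_buf {
    union pool_buf *next;
    unsigned char data[BUF_SIZE];
};

static union pool_buf pool[POOL_BUFS];   /* carved out once, never malloc'd */
static union pool_buf *free_list;

static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BUFS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1) pop; returns NULL when the pool is exhausted. */
static void *pool_alloc(void)
{
    union pool_buf *b = free_list;

    if (b)
        free_list = b->next;
    return b;
}

/* O(1) push back onto the free list. */
static void pool_free(void *p)
{
    union pool_buf *b = p;

    b->next = free_list;
    free_list = b;
}
```

The pool is not thread-safe; a multi-threaded stack would need a lock or per-CPU pools.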
  • 40. Summary: morphing a monolithic kernel into an anykernel; various use cases: userspace network stack (kernel bypass), unikernel; performance study in progress. https://github.com/lkl/linux
  • 41. References: Linux Kernel Library: Purdila et al., LKL: The Linux kernel library, RoEduNet 2010. Rumpkernel (dissertation): Kantee, Flexible Operating System Internals: The Design and Implementation of the Anykernel and Rump Kernels, Ph.D. thesis, 2012. Linux LibOS in general: Tazaki et al., Direct Code Execution: Revisiting Library OS Architecture for Reproducible Network Experiments, CoNEXT 2013. (LibOS in general): https://github.com/lkl/linux, http://libos-nuse.github.io/, https://lwn.net/Articles/637658/
  • 43. Recent Updates
  • 44. Updates (diff to LKL): (musl) libc integration; rump hypercall interface: via frankenlibc tools (for POSIX environments), via the rumprun framework (for baremetal/xen/kvm environments); more applications: netperf (signal handling, etc.), nginx, ghc (Haskell runtime); performance study.
  • 45. libc integration: a standard library for LKL; all syscalls go directly to LKL; applications can use LKL transparently, with no special modifications or hijacking needed; based on musl libc; introduces a new (sub)architecture, lkl.
  • 46. rump hypercall interface: a replacement for LKL host_ops, or yet another new host environment. rump has two thread primitives: pthread-based (as LKL does) and ucontext-based (more efficient on non-MP). This can reduce the effort of host_ops maintenance, but brings the complexity of a tall abstraction turtle.
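The ucontext-based thread primitive can be illustrated with two cooperative green threads sharing one host thread and switching with `swapcontext()`. Only one runs at a time and switches happen at explicit yield points, so no locking is required, which is one reason this style can be cheaper on a single CPU. This is a generic sketch, not rump's implementation; `makecontext()`/`swapcontext()` were removed from POSIX.1-2008 but remain available in glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t main_ctx, ctx_a, ctx_b;
static char stack_a[STACK_SIZE], stack_b[STACK_SIZE];
static char trace[16];             /* records which thread ran, in order */
static int trace_len;

static void worker_a(void)
{
    trace[trace_len++] = 'a';
    swapcontext(&ctx_a, &ctx_b);   /* yield to b */
    trace[trace_len++] = 'a';
}                                  /* returning resumes uc_link (b) */

static void worker_b(void)
{
    trace[trace_len++] = 'b';
    swapcontext(&ctx_b, &ctx_a);   /* yield back to a */
    trace[trace_len++] = 'b';
}                                  /* returning resumes uc_link (main) */

/* Run both green threads to completion; returns the number of steps. */
static int run_green_threads(void)
{
    trace_len = 0;

    getcontext(&ctx_a);
    ctx_a.uc_stack.ss_sp = stack_a;
    ctx_a.uc_stack.ss_size = sizeof(stack_a);
    ctx_a.uc_link = &ctx_b;        /* when a finishes, continue b */
    makecontext(&ctx_a, worker_a, 0);

    getcontext(&ctx_b);
    ctx_b.uc_stack.ss_sp = stack_b;
    ctx_b.uc_stack.ss_size = sizeof(stack_b);
    ctx_b.uc_link = &main_ctx;     /* when b finishes, return here */
    makecontext(&ctx_b, worker_b, 0);

    if (swapcontext(&main_ctx, &ctx_a) == -1)
        return -1;
    trace[trace_len] = '\0';
    return trace_len;
}
```

The threads interleave as a, b, a, b on a single host thread.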
  • 47. rump hypercall (cont'd): integration of libc (musl for LKL, NetBSD libc for rumpkernel); rump hypercalls (on linux, freebsd, netbsd, qemu-arm, spike); host (platform) support code. frankenlibc has two namespaced libcs: the hypercall implementation can use a libc, and it provides a libc.a; cross-build toolchains (rumprun-cc, etc.).
  • 48. Usage
    build:
    % ./configure CC=rumprun-cc ; make
    execution (with the rexec launcher):
    % rexec ./nginx disk-nginx.img tap:tap0 -- -c nginx.conf
    rexec executable [disk image file] [NIC] -- [executable-specific options]