Noah - Robust and Flexible Operating System Compatibility Architecture - Container runtime meetup #2

Noah
A Robust and Flexible Operating System
Compatibility Architecture
Takahiro Shinagawa, Shinichi Honiden, Yuichi Nishiwaki
Takaya Saeki (@nullpo_head)
Who am I?
• Takaya Saeki (@nullpo_head)
• Software Engineer
• Likes web and system layer
• Projects that might sound interesting
• Noah
• Ported XV6 OS to MIPS, and to a home-built FPGA CPU
• Sudo by Windows Hello in WSL
What’s Noah?
Noah: User-space Linux* compatibility layer
powered by virtualization
1. As an implementation
Noah runs Linux apps on macOS, like WSL 1 on Windows
2. As an architecture
Noah is a kind of user-space kernel for OS compatibility, powered by virtualization.
No Linux emulation kernel extension (unlike WSL 1)
• Loads a guest binary into an empty VM without any kernel
• Traps system calls and emulates them in user space
• Accomplishes memory management such as CoW by virtualization
* The architecture is not limited to Linux, but can be applied to other operating systems
=> Technically fun!
=> Academic novelty (APSys ’17, VEE ’20)
Short Demo
Video: Download pptx to play it
Why did we start the Noah project?
Linux
• One of the most important
operating systems
• Today’s de facto standard ecosystem
• Kubernetes / Docker
OS Compatibility Layers
• Windows and FreeBSD have Linux compatibility layers
to utilize the Linux ecosystem natively
• So, why not let macOS have one?
• Then the Linux ABI would be a lingua franca!
• What is more, creating yet another Linux layer is fun!
• Started in 2016 as a MITOH project by me and Yuichi Nishiwaki.
The Architecture of Noah
Implementing OS compatibility layer:
Kernel-space vs User-space
• Kernel space
👍 Flexibility to achieve binary compatibility
• System calls and memory management can be easily handled
👎 Vulnerability to bugs in the OS compatibility layer
• A bug could lead to system crashes
• User space
👍 Robustness against bugs
• Bugs do not affect the OS stability
👎 Challenges to achieve full compatibility
• E.g., copy-on-write not implemented in Cygwin
A Robust and Flexible Operating System Compatibility Architecture (VEE 2020, March 17, 2020) 11
[Figure: two stacks, one per design, each built from Guest Application Binary, OS Compatibility Layer, and Host OS Kernel]
Noah’s OS Compatibility Architecture
• Running each guest process in a VM (without its OS kernel)
👍 Robustness
• Most of the OS compatibility layer can be implemented in user space
• Bugs do not cause kernel crashes
👍 Flexibility
• Hardware virtualization technology provides low-level event handling
• E.g., trapping system calls and page faults, manipulating page tables, …
[Figure: Guest Application Process (VM), OS Compatibility Layer (host process), Standardized Virtualization Interface, Host OS Kernel, CPU Hardware Virtualization Function]
⇒ Published as papers for its academic novelty
[T.Saeki, Y.Nishiwaki, T.Shinagawa, S.Honiden]
• A robust and flexible operating system compatibility architecture,
in VEE 2020
• Bash on Ubuntu on macOS, in APSys 2017
Overall Design
• Three main components
1. Guest VM
2. VMM module
3. Monitor process
[Figure: in user space, the monitor process emulates system calls for the guest process, which runs in a guest VM with no kernel. In kernel space, the VMM module in the host OS manages the VMs, traps system calls and exceptions, and makes upcalls to the monitor.]
Our approach:
Utilize Virtualization Technology
1. The monitor process launches a new VM and loads the ELF binary
inside it without a kernel
2. The ELF application issues Linux system calls while
running in the VM. The VMM traps them.
3. The VMM passes the trapped system call to the
monitor process
4. The monitor process emulates the behavior of the
Linux system call using the host OS’s system calls
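The four steps above can be sketched as a dispatch loop. This is a minimal simulation in Python, not Noah's code: `FakeVM`, `run_until_trap`, `resume_with`, and the handler table are hypothetical stand-ins for the VMM interface. Only the Linux x86-64 system call numbers and the ENOSYS value are real.

```python
import os

# Linux x86-64 system call numbers (real values from the Linux ABI).
SYS_WRITE, SYS_GETPID, SYS_EXIT = 1, 39, 60
ENOSYS = 38  # Linux errno for "function not implemented"


class FakeVM:
    """Hypothetical stand-in for a kernel-less guest VM.

    A real monitor would resume the guest through the VMM module; here the
    "guest" is just a scripted list of (syscall number, arguments) traps.
    """

    def __init__(self, script):
        self._script = list(script)
        self.results = []  # return values handed back to the guest

    def run_until_trap(self):
        # Steps 2-3: the guest issues a system call, the VMM traps it and
        # passes it to the monitor process.
        return self._script.pop(0)

    def resume_with(self, value):
        self.results.append(value)


def monitor_loop(vm, out_fd):
    """Step 4: emulate each trapped Linux call with host OS calls."""
    handlers = {
        # The guest fd is ignored and redirected to out_fd in this sketch.
        SYS_WRITE: lambda fd, data: os.write(out_fd, data),
        SYS_GETPID: os.getpid,
    }
    while True:
        nr, args = vm.run_until_trap()
        if nr == SYS_EXIT:
            return args[0]  # guest exit status
        handler = handlers.get(nr)
        # Calls without a handler fail with -ENOSYS, as on Linux.
        vm.resume_with(handler(*args) if handler else -ENOSYS)
```

Feeding the loop a script such as `[(SYS_WRITE, (1, b"hi\n")), (SYS_EXIT, (0,))]` writes the bytes to `out_fd` and returns the exit status. A table keyed by system call number also makes it easy to see which calls are still unimplemented.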
Example: Inter-process communication
$ noah bash
$ cat file | grep 42
1. fork: Bash calls fork(); its monitor process calls fork() on macOS and clones the VM state, giving a second monitor process that runs Bash.
2. exec to “cat”: the child calls execve(…), and its monitor process replaces the VM contents with cat.
3. Pipe: Bash, cat, and grep each run under their own monitor process; cat writes and grep reads.
4. A regular native process can take part too: Noah processes can do IPC with native apps smoothly.
Advantages of
User-space compatibility layer with virtualization
1. Robust
• Does not crash the OS, because everything except the VMM is just a user-space app
2. Flexible
• Thanks to the VMM, a user-space kernel can achieve binary compatibility and CoW
3. Portable, with lower development cost than a kernel-space layer
• Rich host OS functionality: system calls, libraries, high-level languages…
• Actually, NoahW is implemented in C++ with Boost
4. Seamless
• Single kernel: shares resources such as FS, memory, process scheduling, IPC…
Implementation
• Targets Linux 4.6 on x86-64 (Intel VT-x)
• Noah: Linux compatibility layer for macOS
• Uses Apple Hypervisor.framework as the VMM module
• NoahW: Linux compatibility layer for Windows (preliminary)
• Uses Intel Hardware Accelerated Execution Manager as the VMM module
Memory Management
• Two page tables
(a) Guest page table in the VM
(b) Nested page table (EPT) in the VMM
• Fix (a) and modify (b)
• (a) is fixed to the straight mapping
• Virtual address = Physical address
• (b) can be manipulated with the API
• Provided by the VMM module
Limitation: GVA is up to 512 GiB
• 39-bit physical address in Intel CPU
• 48-bit virtual address
• Stack is moved to the lower address
• No kernel area
[Figure: GVA → GPA through the guest page table (fixed), GPA → HPA through the nested page table (modified). The address range runs from 0 to 512 GiB, with a 1-GiB guest system data area (page tables, segment descriptors, …) starting at 511 GiB.]
GVA: Guest Virtual Address
GPA: Guest Physical Address
HPA: Host Physical Address
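The 512 GiB limit follows directly from the address widths on the slide. A short worked check in Python, assuming (as the figure suggests) that the guest system data area occupies the top 1 GiB; the names below are illustrative only:

```python
GIB = 1 << 30

PHYS_ADDR_BITS = 39   # physical address width quoted on the slide
VIRT_ADDR_BITS = 48   # virtual address width quoted on the slide

# With the guest page table fixed to GVA == GPA, usable guest virtual
# addresses are capped by the narrower of the two widths.
gva_limit = 1 << min(PHYS_ADDR_BITS, VIRT_ADDR_BITS)

# Assumption read off the figure: the top 1 GiB (511 GiB to 512 GiB) holds
# guest system data such as page tables and segment descriptors.
SYSTEM_AREA_SIZE = 1 * GIB
system_area_base = gva_limit - SYSTEM_AREA_SIZE


def guest_virtual_to_physical(gva: int) -> int:
    """Identity ("straight") mapping done by the fixed guest page table."""
    if not 0 <= gva < gva_limit:
        raise ValueError(f"GVA {gva:#x} is outside the 512 GiB window")
    return gva


print(gva_limit // GIB)         # 512
print(system_area_base // GIB)  # 511
```

Because the guest mapping never changes, all real memory management happens in the nested page table, which is the part the VMM module exposes through its API.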
Process Management (fork)
• Noah (on macOS)
• Implement a subset of clone()
• Apple Hypervisor.framework does not support fork() with a VM
• Save and destroy the VM before fork()
• Restore the VM after fork()
• NoahW (on Windows)
• Implement fork() with copy-on-write using shared memory and virtualization
• Create a memory region shared among monitor processes
• Save, restore, and modify the VM states on fork()
• Trap page faults in the VMs to implement copy-on-write
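The copy-on-write idea behind NoahW's fork() can be sketched with reference-counted pages. This is a pure-Python model under stated assumptions, not NoahW's implementation: `Page`, `AddressSpace`, and the explicit `_handle_write_fault` call are hypothetical and stand in for shared memory, the nested page table, and the trapped page fault.

```python
class Page:
    """A page frame shared between address spaces, with a reference count."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.refs = 1


class AddressSpace:
    """Hypothetical model of one guest's page mappings."""

    def __init__(self):
        self.pages = {}     # page number -> Page
        self.writable = {}  # page number -> bool

    def map(self, pno, data):
        self.pages[pno] = Page(data)
        self.writable[pno] = True

    def fork(self):
        """Share every page with the child and write-protect both sides."""
        child = AddressSpace()
        for pno, page in self.pages.items():
            page.refs += 1
            child.pages[pno] = page
            child.writable[pno] = False
            self.writable[pno] = False
        return child

    def read(self, pno, offset):
        return self.pages[pno].data[offset]

    def write(self, pno, offset, value):
        if not self.writable[pno]:
            self._handle_write_fault(pno)
        self.pages[pno].data[offset] = value

    def _handle_write_fault(self, pno):
        """Stand-in for the trapped page fault: copy only if still shared."""
        page = self.pages[pno]
        if page.refs > 1:
            page.refs -= 1
            self.pages[pno] = Page(bytes(page.data))  # private copy
        self.writable[pno] = True
```

After `child = parent.fork()`, a write in the child copies only the touched page, and the parent keeps seeing the old contents. Pages that neither side writes to are never copied, which is what keeps fork() cheap for large address spaces.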
File System
• Implemented a VFS layer
• To run Linux apps, the default FS under the guest root (/) is mapped as follows
Guest path    Host path
/usr          ~/.noah/tree/usr
/etc          ~/.noah/tree/etc
/Users        /Users
/dev          /dev
/tmp          /tmp
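A path translation step for this mapping might look like the sketch below. It assumes the pairing /usr and /etc under ~/.noah/tree and the other directories shared with the host, as read from the slide; `MOUNTS` and `to_host_path` are hypothetical names, and how Noah treats paths outside these prefixes is not stated, so the sketch simply rejects them.

```python
import posixpath

# Guest prefix -> host prefix, as read from the slide's mapping figure.
# "~" is left unexpanded here; expand it before touching the real file system.
MOUNTS = {
    "/usr": "~/.noah/tree/usr",
    "/etc": "~/.noah/tree/etc",
    "/Users": "/Users",
    "/dev": "/dev",
    "/tmp": "/tmp",
}


def to_host_path(guest_path: str) -> str:
    """Translate an absolute guest path using longest-prefix matching."""
    if not guest_path.startswith("/"):
        raise ValueError("expected an absolute guest path")
    # Normalize first so that ".." cannot escape a mapped prefix.
    path = posixpath.normpath(guest_path)
    for prefix in sorted(MOUNTS, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return MOUNTS[prefix] + path[len(prefix):]
    raise FileNotFoundError(f"no mapping for {guest_path}")
```

For example, `to_host_path("/usr/bin/gcc")` yields `~/.noah/tree/usr/bin/gcc`, while `/Users/...` paths pass through unchanged, which is what lets Linux tools work on the user's own files.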
Other System Calls
• Futex
• Emulated with condition variables
• Signal
• Delivery is implemented inside Noah
• Socket
• Integrated with Noah’s VFS
• I/O such as readv64 / writev64
• Simulates small, incompatible I/O system calls
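As an illustration of the futex bullet, here is one way FUTEX_WAIT and FUTEX_WAKE semantics can be built on a condition variable. It is a sketch only: it reads the slide's wording as "condition variable", `FutexTable` and the `memory` dict (standing in for guest memory) are hypothetical, and Noah's actual code may differ. The errno values are the real Linux ones.

```python
import threading

EAGAIN = 11      # Linux errno: futex word did not hold the expected value
ETIMEDOUT = 110  # Linux errno: the wait timed out


class FutexTable:
    """Emulates FUTEX_WAIT / FUTEX_WAKE with a single condition variable."""

    def __init__(self, memory):
        self.memory = memory      # address -> 32-bit value (guest memory stand-in)
        self._cond = threading.Condition()
        self._waiters = {}        # address -> threads currently waiting
        self._tokens = {}         # address -> wake-ups issued but not yet consumed

    def wait(self, addr, expected, timeout=None):
        with self._cond:
            # The value check and the sleep share one lock, so a concurrent
            # wake cannot slip in between them.
            if self.memory[addr] != expected:
                return -EAGAIN
            self._waiters[addr] = self._waiters.get(addr, 0) + 1
            woken = self._cond.wait_for(
                lambda: self._tokens.get(addr, 0) > 0, timeout)
            self._waiters[addr] -= 1
            if woken:
                self._tokens[addr] -= 1
                return 0
            return -ETIMEDOUT

    def wake(self, addr, count=1):
        with self._cond:
            unassigned = self._waiters.get(addr, 0) - self._tokens.get(addr, 0)
            n = max(0, min(count, unassigned))
            if n:
                self._tokens[addr] = self._tokens.get(addr, 0) + n
                self._cond.notify_all()
            return n  # number of waiters woken, as FUTEX_WAKE reports
```

The token counter is what limits a wake to `count` waiters even though `notify_all` rouses every sleeping thread; threads that find no token go back to sleep.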
That’s how Noah implements
Linux kernel functionality to run Linux apps!
Demo
Video: bash, vi, gcc, and cross-platform; Download pptx to play it
Video: Play Doom; Download pptx to play it
Evaluation
Macro Benchmark (Phoronix Test Suite + α)
[Chart: overhead per benchmark, x-axis from -200% to 200%.
Benchmarks: Linux kernel build, unpack-linux, postmark, sqlite, openssl, compress-7zip.
Data labels as listed on the slide: 16%, -23%, -4%, 50%, -58%, 9%.]
Primitive Benchmark: dup() system call
[Chart: stacked bars for macOS and Windows showing the cycle count of one dup() call, y-axis "Cycle Number" from 0 to 40,000, broken down into VM exit, upcall, pre-process, host syscall, post-process, downcall, and VM enter.]
Micro Benchmark: lmbench (processor, process)
[Chart: x-axis from -100% to 400%.
Benchmarks: null call, null I/O, stat, open clos, slct TCP, sig inst, sig hndl, fork proc, exec proc, sh proc.
Data labels as listed on the slide: 410%, 310%, 175%, 172%, 13%, 329%, 239%, 256%, -24%, 46%.]
Micro Benchmark: lmbench (File & VM latency)
[Chart: x-axis from -100% to 50%.
Benchmarks: 0K Create, 0K Delete, 10K Create, 10K Delete, Mmap Latency, Prot Fault, Page Fault, 100fd selct.
Data labels as listed on the slide: 42%, 5%, 17%, 4%, 28%, -45%, -92%, 8%.]
Comparison of OS Compatibility Layers
Benchmark                      NoahW     Cygwin    WSL1
dup2() [calls per second]      36,723    556,453   693,309
write() [calls per second]     0.30      0.56      0.57
fork() (0 MiB array) [ms]      106.4     219.4     2.06
fork() (512 MiB array) [ms]    338.9     789.9     32.51
fork() (1 GiB array) [ms]      458.4     1531.8    62.66
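To read the fork() rows as ratios, the numbers from the comparison can be divided directly. The snippet below only restates the slide's figures; the helper name `ratio` is illustrative.

```python
# fork() latencies in milliseconds, copied from the comparison above.
fork_ms = {
    "0 MiB":   {"NoahW": 106.4, "Cygwin": 219.4,  "WSL1": 2.06},
    "512 MiB": {"NoahW": 338.9, "Cygwin": 789.9,  "WSL1": 32.51},
    "1 GiB":   {"NoahW": 458.4, "Cygwin": 1531.8, "WSL1": 62.66},
}


def ratio(size, a, b):
    """How many times longer layer `a` takes than layer `b` for this array size."""
    return fork_ms[size][a] / fork_ms[size][b]


for size in fork_ms:
    print(f"{size:>8}: Cygwin/NoahW = {ratio(size, 'Cygwin', 'NoahW'):.1f}x, "
          f"NoahW/WSL1 = {ratio(size, 'NoahW', 'WSL1'):.1f}x")
```

On these figures NoahW's fork() is roughly 2x to 3.3x faster than Cygwin's, and roughly 7x to 52x slower than WSL 1's, with the gap to WSL 1 narrowing as the array grows.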
Summary
• Noah has a novel OS compatibility architecture
• Exploited the OS-standard virtualization technology support
• Achieved both robustness and flexibility
• The architecture consists of three components
• VMs to run guest processes
• The VMM module to provide API for hardware virtualization technology
• Monitor processes to implement OS compatibility functions
• Runs Linux binaries on macOS and Windows (preliminary)
• Noah implemented 172 out of 329 Linux system calls
• The overhead of Linux kernel build time on Noah was 16%
Wait, so you haven’t mentioned
containers at all…???? 🙄
At a “Container Runtime Meetup”..???
🙄
Noah as an OCI
Runtime
• OCI Runtime
• The spec for the layer that runc implements
• Runs container images
• E.g., runc, gVisor’s runsc
• Why not add Noah to them?
• Run Linux images (near) natively on
macOS
• I can finally talk about containers in
this Container Runtime Meetup #2 😂
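A rough idea of what the glue could look like: an OCI runtime receives a bundle directory whose `config.json` names the process to run (`process.args`, `process.env`) and the root filesystem (`root.path`). The sketch below turns that into a `noah <program>` command line like the one used in the demos. `noah_command` is a hypothetical helper, and how the rootfs would be handed to Noah is not covered in the talk, so it is only returned to the caller.

```python
import json
import os


def noah_command(bundle_dir, noah_binary="noah"):
    """Build the argv and environment a Noah-backed runtime might exec.

    Reads the standard OCI bundle file config.json. Passing the rootfs to
    Noah is left to the caller because the talk does not describe it.
    """
    with open(os.path.join(bundle_dir, "config.json")) as f:
        config = json.load(f)
    process = config["process"]
    # root.path may be absolute or relative to the bundle directory.
    rootfs = os.path.join(bundle_dir, config["root"]["path"])
    argv = [noah_binary, *process["args"]]
    env = dict(item.split("=", 1) for item in process.get("env", []))
    return argv, env, rootfs
```

A full runtime would also have to implement the lifecycle operations that the OCI runtime spec defines (create, start, kill, delete, state); this sketch covers only the step of deciding what to execute.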
Thanks, Hajime-san…
• Containerd and Dockerd buildable on macOS
• https://github.com/ukontainer/containerd
• https://github.com/ukontainer/dockerd-darwin
This joke was put together late last night,
so enjoy the simple demo as much as possible! 🤗
Video: Noah as an OCI runtime; Download pptx to play it
Noah - Robust and Flexible Operating System Compatibility Architecture - Container runtime meetup #2

  • 1. Noah A Robust and Flexible Operating System Compatibility Architecture Takahiro Shinagawa Shinichi HonidenYuichi Nishiwaki Takaya Saeki (@nullpo_head)
  • 2. Who I am? • Takaya Saeki (@nullpo_head) • Software Engineer • Likes web and system layer • Projects that might sound interesting • Noah • Ported XV6 OS to MIPS, and to a home-built FPGA CPU • Sudo by Windows Hello in WSL 2
  • 4. Noah: User-space Linux*compatibility layer powered by virtualization 1. As an implementation Noah runs Linux apps in macOS, like WSL 1 in Windows 2. As an architecture Noah is a kind of user-space kernel for OS compatibility, powered by virtualization. No Linux emulation kernel extension (unlike WSL 1) • Loads a guest binary to an empty VM without any kernel • Traps System calls, and emulates them in the user space with • Accomplishes memory management such as CoW by virtualization 4 * The architecture is not limited to Linux, but can be applied to other operating systems => Technially fun! => Academic novelty (APSys ’17, VEE ‘20)
  • 7. Why did we start Noah project? 7
  • 8. Linux • One of the most important operating systems • Today’s de facto standard ecosystem • Kubernetes / Docker
  • 9. OS Compatibility Layers • Windows and FreeBSD have Linux compatibility layer to utilize Linux ecosystem natively • So, why not let macOS have one? • Then, Linux ABI would be lingua franca! • What is more, creating yet another Linux layer is fun! • Started in 2016 as a MITOH project by me and Yuichi Nishiwaki.
  • 11. Implementing OS compatibility layer: Kernel-space vs User-space • Kernel space 👍 Flexibility to achieve binary compatibility • System calls and memory management can be easily handled 👎 Vulnerability against bugs in OS compatibility layers • A bug could lead to system crashes • User space 👍 Robustness against bugs • Bugs do not affect the OS stability 👎 Challenges to achieve full compatibility • E.g., copy-on-write not implemented in Cygwin A Robust and Flexible Operating System Compatibility Architecture (VEE 2020, March 17, 2020) 11 Host OS Kernel Guest Application Binary OS Compatibility Layer Host OS Kernel Guest Application Binary OS Compat. Layer
  • 12. Noah’s OS Compatibility Architecture • Running each guest process in a VM (without its OS kernel) 👍 Robustness • Most parts of the OS compatibility layer can be implemented in user space • Bugs do not cause kernel crashes 👍 Flexibility • Hardware virtualization technology provides low-layer event handling functionalities • E.g., trapping system calls and page faults, manipulating page tables, … [Figure: a host process (OS compatibility layer) and a VM (guest application process) on top of the host OS kernel’s standardized virtualization interface and the CPU’s hardware virtualization function] ⇒ Published as papers for its academic novelty [T.Saeki, Y.Nishiwaki, T.Shinagawa, S.Honiden] • A robust and flexible operating system compatibility architecture, in VEE 2020 • Bash on Ubuntu on macOS, in APSys 2017
  • 13. Overall Design • Three main components 1. Guest VM 2. VMM module 3. Monitor process [Figure: in user space, monitor processes emulate system calls for guest processes that run in guest VMs with no kernel; in kernel space, the host OS’s VMM module manages the VMs, traps system calls and exceptions, and upcalls the monitor]
  • 15. Our approach: Utilize Virtualization Technology • Step 1. The monitor process launches a new VM and loads the ELF binary inside it without a kernel
  • 16. Our approach: Utilize Virtualization Technology • Step 2. The ELF application calls Linux system calls when running in the VM. Then, they are trapped by the VMM.
  • 17. Our approach: Utilize Virtualization Technology • Step 3. The VMM passes the trapped system call to the monitor process
  • 18. Our approach: Utilize Virtualization Technology • Step 4. The monitor process emulates the behavior of the Linux system call with the host OS’s system calls
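The four steps above can be sketched as a small dispatch loop. This is only an illustration, not Noah's code: `FakeVM`, `run_until_trap`, `set_return_value`, `emulate`, and `monitor_loop` are hypothetical names, and the VM is replaced by a list of pre-recorded traps. The system call numbers are the standard Linux x86-64 ones.

```python
import os

# Linux x86-64 system call numbers (part of the Linux ABI).
SYS_WRITE, SYS_DUP, SYS_GETPID = 1, 32, 39
ENOSYS = 38  # Linux errno value for "function not implemented"


class FakeVM:
    """Hypothetical stand-in for a guest VM: replays pre-recorded traps."""

    def __init__(self, traps):
        self.traps = list(traps)
        self.results = []

    def run_until_trap(self):
        # A real VMM would resume the guest here and return on a VM exit.
        return self.traps.pop(0) if self.traps else None

    def set_return_value(self, value):
        # A real monitor would store this in the guest's RAX register.
        self.results.append(value)


def emulate(nr, args):
    """Step 4: emulate one Linux system call with host system calls."""
    if nr == SYS_WRITE:
        fd, data = args
        return os.write(fd, data)
    if nr == SYS_DUP:
        return os.dup(args[0])
    if nr == SYS_GETPID:
        return os.getpid()
    # Linux convention: failures are returned as a negative errno.
    # (Host errors raised as OSError are not translated in this sketch.)
    return -ENOSYS


def monitor_loop(vm):
    """Steps 2 and 3: take each trapped call and hand it to the emulator."""
    while (trap := vm.run_until_trap()) is not None:
        nr, args = trap
        vm.set_return_value(emulate(nr, args))
    return vm.results
```

Feeding the loop a `write` to a pipe, a `getpid`, and an unknown call number yields the byte count, the host PID, and `-38` respectively, which mirrors how the guest would see the results.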
  • 19. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42) • Bash calls fork() • Clone the VM state [Figure: two monitor processes on macOS, each running Bash]
  • 20. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42) • execve(…): exec to “cat” • Replace VM contents [Figure: two monitor processes on macOS, one running Bash and one running cat]
  • 21. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42) [Figure: three monitor processes on macOS running Bash, cat, and grep; cat writes and grep reads]
  • 22. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42) • Can do IPC with native apps smoothly [Figure: monitor processes running Bash and cat next to a regular native process on macOS]
  • 23. Advantages of User-space compatibility layer with virtualization 1. Robust • Does not cause OS crashes, because it’s just a user-space app except for the VMM 2. Flexible • Thanks to the VMM, achieves binary compatibility and CoW with a user-space kernel 3. Portable and has lower development cost, compared to kernel-space • Rich host OS functionalities: system calls, libraries, high-level languages… • Actually, NoahW is implemented in C++ with Boost 4. Seamless • Single kernel: share resources such as FS, memory, process scheduling, IPC…
  • 25. Implementation • Target Linux 4.6 on x86-64 (Intel VT-x) • Noah: Linux compatibility layer for macOS • Use Apple Hypervisor.framework as the VMM module • NoahW: Linux compatibility layer for Windows (preliminary) • Use Intel Hardware Accelerated Execution Manager as the VMM module
  • 26. Memory Management • Two page tables (a) Guest page table in the VM (b) Nested page table (EPT) in the VMM • Fix (a) and modify (b) • (a) is fixed to the straight mapping • Virtual address = Physical address • (b) can be manipulated with the API • Provided by the VMM module Limitation: GVA is up to 512 GiB • 39-bit physical address in Intel CPU • 48-bit virtual address • Stack is moved to the lower address • No kernel area [Figure: GVA to GPA through the fixed guest page table, GPA to HPA through the modified nested page table; axis marks at 0, 511 GiB, and 512 GiB, with a 1-GiB guest system data area (page tables, segment descriptors, …)] GVA: Guest Virtual Address GPA: Guest Physical Address HPA: Host Physical Address
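The 512 GiB limit follows directly from the address widths quoted on this slide. The short calculation below works it out; the function name and the stack address are illustrative assumptions (the stack value is the usual top of user space on x86-64 Linux, not a figure from the slides).

```python
GIB = 1 << 30

PHYS_ADDR_BITS = 39  # physical address width quoted on the slide
VIRT_ADDR_BITS = 48  # virtual address width quoted on the slide

phys_limit = 1 << PHYS_ADDR_BITS
virt_limit = 1 << VIRT_ADDR_BITS


def guest_virtual_to_guest_physical(gva):
    """Identity ("straight") guest page table: GPA equals GVA.

    Because the mapping is fixed, a guest virtual address is only usable
    if the same number is also a valid guest physical address.
    """
    if not 0 <= gva < phys_limit:
        raise ValueError("address is outside the identity-mapped range")
    return gva


# Assumed typical top of the user stack region on x86-64 Linux.
typical_stack_top = 0x7FFF_FFFF_F000

print(phys_limit // GIB)                # 512 (GiB of usable guest address space)
print(virt_limit // phys_limit)         # 512 (the 48-bit space is 512 times larger)
print(typical_stack_top >= phys_limit)  # True, so the stack has to move lower
```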
  • 27. Process Management (fork) • Noah (on macOS) • Implement a subset of clone() • Apple Hypervisor.framework does not support fork() with a VM • Save and destroy the VM before fork() • Restore the VM after fork() • NoahW (on Windows) • Implement fork() with copy-on-write using shared memory and virtualization • Create a memory region shared among monitor processes • Save, restore, and modify the VM states on fork() • Trap page faults in the VMs to implement copy-on-write
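The copy-on-write idea in the NoahW bullets can be illustrated with a toy model. This is a sketch under simplifying assumptions, not NoahW's implementation: pages are Python objects with a reference count, and a "page fault" is just a check in `write`. All class and method names are hypothetical.

```python
PAGE_SIZE = 4096


class Page:
    """One page of memory plus the number of address spaces sharing it."""

    def __init__(self, data=None):
        self.data = bytearray(data if data is not None else PAGE_SIZE)
        self.refs = 1


class AddressSpace:
    def __init__(self):
        self.pages = {}     # page number -> Page
        self.writable = {}  # page number -> bool

    def map_page(self, pn):
        self.pages[pn] = Page()
        self.writable[pn] = True

    def fork(self):
        """Share every page with the child and write-protect both sides."""
        child = AddressSpace()
        for pn, page in self.pages.items():
            page.refs += 1
            child.pages[pn] = page
            self.writable[pn] = False
            child.writable[pn] = False
        return child

    def read(self, addr):
        pn, off = divmod(addr, PAGE_SIZE)
        return self.pages[pn].data[off]

    def write(self, addr, value):
        pn, off = divmod(addr, PAGE_SIZE)
        if not self.writable[pn]:
            self._handle_write_fault(pn)  # stands in for a trapped page fault
        self.pages[pn].data[off] = value

    def _handle_write_fault(self, pn):
        page = self.pages[pn]
        if page.refs > 1:
            # Still shared: give this address space its own private copy.
            page.refs -= 1
            self.pages[pn] = Page(page.data)
        # Either way the page is now private, so writes may proceed.
        self.writable[pn] = True
```

The point of the design is that `fork` copies no page data at all; copying is deferred to the first write, which is why fork time grows slowly with the size of the parent's memory.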
  • 28. File System • Implemented VFS layer • To run Linux apps, the default FS is mapped as follows [Figure: guest paths /, /usr, /etc, /Users, /dev, /tmp mapped onto host paths /Users, /dev, ~/.noah/tree/usr, ~/.noah/tree/etc, /tmp]
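A path translation of this kind might look like the sketch below. The pairing is an interpretation of the flattened figure: `/usr` and `/etc` are assumed to be redirected under `~/.noah/tree`, while `/Users`, `/dev`, and `/tmp` are assumed to pass through to the host unchanged. The fallback for all other paths, the function names, and the example home directory are assumptions made for illustration only.

```python
import posixpath

TREE = "~/.noah/tree"
PASSTHROUGH = ("/Users", "/dev", "/tmp")  # assumed to map to the same host path
REDIRECTED = ("/usr", "/etc")             # assumed to live under ~/.noah/tree


def _under(path, prefix):
    """True if path is prefix itself or lies below it."""
    return path == prefix or path.startswith(prefix + "/")


def guest_to_host(path, home="/Users/alice"):
    """Translate an absolute guest path to a host path (hypothetical helper)."""
    path = posixpath.normpath(path)  # resolve ".." before matching prefixes
    tree = TREE.replace("~", home, 1)
    if any(_under(path, p) for p in PASSTHROUGH):
        return path
    if any(_under(path, p) for p in REDIRECTED):
        return tree + path
    # Assumption: anything else is also looked up under the tree.
    return posixpath.normpath(tree + path)
```

For example, `guest_to_host("/usr/bin/gcc")` gives `/Users/alice/.noah/tree/usr/bin/gcc`, while `guest_to_host("/tmp/x")` stays `/tmp/x`. Normalizing first keeps a path such as `/usr/../dev/null` from being redirected by its textual prefix.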
  • 29. Other System Calls • Futex • Emulate with condition variables • Signal • Implement delivery system inside Noah • Socket • Integrate with Noah’s VFS • IO such as readv64 / writev64 • Simulate incompatible small IO system calls.
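Reading the futex bullet as emulation on top of a condition variable (an interpretation of the slide), the core wait and wake operations could be sketched as follows. This is not Noah's code: `FutexTable` and its methods are hypothetical, guest memory is modeled as a dictionary, and only the basic FUTEX_WAIT and FUTEX_WAKE behavior is covered.

```python
import threading


class _Waiter:
    """One sleeping thread; compared by identity, flagged when woken."""

    def __init__(self):
        self.woken = False


class FutexTable:
    def __init__(self):
        self._cond = threading.Condition()
        self._waiters = {}  # address -> list of _Waiter, in arrival order

    def waiter_count(self, addr):
        with self._cond:
            return len(self._waiters.get(addr, []))

    def wait(self, memory, addr, expected, timeout=None):
        """Sleep only if memory[addr] still holds the expected value.

        Returns True when woken by wake(), False on a value mismatch
        (where Linux would report EAGAIN) or on timeout.
        """
        with self._cond:
            if memory[addr] != expected:
                return False
            me = _Waiter()
            self._waiters.setdefault(addr, []).append(me)
            self._cond.wait_for(lambda: me.woken, timeout)
            if not me.woken:  # timed out: withdraw from the queue
                self._waiters[addr].remove(me)
            return me.woken

    def wake(self, addr, count=1):
        """Wake up to count waiters on addr; returns how many were woken."""
        with self._cond:
            queue = self._waiters.get(addr, [])
            woken = queue[:count]
            del queue[:count]
            for waiter in woken:
                waiter.woken = True
            self._cond.notify_all()
            return len(woken)
```

The value check and the enqueue happen under the same lock that `wake` takes, so a waker that updates memory and then calls `wake` cannot slip in between them and cause a lost wakeup.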
  • 30. That’s how Noah implements a Linux kernel to run Linux apps!
  • 32. Video: bash, vi, gcc, and cross-platform; Download pptx to play it
  • 33. Video: Play Doom; Download pptx to play it
  • 35. Macro Benchmark (Phoronix Test Suite + α) [Figure: bar chart of overhead per benchmark • Values as listed on the slide: 16%, -23%, -4%, 50%, -58%, 9% • Benchmarks as listed on the slide: Linux kernel build, unpack-linux, postmark, sqlite, openssl, compress-7zip]
  • 36. Primitive Benchmark: dup() system call [Figure: stacked bar chart of CPU cycles per dup() call on macOS and Windows, broken down into VM enter, downcall, post-process, host syscall, pre-process, upcall, and VM exit • Data labels as extracted: 270 3202520 11091 1330 7044 2770 2118 588 5504 2809 11091 251 297]
  • 37. Micro Benchmark: lmbench (processor, process) [Figure: bar chart of overhead per test • Values as listed on the slide: 410%, 310%, 175%, 172%, 13%, 329%, 239%, 256%, -24%, 46% • Tests as listed on the slide: null call, null I/O, stat, open clos, slct TCP, sig inst, sig hndl, fork proc, exec proc, sh proc]
  • 38. Micro Benchmark: lmbench (File & VM latency) [Figure: bar chart of overhead per test • Values as listed on the slide: 42%, 5%, 17%, 4%, 28%, -45%, -92%, 8% • Tests as listed on the slide: 0K Create, 0K Delete, 10K Create, 10K Delete, Mmap Latency, Prot Fault, Page Fault, 100fd selct]
  • 39. Comparison of OS Compatibility Layers (values listed as NoahW / Cygwin / WSL1) • dup2() [calls per second]: 36,723 / 556,453 / 693,309 • write() [calls per second]: 0.30 / 0.56 / 0.57 • fork() (0 MiB array) [ms]: 106.4 / 219.4 / 2.06 • fork() (512 MiB array) [ms]: 338.9 / 789.9 / 32.51 • fork() (1 GiB array) [ms]: 458.4 / 1531.8 / 62.66
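To make the fork() rows easier to compare, the ratios between the three systems can be computed directly from the figures on this slide. The variable and function names below are illustrative; the numbers are copied from the slide.

```python
# fork() latency in milliseconds, copied from the slide.
fork_ms = {
    "0 MiB":   {"NoahW": 106.4, "Cygwin": 219.4,  "WSL1": 2.06},
    "512 MiB": {"NoahW": 338.9, "Cygwin": 789.9,  "WSL1": 32.51},
    "1 GiB":   {"NoahW": 458.4, "Cygwin": 1531.8, "WSL1": 62.66},
}


def ratios(table):
    """Per array size: (Cygwin time / NoahW time, NoahW time / WSL1 time)."""
    return {
        size: (row["Cygwin"] / row["NoahW"], row["NoahW"] / row["WSL1"])
        for size, row in table.items()
    }


for size, (vs_cygwin, vs_wsl1) in ratios(fork_ms).items():
    print(f"{size}: {vs_cygwin:.1f}x faster than Cygwin, "
          f"{vs_wsl1:.1f}x slower than WSL1")
```

On these figures the advantage over Cygwin grows from roughly 2x to roughly 3x as the array grows, while the gap to the kernel-level WSL1 narrows from about 50x to about 7x.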
  • 40. Summary • Noah has a novel OS compatibility architecture • Exploited the OS-standard virtualization technology support • Achieved both robustness and flexibility • The architecture consists of three components • VMs to run guest processes • The VMM module to provide API for hardware virtualization technology • Monitor processes to implement OS compatibility functions • Run Linux binaries on macOS, and Windows (preliminary) • Noah implemented 172 out of 329 Linux system calls • The overhead of Linux kernel build time on Noah was 16%
  • 41. Wait, so you don’t mention containers at all…???? 🙄 In “Container Runtime Meetup”..??? 🙄
  • 42. Noah as an OCI Runtime • OCI Runtime • The spec of the layer of runc • Runs container images • E.g., runc, gVisor’s runsc • Why not add Noah to them? • Run Linux image (near) natively on macOS • I can finally talk about containers in this Container Runtime Meetup #2 😂
  • 43. Thanks, Hajime-san… • Containerd and Dockerd buildable on macOS • https://github.com/ukontainer/containerd • https://github.com/ukontainer/dockerd-darwin
  • 44. This joke was put together late last night, so enjoy the simple demo as much as possible! 🤗
  • 45. Video: Noah as an OCI runtime; Download pptx to play it

Editor's Notes

  1. Hello everyone. I’m Takahiro Shinagawa from the University of Tokyo. Today, I’d like to talk about a robust and flexible operating system compatibility architecture. This is a joint work with Mr. Saeki, Mr. Nishiwaki, and professor Honiden. This work was done mainly by Mr. Saeki in cooperation with Mr. Nishiwaki while they were master course students. Unfortunately, Mr. Saeki has already graduated and Mr. Nishiwaki is in a different field laboratory, so I’m going to make this presentation.
  2. There are two ways to implement OS compatibility layers. One way is to implement them in kernel space. Kernel-space implementation has the advantage that it has flexibility to achieve binary compatibility. However, it is vulnerable against bugs in the OS compatibility layers. For example, the former Windows Subsystem for Linux has several bugs that could cause the blue screen of death of Windows. The other way is to implement them in user space. It has the advantage that bugs do not affect the stability of the operating system. However, user-space-only implementations are inflexible to achieve full binary compatibility because they cannot trap system call instructions or manipulate page tables, unless we use binary modification. For example, Cygwin cannot implement the copy-on-write capability in the fork() system call.
  3. So, we propose a novel operating system compatibility architecture. In this architecture, we run each guest process in a separate VM without its OS kernel, and the process of the OS compatibility layer running on the host operating system manages the VM to emulate the execution environment for the guest application process. This architecture can achieve robustness because most of the OS compatibility layer can be implemented in user space, and bugs in the layer do not cause kernel crashes. It can also achieve flexibility to realize full binary compatibility because the virtualization technology allows low-level event handling such as trapping system calls and page faults, manipulating page tables, and so on. We need host OS support to handle hardware-assisted virtualization technology, but fortunately recent operating systems provide standard virtualization interfaces and we can reuse them. So, we do not need to modify the OS kernels ourselves.
  4. Here is the overall design of our proposed architecture. Our system consists of three main components, that is, guest VMs, the VMM module, and monitor processes. We will explain each of them in the following slides.
  5. This is the summary of the advantages of our architecture. It can achieve robustness because the monitor process is implemented in user space and bugs in it will not cause system crashes. It can also achieve flexibility to realize full binary compatibility and good performance with the copy-on-write capability. In addition, it inherits the advantages of using OS compatibility layers in general. For example, it can achieve low development cost because it can use the rich host OS functionalities such as system calls, useful libraries, and high-level languages such as Rust and Go. It also achieves seamlessness because there is only a single kernel and all system resources are managed by the kernel with a single management policy.
  6. Now, we explain the implementation. Our target is Linux 4.6 running on x86-64 processors with the support of Intel VT-x. We implemented a Linux compatibility layer for macOS called Noah; that is, we can run Linux binaries on macOS without modifications. This implementation is mature enough to run many Linux applications. For example, we can build Linux kernels and run several X11 applications on Noah. We also implemented a preliminary version for Windows that supports the copy-on-write capability so that we can confirm the advantage of our architecture. Unfortunately, it does not implement many system calls yet.
  7. As for memory management, there are two page tables in our architecture. That is, the guest page table in the VM and the nested page table in the VMM. To avoid handling two page tables, we should fix one page table and manipulate only the other. Which one to choose is a design choice. We chose to fix the guest page table and manipulate the nested page table for easy implementation and debugging. The VMM module provides the API to manipulate the nested page tables, and the monitor process can map memory pages to the VM by specifying the virtual address of the monitor process. So, the monitor process can easily change the page mappings of the VMs. This approach has one limitation. Since current Intel CPUs support only up to 39-bit physical addresses, we cannot use the full 48-bit virtual address space. Fortunately, the only problem this causes in Linux is the default stack address, and we can safely move the stack to a lower address. We should note that there is no kernel in the VM, so we do not need the higher region of the guest virtual address space.
  8. As for process management, we used different approaches on macOS and Windows. On macOS, we can use the fork system call to fork the monitor process, and implemented a subset of the clone system call. Unfortunately, Apple Hypervisor.framework does not support the fork of a process with a virtual machine. Therefore, we first save and destroy the VM before fork, and then restore the VM state after fork. On Windows, the monitor process cannot use fork because Windows does not support it. So, we implemented the fork functionality ourselves using shared memory and virtualization technology. That is, we created a memory region shared among the monitor processes, and the monitor processes trap the page faults and perform the copy-on-write. The implementation is a little bit complicated, but basically similar to the implementation in the OS kernel.
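The macOS sequence described in this note (save the VM, destroy it, fork the monitor, restore the VM on both sides) can be sketched as follows. `FakeVM` is a hypothetical stand-in that only holds a register dictionary; it does not call Hypervisor.framework, and the register values are made up for the demonstration. The sketch relies on `os.fork`, so it runs on Unix-like hosts only.

```python
import os


class FakeVM:
    """Hypothetical VM handle; real code would call the hypervisor API."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.destroyed = False

    def save(self):
        return dict(self.registers)

    def destroy(self):
        self.destroyed = True


def fork_with_vm(vm):
    """Emulate a guest fork() when a VM cannot survive the host fork()."""
    state = vm.save()
    vm.destroy()            # tear the VM down before forking the monitor
    pid = os.fork()         # fork the monitor process itself
    new_vm = FakeVM(state)  # parent and child each rebuild a VM from the state
    # Linux fork() returns 0 in the child and the child's PID in the parent.
    new_vm.registers["rax"] = pid
    return pid, new_vm


def demo():
    r, w = os.pipe()
    pid, vm = fork_with_vm(FakeVM({"rip": 0x400000, "rax": 57}))
    if pid == 0:
        # Child monitor: report the restored state, then exit immediately.
        os.write(w, b"%d %d" % (vm.registers["rip"], vm.registers["rax"]))
        os._exit(0)
    os.close(w)
    os.waitpid(pid, 0)
    report = os.read(r, 64).decode()
    os.close(r)
    return report, vm.registers["rax"] == pid
```

Calling `demo()` shows that the child sees the saved instruction pointer with a return value of 0, while the parent's restored VM carries the child's PID.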
  10. This is the result of the macro benchmark using Phoronix Test Suite. We can see from the graph that some applications became slower and some applications became faster depending on their characteristics. If the application issues many simple system calls, the overhead will become higher. If the application issues many complicated system calls, the overhead will become lower. If the application causes many page faults, it could become faster. Since one of our target applications is build environments, kernel build performance is a good benchmark. It showed 16% overhead, and we believe this is reasonable performance.
  11. This is the result of the primitive benchmark. We measured the CPU cycles of the dup() system call. dup() is a simple system call but it actually calls the OS kernel, so we used it to measure the breakdown of the system call. As shown in the figure, VM enter and exit took around three hundred cycles each, which is not so high. The system call itself took only around two thousand cycles. Unfortunately, the VMM module took extra CPU cycles to enter and exit the VM. We believe there is still room for optimization in this part and we can further reduce the overhead on system calls.
  12. This is the result of the micro benchmark using lmbench. This benchmark measured the performance related to the process and processor. From this figure, we found that the overhead of simple system calls was high, because the cost of context switches is high, but in complex system calls, the context switch cost became relatively lower. One interesting result is the performance of the exec system call. It became faster than macOS because the exec system call is mainly implemented by our system in a simple way, and the implementation in macOS may be very complicated due to its kernel structure.
  13. This is another result of lmbench. The result shows that Noah incurred up to 42 percent overhead on basic file-related system calls. Another interesting result is that page fault and protection fault handling was much faster in our system than in macOS. This is probably for the same reason as with the exec system call: memory management is mainly implemented by our system.
  14. Finally, we show the performance comparison of OS compatibility layers. We used the Windows version of our implementation, and compared it with Cygwin, which is a user-level implementation of an OS compatibility layer, and Windows Subsystem for Linux, shown as WSL1, as a kernel-level implementation. We can see that the dup2 system call is much slower in our system because this is a simple system call and the virtualization overhead is dominant. We can also see that write performance is comparable to the other two systems because this performance is mainly determined by the host I/O performance. Finally, we can confirm that fork in our system is much faster than in Cygwin, especially when the guest process has large data, because we support the copy-on-write capability.
  15. So, this is the conclusion. We proposed a novel OS compatibility architecture that exploits the OS-standard virtualization technology and achieved both robustness and flexibility. In our architecture, an OS compatibility layer consists of three components, that is, virtual machines to run guest processes, the VMM module to provide API for hardware virtualization technology, and monitor processes that implement the OS compatibility functions. Our implementations can run many Linux binaries on macOS and simple Linux binaries on Windows. The overhead of Linux kernel build time on Noah was 16 percent.