Noah - Robust and Flexible Operating System Compatibility Architecture - Container runtime meetup #2

The presentation of "Noah - Robust and Flexible Operating System Compatibility Architecture" at Container runtime meetup #2, Japan, 2020/08/22

Published in: Software

  1. 1. Noah: A Robust and Flexible Operating System Compatibility Architecture. Takahiro Shinagawa, Shinichi Honiden, Yuichi Nishiwaki, Takaya Saeki (@nullpo_head)
  2. 2. Who am I?
     • Takaya Saeki (@nullpo_head)
     • Software engineer
     • Likes the web and the system layer
     • Projects that might sound interesting
       • Noah
       • Ported XV6 OS to MIPS, and to a home-built FPGA CPU
       • Sudo by Windows Hello in WSL 2
  3. 3. What’s Noah?
  4. 4. Noah: User-space Linux* compatibility layer powered by virtualization
     1. As an implementation: Noah runs Linux apps on macOS, like WSL 1 on Windows.
     2. As an architecture: Noah is a kind of user-space kernel for OS compatibility, powered by virtualization. It needs no Linux emulation kernel extension (unlike WSL 1).
        • Loads a guest binary into an empty VM without any kernel
        • Traps system calls and emulates them in user space
        • Accomplishes memory management such as CoW through virtualization
     => Technically fun!
     => Academic novelty (APSys ’17, VEE ’20)
     * The architecture is not limited to Linux; it can be applied to other operating systems.
  5. 5. Short Demo
  6. 6. Video: Download pptx to play it
  7. 7. Why did we start the Noah project?
  8. 8. Linux
     • One of the most important operating systems
     • Today’s de facto standard ecosystem
       • Kubernetes / Docker
  9. 9. OS Compatibility Layers
     • Windows and FreeBSD have Linux compatibility layers to use the Linux ecosystem natively
     • So, why not let macOS have one?
       • Then the Linux ABI would be a lingua franca!
       • What is more, creating yet another Linux layer is fun!
     • Started in 2016 as a MITOH project by me and Yuichi Nishiwaki
  10. 10. The Architecture of Noah
  11. 11. Implementing an OS compatibility layer: kernel space vs. user space
      • Kernel space
        👍 Flexibility to achieve binary compatibility
           • System calls and memory management can be handled easily
        👎 Vulnerability to bugs in the OS compatibility layer
           • A bug could lead to system crashes
      • User space
        👍 Robustness against bugs
           • Bugs do not affect OS stability
        👎 Challenges in achieving full compatibility
           • E.g., copy-on-write is not implemented in Cygwin
      [Two diagrams, one per approach. Labels: Host OS Kernel, Guest Application Binary, OS Compatibility Layer]
      A Robust and Flexible Operating System Compatibility Architecture (VEE 2020, March 17, 2020)
  12. 12. Noah’s OS Compatibility Architecture
      • Runs each guest process in a VM (without a guest OS kernel)
        👍 Robustness
           • Most of the OS compatibility layer can be implemented in user space
           • Bugs do not cause kernel crashes
        👍 Flexibility
           • Hardware virtualization technology provides low-level event handling functionality
           • E.g., trapping system calls and page faults, manipulating page tables, …
      [Diagram labels: Host OS Kernel, OS Compatibility Layer, VM, Host Process, Standardized Virtualization Interface, Guest Application Process, CPU, Hardware Virtualization Function]
      ⇒ Published as papers for its academic novelty [T. Saeki, Y. Nishiwaki, T. Shinagawa, S. Honiden]
        • A robust and flexible operating system compatibility architecture, in VEE 2020
        • Bash on Ubuntu on macOS, in APSys 2017
  13. 13. Overall Design
      • Three main components
        1. Guest VM
        2. VMM module
        3. Monitor process
      [Diagram labels. User space: monitor process (emulate system calls), guest process, Guest VMs (no kernel). Kernel space: Host OS kernel, VMM module (manage VMs, trap system calls & exceptions, upcall to the monitor)]
  14. 14. Our approach: Utilize Virtualization Technology
  15. 15. Our approach: Utilize Virtualization Technology. Step 1: The monitor process launches a new VM and loads the ELF binary into it, without a kernel.
  16. 16. Our approach: Utilize Virtualization Technology. Step 2: The ELF application issues Linux system calls while running in the VM. These are trapped by the VMM.
  17. 17. Our approach: Utilize Virtualization Technology. Step 3: The VMM passes the trapped system call to the monitor process.
  18. 18. Our approach: Utilize Virtualization Technology. Step 4: The monitor process emulates the behavior of the Linux system call using the host OS’s system calls.
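
The four steps above can be sketched as a small simulation. This is only an illustration, not Noah's code: the guest is modelled as a Python generator instead of an ELF binary in a real VM, and `demo_guest`, `HANDLERS`, and `run_monitor` are hypothetical names. The system call numbers and the `ENOSYS` value are the standard Linux x86-64 ones.

```python
import os

# Linux x86-64 system call numbers and errno value (Linux ABI constants).
LINUX_SYS_WRITE = 1
LINUX_SYS_GETPID = 39
LINUX_SYS_EXIT = 60
LINUX_ENOSYS = 38  # Linux numbering; the host's errno numbering may differ


def demo_guest(fd):
    """Hypothetical stand-in for a guest binary running in a kernel-less VM.

    Each `yield` models a trapped system call (a VM exit); the value sent
    back models the return value handed to the guest on VM re-entry.
    """
    written = yield (LINUX_SYS_WRITE, (fd, b"hello from guest\n"))
    yield (999, ())  # a system call number the monitor does not implement
    yield (LINUX_SYS_EXIT, (0 if written > 0 else 1,))


# Monitor side: each Linux system call is emulated with host functionality.
HANDLERS = {
    LINUX_SYS_WRITE: lambda fd, data: os.write(fd, data),
    LINUX_SYS_GETPID: lambda: os.getpid(),
    LINUX_SYS_EXIT: lambda status: status,
}


def run_monitor(guest):
    """Run the guest until it exits; return (exit_status, trace)."""
    trace = []
    try:
        nr, args = next(guest)  # steps 1-2: start the guest, run to first trap
    except StopIteration:
        return None, trace
    while True:
        handler = HANDLERS.get(nr)  # step 3: the trap reaches the monitor
        ret = handler(*args) if handler else -LINUX_ENOSYS  # step 4: emulate
        trace.append((nr, ret))
        if nr == LINUX_SYS_EXIT:
            guest.close()
            return ret, trace
        try:
            nr, args = guest.send(ret)  # resume the guest with the result
        except StopIteration:
            return None, trace
```

Calling `run_monitor(demo_guest(fd))` with a writable file descriptor runs the whole loop. Note that the monitor returns Linux error numbers to the guest, since the guest binary expects Linux conventions regardless of the host.
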
  19. 19. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42). fork: Bash calls fork(), the monitor process calls fork() on macOS, and the VM state is cloned. [Diagram on macOS: two monitor processes, each running Bash]
  20. 20. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42). exec to “cat”: execve(…) replaces the VM contents. [Diagram on macOS: two monitor processes, one running Bash, the other changing from Bash to cat]
  21. 21. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42). [Diagram on macOS: three monitor processes running Bash, cat, and grep; cat writes, grep reads]
  22. 22. Example: Inter-process communication ($ noah bash, then $ cat file | grep 42). Can do IPC with native apps smoothly. [Diagram on macOS: two monitor processes running Bash and cat, plus a regular native process]
  23. 23. Advantages of a user-space compatibility layer with virtualization
      1. Robust
         • Does not cause OS crashes, because everything except the VMM is just a user-space app
      2. Flexible
         • Thanks to the VMM, a user-space kernel can achieve binary compatibility and CoW
      3. Portable, with lower development cost than a kernel-space layer
         • Rich host OS functionality: system calls, libraries, high-level languages…
         • In fact, NoahW is implemented in C++ with Boost
      4. Seamless
         • Single kernel: shares resources such as FS, memory, process scheduling, IPC…
  24. 24. Implementation
  25. 25. Implementation
      • Targets Linux 4.6 on x86-64 (Intel VT-x)
      • Noah: Linux compatibility layer for macOS
        • Uses Apple Hypervisor.framework as the VMM module
      • NoahW: Linux compatibility layer for Windows (preliminary)
        • Uses Intel Hardware Accelerated Execution Manager as the VMM module
  26. 26. Memory Management
      • Two page tables
        (a) Guest page table in the VM
        (b) Nested page table (EPT) in the VMM
      • Fix (a) and modify (b)
        • (a) is fixed to the straight mapping
          • Virtual address = Physical address
        • (b) can be manipulated with the API
          • Provided by the VMM module
      • Limitation: GVA is up to 512 GiB
        • 39-bit physical address in Intel CPUs
          • 48-bit virtual address
        • Stack is moved to a lower address
        • No kernel area
      [Diagram: GVA to GPA through the guest page table (fixed), GPA to HPA through the nested page table (modified); address axis marked 0, 511 GiB, and 512 GiB; 1-GiB guest system data area (page tables, segment descriptors, …)]
      GVA: Guest Virtual Address; GPA: Guest Physical Address; HPA: Host Physical Address
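
As a rough illustration of the two-level scheme on the Memory Management slide, the sketch below models the fixed guest page table as an identity function and the nested page table as a dictionary the monitor can edit. It also shows where the 512 GiB figure comes from: 2^39 bytes is 512 GiB. The class and function names are hypothetical, the addresses in any usage are made up, and real nested page tables are hardware structures managed through the VMM module's API.

```python
PAGE_SIZE = 4096
GIB = 1 << 30
GUEST_ADDRESS_BITS = 39                # from the slide: 39-bit physical address
GUEST_LIMIT = 1 << GUEST_ADDRESS_BITS  # 2**39 bytes = 512 GiB


def guest_page_table(gva):
    """(a) Fixed guest page table: straight mapping, so GPA equals GVA."""
    if not 0 <= gva < GUEST_LIMIT:
        raise ValueError("address outside the 512 GiB guest range")
    return gva


class NestedPageTable:
    """(b) Nested page table: GPA page -> HPA page, edited by the monitor."""

    def __init__(self):
        self._pages = {}

    def map_page(self, gpa, hpa):
        self._pages[gpa // PAGE_SIZE] = hpa // PAGE_SIZE

    def unmap_page(self, gpa):
        self._pages.pop(gpa // PAGE_SIZE, None)

    def translate(self, gpa):
        frame = self._pages.get(gpa // PAGE_SIZE)
        if frame is None:
            # In the real design this would surface as a fault the monitor handles.
            raise LookupError("nested page fault")
        return frame * PAGE_SIZE + gpa % PAGE_SIZE


def translate(gva, nested):
    """Full translation: GVA -> GPA (fixed) -> HPA (nested)."""
    return nested.translate(guest_page_table(gva))
```

Because (a) never changes, every memory-management decision in this model reduces to calling `map_page` or `unmap_page` on the nested table, which mirrors the "fix (a) and modify (b)" idea.
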
  27. 27. Process Management (fork)
      • Noah (on macOS)
        • Implements a subset of clone()
        • Apple Hypervisor.framework does not support fork() with a VM
          • Save and destroy the VM before fork()
          • Restore the VM after fork()
      • NoahW (on Windows)
        • Implements fork() with copy-on-write using shared memory and virtualization
          • Create a memory region shared among monitor processes
          • Save, restore, and modify the VM states on fork()
          • Trap page faults in the VMs to implement copy-on-write
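
The copy-on-write idea described for NoahW can be illustrated with a toy model. This is a sketch of the general technique under simplifying assumptions, not NoahW's implementation: `FramePool` stands in for the memory region shared among monitor processes, page tables are plain dictionaries, and a write to a read-only page calls the fault handler directly instead of trapping through the VMM. All names are hypothetical.

```python
class FramePool:
    """Hypothetical stand-in for memory shared among monitor processes."""

    def __init__(self):
        self.frames = {}    # frame id -> bytearray with the page contents
        self.refcount = {}  # frame id -> number of address spaces mapping it
        self._next_id = 0

    def alloc(self, data):
        fid = self._next_id
        self._next_id += 1
        self.frames[fid] = bytearray(data)
        self.refcount[fid] = 1
        return fid


class AddressSpace:
    def __init__(self, pool):
        self.pool = pool
        self.table = {}  # page number -> [frame id, writable flag]

    def map_new(self, page, data):
        self.table[page] = [self.pool.alloc(data), True]

    def fork(self):
        """Share every frame with the child and write-protect both sides."""
        child = AddressSpace(self.pool)
        for page, entry in self.table.items():
            entry[1] = False
            child.table[page] = [entry[0], False]
            self.pool.refcount[entry[0]] += 1
        return child

    def read(self, page, offset):
        return self.pool.frames[self.table[page][0]][offset]

    def write(self, page, offset, value):
        entry = self.table[page]
        if not entry[1]:
            self._handle_write_fault(entry)  # models the trapped page fault
        self.pool.frames[entry[0]][offset] = value

    def _handle_write_fault(self, entry):
        fid = entry[0]
        if self.pool.refcount[fid] > 1:  # still shared: copy before writing
            self.pool.refcount[fid] -= 1
            entry[0] = self.pool.alloc(self.pool.frames[fid])
        entry[1] = True                  # now private, so writes may proceed
```

After `child = parent.fork()`, no page data is copied until one side writes. The first writer to a shared page receives a private copy, and the last remaining owner simply regains write permission without copying.
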
  28. 28. File System
      • Implemented a VFS layer
      • To run Linux apps, the default FS is mapped as follows (guest path → host path):
          /usr   → ~/.noah/tree/usr
          /etc   → ~/.noah/tree/etc
          /Users → /Users
          /dev   → /dev
          /tmp   → /tmp
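
The path mapping on the File System slide might be sketched as a translation function like the one below. The slide only lists `/usr`, `/etc`, `/Users`, `/dev`, and `/tmp`; sending every other guest path under `~/.noah/tree` is an assumption made for this sketch, and `to_host_path`, `PASSTHROUGH`, and the `home` parameter are hypothetical names. Symbolic links are ignored here.

```python
import posixpath

# Guest directories that the slide shows mapping to the same host path.
PASSTHROUGH = ("/Users", "/dev", "/tmp")
# Guest /usr and /etc live under this tree in the user's home directory.
TREE_SUBDIR = ".noah/tree"


def to_host_path(guest_path, home="/Users/alice"):
    """Translate an absolute guest path to a host path.

    `home` is a hypothetical host home directory standing in for `~`.
    """
    if not guest_path.startswith("/"):
        raise ValueError("expected an absolute guest path")
    path = posixpath.normpath(guest_path)  # resolve "." and ".." lexically
    for mount in PASSTHROUGH:
        if path == mount or path.startswith(mount + "/"):
            return path
    # Assumption: anything else, including /usr and /etc, goes under the tree.
    return posixpath.join(home, TREE_SUBDIR, path.lstrip("/"))
```

Normalizing before matching matters: a guest path such as `/usr/../dev/null` must resolve to the host's `/dev/null` rather than to a file under the tree.
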
  29. 29. Other System Calls
      • Futex
        • Emulated with condition variables
      • Signal
        • Delivery system implemented inside Noah
      • Socket
        • Integrated with Noah’s VFS
      • I/O such as readv64 / writev64
        • Simulates incompatible small I/O system calls
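
One way to read the futex bullet is that wait and wake operations are built on a host condition variable. The sketch below shows that general technique only, not Noah's implementation: `FutexTable` is a hypothetical name, guest memory is modelled as a dictionary, and only the basic wait and wake operations are covered. The error numbers are the Linux values for `EAGAIN` and `ETIMEDOUT`.

```python
import threading
from collections import defaultdict

LINUX_EAGAIN = 11      # Linux errno values, returned negated to the guest
LINUX_ETIMEDOUT = 110


class FutexTable:
    def __init__(self):
        self._cond = threading.Condition()
        self._waiters = defaultdict(int)  # addr -> threads currently blocked
        self._wakeups = defaultdict(int)  # addr -> wakeups granted, not consumed

    def wait(self, memory, addr, expected, timeout=None):
        """Block while memory[addr] == expected, until woken or timed out."""
        with self._cond:
            if memory[addr] != expected:
                return -LINUX_EAGAIN      # value already changed: do not sleep
            self._waiters[addr] += 1
            try:
                if self._cond.wait_for(lambda: self._wakeups[addr] > 0, timeout):
                    self._wakeups[addr] -= 1
                    return 0
                return -LINUX_ETIMEDOUT
            finally:
                self._waiters[addr] -= 1

    def wake(self, addr, count):
        """Wake up to `count` waiters on addr; return how many were woken."""
        with self._cond:
            pending = self._waiters[addr] - self._wakeups[addr]
            woken = max(0, min(count, pending))
            self._wakeups[addr] += woken
            self._cond.notify_all()
            return woken

    def waiting(self, addr):
        with self._cond:
            return self._waiters[addr]
```

The value check and the decision to sleep happen under the same lock that `wake` takes, which is what prevents a wakeup from being lost between the comparison and the sleep.
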
  30. 30. That’s how Noah implements a Linux kernel to run Linux apps!
  31. 31. Demo
  32. 32. Video: bash, vi, gcc, and cross-platform; Download pptx to play it
  33. 33. Video: Play Doom; Download pptx to play it
  34. 34. Evaluation
  35. 35. Macro Benchmark (Phoronix Test Suite + α) [Bar chart, axis from -200% to 200%. Benchmarks in chart order: Linux kernel build, unpack-linux, postmark, sqlite, openssl, compress-7zip. Values in extracted order: 16%, -23%, -4%, 50%, -58%, 9%]
  36. 36. Primitive Benchmark: dup() system call [Stacked bar chart of cycle counts, axis from 0 to 40,000, for macOS and Windows. Segments: VM enter, downcall, post-process, host syscall, pre-process, upcall, VM exit. Data labels as extracted: 270 3202520 11091 1330 7044 2770 2118 588 5504 2809 11091 251 297]
  37. 37. Micro Benchmark: lmbench (processor, process) [Bar chart, axis from -100% to 400%. Benchmarks in chart order: null call, null I/O, stat, open clos, slct TCP, sig inst, sig hndl, fork proc, exec proc, sh proc. Values in extracted order: 410%, 310%, 175%, 172%, 13%, 329%, 239%, 256%, -24%, 46%]
  38. 38. Micro Benchmark: lmbench (File & VM latency) [Bar chart, axis from -100% to 50%. Benchmarks in chart order: 0K Create, 0K Delete, 10K Create, 10K Delete, Mmap Latency, Prot Fault, Page Fault, 100fd selct. Values in extracted order: 42%, 5%, 17%, 4%, 28%, -45%, -92%, 8%]
  39. 39. Comparison of OS Compatibility Layers
      Benchmark                      NoahW     Cygwin     WSL1
      dup2() [calls per second]      36,723    556,453    693,309
      write() [calls per second]     0.30      0.56       0.57
      fork() (0 MiB array) [ms]      106.4     219.4      2.06
      fork() (512 MiB array) [ms]    338.9     789.9      32.51
      fork() (1 GiB array) [ms]      458.4     1531.8     62.66
  40. 40. Summary
      • Noah has a novel OS compatibility architecture
        • Exploits the OS-standard virtualization technology support
        • Achieves both robustness and flexibility
      • The architecture consists of three components
        • VMs to run guest processes
        • The VMM module to provide an API for hardware virtualization technology
        • Monitor processes to implement OS compatibility functions
      • Runs Linux binaries on macOS and Windows (preliminary)
        • Noah implements 172 out of 329 Linux system calls
        • The overhead in Linux kernel build time on Noah was 16%
  41. 41. Wait, so you don’t mention containers at all…???? 🙄 At a “Container Runtime Meetup”..??? 🙄
  42. 42. Noah as an OCI Runtime
      • OCI Runtime
        • The spec for the layer where runc sits
        • Runs container images
        • E.g., runc, gVisor’s runsc
      • Why not add Noah to them?
        • Run Linux images (near) natively on macOS
        • I can finally talk about containers at this Container Runtime Meetup #2 😂
  43. 43. Thanks, Hajime-san…
      • containerd and dockerd buildable on macOS
        • https://github.com/ukontainer/containerd
        • https://github.com/ukontainer/dockerd-darwin
  44. 44. This joke was put together late last night, so enjoy the simple demo as much as possible! 🤗
  45. 45. Video: Noah as an OCI runtime; Download pptx to play it
