The Quest for the Perfect API

Discusses my 25-year quest to find the perfect operating-system interface, covering our work on the Mungi single-address-space operating system (SASOS), early work on L4 microkernels, and now the seL4 microkernel, its evolution and verification.

Talk originally given at a seminar series hosted by VMware Research on the occasion of the company's 20th anniversary.

Published in: Science

The Quest for the Perfect API

  1. The Quest for the Perfect API
     Gernot Heiser | gernot.heiser@data61.csiro.au | @GernotHeiser
     Trustworthy Systems | Data61
     https://trustworthy.systems
  2. Observation: Operating Systems Suck (VMware Research, April '18)
  3. Mungi Single-Address-Space OS
  4. Mungi: Single-Address-Space OS
     De-couple translation from protection
  5. Mungi Retrospective
     Mungi was a roaring success – not
       • never found a convincing killer app
       • workarounds for 32-bit limitations (Unix model) too widely accepted
       • it was naïve to think we could change mainstream computing
       • 64-bit address space already too small for all data
     Lesson: If you want to change the world, pick the right world to change
     Long-term benefits of Mungi
       • built a systems group
       • got us into L4
  6. L4 Microkernel
  7. L4 Microkernel
     “A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality.” [Liedtke, SOSP’95]
  8. 25 Years of L4 Microkernel R&D
     [Timeline figure, 1993 to 2015, showing API inheritance and code inheritance between kernels: L3→L4, “X”, Hazelnut, Pistachio, L4/Alpha, L4/MIPS, OKL4-µKernel, OKL4-Microvisor, Codezero, P4 → PikeOS, Fiasco, Fiasco.OC, L4-embed., Nova. Kernels are grouped by origin: GMD/IBM/Karlsruhe, UNSW/NICTA/Data61, Dresden, OK Labs, other (commercial). Deployments noted: Qualcomm modem chips, iOS secure enclave.]
  9. L4 IPC Performance Over the Years

     Name       Year  Processor               MHz    Cycles  µs
     Original   1993  i486                       50     250  5.00
     Original   1997  Pentium                   160     121  0.75
     L4/MIPS    1997  R4700                     100      86  0.86
     L4/Alpha   1997  21064                     433      45  0.10
     Hazelnut   2002  Pentium 4               1,400   2,000  1.38
     Pistachio  2005  Itanium                 1,500      36  0.02
     OKL4       2007  XScale 255                400     151  0.64
     NOVA       2010  i7 Bloomfield (32-bit)  2,660     288  0.11
     seL4       2013  ARM11                     532     188  0.35
     seL4       2018  i7 Haswell (64-bit)     3,400     442  0.13
     seL4       2018  Cortex A9               1,000     303  0.30
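
The µs column in the IPC table relates to the other two numeric columns by simple arithmetic: a clock of f MHz executes f cycles per microsecond, so latency in µs is cycles divided by MHz. A minimal sketch of that conversion, using three rows copied from the table (the function name is made up for illustration; printed values in other rows may reflect rounding or measurement details the slide does not state):

```python
def ipc_latency_us(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds.

    A clock of `clock_mhz` MHz runs `clock_mhz` cycles per microsecond,
    so the elapsed time is cycles / clock_mhz.
    """
    return cycles / clock_mhz


# (name, year, clock in MHz, IPC cost in cycles), copied from the table
rows = [
    ("Original", 1993, 50, 250),
    ("L4/MIPS", 1997, 100, 86),
    ("seL4 (Cortex A9)", 2018, 1000, 303),
]

for name, year, mhz, cycles in rows:
    print(f"{name} ({year}): {ipc_latency_us(cycles, mhz):.2f} µs")
```

Comparing cycles rather than µs is what makes kernels on very different clock rates comparable, which is presumably why the table lists both.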
  10. Minimality: Source-Code Size (all sizes in kSLOC)

      Name         Architecture  C/C++  asm   Total
      Original     i486            0     6.4    6.4
      L4/Alpha     Alpha           0    14.2   14.2
      L4/MIPS      MIPS64          6.0   4.5   10.5
      Hazelnut     x86            10.0   0.8   10.8
      Pistachio    x86            22.4   1.4   23.0
      L4-embedded  ARMv5           7.6   1.4    9.0
      OKL4 3.0     ARMv6          15.0   0.0   15.0
      Fiasco.OC    x86            36.2   1.1   37.6
      seL4         ARMv6           9.7   0.5   10.2
  11. Original L4: Design & Implementation
      Implementation Tricks [SOSP’93]
        • Process kernel
        • Virtual TCB array
        • Lazy scheduling
        • Direct process switch
        • Non-preemptible
        • Non-portable
        • Non-standard calling convention
        • Assembler
      Design Decisions [SOSP’95]
        • Synchronous IPC
        • Rich message structure, arbitrary out-of-line messages
        • Zero-copy register messages
        • User-mode page-fault handlers
        • Threads as IPC destinations
        • IPC timeouts
        • Hierarchical IPC control
        • User-mode device drivers
        • Process hierarchy
        • Recursive address-space construction
      Objective: Minimise cache footprint and TLB misses
  12. seL4: Rethinking Resource Management
  13. Memory Management
      [Diagram: a Global Resource Manager (GRM) holding RAM, Resource Managers (RM) below it, and address spaces (AS)]
        • Resources fully delegated, allows autonomous operation; enabled by capabilities
        • Strong isolation, no shared kernel resources
        • Design for isolation: no memory allocation by kernel
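
The delegation model on the memory-management slide can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Untyped` and `ResourceManager` classes and their methods are hypothetical names invented here, not the seL4 API. It shows the one property the slide stresses: every object is backed by memory its creator already holds, so one manager running out of memory cannot affect another.

```python
from dataclasses import dataclass, field


@dataclass
class Untyped:
    """Hypothetical capability to a contiguous range of unused memory."""
    base: int
    size: int
    used: int = 0  # bytes already handed out from this range

    def carve(self, size: int) -> "Untyped":
        """Hand out a sub-range; fail rather than fall back to a shared heap."""
        if size <= 0 or self.used + size > self.size:
            raise MemoryError("untyped region exhausted")
        child = Untyped(self.base + self.used, size)
        self.used += size
        return child


@dataclass
class ResourceManager:
    """Hypothetical user-level manager that owns a pool of delegated memory."""
    name: str
    pool: Untyped
    objects: list = field(default_factory=list)

    def create_object(self, kind: str, size: int) -> tuple:
        """Back an object (TCB, page table, ...) with memory from our own pool."""
        mem = self.pool.carve(size)
        obj = (kind, mem.base, mem.size)
        self.objects.append(obj)
        return obj

    def delegate(self, name: str, size: int) -> "ResourceManager":
        """Fully delegate part of our pool to a new, autonomous manager."""
        return ResourceManager(name, self.pool.carve(size))


# Illustrative addresses and sizes only.
grm = ResourceManager("GRM", Untyped(base=0x1000_0000, size=1 << 20))
rm_a = grm.delegate("RM-A", 256 * 1024)
rm_b = grm.delegate("RM-B", 256 * 1024)
tcb = rm_a.create_object("TCB", 1024)
```

In this model, exhausting `rm_a.pool` raises `MemoryError` for RM-A only; `rm_b` is untouched, because there is no shared allocator between them.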
  14. Isolation Goes Deep
      [Diagram: High and Low partitions, each with its own TCBs, Caps and PTs]
      Kernel data partitioned like user data
  15. How About Temporal Isolation?
      Safety: Timeliness
        • Execution interference
      Security: Confidentiality
        • Leakage via timing channels
      Between High and Low:
        • Observe execution speed: confidentiality violation
        • Affect execution speed: integrity violation
  16. Integrity Challenge: Mixed Criticality
        • Control loop (sensor readings): runs every 100 ms for a few milliseconds
        • NW driver (NW interrupts): runs frequently but for a short time (order of µs)
      NW driver must preempt control loop
        • … to avoid packet loss
        • Driver must run at high prio
        • Driver must be trusted not to monopolise CPU
  17. Scheduling Contexts: Caps for Time
      Classical thread attributes
        • Priority
        • Time slice
      New thread attributes
        • Priority
        • Scheduling context capability (not runnable if null)
      Scheduling context object: capability for time
        • T: period
        • C: budget (≤ T)
        • Limits CPU access!
        • Examples: C = 2, T = 3; C = 250, T = 1000
      SchedControl capability conveys right to assign budgets (i.e. perform admission control)
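
The budget and period of a scheduling context lend themselves to a small worked example. The sketch below is an illustration under assumptions, not the seL4 interface: the `SchedContext` class and `admit` function are hypothetical, and the admission test used (total utilisation C/T must not exceed 1) is a simple stand-in, since the slide does not say which test the holder of SchedControl applies. The two contexts use the (C, T) pairs shown on the slide.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SchedContext:
    """Hypothetical model of a scheduling context object."""
    budget: int  # C: CPU time the holder may consume per period
    period: int  # T: period, in the same time unit as the budget

    def __post_init__(self):
        # The slide requires C <= T; a zero or negative budget makes no sense.
        if not 0 < self.budget <= self.period:
            raise ValueError("need 0 < C <= T")

    @property
    def utilisation(self) -> Fraction:
        """Fraction of the CPU this context can claim at most."""
        return Fraction(self.budget, self.period)


def admit(existing, candidate, limit=Fraction(1)):
    """Assumed admission test: total utilisation must stay within `limit`."""
    total = sum((sc.utilisation for sc in existing), Fraction(0))
    return total + candidate.utilisation <= limit


sc_a = SchedContext(budget=2, period=3)       # C = 2, T = 3
sc_b = SchedContext(budget=250, period=1000)  # C = 250, T = 1000

print(admit([sc_a], sc_b))                       # 2/3 + 1/4 = 11/12
print(admit([sc_a, sc_b], SchedContext(1, 10)))  # 11/12 + 1/10 = 61/60
```

Under this assumed test the two slide examples fit together (11/12 of the CPU), while a third context asking for a further tenth would be refused. Whatever the actual policy, the point of the slide is that the budget bounds what even a high-priority thread can consume.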
  18. Confidentiality: Closing Timing Channels
      Prevent observation of execution speed
        • Black-box, OS-enforced isolation
        • No requirement to trust High code not to leak
        • No requirement for modifying High code
        • High and Low code untrusted – mandatory confinement
        • Should also protect against data-dependent execution time
      Time protection, just like standard memory protection
      Eliminates covert channels required for Meltdown/Spectre exploits
  19. Mitigation: Prevent Sharing of State
      Flush on context switch
        • Cannot partition on-core caches (L1, TLB, branch predictor, prefetchers)
        • virtually-indexed
        • OS cannot control access
      Partition the cache through page colouring
        • High and Low each get their own part of the cache
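
Page colouring rests on a small piece of address arithmetic, sketched below. The sketch assumes a physically indexed, set-associative cache whose set index is taken directly from the address bits above the line offset; the cache geometry (2 MiB, 16 ways, 4 KiB pages) and all function names are illustrative assumptions, and hardware that hashes the index would need a different mapping.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def num_colours(cache_size: int, ways: int, page_size: int = PAGE_SIZE) -> int:
    """Number of page colours: one way of the cache, measured in pages."""
    return cache_size // (ways * page_size)


def page_colour(paddr: int, colours: int, page_size: int = PAGE_SIZE) -> int:
    """Colour of the physical page containing `paddr`.

    Pages with different colours map to disjoint groups of cache sets.
    """
    return (paddr // page_size) % colours


def frames_for_partition(frames, allowed, colours):
    """Keep only the frames whose colour belongs to this partition."""
    return [f for f in frames if page_colour(f, colours) in allowed]


# Illustrative geometry: 2 MiB, 16-way cache with 4 KiB pages.
colours = num_colours(cache_size=2 * 1024 * 1024, ways=16)
frames = [i * PAGE_SIZE for i in range(64)]

# Give the lower half of the colours to High, the upper half to Low.
high = frames_for_partition(frames, set(range(0, colours // 2)), colours)
low = frames_for_partition(frames, set(range(colours // 2, colours)), colours)
```

Because High and Low only ever receive frames of their own colours, their data cannot compete for the same cache sets, which is the partitioning the slide refers to.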
  20. Colouring User Memory is Easy
      [Diagram: Global Resource Manager (GRM) holding RAM, with two Resource Managers (RM) below it]
        • Partitions restricted to coloured memory
        • System permanently coloured
  21. Colouring the Kernel
      [Diagram: Global Resource Manager (GRM) holding RAM, with two Resource Managers (RM) below it]
      Each partition has own kernel image: kernel clone!
      Only shared kernel data:
        • Scheduler queue array & bitmap
        • Pointers to current: thread, kernel, page table, cap space, FPU state
  22. Formal Verification – The Killer “App”
  23. Provable Security Enforcement
      [Diagram: proof stack. Isolation properties (confidentiality, integrity, availability) are proved over the abstract model; functional correctness connects the abstract model to the C implementation; translation correctness connects the C implementation to the binary code.]
        • Worst-case execution time
        • World’s fastest microkernel!
      Exclusions (all in progress):
        • Initialisation
        • Privileged state & caches
        • Multicore
        • Temporal isolation
  24. What Made Verification Possible?
        • Suitable design:
            • Microkernel, of course!
            • Isolation-oriented resource management helped prove global invariants
            • Resource-management model crucial for proving isolation properties
        • From-scratch implementation:
            • Verifying code not written for verification is infeasible
            • Feedback loop between implementers and verifiers is essential
  25. Why Build Your Own OS?
        • Cannot really rethink abstractions on a legacy OS
        • From-scratch implementation feasible and necessary for high-performance microkernels
        • Verification of existing code bases infeasible
  26. Thank you!
      https://trustworthy.systems