The Quest for the Perfect API

https://trustworthy.systems
The Quest for the Perfect API
Gernot Heiser | gernot.heiser@data61.csiro.au | @GernotHeiser
Trustworthy Systems | Data61

Observation: Operating Systems Suck
VMware Research, April '18

Mungi: Single-Address-Space OS
Decouple translation from protection

Mungi Retrospective
Mungi was a roaring success – not
• never found a convincing killer app
• workarounds for 32-bit limitations (Unix model) were already too widely accepted
• it was naïve to think we could change mainstream computing
• 64-bit address space already too small for all data
Lesson: If you want to change the world, pick the right world to change
Long-term benefits of Mungi
• built a systems group
• got us into L4

L4 Microkernel
A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality. [Liedtke, SOSP’95]

25 Years of L4 Microkernel R&D
[Figure: L4 family tree on a timeline from 1993 to 2015, distinguishing API inheritance from code inheritance]
Kernels shown: L3→L4, “X”, Hazelnut, Pistachio, L4/Alpha, L4/MIPS, L4-embedded, OKL4-µKernel, OKL4-Microvisor, Codezero, P4 → PikeOS, Fiasco, Fiasco.OC, Nova
Groups: GMD/IBM/Karlsruhe, UNSW/NICTA/Data61, Dresden, OK Labs, Other (commercial)
Deployments noted: Qualcomm modem chips, iOS secure enclave
L4 IPC Performance Over the Years
Name       Year  Processor                MHz    Cycles  µs
Original   1993  i486                     50     250     5.00
Original   1997  Pentium                  160    121     0.75
L4/MIPS    1997  R4700                    100    86      0.86
L4/Alpha   1997  21064                    433    45      0.10
Hazelnut   2002  Pentium 4                1,400  2,000   1.38
Pistachio  2005  Itanium                  1,500  36      0.02
OKL4       2007  XScale 255               400    151     0.64
NOVA       2010  i7 Bloomfield (32-bit)   2,660  288     0.11
seL4       2013  ARM11                    532    188     0.35
seL4       2018  i7 Haswell (64-bit)      3,400  442     0.13
seL4       2018  Cortex A9                1,000  303     0.30
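
The µs column in the table above relates to the other two numeric columns by simple arithmetic: latency in microseconds is cycles divided by the clock rate in MHz. A minimal sketch of that conversion follows; the function name is made up for illustration, and the rows are a subset copied from the table.

```python
def ipc_latency_us(cycles: int, clock_mhz: float) -> float:
    """IPC latency in microseconds: cycles divided by cycles per microsecond."""
    return cycles / clock_mhz


# (name, year, clock in MHz, IPC cost in cycles), copied from the table
rows = [
    ("Original", 1993, 50, 250),
    ("L4/MIPS", 1997, 100, 86),
    ("NOVA", 2010, 2660, 288),
    ("seL4 (Haswell)", 2018, 3400, 442),
    ("seL4 (Cortex A9)", 2018, 1000, 303),
]

for name, year, mhz, cycles in rows:
    print(f"{name:18} {year}  {ipc_latency_us(cycles, mhz):.2f} µs")
```

Comparing cycles rather than microseconds separates kernel design from clock speed, which is why the table lists both.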

Minimality: Source-Code Size
Name         Architecture  C/C++ (kSLOC)  asm (kSLOC)  Total (kSLOC)
Original     i486          0              6.4          6.4
L4/Alpha     Alpha         0              14.2         14.2
L4/MIPS      MIPS64        6.0            4.5          10.5
Hazelnut     x86           10.0           0.8          10.8
Pistachio    x86           22.4           1.4          23.0
L4-embedded  ARMv5         7.6            1.4          9.0
OKL4 3.0     ARMv6         15.0           0.0          15.0
Fiasco.OC    x86           36.2           1.1          37.6
seL4         ARMv6         9.7            0.5          10.2

Original L4: Design & Implementation
Implementation Tricks [SOSP’93]
• Process kernel
• Virtual TCB array
• Lazy scheduling
• Direct process switch
• Non-preemptible
• Non-portable
• Non-standard calling convention
• Assembler
Design Decisions [SOSP’95]
• Synchronous IPC
• Rich message structure, arbitrary out-of-line messages
• Zero-copy register messages
• User-mode page-fault handlers
• Threads as IPC destinations
• IPC timeouts
• Hierarchical IPC control
• User-mode device drivers
• Process hierarchy
• Recursive address-space construction
Objective: Minimise cache footprint and TLB misses

seL4: Rethinking Resource Management

Memory Management
[Figure: RAM is held by a Global Resource Manager (GRM), which delegates to Resource Managers (RM), each serving its own address spaces (AS); boxes are labelled I+D]
Resources fully delegated, allows autonomous operation
Enabled by capabilities
Strong isolation, no shared kernel resources
Design for isolation: no memory allocation by kernel
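
The delegation model on this slide can be illustrated with a toy sketch: a manager holds a capability to a range of memory and carves disjoint sub-ranges out of it for its clients, so no shared pool is ever consulted. The class and method names below are hypothetical and are not the seL4 API; the sizes are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Untyped:
    """Hypothetical capability to a contiguous range of untyped RAM."""
    base: int
    size: int
    used: int = 0                                  # bytes already handed out
    children: list = field(default_factory=list)   # capabilities derived from this one

    def delegate(self, size: int) -> "Untyped":
        """Carve a sub-range out of this range and return a capability to it."""
        if size <= 0 or self.used + size > self.size:
            raise MemoryError("manager has exhausted its own memory")
        child = Untyped(self.base + self.used, size)
        self.used += size
        self.children.append(child)
        return child


# The global resource manager (GRM) starts with all RAM ...
grm = Untyped(base=0, size=1 << 30)
# ... and delegates disjoint ranges to two resource managers (RM).
rm_a = grm.delegate(256 << 20)
rm_b = grm.delegate(256 << 20)
# Each RM then allocates only from its own range, e.g. for an address space.
as_a = rm_a.delegate(16 << 20)
```

In this sketch an RM that runs out of memory fails on its own; it cannot draw on memory delegated to another RM, which is the isolation property the slide is after.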

Isolation Goes Deep
[Figure: High and Low partitions, each with its own TCBs, Caps and PTs]
Kernel data partitioned like user data

How About Temporal Isolation?
Safety: Timeliness
• Execution interference
Security: Confidentiality
• Leakage via timing channels
[Figure: High and Low domains]
Observe execution speed: confidentiality violation
Affect execution speed: integrity violation

Integrity Challenge: Mixed Criticality
[Figure: control loop and NW driver sharing a CPU]
Control loop: processes sensor readings; runs every 100 ms for a few milliseconds
NW driver: handles NW interrupts; runs frequently but for a short time (order of µs)
NW driver must preempt control loop
• … to avoid packet loss
• Driver must run at high prio
• Driver must be trusted not to monopolise CPU

Scheduling Contexts: Caps for Time
Classical thread attributes
• Priority
• Time slice
New thread attributes
• Priority
• Scheduling context capability (thread not runnable if null)
Scheduling context object
• T: period
• C: budget (≤ T)
• Limits CPU access!
SchedControl capability conveys right to assign budgets (i.e. perform admission control)
Examples: C = 2, T = 3; C = 250, T = 1000
Capability for time
Confidentiality: Closing Timing-Channels
[Figure: High and Low domains]
Prevent observation of execution speed
• Black-box, OS-enforced isolation
• No requirement to trust High code not to leak
• No requirement for modifying High code
• High and Low code untrusted – mandatory confinement
• Should also protect against data-dependent execution time
Time protection, just like standard memory protection
Eliminates covert channels required for Meltdown/Spectre exploits

Mitigation: Prevent Sharing of State
[Figure: cache shared by High and Low]
Flush on context switch
• Cannot partition on-core caches (L1, TLB, branch predictor, prefetchers)
• virtually-indexed
• OS cannot control access
Partition through page colouring
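
Page colouring relies on the fact that, in a physically indexed set-associative cache, the physical page number determines which group of cache sets a page can occupy. A minimal sketch of the arithmetic follows; the cache geometry (2 MiB, 16-way, 4 KiB pages) and all function names are assumptions for illustration.

```python
def num_colours(cache_size: int, ways: int, page_size: int = 4096) -> int:
    """Number of page colours of a physically indexed, set-associative cache."""
    return max(1, cache_size // (ways * page_size))


def page_colour(paddr: int, colours: int, page_size: int = 4096) -> int:
    """Colour of the physical page that contains paddr."""
    return (paddr // page_size) % colours


colours = num_colours(cache_size=2 * 1024 * 1024, ways=16)   # 32 colours
low_colours = set(range(0, colours // 2))
high_colours = set(range(colours // 2, colours))


def frames_for(partition_colours, first_frame=0, count=64, page_size=4096):
    """Physical frame addresses whose colour belongs to the given partition."""
    return [f * page_size
            for f in range(first_frame, first_frame + count)
            if page_colour(f * page_size, colours, page_size) in partition_colours]
```

Giving Low only frames from `frames_for(low_colours)` and High only frames from `frames_for(high_colours)` means the two partitions never compete for the same cache sets in this model.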

Colouring User Memory is Easy
[Figure: RAM, Global Resource Manager (GRM) and Resource Managers (RM); boxes are labelled I+D]
Partitions restricted to coloured memory
System permanently coloured

Colouring the Kernel
[Figure: RAM, Global Resource Manager (GRM) and Resource Managers (RM); boxes are labelled I+D]
Each partition has own kernel image
Kernel clone!
Only shared kernel data:
• Scheduler queue array & bitmap
• Pointers to current: thread, kernel,
page table, cap space, FPU state

Formal Verification – The Killer “App”

Provable Security Enforcement
[Figure: proof stack from isolation properties down to binary code]
• Isolation properties (integrity, confidentiality, availability): proofs over the abstract model
• Functional correctness: proof between abstract model and C implementation
• Translation correctness: proof between C implementation and binary code
• Worst-case execution time
World’s fastest microkernel!
Exclusions (all in progress):
• Initialisation
• Privileged state & caches
• Multicore
• Temporal isolation

What Made Verification Possible?
• Suitable design:
  • Microkernel, of course!
  • Isolation-oriented resource management helped prove global invariants
  • Resource-management model crucial for proving isolation properties
• From-scratch implementation:
  • Verifying code not written for verification is infeasible
  • Feedback loop between implementers and verifiers is essential

Why Build Your Own OS?
• Cannot really rethink abstractions on a legacy OS
• From-scratch implementation feasible and necessary for high-performance microkernels
• Verification of existing code bases infeasible

https://trustworthy.systems
Thank you!
