OS frontiers in the AI era
Felix Xiaozhu Lin
1
Who Am I
• Fall 2020. Assoc. Prof, UVA CS – Looking forward to it!
• 2014 – 2020. Asst. Prof, Purdue ECE
• 2014 PhD in CS. Rice 
• 2008 MS + BS. Tsinghua
2
What is an OS?
Undergrads, (some) data scientists, tech media
4
Are these OSes?
the era of AI!
Where are we today?
5
https://en.wikipedia.org/wiki/Time_(xkcd)#/media/File:Xkcd_time_frame_0001.png
The 3rd boom of AI 
1960s 1970s 1980s 1990s 2000s 2010s 2020s
Classical AI Expert systems Machine learning
Ups
Downs
6
Two Pendulums
Timeline: 1960s 1970s 1980s 1990s 2000s 2010s 2020s
AI (ups and downs): Classical AI, Expert systems, Machine learning
Centralized vs. Decentralized: Dedicated hw & net; PC + Internet; Datacenters, Cloud computing; 5G, edge, embedded AI…
7
Two Pendulums
Timeline: 1960s 1970s 1980s 1990s 2000s 2010s 2020s
AI (ups and downs): Classical AI, Expert systems, Machine learning
Generalized vs. Specialized: Mainframes, Apollo 11; Microcomputers, x86, DBMS, PC; Wearables/IoT, NoSQL, GPU/FPGA, RISC‐V
Centralized vs. Decentralized: Dedicated hw & net; PC + Internet; Datacenters, Cloud computing; 5G, edge, embedded AI…
8
My view of OS
OS == System Software == Software Infrastructure 
12
What you have learnt from undergrad OS class
Timeline: 1960s 1970s 1980s 1990s 2000s 2010s 2020s
AI (ups and downs): GOFAI, Expert systems, Machine learning
Generalized vs. Specialized: Mainframes, Apollo 11; Microcomputers, x86, DBMS, PC; Wearables/IoT, NoSQL, GPU/FPGA, RISC‐V
Centralized vs. Decentralized: Dedicated hw & net; PC + Internet; Datacenters, Cloud computing; 5G, edge, embedded AI…
14
What OS people are working on
Timeline: 1960s 1970s 1980s 1990s 2000s 2010s 2020s
AI (ups and downs): GOFAI, Expert systems, Machine learning
Generalized vs. Specialized: Mainframes, Apollo 11; Microcomputers, x86, DBMS, PC; Wearables/IoT, NoSQL, GPU/FPGA, RISC‐V
Centralized vs. Decentralized: Dedicated hw & net; PC + Internet; Datacenters, Cloud computing; 5G, edge, embedded AI…
15
So, where are we today?
17
• Diverse 
• A few common OSes
• Many specialized ones
• Serving “things” in addition to humans
• Defined by scenarios
• Generic OSes → firmware
• Specialized OSes → overlays
• Blurry boundaries
• arch, runtime, compiler, kernel, hypervisor, trusted exec environment…
Cloud
Edge
Devices
18
Three flavors of OS research
1. Tune up
2. Do a specific thing well
3. Show possibility 
Common themes: open blackboxes, break, & build
19
Case 1: A high‐speed stream analytics engine
21
High‐throughput. Sub‐second delay.
Timely processing before data gets cold!
22
“Hot springs”: telemetry events
• Power sensor: 140M events/day
• Oil rig: 1‐2 TB/day
• Manufacturing machines: PBs/day
Stream analytics: state of the art
• Classic engines?
• StreamBase, Aurora, TelegraphCQ, NiagaraST…
• Single threaded. Not scaling well. 
• Modern engines for datacenters?
• Apache Flink, Spark Streaming, Beam…
• Designed for tens to hundreds of machines. Scaling out.
• Assuming it is okay if individual nodes perform poorly
• As analytics moves to the edge → bad
23
Project StreamBox
stream analytics at memory speed
24
• RDMA
• Co‐designed with mm/scheduling
Stream pipeline, Threads, Ingestion, Scheduler, Mem
• Squeeze parallelism for multi/manycore
• Manage NUMA domains
• Exploit high‐bandwidth memory
[ASPLOS'19] "StreamBox‐HBM: Stream Analytics on High Bandwidth Hybrid Memory," Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin
[USENIX ATC'17] "StreamBox: Modern Stream Processing on a Multicore Machine," Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin, in Proc. USENIX Annual Technical Conference, 2017.
[ASPLOS'16] "memif: Towards Programming Heterogeneous Memory Asynchronously," Felix Xiaozhu Lin and Xu Liu, in Proc. ACM Int. Conf. Architectural Support for Programming Languages and Operating Systems, 2016.
Cores
High‐bandwidth hybrid memory
25
Tradeoffs: capacity vs. bandwidth
• 3D DRAM: 16 GB, 375 GB/s
• Normal DRAM: ~100 GB, ~100 GB/s
Untraditional memory hierarchy
• No latency benefit, unlike SRAM+DRAM
26
Already on off‐the‐shelf machines: Intel Xeon Phi Knights Landing (KNL)
27
Streaming data
Cores
HBM: Index {key, pointer}
Normal DRAM: Data Bundles
Capacity: Use HBM only for grouping indexes
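The split between a small index and the full records can be sketched as below. This is a minimal illustration under assumptions, not StreamBox‐HBM's code: `Record`, `build_index`, and `group_by_key` are hypothetical names, and ordinary Python lists stand in for the two memory tiers.

```python
from itertools import groupby
from operator import itemgetter
from typing import NamedTuple


class Record(NamedTuple):
    key: int
    payload: str  # full record; imagined to stay in normal DRAM


def build_index(bundle):
    """Extract a compact (key, pointer) index from a data bundle.

    The pointer is simply the record's position in the bundle. Only this
    small index is imagined to live in HBM (an assumption of the sketch).
    """
    return [(rec.key, pos) for pos, rec in enumerate(bundle)]


def group_by_key(bundle):
    """Group records by key while sorting only the compact index."""
    # Sorting touches only (key, pointer) pairs, never the payloads.
    index = sorted(build_index(bundle), key=itemgetter(0))
    groups = {}
    for key, entries in groupby(index, key=itemgetter(0)):
        # Follow pointers back into the bundle only when results are needed.
        groups[key] = [bundle[pos].payload for _, pos in entries]
    return groups


bundle = [Record(2, "b0"), Record(1, "a0"), Record(2, "b1"), Record(1, "a1")]
print(group_by_key(bundle))
```

The point of the design is that grouping moves only small index entries through the fast, capacity‐limited memory, while the bulky records are read once from the larger memory.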
28
A system software’s approach to 3D DRAM
Diagram labels: Apps, Runtime, OS kernel, Hybrid memory
• Cheap VM (huge page)
• Fast net stack (40 GbE or RDMA)
• High task parallelism
• Custom mem allocator
• Sequential mem access
• Thread pool + custom task scheduler
• Wide SIMD (AVX‐512)
Blurry boundaries
Case 2: Autonomous AI on Cameras
[MobiSys'20] "Approximate Query Processing on Autonomous Cameras," Mengwei Xu, Xiwen Zhang, Yunxin Liu, Xuanzhe Liu, and Felix Xiaozhu Lin
29
Video: a killer AI application
30
31
32
33
Cut the cords
Run on harvested energy
Solar 
Wind 
34
35
… and on wireless
Construction sites, Farms, Boats/RVs, Warehouses
Photo credits: Reolink
36
Running analytics on wire‐free cameras?
Pervasive cameras
Elf
Object counts
37
Elf: Query model
Query: (car, 30 mins)
Install
Sample & capture
7:00AM-7:30AM [500 ± 100] Cars
7:30AM-8:00AM [700 ± 140] Cars
8:00AM-8:30AM [800 ± 180] Cars
8:30AM-9:00AM [400 ± 100] Cars
9:30AM-10:00AM [200 ± 80] Cars
[200 ± 80] means between 200 − 80 and 200 + 80
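One simple way to turn sampled frames into a count with an error bound is sketched below. It is an assumption for illustration only: a textbook sampling estimate with a normal approximation, not Elf's actual aggregator, which also has to account for neural network error. The name `estimate_window_count` and its parameters are hypothetical, and the sketch assumes the window's answer is the sum of per‐frame object counts.

```python
import math
import statistics


def estimate_window_count(sampled_counts, frames_in_window, z=1.96):
    """Estimate a window's total count from per-frame counts of sampled frames.

    Returns (estimate, half_width), to be read as "estimate ± half_width".
    Assumes frames are sampled uniformly at random without replacement;
    z=1.96 gives roughly a 95% interval under a normal approximation.
    """
    n = len(sampled_counts)
    if n < 2:
        raise ValueError("need at least two sampled frames")
    mean = statistics.fmean(sampled_counts)
    std_err = statistics.stdev(sampled_counts) / math.sqrt(n)
    # Finite population correction: the interval shrinks to zero
    # once every frame in the window has been sampled.
    fpc = math.sqrt((frames_in_window - n) / (frames_in_window - 1))
    estimate = mean * frames_in_window
    half_width = z * std_err * fpc * frames_in_window
    return estimate, half_width


est, hw = estimate_window_count([3, 5, 4, 6, 2], frames_in_window=100)
print(f"[{est:.0f} ± {hw:.0f}] objects")
```

Sampling more frames narrows the interval but costs more energy, which is the tradeoff a camera on harvested energy has to plan around.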
38
Elf: System Internals
Camera Operating System
• Planning Energy via Reinforcement Learning
• Sampled frames
• Selected NeuralNet
• Object counts
• Aggregator with error Integration
39
Elf prototype: heterogeneous processors
40
7:00AM-7:30AM [500 ± 100] Cars
7:30AM-8:00AM [700 ± 140] Cars
8:00AM-8:30AM [800 ± 180] Cars
8:30AM-9:00AM [400 ± 100] Cars
9:30AM-10:00AM [200 ± 80] Cars
Ground Truth
Error: 11%
Confidence interval width: 17%
Locations: Auburn, AL; Hampton, NY; Jackson, WY; Taipei; Taipei
~1000 hours
41
M4
AUTONOMOUS
Intelligence
OS for AI + AI for OS = Autonomous infrastructure
42
Case 3: Kernel IO on co‐processors 
43
44
45
Weak co‐processors
46
A heterogeneous SoC
• CPU: 2.5GHz
• Co‐processor: 50MHz
• DRAM, IO
47
Weak co‐processors: suited to low‐power IO tasks!
• CPU: 2.5GHz
• Co‐processor: 50MHz
• DRAM, IO
• High efficiency
• Linux Kernel: IO tasks
48
Kernel execution on weak co‐processors?
• CPU: 2.5GHz
• Co‐processor: 50MHz
• DRAM, IO
• Linux Kernel: IO tasks
• Diff ISA, No MMU, …
49
Co‐processor translates unmodified kernel binary
• CPU, Co‐processor, DRAM, IO
• Linux Kernel: IO tasks
• Dynamic Binary Translation
[USENIX ATC'19] "Transkernel: An Executor for Commodity Kernels on Peripheral Cores," Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin
Take away: 
• Taming an OS kernel (a beast) for new hardware 
• … without re‐engineering much of the software stack 
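The general idea of dynamic binary translation can be shown with a toy sketch. Everything here is an assumption for illustration: the five‐opcode guest instruction set, `GUEST`, `translate_block`, and `run` are made up, and this is not how Transkernel itself is built. The sketch translates each guest basic block into host code once, caches it, and reuses the cached translation on later visits.

```python
# Toy guest program: each instruction is (opcode, operands...).
GUEST = [
    ("li", "r0", 0),      # 0: r0 = 0
    ("li", "r1", 5),      # 1: r1 = 5
    ("add", "r0", "r1"),  # 2: r0 += r1
    ("addi", "r1", -1),   # 3: r1 -= 1
    ("bnez", "r1", 2),    # 4: if r1 != 0, jump to instruction 2
    ("halt",),            # 5: stop
]


def translate_block(program, pc):
    """Translate the guest basic block starting at pc into a host (Python) function."""
    lines = ["def block(regs):"]
    while True:
        op, *args = program[pc]
        if op == "li":
            lines.append(f"    regs[{args[0]!r}] = {args[1]}")
        elif op == "add":
            lines.append(f"    regs[{args[0]!r}] += regs[{args[1]!r}]")
        elif op == "addi":
            lines.append(f"    regs[{args[0]!r}] += {args[1]}")
        elif op == "bnez":
            # A branch ends the block; the host function returns the next guest pc.
            lines.append(f"    return {args[1]} if regs[{args[0]!r}] != 0 else {pc + 1}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown guest opcode: {op}")
        pc += 1
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated host code
    return namespace["block"]


def run(program):
    """Execute the guest program through a translation cache."""
    regs = {"r0": 0, "r1": 0}
    cache = {}  # guest pc -> translated host function
    translations = 0
    pc = 0
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate_block(program, pc)
            translations += 1
        pc = cache[pc](regs)
    return regs, translations


regs, translations = run(GUEST)
print(regs, translations)
```

The loop body runs five times but is translated only once, which is why caching translated blocks keeps the overhead tolerable even on a slow core.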
50
Recap
• What are OSes in 2020?
• Three OS projects: 
• spanning IoT, mobile, and datacenters
• each with different flavors
• The builder culture 
• open blackboxes
• break things
• build things from the ground up
• Started by a small group of hardcore hackers
• Now more diverse and inclusive 
• A brave new world
51
StreamBox
Elf
Image credits
• https://dsportmag.com/the-tech/test-n-tune/test-tune-2017-subaru-wrx-sti-part3-closer-to-the-ej257s-limits/
• https://techcrunch.com/2009/11/19/redneck-rampage-a-truck-with-a-jet-engine/?utm_source=feedburner&utm_medium=email
52
