Supercharge large IoT analytics
An OS approach
Felix Xiaozhu Lin
1
This talk has three parts that can be
sampled independently
“Kernel is firmware”
If you care about: kernel, security, file system, drivers,
TrustZone, heterogeneous SoCs, binary translation…
Go to slide 6
Large stream analytics on the edge
If you care about: stream processing, 3D-stacked memory,
parallelism, memory mgmt, in-memory computing…
Go to slide 26
Large video analytics on edge & cameras
If you care about: video intelligence, deep learning, storage,
IoT, edge computing…
Go to slide 51
2
Bio
• 2014 – now. Asst. prof, Purdue ECE
• Xroads Systems Exploration Lab
• 2014 PhD in CS. Rice
• Thesis: OS for mobile computing
• 2008 MS + BS. Tsinghua
3
What I do
• Layer?
• Operating system (in a broad sense)
• Scenarios?
• Edge & IoT (mostly)
• Objectives?
• Speed, efficiency, & security
4
My premises for OS research …
The remaining OS is defined by scenarios
Kernel is firmware
Entrees: 45 mins
Appetizers: 5 mins
6
Kernel is firmware
• Entangled subsystems
• Difficult to re-architect or extract
• Has own evolution plan
• Likely to reject new ideologies
• Little respect for stable internal interfaces
• New additions quickly become obsolete
• Open source (a white box)
• Can we retrofit the kernel as firmware?
7
8
Case 1: Trustworthy file systems for smart devices
Retrofitting the kernel (1): reuse in vivo
9
Hw
Case 1: Trustworthy file systems for smart devices
Kernel
Apps
Retrofitting the kernel (1): reuse in vivo
10
App
Normal
world
Secure
world
Isolating secure apps in TrustZone
Retrofitting the kernel (1): reuse in vivo
Kernel
11
Normal
world
Secure
world
How to persist secure data?
Retrofitting the kernel (1): reuse in vivo
Kernel
FS
App
12
VFS
FS
Block layer
Reuse kernel file system in vivo
Retrofitting the kernel (1): reuse in vivo
Normal
world
Secure
world
App
13
VFS
FS
Block layer
• Cloud: a safer execution environment
• A pair of twin file systems
• File data never leaves the device’s secure world
Metadata-only
FS Replica
Untrusted Trusted
Cloud continuously verifies FS behavior
Retrofitting the kernel (1): reuse in vivo
App
[arXiv:1902.06327] "Let the Cloud Watch Over Your IoT File Systems," Liwei Guo, Yiying Zhang, and Felix Xiaozhu Lin, 2019.
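To make the twin-file-system idea concrete, here is a minimal Python sketch. It assumes (hypothetically) that the device reports each metadata operation together with the result its untrusted FS returned, and that the trusted cloud replays the log on a metadata-only replica and flags every result that disagrees. `MetaReplica`, `verify`, and the log format are illustrative names, not the paper's protocol; note that no file data appears in the log.

```python
from dataclasses import dataclass, field


@dataclass
class MetaReplica:
    """Metadata-only FS replica: tracks file names and sizes, never file data."""
    sizes: dict = field(default_factory=dict)

    def apply(self, op, path, arg=None):
        """Replay one metadata operation and return the expected result."""
        if op == "create":
            if path in self.sizes:
                return "EEXIST"
            self.sizes[path] = 0
            return "ok"
        if op == "write":  # arg = (offset, length); only the new size matters here
            if path not in self.sizes:
                return "ENOENT"
            offset, length = arg
            self.sizes[path] = max(self.sizes[path], offset + length)
            return self.sizes[path]
        if op == "unlink":
            return "ok" if self.sizes.pop(path, None) is not None else "ENOENT"
        if op == "stat":
            return self.sizes.get(path, "ENOENT")
        raise ValueError(f"unknown op {op!r}")


def verify(log):
    """Replay (op, path, arg, reported_result) entries sent by the device.

    Returns the indices of entries where the untrusted FS reported a result
    that the trusted replica disagrees with.
    """
    replica = MetaReplica()
    return [i for i, (op, path, arg, reported) in enumerate(log)
            if replica.apply(op, path, arg) != reported]
```

For instance, replaying a create, a 100-byte write, and then a `stat` that the device reports as 50 bytes flags only that last entry.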
14
Case 2: Unmodified drivers for TrustZone
HW
App
Normal
world
Secure
world
Retrofitting the kernel (2): code transformation
15
Device code
Driver libs
Kernel libs
Core services
SPI CSI
WiFi Eth
USB
Kernel source tree
Other code
Transplant Linux drivers?
CAM
Retrofitting the kernel (2): code transformation
App
CAM
Secure
world
16
Device code
Driver libs
Kernel libs
Core services
SPI CSI
WiFi Eth
USB
CAM
Driver
kernel
Other code
Statically miniaturize the whole kernel
Retrofitting the kernel (2): code transformation
CAM
Kernel source tree
17
Statically miniaturize the whole kernel
Retrofitting the kernel (2): code transformation
A kernel for all → A kernel specialized for the driver only
App
Driver
kernel
Normal
world
Secure
world
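One way to picture "statically miniaturize the whole kernel" is call-graph reachability: start from the driver's entry points, keep every kernel function they can transitively call, and drop the rest. The sketch assumes a precomputed call graph; the graph and its function names are hypothetical (loosely modeled on kernel symbols), and a real tool must also handle indirect calls through function pointers, which plain reachability over direct calls misses.

```python
from collections import deque


def miniaturize(call_graph, entry_points):
    """Return the set of functions reachable from the driver's entry points.

    call_graph maps each function name to the functions it calls directly.
    Everything not returned could be dropped from the specialized kernel.
    """
    keep = set()
    frontier = deque(entry_points)
    while frontier:                      # breadth-first traversal
        fn = frontier.popleft()
        if fn in keep:
            continue
        keep.add(fn)
        frontier.extend(call_graph.get(fn, ()))
    return keep


# Hypothetical toy call graph; not taken from a real kernel build.
CALL_GRAPH = {
    "cam_probe": ["i2c_write", "kmalloc"],
    "cam_capture": ["dma_map", "wait_irq"],
    "i2c_write": ["spin_lock"],
    "dma_map": ["kmalloc"],
    "wifi_tx": ["skb_alloc", "spin_lock"],
    "ext4_write": ["bio_submit", "kmalloc"],
}
```

With the camera driver's two entry points, the WiFi and file-system paths fall away, leaving a kernel specialized for that driver only.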
18
Case 3: Kernel IO paths on co-processors
Retrofit the kernel (3): binary translation
19
Retrofit the kernel (3): binary translation
Weak co-processors
20
CPU @ 2.5GHz, co-processor @ 50MHz
DRAM, IO
A heterogeneous SoC
Retrofit the kernel (3): binary translation
21
CPU @ 2.5GHz, co-processor @ 50MHz
DRAM, IO
Weak co-processors: suit low-power IO tasks!
Retrofit the kernel (3): binary translation
high
efficiency
Linux Kernel
IO
tasks
22
CPU @ 2.5GHz, co-processor @ 50MHz
DRAM, IO
Kernel execution on weak co-processors?
Retrofit the kernel (3): binary translation
Linux Kernel
IO
tasks
Diff ISA
No MMU
No POSIX
…
23
CPU, co-processor
DRAM, IO
Co-processor translates unmodified kernel binary
Retrofit the kernel (3): binary translation
Dynamic
Binary
Translation
Linux Kernel
IO
tasks
[arXiv:1811.05000] "Transkernel: An Executor for Commodity Kernels on Peripheral Cores,"
Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin
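The core loop of dynamic binary translation is: translate a guest basic block the first time execution reaches it, cache the translation, and reuse it on later visits. The toy below illustrates only that loop, with an invented three-instruction guest ISA and Python closures standing in for host code; it is not Transkernel's design, which translates a real commodity kernel binary.

```python
def translate_block(program, pc):
    """Translate the guest basic block starting at `pc` into a host closure.

    A block extends up to and including the first control-flow instruction
    ('jnz' or 'halt'). The closure runs the whole block on the register file
    and returns the next guest pc, or None when the guest halts.
    """
    ops = []
    while True:
        insn = program[pc]
        ops.append(insn)
        pc += 1
        if insn[0] in ("jnz", "halt"):
            break
    fallthrough = pc

    def run(regs):
        for op, *args in ops:
            if op == "mov":      # mov rd, imm
                regs[args[0]] = args[1]
            elif op == "add":    # add rd, rs, imm
                regs[args[0]] = regs[args[1]] + args[2]
            elif op == "jnz":    # jnz rs, target: branch if rs != 0
                return args[1] if regs[args[0]] != 0 else fallthrough
            elif op == "halt":
                return None
        raise AssertionError("block did not end in jnz/halt")

    return run


def execute(program, regs):
    """Run a guest program, translating each block once and caching it."""
    cache = {}            # guest pc -> translated host closure
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:            # cache miss: translate on first visit
            block = cache[pc] = translate_block(program, pc)
            translations += 1
        pc = block(regs)             # run the translated code
    return regs, translations


# Hypothetical guest program: r1 += 2, five times.
PROGRAM = [
    ("mov", "r0", 5),
    ("mov", "r1", 0),
    ("add", "r1", "r1", 2),    # loop body starts at pc 2
    ("add", "r0", "r0", -1),
    ("jnz", "r0", 2),
    ("halt",),
]
```

Running `execute(PROGRAM, {})` translates only three blocks even though the loop body executes five times; later iterations reuse the cached translation.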
Retrofit kernel as firmware
1. Reuse in vivo
Unmodified file systems
for TrustZone
2. Source transformation
Unmodified device drivers
for TrustZone
3. Binary translation
Unmodified IO paths for co-processors
algorithms + resources + objectives
OS defined by scenarios
OSes defined by two IoT scenarios
Hot springs
Edge
Icebergs
High-throughput. Sub-second delay.
Timely processing before data gets cold!
27
“Hot springs”: telemetry events
Power sensor
140M events/day
Oil rig
1-2TB/day
Manufacturing machines
PBs/day
28
Finding high-power sensors
Ingestion → Group by sensor ID → Average per sensor → Window TopK
Edge: cleanse & summarize data
10:00-10:05
10:05-10:10
130
500
302
100
150
500
302
Example event: time 10:01, ID 0x1024, value 200
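The pipeline above (group by sensor ID, average per sensor, TopK per window) can be sketched in a few lines of single-threaded Python. It assumes 5-minute tumbling windows, as the 10:00-10:05 and 10:05-10:10 labels suggest, and timestamps in seconds; `window_topk` is a hypothetical name, and this is not StreamBox's parallel engine.

```python
from collections import defaultdict
import heapq

WINDOW_SEC = 300  # 5-minute tumbling windows (assumed from the slide's labels)


def window_topk(events, k):
    """events: iterable of (timestamp_sec, sensor_id, value).

    Returns {window_start_sec: [(avg_value, sensor_id), ...]} with the k
    sensors of highest average value in each window, highest first.
    """
    # Group by (window, sensor ID), keeping a running [sum, count].
    acc = defaultdict(lambda: [0.0, 0])
    for ts, sensor, value in events:
        window = ts - ts % WINDOW_SEC
        entry = acc[(window, sensor)]
        entry[0] += value
        entry[1] += 1

    # Average per sensor, collected per window.
    per_window = defaultdict(list)
    for (window, sensor), (total, count) in acc.items():
        per_window[window].append((total / count, sensor))

    # TopK per window.
    return {w: heapq.nlargest(k, avgs) for w, avgs in per_window.items()}
```

If timestamps count seconds from 10:00, the example event above becomes `(60, 0x1024, 200)` and lands in the window starting at 0.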
Stream analytics: state of the art
• Classic engines?
• StreamBase, Aurora, TelegraphCQ, NiagaraST…
• Single threaded. Not scaling well.
• Modern engines for datacenters?
• Apache Flink, Spark Streaming, Beam…
• Designed for tens to hundreds of machines. Scaling out.
• Assume it is okay if individual nodes perform poorly
• A poor fit as analytics moves to the edge
29
Project StreamBox
Stream analytics at memory speed
30
• RDMA / 10GbE
• Co-designed with
memory mgmt & scheduling
Stream pipeline Threads
Ingestion
Scheduler
Mem
• Squeeze parallelism for
multi/manycore
• Manage NUMA domains
Exploit high-bandwidth memory
[ASPLOS'19] "StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory," Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko,
Kathryn S. McKinley, and Felix Xiaozhu Lin
[USENIX ATC'17] "StreamBox: Modern Stream Processing on a Multicore Machine," Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady
Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin, in Proc. USENIX Annual Technical Conference, 2017.
[ASPLOS'16] "memif: Towards Programming Heterogeneous Memory Asynchronously," Felix Xiaozhu Lin and Xu Liu, in Proc. ACM Int. Conf.
Architectural Support for Programming Languages and Operating Systems, 2016.
Cores
High-bandwidth hybrid memory
31
Tradeoffs: capacity vs. bandwidth
• 3D DRAM: 16 GB, 375 GB/s
• Normal DRAM: ~100 GB, 80 GB/s
Unconventional memory hierarchy
No latency benefit (unlike SRAM+DRAM)
32
Already on off-the-shelf machines: Intel Xeon Phi Knights Landing (KNL)
Cool. But the benefit is not free
• Two alternative configurations:
• Hw-managed: HBM as a cache
• Sw-managed: one flat address space
• Throw existing analytics engines on HBM?
• Almost no benefit (or even a slowdown)
33
Existing engines: 3 inadequacies
Algorithm
• HBM favors sequential access + high parallelism
• Existing engines: grouping is hash-based w/ random access
Capacity
• HBM: capacity limited
• Streaming: high data volume + high velocity
Dynamism
• Streaming: fluctuating workloads
• How to map data onto two memory types?
34
Ingress → Group by key → Average per key → Window TopK
Algorithm: HBM can accelerate grouping!
• Hash vs. sort: duals for grouping
• Algorithmic complexity: sort (O(n log n)) is worse than hash (O(n))
• Hash for in-core; sort for out-of-core [VLDB’09, VLDB’13, SIGMOD’15]
• Yet, sort outperforms hash with …
• High data parallelism (bitonic sort + avx512)
• High task parallelism (parallel merge sort)
• High mem bw (stacked DRAM)
35
[VLDB’09] C. Kim et al., "Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs"
[VLDB’13] C. Balkesen et al., "Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited"
[SIGMOD’15] O. Polychroniou et al., "Rethinking SIMD Vectorization for In-Memory Databases"
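The two grouping duals can be contrasted directly. Both functions below return the same groups; the hash version touches a table at scattered locations, while the sort version makes sequential passes. Python's built-in `sorted` (Timsort) stands in for the bitonic/AVX-512 and parallel merge sorts named above, so this sketch shows the semantics only, not the HBM speedup.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_hash(pairs):
    """Hash-based grouping: O(n) expected time, but each insert probes the
    hash table at an effectively random location."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_by_sort(pairs):
    """Sort-based grouping: O(n log n) time, but sorting and the final scan
    access memory sequentially, which native code can vectorize and
    parallelize."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable: keeps per-key value order
    return {key: [v for _, v in run]
            for key, run in groupby(ordered, key=itemgetter(0))}
```

Because the sort is stable, both versions preserve the arrival order of values within each key, so their outputs compare equal.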
Grouping -- Sort vs Hash?
36
[Figure: throughput (million pairs/sec) and memory bandwidth (GB/sec) vs. # cores (0 to 60)]
Grouping -- Sort vs Hash?
37
[Figure: throughput (million pairs/sec) and memory bandwidth (GB/sec) vs. # cores; series: Hash DRAM]
38
Grouping -- Sort vs Hash?
[Figure: throughput (million pairs/sec) and memory bandwidth (GB/sec) vs. # cores; series: Hash DRAM, Sort DRAM]
Grouping - Sort vs Hash choice reversed!
39
[Figure: throughput (million pairs/sec) and memory bandwidth (GB/sec) vs. # cores; series: Hash DRAM, Sort DRAM, Sort HBM]
40
Capacity: use HBM only for grouping indexes
• Streaming data is kept in data bundles in normal DRAM
• Cores group records via compact indexes {<key, pointer>} in HBM
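A minimal sketch of the capacity idea, assuming records stay put in DRAM bundles and only a compact `<key, pointer>` index is extracted, sorted, and grouped (the part placed in HBM). Here a pointer is a hypothetical `(bundle_no, slot)` pair, and plain Python lists stand in for both memories.

```python
from itertools import groupby
from operator import itemgetter


def build_index(bundles, key_of):
    """Extract a compact <key, pointer> index from record bundles.

    bundles: list of lists of records (standing in for bundles in normal DRAM).
    A pointer is (bundle_no, slot); only this small index would live in HBM.
    """
    return [(key_of(rec), (b, s))
            for b, bundle in enumerate(bundles)
            for s, rec in enumerate(bundle)]


def group_pointers(index):
    """Sort the index by key (sequential, HBM-friendly) and group pointers."""
    index.sort(key=itemgetter(0))
    return {k: [ptr for _, ptr in run]
            for k, run in groupby(index, key=itemgetter(0))}


def deref(bundles, ptr):
    """Follow a pointer back to the full record in DRAM."""
    b, s = ptr
    return bundles[b][s]
```

Full records are touched only when an operator (e.g. a per-key average) actually dereferences their pointers.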
41
Dynamism: the art of pressure balance
• Normal DRAM: bandwidth pressure
• HBM: capacity pressure
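The slide names the two competing pressures but not a policy. Purely as a hypothetical illustration (not StreamBox-HBM's actual algorithm), a placement decision for a new index could compare HBM capacity use, including the new index, against current DRAM bandwidth use, and pick the less pressured memory:

```python
def choose_placement(index_bytes, hbm_used, hbm_capacity, dram_bw_used, dram_bw_peak):
    """Hypothetical policy: place a new grouping index in HBM or normal DRAM.

    HBM is limited by capacity; normal DRAM is limited by bandwidth.
    Compares relative HBM capacity use (including the new index) with
    relative DRAM bandwidth use and returns "HBM" or "DRAM".
    """
    if hbm_used + index_bytes > hbm_capacity:
        return "DRAM"                        # does not fit in HBM at all
    hbm_pressure = (hbm_used + index_bytes) / hbm_capacity
    dram_pressure = dram_bw_used / dram_bw_peak
    return "HBM" if hbm_pressure <= dram_pressure else "DRAM"
```

The 16 GB capacity and 80 GB/s bandwidth from the KNL slide are natural inputs; a real runtime would also need hysteresis so that indexes do not bounce between memories as load fluctuates.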
StreamBox vs. Apache Flink: 5-10x faster
42
[Figure: throughput (M records/sec) vs. # cores (2 to 58); series: Flink @ x56, Flink @ KNL, StreamBox @ KNL; RDMA ingestion limit marked]
KNL: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
x56: Intel Xeon E7-4830v4. 4x14 cores @2.0GHz. 256GB. $23,000
Benchmark: Yahoo Stream Benchmark
Output delay: 1 second
StreamBox vs. Apache Flink: 5-10x faster
43
[Figure: throughput (M records/sec) vs. # cores (2 to 58); series: Flink @ x56, Flink @ KNL, StreamBox @ KNL; RDMA ingestion limit marked]
KNL: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
x56: Intel Xeon E7-4830v4. 4x14 cores @2.0GHz. $23,000
Benchmark: Yahoo Stream Benchmark.
Output delay: 1 second
~5GB/sec!
5-10x
StreamBox scales well
44
[Figure: throughput (M records/sec) vs. # cores (0 to 60); series: StreamBox]
HW: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
Benchmark: TopK per key
Output delay: 1 second
HBM matters
45
[Figure: throughput (M records/sec) vs. # cores (0 to 60); series: StreamBox, DRAM only (not using HBM)]
HW: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
Benchmark: TopK per key
Output delay: 1 second
Runtime memory management matters
46
[Figure: throughput (M records/sec) vs. # cores (0 to 60); series: StreamBox, HW-managed HBM (3D mem as cache), DRAM only]
HW: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
Benchmark: TopK per key
Output delay: 1 second
47
In-HBM index matters
[Figure: throughput (M records/sec) vs. # cores (0 to 60); series: StreamBox, 3D mem as cache with full records (no in-mem indexes), 3D mem as cache, DRAM only]
HW: Intel Xeon Phi Knights Landing w/ HBM. 64 cores@1.3GHz. $5,000
Benchmark: TopK per key
Output delay: 1 second
StreamBox lessons
• An analytics engine built from the ground up
• 2.5 years. ~60,000 lines of C++11.
http://xsel.rocks/p/streambox
• Hardware is often badly underutilized, even by production software
• Performance requires careful optimization everywhere
48
49
The software engineer’s guide to 3D DRAM
Make sure to pack all of the following, across apps, runtime, and OS kernel:
• Hybrid memory
• Cheap VM (huge pages)
• Fast net stack (40 GbE or RDMA)
• High task parallelism
• Thread pool + custom task scheduler
• Custom mem allocator
• Sequential mem access
• Wide SIMD (avx512)
OSes defined by two IoT scenarios
Hot springs
Edge
Icebergs
[EuroSys'19] "VStore: Reinventing Data Stores for Video Analytics," Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin, to appear in Proc. EuroSys Conference, 2019.
Cheap cameras. Large videos.
51
130M surveillance cameras
shipped per year
Many institutions run > 200
cameras 24x7
A single camera produces
24 GB video per day
Must be consumed by
algorithms!
$25 on Amazon
Video analytics is expensive
52
NVIDIA Quadro P6000
NN object detection: 5 FPS
$4,500
Object detection: deep neural network model YOLOv3
IoT Camera
30 FPS
$25
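The gap between these numbers is easy to quantify. This back-of-the-envelope calculation uses only the figures on the slides and assumes every frame is analyzed by the full YOLOv3 detector, with no frame sampling or operator cascade:

```python
CAMERA_FPS = 30    # one IoT camera
GPU_FPS = 5        # YOLOv3 object detection on one Quadro P6000
CAMERAS = 200      # "many institutions run > 200 cameras"
GPU_PRICE = 4_500  # USD
CAMERA_PRICE = 25  # USD

gpus_per_camera = CAMERA_FPS / GPU_FPS          # GPUs to keep up with one camera
gpus_needed = CAMERAS * gpus_per_camera         # for the whole deployment
gpu_cost = gpus_needed * GPU_PRICE              # USD
camera_cost = CAMERAS * CAMERA_PRICE            # USD
```

Keeping up naively takes 6 GPUs per $25 camera, or about $5.4M of GPUs for a $5,000 camera deployment.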
Storage is cheap
53
Seagate Surveillance HDD 8TB
One year of video
$250
IoT Camera
24 GB/day
$25
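By contrast, storage is cheap. In decimal units (as drive vendors use), one camera's year of video comes to about 8.8 TB, roughly one 8 TB surveillance drive:

```python
GB_PER_DAY = 24    # one camera
DAYS = 365
HDD_TB = 8         # Seagate surveillance HDD
HDD_PRICE = 250    # USD

tb_per_year = GB_PER_DAY * DAYS / 1000               # 1 TB = 1000 GB
drives_per_camera_year = tb_per_year / HDD_TB
storage_cost_per_camera_year = drives_per_camera_year * HDD_PRICE  # USD
```

That is on the order of $275 of disk per camera-year, versus thousands of dollars of GPU to analyze the same video in real time.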
Cameras
Edge
Video analytics on the edge: ingestion
54
Cloud
Edge
Video analytics on the edge: query
55
Cameras
A retrospective query
“Find all white buses that appeared yesterday”
Expressed as a cascade of operators; the user:
• selects operators from a library
• specifies target accuracies for operators
56
Image credits: NoScope: Optimizing Neural Network Queries over Video at Scale, Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia, VLDB 2017
Frame diff detector (~10,000x) → Shallow neural net (~1,000x) → Deep neural net (~10x)
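A cascade saves work because each cheaper operator discards frames before the next, more expensive one sees them. The sketch assumes each operator acts as a keep/discard predicate; the stage names mirror the slide, but the frame fields and thresholds are hypothetical stand-ins for real detectors.

```python
def run_cascade(frames, stages):
    """Run frames through a cascade of (name, keep_predicate) stages,
    ordered cheapest first.

    A frame reaches a later (more expensive) stage only if every earlier
    stage kept it. Returns the surviving frames and the number of frames
    each stage had to process.
    """
    processed = {}
    survivors = list(frames)
    for name, keep in stages:
        processed[name] = len(survivors)
        survivors = [f for f in survivors if keep(f)]
    return survivors, processed


# Hypothetical stages for "find all white buses"; frame fields are stand-ins.
STAGES = [
    ("frame_diff", lambda f: f["changed"]),              # cheapest: any motion?
    ("shallow_nn", lambda f: f["vehicle_score"] > 0.5),  # cheap vehicle filter
    ("deep_nn", lambda f: f["is_white_bus"]),            # most expensive
]
```

The per-stage `processed` counts show that the deep network sees only the frames that survived both cheaper filters.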
Project VStore
The video data store for AI
57
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
“Aren’t there many video databases already?”
They are built for human consumers, not AI consumers
[EuroSys'19] "VStore: Reinventing Data Stores for Video Analytics," Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin, EuroSys Conference, 2019.
The first-class concern: controlling
video formats
58
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Extensive video format knobs
59
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Fidelity knobs: Quality, Crop, Res, Sample
Coding knobs: Speed, KeyFrame Interval
Extensive video format knobs
60
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Quality Crop Res Sample
Fidelity
Knob Impacts: High & Complex
61
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Quality Crop Res Sample
Ingestion
Storage
Retrieval
Consumption
Fidelity
Knob Impacts: High & Complex
62
Ingestion Storage Consume
Video
Data
Operator
@ accuracy
Query
Quality Crop Res Sample
Bad 100% 100p 2/3
Ingestion
Storage
Retrieval
Consumption
Retrieval
Knob Impacts: High & Complex
63
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Best 100% 100p 1/30
Ingestion
Storage
Retrieval
Consumption
Quality Crop Res Sample
Knob Impacts: High & Complex
64
Ingestion Storage Retrieval Consume
Video
Data
Operator
@ accuracy
Query
Quality Crop Res Sample
Good 75% 100p 1/2
Ingestion
Storage
Retrieval
Consumption
Configuration Space
65
Ingestion Storage Retrieval Consume
<motion,0.95>
M Storage
Formats
N Consumption
Formats
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
Operator
@ accuracy
K Consumers
Configuration Space
66
<motion,0.95>
M Storage
Formats
N Consumption
Formats
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
K Consumers
Ingestion Storage Retrieval Consume
Operator
@ accuracy
Configuration Constraints
67
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
M Storage
Formats
N Consumption
Formats
K Consumers
Ingestion Storage Retrieval Consume
Operator
@ accuracy
Richer!
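The "richer" constraint can be read as a partial order over formats: a storage format can serve a consumption format only if none of its knobs is worse. A minimal sketch over the four fidelity knobs, with assumed value encodings (coding knobs omitted):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fidelity:
    """Fidelity knobs named on the slides; the value encodings are assumptions.

    quality: encoder quality level, higher is better (0=bad, 1=good, 2=best)
    crop:    fraction of the frame kept (1.0 = 100%)
    res:     vertical resolution in pixels (100 for "100p")
    sample:  fraction of frames kept (e.g. 1/30)
    """
    quality: int
    crop: float
    res: int
    sample: float

    def richer_or_equal(self, other):
        """True if this format can serve `other`: no knob is worse."""
        return (self.quality >= other.quality and self.crop >= other.crop
                and self.res >= other.res and self.sample >= other.sample)
```

Encoding the slides' "Best 100% 100p 1/30" as `Fidelity(2, 1.0, 100, 1/30)` and "Good 75% 100p 1/2" as `Fidelity(1, 0.75, 100, 1/2)`, neither is richer than the other, which is why choosing storage formats is not simply picking the maximum.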
Configuration Constraints
68
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
M Storage
Formats
N Consumption
Formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
K Consumers
Faster!
Objectives for Configuration
69
<motion,0.95>
M Storage
Formats
N Consumption
Formats
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
K Consumers
Ingestion Storage Retrieval Consume
Operator
@ accuracy
Retrieval should never
be the bottleneck
Satisfy accuracy
Respect resource
budgets
Challenges
70
<motion,0.95>
M Storage
Formats
N Consumption
Formats
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
K Consumers
Ingestion Storage Retrieval Consume
Operator
@ accuracy
Retrieval should never
be the bottleneck
Satisfy accuracy
Respect resource
budgets
Many possible formats (~15k combos of knobs)
Many possible configurations (~4M for 4 operators)
71
M Storage
Formats
N Consumption
Formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
K Consumers
Key Idea: Deriving Configuration Backwards
Backward derivation of formats
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
72
M Storage
Formats
N Consumption
Formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
K Consumers
Technique 1: Profiling
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
1
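Profiling can be sketched as a lookup over measured (format, cost, accuracy) triples: for each consumer, pick the cheapest format whose measured accuracy meets the target. The profile values below are hypothetical, not measurements.

```python
def cheapest_format(profile, target_accuracy):
    """Pick the cheapest consumption format that meets the target accuracy.

    profile: list of (format_name, cost, accuracy), obtained by running the
    operator on sample video decoded in each candidate format.
    Returns the format name, or None if no candidate is accurate enough.
    """
    feasible = [(cost, name) for name, cost, acc in profile if acc >= target_accuracy]
    return min(feasible)[1] if feasible else None


# Hypothetical profile for one operator (cost in arbitrary units per frame).
PROFILE = [
    ("full", 10.0, 0.99),
    ("half_res", 4.0, 0.96),
    ("1/30_frames", 0.5, 0.80),
]
```

Lowering the target from 0.95 to 0.7 moves the choice from `half_res` to the much cheaper `1/30_frames`, which is the effect behind "lower accuracy → cheaper video formats".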
73
M Storage
Formats
N Consumption
Formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
K Consumers
Technique 2: Coalescing
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
12
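Coalescing merges consumption formats into fewer storage formats. One greedy reading, sketched under stated assumptions: represent a format as a `(quality, crop, res, sample)` tuple, let the merge of two formats be their knob-wise maximum (the cheapest format richer than both), and repeatedly merge the pair that adds the least cost until at most a given number of storage formats remain. The product-of-knobs cost model is a hypothetical placeholder.

```python
from itertools import combinations


def knob_max(a, b):
    """Cheapest format richer than both a and b: the knob-wise maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))


def cost(fmt):
    """Hypothetical storage cost model: product of the knob values
    (quality encoded as 1, 2, 3 so that it never zeroes the product)."""
    c = 1.0
    for knob in fmt:
        c *= knob
    return c


def coalesce(consumption_formats, max_storage_formats):
    """Greedily merge formats until at most max_storage_formats remain.

    Each merged storage format is at least as rich as every format merged
    into it, so every consumption format remains servable.
    """
    if max_storage_formats < 1:
        raise ValueError("need at least one storage format")
    formats = list(dict.fromkeys(consumption_formats))  # dedupe, keep order
    while len(formats) > max_storage_formats:
        a, b = min(combinations(formats, 2),
                   key=lambda p: cost(knob_max(*p)) - cost(p[0]) - cost(p[1]))
        formats.remove(a)
        formats.remove(b)
        formats.append(knob_max(a, b))
    return formats
```

Merging a format into a richer neighbor costs nothing extra and saves storing both, so such pairs are coalesced first; merging dissimilar formats is left until the storage-format budget forces it.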
74
M Storage
Formats
N Consumption
Formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
K Consumers
Technique 3: Eroding
<motion,0.95>
<motion, 0.7>
…
<OCR, 0.95>
<OCR, 0.90>
…
<NN, 0.95>
12
3
[Figure: storage (from max to min) vs. video age, until all formats are deleted]
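Eroding trades storage against video age: as footage gets older, storage formats are progressively deleted, so the storage held for it drops from max toward min and finally to zero. A minimal sketch, assuming each storage format has a hypothetical lifespan in days:

```python
def storage_at_age(formats, age_days):
    """Storage (GB) still held for one day of video of the given age.

    formats: list of (name, gb_per_day, lifespan_days). A format is deleted
    once the video is older than its lifespan, so storage shrinks step by
    step as video ages and reaches zero when every format has expired.
    """
    return sum(gb for _, gb, lifespan in formats if age_days < lifespan)


def steady_state_footprint(formats):
    """Total storage once ingestion reaches steady state: one day of video
    exists at every age, so sum the per-age storage over all ages."""
    horizon = max(lifespan for _, _, lifespan in formats)
    return sum(storage_at_age(formats, age) for age in range(horizon))


# Hypothetical storage formats for one 24 GB/day camera.
FORMATS = [("full", 20, 7), ("low_res", 3, 30), ("thumbnails", 1, 90)]
```

With these made-up lifespans, a camera that would need 2,160 GB to keep all 90 days at full 24 GB/day needs only 320 GB in steady state; shortening lifespans is the knob for fitting a storage budget.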
A sample configuration by VStore
75
Storage formats
Hundreds of knobs. Only possible through auto config!
Consumption formats
Ingestion Storage Retrieval Consume
Operator
@ accuracy
[Figure: query speed (x realtime, 1x to 1000x) vs. target accuracy (1, 0.95, 0.9, 0.8); series: Ours, Unified format]
76
Query speedup
Test platform. CPU: 56-core Xeon E7-4830v4. DRAM: 260 GB.
HDD: 4×1TB 10K RPM SAS 12Gbps in RAID 5. GPU: NVIDIA Quadro P6000
Query: car detector
Why?
• Lower accuracy → query tolerates cheaper video formats
• VStore ensures video decoding is proportionally cheaper!
[Figure: query speed (x realtime, 1x to 1000x) vs. target accuracy (1, 0.95, 0.9, 0.8) for videos jackson, miami, tucson, dashcam, park, airport; series: Ours, Unified format]
16x
77
Query speedup
Query 1-hour video in 10 secs!
Test platform. CPU: 56-core Xeon E7-4830v4. DRAM: 260 GB.
HDD: 4×1TB 10K RPM SAS 12Gbps in RAID 5. GPU: NVIDIA Quadro P6000
Query: car detector
Why?
• Lower accuracy → query tolerates cheaper video formats
• VStore ensures video decoding is proportionally cheaper!
Ongoing: Video icebergs on cameras
Edge
Icebergs
Edge
Little Icebergs
Why would this happen?
• Wireless cameras easy to deploy
• Wireless bandwidth is precious
• Public WiFi: typically < 1MB/sec
• Complaints about cameras slowing down WiFi
• Streaming videos → NOT scalable
• On-camera storage is cheap
79
https://community.netgear.com/t5/Nighthawk-WiFi-Routers/Wireless-cameras-slowing-router-too-much/td-p/513047
https://www.securitycameraking.com/securityinfo/forum/networking/ip-cameras-are-slowing-down-your-network/
https://www.security-camera-warehouse.com/ip-camera/wifi-enabled/
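A quick calculation shows why streaming does not scale on public WiFi, assuming a constant average bitrate and decimal units:

```python
SECONDS_PER_DAY = 24 * 3600
camera_rate = 24e9 / SECONDS_PER_DAY   # bytes/sec for a 24 GB/day camera
wifi_rate = 1e6                        # ~1 MB/sec public WiFi
cameras_to_saturate = wifi_rate / camera_rate
```

Each camera averages about 0.28 MB/sec, so fewer than four cameras streaming continuously would saturate such a link.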
Edge
Cloud
Edge
Video icebergs on cameras
80
Cameras capture videos & stay silent
Only respond to queries
Cameras
Feasible?
• Users are waiting
• On-camera video is large (tens of GB)
• Wireless bandwidth is scarce (1MB/sec)
• $20 cameras are wimpy (one frame in 30 secs)
81
Feasible?
• Users are waiting
• On-camera video is large (tens of GB)
• Wireless bandwidth is scarce (1MB/sec)
• $20 cameras are wimpy (one frame in 30 secs)
82
Yes
• Ingestion: cameras learn videos slowly but surely
• Query: continuously refining results
• Edge bootstraps specialized NNs for cameras to run
• 1000x cheaper than full NN.
• Process 1-hour video in secs (working with edge)
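The feasibility figures can be checked with simple arithmetic, using the slides' numbers plus one assumption: 20 GB as a stand-in for "tens of GB".

```python
FPS = 30
VIDEO_SEC = 3600                  # 1 hour of video
SEC_PER_FRAME_FULL_NN = 30        # "$20 cameras are wimpy (one frame in 30 secs)"
SPEEDUP_SPECIALIZED = 1000        # "1000x cheaper than full NN"
ON_CAMERA_BYTES = 20e9            # assumption for "tens of GB"
WIFI_BYTES_PER_SEC = 1e6          # "1MB/sec"

frames = FPS * VIDEO_SEC
full_nn_days = frames * SEC_PER_FRAME_FULL_NN / 86400
specialized_sec = frames * SEC_PER_FRAME_FULL_NN / SPEEDUP_SPECIALIZED
upload_hours = ON_CAMERA_BYTES / WIFI_BYTES_PER_SEC / 3600
```

Running the full NN on every frame of one hour of video would take 37.5 days on the camera, and uploading 20 GB would take about 5.6 hours. A 1000x cheaper specialized NN brings this to 3,240 seconds (about realtime), so the seconds-level figure presumably also relies on the other ingredients listed, such as working with the edge and continuously refining results.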
Lesson: conquering video icebergs
• Lazy ingestion: pay as little as possible
• Eager query-time optimizations
• Take specialization opportunities
• Users know their queries better
• Resonates with compiler/PL wisdom!
• Just-in-time compilation & lazy evaluation
83
Supercharge IoT analytics
Two important scenarios
Large stream analytics on the edge
Large video analytics on edge & cameras
OS plays key roles
Map AI to new hardware
Dynamically configure AI
Trade off among competing objectives
84

More Related Content

Recently uploaded

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Recently uploaded (20)

Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Supercharge large IoT analytics

  • 1. Supercharge large IoT analytics An OS approach Felix Xiaozhu Lin 1
  • 2. This talk has three parts that can be sampled independently “Kernel is firmware” If you care about: kernel, security, file system, drivers, TrustZone, heterogeneous SoCs, binary translation… Goto slide 6 Large stream analytics on the edge If you care about: stream processing, 3D-stacked memory, parallelism, memory mgmt, in-memory computing… Goto slide 26 Large video analytics on edge & cameras If you care about: video intelligence, deep learning, storage, IoT, edge computing… Goto slide 51 2
  • 3. Bio • 2014 – now. Asst. prof, Purdue ECE • Xroads Systems Exploration Lab • 2014 PhD in CS. Rice • Thesis: OS for mobile computing • 2008 MS + BS. Tsinghua 3
  • 4. What I do • Layer? • Operating system (in a broad sense) • Scenarios? • Edge & IoT (mostly) • Objectives? • Speed, efficiency, & security 4 My premises for OS research …
  • 5. The remaining OS is defined by scenarios Kernel is firmware Entrees: 45 mins Appetizers: 5 mins
  • 6. 6
  • 7. Kernel is firmware • Entangled subsystems • Difficult to re-architecture or extract • Has own evolution plan • Likely reject new ideologies • Little respect for stable internal interfaces • New additions quickly become obsoleted • Open source (a white box) • Can we retrofit the kernel as firmware? 7
  • 8. 8 Case 1: Trustworthy file systems for smart devices Retrofitting the kernel (1): reuse in vivo
  • 9. 9 Hw Case 1: Trustworthy file systems for smart devices Kernel Apps Retrofitting the kernel (1): reuse in vivo
  • 10. 10 App Normal world Secure world Isolating secure apps in TrustZone Retrofitting the kernel (1): reuse in vivo Kernel
  • 11. 11 Normal world Secure world Keep secure data persistence? Retrofitting the kernel (1): reuse in vivo Kernel FS App
  • 12. 12 VFS FS Block layer Reuse kernel file system in vivo Retrofitting the kernel (1): reuse in vivo Normal world Secure world App
  • 13. 13 VFS FS Block layer • Cloud: a safer execution environment • A pair of twin file systems • File data never leaves the device’s secure world Metadata-only FS Replica Untrusted Trusted Cloud continuously verifies fs behaviors Retrofitting the kernel (1): reuse in vivo App [arXiv:1902.06327] "Let the Cloud Watch Over Your IoT File Systems," Liwei Guo, Yiying Zhang, and Felix Xiaozhu Lin, 2019.
  • 14. 14 Case 2: Unmodified drivers for TrustZone HW App Normal world Secure world Retrofitting the kernel (2): code transformation
  • 15. 15 Device code Driver libs Kernel libs Core services SPI CSI WiFiEth USB Kernel source tree Othercode Transplant Linux drivers? CAM Retrofitting the kernel (2): code transformation App CAM Secure world
  • 16. 16 Device code Driver libs Kernel libs Core services SPI CSI WiFiEth USB CAM Driver kernel Othercode Statically miniaturize the whole kernel Retrofitting the kernel (2): code transformation CAM Kernel source tree
  • 17. 17 Statically miniaturize the whole kernel Retrofitting the kernel (2): code transformation A kernel for all → A kernel specialized for the driver only App Driver kernel Normal world Secure world
  • 18. Retrofitting the kernel (3): binary translation. Case 3: Kernel IO paths on co-processors
  • 19. Retrofitting the kernel (3): binary translation. Weak co-processors
  • 20. Retrofitting the kernel (3): binary translation. A heterogeneous SoC [Diagram: CPU (2.5 GHz), co-processor (50 MHz), DRAM, IO]
  • 21. Retrofitting the kernel (3): binary translation. Weak co-processors suit low-power IO tasks: high efficiency! [Diagram labels: CPU (2.5 GHz), co-processor (50 MHz), DRAM, IO, Linux kernel IO tasks]
  • 22. Retrofitting the kernel (3): binary translation. Kernel execution on weak co-processors? Obstacles: different ISA, no MMU, no POSIX, … [Diagram labels: CPU (2.5 GHz), co-processor (50 MHz), DRAM, IO, Linux kernel IO tasks]
  • 23. Retrofitting the kernel (3): binary translation. The co-processor translates the unmodified kernel binary via dynamic binary translation [Diagram labels: CPU, co-processor, DRAM, IO, Linux kernel IO tasks] [arXiv:1811.05000] "Transkernel: An Executor for Commodity Kernels on Peripheral Cores," Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin
  • 24. Retrofit the kernel as firmware: 1. Reuse in vivo: unmodified file systems for TrustZone 2. Source transformation: unmodified device drivers for TrustZone 3. Binary translation: unmodified IO paths for co-processors
  • 25. OS defined by scenarios: algorithms + resources + objectives
  • 26. OSes defined by two IoT scenarios: hot springs and icebergs [Diagram labels: Hot springs, Edge, Icebergs]
  • 27. "Hot springs": telemetry events • Power sensor: 140M events/day • Oil rig: 1-2 TB/day • Manufacturing machines: PBs/day • Requirements: high throughput, sub-second delay. Process data in time, before it gets cold!
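Converting the daily volumes into per-second rates shows what "high throughput" means for a single edge box. This is plain arithmetic on the slide's figures, assuming decimal units (1 TB = 10^12 bytes) and a uniform rate over the day; real peaks will be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day):
    """Convert a daily volume into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


events_per_sec = per_second(140e6)            # power sensor: ~1,620 events/sec
oil_rig_mb_per_sec = per_second(2e12) / 1e6   # oil rig at 2 TB/day: ~23 MB/sec
```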
  • 28. Edge: cleanse & summarize data. Example: finding high-power sensors with the operators Ingestion, Window, Group by SensorID, Average per sensor, TopK. Sample event: Time 10:01, ID 0x1024, Value 200. [Figure: example values per window, 10:00-10:05 and 10:05-10:10]
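The high-power-sensor pipeline can be sketched in a few lines of single-threaded Python. This is one plausible reading of the slide, assuming 5-minute tumbling windows keyed by their start time (seconds since midnight) and a `(timestamp_sec, sensor_id, value)` event tuple; it illustrates the operator semantics only and is not StreamBox's implementation.

```python
from collections import defaultdict
import heapq

WINDOW_SEC = 5 * 60  # 5-minute tumbling windows, e.g., 10:00-10:05, 10:05-10:10


def top_k_sensors(events, k):
    """events: iterable of (timestamp_sec, sensor_id, value).

    Returns {window_start: [(avg, sensor_id), ...]} holding the k highest
    per-sensor averages in each window.
    """
    # window start -> sensor id -> [running sum, count]
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for ts, sensor, value in events:
        acc = sums[ts - ts % WINDOW_SEC][sensor]  # Window, then Group by SensorID
        acc[0] += value
        acc[1] += 1
    result = {}
    for window, per_sensor in sums.items():
        averages = ((s / n, sensor) for sensor, (s, n) in per_sensor.items())  # Average per sensor
        result[window] = heapq.nlargest(k, averages)  # TopK
    return result


# 10:01 is 36,060 seconds after midnight.
events = [(36060, 0x1024, 200), (36100, 0x1024, 100), (36120, 0x2048, 500),
          (36130, 0x4096, 302), (36400, 0x1024, 130)]
print(top_k_sensors(events, 2))
```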
  • 29. Stream analytics: state of the art • Classic engines? • StreamBase, Aurora, TelegraphCQ, NiagaraST… • Single-threaded; they do not scale well • Modern engines for datacenters? • Apache Flink, Spark Streaming, Beam… • Designed to scale out across tens to hundreds of machines • Assume it is okay if individual nodes perform poorly • A bad fit as analytics moves to the edge
  • 30. Project StreamBox: stream analytics at memory speed • RDMA / 10GbE • Co-designed with memory management/scheduling • Squeeze parallelism out of multi/manycore • Manage NUMA domains • Exploit high-bandwidth memory [Diagram labels: Ingestion, Stream pipeline, Threads, Scheduler, Mem] [ASPLOS'19] "StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory," Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin [USENIX ATC'17] "StreamBox: Modern Stream Processing on a Multicore Machine," Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, and Felix Xiaozhu Lin, in Proc. USENIX Annual Technical Conference, 2017. [ASPLOS'16] "memif: Towards Programming Heterogeneous Memory Asynchronously," Felix Xiaozhu Lin and Xu Liu, in Proc. ACM Int. Conf. Architectural Support for Programming Languages and Operating Systems, 2016.
  • 31. High-bandwidth hybrid memory: 3D DRAM (16 GB, 375 GB/s) and normal DRAM (~100 GB, 80 GB/s), both serving the cores • Tradeoff: capacity vs. bandwidth • An untraditional memory hierarchy: no latency benefit (unlike SRAM + DRAM)
  • 32. Already on off-the-shelf machines: Intel Xeon Phi Knights Landing (KNL)
  • 33. Cool, but the benefit is not free • Two alternative configurations: • HW-managed: HBM as a cache • SW-managed: one flat address space • Run existing analytics engines on HBM? • Almost no benefit (it may even hurt)
  • 34. Existing engines: 3 inadequacies • Algorithm: HBM favors sequential access + high parallelism, but existing engines group by hashing, with random access • Capacity: HBM capacity is limited, while streaming brings high data volume at high velocity • Dynamism: streaming workloads fluctuate; how to map them onto two memory types? [Diagram: pipeline of Ingress, Group by key, Average per key, Window, TopK]
  • 35. Algorithm: HBM can accelerate grouping! • Hash and sort are duals for grouping • Algorithmic complexity: sort (O(n log n)) is worse than hash (O(n) expected) • Hash for in-core; sort for out-of-core [VLDB'09, VLDB'13, SIGMOD'15] • Yet sort outperforms hash with… • High data parallelism (bitonic sort + AVX-512) • High task parallelism (parallel merge sort) • High memory bandwidth (stacked DRAM) [VLDB'09] C. Kim et al., "Sort vs. hash revisited: Fast join implementation on modern multi-core CPUs" [VLDB'13] C. Balkesen et al., "Multi-core, main-memory joins: Sort vs. hash revisited" [SIGMOD'15] O. Polychroniou et al., "Rethinking SIMD vectorization for in-memory databases"
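The duality between hash and sort for grouping can be shown directly: both functions below produce identical groups, but they differ in how they touch memory. Hashing does one pass with random accesses into a table, while sorting does more work that proceeds in sequential runs, which is the access pattern HBM rewards. This sketch only demonstrates equivalence; Python's built-in `sorted` is not the bitonic/AVX-512 or parallel merge sort named on the slide, and it says nothing about performance.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_hash(pairs):
    """Hash-based grouping: one pass, O(n) expected, random access into a hash table."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_by_sort(pairs):
    """Sort-based grouping: O(n log n), then one sequential scan over runs of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable: per-key values keep arrival order
    return {key: [v for _, v in run] for key, run in groupby(ordered, key=itemgetter(0))}


pairs = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
print(group_by_hash(pairs) == group_by_sort(pairs))  # → True
```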
  • 36. Grouping: sort vs. hash? [Figure: throughput (million pairs/sec) and memory bandwidth (GB/sec) vs. # cores]
  • 37. Grouping: sort vs. hash? [Same figure, adding the Hash DRAM curves]
  • 38. Grouping: sort vs. hash? [Same figure, with Hash DRAM and Sort DRAM curves]
  • 39. Grouping: the sort vs. hash choice is reversed! [Same figure, with Hash DRAM, Sort DRAM, and Sort HBM curves]
  • 40. Capacity: use HBM only for grouping indexes [Diagram: streaming data kept in data bundles in normal DRAM; an index of {<key, pointer>} entries in HBM; cores]
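The idea of keeping only a grouping index in HBM can be sketched as a narrow array of `(key, position)` pairs that gets sorted and searched, while the wide records stay where they are. This is an assumption-laden illustration: Python gives no control over which memory a list lives in, so "HBM" and "DRAM" here are purely conceptual, and the record layout is hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_index(records, key_of):
    """Build a compact grouping index of (key, position) pairs.

    Only this small index would need to live in HBM; the full records
    stay in place (normal DRAM in the slide's design).
    """
    index = [(key_of(rec), pos) for pos, rec in enumerate(records)]
    index.sort()  # sort the narrow index, never the wide records
    return index


def records_for_key(records, index, key):
    """Binary-search the sorted index for one key, then dereference into full records."""
    lo = bisect_left(index, (key, -1))
    hi = bisect_right(index, (key, len(records)))
    return [records[pos] for _, pos in index[lo:hi]]


records = [{"id": 7, "v": 1}, {"id": 3, "v": 2}, {"id": 7, "v": 5}]
index = build_index(records, lambda r: r["id"])
print(index)  # → [(3, 1), (7, 0), (7, 2)]
```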
  • 41. Dynamism: the art of pressure balance between DRAM bandwidth and HBM capacity [Diagram labels: HBM, Cores, Normal DRAM]
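Balancing "DRAM bandwidth vs. HBM capacity" can be illustrated with a toy placement rule for each new grouping index. This is a hypothetical policy written for illustration, not StreamBox-HBM's actual algorithm: the thresholds and the `"BACKPRESSURE"` outcome (slowing ingestion when both resources are under pressure) are assumptions.

```python
def place_index(hbm_used_frac, dram_bw_util, hbm_high=0.9, dram_bw_high=0.8):
    """Toy placement rule for a new grouping index: "HBM", "DRAM", or "BACKPRESSURE".

    hbm_used_frac: fraction of HBM capacity in use (0..1).
    dram_bw_util: fraction of DRAM bandwidth in use (0..1).
    The thresholds are hypothetical defaults.
    """
    if hbm_used_frac < hbm_high:
        return "HBM"           # capacity available: use high-bandwidth memory
    if dram_bw_util < dram_bw_high:
        return "DRAM"          # HBM nearly full, but DRAM still has bandwidth headroom
    return "BACKPRESSURE"      # both under pressure: slow down ingestion instead
```

A real policy would likely react to measured trends rather than fixed cutoffs; the point here is only that placement depends on both resources at once.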
  • 42. StreamBox vs. Apache Flink: 5-10x faster [Figure: throughput (M records/s) vs. # cores (2-58) for Flink @ x56, Flink @ KNL, and StreamBox @ KNL, with the RDMA ingestion limit marked] KNL: Intel Xeon Phi Knights Landing w/ HBM, 64 cores @ 1.3 GHz, $5,000. x56: Intel Xeon E7-4830v4, 4x14 cores @ 2.0 GHz, 256 GB, $23,000. Benchmark: Yahoo Stream Benchmark. Output delay: 1 second.
  • 43. StreamBox vs. Apache Flink: 5-10x faster [Same figure and setup, annotated "~5GB/sec!" and "5-10x"]
  • 44. StreamBox scales well [Figure: StreamBox throughput (M records/s) vs. # cores] HW: Intel Xeon Phi Knights Landing w/ HBM, 64 cores @ 1.3 GHz, $5,000. Benchmark: TopK per key. Output delay: 1 second.
  • 45. HBM matters [Same setup; the figure adds a DRAM-only curve (not using HBM)]
  • 46. Runtime memory management matters [Same setup; the figure adds a 3D-mem-as-cache curve (HW-managed HBM)]
  • 47. In-HBM index matters [Same setup; the figure adds a curve for 3D mem as cache with full records (no in-memory indexes)]
  • 48. StreamBox lessons • An analytics engine built from the ground up • 2.5 years; ~60,000 lines of C++11. http://xsel.rocks/p/streambox • Hardware is often badly underutilized, even with production software • Performance requires careful optimization everywhere
  • 49. The software engineer's guide to 3D DRAM: make sure to pack in all of the following [Layers: Apps, Runtime, OS kernel] • High task parallelism • Sequential memory access • Wide SIMD (AVX-512) • Thread pool + custom task scheduler • Custom memory allocator • Cheap VM (huge pages) • Fast network stack (40 GbE or RDMA) • Hybrid memory
  • 50. OSes defined by two IoT scenarios [Diagram labels: Hot springs, Edge, Icebergs] [EuroSys'19] "VStore: Reinventing Data Stores for Video Analytics," Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin, to appear in Proc. EuroSys Conference, 2019.
  • 51. Cheap cameras, large videos • 130M surveillance cameras shipped per year • Many institutions run >200 cameras 24x7 • A single camera ($25 on Amazon) produces 24 GB of video per day • The videos must be consumed by algorithms!
  • 52. Video analytics is expensive • IoT camera: 30 FPS, $25 • Object detection with a deep neural network model (YOLOv3) on an NVIDIA Quadro P6000 ($4,500): 5 FPS
  • 53. Storage is cheap • IoT camera ($25): 24 GB/day • Seagate surveillance HDD ($250): 8 TB, about one year of video
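The contrast between expensive analytics and cheap storage follows from the slides' own numbers. The arithmetic below assumes every frame of a 30 FPS camera goes through the 5 FPS detector and uses decimal units (1 TB = 1,000 GB).

```python
import math

CAMERA_FPS = 30          # IoT camera frame rate
GPU_DETECT_FPS = 5       # YOLOv3 object detection on an NVIDIA Quadro P6000
GPU_PRICE_USD = 4500
CAMERA_GB_PER_DAY = 24   # per-camera video volume

gpus_per_camera = math.ceil(CAMERA_FPS / GPU_DETECT_FPS)  # 6 GPUs to keep up with one camera
gpu_cost_per_camera = gpus_per_camera * GPU_PRICE_USD     # $27,000 of GPUs vs. a $25 camera
tb_per_year = CAMERA_GB_PER_DAY * 365 / 1000              # ~8.8 TB: roughly one 8 TB HDD
```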
  • 54. Video analytics on the edge: ingestion [Diagram labels: Cameras, Edge, Cloud]
  • 55. Video analytics on the edge: query [Diagram labels: Cameras, Edge]
  • 56. A retrospective query: "Find all white buses that appeared yesterday" • Expressed as a cascade of operators; the query • selects operators from a library • specifies target accuracies for the operators [Figure: cascade of frame diff detector (~10,000x), shallow neural net (~1,000x), and deep neural net (~10x)] Image credits: NoScope: Optimizing Neural Network Queries over Video at Scale, Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia, VLDB 2017
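A cascade runs cheap operators first so that expensive ones see only the frames that survive. In the sketch below, each operator is a stand-in predicate over a hypothetical frame dictionary; real detectors, confidence thresholds, and per-operator accuracy targets are not modeled. Counting how many frames each stage examines shows why the deep net's cost stays small.

```python
def run_cascade(frames, stages):
    """Run a cascade of operators, cheapest first.

    stages: list of (name, predicate). A frame survives only if every
    predicate accepts it; later (more expensive) stages see only survivors.
    Returns the surviving frames and how many frames each stage examined.
    """
    examined = {}
    survivors = list(frames)
    for name, predicate in stages:
        examined[name] = len(survivors)
        survivors = [f for f in survivors if predicate(f)]
    return survivors, examined


# Hypothetical frames: flags stand in for what each real operator would detect.
frames = [{"id": i, "changed": i % 10 == 0, "maybe_bus": i % 20 == 0,
           "white_bus": i % 40 == 0} for i in range(100)]
stages = [("frame_diff", lambda f: f["changed"]),
          ("shallow_nn", lambda f: f["maybe_bus"]),
          ("deep_nn", lambda f: f["white_bus"])]
hits, examined = run_cascade(frames, stages)
print([f["id"] for f in hits], examined)
# → [0, 40, 80] {'frame_diff': 100, 'shallow_nn': 10, 'deep_nn': 5}
```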
  • 57. Project VStore: the video data store for AI [Diagram: video data flows through ingestion → storage → retrieval → consumption by a query's operators @ accuracy] • "Aren't there many video databases already?" They are built for human consumers, not AI consumers [EuroSys'19] "VStore: Reinventing Data Stores for Video Analytics," Tiantu Xu, Luis Materon Botelho, and Felix Xiaozhu Lin, EuroSys Conference, 2019.
  • 58. The first-class concern: controlling video formats [Same pipeline diagram]
  • 59. Extensive video format knobs • Fidelity: quality, crop, resolution, sampling • Coding: speed, keyframe interval
  • 60. Extensive video format knobs [Fidelity knobs highlighted: quality, crop, resolution, sampling]
  • 61. Knob impacts: high & complex. Fidelity knobs (quality, crop, resolution, sampling) affect ingestion, storage, retrieval, and consumption
  • 62. Knob impacts: high & complex [Example knob setting rated "Bad": 100%, 100p, 2/3 (knobs: quality, crop, resolution, sampling), with its impact on ingestion, storage, retrieval, and consumption]
  • 63. Knob impacts: high & complex [Example knob setting rated "Best": 100%, 100p, 1/30 (knobs: quality, crop, resolution, sampling), with its impact on ingestion, storage, retrieval, and consumption]
  • 64. Knob impacts: high & complex [Example knob setting rated "Good": 75%, 100p, 1/2 (knobs: quality, crop, resolution, sampling), with its impact on ingestion, storage, retrieval, and consumption]
  • 65. Configuration space: K consumers (operator @ accuracy, e.g., <motion, 0.95>, <motion, 0.7>, …, <OCR, 0.95>, <OCR, 0.90>, …, <NN, 0.95>), N consumption formats, and M storage formats, spanning ingestion, storage, retrieval, and consumption
  • 66. Configuration space [Same diagram: K consumers, N consumption formats, M storage formats]
  • 67. Configuration constraints: "Richer!" [Same diagram: K consumers, N consumption formats, M storage formats]
  • 68. Configuration constraints: "Faster!" [Same diagram: K consumers, N consumption formats, M storage formats]
  • 69. Objectives for configuration • Satisfy accuracy • Retrieval must never be the bottleneck • Respect resource budgets [Same diagram: K consumers, N consumption formats, M storage formats]
  • 70. Challenges in meeting these objectives • Many possible formats (~15k combinations of knobs) • Many possible configurations (~4M for 4 operators)
  • 71. Key idea: derive the configuration backwards, from the K consumers (operator @ accuracy, e.g., <motion, 0.95>, …, <NN, 0.95>) to the N consumption formats to the M storage formats
  • 72. Technique 1: profiling [Same diagram, with step 1 marked on the backward derivation]
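Profiling can be read as: for each consumer (operator @ accuracy), measure the operator's accuracy and cost on sample video in each candidate consumption format, then keep the cheapest format that meets the target. The sketch below assumes that reading; the format names, accuracies, and costs are made-up numbers, and "cost" is a stand-in for whatever is measured (e.g., decoding plus operator time).

```python
def cheapest_format(profile, target_accuracy):
    """Pick the cheapest consumption format whose profiled accuracy meets the target.

    profile: list of (format_name, accuracy, cost) measured on sample video.
    Returns None if no candidate format is accurate enough.
    """
    ok = [(cost, fmt) for fmt, acc, cost in profile if acc >= target_accuracy]
    return min(ok)[1] if ok else None


# Hypothetical profile for a motion operator.
profile = [("720p_every_frame", 0.99, 10.0),
           ("360p_every_frame", 0.96, 3.0),
           ("360p_every_2nd", 0.91, 1.5),
           ("180p_every_5th", 0.72, 0.4)]
print(cheapest_format(profile, 0.95))  # → 360p_every_frame
```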
  • 73. Technique 2: coalescing [Same diagram, with steps 1 and 2 marked on the backward derivation]
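Coalescing rests on the observation that one richer format can serve several consumers. The sketch assumes each format is a tuple of fidelity knobs (hypothetically resolution, sampling rate, and quality) where larger values are richer, so a storage format can serve a consumption format if it is at least as rich on every knob. It shows only that building block; deciding which formats to merge, trading storage against retrieval and decoding cost, is not modeled.

```python
def subsumes(storage, consumption):
    """A storage format can serve a consumption format if it is at least as rich on every knob."""
    return all(s >= c for s, c in zip(storage, consumption))


def coalesce(formats):
    """Merge several formats into one that serves them all: the knob-wise maximum."""
    return tuple(max(values) for values in zip(*formats))


# Hypothetical (resolution, sampling_rate, quality) tuples.
formats = [(720, 1.0, 80), (1080, 0.5, 60), (360, 1.0, 90)]
merged = coalesce(formats)
print(merged)  # → (1080, 1.0, 90)
```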
  • 74. Technique 3: eroding [Same diagram, with steps 1-3 marked; plot of storage vs. video age, from min to max age, by which point all video is deleted]
  • 75. A sample configuration by VStore [Figure: storage formats and consumption formats] Hundreds of knobs: only possible through automatic configuration!
  • 76. Query speedup [Figure: query speed (x realtime, 1x-1000x) vs. target accuracy (1, 0.95, 0.9, 0.8), ours vs. a unified format] Test platform: CPU 56-core Xeon E7-4830v4; DRAM 260 GB; HDD 4×1 TB 10K RPM SAS 12 Gbps in RAID 5; GPU NVIDIA Quadro P6000. Query: car detector. Why? • Lower accuracy → the query tolerates cheaper video formats • VStore ensures that video decoding becomes proportionally cheaper!
  • 77. Query speedup across videos [Figure: query speed (x realtime) vs. target accuracy for jackson, miami, tucson, dashcam, park, and airport; ours vs. a unified format; annotated "16x"] Query a 1-hour video in 10 secs! (Same test platform and query as slide 76.)
  • 78. Ongoing: video icebergs on cameras [Diagram labels: Edge, Icebergs; Edge, Little Icebergs]
  • 79. Why would this happen? • Wireless cameras are easy to deploy • Wireless bandwidth is precious • Public WiFi: typically < 1MB/sec • Complaints about cameras slowing down WiFi • Streaming videos → NOT scalable • On-camera storage is cheap https://community.netgear.com/t5/Nighthawk-WiFi-Routers/Wireless-cameras-slowing-router-too-much/td-p/513047 https://www.securitycameraking.com/securityinfo/forum/networking/ip-cameras-are-slowing-down-your-network/ https://www.security-camera-warehouse.com/ip-camera/wifi-enabled/
  • 80. Video icebergs on cameras • Cameras capture videos & stay silent • They only respond to queries [Diagram labels: Cameras, Edge, Cloud]
  • 81. Feasible? • Users are waiting • On-camera video is large (tens of GB) • Wireless bandwidth is scarce (1MB/sec) • $20 cameras are wimpy (one frame in 30 secs)
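A quick calculation shows why neither naive option is feasible. It assumes one day of video from one camera (24 GB, the figure given earlier for a single camera) standing in for "tens of GB", an uplink of exactly 1 MB/sec with decimal units, a 30 FPS video, and the camera running the full NN on every frame at one frame per 30 secs.

```python
VIDEO_GB = 24            # one day of video from one camera
UPLINK_MB_PER_SEC = 1    # scarce wireless bandwidth
CAMERA_FPS = 30
SECS_PER_FRAME = 30      # wimpy $20 camera: one frame in ~30 secs

upload_hours = VIDEO_GB * 1000 / UPLINK_MB_PER_SEC / 3600   # ~6.7 hours to upload one day of video
frames_per_hour = CAMERA_FPS * 3600                         # 108,000 frames in a 1-hour video
on_camera_days = frames_per_hour * SECS_PER_FRAME / 86400   # 37.5 days to analyze that hour on-camera
```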
  • 82. Feasible? • Users are waiting • On-camera video is large (tens of GB) • Wireless bandwidth is scarce (1MB/sec) • $20 cameras are wimpy (one frame in 30 secs) Yes! • Ingestion: cameras learn videos slowly but surely • Query: results are continuously refined • The edge bootstraps specialized NNs for cameras to run: 1000x cheaper than the full NN • Process a 1-hour video in seconds (working with the edge)
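"Continuously refining results" fits naturally with lazy evaluation: rather than scanning frames in order, a query can visit them coarse-to-fine (every 4th frame, then the ones in between, and so on) and emit a growing answer as it goes. The sketch below is a hypothetical illustration of that idea using a Python generator; the visiting order and batch size are assumptions, not the system's actual method, and `detector` stands in for any per-frame operator.

```python
def coarse_to_fine(n):
    """Yield frame indices 0..n-1 coarse-to-fine: widely spaced frames first, then fill in."""
    seen = set()
    stride = 1
    while stride * 2 < n:
        stride *= 2
    while stride >= 1:
        for i in range(0, n, stride):
            if i not in seen:
                seen.add(i)
                yield i
        stride //= 2


def refining_query(frames, detector, batch=4):
    """Lazily scan frames coarse-to-fine, yielding the growing sorted match list after each batch."""
    matches, processed = [], 0
    for i in coarse_to_fine(len(frames)):
        if detector(frames[i]):
            matches.append(i)
        processed += 1
        if processed % batch == 0 or processed == len(frames):
            yield sorted(matches)


frames = [False, True, False, False, True, False, False, True]
for partial in refining_query(frames, bool):
    print(partial)  # → [4], then [1, 4, 7]
```

Because results arrive as a generator, a user who is satisfied with an early partial answer can simply stop consuming it, and the remaining frames are never processed.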
  • 83. Lesson: conquering video icebergs • Lazy ingestion: pay as little as possible • Eager query-time optimizations • Take specialization opportunities • Users know their queries better • Resonates with compiler/PL wisdom! • Just-in-time compilation & lazy evaluation
  • 84. Supercharge IoT analytics • Two important scenarios: large stream analytics on the edge; large video analytics on edge & cameras • OS plays key roles: map AI to new hardware; dynamically configure AI; trade off among competing objectives