RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica

Spark Summit
Spark SummitSpark Summit
RISELab:
Enabling Intelligent Real-time
Decisions
Ion Stoica
February 8, 2017
Berkeley’s AMPLab (2011-2016)
2
Algorithms
Machines People
Goal: Nextgeneration of open source
data analytics stack for industry & academia
BerkeleyDataAnalytics Stack (BDAS)
Berkeley’s AMPLab (2011-2016)
3
Algorithms
Machines People
Goal: Nextgeneration of open source
data analytics stack for industry & academia
BerkeleyDataAnalytics Stack (BDAS)
…
RISE: Real-time Intelligent
Secure Execution
From batch data to advanced analytics
AMPLab
5
From live data to real-time decisions
RISELab
RISE Lab (2017-2022)
12 faculty across AI, systems, security, and architectures
11 Founding sponsors
6
Why?
Data only as valuable as the decisions it enables
7
Why?
What does this mean?
• Faster decisions better than slower decisions
• Decisions on fresh data better than decisions on stale data
• Decisions on personalized data better than on aggregate data
8
Data only as valuable as the decisions it enables
Goal
Real-time decisions
on live data
with strong security
9
decide in ms
the current stateof theenvironment
privacy, confidentiality, integrity
Typical decision system
Decision System
Query
Decision
Environment
+
sensors &
actuators
Observations, Feedback
Preprocess Intermediate
data
Decision
Engine
Decision
Query
Decision
Engine
Preprocess Intermediate
data
Environment
+
sensors &
actuators
Typical decision system
Decision System
Observations, Feedback
Live
Update latency
(e.g., ~1 seconds)
Decision
Query
Decision
Engine
Preprocess Intermediate
data
Environment
+
sensors &
actuators
Typical decision system
Decision System
Observations, Feedback
Live
Update latency
(e.g., ~1 seconds)
Secure
Real-time
decision latency
(e.g., ~10 ms)
Example of decision systems
Decision System
Query
Decision
Training
Models
(diff.tradeoffs
complexity/
accuracy)
Model
Serving
Feedback
Observations, Feedback
ML Pipeline
(e.g., Clipper +
Spark/Tensorflow)
Decision System
Obs.
Action
Update
Policy
Policy
obs à
action
Query
Policy
Observations, Rewards
Reinforcement
Learning Systems
(e.g., Ray)
Pre-
process
Interm-
ediate
Data
Decision
Engine
What else do we want from decisions?
Intelligent:complexdecisions in uncertain environments
Robust: handle complexnoise, unforeseeninputs, failures
Explainable:ability to explainnon-obvious decisions
Goal
Develop open source
platforms, tools, and algorithms for
intelligent real-time decisions on live-data
Some Proposed Research
Secure Real-time Decisions Stack (SRDS)
• Open source platform to develop of RISE apps
• Secure from ground up
• Reinforcement Learning (RL) as one of key app patterns
Learning control hierarchies: speeduplearning, training
Shared learning: learn over confidential data
16
Secure Real-time Decisions Stack (SRDS)
• Open source platform to develop of RISE apps
• Secure from ground up
• Reinforcement Learning (RL) as one of key app patterns
Learning control hierarchies: speeduplearning, training
Shared learning: learn over confidential data
17
Some Proposed Research
Secure Real-time Decision Stack (SRDS)
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Minimalist executionengine:
• Support both data flow and task-parallel execution models
• High-throughput, low-latency: ~ 1M tasks/sec @ ms latency
Secure Real-time Decision Stack (SRDS)
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Central repositoryfor models,APIs to capture the context
in whichdata gets used and produced
Status:ongoing project with industry partners
Secure Real-time Decision Stack (SRDS)
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Replaying of apps at fine granularity
• Simplify development, debugging
• Robustness: replay against perturbed inputs
• Explainability: identify inputs causing decision
• Security: confirm vulnerabilities, test security
patches, compliance auditing
Secure Real-time Decision Stack (SRDS)
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Dramatically simplify development of RISE applications
• Apache Spark: improve latency and security
• Clipper: model serving for Apache Spark, Scikit learn, etc
• Ray: framework for RL applications
Secure Real-time Decision Stack (SRDS)
ImprovingApache Spark
Drizzle
• Decrease latency of Structured Streaming and ML algorithms by ~10x
• Techniques: group scheduling, shared variables
23
StreamingLatency:YCSB benchmark
24
0
500
1000
1500
2000
1 2 4 8 12 16 20 24
MedianEventLatency
(ms)
Throughput (Million events/s)
Spark Drizzle Flink Drizzle-Opt
Drizzle-Opt: Reduce-by on mapper side
StreamingLatency:YCSB benchmark
25
0
500
1000
1500
2000
1 2 4 8 12 16 20 24
MedianEventLatency
(ms)
Throughput (Million events/s)
Spark Drizzle Flink Drizzle-Opt
Drizzle-Opt: Reduce-by on mapper side
StreamingLatency:YCSB benchmark
26
0
500
1000
1500
2000
1 2 4 8 12 16 20 24
MedianEventLatency
(ms)
Throughput (Million events/s)
Spark Drizzle Flink Drizzle-Opt
Drizzle-Opt: Reduce-by on mapper side
StreamingLatency:YCSB benchmark
27
0
500
1000
1500
2000
1 2 4 8 12 16 20 24
MedianEventLatency
(ms)
Throughput (Million events/s)
Spark Drizzle Flink Drizzle-Opt
Drizzle-Opt: Reduce-by on mapper side
StreamingLatency:YCSB benchmark
28
0
500
1000
1500
2000
1 2 4 8 12 16 20 24
MedianEventLatency
(ms)
Throughput (Million events/s)
Spark Drizzle Flink Drizzle-Opt
Drizzle-Opt: Reduce-by on mapper side
15x
MLlib: SGD Performance
29
0
20
40
60
4 8 16 32 64 128
Time/iter(ms)
Machines
Spark Drizzle
MLlib: SGD Performance
30
0
20
40
60
4 8 16 32 64 128
Time/iter(ms)
Machines
Spark Drizzle
6x
ImprovingApache Spark
Drizzle
• Decrease latency of Structured Streaming and ML algorithms by ~10x
• Techniques: group scheduling, shared variables
• Some of these techniques will make their way to Apache Spark
Opaque
• Full data encryption, authentication, and verification (Intel’s SGX)
• Oblivious mode: hide data access pattern
• Support most SparkSQL functionality
• See Wenting’s talk later
31
RISELab
Already promising results
Expect muchmore over the next five years!
32
Goal: Develop open source platforms,tools, and
algorithms for intelligent real-timedecisions on live-
data
Thank you
1 of 33

More Related Content

Viewers also liked(20)

Similar to RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica(20)

optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
Vijayendra Shamanna1K views
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
Altocloud1K views
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul23 views
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
Dr. Shikha Mehta214 views
R at MicrosoftR at Microsoft
R at Microsoft
Revolution Analytics52.3K views

More from Spark Summit(20)

Recently uploaded(20)

MOSORE_BRESCIAMOSORE_BRESCIA
MOSORE_BRESCIA
Federico Karagulian5 views
Introduction to Microsoft Fabric.pdfIntroduction to Microsoft Fabric.pdf
Introduction to Microsoft Fabric.pdf
ishaniuudeshika19 views
Survey on Factuality in LLM's.pptxSurvey on Factuality in LLM's.pptx
Survey on Factuality in LLM's.pptx
NeethaSherra15 views
PTicketInput.pdfPTicketInput.pdf
PTicketInput.pdf
stuartmcphersonflipm286 views
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE909 views
Data structure and algorithm. Data structure and algorithm.
Data structure and algorithm.
Abdul salam 12 views
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docxRIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
RIO GRANDE SUPPLY COMPANY INC, JAYSON.docx
JaysonGarabilesEspej6 views
How Leaders See Data? (Level 1)How Leaders See Data? (Level 1)
How Leaders See Data? (Level 1)
Narendra Narendra10 views
Microsoft Fabric.pptxMicrosoft Fabric.pptx
Microsoft Fabric.pptx
Shruti Chaurasia17 views
RuleBookForTheFairDataEconomy.pptxRuleBookForTheFairDataEconomy.pptx
RuleBookForTheFairDataEconomy.pptx
noraelstela164 views
ColonyOSColonyOS
ColonyOS
JohanKristiansson69 views

RISELab: Enabling Intelligent Real-Time Decisions keynote by Ion Stoica