Systems
• Real-time: raise alerts
• Real-time with historical: correlate
• Offline: develop initial monitoring query, back-test
• Progressive: non-temporal analysis
Engine + Fabric
Interactive Query Authoring
Real-Time Dashboard
• Performance
• Fabric & language integration
• Query model
Scenarios
• monitor telemetry & raise alerts
• correlate real-time with logs
• develop initial monitoring query
• back-test over historical logs
• offline analysis (BI) with early results
[Figure: Relational Model vs. Tempo-Relational Model. Q = COUNT(*) evaluated over snapshots of a 5min window along logical time; labels: Input (T-1, T-2, T-3), Output, Q, 𝜹]
• Supports broad & rich analytics scenarios (relational, progressive, time-based)
• Key enabler: performance + fabric & language integration + query model
struct ClickEvent { long ClickTime; long User; long AdId; }

// Ingest events from the network, timestamped by ClickTime
var str = Network.ToStream(e => e.ClickTime, Latency(10secs));

var query =
    str.Where(e => e.User % 100 < 5)     // keep a ~5% sample of users
       .Select(e => new { e.AdId })      // project the ad ID
       .GroupApply(e => e.AdId,          // per ad: count clicks in each 5min window
                   s => s.Window(5min).Aggregate(w => w.Count()));

query.Subscribe(e => Console.Write(e)); // write results to console
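To make the query above concrete, here is a minimal Python sketch of what it computes. It is not Trill code: the function name `ad_click_counts`, the tuple layout, and the choice of tumbling (non-overlapping) 5-minute windows are assumptions made for illustration only.

```python
from collections import Counter

WINDOW = 5 * 60  # window length in seconds (assumption: tumbling windows)


def ad_click_counts(events):
    """events: iterable of (click_time_secs, user, ad_id) tuples.

    Returns {(window_start, ad_id): count} over the sampled users.
    """
    counts = Counter()
    for click_time, user, ad_id in events:
        if user % 100 < 5:                       # Where: ~5% sample of users
            window_start = click_time - click_time % WINDOW
            counts[(window_start, ad_id)] += 1   # GroupApply + Window + Count
    return dict(counts)


events = [
    (10, 3, 7),     # sampled user, ad 7, window starting at 0
    (20, 104, 7),   # 104 % 100 == 4, sampled, ad 7, window starting at 0
    (30, 50, 7),    # 50 % 100 == 50, dropped by the filter
    (310, 3, 7),    # ad 7, window starting at 300
    (320, 2, 9),    # ad 9, window starting at 300
]
result = ad_click_counts(events)
```

A streaming engine would emit these counts incrementally as time advances rather than returning one dictionary at the end; the sketch only pins down the result.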
stream of batches
• More load ➞ larger batches ➞ better throughput
[Figure: a stream of batches flowing through operators 𝑜𝑝1, 𝑜𝑝2, …]
class DataBatch {
    long[] SyncTime;
    ...
    Bitvector BV;
}
class UserData_Gen : DataBatch {
    long[] c_ClickTime;
    long[] c_User;
    long[] c_AdId;
}
[Figure: each batch between operators 𝑜𝑝1 and 𝑜𝑝2 holds a timestamp column, payload columns, and a bitvector]
str.Where(e => e.User % 100<5);
[Figure: Application with Send(events) ... Receive(results)]
On(Batch b) {
    for i = 0 to b.Size {
        if !(b.c_User[i] % 100 < 5)
            set b.bitvector[i]    // mark row i as filtered out
    }
    next-operator.On(b)
}
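The columnar batch and the generated Where operator above can be sketched in Python as follows. This is only an illustration under assumptions: the names `ClickBatch`, `where_sampled_user`, and `live_rows` are hypothetical, and a plain list of booleans stands in for the bitvector.

```python
from dataclasses import dataclass, field


@dataclass
class ClickBatch:
    sync_time: list          # timestamp column
    c_user: list             # payload column
    c_ad_id: list            # payload column
    filtered: list = field(default_factory=list)  # bitvector: True = row filtered out

    def __post_init__(self):
        if not self.filtered:
            self.filtered = [False] * len(self.sync_time)


def where_sampled_user(batch):
    """Where(e => e.User % 100 < 5): flag failing rows, move no data."""
    for i in range(len(batch.sync_time)):
        if not (batch.c_user[i] % 100 < 5):
            batch.filtered[i] = True
    return batch  # the same batch is handed to the next operator


def live_rows(batch):
    """Materialize the rows that are still visible (bit not set)."""
    return [
        (batch.sync_time[i], batch.c_user[i], batch.c_ad_id[i])
        for i in range(len(batch.sync_time))
        if not batch.filtered[i]
    ]


batch = ClickBatch(sync_time=[1, 2, 3, 4], c_user=[3, 50, 104, 99], c_ad_id=[7, 7, 9, 9])
where_sampled_user(batch)
```

The point of the design is that filtering touches one column and one bitvector; the payload columns are never copied or compacted by the filter itself.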
Trill
session windows,
http://aka.ms/trill
• Lots of “signals” in stream data
• IoT workflows combine relational & signal logic
M: Group-by ID
U: Union
ID  Time     Value
0   0:42:19  67
1   0:42:22  80
2   0:42:22  85
0   0:42:23  69
2   0:42:24  85
Remove noise
Interpolate missing data
Find periodicity
Discard invalid data
Correlate live data w/ history
[Figure: query plans that interleave relational operators (σ, ⋈) with DSP operators]
Which tools to use to build such apps?
Data Processing expert
• Engines: stream engines, DBMS, MPP systems
• Data model: (tempo)-relational
• Language: declarative (SQL, LINQ, functional)
• Scenarios: real-time, offline, progressive
Digital Signal Processing expert
• Engines: MATLAB, R
• Data model: array
• Language: imperative (array languages, C)
• Scenarios: mostly offline, real-time
How to reconcile two worlds?
Our solution:
• high-performance (up to 2 orders of magnitude faster)
• one query language
• familiar abstractions to both worlds
1. Window
2. Per window: pipeline DSP ops
3. Unwindow
[Figure: per device, input x[n] is split into windows x0, x1, x2; each is processed into y0, y1, y2, which are combined (+) into y[n]]
• Stream engine for relational queries
• R for highly-optimized DSP operations
• Problem: impedance mismatch
[Figure: windows x0, x1, x2 cross from the STREAM PROCESSING SYSTEM to R and return as y0, y1, y2, combined with +]
• Unified query model
• Non-uniform & uniform signals
• Type-safe mix of stream & signal operators
• Array-based extensibility framework
• DSP operator writer sees arrays
• Supports incremental computation
• “Walled garden” on top of Trill
• No changes in data model
• Inherits Trill’s efficient processing capability (e.g., grouped computation)
TRILL DSP
[Figure: input events e1 to e5 on a time axis, and the aggregated (count) events derived from them; labels: STREAMABLE, SIGNAL]
var signal = stream.Where(e => e.Value < 100).Count()
STREAMS
SIGNALS
• Transition to signal domain
• E.g., result of an aggregate query
• Using stream operators to build signal operators
• E.g., adding two signals as a temporal join of two streams
left.Join(right, (l, r) => l + r)
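As a rough illustration of adding two signals through a temporal join, the Python sketch below represents each signal as a list of `(start, end, value)` events with half-open lifetimes `[start, end)`. The representation and the helper name `temporal_join` are assumptions for this example, not Trill's data model or API.

```python
def temporal_join(left, right, combine):
    """Join events whose lifetimes overlap; the result lives on the overlap."""
    out = []
    for l_start, l_end, l_val in left:
        for r_start, r_end, r_val in right:
            start, end = max(l_start, r_start), min(l_end, r_end)
            if start < end:  # lifetimes intersect
                out.append((start, end, combine(l_val, r_val)))
    return sorted(out)


left = [(0, 10, 1.0), (10, 20, 2.0)]
right = [(5, 15, 10.0)]

# left.Join(right, (l, r) => l + r)
summed = temporal_join(left, right, lambda l, r: l + r)
```

The sum is defined only where both inputs have a value, which is why nothing is produced before time 5 or after time 15. The nested loop is quadratic and is used here for clarity only.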
Type-safe operations
• Sampling with interpolation
[Figure: input events on a time axis (30 to 210), some misaligned and some missing; output events at 30, 60, ..., 210, with interpolated values filling the gaps]
var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60));
Interpolation window
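A small Python sketch of sampling with linear interpolation follows. It assumes one plausible reading of `Sample(30, 0, ip => ip.Linear(60))`: sample period 30, offset 0, and a grid point is interpolated only when its two neighbouring input events are at most 60 time units apart. The function name `sample_linear` and that reading of the interpolation window are assumptions, not confirmed TrillDSP semantics.

```python
def sample_linear(points, period, offset, max_gap):
    """points: list of (time, value) sorted by time, integer times.

    Returns uniformly spaced (time, value) samples at offset + k * period.
    """
    out = []
    if not points:
        return out
    first_t = points[0][0]
    k = -(-(first_t - offset) // period)  # ceiling division
    t = offset + k * period               # first grid point at or after first_t
    i = 0
    while t <= points[-1][0]:
        # advance i to the last input event at or before t
        while i + 1 < len(points) and points[i + 1][0] <= t:
            i += 1
        t0, v0 = points[i]
        if t0 == t:
            out.append((t, v0))           # aligned event, pass through
        else:
            t1, v1 = points[i + 1]        # exists because t < last time
            if t1 - t0 <= max_gap:        # neighbours close enough to interpolate
                out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += period
    return out


points = [(30, 10.0), (50, 20.0), (70, 40.0), (120, 90.0), (210, 0.0)]
uniform = sample_linear(points, period=30, offset=0, max_gap=60)
```

In this input the events at 50 and 70 are misaligned, so the sample at 60 is interpolated; the gap between 120 and 210 exceeds the window, so no samples are produced at 150 and 180.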
STREAMS
SIGNALS
UNIFORM
• Expose arrays only inside the windowing operator
var query = uniformSignal
    .Window(512, 256,
        w => w.FFT().Select(a => f(a)).IFFT(),
        a => a.Sum());
[Figure: Uniform signal ➞ WIN ➞ FFT ➞ f ➞ IFFT ➞ UNWIN (AGG) ➞ Uniform signal]
• DSP pipeline & arrays instantiated only once ➞ better data management
• DSP experts write array-array operators
• Incremental DSP operators
• Leverage Trill’s grouping power!
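The window, per-window pipeline, unwindow pattern can be sketched in plain Python as below. This is a toy under stated assumptions: a naive O(n²) DFT stands in for an optimized FFT, overlapping outputs are combined by summation (mirroring the `a => a.Sum()` argument), and the names `dft` and `window_pipeline` are hypothetical.

```python
import cmath


def dft(xs, inverse=False):
    """Naive discrete Fourier transform (stand-in for a real FFT)."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = [
        sum(x * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m, x in enumerate(xs))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def window_pipeline(signal, size, hop, f):
    """Window -> DFT -> f per coefficient -> inverse DFT -> unwindow by summing."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - size + 1, hop):
        window = signal[start:start + size]          # 1. window
        spectrum = [f(a) for a in dft(window)]       # 2. per-window DSP ops
        restored = dft(spectrum, inverse=True)
        for j, v in enumerate(restored):             # 3. unwindow (sum overlaps)
            out[start + j] += v.real
    return out


signal = [float(i) for i in range(12)]
restored_sum = window_pipeline(signal, size=8, hop=4, f=lambda a: a)
```

With the identity function and a hop of half the window, samples covered by two windows (indices 4 to 7 here) come back doubled, which shows that the unwindow aggregate decides how overlaps are reconciled.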
[Figure: OLD vs. NEW handling of window and hop for the FFT ➞ f ➞ IFFT pipeline]
[Chart: time normalized to TrillDSP on 16 cores, by hop size, for TrillDSP (1 core), MATLAB, SparkR (16 cores), SciDB-R (16 cores)]
Per sensor: Windowed FFT ➞ Function ➞ Inverse FFT ➞ Unwindow
• Pre-loaded datasets in memory
• 100 groups in stream
Up to 2 OOM faster than others
Performance benefits from:
• Efficient group processing, group-aware DSP windowing
• Using circular arrays to manage overlapping windows
• TrillDSP uses FFTW library
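One way a circular array can serve overlapping windows is sketched below in Python. It is an illustration only, not TrillDSP's implementation: the class name `HoppingWindowBuffer` is hypothetical, and it assumes the hop is no larger than the window size.

```python
class HoppingWindowBuffer:
    """Fixed-size ring buffer that emits a window every `hop` samples."""

    def __init__(self, size, hop):
        self.size, self.hop = size, hop
        self.buf = [0.0] * size
        self.count = 0  # total samples seen so far

    def push(self, x):
        self.buf[self.count % self.size] = x  # overwrite the oldest slot
        self.count += 1
        if self.count >= self.size and (self.count - self.size) % self.hop == 0:
            start = self.count % self.size    # index of the oldest sample
            return self.buf[start:] + self.buf[:start]
        return None


buf = HoppingWindowBuffer(size=4, hop=2)
windows = [w for w in (buf.push(float(i)) for i in range(8)) if w is not None]
```

Each incoming sample is written exactly once, and samples shared by consecutive windows stay in place instead of being copied into a fresh array per window.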
[Chart: running time (secs) for queries Q1 to Q22, Spark + Parquet vs. Spark + JSON (Jackson)]
JSON: >80% of the time is spent on parsing!
Speculation Level
Structural Index Level
Fields: “id”
Logical positions: “id” is the 3rd attribute
Physical positions: “id” is at the 20th byte
Speculation
Fields: “id”
Physical positions: “id” is at the 20th byte
[Chart: parsing speed (GB/s) for Gson, Jackson, Mison]
[Chart: running time (secs) for queries Q1 to Q22, Spark + Parquet vs. Spark + JSON (Jackson) vs. Spark + JSON (Mison)]
Spark+Mison is ~10X faster than Spark+Jackson
Spark+Mison has performance comparable to Spark+Parquet in most cases
rich space
temporal logic
• Transfer
ShardedStreamable
shards
• querying
• data movement
• keying
Operation      Description
Query          Applies unmodified query on each (keyed) shard
Broadcast      Duplicates each shard’s contents on all shards
Multicast      Copies tuples from each input shard to zero or more specific result shards
ReShard        Load-balances across shards
ReDistribute   Moves tuples so that the same key resides in the same result shard
ReKey          Changes the key associated with each row in each shard
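A few of the operations in the table can be mimicked in Python with shards modelled as plain lists. This is a single-process sketch under assumptions: the function names mirror the table but are not Quill's API, and integer keys are used so that Python's `hash` is deterministic.

```python
from collections import Counter


def redistribute(shards, key_fn):
    """ReDistribute: move rows so that equal keys land in the same result shard."""
    out = [[] for _ in shards]
    for shard in shards:
        for row in shard:
            out[hash(key_fn(row)) % len(shards)].append(row)
    return out


def broadcast(shards):
    """Broadcast: every result shard receives the contents of all input shards."""
    everything = [row for shard in shards for row in shard]
    return [list(everything) for _ in shards]


def query(shards, fn):
    """Query: apply the same unmodified query to each shard independently."""
    return [fn(shard) for shard in shards]


# rows are (ad_id, user); key 1 starts out split across both shards
shards = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]

by_key = redistribute(shards, key_fn=lambda row: row[0])
per_shard = query(by_key, lambda shard: Counter(row[0] for row in shard))
total = sum(per_shard, Counter())
```

After the re-distribute, a per-shard count is already the final count for each key, so no second aggregation step across shards is needed.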
[Figure: example sharded query plans. Labels: e => e.Count(); flat re-distribute; e => e.Sum(); (l,r) => l.Join(r, …); flat broadcast; no data movement]
str => str.SlidingWindow(Y).Count()
.Where(c => c > threshold)
(l, r) => l.WhereNotExists(y)
str => str.HoppingWindow(Z).Count()
[Charts: Scan (Quill vs. SparkSQL); Time taken & scheduling overhead; Grouped agg with 40M groups; Hopping window (Github data)]
http://github.com/Microsoft/CRA
https://www.microsoft.com/en-us/research/people/badrishc/
From Trill to Quill and Beyond

Large Language Model (LLM) and it’s Geospatial Applications
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

From Trill to Quill and Beyond

  • 1.
  • 2.
  • 4.
  • 5.
  • 6.
  • 7. • Real-time raise alerts • Real-time with historical • Correlate • Offline • Develop initial monitoring query • Back-test • Progressive Non-temporal analysis Engine + Fabric Interactive Query Authoring Real-Time Dashboard
  • 8. • Performance • Fabric & language integration • Query model Scenarios • monitor telemetry & raise alerts • correlate real- time with logs • develop initial monitoring query • back-test over historical logs • offline analysis (BI) with early results
  • 9. • Performance • Fabric & language integration • Query model
  • 10. Q 1 2 3 2 1 5min Window snapshots logical time Input T-1 T-2 T-3 Output Q = COUNT(*) 3 Relational Model Tempo-Relational Model QQQ Q Q𝜹𝜹𝜹 𝜹 𝜹 Supports broad & rich analytics scenarios (relational, progressive, time-based)
  • 11. • Key enabler: performance + fabric & language integration + query model
  • 12.
    struct ClickEvent { long ClickTime; long User; long AdId; }
    var str = Network.ToStream(e => e.ClickTime, Latency(10secs));
    var query =
      str.Where(e => e.User % 100 < 5)
         .Select(e => { e.AdId })
         .GroupApply(e => e.AdId,
                     s => s.Window(5min).Aggregate(w => w.Count()));
    query.Subscribe(e => Console.Write(e)); // write results to console
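A rough Python sketch of what the query on this slide computes. It assumes that `Window(5min)` means non-overlapping five-minute windows aligned to time 0 and that `ClickTime` is in seconds; the slide states neither. The name `windowed_counts` and the tuple layout are illustrative and are not Trill's API.

```python
from collections import Counter

WINDOW_SECS = 5 * 60  # assumption: 5-minute tumbling windows, times in seconds


def windowed_counts(events, window=WINDOW_SECS):
    """Count clicks per (window start, ad id) for a 5% sample of users.

    events: iterable of (click_time, user, ad_id) tuples.
    """
    counts = Counter()
    for click_time, user, ad_id in events:
        if user % 100 < 5:                                  # Where(e => e.User % 100 < 5)
            window_start = (click_time // window) * window  # Window(5min)
            counts[(window_start, ad_id)] += 1              # GroupApply(AdId, Count)
    return dict(counts)


events = [(10, 3, 1), (20, 104, 1), (30, 50, 1), (310, 1, 1), (320, 2, 2)]
print(windowed_counts(events))
# {(0, 1): 2, (300, 1): 1, (300, 2): 1}
```

Unlike this batch-style loop, a streaming engine would emit each window's count incrementally as time advances.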
  • 13. stream of batches • More load ➞ larger batches ➞ better throughput … 𝑜𝑝2 … … 𝑜𝑝1
  • 14.
    class DataBatch {
      long[] SyncTime;    // timestamp
      ...
      Bitvector BV;       // bitvector
    }
    class UserData_Gen : DataBatch {
      long[] c_ClickTime; // payload columns
      long[] c_User;
      long[] c_AdId;
    }
    … 𝑜𝑝2 … … 𝑜𝑝1
  • 15. str.Where(e => e.User % 100 < 5);
    Application: Send(events) ... Receive(results)
    Trill:
    On(Batch b) {
      for i = 0 to b.Size {
        if !(b.c_User[i] % 100 < 5)
          set b.bitvector[i]
      }
      next-operator.On(b)
    }
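The batch-at-a-time filter on this slide can be mimicked in a few lines of Python. This is only a sketch: the class `ClickBatch` and the helpers `where_user_sample` and `live_rows` are hypothetical names, and a plain list of booleans stands in for the bitvector. The point it illustrates is that the filter marks rows as filtered out instead of copying the surviving rows into a new batch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClickBatch:
    """Columnar batch: one list per column plus a 'filtered out' bit per row."""
    sync_time: List[int]
    c_user: List[int]
    c_ad_id: List[int]
    filtered: List[bool] = field(default_factory=list)

    def __post_init__(self):
        if not self.filtered:
            self.filtered = [False] * len(self.sync_time)


def where_user_sample(batch: ClickBatch) -> ClickBatch:
    """Mark rows failing `User % 100 < 5`; column data is left untouched."""
    for i in range(len(batch.c_user)):
        if not (batch.c_user[i] % 100 < 5):
            batch.filtered[i] = True
    return batch  # the same batch object is handed to the next operator


def live_rows(batch: ClickBatch):
    """Rows whose bit is not set, as (sync_time, user, ad_id) tuples."""
    return [
        (t, u, a)
        for t, u, a, gone in zip(batch.sync_time, batch.c_user, batch.c_ad_id, batch.filtered)
        if not gone
    ]


batch = where_user_sample(ClickBatch([1, 2, 3, 4], [3, 104, 250, 99], [7, 7, 8, 9]))
print(live_rows(batch))  # [(1, 3, 7), (2, 104, 7)]
```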
  • 17.
  • 18.
  • 19. • Lots of “signals” in stream data • IoT workflows combine relational & signal logic
    Example stream (ID, Time, Value): (0, 0:42:19, 67), (1, 0:42:22, 80), (2, 0:42:22, 85), (0, 0:42:23, 69), (2, 0:42:24, 85)
    Diagram: M (Group-by ID), U (Union); relational operators (σ, ⋈) combined with DSP
    Signal tasks: Remove noise, Interpolate missing data, Find periodicity, Discard invalid data, Correlate live data w/ history
    Which tools to use to build such apps?
  • 20. Data Processing expert: Engines: stream engines, DBMS, MPP systems; Data model: (tempo)-relational; Language: declarative (SQL, LINQ, functional); Scenarios: real-time, offline, progressive
    Digital Signal Processing expert: Engines: MATLAB, R; Data model: array; Language: imperative (array languages, C); Scenarios: mostly offline, real-time
    How to reconcile two worlds? Our solution: • high-performance (2 OOM faster) • one query language • familiar abstractions to both worlds
  • 21. 1. Window 2. Per window: pipeline DSP ops 3. Unwindow (diagram, per device: input x[n]; windows x0, x1, x2; per-window outputs y0, y1, y2; output y[n])
  • 22. • Stream engine for relational queries • R for highly-optimized DSP operations • Problem: impedance mismatch (diagram labels: STREAM PROCESSING SYSTEM, R; windows x0, x1, x2; outputs y0, y1, y2)
  • 23. • Unified query model • Non-uniform & uniform signals • Type-safe mix of stream & signal operators • Array-based extensibility framework • DSP operator writer sees arrays • Supports incremental computation • “Walled garden” on top of Trill • No changes in data model • Inherits Trill’s efficient processing capability (e.g., grouped computation) TRILL DSP
  • 24.
  • 25. STREAMS ➞ SIGNALS: Type-safe operations (labels: STREAMABLE, SIGNALSTREAMABLE)
    • Transition to signal domain • E.g., result of an aggregate query: var signal = stream.Where(e => e.Value < 100).Count()
    • Using stream operators to build signal operators • E.g., adding two signals as a temporal join of two streams: left.Join(right, (l, r) => l + r)
    Diagram: input events e1 e2 e3 e4 e5 over time; aggregated events (counts) over time
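To make "adding two signals as a temporal join" concrete, here is a small Python sketch. It assumes each signal is a list of `(start, end, value)` events that are valid on the half-open interval `[start, end)`. The join emits the sum wherever a left event and a right event overlap in time. The function `add_signals` is a hypothetical name, and the quadratic loop is for clarity only; it says nothing about how Trill evaluates joins.

```python
def add_signals(left, right):
    """Temporal join of two interval-valued signals with (l, r) => l + r."""
    out = []
    for l_start, l_end, l_val in left:
        for r_start, r_end, r_val in right:
            start, end = max(l_start, r_start), min(l_end, r_end)
            if start < end:  # the two events overlap in time
                out.append((start, end, l_val + r_val))
    return sorted(out)


left = [(0, 10, 1), (10, 20, 2)]
right = [(5, 15, 10)]
print(add_signals(left, right))  # [(5, 10, 11), (10, 15, 12)]
```

No output is produced for `[0, 5)` or `[15, 20)`, where only one signal has a value. That is the usual inner-join behaviour.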
  • 26. STREAMS ➞ SIGNALS ➞ UNIFORM • Sampling with interpolation: var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60)); (label: Interpolation window)
    Diagram: input events (some misaligned, some missing) and output events (some interpolated) on a time axis marked 30, 60, 90, 120, 150, 180, 210
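The sketch below shows one plausible reading of `Sample(30, 0, ip => ip.Linear(60))`: emit a value at every multiple of the sampling period, interpolating linearly between the two neighbouring input points, but only if those points are at most one interpolation window apart. That reading is an assumption, as are the function name `sample_linear` and its parameters. The slide does not spell out TrillDSP's exact semantics.

```python
def sample_linear(points, period, offset, max_gap):
    """Resample sorted (time, value) points onto the grid offset + k * period.

    A grid point between two inputs is linearly interpolated only when the
    inputs are at most `max_gap` apart (assumed meaning of the window).
    """
    samples = []
    if not points:
        return samples
    first_time, last_time = points[0][0], points[-1][0]
    steps = -((offset - first_time) // period)  # ceil((first_time - offset) / period)
    t = offset + steps * period
    i = 0
    while t <= last_time:
        # Advance so that points[i] is the last input at or before t.
        while i + 1 < len(points) and points[i + 1][0] <= t:
            i += 1
        t0, v0 = points[i]
        if t0 == t:
            samples.append((t, float(v0)))  # aligned input: pass through
        elif i + 1 < len(points):
            t1, v1 = points[i + 1]
            if t1 - t0 <= max_gap:          # close enough to interpolate
                frac = (t - t0) / (t1 - t0)
                samples.append((t, v0 + frac * (v1 - v0)))
        t += period
    return samples


points = [(0, 0.0), (25, 50.0), (65, 10.0), (200, 0.0)]
print(sample_linear(points, period=30, offset=0, max_gap=60))
# [(0, 0.0), (30, 45.0), (60, 15.0)]
```

Grid points 90 through 180 produce no output in this example because the surrounding inputs (at 65 and 200) are further apart than the window.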
  • 27. • Expose arrays only inside the windowing operator
    var query = uniformSignal
      .Window(512, 256,
              w => w.FFT().Select(a => f(a)).IFFT(),
              a => a.Sum());
    Diagram: Uniform signal ➞ WIN ➞ FFT ➞ f ➞ IFFT ➞ UNWIN (AGG) ➞ Uniform signal
    • DSP pipeline & arrays instantiated only once ➞ better data management
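A minimal Python sketch of the window, per-window pipeline, unwindow pattern. It assumes that the two numeric arguments of `Window(512, 256, ...)` are window size and hop size, and that the final `a => a.Sum()` combines overlapping window outputs by summing them per sample. The helper `window_apply_unwindow` is a hypothetical name, and `op` stands in for any array-to-array pipeline such as FFT ➞ f ➞ IFFT.

```python
def window_apply_unwindow(samples, size, hop, op):
    """Apply `op` to each hopping window and sum overlapping outputs.

    samples: list of numbers (a uniformly sampled signal).
    op: function taking a list of `size` values and returning `size` values.
    """
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - size + 1, hop):
        window = samples[start:start + size]   # WIN: array exposed to the DSP op
        result = op(window)                    # per-window DSP pipeline
        for j, value in enumerate(result):     # UNWIN with a Sum aggregate
            out[start + j] += value
    return out


# With the identity pipeline, samples covered by two windows are counted twice.
print(window_apply_unwindow([1, 2, 3, 4, 5, 6], size=4, hop=2, op=lambda w: w))
# [1.0, 2.0, 6.0, 8.0, 5.0, 6.0]
```

This version allocates a fresh list per window. The slides note that TrillDSP instead instantiates the pipeline and its arrays once and reuses them.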
  • 28. • DSP experts write array-array operators • Incremental DSP operators • Leverage Trill’s grouping power! OLD NEW WindowHop FFT f IFFT
  • 29. Per sensor: Windowed FFT ➞ Function ➞ Inverse FFT ➞ Unwindow
    [Chart: normalized time to TrillDSP on 16 cores vs. hop size (4 to 256) for TrillDSP (1 core), MATLAB, SparkR (16 cores), SciDB-R (16 cores)]
    Pre-loaded datasets in memory • 100 groups in stream
    Up to 2 OOM faster than others
    Performance benefits from: • Efficient group processing, group-aware DSP windowing • Using circular arrays to manage overlapping windows • TrillDSP uses FFTW library
  • 30.
  • 31.
  • 32.
  • 33. [Chart: running time (secs) for queries Q1 to Q22, Spark + Parquet vs. Spark + JSON (Jackson); y-axis 0 to 100, one off-scale value labeled 152] JSON: >80% time is on parsing!
  • 34.
  • 35. Speculation Level Structural Index Level Fields: “id” Logical positions: “id” is the 3rd attribute Physical positions: “id” is at the 20th byte Speculation Fields: “id” Physical positions: “id” is at the 20th byte
  • 37. [Chart: running time (secs) for queries Q1 to Q22: Spark + Parquet, Spark + JSON (Jackson), Spark + JSON (Mison)] Spark+Mison is ~10X faster than Spark+Jackson. Spark+Mison has comparable performance with Spark+Parquet in most cases.
  • 38.
  • 39.
  • 40. rich space temporal logic • Transfer ShardedStreamable
  • 41. shards • querying • data movement • keying
    Operation | Description
    Query | Applies unmodified query on each (keyed) shard
    Broadcast | Duplicate each shard’s contents on all shards
    Multicast | Copy tuples from each input shard to zero or more specific result shards
    ReShard | Load balance across shards
    ReDistribute | Move tuples so that same key resides in same result shard
    ReKey | Changes key associated with each row in each shard
    …
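Three of the operations in this table can be sketched in Python, modelling a sharded dataset as a list of lists. The functions `query`, `broadcast` and `redistribute` are hypothetical stand-ins named after the table rows, not Quill's API. The `key % number_of_shards` placement rule is an assumption chosen to keep the example deterministic, so keys must be integers here.

```python
from collections import Counter


def query(shards, fn):
    """Query: apply the same unmodified function to every shard."""
    return [fn(shard) for shard in shards]


def broadcast(shards):
    """Broadcast: every result shard holds the contents of all input shards."""
    everything = [row for shard in shards for row in shard]
    return [list(everything) for _ in shards]


def redistribute(shards, key_fn):
    """ReDistribute: move rows so that equal keys land in the same shard."""
    result = [[] for _ in shards]
    for shard in shards:
        for row in shard:
            result[key_fn(row) % len(shards)].append(row)  # assumed placement rule
    return result


shards = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
moved = redistribute(shards, key_fn=lambda row: row[0])
print(moved)
# [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]

# After redistribution a per-key count can run independently on each shard.
print(query(moved, lambda shard: dict(Counter(row[0] for row in shard))))
# [{2: 1}, {1: 2, 3: 1}]
```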
  • 42.
  • 43. e => e.Count() Flat re- distribute e => e.Count() e => e.Sum()
  • 44. (l,r) => l.Join(r, …) (l,r) => l.Join(r, …) Flat re- distribute Flat broadcast No data movement
  • 45. str => str.SlidingWindow(Y).Count() .Where(c => c > threshold) (l, r) => l.WhereNotExists(y) str => str.HoppingWindow(Z).Count()
  • 47. Scan (Quill vs. SparkSQL) Time taken & scheduling overhead
  • 48. Grouped agg with 40M groups Hopping window (Github data)
  • 49.
  • 51.
  • 52.