From Trill to Quill and Beyond

• Real-time
raise alerts
• Real-time with historical
• Correlate
• Offline
• Develop initial monitoring query
• Back-test
• Progressive
Non-temporal analysis
Engine
+ Fabric
Interactive Query Authoring
Real-Time
Dashboard

• Performance
• Fabric & language integration
• Query model
Scenarios
• monitor
telemetry &
raise alerts
• correlate real-
time with logs
• develop initial
monitoring
query
• back-test over
historical logs
• offline analysis
(BI) with early
results

• Performance
• Fabric & language integration
• Query model

Q
1
2
3
2
1
5min Window
snapshots
logical time
Input
T-1
T-2
T-3
Output
Q = COUNT(*)
3
Relational
Model
Tempo-Relational
Model
QQQ Q Q𝜹𝜹𝜹 𝜹 𝜹
Supports broad & rich analytics
scenarios (relational, progressive,
time-based)

• Key enabler: performance +
fabric & language integration +
query model

struct ClickEvent { long ClickTime; long User; long AdId; }
var str = Network.ToStream(e => e.ClickTime, Latency(10secs));
var query =
str.Where(e => e.User % 100 < 5)
.Select(e => { e.AdId })
.GroupApply( e => e.AdId,
s => s.Window(5min).Aggregate(w => w.Count()));
query.Subscribe(e => Console.Write(e)); // write results to console

stream of batches
• More load  larger batches  better throughput
…
𝑜𝑝2
…
…
𝑜𝑝1

class DataBatch {
long[] SyncTime;
...
Bitvector BV;
}
class UserData_Gen : DataBatch {
long[] c_ClickTime;
long[] c_User;
long[] c_AdId;
}
…
𝑜𝑝2
…
…
𝑜𝑝1
timestamp payload columns
bitvector

str.Where(e => e.User % 100<5);
Send(events)
...
Application
Receive(results)
On(Batch b) {
for i = 0 to b.Size {
if !(b.c_User[i]%100 < 5)
set b.bitvector[i]
}
next-operator.On(b)
}
Trill

session windows,
http://aka.ms/trill

• Lots of “signals” in stream data
• IoT workflows combine relational & signal logic
M
Group-by ID
U
Union
ID Time Value
0 0:42:19 67
1 0:42:22 80
2 0:42:22 85
0 0:42:23 69
2 0:42:24 85
Remove noise
Interpolate missing data
Find periodicity
Discard invalid data
Correlate live data w/ history
σ ⋈ DSP
DSPσ ⋈
19
Which tools to use
to build such apps?

Data Processing
expert
Digital Signal
Processing expert
Engines: stream engines, DBMS, MPP systems
Data model: (tempo)-relational
Language: declarative (SQL, LINQ, functional)
Scenarios: real-time, offline, progressive
Engines: MATLAB, R
Data model: array
Language: imperative (array languages, C)
Scenarios: mostly offline, real-time
How to reconcile
two worlds?
Our solution:
• high-performance (2 OOM faster)
• one query language
• familiar abstractions to both worlds

1. Window
2. Per window: pipeline DSP ops
3. Unwindow
x[n]
x2
y[n]
x0
x1
y0
y1
y2
Per
Device
+
+

• Stream engine for relational
queries
• R for highly-optimized DSP
operations
• Problem: impedance mismatch
x2
+
+
x0
x1
y0
y1
y2
R
STREAM PROCESSING
SYSTEM

• Unified query model
• Non-uniform & uniform signals
• Type-safe mix of stream & signal operators
• Array-based extensibility framework
• DSP operator writer sees arrays
• Supports incremental computation
• “Walled garden” on top of Trill
• No changes in data model
• Inherits Trill’s efficient processing capability
(e.g., grouped computation)
TRILL DSP

Time
Input
events
e1
e2
e3
e4
e5 Time
Aggregated
events
1 1 1 1212
STREAMABLE SIGNALSTREAMABLE
var signal = stream.Where(e => e.Value < 100).Count()
STREAMS
SIGNALS
• Transition to signal domain
• E.g., result of an aggregate query
• Using stream operators to build signal operators
• E.g., adding two signals as a temporal join of two streams
left.Join(right, (l, r) => l + r)
Type-safe operations

• Sampling with interpolation
Time
Input
events
misaligned missing
30 60 90 120 150 180 210
Time
Output
events
30 60 90 120 150 180 210
interpolated
var uniformSignal = signal.Sample(30, 0, ip => ip.Linear(60));
Interpolation window
STREAMS
SIGNALS
UNIFORM

• Expose arrays only inside the windowing operator
var query = uniformSignal
.Window(512, 256,
w => w.FFT().Select(a => f(a)).IFFT(),
a => a.Sum())
)
Uniform signal Uniform signal
UNWIN
AGGFFT f IFFTWIN
• DSP pipeline & arrays instantiated only once ➞ better data
management

• DSP experts write array-array
operators
• Incremental DSP operators
• Leverage Trill’s grouping power!
OLD NEW
WindowHop
FFT f IFFT

4
8
16
32
64
128
256 230 179 128 76 25
HOP SIZE
TrillDSP (1 core) MATLAB
SparkR (16 cores) SciDB-R (16 cores)
Per sensor: Windowed FFT ➞ Function ➞ Inverse FFT ➞ Unwindow
NORMALIZED TIME TO TRILLDSP ON 16 CORES Pre-loaded datasets in
memory
• 100 groups in stream
Up to 2 OOM faster than
others
Performance benefits from:
• Efficient group processing,
group-aware DSP windowing
• Using circular arrays to manage
overlapping windows
• TrillDSP uses FFTW library

33
0
20
40
60
80
100
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22
RunningTime(Secs)
Spark + Parquet Spark + JSON (Jackson)
JSON: >80% time is on parsing!
152

Speculation Level
Structural Index Level
Fields:
“id”
Logical positions:
“id” is the 3rd attribute
Physical positions:
“id” is at the 20th byte
Speculation
Fields:
“id”
Physical positions:
“id” is at the 20th byte
35

0.0
0.5
1.0
1.5
2.0
Gson Jackson Mison
ParsingSpeed(GB/s)
36

37
0
20
40
60
80
100
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22
RunningTime(Secs)
Spark + Parquet Spark + JSON (Jackson) Spark + JSON (Mison)
Spark+Mison is ~10X faster than Spark+Jackson
Spark+Mison has comparable performance with Spark+Parquet in the most cases

rich space
temporal logic
• Transfer
ShardedStreamable

shards
• querying
• data movement
• keying
Operation Description
Query Applies unmodified query on each
(keyed) shard
Broadcast Duplicate each shard’s contents on
all shards
Multicast Copy tuples from each input shard
to zero or more specific result
shards
ReShard Load balance across shards
ReDistribute Move tuples so that same key
resides in same result shard
ReKey Changes key associated with each
row in each shard
…
…
…
…

e => e.Count()
Flat re-
distribute
e => e.Count()
e => e.Sum()

(l,r) => l.Join(r, …)
(l,r) => l.Join(r, …)
Flat re-
distribute
Flat
broadcast
No data
movement

str => str.SlidingWindow(Y).Count()
.Where(c => c > threshold)
(l, r) => l.WhereNotExists(y)
str => str.HoppingWindow(Z).Count()

Scan (Quill vs. SparkSQL) Time taken & scheduling overhead

Grouped agg with 40M groups Hopping window (Github data)

http://github.com/Microsoft/CRA

https://www.microsoft.com/en-us/research/people/badrishc/

From Trill to Quill and Beyond

From Trill to Quill and Beyond

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to From Trill to Quill and Beyond

Similar to From Trill to Quill and Beyond (20)

Recently uploaded

Recently uploaded (20)

From Trill to Quill and Beyond