Badrish Chandramouli @ DEBS 2016
From Trill to Quill: Pushing the Envelope of Functionality and Scale
In this talk, I give an overview of Trill, describe two projects that expand Trill's functionality, and describe Quill, a new multi-node offline analytics system I have been working on at MSR.


  1. Badrish Chandramouli @ DEBS 2016
  2. • Real-time raise alerts
     • Real-time with historical
     • Correlate
     • Offline
     • Develop initial monitoring query
     • Back-test
     • Progressive
     Non-temporal analysis; Engine + Fabric; Interactive Query Authoring; Real-Time Dashboard
  3. • Performance
     • Fabric & language integration
     • Query model
     Scenarios
     • monitor telemetry & raise alerts
     • correlate real-time with logs
     • develop initial monitoring query
     • back-test over historical logs
     • offline analysis (BI) with early results
  4. • Performance
     • Fabric & language integration
     • Query model
  5. Relational Model and Tempo-Relational Model
     [Figure: with Q = COUNT(*) over input T-1, T-2, T-3, the relational model outputs 3; the tempo-relational model applies Q to snapshots along logical time (5min Window), with outputs 1, 2, 3, 2, 1]
     Supports broad & rich analytics scenarios (relational, progressive, time-based)
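The snapshot semantics on slide 5 can be illustrated with a small sketch. It is written in Python rather than Trill's C#, and `snapshot_counts` is a hypothetical helper, not part of Trill. It assumes each input event stays live for one window length after its timestamp, and it reports COUNT(*) for every interval of logical time over which the set of live events does not change.

```python
def snapshot_counts(event_times, window):
    """Return (start, end, count) for each interval with a constant, nonzero count.

    Assumption for illustration: an event at time t is live during [t, t + window).
    """
    changes = {}
    for t in event_times:
        changes[t] = changes.get(t, 0) + 1                    # event becomes live
        changes[t + window] = changes.get(t + window, 0) - 1  # event expires

    result = []
    live = 0
    points = sorted(changes)
    for here, nxt in zip(points, points[1:]):
        live += changes[here]
        if live > 0:
            result.append((here, nxt, live))
    return result


# Three events one minute apart, 5-minute window (times in minutes).
intervals = snapshot_counts([0, 1, 2], window=5)
counts = [count for _, _, count in intervals]
```

With three events at minutes 0, 1, and 2, `counts` is `[1, 2, 3, 2, 1]`, the same output sequence the slide shows for the tempo-relational case.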
  6. • Key enabler: performance + fabric & language integration + query model
  7. struct ClickEvent { long ClickTime; long User; long AdId; }
     var str = Network.ToStream(e => e.ClickTime, Latency(10secs));
     var query = str.Where(e => e.User % 100 < 5)
                    .Select(e => new { e.AdId })
                    .GroupApply(e => e.AdId,
                                s => s.Window(5min).Aggregate(w => w.Count()));
     query.Subscribe(e => Console.Write(e)); // write results to console
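What the slide 7 query computes can be sketched without Trill. The Python below is an illustration only: `ClickEvent` mirrors the struct on the slide, `count_clicks_per_ad` is a hypothetical name, and the sketch assumes `Window(5min)` means a tumbling window (the slide does not say which kind). The `Latency(10secs)` reordering step is not modeled.

```python
from collections import defaultdict, namedtuple

ClickEvent = namedtuple("ClickEvent", ["click_time", "user", "ad_id"])

WINDOW_SECONDS = 5 * 60  # assumption: 5-minute tumbling windows


def count_clicks_per_ad(events, window=WINDOW_SECONDS):
    """Keep a ~5% sample of users, then count clicks per (window start, ad)."""
    counts = defaultdict(int)
    for e in events:
        if e.user % 100 < 5:  # Where(e => e.User % 100 < 5)
            window_start = e.click_time - e.click_time % window
            counts[(window_start, e.ad_id)] += 1  # GroupApply by AdId, Count()
    return dict(counts)


events = [
    ClickEvent(click_time=10, user=3, ad_id=7),
    ClickEvent(click_time=20, user=104, ad_id=7),
    ClickEvent(click_time=30, user=50, ad_id=7),   # filtered out: 50 % 100 >= 5
    ClickEvent(click_time=310, user=1, ad_id=7),   # falls in the next window
    ClickEvent(click_time=40, user=2, ad_id=9),
]
result = count_clicks_per_ad(events)
```

Here `result` maps `(0, 7)` to 2, `(300, 7)` to 1, and `(0, 9)` to 1.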
  8. stream of batches
     • More load → larger batches → better throughput
     [Diagram: batches flowing between operators op1 and op2]
  9. class DataBatch {
       long[] SyncTime;
       ...
       Bitvector BV;
     }
     class UserData_Gen : DataBatch {
       long[] c_ClickTime;
       long[] c_User;
       long[] c_AdId;
     }
     [Diagram: a batch between op1 and op2, laid out as timestamp, payload columns, bitvector]
  10. str.Where(e => e.User % 100 < 5);
      Application: Send(events) ... Receive(results)
      Trill:
      On(Batch b) {
        for i = 0 to b.Size {
          if !(b.c_User[i] % 100 < 5) set b.bitvector[i]
        }
        next-operator.On(b)
      }
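The columnar batch from slide 9 and the filter loop from slide 10 can be mimicked in a few lines. This is a sketch under assumptions: `ClickBatch`, `where_sampled_users`, and `visible_ad_ids` are hypothetical names, a Python list of booleans stands in for the bitvector, and a set bit is taken to mean "row filtered out", as the pseudocode on the slide suggests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClickBatch:
    """One column per field, plus a per-row filter flag (the 'bitvector')."""
    sync_time: List[int]
    c_click_time: List[int]
    c_user: List[int]
    c_ad_id: List[int]
    filtered: List[bool] = field(default_factory=list)

    def __post_init__(self):
        if not self.filtered:
            self.filtered = [False] * len(self.sync_time)


def where_sampled_users(batch):
    """Mark rows that fail the predicate instead of copying the survivors."""
    for i in range(len(batch.c_user)):
        if not (batch.c_user[i] % 100 < 5):
            batch.filtered[i] = True
    return batch  # the same batch object is handed to the next operator


def visible_ad_ids(batch):
    """Downstream operators skip rows whose flag is set."""
    return [ad for ad, hidden in zip(batch.c_ad_id, batch.filtered) if not hidden]


batch = ClickBatch(sync_time=[1, 2, 3], c_click_time=[1, 2, 3],
                   c_user=[3, 50, 104], c_ad_id=[7, 8, 9])
where_sampled_users(batch)
```

After the filter, `batch.filtered` is `[False, True, False]` and `visible_ad_ids(batch)` returns `[7, 9]`. The filter only touches the `c_user` column and the flags, which is the point of the columnar layout.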
  11. Func<TState> InitialState();
      Func<TState, long, TInput, TState> Accumulate();
      Func<TState, long, TInput, TState> Deaccumulate();
      Func<TState, TState, TState> Sum();
      Func<TState, TState, TState> Difference();
      Func<TState, TResult> ComputeResult();
      InitialState: () => 0
      Accumulate: (oldCount, timestamp, input) => oldCount + 1
      Deaccumulate: (oldCount, timestamp, input) => oldCount - 1
      Sum: (leftCount, rightCount) => leftCount + rightCount
      Difference: (leftCount, rightCount) => leftCount - rightCount
      ComputeResult: count => count
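The six functions on slide 11 are enough to drive an incremental windowed aggregate. The Python sketch below mirrors the count example from the slide; the `sliding_window` driver is a hypothetical illustration of how an engine might call these functions, not Trill's actual operator, and it assumes in-order input with a window covering `(timestamp - window, timestamp]`.

```python
from collections import deque


class CountAggregate:
    """Python mirror of the count example on the slide."""

    def initial_state(self):
        return 0

    def accumulate(self, state, timestamp, value):
        return state + 1

    def deaccumulate(self, state, timestamp, value):
        return state - 1

    def sum(self, left, right):          # combine two partial states
        return left + right

    def difference(self, left, right):   # remove one partial state from another
        return left - right

    def compute_result(self, state):
        return state


def sliding_window(agg, events, window):
    """events: (timestamp, value) pairs in timestamp order."""
    state = agg.initial_state()
    live = deque()
    out = []
    for ts, value in events:
        # Expire events that have left the window by deaccumulating them.
        while live and live[0][0] <= ts - window:
            old_ts, old_value = live.popleft()
            state = agg.deaccumulate(state, old_ts, old_value)
        state = agg.accumulate(state, ts, value)
        live.append((ts, value))
        out.append((ts, agg.compute_result(state)))
    return out


agg = CountAggregate()
results = sliding_window(agg, [(0, "a"), (1, "b"), (2, "c"), (5, "d"), (6, "e")], window=5)
```

`results` is `[(0, 1), (1, 2), (2, 3), (5, 3), (6, 3)]`. The driver never rescans the window: each event is accumulated once and deaccumulated once. `sum` and `difference` are not used by this driver; they would let an engine merge or subtract partial states computed separately.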
  12. session windows, http://aka.ms/trill
  14. • Increasing interest in real-time processing over out-of-order streams
      [Chart: refresh every second]
  15. Up to 8X faster
  16. use existing high-perf in-order Trill operators unchanged
  17. Low-latency / Completeness: 1 sec, 98%; 1 hour, 100%; ?
      [Charts: cloud telemetry log, 10 seconds, refresh every second]
  18. Impatience framework gives us low latency, high completeness, high throughput, and low memory usage
                                        Latency     Completeness
      {1 sec}                           ~ 1 sec     98%
      {1 hour}                          ~ 1 hour    100%
      {1 sec} + {1 min} + {1 hour}      ~ 1 sec     100%
      {1 sec, 1 min, 1 hour}            ~ 1 sec     100%
      [Charts: Throughput (million/sec) and Memory usage (MB, log scale) for {1sec, 1min, 1hour} vs. {1sec}+{1min}+{1hour}]
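The slides do not spell out how a set of reorder latencies such as {1 sec, 1 min, 1 hour} is used. One possible reading, sketched below in Python, is that each event is routed to the smallest latency that tolerates its lateness, so that every partition can be sorted and processed in order. Everything here is an assumption made for illustration: the function names are hypothetical, and lateness is simplified to arrival time minus event time.

```python
import bisect


def partition_by_lateness(events, latencies):
    """events: (event_time, arrival_time) pairs; latencies: ascending list.

    Returns one event-time-sorted list per latency, plus the events that are
    later than the largest latency (treated as dropped in this sketch).
    """
    partitions = [[] for _ in latencies]
    dropped = []
    for event_time, arrival_time in events:
        lateness = arrival_time - event_time
        idx = bisect.bisect_left(latencies, lateness)  # smallest latency >= lateness
        if idx == len(latencies):
            dropped.append((event_time, arrival_time))
        else:
            partitions[idx].append((event_time, arrival_time))
    for p in partitions:
        p.sort()
    return partitions, dropped


def cumulative_completeness(partitions, total):
    """Fraction of all events available once each latency has elapsed."""
    seen = 0
    fractions = []
    for p in partitions:
        seen += len(p)
        fractions.append(seen / total)
    return fractions


events = [(100, 100), (99, 101), (50, 102), (10, 3000), (0, 5000)]
partitions, dropped = partition_by_lateness(events, [1, 60, 3600])
completeness = cumulative_completeness(partitions, len(events))
```

In this toy input, one event is on time, two are up to a minute late, one is up to an hour late, and one is dropped, so `completeness` is `[0.2, 0.6, 0.8]`. The shape matches the trade-off in the table: the shortest latency answers first with partial data, and longer latencies fill in the rest.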
  20. no overlapping lifetimes [chart]
  21. data streams and operations
      arrays of numerical values
  26. rich space
      temporal logic
      • Transfer
      ShardedStreamable
  27. shards
      • querying
      • data movement
      • keying
      Operation      Description
      Query          Applies unmodified query on each (keyed) shard
      Broadcast      Duplicates each shard's contents on all shards
      Multicast      Copies tuples from each input shard to zero or more specific result shards
      ReShard        Load-balances across shards
      ReDistribute   Moves tuples so that same key resides in same result shard
      ReKey          Changes key associated with each row in each shard
      …              …
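Four of the operations in the slide 27 table can be sketched over plain Python lists. This is an illustration under assumptions, not Quill's API: a sharded dataset is modeled as a list of shards, each shard is a list of `(key, payload)` rows, and the function names simply echo the table. Multicast and ReShard are omitted.

```python
from collections import Counter


def query(shards, fn):
    """Query: apply the same unmodified function to every shard."""
    return [fn(shard) for shard in shards]


def broadcast(shards):
    """Broadcast: every result shard holds the contents of all input shards."""
    everything = [row for shard in shards for row in shard]
    return [list(everything) for _ in shards]


def rekey(shards, key_fn):
    """ReKey: change the key of each row; no data moves between shards."""
    return [[(key_fn(payload), payload) for _, payload in shard] for shard in shards]


def redistribute(shards):
    """ReDistribute: move rows so that equal keys land in the same shard."""
    result = [[] for _ in shards]
    for shard in shards:
        for key, payload in shard:
            result[hash(key) % len(shards)].append((key, payload))
    return result


clicks = [
    [(None, {"ad": "a", "user": 1}), (None, {"ad": "b", "user": 2})],
    [(None, {"ad": "a", "user": 3})],
]
keyed = redistribute(rekey(clicks, lambda p: p["ad"]))
per_shard_counts = query(keyed, lambda shard: Counter(key for key, _ in shard))
```

Because `redistribute` places all rows for a key in one shard, the per-shard counts in `per_shard_counts` are already final for each key and can be unioned without further merging. Which shard a given key lands on depends on `hash` and may vary between runs.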
  29. [Diagram: e => e.Count(); flat re-distribute; e => e.Count(); e => e.Sum()]
  30. [Diagram: plans for e => e.Count() and e => e.Sum() built from [ReKey], [ReDist], Union, and AGG stages]
  31. (l,r) => l.Join(r, …)
      (l,r) => l.Join(r, …)
      Flat re-distribute; Flat broadcast; No data movement
  32. str => str.SlidingWindow(Y).Count()
             .Where(c => c > threshold)
      (l, r) => l.WhereNotExists(y)
      str => str.HoppingWindow(Z).Count()
  35. Scan (Quill vs. SparkSQL); Time taken & scheduling overhead
  36. Grouped agg with 40M groups; Hopping window (GitHub data)
  37. http://badrish.net/papers/shrink-TR.pdf
  38. https://www.microsoft.com/en-us/research/people/badrishc/
      http://aka.ms/streams/
