Modern real-time streaming architectures
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.

1. A Tutorial: Modern Real-Time Streaming Architectures. Karthik Ramasamy*, Sanjeev Kulkarni*, Neng Lu#, Arun Kejariwal^ and Sijie Guo*. *Streamlio, #Twitter, ^MZ
2. OUTLINE: MESSAGING, STREAMING, OPERATIONS, DATA SKETCHES, LAMBDA/KAPPA, UNIFICATION
3. Information Age. Real time is key.
4. Internet of Things (IoT). Large market potential: $1.9T in value by 2020 - Mfg (15%), Health Care (15%), Insurance (11%); 26B-75B units [2, 3, 4, 5]. Improve operational efficiencies, customer experience, new business models. Beacons: retailers and bank branches, a 60M-unit market by 2019 [6]. Smart buildings: reduce energy costs, cut maintenance costs, increase safety & security.
5. The Future: Mobile (exponential growth [1]), Biostamps [2], Sensor Network. [1] http://opensignal.com/assets/pdf/reports/2015_08_fragmentation_report.pdf [2] http://www.ericsson.com/thinkingahead/networked_society/stories/#/film/mc10-biostamp
6. Intelligent Health Care: continuous monitoring. Tracking movements: measure the effect of social influences. Google Lens: measure glucose level in tears. Watch/wristband. Smart textiles: skin temperature, perspiration. Ingestible sensors: medication compliance [1], heart function.
7. User Experience, Productivity: real time. Real-time video streams, news, drones, robotics. Industry: $40B by 2020 [3] [2].
8. Increasingly Connected World. Internet of Things: 30B connected devices by 2020. Health Care: 153 exabytes (2013) -> 2,314 exabytes (2020). Machine Data: 40% of the digital universe by 2020. Connected Vehicles: data transferred per vehicle per month 4 MB -> 5 GB. Digital Assistants (Predictive Analytics): $2B (2012) -> $6.5B (2019) [1]; Siri/Cortana/Google Now. Augmented/Virtual Reality: $150B by 2020 [2]; Oculus/HoloLens/Magic Leap.
  9. 9. 9 TO STREAMING
10. Traditional Data Processing. Challenges: introduces too much "decision latency"; responses are delivered "after the fact"; the maximum value of the identified situation is lost; decisions are made on old and stale data. Data at rest: Store -> Analyze -> Act.
11. The New Era: Streaming Data/Fast Data. Events are analyzed and processed in real time as they arrive. Decisions are timely, contextual and based on fresh data. Decision latency is eliminated. Data in motion.
12. Real Time Use Cases: algorithmic trading, online fraud detection, geo-fencing, proximity/location tracking, intrusion detection systems, traffic management, real-time recommendations, churn detection, Internet of Things, social media/data analytics, gaming data feeds.
13. Requirements of Stream Processing. In-stream: process data as it passes by. Handle imperfections: delayed, missing and out-of-order data. Predictable and repeatable. Performance and scalability.
14. Requirements of Stream Processing (continued). High-level languages: SQL or DSL. Integrate stored and streaming data: for comparing the present with the past. Data safety and availability. Process and respond: the application should keep up at high volumes.
15. Real Time Stack: Collectors, Compute, Messaging, Storage.
  16. 16. 16 of MESSAGING FRAMEWORKS An In-Depth
  17. 17. 17 Current Messaging Systems 01 02 03 04 05 06 07 08 ActiveMQ RabbitMQ Pulsar RocketMQ Azure Event Hub Google Pub-Sub Satori Kafka
18. Why Apache Pulsar? Ordering: guaranteed ordering. Multi-tenancy: a single cluster can support many tenants and use cases. High throughput: can reach 1.8M messages/s in a single partition. Durability: data replicated and synced to disk. Geo-replication: out-of-the-box support for geographically distributed applications. Unified messaging model: supports both topic and queue semantics in a single model. Delivery guarantees: at least once, at most once and effectively once. Low latency: publish latency of 5 ms at the 99th percentile. Highly scalable: can support millions of topics.
19. Unified Messaging Model: Producer (X) and Producer (Y) publish to Topic (T); Subscription (A) serves Consumer (A1); Subscription (B) serves Consumers (B1), (B2), (B3).
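A minimal sketch of how the unified model maps onto queue and pub-sub semantics, assuming the same Pulsar 1.x-style Java client shown on the neighboring slides (ConsumerConfiguration, SubscriptionType); topic and subscription names are illustrative.

PulsarClient client = PulsarClient.create("http://broker.usw.example.com:8080");

// "Queue" semantics: a Shared subscription round-robins messages across its consumers
ConsumerConfiguration sharedConf = new ConsumerConfiguration();
sharedConf.setSubscriptionType(SubscriptionType.Shared);
Consumer worker = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "subscription-B", sharedConf);

// "Topic" (pub-sub) semantics: an Exclusive subscription delivers every message,
// in order, to a single consumer
ConsumerConfiguration exclusiveConf = new ConsumerConfiguration();
exclusiveConf.setSubscriptionType(SubscriptionType.Exclusive);
Consumer subscriber = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "subscription-A", exclusiveConf);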
20. Pulsar Producer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");
// handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
21. Pulsar Consumer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");
while (true) {
    // Wait for a message
    Message msg = consumer.receive();
    // getData() returns a byte[]; decode it before printing
    System.out.println("Received message: " + new String(msg.getData()));
    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}
22. Pulsar Architecture: stateless serving. Brokers 1-3 sit in front of Bookies 1-5 (Apache BookKeeper). BROKER: clients interact only with brokers; no state is stored in brokers. BOOKIES: Apache BookKeeper as the storage; storage is append-only; provides high performance and low latency. Durability: no data loss; fsync before acknowledgement.
23. Pulsar Architecture: separation of storage and serving. SERVING: brokers can be added independently; traffic can be shifted quickly across brokers. STORAGE: bookies can be added independently; new bookies will ramp up traffic quickly.
24. Pulsar Architecture: clients. CLIENTS: look up the correct broker through service discovery; establish connections to brokers; enforce authentication/authorization during connection establishment; establish producer/consumer sessions; reconnect with a backoff strategy. (Broker components: dispatcher, load balancer, managed ledger, cache, global replication, service discovery.)
25. Pulsar Architecture: message dispatching. DISPATCHER: end-to-end async message processing; messages relayed across producers, bookies and consumers with no copies; pooled reference-counted buffers. MANAGED LEDGER: abstraction of single-topic storage; caches recent messages.
26. Pulsar Architecture: geo-replication. GEO REPLICATION: asynchronous replication; integrated in the broker message flow; simple configuration to add/remove regions. (Topic T1 is replicated across Data Centers A, B and C, with producers P1-P3 and consumers C1, C2 attached to local subscriptions S1.)
27. Pulsar Use Cases - Message Queue. Online events feed Topic (T), which is consumed by Workers 1-3, decoupling online and offline work. MESSAGE QUEUES: decouple online from background processing; high availability; reliable data transport; notifications; long-running tasks; low-latency publish.
28. Pulsar Use Cases - Feedback System. A controller publishes to Topic (T) to propagate state to serving systems. FEEDBACK SYSTEM: coordinate a large number of machines; propagate states. Examples: state propagation, personalization, ad systems, feedback updates.
29. Pulsar in Production: 3+ years; serves 2.3 million topics; 100 billion messages/day; average latency < 5 ms; 99th percentile 15 ms (with strong durability guarantees); zero data loss; 80+ applications; self-served provisioning; full-mesh cross-datacenter replication across 8+ data centers.
  30. 30. 30 Companies using Pulsar
  31. 31. 31 of STREAMING FRAMEWORKS An In-Depth
  32. 32. 32 Current Streaming Frameworks 01 02 03 04 05 06 07 08 Beam S-Store Spark Flink Heron Storm Apex KAFKA STREAMS
33. Apache Beam. Promises: abstracting the computation; expressing the computation; expressive windowing/triggering; incremental processing for late data; selectable engine (selection criteria: latency, resource cost). Supported engines: Google Dataflow, Apache Spark, Apache Flink, Apache Apex.
34. Apache Beam. Computation abstraction: all data is a 4-tuple - key, value, event time, and the window the tuple belongs to. Core operators: ParDo (user-supplied DoFn, emits zero or more elements) and GroupByKey (groups tuples by key within the window).
35. Apache Beam. Windowing: window assignment - fixed (a.k.a. tumbling), sliding, session, and pluggable windows; window merging happens during GroupByKey; window triggering - discard/accumulate/retract. (See the sketch below.)
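A minimal Beam Java sketch of the abstractions above - a user-supplied DoFn inside ParDo, fixed (tumbling) windows, and GroupByKey; the in-memory source and the runner configuration are placeholders, and triggers/late-data handling are omitted.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class WindowedCount {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(Create.of("to streaming", "streaming data"))        // placeholder source
     .apply(ParDo.of(new DoFn<String, KV<String, Integer>>() {  // user-supplied DoFn
       @ProcessElement
       public void processElement(ProcessContext c) {
         for (String w : c.element().split(" ")) {
           c.output(KV.of(w, 1));                               // emit zero or more elements
         }
       }
     }))
     .apply(Window.<KV<String, Integer>>into(
         FixedWindows.of(Duration.standardMinutes(1))))         // fixed (tumbling) windows
     .apply(GroupByKey.<String, Integer>create());              // groups by key within the window
    p.run().waitUntilFinish();
  }
}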
36. Apache Beam. Challenges: multiple layers (API vs. execution) make troubleshooting complex; need for higher-level APIs (multiple efforts under way); other cloud vendor buy-in (Azure/AWS)?
37. IBM S-Store. Promises: combine stream processing and transactions; extended an OLTP engine (H-Store), adding tuple ordering, windowing, push-based processing and exactly-once semantics.
38. IBM S-Store. Data and processing model: tuples are grouped into atomic batches (groupings of non-overlapping tuples, each treated like a transaction); atomic batches belong to one stream; processing is modeled as a DAG; DAG nodes consume one or more streams and possibly output more; node logic is treated as a transaction.
39. IBM S-Store. Exactly-once guarantees. Strong: inputs and outputs are logged at every DAG node; on component failure, the log is replayed from a snapshot. Weak: distributed snapshotting.
40. IBM S-Store. Challenges. Throughput: non-OLTP processing is much slower compared to modern systems. Scalability: multi-node support was still in research (2016).
41. Heron Terminology. Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples. Spouts: sources of data tuples for the topology; examples - Pulsar/Kafka/MySQL/Postgres. Bolts: process incoming tuples and emit outgoing tuples; examples - filtering/aggregation/join/any function.
42. Heron Topology: an example DAG with Spout 1, Spout 2 and Bolts 1-5.
43. Heron Topology - Physical Execution: each spout and bolt runs as multiple parallel instances.
44. Heron Groupings. Shuffle grouping: random distribution of tuples. Fields grouping: group tuples by a field or multiple fields. All grouping: replicates tuples to all tasks. Global grouping: sends the entire stream to one task.
45. Heron Topology - Physical Execution, annotated with shuffle and fields groupings on the edges.
46. Writing Heron Topologies. Procedural - low-level API: directly write your spouts and bolts (see the sketch below). Functional - mid-level API: use of maps, flat maps, transforms, windows. Declarative - SQL (coming): use of a declarative language - specify what you want and the system will figure it out.
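A sketch of the procedural (low-level) API using the Storm-compatible classes that Heron accepts (TopologyBuilder, groupings, StormSubmitter); package names differ across Heron versions, and RandomSentenceSpout, SplitBolt and CountBolt are assumed user-defined spout/bolt classes.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    // Spout: source of data tuples (2 parallel instances)
    builder.setSpout("sentences", new RandomSentenceSpout(), 2);
    // Bolt: split sentences into words, shuffle-grouped from the spout
    builder.setBolt("split", new SplitBolt(), 4).shuffleGrouping("sentences");
    // Bolt: count words, fields-grouped so the same word always lands on the same task
    builder.setBolt("count", new CountBolt(), 4).fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setNumWorkers(2);  // container/parallelism hint; exact knobs differ per scheduler
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
  }
}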
47. Heron Design Goals. Efficiency: reduce resource consumption. Support for diverse workloads: throughput- vs. latency-sensitive. Support for multiple semantics: at-most-once, at-least-once, effectively-once. Native multi-language support: C++, Java, Python. Task isolation: ease of debuggability/isolation/profiling. Support for backpressure: topologies should be self-adjusting. Use of containers: runs in schedulers - Kubernetes, DC/OS and many more. Multi-level APIs: procedural, functional and declarative, for diverse applications. Diverse deployment models: run as a service or as a pure library.
  48. 48. 48 Heron Architecture Scheduler Topology 1 Topology 2 Topology N Topology Submission
  49. 49. 49 Topology Master Monitoring of containers Gateway for metrics Assigns role
  50. 50. 50 Topology Architecture Topology Master ZK Cluster Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan DATA CONTAINER DATA CONTAINER Metrics Manager Metrics Manager Metrics Manager Health Manager MASTER CONTAINER
  51. 51. 51 Stream Manager Routes tuples Implements backpressure Ack management
52. Stream Manager: sample topology with spout S1 and bolts B2, B3, B4.
53. Stream Manager: physical execution - instances of S1, B2, B3 and B4 are spread across containers, each container with its own Stream Manager.
54. Stream Manager Backpressure: TCP backpressure, spout-based backpressure, stage-by-stage backpressure.
55. Stream Manager Backpressure: TCP-based backpressure slows upstream and downstream instances.
56. Stream Manager Backpressure: spout-based backpressure.
  57. 57. 57 Heron Instance Runs only one task (spout/bolt) Exposes Heron API Collects several metrics API G
  58. 58. 58 Heron Instance Stream Manager Metrics Manager Gateway Thread Task Execution Thread data-in queue data-out queue metrics-out queue
  59. 59. 59 Companies using Heron
  60. 60. 60 AND of STREAM PROCESSING APPLICATIONS
61. Pulsar Operations. Reacting to failures: brokers, bookies. Common issues: consumer backlog, I/O prioritization and throttling, multi-tenancy.
62. Reacting to Failures - Brokers. Brokers don't have durable state: they are easily replaceable, and topics are immediately reassigned to healthy brokers. Expanding capacity: simply add a new broker node; if other brokers are overloaded, traffic will be automatically assigned. Load manager: monitors traffic load on all brokers (CPU, memory, network, topics); initially places topics on the least-loaded brokers; reassigns topics when a broker is overloaded.
63. Reacting to Failures - Bookies. When a bookie fails, brokers will immediately continue on other bookies. The auto-recovery mechanism re-establishes the replication factor in the background. If a bookie keeps giving errors or timeouts, it will be "quarantined": not considered for new ledgers for some period of time.
64. Consumer Backlog. Metrics are available to make assessments: when the problem started; how big the backlog is (messages? disk space?); how fast it is draining; what the ETA is to catch up with publishers. Establish where the bottleneck is: the application is not fast enough, or disk read I/O.
65. I/O Prioritization and Throttling. Prioritize access to I/O: during an outage many tenants might try to drain their backlog as fast as they can, and read I/O becomes the bottleneck. Throttling can be used to prioritize draining: critical use cases can recover quickly; fewer concurrent readers lead to higher throughput; once they catch up, messages will be dispatched from cache.
66. Enforcing Multi-Tenancy. Ensure tenants don't cause performance issues for other tenants: backlog quotas, soft isolation, flow control, throttling. In cases where user behavior is triggering performance degradation, hard isolation is a last resort for quick reaction while a proper fix is deployed: isolate the tenant on a subset of brokers; this can also be applied at the BookKeeper level.
67. Heron @Twitter: largest cluster, 100's of topologies, billions of messages, 100's of terabytes, reduced incidents, good night's sleep; 3x-5x reduction in resource usage.
  68. 68. 68 Heron Deployment Topology 1 Topology 2 Topology N Heron Tracker Heron VIZ Heron Web ZK Cluster Aurora Services Observability
  69. 69. 69 Heron Visualization
  70. 70. 70 Heron Use Cases Monitoring Real  Time     Machine  Learning Ads Real  Time  Trends Product  Safety Real  Time  Business   Intelligence
  71. 71. 71 Heron Topology Complexity
  72. 72. 72 Heron Topology Scale CONTAINERS - 1 TO 600 INSTANCES - 10 TO 6000
73. Heron Happy Facts :) No more pages at midnight for the Heron team; very rare incidents for Heron customer teams; easy to debug during an incident for a quick turnaround; reduced resource utilization, saving cost.
74. Heron Developer Issues: container resource allocation; parallelism tuning.
75. Heron Operational Issues: slow hosts, network issues, data skew, load variations, SLA violations.
  76. 76. 76 Slow Hosts Memory Parity Errors Impending Disk Failures Lower GHZ
  77. 77. 77 Network Issues Network Slowness Network Partitioning G
78. Network Slowness: delays processing; data accumulates; the timeliness of results is affected.
79. Data Skew. Multiple keys: several keys map to a single instance and their combined count is high. Single key: a single key maps to an instance and its count is high.
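A tiny fragment, with assumed names, illustrating why data skew concentrates on one instance: a fields grouping typically hashes the key to pick the downstream task, so a hot key (or several colliding keys) keeps landing on the same Heron instance.

int numInstances = 8;
String key = "#breakingnews";                        // a hot key
int instance = Math.floorMod(key.hashCode(), numInstances);
// Every tuple carrying this key is routed to the same instance, so that
// instance's load grows with the key's frequency; distinct keys that hash
// to the same value pile onto it as well.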
80. Load Variations. Spikes: a sudden surge of data - short-lived vs. lasting for several minutes. Daily patterns: predictable changes in traffic.
81. Self-Regulating Streaming Systems: automate tuning, SLO maintenance. Tuning: the manual, time-consuming and error-prone task of tuning various system knobs to achieve SLOs. SLO maintenance: maintaining SLOs in the face of unpredictable load variations and hardware or software performance degradation. Self-regulating streaming systems: systems that adjust themselves to environmental changes and continue to produce results.
82. Self-Regulating Streaming Systems. Self-tuning: several tuning knobs and a time-consuming tuning phase; the system should take an SLO as input and automatically configure the knobs. Self-stabilizing: stream jobs are long-running and load variations are common; the system should react to external shocks and automatically reconfigure itself. Self-healing: system performance can be affected by hardware or software delivering degraded quality of service; the system should identify internal faults and attempt to recover from them.
83. Enter Dhalion. Dhalion periodically executes well-specified policies that optimize execution based on some objective. We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met. Dhalion is a policy-based framework integrated into Heron.
84. Dhalion Policy Framework: metrics feed symptom detectors (1..N), which produce symptoms; diagnosers (1..M) turn symptoms into diagnoses; a resolver is then selected and invoked (symptom detection -> diagnosis generation -> resolution). See the interface sketch below.
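An illustrative Java sketch of the three phases above; these interfaces are hypothetical stand-ins, not Dhalion's actual API.

import java.util.List;
import java.util.Map;

class Symptom { final String name; Symptom(String name) { this.name = name; } }
class Diagnosis { private final String name; Diagnosis(String name) { this.name = name; } String name() { return name; } }

// Hypothetical interfaces mirroring symptom detection, diagnosis generation and resolution.
interface SymptomDetector { List<Symptom> detect(Map<String, Double> metrics); }
interface Diagnoser { Diagnosis diagnose(List<Symptom> symptoms); }
interface Resolver { void resolve(Diagnosis diagnosis); }

class Policy {
  private final List<SymptomDetector> detectors;
  private final List<Diagnoser> diagnosers;
  private final Map<String, Resolver> resolvers;

  Policy(List<SymptomDetector> d, List<Diagnoser> g, Map<String, Resolver> r) {
    this.detectors = d; this.diagnosers = g; this.resolvers = r;
  }

  // Executed periodically: detect symptoms, generate diagnoses, select and invoke a resolver.
  void execute(Map<String, Double> metrics) {
    List<Symptom> symptoms = new java.util.ArrayList<>();
    for (SymptomDetector d : detectors) symptoms.addAll(d.detect(metrics));
    for (Diagnoser g : diagnosers) {
      Diagnosis diagnosis = g.diagnose(symptoms);
      if (diagnosis != null && resolvers.containsKey(diagnosis.name())) {
        resolvers.get(diagnosis.name()).resolve(diagnosis);
        return;  // one resolution per cycle
      }
    }
  }
}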
85. Dynamic Resource Provisioning. Policy: reacts to unexpected load variations (workload spikes). Goal: scale the topology resources up and down as needed, while keeping the topology in a steady state where backpressure is not observed.
86. Dynamic Resource Provisioning: implementation. Symptom detectors: pending tuples, backpressure, processing-rate skew. Diagnosers: resource over-provisioning, resource under-provisioning, slow instances, data skew. Resolvers: bolt scale-down, bolt scale-up, data skew, restart instances.
  87. 87. 87 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  88. 88. 88 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 150  |  80 150  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Under  provisioning
  89. 89. 89 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  90. 90. 90 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 50  |  05 50  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Slow  Instance
  91. 91. 91 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  92. 92. 92 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 50  |  05 150  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Data  Skew
93. Experimental Setup. Topology: Spout -> (shuffle grouping) Splitter Bolt -> (fields grouping) Counter Bolt. Hardware and software configuration: Microsoft HDInsight, Intel Xeon E5-2673 CPU @ 2.40 GHz, 28 GB of memory. Evaluation metrics: throughput of spouts (no. of tuples emitted over 1 min), throughput of bolts (no. of tuples emitted over 1 min), number of Heron instances provisioned.
94. Dynamic Provisioning Profile. (Figures: normalized throughput over time for Spout, Splitter Bolt and Counter Bolt with scale-up/scale-down events; number of bolts over time.) The dynamic resource provisioning policy is able to adjust the topology resources on the fly when workload spikes occur. The policy can correctly detect and resolve bottlenecks even on multi-stage topologies where backpressure is gradually propagated from one stage of the topology to another. Heron instances are gradually scaled up and down according to the input load.
  95. 95. Streaming and One-Pass Algorithms
  96. 96. DATA  CHARACTERISTICS UNBOUNDED
97. DATA CHARACTERISTICS: UNORDERED - varying skew between event time and processing time; correctness; completeness.
98. Netflix, HBO, Hulu, YouTube, Dailymotion, ESPN, Sling TV; Satori, Facebook Live, Periscope; Spotify, Pandora, Apple Music, Tidal; Amazon Twitch, YouTube Gaming, Microsoft Mixer.
  99. 99. KEY  CHARACTERISTICS LOW  LATENCY
  100. 100. HIGH  VELOCITY  DATA 100 KEY  CHARACTERISTICS In-Order O(1) Storage O(n) Time ONE PASS
  101. 101. 101 DISTRIBUTED COMPUTATION SCALE OUT ROBUST CHARACTERISTICS FAULT TOLERANCE KEY
102. DATA SKETCHES. Early work: The space complexity of approximating the frequency moments [Alon et al. 1996]; Counting [Morris, 1977]; Frequent elements [Misra and Gries, 1982]; Counting distinct elements [Flajolet and Martin, 1985]; Computing on data streams [Henzinger et al. 1998]; Median of a sequence [Munro and Paterson, 1980]; Membership [Bloom, 1970].
  103. 103. DATA  SKETCHES UNIQUE   FILTER   COUNT   HISTOGRAM   QUANTILE   MOMENTS   TOP-­‐K
104. ADVANCED DATA SKETCHES. Random projections, frequent directions: dimensionality reduction. Randomized numerical algebra: matrix multiply. Graphs: summarize adjacency; connectivity, k-connectivity, spanners, sparsification. Geometric: diameter, Lp distances, min-cost matchings; information distances, e.g., Hellinger distance. Sketching sketches: testing independence.
105. Applications: sampling (A/B testing), filtering (set membership), correlation (fraud detection), quantiles (network analysis), cardinality (site audience analysis).
106. Applications: moments (databases), frequent elements (trending hashtags), clustering (medical imaging), anomaly detection (sensor networks), subsequences (traffic analysis).
107. Sampling. Obtain a representative sample from a data stream; maintain a dynamic sample. A data stream is a continuous process: it is not known in advance how many points may elapse before an analyst needs to use a representative sample. Reservoir sampling [1]: probabilistic insertions and deletions on arrival of new stream points; the probability of insertion of new points reduces with the progression of the stream; an unbiased sample contains a larger and larger fraction of points from the distant history of the stream. Practical perspective: the data stream may evolve and hence the majority of the points in the sample may represent stale history. A minimal sketch follows. [1] J. S. Vitter. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11(1):37-57, March 1985.
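A minimal sketch of reservoir sampling in the spirit of Vitter's Algorithm R: keep the first k items, then let the ith item replace a random slot with probability k/i.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Maintains a uniform random sample of size k over a stream of unknown length.
class ReservoirSampler<T> {
  private final int k;
  private final List<T> reservoir;
  private final Random rng = new Random();
  private long seen = 0;

  ReservoirSampler(int k) {
    this.k = k;
    this.reservoir = new ArrayList<>(k);
  }

  void offer(T item) {
    seen++;
    if (reservoir.size() < k) {
      reservoir.add(item);                         // fill the reservoir with the first k items
    } else {
      long j = (long) (rng.nextDouble() * seen);   // uniform index in [0, seen)
      if (j < k) {
        reservoir.set((int) j, item);              // item i is kept with probability k/i
      }
    }
  }

  List<T> sample() {
    return new ArrayList<>(reservoir);
  }
}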
108. Sampling. Sliding-window approach (sample size k, window width n) [1]. Sequence-based: replace the expired element with the newly arrived element; disadvantage: highly periodic. Chain-sample approach: select the ith element with probability min(i, n)/n; select uniformly at random an index from [i+1, i+n] for the element which will replace the ith item; maintain k independent chain samples. Timestamp-based: the number of elements in a moving window may vary over time; priority-sample approach. [1] B. Babcock. Sampling From a Moving Window Over Streaming Data. In Proceedings of SODA, 2002.
109. Sampling. Biased reservoir sampling [1]: use a temporal bias function - recent points have a higher probability of being represented in the sample reservoir. Memory-less bias functions: the future probability of retaining a current point in the reservoir is independent of its past history or arrival time; the probability of the rth point belonging to the reservoir at time t is proportional to the bias function f(r, t). Exponential bias function for the rth data point at time t: f(r, t) = e^(-λ(t - r)), where r ≤ t and λ ∈ [0, 1] is the bias rate. The maximum reservoir requirement R(t) is bounded. [1] C. C. Aggarwal. On Biased Reservoir Sampling in the presence of Stream Evolution. In Proceedings of VLDB, 2006.
110. Filtering: set membership. Determine, with some false-positive probability, whether an item in a data stream has been seen before. Applications: databases (e.g., speed up semi-join operations), caches, routers, storage systems; reduce the space requirement of probabilistic routing tables; speed up longest-prefix matching of IP addresses; encode multicast forwarding information in packets; summarize content to aid collaborations in overlay and peer-to-peer networks; improve network state management and monitoring.
111. Filtering: set membership. Applications to hyphenation programs; early UNIX spell checkers. [1] Illustration borrowed from http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf
112. Filtering: set membership (Bloom filter). A natural generalization of hashing: false positives are possible, no false negatives, no deletions allowed. For false-positive rate ε, # hash functions = log2(1/ε), where n = # elements, k = # hash functions, m = # bits in the array.
113. Filtering: set membership (Bloom filter). Minimizing the false-positive rate ε w.r.t. k [1]: k = ln 2 * (m/n); ε = (1/2)^k ≈ (0.6185)^(m/n); 1.44 * log2(1/ε) bits per item, independent of item size or # items; the information-theoretic minimum is log2(1/ε) bits per item, i.e., 44% overhead. A sketch sized from these formulas follows. [1] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. In Internet Mathematics Vol. 1, No. 4, 2005.
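A compact Bloom filter sketch sized from the formulas above (m ≈ 1.44 n log2(1/ε) bits, k ≈ (m/n) ln 2), using a crude double-hashing scheme derived from hashCode(); a production filter would use stronger, independent hashes.

import java.util.BitSet;

// Minimal Bloom filter: no deletions, false positives possible, no false negatives.
class BloomFilter {
  private final BitSet bits;
  private final int m;  // number of bits
  private final int k;  // number of hash functions

  BloomFilter(int expectedItems, double falsePositiveRate) {
    this.m = (int) Math.ceil(-expectedItems * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2)));
    this.k = Math.max(1, (int) Math.round((m / (double) expectedItems) * Math.log(2)));
    this.bits = new BitSet(m);
  }

  private int index(Object item, int i) {
    int h1 = item.hashCode();
    int h2 = Integer.reverse(h1) ^ 0x9E3779B9;       // crude second hash for double hashing
    return Math.floorMod(h1 + i * h2, m);
  }

  void add(Object item) {
    for (int i = 0; i < k; i++) bits.set(index(item, i));
  }

  boolean mightContain(Object item) {
    for (int i = 0; i < k; i++) {
      if (!bits.get(index(item, i))) return false;   // definitely never added
    }
    return true;                                     // possibly added (false positives allowed)
  }
}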
114. Filtering: set membership - Cuckoo filter [1]. Key highlights: add and remove items dynamically; for false-positive rate ε < 3%, more space-efficient than a Bloom filter; higher performance than a Bloom filter for many real workloads; asymptotically worse performance than a Bloom filter (minimum fingerprint size ∝ log(# entries in table)). Overview: stores only a fingerprint of each inserted item; the original key and value bits of each item are not retrievable; a set-membership query for item x searches the hash table for the fingerprint of x. [1] Fan et al., Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies, 2014.
115. Filtering: set membership. Cuckoo hashing [1]: high space occupancy; practical implementations use multiple items per bucket; example uses: software-based Ethernet switches. Cuckoo filter [2]: uses a multi-way associative cuckoo hash table; employs partial-key cuckoo hashing; stores the fingerprint of an item; relocates existing fingerprints to their alternative locations. [1] R. Pagh and F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122-144, 2004. [2] Illustration borrowed from Fan et al., Cuckoo Filter: Practically Better Than Bloom, CoNEXT 2014.
116. Filtering: set membership - Cuckoo filter. Partial-key cuckoo hashing: fingerprint hashing ensures a uniform distribution of items in the table; the alternate bucket is derived from the current bucket and the fingerprint, which is significantly shorter than h1 or h2; it is possible to have multiple entries of a fingerprint in a bucket. Deletion: the item must have been previously inserted. A toy sketch follows.
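A toy sketch of partial-key cuckoo hashing with one fingerprint slot per bucket: the two candidate buckets are b1 = hash(x) and b2 = b1 XOR hash(fingerprint), so a victim can be relocated using only its stored fingerprint; real implementations use multi-slot buckets, better hashes and a stash. All names here are illustrative.

import java.util.Random;

// Toy cuckoo filter: 8-bit fingerprints, one entry per bucket, bounded relocations.
class CuckooFilter {
  private static final int MAX_KICKS = 64;
  private final byte[] table;          // one fingerprint per bucket; 0 means empty
  private final int mask;              // numBuckets - 1 (power of two)
  private final Random rng = new Random();

  CuckooFilter(int log2Buckets) {
    this.table = new byte[1 << log2Buckets];
    this.mask = (1 << log2Buckets) - 1;
  }

  private byte fingerprint(Object x) {
    int f = (x.hashCode() ^ (x.hashCode() >>> 16)) & 0xFF;
    return (byte) (f == 0 ? 1 : f);                  // 0 is reserved for "empty slot"
  }
  private int bucket1(Object x) { return (x.hashCode() * 0x9E3779B9) & mask; }
  private int altBucket(int b, byte fp) { return (b ^ ((fp & 0xFF) * 0x5BD1E995)) & mask; }

  boolean add(Object x) {
    byte fp = fingerprint(x);
    int b1 = bucket1(x), b2 = altBucket(b1, fp);
    if (table[b1] == 0) { table[b1] = fp; return true; }
    if (table[b2] == 0) { table[b2] = fp; return true; }
    int b = rng.nextBoolean() ? b1 : b2;             // evict and relocate existing fingerprints
    for (int i = 0; i < MAX_KICKS; i++) {
      byte victim = table[b];
      table[b] = fp;
      fp = victim;
      b = altBucket(b, fp);                          // victim's other bucket, from its fingerprint only
      if (table[b] == 0) { table[b] = fp; return true; }
    }
    return false;                                    // table considered full
  }

  boolean mightContain(Object x) {
    byte fp = fingerprint(x);
    int b1 = bucket1(x), b2 = altBucket(b1, fp);
    return table[b1] == fp || table[b2] == fp;
  }

  boolean remove(Object x) {                         // only valid if x was previously inserted
    byte fp = fingerprint(x);
    int b1 = bucket1(x), b2 = altBucket(b1, fp);
    if (table[b1] == fp) { table[b1] = 0; return true; }
    if (table[b2] == fp) { table[b2] = 0; return true; }
    return false;
  }
}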
  117. 117. 117 Filtering Set  Membership Comparison k  ➛ # hash functions, d  ➛ # partitions
118. Cardinality: distinct elements. Database systems/search engines: # distinct queries. Network monitoring applications. Natural language processing. # distinct motifs in a DNA sequence. # distinct elements in RFID/sensor networks.
119. Cardinality: previous work. Probabilistic counting [Flajolet and Martin, 1985]; LogLog counting [Durand and Flajolet, 2003]; HyperLogLog [Flajolet et al., 2007]; Sliding HyperLogLog [Chabchoub and Hebrail, 2010]; HyperLogLog in Practice [Heule et al., 2013]; Self-Organizing Bitmap [Chen and Cao, 2009]; Discrete Max-Count [Ting, 2014] - the sequence of sketches forms a Markov chain when h is a strongly universal hash; estimate cardinality using a martingale.
  120. 120. 120 Comparison N  ≤  109 Cardinality
121. Cardinality: HyperLogLog. Apply a hash function h to every element in a multiset. The cardinality of the multiset is estimated as 2^max(ϱ), where 0^(ϱ-1)1 is the bit pattern observed at the beginning of a hash value. This estimate suffers from high variance, so employ stochastic averaging: partition the input stream into m sub-streams S_i using the first p bits of the hash values (m = 2^p) and combine the per-stream estimates with a bias-corrected harmonic mean. A compact sketch follows.
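A compact HyperLogLog sketch of the scheme above: the first p hash bits select one of m = 2^p registers, each register keeps the maximum leading-zero run (+1) of the remaining bits, and the estimate is alpha_m * m^2 / sum(2^-M[j]). The 64-bit mixer below stands in for a proper hash, and the small/large-range corrections are omitted.

// Toy HyperLogLog with m = 2^p registers.
class HyperLogLog {
  private final int p;
  private final int m;
  private final byte[] registers;

  HyperLogLog(int p) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new byte[m];
  }

  private long hash64(Object x) {
    long h = x.hashCode() * 0x9E3779B97F4A7C15L;     // stand-in mix; use a real 64-bit hash in practice
    h ^= (h >>> 33); h *= 0xFF51AFD7ED558CCDL; h ^= (h >>> 33);
    return h;
  }

  void add(Object x) {
    long h = hash64(x);
    int idx = (int) (h >>> (64 - p));                // first p bits pick the sub-stream / register
    long rest = h << p;                              // remaining bits, left-aligned
    int rho = Long.numberOfLeadingZeros(rest) + 1;   // position of the leftmost 1-bit
    if (rho > 64 - p) rho = 64 - p + 1;              // cap when the remaining bits are all zero
    if (rho > registers[idx]) registers[idx] = (byte) rho;
  }

  double estimate() {
    double alpha = 0.7213 / (1.0 + 1.079 / m);       // standard alpha_m approximation
    double sum = 0.0;
    for (byte r : registers) sum += Math.pow(2.0, -r);
    return alpha * m * m / sum;                      // raw estimate (no range corrections)
  }
}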
122. Cardinality: HyperLogLog optimizations. Use of a 64-bit hash function: the total memory requirement goes from 5 * 2^p to 6 * 2^p, where p is the precision. Empirical bias correction: uses empirically determined data for cardinalities smaller than 5m, and the unmodified raw estimate otherwise. Sparse representation: for n ≪ m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w); use variable-length encoding for integers (a variable number of bytes per integer); use difference encoding - store the difference between successive elements. Other optimizations [1, 2]. [1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html [2] http://antirez.com/news/75
123. Cardinality: Self-Learning Bitmap (S-bitmap) [1]. Achieves constant relative estimation errors for unknown cardinalities in a wide range, say from tens to >10^6. The bitmap is obtained via an adaptive sampling process: bits corresponding to the sampled items are set to 1; sampling rates are learned from the # of distinct items already seen and are reduced sequentially as more bits are set to 1. For given input parameters Nmax and estimation precision ε, the size of the bit mask is determined accordingly. For r = 1 - 2ε^2(1 + ε^2)^(-1) and sampling probability p_k = m(m + 1 - k)^(-1)(1 + ε^2)r^k, where k ∈ [1, m], the relative error ≈ ε. [1] Chen et al. "Distinct counting with a self-learning bitmap". Journal of the American Statistical Association, 106(495):879-890, 2011.
124. Quantiles. Quantiles and histograms have a large set of real-world applications: database applications, sensor networks, operations. Properties: provide tunable and explicit guarantees on the precision of approximation; single pass. Early work: [Greenwald and Khanna, 2001] - worst-case space requirement; [Arasu and Manku, 2004] - sliding-window-based model, worst-case space requirement.
125. Quantiles: q-digest [1]. Groups values into variable-size buckets of almost equal weights; unlike a traditional histogram, buckets can overlap. Key features: detailed information about frequent values is preserved; less frequent values are lumped into larger buckets; using a message of size m, answers are within an error of O(log(σ)/m), where σ is the maximum signal value. Except for root and leaf nodes, a node v belongs to the q-digest iff count(v) ≤ n/k and count(v) + count(parent(v)) + count(sibling(v)) > n/k, where n = # elements and k = compression factor, over a complete binary tree. [1] Shrivastava et al., Medians and Beyond: New Aggregation Techniques for Sensor Networks. In Proceedings of SenSys, 2004.
  126. 126. 126 q-­‐digest   Building  a  q-­‐digest   q-­‐digests  can  be  constructed  in  a  distributed  fashion    Merge  q-­‐digests Quantiles
  127. 127. 127 t-­‐digest  [1]   Approxima@on  of  rank-­‐based  sta@s@cs   Compute  quan@le  q  with  an  accuracy  rela@ve  to  max(q,  1-­‐q)   Compute  hybrid  sta@s@cs  such  as  trimmed  sta@s@cs   Key  features   Robust  with  respect  to  highly  skewed  distribu@ons   Independent  of  the  range  of  input  values  (unlike  q-­‐digest)   Rela@ve  error  is  bounded   Non-­‐equal  bin  sizes   Few  samples  contribute  to  the  bins  corresponding  to  the  extreme  quan@les   Merging  independent  t-­‐digests   Reasonable  accuracy [1]T.  Dunning  and  O.  Ertl,  “”Compu@ng  Extremely  Accurate  Quan@les  using  t-­‐digests”,  2017.  haps://github.com/tdunning/t-­‐digest/blob/master/docs/t-­‐digest-­‐paper/histo.pdf   Quantiles
128 t-digest
Groups samples into sub-sequences
Smaller sub-sequences near the ends, larger sub-sequences in the middle
Scaling function k maps a quantile to a notional index, with compression parameter δ
The mapping k is monotonic, with k(0) = 1 and k(1) = δ
The k-size of each sub-sequence is < 1
Quantiles
129 t-digest
Estimating a quantile via interpolation
Sub-sequences are summarized by the centroid of their samples
Estimate the boundaries of the sub-sequences
Error
Scales quadratically in the number of samples within a sub-sequence
The small number of samples in the sub-sequences near q = 0 and q = 1 improves accuracy at the extremes
Lower accuracy in the middle of the distribution, where sub-sequences are larger
Two flavors
Progressive merging (buffering based) and a clustering variant
Quantiles
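To illustrate the query-time interpolation step, a hedged Java sketch that estimates a quantile from sorted centroids (mean, count). It is a simplification of what a real t-digest does (the reference implementation treats the first and last clusters specially), and all names here are ours.

// Illustrative quantile estimation by linear interpolation between t-digest-style centroids.
public final class CentroidQuantile {
    // means: centroid means sorted ascending; counts: samples per centroid; q in [0, 1].
    static double quantile(double[] means, long[] counts, double q) {
        long total = 0;
        for (long c : counts) total += c;
        double target = q * total;
        double cum = 0;
        for (int i = 0; i < means.length; i++) {
            double next = cum + counts[i];
            if (target <= next) {
                // Approximate each cluster's extent by the midpoints to its neighbours.
                double left = (i == 0) ? means[0] : (means[i - 1] + means[i]) / 2;
                double right = (i == means.length - 1) ? means[i] : (means[i] + means[i + 1]) / 2;
                double frac = counts[i] == 0 ? 0 : (target - cum) / counts[i];
                return left + frac * (right - left);
            }
            cum = next;
        }
        return means[means.length - 1];
    }
}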
130 Frequent Elements
Applications
Track bandwidth hogs
Determine popular tourist destinations
Itemset mining
Entropy estimation
Compressed sensing
Search log mining
Network data analysis
DBMS optimization
131 Count-min Sketch [1]
A two-dimensional array of counts with w columns and d rows
Each entry of the array is initially zero
d hash functions h_1, ..., h_d : {1...n} → {1...w} are chosen uniformly at random from a pairwise independent family
Update: for a new element i, for each row j and k = h_j(i), increment the k-th column by one
Point query: the estimate of i's frequency is the minimum over rows j of sketch[j, h_j(i)], where sketch is the table
Parameters (ε, δ): w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉
[1] Cormode, Graham; S. Muthukrishnan (2005). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29–38.
Frequent Elements
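A minimal, illustrative Count-min sketch in Java (update and point query only). Assumptions: the seeded mixing function below is a stand-in for a genuinely pairwise independent hash family, and the class and method names are ours, not a library API.

import java.util.Random;

// Illustrative Count-min sketch: d rows, w columns, one hash per row.
public class CountMinSketch {
    private final int w, d;
    private final long[][] table;
    private final long[] seeds;

    public CountMinSketch(double epsilon, double delta) {
        this.w = (int) Math.ceil(Math.E / epsilon);        // w = ceil(e / epsilon)
        this.d = (int) Math.ceil(Math.log(1 / delta));     // d = ceil(ln(1 / delta))
        this.table = new long[d][w];
        this.seeds = new Random(42).longs(d).toArray();    // per-row hash seeds (illustrative)
    }

    private int bucket(long item, int row) {
        long h = item * 0x9E3779B97F4A7C15L + seeds[row];  // simple mix, not truly pairwise independent
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) w);
    }

    public void update(long item, long count) {
        for (int j = 0; j < d; j++) table[j][bucket(item, j)] += count;
    }

    // Overestimates the true frequency by at most epsilon * N with probability 1 - delta.
    public long pointQuery(long item) {
        long est = Long.MAX_VALUE;
        for (int j = 0; j < d; j++) est = Math.min(est, table[j][bucket(item, j)]);
        return est;
    }
}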
132 Variants of Count-min Sketch [1]
Count-Min sketch with conservative update (CU sketch)
When updating an item with frequency c, each of its counters is raised only as far as its current estimate plus c
Avoids unnecessary updating of counter values => reduces over-estimation error
Still prone to over-estimation error on low-frequency items
Lossy Conservative Update (LCU) - SWS
Divide the stream into windows
At window boundaries, ∀ 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i, j] if 0 < sketch[i, j] ≤ a window-dependent threshold
[1] Cormode, G. 2009. Encyclopedia entry on 'Count-Min Sketch'. In Encyclopedia of Database Systems. Springer, 511–516.
Frequent Elements
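For contrast with the plain update above, a hedged sketch of the conservative-update rule, written as an extra method on the illustrative CountMinSketch class from the previous example (our naming, not a library API).

// Conservative update: raise each of the item's d counters only up to (current estimate + count);
// counters already at or above that value are left untouched.
public void conservativeUpdate(long item, long count) {
    long target = pointQuery(item) + count;    // current min over the item's counters, plus the increment
    for (int j = 0; j < d; j++) {
        int k = bucket(item, j);
        if (table[j][k] < target) table[j][k] = target;
    }
}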
133 OPEN SOURCE
YAHOO! DataSketches*: Unique, Quantile, Histogram, Sampling, Theta Sketches, Tuple Sketches, Most Frequent
TWITTER Algebird#: Filtering, Unique, Histogram, Most Frequent
HUAWEI streamDM^: SGD Learner and Perceptron, Naive Bayes, CluStream, Hoeffding Decision Trees, Bagging, Stream KM++
StreamLib**
* https://datasketches.github.io/
# https://github.com/twitter/algebird
^ http://huawei-noah.github.io/streamDM/
** https://github.com/jiecchen/StreamLib
134 Anomaly Detection
A very rich history, spanning over 150 years
Manufacturing
Statistics
Econometrics, financial engineering
Signal processing
Control systems, autonomous systems: fault detection [1]
Networking
Computational biology (e.g., microarray analysis)
Computer vision
[1] A. S. Willsky, "A survey of design methods for failure detection systems," Automatica, vol. 12, pp. 601–611, 1976.
135 Anomaly Detection
A very rich history, spanning over 150 years
Anomalies are contextual in nature
"DISCORDANT observations may be defined as those which present the appearance of differing in respect of their law of frequency from other observations with which they are combined. In the treatment of such observations there is great diversity between authorities; but this discordance of methods may be reduced by the following reflection. Different methods are adapted to different hypotheses about the cause of a discordant observation; and different hypotheses are true, or appropriate, according as the subject-matter, or the degree of accuracy required, is different." F. Y. Edgeworth, "On Discordant Observations", 1887.
136 Anomaly Detection
CHARACTERISTICS
DIRECTION: positive, negative
FREQUENCY: reliability
WIDTH: actionability
MAGNITUDE: severity
Global vs. local
137 Anomaly Detection
COMMON APPROACHES
Rule based: µ ± σ thresholds (Stone 1868, Glaisher 1872, Edgeworth 1887, Stewart 1920, Irwin 1925, Jeffreys 1932, Rider 1933)
Moving averages: SMA, EWMA, PEWMA, with parameters such as window width and decay
Both assume a normal distribution, which is often not valid in real-life domains (statistics, manufacturing, operations)
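As a concrete, hedged illustration of the moving-average family, a minimal EWMA-based detector in Java: it flags a point when it deviates from the smoothed mean by more than a multiple of the smoothed absolute deviation. The decay, threshold, and implicit normality assumption are exactly the weaknesses the slide calls out, and all names here are ours.

// Minimal EWMA anomaly detector: flags |x - mean| > threshold * deviation.
public class EwmaDetector {
    private final double alpha;      // decay: weight given to the newest observation
    private final double threshold;  // how many deviations count as anomalous
    private double mean, dev;
    private boolean initialized;

    public EwmaDetector(double alpha, double threshold) {
        this.alpha = alpha;
        this.threshold = threshold;
    }

    public boolean isAnomaly(double x) {
        if (!initialized) { mean = x; dev = 0; initialized = true; return false; }
        boolean anomaly = dev > 0 && Math.abs(x - mean) > threshold * dev;
        mean = alpha * x + (1 - alpha) * mean;                  // EWMA of the values
        dev = alpha * Math.abs(x - mean) + (1 - alpha) * dev;   // EWMA of the absolute deviation
        return anomaly;
    }
}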
138 Anomaly Detection
ROBUST MEASURES
MEDIAN
MAD: Median Absolute Deviation [1]
MCD: Minimum Covariance Determinant [2]
MVEE: Minimum Volume Enclosing Ellipsoid [3, 4]
[1] P. J. Rousseeuw and C. Croux, "Alternatives to the Median Absolute Deviation", 1993.
[2] http://onlinelibrary.wiley.com/wol1/doi/10.1002/wics.61/abstract
[3] P. J. Rousseeuw and A. M. Leroy, "Robust Regression and Outlier Detection", 1987.
[4] M. J. Todd and E. A. Yıldırım, "On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids", 2007.
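A hedged sketch of the most common robust rule built from these measures: flag points whose distance from the median exceeds a few scaled median absolute deviations. The 1.4826 constant makes MAD consistent with the standard deviation under normality; the cutoff (e.g. 3) is a convention rather than part of the definition, and the class is ours.

import java.util.Arrays;

// Robust outlier flagging with the median and MAD (median absolute deviation).
public final class MadOutliers {
    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Marks each point lying more than cutoff scaled MADs away from the median.
    static boolean[] flag(double[] xs, double cutoff) {
        double med = median(xs);
        double[] absDev = new double[xs.length];
        for (int i = 0; i < xs.length; i++) absDev[i] = Math.abs(xs[i] - med);
        double mad = 1.4826 * median(absDev);   // consistency constant for Gaussian data
        boolean[] out = new boolean[xs.length];
        for (int i = 0; i < xs.length; i++) out[i] = mad > 0 && absDev[i] > cutoff * mad;
        return out;
    }
}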
139 Anomaly Detection
Challenges: noise, stationarity, seasonality, trend, breakout
140 Anomaly Detection
Challenges
Live data
Multi-dimensional data
Low memory footprint
Accuracy vs. speed trade-off
Encoding the context
Data types: video, audio, text
Data veracity
Wearables, smart cities, connected home, Internet of Things
TYING IT ALL TOGETHER
142 Real Time Architectures
For the streaming world
Lambda: run the computation twice, in two different systems
Kappa: run the computation once
143 Lambda Architecture Overview
144 Lambda Architecture
Batch layer: accurate but delayed (HDFS/MapReduce)
Fast layer: inexact but fast (Storm/Kafka)
Query merge layer: merges results from the batch and fast layers at query time
145 Lambda Architecture
Characteristics
During ingestion, data is cloned into two copies: one goes to the batch layer, the other to the fast layer
Processing is done at both layers: expressed as MapReduce jobs in the batch layer and as topologies in the speed layer
146 Lambda Architecture
Challenges
Inherently inefficient: data is replicated twice and computation is replicated twice
Operationally inefficient: maintain both batch and streaming systems, and tune topologies and jobs for both
147 Kappa Architecture
Streaming is everything
Computation is expressed as a topology
Computation is mostly done only once, when the data arrives
Data then moves into permanent storage
148 Kappa Architecture
149 Kappa Architecture
Challenges
Data reprocessing can be very expensive
On code or logic changes, either the data needs to be brought back from storage onto the bus, or the computation needs to be re-expressed to run on bulk storage
Historic analysis: how to run analytics over, say, all of last year's data
150
151 Observations
Lambda is complicated and inefficient: replication of data and computation, multiple systems to operate and tune
Kappa is too simplistic: data reprocessing is too expensive, and historical analysis is not possible
152 Observations
Computation across batch and real time is similar: expressed as DAGs, run in parallel on the cluster, intermediate results need not be materialized, functional/declarative APIs
Storage is the key: messaging and storage are two faces of the same coin, serving the same data
153 Real-Time Storage Requirements
Requirements for a real-time storage platform
Write and read streams of records with low latency and durable storage
Data storage should be durable, consistent and fault tolerant
Enable clients to stream or tail ledgers, so data is propagated as it is written
Store and provide access to both historic and real-time data
154 Apache BookKeeper - Stream Storage
A storage system for log streams
Replicated, durable storage of log streams
Fast tailing/streaming facility
Optimized for immutable data
Low-latency durability
Simple repeatable-read consistency
High write and read availability
155 Record
Smallest I/O and address unit
A log is a sequence of indivisible records
A record is a sequence of bytes
The smallest I/O unit, as well as the unit of addressing
Each record carries a sequence number used for addressing
156 Logs
Two storage primitives
Ledger: a finite sequence of records
Stream: an infinite sequence of records
157 Ledger
A finite sequence of records
A ledger is a finite sequence of records that gets terminated when either
the client explicitly closes it, or
the writer that writes records into it has crashed
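To make the ledger abstraction concrete, a hedged sketch against the classic BookKeeper ledger API (class and method names as we recall them from the org.apache.bookkeeper.client package; the ZooKeeper address, ensemble/quorum sizes and password are placeholder values, not from the slides).

import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerExample {
    public static void main(String[] args) throws Exception {
        // Connect via the ZooKeeper ensemble that holds BookKeeper metadata (placeholder address).
        BookKeeper bk = new BookKeeper("zk1.example.com:2181");

        // Ensemble of 3 bookies, write quorum 3, ack quorum 2.
        byte[] password = "secret".getBytes();
        LedgerHandle writer = bk.createLedger(3, 3, 2, BookKeeper.DigestType.CRC32, password);
        for (int i = 0; i < 10; i++) {
            writer.addEntry(("record-" + i).getBytes());   // each entry is assigned a sequence number
        }
        long ledgerId = writer.getId();
        writer.close();                                    // explicitly terminates (seals) the ledger

        // Re-open the now-immutable ledger and read it back.
        LedgerHandle reader = bk.openLedger(ledgerId, BookKeeper.DigestType.CRC32, password);
        Enumeration<LedgerEntry> entries = reader.readEntries(0, reader.getLastAddConfirmed());
        while (entries.hasMoreElements()) {
            System.out.println(new String(entries.nextElement().getEntry()));
        }
        reader.close();
        bk.close();
    }
}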
158 Stream
An infinite sequence of records
A stream is an unbounded, infinite sequence of records
Physically comprised of multiple ledgers
159 Bookies
Store fragments of ledgers
Bookie: a storage server that stores data records
Ensemble: the group of bookies storing the data records of a ledger
Individual bookies store fragments of ledgers
160 Bookies
Store fragments of ledgers
161 Tying it all together
A typical installation of Apache BookKeeper
162 BookKeeper - Use Cases
Combining messaging and storage
Stream storage combines the functionality of streaming and storage
Write-ahead log (WAL), message store, object store, snapshots, stream processing
163 BookKeeper in a Real-Time Solution
Durable messaging, scalable compute and stream storage
164 BookKeeper in Production
Enterprise-grade stream storage
4+ years at Twitter and Yahoo, 2+ years at Salesforce
Multiple use cases from messaging to storage: database replication, message store, stream computing, ...
600+ bookies in one single cluster
Data is stored from days to a year
Millions of log streams
1 trillion records/day, 17 PB/day
165 Companies using BookKeeper
Enterprise-grade stream storage
166 Real Time is Messy and Unpredictable
(Diagram: messaging systems, aggregation systems, result engine, HDFS, queryable engines)
167 Streamlio - Unified Architecture
(Diagram) APIs and tooling: interactive querying, Storm API, Trident/Apache Beam, SQL, application builder, Pulsar API, Kafka API, BK/HDFS API
Platform services: Kubernetes, metadata management, operational monitoring, chargeback, security/authentication, quota management, rules engine
168 RESOURCES
Sketching Algorithms
https://www.cs.upc.edu/~gavalda/papers/portoschool.pdf
https://mapr.com/blog/some-important-streaming-algorithms-you-should-know-about/
https://gist.github.com/debasishg/8172796
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. G. Cormode, M. Garofalakis and P. J. Haas
Data Streams: Models and Algorithms. Charu Aggarwal. http://www.springer.com/us/book/9780387287591
Data Streams: Algorithms and Applications. Muthu Muthukrishnan. http://algo.research.googlepages.com/eight.ps
Graph Streaming Algorithms. A. McGregor
Sketching as a Tool for Numerical Linear Algebra. D. Woodruff
169 Readings
Streaming Engines
Twitter Heron: Stream Processing at Scale. SIGMOD'15
Twitter Heron: Towards Extensible Streaming Engines. ICDE'17
Dhalion: Self-Regulating Stream Processing in Heron. VLDB'17
MillWheel: Fault-Tolerant Stream Processing at Internet Scale. VLDB'13
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency and Cost in Massive-Scale, Unbounded Out-of-Order Data Processing. VLDB'15
Anomaly Detection in Real-Time Data Streams Using Heron. Strata San Jose'17
170 Readings
Clustering Data Streams. FOCS'00
Querying and Mining Data Streams: You Only Get One Look. SIGMOD'02
Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams. SIAM Journal on Computing'09
Models and Issues in Data Stream Systems. PODS'02
Statistical Analysis of Sketch Estimators. SIGMOD'07
An Optimal Algorithm for the Distinct Elements Problem. PODS'10
171 Readings
Coresets and Sketches for High-Dimensional Subspace Approximation Problems. SODA'10
Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams. SIGMOD'16
Heavy-Hitter Detection Entirely in the Data Plane. SOSR'17
Graph Sketches: Sparsification, Spanners, and Subgraphs. PODS'12
Coresets and Sketches. arXiv'16
Data Sketching: The Approximate Approach Is Often Faster and More Efficient. ACM Queue'17
173 GET IN TOUCH
CONTACT US
@arun_kejariwal
@kramasamy, @sanjeerk
@sijieg, @merlimat
@nlu90
karthik@streamlio.io
arun_kejariwal@acm.org
ENJOY THE PRESENTATION
The End
