A Tutorial
Modern Real-Time Streaming
Architectures
Karthik Ramasamy*, Sanjeev Kulkarni*, Neng Lu#, Arun Kejariwal^ and Sijie Guo*
*Streamlio, #Twitter, ^MZ
2
MESSAGING STREAMING OPERATIONS
DATA SKETCHES LAMBDA, KAPPA UNIFICATION
OUTLINE
3
Information Age
Real time is key
4
Internet of Things (IoT)
$1.9 T in value by 2020 - Mfg (15%), Health Care (15%), Insurance (11%)
26 B - 75 B units [2, 3, 4, 5]
Improve operational efficiencies, customer experience, new business models
Beacons: retailers and bank branches
60M units market by 2019 [6]
Smart buildings: reduce energy costs, cut maintenance costs
Increase safety & security
Large Market Potential
5
The Future
Biostamps [2]
Mobile
Sensor Network
Exponential growth [1]
[1] http://opensignal.com/assets/pdf/reports/2015_08_fragmentation_report.pdf
[2] http://www.ericsson.com/thinkingahead/networked_society/stories/#/film/mc10-biostamp
6
Intelligent Health Care
Continuous Monitoring
Tracking Movements: measure effect of social influences
Google Lens: measure glucose level in tears
Watch/Wristband
Smart Textiles: skin temperature, perspiration
Ingestible Sensors: medication compliance [1], heart function
7
User Experience, Productivity
Real time
Real-time Video Streams
NEWS
Drones, Robotics
INDUSTRY: $40 B by 2020 [3]
[2]
8
Increasingly Connected World
Internet of Things
Internet of Things: 30 B connected devices by 2020
Health Care: 153 Exabytes (2013) -> 2314 Exabytes (2020)
Machine Data: 40% of digital universe by 2020
Connected Vehicles: data transferred per vehicle per month, 4 MB -> 5 GB
Digital Assistants (Predictive Analytics): $2B (2012) -> $6.5B (2019) [1]; Siri/Cortana/Google Now
Augmented/Virtual Reality: $150B by 2020 [2]; Oculus/HoloLens/Magic Leap
9
TO STREAMING
10
Traditional Data Processing
Challenges
Introduces too much "decision latency"
Responses are delivered "after the fact"
Maximum value of the identified situation is lost
Decisions are made on old and stale data
Data at Rest
Store Analyze Act
11
The New Era: Streaming Data/Fast Data
Events are analyzed and processed in real-time as they arrive
Decisions are timely, contextual and based on fresh data
Decision latency is eliminated
Data in motion
12
Real Time Use Cases
Algorithmic trading
Online fraud detection
Geo fencing
Proximity/location tracking
Intrusion detection systems
Traffic management
Real time recommendations
Churn detection
Internet of things
Social media/data analytics
Gaming data feed
13
Requirements of Stream Processing
In-stream: process data as it passes by
Handle imperfections: delayed, missing and out-of-order data
Predictable and Repeatable
Performance and Scalability
14
High level languages: SQL or DSL
Integrate stored and streaming data: for comparing present with the past
Data safety and availability
Process and respond: application should keep up at high volumes
Requirements of Stream Processing
15
Real Time Stack
REAL TIME STACK
Collectors
Compute
Messaging
Storage
16
An In-Depth Look at MESSAGING FRAMEWORKS
17
Current Messaging Systems
ActiveMQ, RabbitMQ, Pulsar, RocketMQ
Azure Event Hub, Google Pub-Sub, Satori, Kafka
18
Why Apache Pulsar?
Ordering: guaranteed ordering
Multi-tenancy: a single cluster can support many tenants and use cases
High throughput: can reach 1.8 M messages/s in a single partition
Durability: data replicated and synced to disk
Geo-replication: out of box support for geographically distributed applications
Unified messaging model: supports both Topic & Queue semantics in a single model
Delivery Guarantees: at least once, at most once and effectively once
Low Latency: low publish latency of 5 ms at 99pct
Highly scalable: can support millions of topics
19
Unified Messaging Model
Producer (X), Producer (Y) → Topic (T)
Subscription (A) → Consumer (A1)
Subscription (B) → Consumer (B1), Consumer (B2), Consumer (B3)
(see the subscription-type sketch below)
20
Pulsar Producer
PulsarClient client = PulsarClient.create(
"http://broker.usw.example.com:8080");
Producer producer = client.createProducer(
"persistent://my-property/us-west/my-namespace/my-topic");
// handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
// Message was persisted
});
21
Pulsar Consumer
PulsarClient client = PulsarClient.create(
"http://broker.usw.example.com:8080");
Consumer consumer = client.subscribe(
"persistent://my-property/us-west/my-namespace/my-topic",
"my-subscription-name");
while (true) {
// Wait for a message
Message msg = consumer.receive();
System.out.println("Received message: " + msg.getData());
// Acknowledge the message so that it can be deleted by broker
consumer.acknowledge(msg);
}
22
Pulsar Architecture
Apache Pulsar: Pulsar Broker 1, Pulsar Broker 2, Pulsar Broker 3
Apache BookKeeper: Bookie 1, Bookie 2, Bookie 3, Bookie 4, Bookie 5
Producer, Consumer
Stateless Serving
BROKER
Clients interact only with brokers
No state is stored in brokers
BOOKIES
Apache BookKeeper as the storage
Storage is append only
Provides high performance, low latency
Durability
No data loss; fsync before acknowledgement
23
Pulsar Architecture
Apache Pulsar: Pulsar Broker 1, Pulsar Broker 2, Pulsar Broker 3
Apache BookKeeper: Bookie 1, Bookie 2, Bookie 3, Bookie 4, Bookie 5
Producer, Consumer
Separation of Storage and Serving
SERVING
Brokers can be added independently
Traffic can be shifted quickly across brokers
STORAGE
Bookies can be added independently
New bookies will ramp up traffic quickly
24
Pulsar Architecture
Clients
CLIENTS
Lookup correct broker through service discovery
Establish connections to brokers
Enforce authentication/authorization during connection establishment
Establish producer/consumer session
Reconnect with backoff strategy
Pulsar Broker: Dispatcher, Load Balancer, Managed Ledger, Cache, Global Replication, Service Discovery
Producer, Consumer, Bookie
25
Pulsar Architecture
Message Dispatching
DISPATCHER
End-to-end async message processing
Messages relayed across producers, bookies and consumers with no copies
Pooled reference count buffers
MANAGED LEDGER
Abstraction of single topic storage
Caches recent messages
Pulsar Broker: Dispatcher, Load Balancer, Managed Ledger, Cache, Global Replication, Service Discovery
26
Pulsar Architecture
Geo Replication
GEO REPLICATION
Asynchronous replication
Integrated in the broker message flow
Simple configuration to add/remove regions
Diagram: Topic (T1) replicated across Data Center A, Data Center B and Data Center C, with Producers (P1, P2, P3), Subscriptions (S1) and Consumers (C1, C2)
27
Pulsar Use Cases - Message Queue
Online Events → Topic (T) → Worker 1, Worker 2, Worker 3
Decouple Online/Offline
MESSAGE QUEUES
Decouple online or background processing
High availability
Reliable data transport
Notifications
Long running tasks
Low latency publish
28
Pulsar Use Cases - Feedback System
Event → Topic (T) → Controller (Propagate States) → Topic (T) → Serving Systems
FEEDBACK SYSTEM
Coordinate large number of machines
Propagate states
Examples: state propagation, personalization, ad-systems, feedback, updates
29
Pulsar in Production
3+ years
Serves 2.3 million topics
100 billion messages/day
Average latency < 5 ms
99% 15 ms (strong durability guarantees)
Zero data loss
80+ applications
Self served provisioning
Full-mesh cross-datacenter replication: 8+ data centers
30
Companies using Pulsar
31
An In-Depth Look at STREAMING FRAMEWORKS
32
Current Streaming Frameworks
Beam, S-Store, Spark, Flink
Heron, Storm, Apex, Kafka Streams
33
Apache Beam
Promises
Abstracting the Computation
Express Computation
Expressive Windowing/Triggering
Incremental Processing for late data
Selectable Engine
Selection criteria: Latency, Resource Cost
Supported Engines
Google DataFlow
Apache Spark, Apache Flink, Apache Apex
34
Apache Beam
Computation Abstraction
All data is a 4-tuple
Key, Value
Event Time
Window the tuple belongs to
Core Operators (see the sketch below)
ParDo: user supplied DoFn; emits zero or more elements
GroupByKey: groups tuples by keys in the window
35
Apache Beam
Windowing
Window Assignment
Fixed (a.k.a. Tumbling), Sliding, Session
Pluggable
Window Merging
Happens during GroupByKey
Window Triggering
Discard/Accumulate/Retract
36
Apache Beam
Challenges
Multiple Layers
API vs Execution
Troubleshooting complexities
Need higher level APIs
Multiple efforts on their way
Other Cloud Vendor Buy-in?
Azure/AWS?
37
IBM S-Store
Promises
Combine Stream Processing and Transactions
Extended an OLTP engine (H-Store), adding
Tuple Ordering
Windowing
Push-based processing
Exactly Once Semantics
38
IBM S-Store
Data and Processing Model
Tuples grouped into Atomic Batches
Grouping of non-overlapping tuples
Treated like a Transaction
Atomic Batches belong to one Stream
Processing is modeled as a DAG
DAG nodes consume one or more streams and possibly output more
Node logic is treated as a Transaction
39
IBM S-Store
Exactly Once Guarantees
Strong
Inputs and outputs are logged at every DAG node
On component failure, the log is replayed from snapshot
Weak
Distributed snapshotting
40
IBM S-Store
Challenges
Throughput: non-OLTP processing is much slower compared to modern systems
Scalability: multi-node still in research (2016)
41
Heron Terminology
Topology
Directed acyclic graph
vertices = computation, edges = streams of data tuples
Spouts
Sources of data tuples for the topology
Examples: Pulsar/Kafka/MySQL/Postgres
Bolts
Process incoming tuples and emit outgoing tuples
Examples: filtering/aggregation/join/any function
42
Heron Topology
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
43
Heron Topology - Physical Execution
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
44
Heron Groupings
Shuffle Grouping: random distribution of tuples
Fields Grouping: group tuples by a field or multiple fields
All Grouping: replicates tuples to all tasks
Global Grouping: send the entire stream to one task
(see the wiring sketch below)
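A sketch of how groupings are declared using Heron's Storm-compatible low-level API; the spout/bolt classes, component names and parallelism values are illustrative, not taken from the slides:

import com.twitter.heron.api.topology.TopologyBuilder;
import com.twitter.heron.api.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("tweet-spout", new TweetSpout(), 2);
builder.setBolt("splitter-bolt", new SplitterBolt(), 4)
       .shuffleGrouping("tweet-spout");                       // random distribution
builder.setBolt("counter-bolt", new CounterBolt(), 4)
       .fieldsGrouping("splitter-bolt", new Fields("word"));  // same field -> same task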
45
Heron Topology - Physical Execution
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
Shuffle Grouping
Shuffle Grouping
Fields Grouping
Fields Grouping
Fields Grouping
Fields Grouping
46
Writing Heron Topologies
Procedural - Low Level API: directly write your spouts and bolts
Functional - Mid Level API: use of maps, flat maps, transform, windows
Declarative - SQL (coming): use of a declarative language; specify what you want, the system will figure it out
47
Heron Design Goals
Efficiency: reduce resource consumption
Support for diverse workloads: throughput vs latency sensitive
Support for multiple semantics: at most once, at least once, effectively once
Native Multi-Language Support: C++, Java, Python
Task Isolation: ease of debug-ability/isolation/profiling
Support for back pressure: topologies should be self adjusting
Use of containers: runs in schedulers such as Kubernetes & DCOS & many more
Multi-level APIs: Procedural, Functional and Declarative for diverse applications
Diverse deployment models: run as a service or pure library
48
Heron Architecture
Scheduler
Topology 1 Topology 2 Topology N
Topology
Submission
49
Topology Master
Monitoring of containers Gateway for metrics Assigns role
50
Topology Architecture
Topology Master
ZK
Cluster
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
DATA CONTAINER DATA CONTAINER
Metrics
Manager
Metrics
Manager
Metrics
Manager
Health
Manager
MASTER
CONTAINER
51
Stream Manager
Routes tuples Implements backpressure Ack management
52
Stream Manager
Sample topology
S1 B2 B3
B4
53
Stream Manager
Physical	
  execuHon
S1 B2
B3
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2
B3 B4
B4
54
Stream Manager Backpressure
TCP backpressure | Spout based back pressure | Stage by stage back pressure
55
Stream Manager Backpressure
TCP based backpressure
Slows upstream and downstream instances
S1 B2
B3
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2
B3 B4
B4
56
Stream Manager Backpressure
Spout based backpressure
S1 S1
S1S1S1 S1
S1S1 B2
B3
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
B2
B3 B4
B2
B3
B2
B3 B4
B4
57
Heron Instance
Runs only one task (spout/bolt)
Exposes Heron API
Collects several metrics
API
58
Heron Instance
Stream
Manager
Metrics
Manager
Gateway
Thread
Task Execution
Thread
data-in queue
data-out queue
metrics-out queue
59
Companies using Heron
60
OPERATIONS of STREAM PROCESSING APPLICATIONS
61
Pulsar Operations
Reacting to Failures
Brokers
Bookies
Common Issues
Consumer Backlog
I/O Prioritization and Throttling
Multi-Tenancy
62
Reacting to Failures - Brokers
Brokers don't have durable state
Easily replaceable
Topics are immediately reassigned to healthy brokers
Expanding capacity
Simply add a new broker node
If other brokers are overloaded, traffic will be automatically assigned
Load manager
Monitors traffic load on all brokers (CPU, memory, network, topics)
Initially places topics on least loaded brokers
Reassigns topics when a broker is overloaded
63
Reacting to Failures - Bookies
When a bookie fails, brokers will immediately continue on other bookies
Auto-Recovery mechanism will re-establish the replication factor in the background
If a bookie keeps giving errors or timeouts, it will be "quarantined"
Not considered for new ledgers for some period of time
64
Consumer Backlog
Metrics are available to make assessments:
When the problem started
How big is the backlog? Messages? Disk space?
How fast is it draining?
What's the ETA to catch up with publishers?
Establish where the bottleneck is
Application is not fast enough
Disk read IO
65
I/O Prioritization and Throttling
Prioritize access to IO
During an outage many tenants might try to drain backlog as fast as they can
Read IO becomes the bottleneck
Throttling can be used to prioritize draining:
Critical use cases can recover quickly
Fewer concurrent readers lead to higher throughput
Once they catch up, messages will be dispatched from cache
66
Enforcing Multi-Tenancy
Ensure tenants don't cause performance issues for other tenants
Backlog quotas
Soft-Isolation
Flow control
Throttling
In cases when user behavior is triggering performance degradation
Hard-isolation as a last resort for quick reaction while a proper fix is deployed
Isolate tenant on a subset of brokers
Can also be applied at the BookKeeper level
67
Heron @Twitter
LARGEST CLUSTER
100's of TOPOLOGIES
BILLIONS OF MESSAGES
100's OF TERABYTES
REDUCED INCIDENTS
GOOD NIGHT SLEEP
3X - 5X reduction in resource usage
68
Heron Deployment
Topology 1
Topology 2
Topology N
Heron
Tracker
Heron
VIZ
Heron
Web
ZK
Cluster
Aurora Services
Observability
69
Heron Visualization
70
Heron Use Cases
Monitoring
Real Time Machine Learning
Ads
Real Time Trends
Product Safety
Real Time Business Intelligence
71
Heron Topology Complexity
72
Heron Topology Scale
CONTAINERS: 1 TO 600 | INSTANCES: 10 TO 6000
73
Heron Happy Facts :)
No more pages during midnight for Heron team
Very rare incidents for Heron customer teams
Easy to debug during incidents for quick turnaround
Reduced resource utilization, saving cost
74
Heron Developer Issues
01 02
Container resource allocation
Parallelism tuning
75
Heron Operational Issues
01 02 03
Slow Hosts
Network Issues
Data Skew
Load Variations
SLA Violations
76
Slow Hosts
Memory Parity Errors Impending Disk Failures Lower GHZ
77
Network Issues
Network Slowness
Network Partitioning
78
Network Slowness
01 02 03
Delays processing
Data is accumulating
Timeliness of results is affected
79
Data Skew
Multiple Keys: several keys map into a single instance and their combined count is high
Single Key: a single key maps into an instance and its count is high
80
Load Variations
Spikes: sudden surge of data, short lived vs lasting for several minutes
Daily Patterns: predictable change in traffic
81
Self Regulating Streaming Systems
Automate Tuning
SLO Maintenance
Self Regulating Streaming Systems
Tuning: manual, time-consuming and error-prone task of tuning various system knobs to achieve SLOs
SLO: maintenance of SLOs in the face of unpredictable load variations and hardware or software performance degradation
Self Regulating Streaming Systems: a system that adjusts itself to environmental changes and continues to produce results
82
Self Regulating Streaming Systems
Self tuning: several tuning knobs; time consuming tuning phase. The system should take as input an SLO and automatically configure the knobs.
Self stabilizing: stream jobs are long running; load variations are common. The system should react to external shocks and automatically reconfigure itself.
Self healing: system performance affected by hardware or software delivering degraded quality of service. The system should identify internal faults and attempt to recover from them.
83
Enter Dhalion
Dhalion is a policy based framework integrated into Heron.
Dhalion periodically executes well-specified policies that optimize execution based on some objective.
We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
84
Dhalion Policy Framework
Metrics → Symptom Detection: Symptom Detector 1, Symptom Detector 2, Symptom Detector 3, ..., Symptom Detector N
Symptoms 1..N → Diagnosis Generation: Diagnoser 1, Diagnoser 2, ..., Diagnoser M
Diagnoses 1..M → Resolution: Resolver Selection → Resolver Invocation (Resolver 1, Resolver 2, ..., Resolver M)
85
Dynamic Resource Provisioning
Policy: reacts to unexpected load variations (workload spikes)
Goal: scale up and scale down the topology resources as needed, while keeping the topology in a steady state where back pressure is not observed
86
Dynamic Resource Provisioning
Implementation
Metrics → Symptom Detection: Pending Tuples Detector, Backpressure Detector, Processing Rate Skew Detector
Diagnosis Generation: Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, Slow Instances Diagnoser
Resolution (Resolver Invocation): Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, Restart Instances Resolver
87
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
100 | 20
100 | 20
processing rate (tps) | queue size (#tuples)
Steady State
88
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
150 | 80
150 | 80
processing rate (tps) | queue size (#tuples)
Under provisioning
89
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
100 | 20
100 | 20
processing rate (tps) | queue size (#tuples)
Steady State
90
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
50 | 05
50 | 80
processing rate (tps) | queue size (#tuples)
Slow Instance
91
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
100 | 20
100 | 20
processing rate (tps) | queue size (#tuples)
Steady State
92
Dynamic Resource Provisioning
Tweet Spout (x3) → Splitter Bolt (x2) → Counter Bolt (x2)
50 | 05
150 | 80
processing rate (tps) | queue size (#tuples)
Data Skew
93
Experimental Setup
Spout → Splitter Bolt (Shuffle Grouping) → Counter Bolt (Fields Grouping)
Hardware and Software Configuration
Microsoft HDInsight
Intel Xeon E5-2673 CPU @ 2.40 GHz
28 GB of Memory
Evaluation Metrics
Throughput of Spouts (No. of tuples emitted over 1 min)
Throughput of Bolts (No. of tuples emitted over 1 min)
Number of Heron Instances provisioned
94
Dynamic Provisioning Profile
[Chart: normalized throughput of Spout, Splitter Bolt and Counter Bolt over time (minutes), with scale-down and scale-up events S1, S2, S3]
The Dynamic Resource Provisioning Policy is able to adjust the topology resources on-the-fly when workload spikes occur.
The policy can correctly detect and resolve bottlenecks even on multi-stage topologies where backpressure is gradually propagated from one stage of the topology to another.
[Chart: number of Splitter and Counter Bolts over time (minutes)]
Heron Instances are gradually scaled up and down according to the input load
Streaming and One-Pass Algorithms
DATA CHARACTERISTICS
UNBOUNDED
DATA CHARACTERISTICS
UNORDERED
Varying Skew
Event time
Processing time
Correctness
Completeness
98
Netflix, HBO, Hulu, YouTube, Dailymotion, ESPN, Sling TV
Satori, Facebook Live, Periscope
Spotify, Pandora, Apple Music, Tidal
Amazon Twitch, YouTube Gaming, Microsoft Mixer
KEY CHARACTERISTICS
LOW LATENCY
HIGH VELOCITY DATA
100
KEY CHARACTERISTICS
In-Order O(1) Storage O(n) Time
ONE PASS
101
KEY CHARACTERISTICS
DISTRIBUTED COMPUTATION | SCALE OUT | ROBUST | FAULT TOLERANCE
102
DATA SKETCHES
Early work
Membership [Bloom, 1970]
Counting [Morris, 1977]
Median of a sequence [Munro and Paterson, 1980]
Frequent Elements [Misra and Gries, 1982]
Counting [Flajolet and Martin, 1985]
The space complexity of approximating the frequency moments [Alon et al. 1996]
Computing on Data Streams [Henzinger et al. 1998]
DATA SKETCHES
UNIQUE
FILTER
COUNT
HISTOGRAM
QUANTILE
MOMENTS
TOP-K
ADVANCED DATA SKETCHES
RANDOM PROJECTIONS, FREQUENT DIRECTIONS
Dimensionality Reduction
RANDOMIZED NUMERICAL ALGEBRA
Matrix multiply
GRAPHS
Summarize adjacency
Connectivity, k-connectivity, Spanners, Sparsification
GEOMETRIC
Diameter, Lp distances, Min-cost matchings
Information Distances, e.g., Hellinger distance
SKETCHING SKETCHES
Testing independence
105
SAMPLING: A/B Testing
FILTERING: Set membership
CORRELATION: Fraud Detection
QUANTILES: Network Analysis
CARDINALITY: Site Audience Analysis
Applications
106
MOMENTS: Database
FREQUENT ELEMENTS: Trending Hashtags
CLUSTERING: Medical Imaging
ANOMALY DETECTION: Sensor Networks
SUBSEQUENCES: Traffic Analysis
Applications
107
Sampling
Obtain a representative sample from a data stream
Maintain a dynamic sample
A data stream is a continuous process
Not known in advance how many points may elapse before an analyst may need to use a representative sample
Reservoir sampling [1] (see the sketch below)
Probabilistic insertions and deletions on arrival of new stream points
Probability of successive insertion of new points reduces with progression of the stream
An unbiased sample contains a larger and larger fraction of points from the distant history of the stream
Practical perspective
Data stream may evolve and hence the majority of the points in the sample may represent stale history
[1] J. S. Vitter. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11(1):37–57, March 1985.
108
Sampling
Sliding window approach (sample size k, window width n)
Sequence-based
Replace expired element with newly arrived element
Disadvantage: highly periodic
Chain-sample approach
Select the ith element with probability Min(i,n)/n
Select uniformly at random an index from [i+1, i+n] of the element which will replace the ith item
Maintain k independent chain samples
Timestamp-based
# elements in a moving window may vary over time
Priority-sample approach
[Diagram: chain sampling over the stream 3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3]
[1] B. Babcock. Sampling From a Moving Window Over Streaming Data. In Proceedings of SODA, 2002.
109
Sampling
Biased Reservoir Sampling [1]
Use a temporal bias function: recent points have a higher probability of being represented in the sample reservoir
Memory-less bias functions
Future probability of retaining a current point in the reservoir is independent of its past history or arrival time
Probability of the rth point belonging to the reservoir at time t is proportional to the bias function f(r, t)
Exponential bias function for the rth data point at time t (reconstructed from [1]): f(r, t) = e^{-λ(t - r)}, where r ≤ t and λ ∈ [0, 1] is the bias rate
Maximum reservoir requirement R(t) is bounded
(see the sketch below)
[1] C. C. Aggarwal. On Biased Reservoir Sampling in the presence of Stream Evolution. In Proceedings of VLDB, 2006.
110
Filtering
Set Membership
Determine, with some false positive probability, if an item in a data stream has been seen before
Databases (e.g., speed up semi-join operations), Caches, Routers, Storage Systems
Reduce space requirement in probabilistic routing tables
Speed up longest-prefix matching of IP addresses
Encode multicast forwarding information in packets
Summarize content to aid collaborations in overlay and peer-to-peer networks
Improve network state management and monitoring
111
Filtering
Set Membership
Application to hyphenation programs
Early UNIX spell checkers
[1] Illustration borrowed from http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf
112
Filtering
Set Membership
Natural generalization of hashing
False positives are possible
No false negatives
No deletions allowed
False positive rate for n elements hashed with k functions into m bits (standard bound, reconstructed): approximately (1 - e^{-kn/m})^k
For false positive rate ε, # hash functions = log2(1/ε)
where n = # elements, k = # hash functions, m = # bits in the array
(see the sketch below)
113
Filtering
Set Membership
Minimizing false positive rate ε w.r.t. k [1]
k = ln 2 · (m/n)
ε = (1/2)^k ≈ (0.6185)^{m/n}
1.44 · log2(1/ε) bits per item
Independent of item size or # items
Information-theoretic minimum: log2(1/ε) bits per item
44% overhead
[1] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. In Internet Mathematics, Vol. 1, No. 4, 2005.
114
Filtering
Set Membership: Cuckoo Filter [1]
Key Highlights
Add and remove items dynamically
For false positive rate ε < 3%, more space efficient than Bloom filter
Higher performance than Bloom filter for many real workloads
Asymptotically worse performance than Bloom filter
Min fingerprint size ∝ log(# entries in table)
Overview
Stores only a fingerprint of each item inserted
Original key and value bits of each item not retrievable
Set membership query for item x: search hash table for fingerprint of x
[1] Fan et al., Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, 2014.
115
Filtering
Set Membership
Cuckoo Hashing [1]
High space occupancy
Practical implementations: multiple items per bucket
Example uses: software-based Ethernet switches
Cuckoo Filter [2]
Uses a multi-way associative cuckoo hash table
Employs partial-key cuckoo hashing
Stores the fingerprint of an item
Relocates existing fingerprints to their alternative locations
Deletion: item must have been previously inserted
Illustration of cuckoo hashing [2]
[1] R. Pagh and F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122-144, 2004.
[2] Fan et al., Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International on Conference on Emerging Networking Experiments and Technologies, 2014.
116
Filtering
Set Membership
Cuckoo Filter
Partial-key cuckoo hashing (see the equation below)
Fingerprint hashing ensures uniform distribution of items in the table
Length of fingerprint << size of h1 or h2
Possible to have multiple entries of a fingerprint in a bucket
Diagram: the fingerprint (significantly shorter than h1 and h2) determines the alternate bucket
117
Filtering
Set Membership
Comparison
k ➛ # hash functions, d ➛ # partitions
118
Cardinality
Distinct Elements
Database systems / Search engines
# distinct queries
Network monitoring applications
Natural language processing
# distinct motifs in a DNA sequence
# distinct elements of RFID/sensor networks
119
Previous work
Probabilistic counting [Flajolet and Martin, 1985]
LogLog counting [Durand and Flajolet, 2003]
HyperLogLog [Flajolet et al., 2007]
Sliding HyperLogLog [Chabchoub and Hebrail, 2010]
HyperLogLog in Practice [Heule et al., 2013]
Self-Organizing Bitmap [Chen and Cao, 2009]
Discrete Max-Count [Ting, 2014]
Sequence of sketches forms a Markov chain when h is a strongly universal hash
Estimate cardinality using a martingale
Cardinality
120
Comparison
N ≤ 10^9
Cardinality
121
HyperLogLog
Apply hash function h to every element in a multiset
Cardinality of the multiset is 2^{max(ϱ)}, where 0^{ϱ-1}1 is the bit pattern observed at the beginning of a hash value
The above suffers from high variance
Employ stochastic averaging
Partition input stream into m sub-streams S_i using the first p bits of the hash values (m = 2^p)
Estimate (reconstructed from [Flajolet et al., 2007]): E = α_m · m² · (Σ_{j=1}^{m} 2^{-M[j]})^{-1}, where M[j] is the maximum ϱ observed in sub-stream j
(see the register-update sketch below)
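A sketch of the register update with stochastic averaging; the precision p = 14 and the assumption of a 64-bit hash of each element are illustrative choices:

class HyperLogLogRegisters {
    private static final int P = 14;                  // precision (illustrative): m = 2^14
    private final byte[] M = new byte[1 << P];        // one register per sub-stream S_i

    void add(long hash) {                             // hash: 64-bit hash of the element
        int idx = (int) (hash >>> (64 - P));          // first p bits choose the sub-stream
        long w = hash << P;                           // remaining bits
        int rho = Long.numberOfLeadingZeros(w) + 1;   // position of the leftmost 1-bit
        if (rho > M[idx]) M[idx] = (byte) rho;        // registers keep max(rho) per sub-stream
    }
}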
Cardinality
122
HyperLogLog Optimizations
Use of a 64-bit hash function
Total memory requirement 5 · 2^p -> 6 · 2^p, where p is the precision
Empirical bias correction
Uses empirically determined data for cardinalities smaller than 5m and uses the unmodified raw estimate otherwise
Sparse representation
For n ≪ m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w)
Use variable length encoding for integers that uses a variable number of bytes to represent integers
Use difference encoding: store the difference between successive elements
Other optimizations [1, 2]
[1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
[2] http://antirez.com/news/75
Cardinality
123
Self-Learning Bitmap (S-bitmap) [1]
Achieves constant relative estimation errors for unknown cardinalities in a wide range, say from 10s to >10^6
Bitmap obtained via an adaptive sampling process
Bits corresponding to the sampled items are set to 1
Sampling rates are learned from # distinct items already passed and reduced sequentially as more bits are set to 1
For given input parameters Nmax and estimation precision ε, the size m of the bit mask is determined
For r = 1 - 2ε²(1 + ε²)^{-1} and sampling probability p_k = m(m + 1 - k)^{-1}(1 + ε²)r^k, where k ∈ [1, m]
Relative error ≣ ε
[1] Chen et al., "Distinct counting with a self-learning bitmap". Journal of the American Statistical Association, 106(495):879–890, 2011.
Cardinality
124
Quantiles
Quantiles, Histograms
Large set of real-world applications
Database applications
Sensor networks
Operations
Properties
Provide tunable and explicit guarantees on the precision of approximation
Single pass
Early work
[Greenwald and Khanna, 2001]: worst case space requirement
[Arasu and Manku, 2004]: sliding window based model, worst case space requirement
125
q-digest [1]
Groups values in variable size buckets of almost equal weights
Unlike a traditional histogram, buckets can overlap
Key features
Detailed information about frequent values preserved
Less frequent values lumped into larger buckets
Using a message of size m, answers quantile queries within an error of 3 · log(σ)/m, where σ is the max signal value (error bound from [1])
Except root and leaf nodes, a node v ∈ q-digest iff count(v) ≤ ⌊n/k⌋ and count(v) + count(v_parent) + count(v_sibling) > ⌊n/k⌋ (digest property from [1])
Diagram: complete binary tree over the value range; n = # elements, k = compression factor, σ = max signal value
[1] Shrivastava et al., Medians and Beyond: New Aggregation Techniques for Sensor Networks. In Proceedings of SenSys, 2004.
Quantiles
126
q-digest
Building a q-digest
q-digests can be constructed in a distributed fashion
Merge q-digests
Quantiles
127
t-digest [1]
Approximation of rank-based statistics
Compute quantile q with an accuracy relative to max(q, 1 - q)
Compute hybrid statistics such as trimmed statistics
Key features
Robust with respect to highly skewed distributions
Independent of the range of input values (unlike q-digest)
Relative error is bounded
Non-equal bin sizes
Few samples contribute to the bins corresponding to the extreme quantiles
Merging independent t-digests
Reasonable accuracy
[1] T. Dunning and O. Ertl, "Computing Extremely Accurate Quantiles using t-digests", 2017. https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
Quantiles
128
t-digest
Group samples into sub-sequences
Smaller sub-sequences near the ends
Larger sub-sequences in the middle
Scaling function k maps quantile q to a notional index, with compression parameter δ
Mapping k is monotonic
k(0) = 1 and k(1) = δ
k-size of each subsequence < 1
Diagram axes: Quantile vs Notional Index, compression parameter δ
Quantiles
129
t-digest
Estimating quantiles via interpolation
Sub-sequences contain the centroid of the samples
Estimate the boundaries of the sub-sequences
Error
Scales quadratically in # samples
Small # samples in the sub-sequences near q=0 and q=1 improves accuracy
Lower accuracy in the middle of the distribution
Larger sub-sequences in the middle
Two flavors
Progressive merging (buffering based) and clustering variant
Quantiles
130
Frequent Elements
Applications
Track bandwidth hogs
Determine popular tourist destinations
Itemset mining
Entropy estimation
Compressed sensing
Search log mining
Network data analysis
DBMS optimization
Count-min Sketch [1]
A two-dimensional array of counts with w columns and d rows
Each entry of the array is initially zero
d hash functions are chosen uniformly at random from a pairwise independent family
Update: for a new element i, for each row j and k = h_j(i), increment the kth column by one
Point query (reconstructed from [1]): â_i = min_j sketch[j, h_j(i)], where sketch is the table
131
Parameters (ε, δ): w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉, with hash functions h_1, …, h_d : {1…n} → {1…w}
(see the sketch below)
[1] Cormode, Graham; S. Muthukrishnan (2005). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29–38.
Frequent Elements
Variants of Count-min Sketch [1]
Count-Min sketch with conservative update (CU sketch)
Update an item with frequency c
Avoids unnecessary updating of counter values => reduces over-estimation error
Prone to over-estimation error on low-frequency items
Lossy Conservative Update (LCU) - SWS
Divide stream into windows
At window boundaries, ∀ 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i,j] if 0 < sketch[i,j] ≤ the per-window threshold
132
[1] Cormode, G. 2009. Encyclopedia entry on 'Count-Min Sketch'. In Encyclopedia of Database Systems. Springer, 511–516.
Frequent Elements
133
OPEN SOURCE: TWITTER, YAHOO!, HUAWEI
DATA SKETCHES*: Unique, Quantile, Histogram, Sampling, Theta Sketches, Tuple Sketches, Most Frequent
ALGEBIRD#: Filtering, Unique, Histogram, Most Frequent
streamDM^: SGD Learner and Perceptron, Naive Bayes, CluStream, Hoeffding Decision Trees, Bagging, Stream KM++
StreamLib**
* https://datasketches.github.io/
# https://github.com/twitter/algebird
^ http://huawei-noah.github.io/streamDM/
** https://github.com/jiecchen/StreamLib
134
Anomaly Detection
Very rich, over 150 yrs of history
Manufacturing
Statistics
Econometrics, Financial engineering
Signal processing
Control systems, Autonomous systems: fault detection [1]
Networking
Computational biology (e.g., microarray analysis)
Computer vision
[1] A. S. Willsky, "A survey of design methods for failure detection systems," Automatica, vol. 12, pp. 601–611, 1976.
135
Very rich, over 150 yrs of history
Anomalies are contextual in nature
"DISCORDANT observations may be defined as those which present the appearance of differing in respect of their law of frequency from other observations with which they are combined. In the treatment of such observations there is great diversity between authorities; but this discordance of methods may be reduced by the following reflection. Different methods are adapted to different hypotheses about the cause of a discordant observation; and different hypotheses are true, or appropriate, according as the subject-matter, or the degree of accuracy required, is different."
F. Y. Edgeworth, “On Discordant Observations”, 1887.
Anomaly Detection
136
Anomaly Detection
CHARACTERISTICS
DIRECTION: Positive, Negative
FREQUENCY: Reliability
WIDTH: Actionability
MAGNITUDE: Severity
Global, Local
137
Anomaly Detection
COMMON APPROACHES
Rule Based: µ ± σ
Stone 1868, Glaisher 1872, Edgeworth 1887, Stewart 1920, Irwin 1925, Jeffreys 1932, Rider 1933
Moving Averages: SMA, EWMA, PEWMA
PARAMS: WIDTH, DECAY
Assumption: Normal Distribution - NOT VALID in real-life
DOMAINS: STATS, MFG, OPS
138
Anomaly Detection
ROBUST MEASURES
MEDIAN
MAD [1]: Median Absolute Deviation
MCD [2]: Minimum Covariance Determinant
MVEE [3,4]: Minimum Volume Enclosing Ellipsoid
[1] P. J. Rousseeuw and C. Croux, "Alternatives to the Median Absolute Deviation", 1993.
[2] http://onlinelibrary.wiley.com/wol1/doi/10.1002/wics.61/abstract
[3] P. J. Rousseeuw and A. M. Leroy, "Robust Regression and Outlier Detection", 1987.
[4] M. J. Todd and E. A. Yıldırım, "On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids", 2007.
139
Anomaly Detection
Challenges
NOISE STATIONARITY
SEASONALITY TREND BREAKOUT
140
Anomaly Detection
Challenges
Live Data
Multi-dimensional
Low memory footprint
Accuracy vs. Speed trade-off
Encoding the context
Data types: Video, Audio, Text
Data veracity
Wearables
Smart cities, Connected Home, Internet of Things
TYING	
  IT	
  ALL	
  TOGETHER
142
Real Time Architectures
For the streaming world
Lambda: run computation twice in different systems
Kappa: run computation once
143
Lambda Architecture
Overview
144
Lambda Architecture
Batch Layer: accurate but delayed; HDFS/MapReduce
Fast Layer: inexact but fast; Storm/Kafka
Query Merge Layer: merge results from batch and fast layers at query time
145
Lambda Architecture
Characteristics
During ingestion, data is cloned into two copies
One goes to the batch layer
The other goes to the fast layer
Processing is done at two layers
Expressed as Map-Reduces in the batch layer
Expressed as topologies in the speed layer
146
Lambda Architecture
Challenges
Inherently inefficient
Data is replicated twice
Computation is replicated twice
Operationally inefficient
Maintain both batch and streaming systems
Tune topologies for both systems
147
Kappa Architecture
Streaming is everything
Computation is expressed as a topology
Computation is mostly done only once, when the data arrives
Data moves into permanent storage
148
Kappa Architecture
149
Kappa Architecture
Challenges
Data reprocessing could be very expensive
Code/logic changes
Either data needs to be brought back from storage to the bus
Or computation needs to be expressed to run on bulk storage
Historic analysis
How to do data analytics over all of last year's data
150
151
Observations
Lambda is complicated and inefficient
Replication of Data and Computation
Multiple systems to operate and tune
Kappa is too simplistic
Data reprocessing too expensive
Historical analysis not possible
152
Observations
Computation across batch/realtime is similar
Expressed as DAGs
Run in parallel on the cluster
Intermediate results need not be materialized
Functional/Declarative APIs
Storage is the key
Messaging/Storage are two faces of the same coin
They serve the same data
153
Real-Time Storage Requirements
Requirements for a real-time storage platform
Be able to write and read streams of records with low latency and storage durability
Data storage should be durable, consistent and fault tolerant
Enable clients to stream or tail ledgers to propagate data as it is written
Store and provide access to both historic and real-time data
154
Apache BookKeeper - Stream Storage
A storage for log streams
Replicated, durable storage of log streams
Provides a fast tailing/streaming facility
Optimized for immutable data
Low-latency durability
Simple repeatable read consistency
High write and read availability
155
Record
Smallest I/O and Address Unit
A log is a sequence of indivisible records
A record is a sequence of bytes
The smallest I/O unit, as well as the unit of address
Each record contains sequence numbers for addressing
156
Logs
Two Storage Primitives
Ledger: a finite sequence of records
Stream: an infinite sequence of records
157
Ledger
Finite sequence of records
Ledger: a finite sequence of records that gets terminated when
A client explicitly closes it, or
The writer that writes records into it has crashed
158
Stream
Infinite sequence of records
Stream: an unbounded, infinite sequence of records
Physically comprised of multiple ledgers
159
Bookies
Stores fragments of records
Bookie: a storage server that stores data records
Ensemble: a group of bookies storing the data records of a ledger
Individual bookies store fragments of ledgers
(see the client sketch below)
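A sketch of creating, writing and reading a ledger with the BookKeeper client API; the ZooKeeper address, ensemble size 3 and write quorum 2 are illustrative values:

import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;
import org.apache.bookkeeper.client.LedgerEntry;
import java.util.Enumeration;

BookKeeper bk = new BookKeeper("zk1.example.com:2181");
LedgerHandle lh = bk.createLedger(3, 2,                        // ensemble size, write quorum
    BookKeeper.DigestType.CRC32, "passwd".getBytes());
long entryId = lh.addEntry("my-record".getBytes());            // durable once acknowledged

Enumeration<LedgerEntry> entries = lh.readEntries(0, lh.getLastAddConfirmed());
lh.close();                                                    // closing terminates the ledger
bk.close();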
160
Bookies
Stores fragments of records
161
Tying it all together
A typical installation of Apache BookKeeper
162
BookKeeper - Use Cases
Combine messaging and storage
Stream storage combines the functionality of streaming and storage
WAL (Write Ahead Log), Message Store, Object Store, Snapshots, Stream Processing
163
BookKeeper in Real-Time Solution
Durable Messaging, Scalable Compute and Stream Storage
164
BookKeeper in Production
Enterprise Grade Stream Storage
4+ years at Twitter and Yahoo, 2+ years at Salesforce
Multiple use cases from messaging to storage
Database replication, Message store, Stream computing …
600+ bookies in one single cluster
Data is stored from days to a year
Millions of log streams
1 trillion records/day, 17 PB/day
165
Companies using BookKeeper
Enterprise Grade Stream Storage
166
Real Time is Messy and Unpredictable
Aggregation  
Systems
Messaging  
Systems
Result  
Engine
HDFS
Queryable  
Engines
167
Streamlio - Unified Architecture
Interactive  
  Querying
Storm  API
Trident/Apache  
Beam  
SQL
Application  
Builder
Pulsar  
API
BK/  
HDFS  
API
Kubernetes
Metadata  
Management
Operational  
Monitoring
Chargeback
Security  
Authentication
Quota  
Management
Rules    
Engine
Kafka  
API
168
RESOURCES
Sketching Algorithms
https://www.cs.upc.edu/~gavalda/papers/portoschool.pdf
https://mapr.com/blog/some-important-streaming-algorithms-you-should-know-about/
https://gist.github.com/debasishg/8172796
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
G. Cormode, M. Garofalakis and P. J. Haas
Data Streams: Models and Algorithms
Charu Aggarwal
http://www.springer.com/us/book/9780387287591
Data Streams: Algorithms and Applications
Muthu Muthukrishnan
http://algo.research.googlepages.com/eight.ps
Graph Streaming Algorithms
A. McGregor
Sketching as a Tool for Numerical Linear Algebra
D. Woodruff
169
Readings
Twitter Heron: Stream Processing at Scale. SIGMOD'15
Twitter Heron: Towards Extensible Streaming Engines. ICDE'17
Dhalion: Self-Regulating Stream Processing in Heron. VLDB'17
MillWheel: Fault-Tolerant Stream Processing at Internet Scale. VLDB'13
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency and Cost in Massive-Scale, Unbounded Out-of-Order Data Processing. VLDB'15
Anomaly Detection in Real-Time Data Streams Using Heron. Strata San Jose'17
170
Readings
Clustering Data Streams. FOCS'00
Querying and mining data streams: You only get one look. SIGMOD'02
Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams. SIAM Journal of Computing'09
Models and Issues in Data Stream Systems. PODS'02
Statistical Analysis of Sketch Estimators. SIGMOD'07
An optimal algorithm for the distinct elements problem. PODS'10
171
Readings
Coresets and Sketches for high dimensional subspace approximation problems. SODA'10
Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams. SIGMOD'16
Heavy-Hitter Detection Entirely in the Data Plane. SOSR'17
Graph Sketches: Sparsification, Spanners, and Subgraphs. PODS'12
Coresets and Sketches. Arxiv'16
Data Sketching: The approximate approach is often faster and more efficient. ACM Queue'17
173
GET IN TOUCH
CONTACT US
@arun_kejariwal
@kramasamy, @sanjeerk
@sijieg, @merlimat
@nlu90
karthik@streamlio.io
arun_kejariwal@acm.org
ENJOY THE PRESENTATION
The End

Modern real-time streaming architectures

  • 1. A Tutorial Modern Real-Time Streaming Architectures Karthik  Ramasamy*,  Sanjeev  Kulkarni*,  Neng  Lu#,  Arun  Kejariwal^   and  Sijie  Guo* *Streamlio,  #Twi0er,  ^MZ
  • 2. 2 MESSAGING STREAMING OPERATIONS DATA SKETECHES LAMBDA, KAPPA UNIFICATION OUTLINE
  • 3. 3 Information Age Real  @me  is  key   Ká !
  • 4. 4 Internet of Things (IoT) $1.9  T  in  value  by  2020  -­‐  Mfg  (15%),  Health  Care  (15%),  Insurance  (11%)   26  B  -­‐  75  B  units  [2,  3,  4,  5] Improve  opera@onal  efficiencies,  customer  experience,  new  business  modelsY Beacons:  Retailers  and  bank  branches   60M  units  market  by  2019  [6] Smart  buildings:    Reduce  energy  costs,  cut  maintenance  costs   Increase  safety  &  security Large  Market  Poten@al  
  • 5. 5 The Future Biostamps [2] Mobile Sensor Network Exponential growth [1] [1]  hap://opensignal.com/assets/pdf/reports/2015_08_fragmenta@on_report.pdf   [2]  hap://www.ericsson.com/thinkingahead/networked_society/stories/#/film/mc10-­‐biostamp
  • 6. 6 Intelligent Health Care Con@nuous  Monitoring   Tracking Movements Measure  effect  of  social   influences Google Lens Measure  glucose  level  in   tears Watch/Wristband Smart Textiles Skin  temperature   Perspira@on Ingestible Sensors Medica@on  compliance  [1] Heart  func@on ! !
  • 7. 7 User Experience, Productivity Real  @me   Real-time Video Streams N E W S Drones Robotics I N D U S T R Y   $ 4 0   B   b y   2 0 2 0   [ 3 ] [2]
  • 8. 8 Increasingly Connected World Internet of Things 30  B  connected  devices  by  2020 Health Care 153  Exabytes  (2013)  -­‐>  2314  Exabytes  (2020) Machine Data 40%  of  digital  universe  by  2020 Connected Vehicles Data  transferred  per  vehicle  per  month   4  MB  -­‐>  5  GB Digital Assistants (Predictive Analytics) $2B  (2012)  -­‐>  $6.5B  (2019)  [1]   Siri/Cortana/Google  Now Augmented/Virtual Reality $150B  by  2020  [2]   Oculus/HoloLens/Magic  Leap Ñ !+ >
  • 10. 10 Traditional Data Processing Challenges   Introduces  too  much  “decision  latency”   Responses  are  delivered  “aqer  the  fact”   Maximum  value  of  the  iden@fied  situa@on  is  lost   Decisions  are  made  on  old  and  stale  data   Data  at  Rest Store Analyze Act
  • 11. 11 The New Era: Streaming Data/Fast Data Events  are  analyzed  and  processed  in  real-­‐@me  as  they  drive   Decisions  are  @mely,  contextual  and  based  on  fresh  data   Decision  latency  is  eliminated   Data  in  mo@on
  • 12. 12 Real Time Use Cases Algorithmic  trading   Online  fraud  detec@on   Geo  fencing   Proximity/loca@on  tracking   Intrusion  detec@on  systems   Traffic  management Real  @me  recommenda@ons   Churn  detec@on   Internet  of  things   Social  media/data  analy@cs   Gaming  data  feed
  • 13. 13 Requirements of Stream Processing In-stream Handle imperfections Predictable Performance Process  data  as  it  is   passes  by Delayed,  missing  and   out-­‐of-­‐order  data and  Repeatable and  Scalability I
  • 14. 14 High level languages Integrate stored and streaming data Data safety and availability Process and respond SQL  or  DSL for  comparing  present   with  the  past and  Repeatable Applica@on  should  keep   at  high  volumes " # $ Requirements of Stream Processing
  • 15. 15 Real Time Stack REAL TIME STACK Collectors s Compute J Messaging a Storage b
  • 17. 17 Current Messaging Systems 01 02 03 04 05 06 07 08 ActiveMQ RabbitMQ Pulsar RocketMQ Azure Event Hub Google Pub-Sub Satori Kafka
  • 18. 18 Why Apache Pulsar? Ordering   Guaranteed  ordering MulH-­‐tenancy   A  single  cluster  can  support  many   tenants  and  use  cases High  throughput   Can  reach  1.8  M  messages/s  in  a   single  parHHon Durability   Data  replicated  and  synced  to  disk Geo-­‐replicaHon   Out  of  box  support  for  geographically   distributed  applicaHons Unified  messaging  model   Support  both  Topic  &  Queue   semanHc  in  a  single  model Delivery  Guarantees   At  least  once,  at  most  once  and  effecHvely   once Low  Latency   Low  publish  latency  of  5ms  at  99pct Highly  scalable   Can  support  millions  of  topics
  • 19. 19 Unified Messaging Model Producer  (X)   Producer  (Y) Topic  (T) Subscrip@on  (A)   Subscrip@on  (B)   Consumer  (A1) Consumer  (B2) Consumer  (B1) Consumer  (B3)
  • 20. 20 Pulsar Producer PulsarClient client = PulsarClient.create( “http://broker.usw.example.com:8080”); Producer producer = client.createProducer( “persistent://my-property/us-west/my-namespace/my-topic”); // handles retries in case of failure producer.send("my-message".getBytes()); // Async version: producer.sendAsync("my-message".getBytes()).thenRun(() -> { // Message was persisted });
  • 21. 21 Pulsar Consumer PulsarClient client = PulsarClient.create( "http://broker.usw.example.com:8080"); Consumer consumer = client.subscribe( "persistent://my-property/us-west/my-namespace/my-topic", "my-subscription-name"); while (true) { // Wait for a message Message msg = consumer.receive(); System.out.println("Received message: " + msg.getData()); // Acknowledge the message so that it can be deleted by broker consumer.acknowledge(msg); }
  • 22. 22 Pulsar Architecture Pulsar  Broker  1 Pulsar  Broker  1 Pulsar  Broker  1 Bookie  1 Bookie  2 Bookie  3 Bookie  4 Bookie  5 Apache  BookKeeper Apache  Pulsar Producer   Consumer   Stateless  Serving BROKER   Clients interact only with brokers No state is stored in brokers BOOKIES   Apache BookKeeper as the storage Storage is append only Provides high performance, low latency Durability   No data loss. fsync before acknowledgement
  • 23. 23 Pulsar Architecture Pulsar  Broker  1 Pulsar  Broker  1 Pulsar  Broker  1 Bookie  1 Bookie  2 Bookie  3 Bookie  4 Bookie  5 Apache  BookKeeper Apache  Pulsar Producer   Consumer   SeparaHon  of  Storage  and  Serving SERVING Brokers can be added independently Traffic can be shifted quickly across brokers STORAGE   Bookies can be added independently New bookies will ramp up traffic quickly
  • 24. 24 Pulsar Architecture Clients CLIENTS Lookup correct broker through service discovery Establish connections to brokers Enforce authentication/authorization during connection establishment Establish producer/consumer session Reconnect with backoff strategy Dispatcher Load  Balancer Managed   Ledger CacheGlobal  Replica@on Producer   Consumer  Service   Discovery   Pulsar   Broker Bookie
  • 25. 25 Pulsar Architecture Message  Dispatching DISPATCHER End-to-end async message processing Messages relayed across producers, bookies and consumers with no copies Pooled reference count buffers Dispatcher Load  Balancer Managed   Ledger CacheGlobal  Replica@on Producer   Consumer  Service   Discovery   Pulsar   Broker Bookie MANAGED LEDGER Abstraction of single topic storage Cache recent messages
  • 26. 26 Pulsar Architecture Geo  ReplicaHon GEO REPLICATION Asynchronous replication Integrated in the broker message flow Simple configuration to add/remove regions Topic  (T1) Topic  (T1) Topic  (T1) Subscrip@on   (S1) Subscrip@on   (S1) Producer     (P1) Consumer     (C1) Producer     (P3) Producer     (P2) Consumer     (C2) Data  Center  A Data  Center  B Data  Center  C
  • 27. 27 Pulsar Use Cases - Message Queue Online  Events Topic  (T) Worker  1 Worker  2 Decouple  Online/Offline Topic  (T) Worker  3 MESSAGE QUEUES Decouple online or background High availability Reliable data transport NoHficaHons Long  running  tasks Low  latency     publish
  • 28. 28 Pulsar Use Cases - Feedback System Event Topic  (T) Propagate  States Controller Topic  (T) Serving  System Serving  System Serving  System FEEDBACK SYSTEM Coordinate large number of machines Propagate states Examples State propagation Personalization Ad-systems Feedback Updates
  • 29. 29 Pulsar in Production 3+  years   Serves  2.3  million  topics   100  billion  messages/day   Average  latency  <  5  ms   99%  15  ms  (strong  durability  guarantees)   Zero  data  loss   80+  applica@ons   Self  served  provisioning   Full-­‐mesh  cross-­‐datacenter  replica@on  -­‐   8+  data  centers
  • 32. 32 Current Streaming Frameworks 01 02 03 04 05 06 07 08 Beam S-Store Spark Flink Heron Storm Apex KAFKA STREAMS
  • 33. 33 Apache Beam Promises   Abstrac@ng  the  Computa@on   Express  Computa@on   Expressive  Windowing/Triggering   Incremental  Processing  for  late  data   Selectable  Engine   Select  criteria   Latency   Resource  Cost   Supported  Engines   Google  DataFlow   Apache  Spark,  Apache  Flink,  Apache  Apex
  • 34. 34 Apache Beam ComputaHon  AbstracHon   All  Data  is  4  tuple   Key,  Value   Event  Time   Window  the  tuple  belongs   Core  Operators   ParDo   User  supplied  DoFn   Emits  Zero  or  more  elements   GroupByKey   Groups  tuples  by  keys  in  the  window
  • 35. 35 Apache Beam Windowing   Window  Assignment   Fixed(a.k.a.  Tumbling),  Sliding  Session   Pluggable   Window  Merging   Happens  during  GroupByKey   Window  Triggering   Discard/Accumulate/Retract
  • 36. 36 Apache Beam Challenges   Mul@ple  Layers   API  vs  Execu@on   Troubleshoo@ng  complexi@es   Need  higher  level  APIs   Mul@ple  efforts  on  their  way   Other  Cloud  Vendor  Buy-­‐in?   Azure/AWS?
  • 37. 37 IBM S-Store Promises   Combine  Stream  Processing  and  Transac@ons   Extended  an  OLTP  engine(H-­‐Store)  adding   Tuple  Ordering   Windowing   Push-­‐based  processing   Exactly  Once  Seman@cs
  • 38. 38 IBM S-Store Data  and  Processing  Model     Tuples  grouped  into  Atomic  Batches   Grouping  of  Non-­‐overlapping  tuples   Treated  like  a  Transac@on   Atomic  Batches  belong  to  one  Stream   Processing  is  modeled  as  a  DAG   DAG  nodes  consume  one  or  more  streams  and  possibly  output  more   Node  logic  is  treated  as  a  Transac@on  
  • 39. 39 IBM S-Store Exactly  Once  Guarantees   Strong   Inputs  and  Outputs  are  logged  at  every  DAG  node   On  component  failure,  the  log  is  replayed  from  snapshot   Weak   Distributed  Snapsho{ng
  • 40. 40 IBM S-Store Challenges   Throughput   Non  OLTP  processing  is  much  slower  compared  to  modern  systems   Scalability   Mul@-­‐Node  s@ll  in  research  (2016)
  • 41. 41 Heron Terminology Topology Directed  acyclic  graph     ver@ces  =  computa@on,  and     edges  =  streams  of  data  tuples Spouts Sources  of  data  tuples  for  the  topology   Examples  -­‐  Pulsar/Ka}a/MySQL/Postgres Bolts Process  incoming  tuples,  and  emit  outgoing  tuples   Examples  -­‐  filtering/aggrega@on/join/any  func@on , %
  • 42. 42 Heron Topology % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5
  • 43. 43 Heron Topology - Physical Execution % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5 %% %% %% %% %%
  • 44. 44 Heron Groupings 01 02 03 04 Shuffle Grouping Random distribution of tuples Fields Grouping Group tuples by a field or multiple fields All Grouping Replicates tuples to all tasks Global Grouping Send the entire stream to one task / . - ,
  • 45. 45 Heron Topology - Physical Execution % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5 %% %% %% %% %% Shuffle Grouping Shuffle Grouping Fields Grouping Fields Grouping Fields Grouping Fields Grouping
  • 46. 46 Writing Heron Topologies Procedural - Low Level API Directly  write  your  spouts  and   bolts Functional - Mid Level API Use  of  maps,  flat  maps,  transform,  windows Declarative - SQL (coming) Use  of  declara@ve  language  -­‐  specify  what  you   want,  system  will  figure  it  out. , %
  • 47. 47 Heron Design Goals Efficiency   Reduce  resource  consump@on Support  for  diverse  workloads   Throughput  vs  latency  sensi@ve Support  for  mulHple  semanHcs   Atmost  once,  Atleast  once,   Effec@vely  once NaHve  MulH-­‐Language  Support   C++,  Java,  Python Task  IsolaHon   Ease  of  debug-­‐ability/isola@on/profiling   Support  for  back  pressure   Topologies  should  be  self  adjus@ng Use  of  containers   Runs  in  schedulers  -­‐  Kubernetes  &  DCOS  &   many  more MulH-­‐level  APIs   Procedural,  Func@onal  and  Declara@ve  for   diverse  applica@ons Diverse  deployment  models   Run  as  a  service  or  pure  library
  • 48. 48 Heron Architecture Scheduler Topology 1 Topology 2 Topology N Topology Submission
  • 49. 49 Topology Master Monitoring of containers Gateway for metrics Assigns role
  • 50. 50 Topology Architecture Topology Master ZK Cluster Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan DATA CONTAINER DATA CONTAINER Metrics Manager Metrics Manager Metrics Manager Health Manager MASTER CONTAINER
  • 51. 51 Stream Manager Routes tuples Implements backpressure Ack management
  • 53. 53 Stream Manager Physical  execuHon S1 B2 B3 Stream Manager Stream Manager Stream Manager Stream Manager S1 B2 B3 B4 S1 B2 B3 S1 B2 B3 B4 B4
  • 54. 54 Stream Manager Backpressure Spout based back pressureTCP backpressure Stage by stage back pressure
  • 55. 55 Stream Manager Backpressure TCP  based  backpressure Slows upstream and downstream instances S1 B2 B3 Stream Manager Stream Manager Stream Manager Stream Manager S1 B2 B3 B4 S1 B2 B3 S1 B2 B3 B4 B4
  • 56. 56 Stream Manager Backpressure Spout  based  backpressure S1 S1 S1S1S1 S1 S1S1 B2 B3 Stream Manager Stream Manager Stream Manager Stream Manager B2 B3 B4 B2 B3 B2 B3 B4 B4
  • 57. 57 Heron Instance Runs only one task (spout/bolt) Exposes Heron API Collects several metrics API G
  • 61. 61 Pulsar Operations Reac@ng  to  Failures   Brokers   Bookies   Common  Issues   Consumer  Backlog   I/O  Priori@za@on  and  Throaling   Mul@-­‐Tenancy
  • 62. 62 Reacting to Failures - Brokers Brokers  don’t  have  durable  state   Easily  replaceable   Topics  are  immediately  reassigned  to  healthy  brokers   Expanding  capacity   Simply  add  new  broker  node   If  other  brokers  are  overloaded,  traffic  will  be  automa@cally  assigned   Load  manager   Monitor  traffic  load  on  all  brokers  (CPU,  memory,  network,  topics)   Ini@ally  place  topics  to  least  loaded  brokers   Reassign  topics  when  a  broker  is  overloaded
  • 63. 63 Reacting to Failures - Bookies When  a  bookie  fails,  brokers  will  immediately  con@nue  on  other  bookies   Auto-­‐Recovery  mechanism  will  re-­‐establish  the  replica@on  factor  in  background   If  a  bookie  keeps  giving  errors  or  @meouts,  it  will  be  “quaran@ned”   Not  considered  for  new  ledgers  for  some  period  of  @me
  • 64. 64 Consumer Backlog Metrics  are  available  to  make  assessments:   When  problem  started   How  big  is  backlog?  Messages?  Disk  space?     How  fast  is  draining?   What’s  the  ETA  to  catch  up  with  publishers?   Establish  where  is  the  boaleneck   Applica@on  is  not  fast  enough   Disk  read  IO
  • 65. 65 I/O Prioritization and Throttling Priori@ze  access  to  IO   During  an  outage  many  tenants  might  try  to  drain  backlog  as  fast  as  they  can   Read  IO  becomes  the  boaleneck   Throaling  can  be  used  to  priori@ze  draining:   Cri@cal  use  cases  can  recover  quickly   Fewer  concurrent  readers  lead  to  higher  throughput   Once  they  catch  up,  message  will  be  dispatched  from  cache
  • 66. 66 Enforcing Multi-Tenancy Ensure  tenants  don’t  cause  performance  issues  on  other  tenants   Backlog  quotas   Soq-­‐Isola@on   Flow  control   Throaling   In  cases  when  user  behavior  is  triggering  performance  degrada@on   Hard-­‐isola@on  as  a  last  resource  for  quick  reac@on  while  proper  fix  is  deployed   Isolate  tenant  on  a  subset  of  brokers   Can  be  also  applied  at  the  BookKeeper  level
  • 67. 67 Heron @Twitter LARGEST  CLUSTER 100’s  of  TOPOLOGIES BILLIONS  OF  MESSAGES100’s  OF  TERABYTESREDUCED  INCIDENTS GOOD  NIGHT  SLEEP 3X - 5X reduction in resource usage
  • 68. 68 Heron Deployment Topology 1 Topology 2 Topology N Heron Tracker Heron VIZ Heron Web ZK Cluster Aurora Services Observability
  • 70. 70 Heron Use Cases Monitoring Real  Time     Machine  Learning Ads Real  Time  Trends Product  Safety Real  Time  Business   Intelligence
  • 72. 72 Heron Topology Scale CONTAINERS - 1 TO 600 INSTANCES - 10 TO 6000
  • 73. 73 Heron Happy Facts :) v  No  more  pages  during  midnight  for  Heron  team   Ø  Very  rare  incidents  for  Heron  customer  teams   ü  Easy  to  debug  during  incident  for  quick  turn  around   §  Reduced  resource  u@liza@on  saving  cost
  • 74. 74 Heron Developer Issues 01 02 Container  resource  allocaHon Parallelism  tuning
  • 75. 75 Heron Operational Issues 01 02 03 Slow Hosts Network Issues Data Skew / . - 04 Load Variations , 05 SLA Violations /
  • 76. 76 Slow Hosts Memory Parity Errors Impending Disk Failures Lower GHZ
  • 78. 78 Network Slowness 01 02 03 Delays  processing Data  is  accumulaHng Timelines  of  results   Is  affected
  • 79. 79 Data Skew Multiple Keys Several  keys  map  into  single   instance  and  their  count  is   high Single Key Single   key   maps   into   a   instance   and  its  count  is  high H C
  • 80. 80 Load Variations Spikes Sudden  surge  of  data  -­‐  short   lived  vs  last  for  several   minutes Daily Patterns Predictable  change  in  traffic H C
  • 81. 81 Self Regulating Streaming Systems Automate  Tuning SLO     Maintenance Self  RegulaHng   Streaming  Systems Tuning Manual,  @me-­‐consuming  and   error-­‐prone   task   of   tuning   various   systems   knobs   to   achieve  SLOs SLO Maintenance  of  SLOs  in  the  face  of   unpredictable   load   varia@ons   and   hardware   or   soqware   performance   degrada@on Self  RegulaHng  Streaming  Systems System   that   adjusts   itself   to   the   environmental   changes   and  con@nue  to  produce  results
  • 82. 82 Self Regulating Streaming Systems Self tuning Self stabilizing Self healing G ! Several tuning knobs Time consuming tuning phase The system should take as input an SLO and automatically configure the knobs. The system should react to external shocks a n d a u t o m a t i c a l l y reconfigure itself Stream jobs are long running Load variations are common The system should identify internal faults and attempt to recover from them System performance affected by hardware or software delivering degraded quality of service
  • 83. 83 Enter Dhalion Dhalion periodically executes well-specified policies that optimize execution based on some objective. We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met. Dhalion is a policy based framework integrated into Heron
  • 84. 84 Dhalion Policy Framework Symptom Detector 1 Symptom Detector 2 Symptom Detector 3 Symptom Detector N .... Diagnoser 1 Diagnoser 2 Diagnoser M .... Resolver Invocation D iagnosis 1 Diagnosis 2 D iagnosis M Symptom 1 Symptom 2 Symptom 3 Symptom N Symptom Detection Diagnosis Generation Resolution Resolver 1 Resolver 2 Resolver M .... Resolver Selection Metrics
  • 85. 85 Dynamic Resource Provisioning Policy This  policy  reacts  to  unexpected   load  varia@ons  (workload  spikes) Goal Goal  is  to  scale  up  and  scale  down   the  topology  resources  as  needed  -­‐   while   keeping   the   topology   in   a   steady  state  where  back  pressure  is   not  observed H C Policy
  • 86. 86 Dynamic Resource Provisioning Pending Tuples Detector Backpressure Detector Processing Rate Skew Detector Resource Over provisioning Diagnoser Resource Under Provisioning Diagnoser Data Skew Diagnoser Resolver Invocation Diagnosis Symptoms Symptom Detection Diagnosis Generation Resolution Metrics Slow Instances Diagnoser Bolt  Scale     Down  Resolver Bolt  Scale     Up  Resolver Data  Skew   Resolver Restart   Instances   Resolver ImplementaHon
  • 87. 87 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  • 88. 88 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 150  |  80 150  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Under  provisioning
  • 89. 89 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  • 90. 90 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 50  |  05 50  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Slow  Instance
  • 91. 91 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 100  |  20 100  |  20 processing  rate  (tps)  |  queue  size  (#tuples) Steady  State
  • 92. 92 Dynamic Resource Provisioning Tweet Spout Tweet Spout Tweet Spout % % % % Splitter Spout Splitter Spout Counter Bolt Counter Bolt 50  |  05 150  |  80 processing  rate  (tps)  |  queue  size  (#tuples) Data  Skew
  • 93. 93 Experimental Setup % % Spout Splitter Bolt Counter Bolt Shuffle Grouping Fields Grouping Microsoq  HDInsight   Intel  Xeon  ES-­‐2673  CPU@2.40  GHz   28  GB  of  Memory Throughput  of  Spouts  (No.  Of   tuples  emiaed  over  1  min)   Throughput  of  Bolts  (No.  of  tuples   emiaed  over  1  min)   Number  of  Heron  Instances   provisioned Hardware  and  Soqware  Configura@on Evalua@on  Metrics
  • 94. 94 Dynamic Provisioning Profile 0.00 0.20 0.40 0.60 0.80 1.00 1.20 1.40 0 10 20 30 40 50 60 70 80 90 100 110 120 Normalized Throughput Time (in minutes) Spout Splitter Bolt Counter Bolt Scale Down Scale Up S1 S2 S3 The Dynamic Resource Provisioning Policy is able to adjust the topology resources on-the-fly when workload spikes occur. The policy can correctly detect and resolve bottlenecks even on multi- stage topologies where backpressure is gradually propagated from one stage of the topology to another. 0 5 10 15 0 20 40 60 80 100 120 Number of Bolts Time (in minutes) Splitter Bolt Counter Bolt Heron Instances are gradually scaled up and down according to the input load
  • 97. DATA  CHARACTERISTICS UNORDERED          Varying  Skew                  Event  Hme                  Processing  Hme          Correctness          Completeness
  • 98. 98 Neklix  ,  HBO,  Hulu,  YouTube,  DailymoHon,  ESPN,  Sling  TV Satori,  Facebook  Live,  Periscope SpoHfy,  Pandora,  Apple  Music,  Tidal Amazon  Twitch,  YouTube  Gaming,  Microsom  Mixer
  • 100. HIGH  VELOCITY  DATA 100 KEY  CHARACTERISTICS In-Order O(1) Storage O(n) Time ONE PASS
  • 101. 101 DISTRIBUTED COMPUTATION SCALE OUT ROBUST CHARACTERISTICS FAULT TOLERANCE KEY
  • 102. 102 DATA SKETCHES Early  work The space complexity of approximating the frequency moments Counting Frequent Elements [Misra and Gries, 1982] Flajolet and Martin 1985] Computing on Data Streams [Henzinger et al. 1998] [Alon et al. 1996] Counting [Morris, 1977] Median of a sequence [Munro and Paterson, 1980] Membership [Bloom, 1970]
  • 103. DATA  SKETCHES UNIQUE   FILTER   COUNT   HISTOGRAM   QUANTILE   MOMENTS   TOP-­‐K
  • 104. ADVANCED    DATA  SKETCHES RANDOM  PROJECTIONS,  FREQUENT  DIRECTIONS          Dimensionality  Reduc@on   RANDOMIZED  NUMERICAL  ALGEBRA            Matrix  mul@ply   GRAPHS          Summarize  adjacency                    Connec@vity,  k-­‐connec@vity,  Spanners,  Sparsifica@on   GEOMETRIC          Diameter,  Lp  distances,  Min-­‐cost  matchings          Informa@on  Distances,  e.g.,  Hellinger  distance   SKETCHING  SKETCHES          Tes@ng  independence
  • 105. 105 1 2 3 5 4 SAMPLING A/B   Tes@ng FILTERING Set   membership CORRELATION Fraud   Detec@on QUANTILES Network     Analysis CARDINALITY Site  Audience   Analysis Applications
  • 106. 106 6 7 8 10 9 MOMENTS Database FREQUENT     ELEMENTS Trending   Hashtags CLUSTERING Medical   Imaging ANOMALY     DETECTION Sensor  Networks SUBSEQUENCES Traffic   Analysis Applications ! ! " "
  • 107. 107 Sampling [1]  J.  S.  Viaer.  Random  Sampling  with  a  Reservoir.  ACM  Transac@ons  on  Mathema@cal  Soqware,  Vol.  11(1):37–57,  March  1985. Obtain  a  representa@ve  sample  from  a  data  stream   Maintain  dynamic  sample   A  data  stream  is  a  con@nuous  process   Not  known  in  advance  how  many  points  may  elapse  before  an  analyst  may  need  to  use  a   representa@ve  sample    Reservoir  sampling  [1]   Probabilis@c  inser@ons  and  dele@ons  on  arrival  of  new  stream  points   Probability  of  successive  inser@on  of  new  points  reduces  with  progression  of  the  stream   An  unbiased  sample  contains  a  larger  and  larger  frac@on  of  points  from  the  distant  history  of  the  stream   Prac@cal  perspec@ve   Data  stream  may  evolve  and  hence,  the  majority  of  the  points  in  the  sample  may  represent  the   stale  history
  • 108. 108 Sampling Sliding  window  approach  (sample  size  k,  window  width  n)   Sequence-­‐based     Replace  expired  element  with  newly  arrived  element     Disadvantage:  highly  periodic   Chain-­‐sample  approach     Select  element  ith  with  probability  Min(i,n)/n   Select  uniformly  at  random  an  index  from  [i+1,  i+n]  of  the  element  which  will  replace  the  ith  item    Maintain  k  independent  chain  samples    Timestamp-­‐based     #  elements  in  a  moving  window  may  vary  over  @me   Priority-­‐sample  approach 3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3 3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3 3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3 3 5 1 4 6 2 8 5 2 3 5 4 2 2 5 0 9 8 4 6 7 3 [1]  B.  Babcock.  Sampling  From  a  Moving  Window  Over  Streaming  Data.  In  Proceedings  of  SODA,  2002.
  • 109. 109 Sampling [1]  C.  C.  Aggarwal.On  Biased  Reservoir  Sampling  in  the  presence  of  Stream  Evolu@on.  in  Proceedings  of  VLDB,  2006.  Biased  Reservoir  Sampling  [1]   Use  a  temporal  bias  func@on  -­‐  recent  points  have  higher  probability  of     being  represented  in  the  sample  reservoir   Memory-­‐less  bias  func@ons   Future  probability  of  retaining  a  current  point  in  the  reservoir  is  independent  of  its  past  history  or     arrival  @me     Probability  of  an  rth  point  belonging  to  the  reservoir  at  the  @me  t  is  propor@onal  to  the  bias  func@on         Exponen@al  bias  func@ons  for  rth  data  point  at  @me  t,                                                                        where,  r  ≤  t,    λ  ∈  [0,  1]  is  the     bias  rate   Maximum  reservoir  requirement  R(t)  is  bounded
  • 110. 110 Filtering Set  Membership          Determine,  with  some  false  probability,  if  an  item  in  a  data  stream  has          been  seen  before   Databases  (e.g.,  speed  up  semi-­‐join  opera@ons),  Caches,  Routers,  Storage  Systems   Reduce  space  requirement  in  probabilis@c  rou@ng  tables   Speedup  longest-­‐prefix  matching  of  IP  addresses   Encode  mul@cast  forwarding  informa@on  in  packets   Summarize  content  to  aid  collabora@ons  in  overlay  and  peer-­‐to-­‐peer  networks   Improve  network  state  management  and  monitoring
  • 111. 111 Filtering Set  Membership Applica@on  to  hyphena@on     programs   Early  UNIX  spell  checkers [1]  Illustra@on  borrowed  from  hap://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf [1]
  • 112. 112 Filtering Set  Membership Natural  generaliza@on  of  hashing     False  posi@ves  are  possible       No  false  nega@ves   No  dele@ons  allowed   For  false  posi@ve  rate  ε,  #  hash  func@ons  =  log2(1/ε) where,  n  =  #  elements,                              k  =  #  hash  func@ons                            m  =  #  bits  in  the  array
  • 113. 113 Filtering Set  Membership Minimizing  false  posi@ve  rate  ε  w.r.t.  k  [1]   k  =  ln  2  *  (m/n)   ε  =  (1/2)k  ≈  (0.6185)m/n   1.44  *  log2(1/ε)  bits  per  item   Independent  of  item  size  or  #  items   Informa@on-­‐theore@c  minimum:  log2(1/ε)  bits  per  item   44%  overhead     X  =  #  0  bits where [1]  A.  Broder  and  M.  Mitzenmacher.  Network  Applica@ons  of  Bloom  Filters:  A  Survey.  In  Internet  Mathema@cs  Vol.  1,  No.  4,  2005.
  • 114. 114 Filtering Set  Membership:  Cuckoo  Filter  [1] Key  Highlights   Add  and  remove  items  dynamically     For  false  posi@ve  rate  ε  <  3%,  more  space  efficient  than  Bloom  filter   Higher  performance  than  Bloom  filter  for  many  real  workloads   Asympto@cally  worse  performance  than  Bloom  filter    Min  fingerprint  size  α  log  (#  entries  in  table)   Overview     Stores  only  a  fingerprint  of  an  item  inserted   Original  key  and  value  bits  of  each  item  not  retrievable     Set  membership  query  for  item  x:  search  hash  table  for  fingerprint  of  x [1]  Fan  et  al.,  Cuckoo  Filter:  Prac@cally  Beaer  Than  Bloom.  In  Proceedings  of  the  10th  ACM  Interna@onal  on  Conference  on  Emerging  Networking  Experiments  and  Technologies,  2014.
  • 115. 115 Filtering Set  Membership [1]  R.  Pagh  and  F.  Rodler.  Cuckoo  hashing.  Journal  of  Algorithms,  51(2):122-­‐144,  2004.   [2]  Illustra@on  borrowed  from  “Fan  et  al.,  Cuckoo  Filter:  Prac@cally  Beaer  Than  Bloom.  In  Proceedings  of  the  10th  ACM  Interna@onal  on  Conference  on  Emerging  Networking  Experiments  and  Technologies,  2014.” [2] Illustra@on  of  Cuckoo  hashing  [2] Cuckoo Hashing [1] High  space  occupancy   Prac@cal  implementa@ons:  mul@ple  items/bucket   Example  uses:  Soqware-­‐based  Ethernet  switches   Cuckoo Filter [2] Uses  a  mul@-­‐way  associa@ve  Cuckoo  hash  table   Employs  par@al-­‐key  cuckoo  hashing   Store  fingerprint  of  an  item     Relocate  exis@ng  fingerprints  to  their  alterna@ve  loca@ons [2]
  • 116. Dele@on   Item  must  have  been  previously  inserted 116 Filtering Set  Membership Cuckoo Filter Par@al-­‐key  cuckoo  hashing   Fingerprint  hashing  ensures  uniform   distribu@on  of  items  in  the  table   Length  of  fingerprint  <<  Size  of  h1  or  h2   Possible   to   have   mul@ple   entries   of   a   fingerprint  in  a  bucket Alternate   bucket Significantly  shorter   than  h1  and  h2
  • 117. 117 Filtering Set  Membership Comparison k  ➛ # hash functions, d  ➛ # partitions
  • 118. 118 Cardinality Dis@nct  Elements   Database  systems/Search  engines   #  dis@nct  queries   Network  monitoring  applica@ons   Natural  language  processing   #  dis@nct  mo@fs  in  a  DNA  sequence   #  dis@nct  elements  of  RFID/sensor  networks
  • 119. 119 Previous  work   Probabilis@c  coun@ng  [Flajolet  and  Mar@n,  1985]    LogLog  coun@ng  [Durand  and  Flajolet,  2003]    HyperLogLog  [Flajolet  et  al.,  2007]    Sliding  HyperLogLog  [Chabchoub  and  Hebrail,  2010]    HyperLogLog  in  Prac@ce  [Heule  et  al.,  2013]    Self-­‐Organizing  Bitmap  [Chen  and  Cao,  2009]    Discrete  Max-­‐Count  [Ting,  2014]   Sequence  of  sketches  forms  a  Markov  chain  when  h  is  a  strong  universal  hash   Es@mate  cardinality  using  a  mar@ngale Cardinality
  • 121. 121 Hyperloglog   Apply  hash  func@on  h  to  every  element  in  a  mul@set     Cardinality  of  mul@set  is  2max(ϱ)  where  0ϱ-­‐11  is  the  bit  paaern  observed  at  the   beginning  of  a  hash  value   Above  suffers  with  high  variance   Employ  stochas@c  averaging   Par@@on  input  stream  into  m  sub-­‐streams  Si  using  first  p  bits  of  hash  values  (m  =  2p) where Cardinality
  • 122. 122 Use  of  64-­‐bit  hash  func@on     Total  memory  requirement  5  *  2p  -­‐>  6  *  2p,  where  p  is  the  precision   Empirical  bias  correc@on   Uses  empirically  determined  data  for  cardinali@es  smaller  than  5m  and  uses  the  unmodified  raw   es@mate  otherwise   Sparse  representa@on    For  n≪m,  store  an  integer  obtained  by  concatena@ng  the  bit  paaerns  for  idx  and  ϱ(w)    Use  variable  length  encoding  for  integers  that  uses  variable  number  of  bytes  to  represent   integers    Use  difference  encoding  -­‐  store  the  difference  between  successive  elements   Other  op@miza@ons  [1,  2] Hypeloglog Optimizations [1]  hap://druid.io/blog/2014/02/18/hyperloglog-­‐op@miza@ons-­‐for-­‐real-­‐world-­‐systems.html   [2]  hap://an@rez.com/news/75 Cardinality
  • 123. 123 Self-­‐Learning  Bitmap  (S-­‐bitmap)  [1]   Achieve  constant  rela@ve  es@ma@on  errors  for  unknown  cardinali@es  in  a  wide   range,  say  from  10s  to  >106    Bitmap  obtained  via  adap@ve  sampling  process    Bits  corresponding  to  the  sampled  items  are  set  to  1    Sampling  rates  are  learned  from  #  dis@nct  items  already  passed  and  reduced  sequen@ally  as   more  bits  are  set  to  1    For  given  input  parameters  Nmax  and  es@ma@on  precision  ε,  size  of  bit  mask   For  r  =  1  -­‐2ε2(1+ε2)-­‐1  and  sampling  probability  pk  =  m  (m+1-­‐k)-­‐1(1+ε2)rk,  where  k  ∈  [1,m]            Rela@ve  error  ≣  ε [1]  Chen  et  al.  “Dis@nct  coun@ng  with  a  self-­‐learning  bitmap”.  Journal  of  the  American  Sta@s@cal  Associa@on,  106(495):879–890,  2011. Cardinality
  • 124. 124 Quantiles Quan@les,  Histograms   Large  set  of  real-­‐world  applica@ons   Database  applica@ons   Sensor  networks   Opera@ons   Proper@es     Provide  tunable  and  explicit  guarantees  on  the  precision  of  approxima@on   Single  pass   Early  work   [Greenwald  and  Khanna,  2001]  -­‐  worst  case  space  requirement     [Arasu  and  Manku,  2004]  -­‐  sliding  window  based  model,  worst  case  space     requirement  
  • 125. 125 q-­‐digest  [1]   Groups  values  in  variable  size  buckets  of  almost     equal  weights   Unlike  a  tradi@onal  histogram,  buckets  can  overlap   Key  features   Detailed  informa@on  about  frequent  values  preserved   Less  frequent  values  lumped  into  larger  buckets   Using  message  of  size  m,  answer  within  an  error  of   Except  root  and  leaf  nodes,  a  node  v  ∈  q-­‐digest  iff Max  signal   value #  Elements Compression   Factor Complete  binary  tree [1]  Shrivastava  et  al.,  Medians  and  Beyond:  New  Aggrega@on  Techniques  for  Sensor  Networks.  In  Proceedings  of  SenSys,  2004. Quantiles
  • 126. 126 q-­‐digest   Building  a  q-­‐digest   q-­‐digests  can  be  constructed  in  a  distributed  fashion    Merge  q-­‐digests Quantiles
  • 127. 127 t-­‐digest  [1]   Approxima@on  of  rank-­‐based  sta@s@cs   Compute  quan@le  q  with  an  accuracy  rela@ve  to  max(q,  1-­‐q)   Compute  hybrid  sta@s@cs  such  as  trimmed  sta@s@cs   Key  features   Robust  with  respect  to  highly  skewed  distribu@ons   Independent  of  the  range  of  input  values  (unlike  q-­‐digest)   Rela@ve  error  is  bounded   Non-­‐equal  bin  sizes   Few  samples  contribute  to  the  bins  corresponding  to  the  extreme  quan@les   Merging  independent  t-­‐digests   Reasonable  accuracy [1]T.  Dunning  and  O.  Ertl,  “”Compu@ng  Extremely  Accurate  Quan@les  using  t-­‐digests”,  2017.  haps://github.com/tdunning/t-­‐digest/blob/master/docs/t-­‐digest-­‐paper/histo.pdf   Quantiles
  • 128. 128 t-­‐digest   Group  samples  into  sub-­‐sequences   Smaller  sub-­‐sequences  near  the  ends   Larger  sub-­‐sequences  in  the  middle   Scaling  func@on   Mapping  k  is  monotonic   k(0)  =  1  and  k(1)  =  δ   k-­‐size  of  each  subsequence  <  1   No@onal   Index Compression   parameterQuan@le Quantiles
  • 129. 129 t-­‐digest   Es@ma@ng  quan@le  via  interpola@on   Sub-­‐sequences  contain  centroid  of  the  samples   Es@mate  the  boundaries  of  the  sub-­‐sequences   Error   Scales  quadra@cally  in  #  samples   Small  #  samples  in  the  sub-­‐sequences  near  q=0  and  q=1  improves  accuracy   Lower  accuracy  in  the  middle  of  the  distribu@on     Larger  sub-­‐sequences  in  the  middle   Two  flavors   Progressive  merging  (buffering  based)  and  clustering  variant Quantiles
  • 130. 130 Frequent Elements Applica@ons   Track  bandwidth  hogs   Determine  popular  tourist  des@na@ons   Itemset  mining   Entropy  es@ma@on     Compressed  sensing     Search  log  mining   Network  data  analysis   DBMS  op@miza@on  
  • 131. Count-­‐min  Sketch  [1]   A  two-­‐dimensional  array  counts  with  w  columns  and  d  rows   Each  entry  of  the  array  is  ini@ally  zero   d   hash   func@ons   are   chosen   uniformly   at   random   from   a   pairwise   independent   family   Update    For  a  new  element  i,  for  each  row  j  and  k  =  hj(i),  increment  the  kth  column  by  one   Point  query                                                                                                          where,  sketch  is  the  table   Parameters 131 ),( δε }1{}1{:,,1 wnhh d ……… → ! ! " # # $ = ε e w ! ! " # # $ = δ 1 lnd [1]  Cormode,  Graham;  S.  Muthukrishnan  (2005).  "An  Improved  Data  Stream  Summary:  The  Count-­‐Min  Sketch  and  its  Applica@ons".  J.  Algorithms  55:  29–38. Frequent Elements
  • 132. Variants  of  Count-­‐min  Sketch  [1]   Count-­‐Min  sketch  with  conserva@ve  update  (CU  sketch)    Update  an  item  with  frequency  c    Avoid  unnecessary  upda@ng  of  counter  values  =>  Reduce  over-­‐es@ma@on  error    Prone  to  over-­‐es@ma@on  error  on  low-­‐frequency  items     Lossy  Conserva@ve  Update  (LCU)  -­‐  SWS    Divide  stream  into  windows    At  window  boundaries,  ∀  1  ≤  i  ≤  w,  1  ≤  j  ≤  d,  decrement  sketch[i,j]  if  0  <  sketch[i,j]  ≤   132 [1]  Cormode,  G.  2009.  Encyclopedia  entry  on  ’Count-­‐MinSketch’.  In  Encyclopedia  of  Database  Systems.  Springer.,  511–516. Frequent Elements
  • 133. 133 OPEN SOURCE TWITTER YAHOO! HUAWEI streamDM^ SGD  Learner  and  Perceptron   Naive  Bayes   CluStream   Hoeffding  Decision  Trees   Bagging   Stream  KM++ DATA  SKETCHES * Unique   Quan@le   Histogram   Sampling   Theta  Sketches   Tuple   Sketches   Most  Frequent ALGEBIRD# Filtering   Unique   Histogram   Most  Frequent *  haps://datasketches.github.io/   #  haps://github.com/twiaer/algebird   ^  hap://huawei-­‐noah.github.io/streamDM/   **  haps://github.com/jiecchen/StreamLib StreamLib**
  • 134. 134 Anomaly Detection [1]  A.  S.  Willsky,  “A  survey  of  design  methods  for  failure  detec@on  systems,”  Automa@ca,  vol.  12,  pp.  601–611,  1976. Very  rich  -­‐  over  150  yrs  -­‐  history   Manufacturing      Sta@s@cs    Econometrics,  Financial  engineering    Signal  processing    Control  systems,  Autonomous  systems  -­‐  fault  detec@on  [1]    Networking    Computa@onal  biology  (e.g.,  microarray  analysis)    Computer  vision
  • 135. 135 Very  rich  -­‐  over  150  yrs  -­‐  history   Anomalies  are  contextual  in  nature “DISCORDANT observations may be defined as those which present the appearance of differing in respect of their law of frequency from other observations with which they ale combined. In the treatment of such observations there is great diversity between authorities ; but this discordance of methods may be reduced by the following reflection. Different methods are adapted to different hypotheses about the cause of a discordant observation; and different hypotheses are true, or appropriate, according as the subject-matter, or the degree of accuracy required, is different.” F. Y. Edgeworth, “On Discordant Observations”, 1887. Anomaly Detection
  • 137. 137 Anomaly Detection COMMON  APPROACHES DOMAINS STATS   MFG   OPS NOT  VALID in  real-­‐life Moving Averages   SMA, EWMA, PEWMA Assumption   Normal Distribution PARAMS WIDTH   DECAY Rule Based   µ ± σ Stone  1868   Glaisher  1872   Edgeworth  1887   Stewart  1920   Irwin  1925   Jeffreys  1932   Rider  1933
  • 138. 138 Anomaly Detection ROBUST  MEASURES MEDIAN MAD [1] MCD [2] MVEE [3,4] Median Absolute Deviation Minimum Covariance Determinant Minimum Volume Enclosing Ellipsoid [1]P.  J.  Rousseeuw  and  C.  Croux,  “Alterna@ves  to  the  Median  Absolute  Devia@on”,  1993.   [2]  hap://onlinelibrary.wiley.com/wol1/doi/10.1002/wics.61/abstract   [3]  P.  J.  Rousseeuw  and  A.  M.  Leroy.,“Robust  Regression  and  Outlier  Detec@on”,  1987.   [4]  M.  J.Todda  and  E.  A.  Yıldırım  ,  “On  Khachiyan's  algorithm  for  the  computa@on  of  minimum-­‐volume  enclosing  ellipsoids”,  2007.
  • 140. 140 Anomaly Detection Challenges   Live  Data   Mul@-­‐dimensional     Low  memory  footprint     Accuracy  vs.  Speed  trade-­‐off   Encoding  the  context   Data  types   Video,  Audio,  Text   Data  veracity   Wearables   Smart  ci@es,  Connected  Home,  Internet  of  Things
  • 141. TYING  IT  ALL  TOGETHER
  • 142. 142 Real Time Architectures For  the  streaming  world   Lambda   Run  computa@on  twice  in  different  systems   Kappa   Run  computa@on  once  
  • 144. 144 Lambda Architecture Batch  Layer   Accurate  but  delayed   HDFS/Mapreduce   Fast  Layer   Inexact  but  fast   Storm/Ka}a   Query  Merge  Layer   Merge  results  from  batch  and  fast  layers  at  query  @me  
  • 145. 145 Lambda Architecture Characteris@cs   During  Inges@on,  Data  is  cloned  into  two.   One  goes  to  the  batch  layer   Other  goes  to  the  fast  layer   Processing  done  at  two  layers   Expressed  as  Map-­‐reduces  in  batch  layer   Expressed  as  topologies  in  the  speed  layer
  • 146. 146 Lambda Architecture Challenges   Inherently  Inefficient   Data  is  replicated  twice   Computa@on  is  replicated  twice   Opera@onally  Inefficient   Maintain  both  batch  and  streaming  systems   Tune  topologies  for  both  systems
  • 147. 147 Kappa Architecture Streaming  is  everything   Computa@on  is  expressed  in  a  topology   Computa@on  is  mostly  done  only  once  when  the   data  arrives   Data  moves  into  permanent  storage  
  • 149. 149 Kappa Architecture Challenges   Data  Reprocessing  could  be  very  expensive   Code/Logic  Changes   Either  Data  needs  to  be  brought  back  from  Storage  to  the  bus   Or  Computa@on  needs  to  be  expressed  to  run  on  bulk-­‐storage   Historic  Analysis   How  to  do  data  analy@cs  over  all  of  last  years  data
  • 150. 150
  • 151. 151 Observations Lambda  is  complicated  and  inefficient   Replica@on  of  Data  and  Computa@on   Mul@ple  Systems  to  operate  and  tune   Kappa  is  too  simplis@c   Data  reprocessing  too  expensive   Historical  analysis  not  possible
  • 152. 152 Observations Computa@on  across  batch/real@me  is  similar   Expressed  as  DAGS   Run  parallely  on  the  cluster   Intermediate  results  need  not  be  materialized   Func@onal/Declara@ve  APIs   Storage  is  the  key   Messaging/Storage  are  two  faces  of  the  same  coin   They  serve  the  same  data
  • 153. 153 Real-Time Storage Requirements Requirements  for  a  real-­‐Hme  storage  plakorm Be  able  to  write  and  read  streams  of  records  with  low  latency,  storage  durability   Data  storage  should  be  durable,  consistent  and  fault  tolerant   Enable  clients  to  stream  or  tail  ledgers  to  propagate  data  as  they’re  wriaen   Store  and  provide  access  to  both  historic  and  real-­‐@me  data
  • 154. 154 Apache BookKeeper - Stream Storage A  storage  for  log  streams Replicated,  durable  storage  of  log  streams   Provide  fast  tailing/streaming  facility   Op@mized  for  immutable  data   Low-­‐latency  durability   Simple  repeatable  read  consistency   High  write  and  read  availability  
  • 155. 155 Record Smallest  I/O  and  Address  Unit A  sequence  of  invisible  records   A  record  is  sequence  of  bytes   The  smallest  I/O  unit,  as  well  as  the  unit  of  address   Each  record  contains  sequence  numbers  for  addressing
  • 156. 156 Logs Two  Storage  PrimiHves Ledger:  A  finite  sequence  of  records.   Stream:  An  infinite  sequence  of  records.  
  • 157. 157 Ledger Finite  sequence  of  records Ledger:  A  finite  sequence  of  records  that  gets  terminated   A  client  explicitly  close  it   A  writer  who  writes  records  into  it  has  crashed.
  • 158. 158 Stream Infinite  sequence  of  records Stream:  An  unbounded,  infinite  sequence  of  records   Physically  comprised  of  mul@ple  ledgers
  • 159. 159 Bookies Stores  fragment  of  records Bookie  -­‐  A  storage  server  to  store  data  records   Ensemble:  A  group  of  bookies  storing  the  data  records  of  a  ledger   Individual  bookies  store  fragments  of  ledgers
  • 161. 161 Tying it all together A  typical  installaHon  of  Apache  BookKeeper
  • 162. 162 BookKeeper - Use Cases Combine  messaging  and  storage Stream  Storage  combines  the  func@onality  of  streaming  and  storage WAL  -­‐  Write  Ahead  Log Message  Store Object  Store SnapshotsStream  Processing
  • 163. 163 BookKeeper in Real-Time Solution Durable  Messaging,  Scalable  Compute  and  Stream  Storage
  • 164. 164 BookKeeper in Production Enterprise  Grade  Stream  Storage 4+  years  at  Twiaer  and  Yahoo,  2+  years  at  Salesforce   Mul@ple  use  cases  from  messaging  to  storage   Database  replica@on,  Message  store,  Stream  compu@ng  …   600+  bookies  in  one  single  cluster   Data  is  stored  from  days  to  a  year   Millions  of  log  streams   1  trillion  records/day,  17  PB/day
  • 165. 165 Companies using BookKeeper Enterprise  Grade  Stream  Storage
  • 166. 166 Real Time is Messy and Unpredictable Aggregation   Systems Messaging   Systems Result   Engine HDFS Queryable   Engines
  • 167. 167 Streamlio - Unified Architecture Interactive    Querying Storm  API Trident/Apache   Beam   SQL Application   Builder Pulsar   API BK/   HDFS   API Kubernetes Metadata   Management Operational   Monitoring Chargeback Security   Authentication Quota   Management Rules     Engine Kafka   API
  • 168. 168 RESOURCES Sketching  Algorithms haps://www.cs.upc.edu/~gavalda/papers/portoschool.pdf     haps://mapr.com/blog/some-­‐important-­‐streaming-­‐algorithms-­‐you-­‐should-­‐know-­‐about/   haps://gist.github.com/debasishg/8172796 Synopses  for  Massive  Data:  Samples,  Histograms,  Wavelets,     Sketches Data  Streams:  Models  and  Algorithms Charu  Aggarwal   hap://www.springer.com/us/book/9780387287591 Data  Streams:  Algorithms  and  ApplicaHons Muthu  Muthukrishnan   hap://algo.research.googlepages.com/eight.ps Graph  Streaming  Algorithms A.  McGregor G.  Cormode,  M.  Garofalakis  and  P.  J.  Haas Sketching  as  a  Tool    for  Numerical  Linear  Algebra D.  Woodruff
  • 169. 169 Dhalion:  Self-­‐Regula@ng  VLDB’17 Twiaer  Heron:  Towards  Extensible  ICDE’17 Dhalion:  Self-­‐Regula@ng  VLDB’17 MillWheel:    VLDB’13 Readings Stream  Processing  in  Heron Stream  Processing  in  Heron Streaming  Engines Twiaer  Heron:  Stream  SIGMOD’15 Processing  at  scale Fault-­‐Tolerant  Stream  Processing  at  Internet  Scale The  Dataflow  Model:  A  Prac@cal  VLDB’15 Approach  to  Balancing  Correctness,     Latency  and  Cost  in  Massive-­‐Scale,   Unbounded  Out-­‐of-­‐Order  Data  Processing Anomaly  Detec@on  in  Strata  San  Jose’17 Real-­‐Time  Data  Streams  Using  Heron
  • 170. 170 Readings FOCS’00   Clustering Data Streams SIGMOD’02   Querying and mining data streams: You only get one look SIAM Journal of Computing’09   Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams PODS’02   Models and Issues in Data Stream Systems SIGMOD’07   Statistical Analysis of Sketch Estimators PODS’10   An optimal algorithm for the distinct elements problem
  • 171. 171 Readings SODA’10   Coresets and Sketches for high dimensional subspace approximation problems SIGMOD’16   Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams SOSR’17   Heavy-Hitter Detection Entirely in the Data Plane PODS’12   Graph Sketches: Sparsification, Spanners, and Subgraphs Arxiv’16   Coresets and Sketches ACM Queue’17   Data Sketching: The approximate approach is often faster and more efficient
  • 172.
  • 173. 173 GET  IN  TOUCH C O N T A C T   U S @arun_kejariwal   @kramasamy,    @sanjeerk   @sijieg,    @merlimat   @nlu90 karthik@stremlio.io   arun_kejariwal@acm.org
  • 174.
  • 175. E N J O Y T H E P R E S E N T A T I O N The End