In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
4. Internet of Things (IoT)
Large market potential:
- $1.9T in value by 2020 - Mfg (15%), Health Care (15%), Insurance (11%)
- 26B - 75B units [2, 3, 4, 5]
- Improve operational efficiencies, customer experience, new business models
- Beacons: retailers and bank branches - 60M units market by 2019 [6]
- Smart buildings: reduce energy costs, cut maintenance costs, increase safety & security
5. The Future
- Biostamps [2]
- Mobile sensor networks
- Exponential growth [1]
[1] http://opensignal.com/assets/pdf/reports/2015_08_fragmentation_report.pdf
[2] http://www.ericsson.com/thinkingahead/networked_society/stories/#/film/mc10-biostamp
6. Intelligent Health Care
Continuous monitoring:
- Tracking movements: measure the effect of social influences
- Google Lens: measure glucose level in tears
- Watch/wristband
- Smart textiles: skin temperature, perspiration
- Ingestible sensors: medication compliance [1], heart function
8. Increasingly Connected World
- Internet of Things: 30B connected devices by 2020
- Health care: 153 exabytes (2013) -> 2,314 exabytes (2020)
- Machine data: 40% of the digital universe by 2020
- Connected vehicles: data transferred per vehicle per month, 4 MB -> 5 GB
- Digital assistants (predictive analytics): $2B (2012) -> $6.5B (2019) [1]; Siri/Cortana/Google Now
- Augmented/virtual reality: $150B by 2020 [2]; Oculus/HoloLens/Magic Leap
10. Traditional Data Processing
Challenges:
- Introduces too much "decision latency"
- Responses are delivered "after the fact"
- Maximum value of the identified situation is lost
- Decisions are made on old and stale data
Data at rest: Store -> Analyze -> Act
11. The New Era: Streaming Data/Fast Data
- Events are analyzed and processed in real time as they arrive
- Decisions are timely, contextual, and based on fresh data
- Decision latency is eliminated
Data in motion
12. Real Time Use Cases
- Algorithmic trading
- Online fraud detection
- Geo fencing
- Proximity/location tracking
- Intrusion detection systems
- Traffic management
- Real-time recommendations
- Churn detection
- Internet of things
- Social media/data analytics
- Gaming data feeds
13. Requirements of Stream Processing
- In-stream: process data as it passes by
- Handle imperfections: delayed, missing, and out-of-order data
- Predictable and repeatable outcomes
- Performance and scalability
14. Requirements of Stream Processing (continued)
- High-level languages: SQL or a DSL
- Integrate stored and streaming data: for comparing the present with the past
- Data safety and availability: repeatable results
- Process and respond: the application should keep up at high volumes
17. Current Messaging Systems
ActiveMQ, RabbitMQ, Pulsar, RocketMQ, Azure Event Hub, Google Pub-Sub, Satori, Kafka
18. Why Apache Pulsar?
- Ordering: guaranteed ordering
- Multi-tenancy: a single cluster can support many tenants and use cases
- High throughput: can reach 1.8M messages/s in a single partition
- Durability: data replicated and synced to disk
- Geo-replication: out-of-the-box support for geographically distributed applications
- Unified messaging model: supports both topic & queue semantics in a single model
- Delivery guarantees: at least once, at most once, and effectively once
- Low latency: low publish latency of 5 ms at the 99th percentile
- Highly scalable: can support millions of topics
20. Pulsar Producer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");
// Send a message; the client handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
21. Pulsar Consumer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");
while (true) {
    // Wait for a message
    Message msg = consumer.receive();
    // getData() returns the raw bytes; decode before printing
    System.out.println("Received message: " + new String(msg.getData()));
    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}
22. Pulsar Architecture
[Diagram: producers and consumers connect to Pulsar brokers; brokers store data on a cluster of Apache BookKeeper bookies]
Stateless serving:
- Brokers: clients interact only with brokers; no state is stored in brokers
- Bookies: Apache BookKeeper as the storage; storage is append-only; provides high performance and low latency
- Durability: no data loss; fsync before acknowledgment
23. Pulsar Architecture
Separation of storage and serving:
- Serving: brokers can be added independently; traffic can be shifted quickly across brokers
- Storage: bookies can be added independently; new bookies will ramp up traffic quickly
24. Pulsar Architecture
Clients:
- Look up the correct broker through service discovery
- Establish connections to brokers
- Enforce authentication/authorization during connection establishment
- Establish producer/consumer sessions
- Reconnect with a backoff strategy
[Diagram: broker internals - dispatcher, load balancer, managed ledger, cache, global replication, service discovery - between producers/consumers and bookies]
25. Pulsar Architecture
Message dispatching:
- Dispatcher: end-to-end async message processing; messages relayed across producers, bookies, and consumers with no copies; pooled reference-count buffers
- Managed ledger: abstraction of single-topic storage; caches recent messages
26. Pulsar Architecture
Geo-replication:
- Asynchronous replication
- Integrated into the broker message flow
- Simple configuration to add/remove regions
[Diagram: topic T1 replicated across data centers A, B, and C, with producers P1-P3, consumers C1/C2, and subscription S1]
27. Pulsar Use Cases - Message Queue
[Diagram: online events published to a topic T, consumed by workers 1-3, decoupling online and offline processing]
Message queues:
- Decouple online and background processing
- High availability
- Reliable data transport
- Notifications
- Long-running tasks
- Low-latency publish (see the shared-subscription sketch below)
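To illustrate the queue semantics this use case relies on, here is a minimal sketch in the style of the earlier producer/consumer slides, using Pulsar's older client API with a shared subscription; the topic and subscription names are illustrative assumptions.

// With a Shared subscription, multiple workers attach to the same
// subscription and each message is delivered to only one of them.
ConsumerConfiguration conf = new ConsumerConfiguration();
conf.setSubscriptionType(SubscriptionType.Shared);   // queue semantics
Consumer worker = client.subscribe(
    "persistent://my-property/us-west/my-namespace/work-topic",  // hypothetical topic
    "work-queue-subscription",                                   // shared by all workers
    conf);
Message task = worker.receive();   // only one worker receives each task
worker.acknowledge(task);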
28. Pulsar Use Cases - Feedback System
[Diagram: serving systems publish events to a topic consumed by a controller, which propagates states back through another topic]
Feedback system:
- Coordinate a large number of machines
- Propagate states
Examples: state propagation, personalization, ad systems, feedback updates
29. Pulsar in Production
- 3+ years in production
- Serves 2.3 million topics
- 100 billion messages/day
- Average latency < 5 ms; 99th percentile 15 ms (with strong durability guarantees)
- Zero data loss
- 80+ applications
- Self-served provisioning
- Full-mesh cross-datacenter replication across 8+ data centers
33. Apache Beam
Promises:
- Abstracting the computation: express computation with expressive windowing/triggering and incremental processing for late data
- Selectable engine: select by criteria such as latency, resources, and cost
- Supported engines: Google DataFlow; Apache Spark, Apache Flink, Apache Apex
34. Apache Beam
Computation abstraction:
- All data is a 4-tuple: key, value, event time, and the window the tuple belongs to
- Core operators (see the sketch below):
  - ParDo: user-supplied DoFn; emits zero or more elements
  - GroupByKey: groups tuples by key within the window
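To make the abstraction concrete, here is a minimal windowed word-count sketch against Beam's Java SDK exercising both core operators; the input/output paths and the one-minute window are illustrative assumptions, not part of the original slides.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class WindowedWordCount {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(TextIO.read().from("input.txt"))                         // illustrative input
     // ParDo with a user-supplied DoFn: emits zero or more elements per input
     .apply(ParDo.of(new DoFn<String, KV<String, Long>>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         for (String word : c.element().split("\\s+")) {
           if (!word.isEmpty()) c.output(KV.of(word, 1L));
         }
       }
     }))
     // Assign each tuple to a fixed one-minute event-time window
     .apply(Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1))))
     // GroupByKey: groups tuples by key within each window
     .apply(GroupByKey.<String, Long>create())
     .apply(ParDo.of(new DoFn<KV<String, Iterable<Long>>, String>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         long count = 0;
         for (long one : c.element().getValue()) count += one;
         c.output(c.element().getKey() + ": " + count);
       }
     }))
     .apply(TextIO.write().to("counts"));                            // illustrative output
    p.run().waitUntilFinish();
  }
}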
36. Apache Beam
Challenges:
- Multiple layers (API vs. execution): troubleshooting complexities
- Need higher-level APIs: multiple efforts are on their way
- Other cloud vendor buy-in? Azure/AWS?
37. IBM S-Store
Promises:
- Combine stream processing and transactions
- Extended an OLTP engine (H-Store), adding: tuple ordering, windowing, push-based processing, exactly-once semantics
38. IBM S-Store
Data and processing model:
- Tuples are grouped into atomic batches: groupings of non-overlapping tuples, treated like a transaction
- Atomic batches belong to one stream
- Processing is modeled as a DAG: nodes consume one or more streams and possibly output more; node logic is treated as a transaction
39. IBM S-Store
Exactly-once guarantees:
- Strong: inputs and outputs are logged at every DAG node; on component failure, the log is replayed from a snapshot
- Weak: distributed snapshotting
40. IBM S-Store
Challenges:
- Throughput: non-OLTP processing is much slower compared to modern systems
- Scalability: multi-node operation still in research (2016)
41. Heron Terminology
- Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples
- Spouts: sources of data tuples for the topology; examples - Pulsar/Kafka/MySQL/Postgres
- Bolts: process incoming tuples and emit outgoing tuples; examples - filtering/aggregation/join/any function
44. Heron Groupings
- Shuffle grouping: random distribution of tuples
- Fields grouping: group tuples by a field or multiple fields
- All grouping: replicates tuples to all tasks
- Global grouping: sends the entire stream to one task
A topology sketch wiring these groupings together follows below.
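A minimal sketch of wiring these groupings with Heron's Storm-compatible low-level API; the spout/bolt classes (SentenceSpout, SplitterBolt, CounterBolt) and parallelism values are hypothetical stand-ins, not code from the slides.

import com.twitter.heron.api.Config;
import com.twitter.heron.api.HeronSubmitter;
import com.twitter.heron.api.topology.TopologyBuilder;
import com.twitter.heron.api.tuple.Fields;

public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    // Spout: the source of tuples (hypothetical user class)
    builder.setSpout("sentences", new SentenceSpout(), 2);
    // Shuffle grouping: tuples are randomly distributed across splitter tasks
    builder.setBolt("splitter", new SplitterBolt(), 4)
           .shuffleGrouping("sentences");
    // Fields grouping: tuples with the same "word" field go to the same counter task
    builder.setBolt("counter", new CounterBolt(), 4)
           .fieldsGrouping("splitter", new Fields("word"));
    Config conf = new Config();
    HeronSubmitter.submitTopology("word-count", conf, builder.createTopology());
  }
}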
46. Writing Heron Topologies
- Procedural - low-level API: directly write your spouts and bolts
- Functional - mid-level API: use of maps, flat maps, transforms, windows
- Declarative - SQL (coming): use of a declarative language; specify what you want, and the system will figure it out
47. Heron Design Goals
- Efficiency: reduce resource consumption
- Support for diverse workloads: throughput- vs. latency-sensitive
- Support for multiple semantics: at most once, at least once, effectively once
- Native multi-language support: C++, Java, Python
- Task isolation: ease of debug-ability/isolation/profiling
- Support for back pressure: topologies should be self-adjusting
- Use of containers: runs on schedulers - Kubernetes & DCOS & many more
- Multi-level APIs: procedural, functional, and declarative for diverse applications
- Diverse deployment models: run as a service or as a pure library
61. Pulsar Operations
- Reacting to failures: brokers, bookies
- Common issues: consumer backlog
- I/O prioritization and throttling
- Multi-tenancy
62. Reacting to Failures - Brokers
- Brokers don't have durable state: easily replaceable; topics are immediately reassigned to healthy brokers
- Expanding capacity: simply add a new broker node; if other brokers are overloaded, traffic will be automatically reassigned
- Load manager: monitors traffic load on all brokers (CPU, memory, network, topics); initially places topics on the least loaded brokers; reassigns topics when a broker is overloaded
63. Reacting to Failures - Bookies
- When a bookie fails, brokers immediately continue on other bookies
- An auto-recovery mechanism re-establishes the replication factor in the background
- If a bookie keeps giving errors or timeouts, it will be "quarantined": not considered for new ledgers for some period of time
64. Consumer Backlog
Metrics are available to make assessments:
- When did the problem start?
- How big is the backlog? In messages? In disk space?
- How fast is it draining? What's the ETA to catch up with publishers?
Establish where the bottleneck is:
- The application is not fast enough
- Disk read I/O
65. I/O Prioritization and Throttling
Prioritize access to I/O:
- During an outage, many tenants might try to drain their backlog as fast as they can
- Read I/O becomes the bottleneck
Throttling can be used to prioritize draining:
- Critical use cases can recover quickly
- Fewer concurrent readers lead to higher throughput
- Once consumers catch up, messages will be dispatched from cache
66. Enforcing Multi-Tenancy
Ensure tenants don't cause performance issues for other tenants:
- Backlog quotas
- Soft isolation: flow control and throttling, for cases where user behavior is triggering performance degradation
- Hard isolation: a last resort for quick reaction while a proper fix is deployed; isolate the tenant on a subset of brokers; can also be applied at the BookKeeper level
67. Heron @Twitter
Largest cluster; 100's of topologies; billions of messages; 100's of terabytes; reduced incidents; good night's sleep
3x - 5x reduction in resource usage
73. Heron Happy Facts :)
- No more pages at midnight for the Heron team
- Very rare incidents for Heron customer teams
- Easy to debug during incidents for quick turnaround
- Reduced resource utilization, saving cost
79. Data Skew
- Multiple keys: several keys map to a single instance and their combined count is high
- Single key: a single key maps to an instance and its count is high
81. Self Regulating Streaming Systems
- Tuning: the manual, time-consuming, and error-prone task of tuning various system knobs to achieve SLOs
- SLO maintenance: maintaining SLOs in the face of unpredictable load variations and hardware or software performance degradation
- Self-regulating streaming systems: systems that adjust themselves to environmental changes and continue to produce results
82. Self Regulating Streaming Systems
- Self-tuning: there are several tuning knobs and a time-consuming tuning phase; the system should take an SLO as input and automatically configure the knobs
- Self-stabilizing: stream jobs are long-running and load variations are common; the system should react to external shocks and automatically reconfigure itself
- Self-healing: system performance can be affected by hardware or software delivering degraded quality of service; the system should identify internal faults and attempt to recover from them
83. Enter Dhalion
Dhalion is a policy-based framework integrated into Heron. Dhalion periodically executes well-specified policies that optimize execution based on some objective. We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
84. Dhalion Policy Framework
Metrics feed a pipeline of three phases:
- Symptom detection: symptom detectors 1..N observe metrics and emit symptoms
- Diagnosis generation: diagnosers 1..M turn symptoms into candidate diagnoses
- Resolution: resolver selection picks among resolvers 1..M and invokes one to act on the diagnosis
85. Dynamic Resource Provisioning
- Policy: reacts to unexpected load variations (workload spikes)
- Goal: scale the topology resources up and down as needed, while keeping the topology in a steady state where back pressure is not observed
86. Dynamic Resource Provisioning
Implementation:
- Symptom detection: pending tuples detector, backpressure detector, processing rate skew detector
- Diagnosis generation: resource overprovisioning, resource underprovisioning, data skew, and slow instances diagnosers
- Resolution: bolt scale-down, bolt scale-up, data skew, and restart instances resolvers
93. Experimental Setup
Topology: Spout -> Splitter Bolt (shuffle grouping) -> Counter Bolt (fields grouping)
Hardware and software configuration:
- Microsoft HDInsight
- Intel Xeon E5-2673 CPU @ 2.40 GHz
- 28 GB of memory
Evaluation metrics:
- Throughput of spouts (no. of tuples emitted over 1 min)
- Throughput of bolts (no. of tuples emitted over 1 min)
- Number of Heron instances provisioned
94. Dynamic Provisioning Profile
[Plots: normalized throughput of the spout, splitter bolt, and counter bolt over 120 minutes, with scale-up/scale-down events S1-S3; number of bolts over time]
- The dynamic resource provisioning policy is able to adjust the topology resources on the fly when workload spikes occur.
- The policy can correctly detect and resolve bottlenecks even on multi-stage topologies, where backpressure is gradually propagated from one stage of the topology to another.
- Heron instances are gradually scaled up and down according to the input load.
102. Data Sketches
Early work:
- Counting [Morris, 1977]
- Membership [Bloom, 1970]
- Median of a sequence [Munro and Paterson, 1980]
- Frequent elements [Misra and Gries, 1982]
- Counting [Flajolet and Martin, 1985]
- The space complexity of approximating the frequency moments [Alon et al., 1996]
- Computing on data streams [Henzinger et al., 1998]
107. Sampling
Obtain a representative sample from a data stream; maintain a dynamic sample:
- A data stream is a continuous process; it is not known in advance how many points may elapse before an analyst needs a representative sample
Reservoir sampling [1] (sketched below):
- Probabilistic insertions and deletions on arrival of new stream points
- The probability of inserting new points reduces with the progression of the stream
- An unbiased sample contains a larger and larger fraction of points from the distant history of the stream
Practical perspective: the data stream may evolve, so the majority of the points in the sample may represent stale history
[1] J. S. Vitter. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11(1):37-57, March 1985.
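A minimal sketch of reservoir sampling in Java (the class and method names are ours, not Vitter's): after the reservoir fills, each arriving point replaces a random slot with probability k/seen, which keeps the reservoir a uniform sample of the stream so far.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampler<T> {
  private final int k;                     // desired sample size
  private final List<T> reservoir;
  private final Random rng = new Random();
  private long seen = 0;                   // stream points observed so far

  public ReservoirSampler(int k) {
    this.k = k;
    this.reservoir = new ArrayList<>(k);
  }

  // Offer the next stream point; insertion probability decays as k/seen
  public void offer(T item) {
    seen++;
    if (reservoir.size() < k) {
      reservoir.add(item);                          // always keep the first k points
    } else {
      long j = (long) (rng.nextDouble() * seen);    // uniform index in [0, seen)
      if (j < k) reservoir.set((int) j, item);      // replace with probability k/seen
    }
  }

  // A uniform random sample of everything seen so far
  public List<T> sample() { return reservoir; }
}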
108. Sampling
Sliding-window approaches (sample size k, window width n):
- Sequence-based: replace the expired element with the newly arrived element; disadvantage: highly periodic
- Chain-sample approach: select the ith element with probability min(i, n)/n; select uniformly at random an index from [i+1, i+n] for the element that will replace the ith item; maintain k independent chain samples
- Timestamp-based: the number of elements in a moving window may vary over time; priority-sample approach
[Illustration: chain sampling over a sliding window of a numeric stream]
[1] B. Babcock. Sampling From a Moving Window Over Streaming Data. In Proceedings of SODA, 2002.
109. Sampling
Biased reservoir sampling [1]:
- Use a temporal bias function: recent points have a higher probability of being represented in the sample reservoir
- Memory-less bias functions: the future probability of retaining a current point in the reservoir is independent of its past history or arrival time
- The probability of the rth point belonging to the reservoir at time t is proportional to the bias function f(r, t)
- Exponential bias function: f(r, t) = e^(-λ(t-r)) for the rth data point at time t, where r ≤ t and λ ∈ [0, 1] is the bias rate
- The maximum reservoir requirement R(t) is bounded
[1] C. C. Aggarwal. On Biased Reservoir Sampling in the Presence of Stream Evolution. In Proceedings of VLDB, 2006.
110. Filtering
Set membership: determine, with some false-positive probability, if an item in a data stream has been seen before.
Applications: databases (e.g., speed up semi-join operations), caches, routers, storage systems:
- Reduce the space requirement of probabilistic routing tables
- Speed up longest-prefix matching of IP addresses
- Encode multicast forwarding information in packets
- Summarize content to aid collaborations in overlay and peer-to-peer networks
- Improve network state management and monitoring
111. Filtering
Set membership: application to hyphenation programs and early UNIX spell checkers [1]
[1] Illustration borrowed from http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf
Filtering
Set
Membership
Natural
generaliza@on
of
hashing
False
posi@ves
are
possible
No
false
nega@ves
No
dele@ons
allowed
For
false
posi@ve
rate
ε,
#
hash
func@ons
=
log2(1/ε)
where,
n
=
#
elements,
k
=
#
hash
func@ons
m
=
#
bits
in
the
array
113. Filtering
Set membership: minimizing the false positive rate ε w.r.t. k [1]:
- k = ln 2 · (m/n)
- ε = (1/2)^k ≈ (0.6185)^(m/n)
- 1.44 · log2(1/ε) bits per item, independent of item size or # items
- Information-theoretic minimum: log2(1/ε) bits per item, i.e., a 44% overhead
- X = # 0 bits, where E[X] = m · (1 - 1/m)^(kn) ≈ m · e^(-kn/m)
A code sketch of these sizing rules follows below.
[1] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. In Internet Mathematics, Vol. 1, No. 4, 2005.
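A minimal Bloom filter sketch in Java using the sizing rules above (m = n·ln(1/ε)/ln²2, k = (m/n)·ln 2); the double-hashing mixer is an illustrative choice of ours, not from the slides.

import java.util.BitSet;

public class BloomFilter {
  private final BitSet bits;
  private final int m, k;

  public BloomFilter(int n, double epsilon) {
    // m = n * ln(1/eps) / (ln 2)^2 bits; k = (m/n) * ln 2 hash functions
    this.m = (int) Math.ceil(n * Math.log(1 / epsilon) / (Math.log(2) * Math.log(2)));
    this.k = Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    this.bits = new BitSet(m);
  }

  // Derive the i-th hash from two base hashes: g_i(x) = h1(x) + i * h2(x)
  private int index(Object item, int i) {
    int h1 = item.hashCode();
    int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;   // illustrative second hash
    int idx = (h1 + i * h2) % m;
    return idx < 0 ? idx + m : idx;
  }

  public void add(Object item) {
    for (int i = 0; i < k; i++) bits.set(index(item, i));
  }

  // May return true for items never added (false-positive rate ~ eps);
  // never returns false for an added item (no false negatives)
  public boolean mightContain(Object item) {
    for (int i = 0; i < k; i++) if (!bits.get(index(item, i))) return false;
    return true;
  }
}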
114. Filtering
Set membership: cuckoo filter [1]
Key highlights:
- Add and remove items dynamically
- For false positive rate ε < 3%, more space efficient than a Bloom filter
- Higher performance than a Bloom filter for many real workloads
- Asymptotically worse than a Bloom filter: minimum fingerprint size ∝ log(# entries in table)
Overview:
- Stores only a fingerprint of each inserted item
- The original key and value bits of an item are not retrievable
- Set membership query for item x: search the hash table for the fingerprint of x
[1] Fan et al. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies, 2014.
115. Filtering
Set membership:
- Cuckoo hashing [1]: high space occupancy; practical implementations use multiple items per bucket; example uses: software-based Ethernet switches
- Cuckoo filter [2]: uses a multi-way associative cuckoo hash table; employs partial-key cuckoo hashing - store the fingerprint of an item and relocate existing fingerprints to their alternative locations
[Illustration of cuckoo hashing borrowed from [2]]
[1] R. Pagh and F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122-144, 2004.
[2] Fan et al. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies, 2014.
116. Filtering
Set membership: cuckoo filter (continued)
- Deletion: the item must have been previously inserted
- Partial-key cuckoo hashing: fingerprint hashing ensures a uniform distribution of items in the table
- The length of the fingerprint << the size of h1 or h2
- It is possible to have multiple entries of a fingerprint in a bucket
- Alternate bucket: derived from the fingerprint, which is significantly shorter than h1 and h2 (see the sketch below)
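A sketch of the bucket computation in partial-key cuckoo hashing following Fan et al., where the alternate bucket is i2 = i1 XOR hash(fingerprint); the hash mixers and the 8-bit fingerprint width are illustrative choices of ours.

public class PartialKeyBuckets {
  private final int numBuckets;   // assumed a power of two for the masks below

  public PartialKeyBuckets(int numBuckets) { this.numBuckets = numBuckets; }

  // Short fingerprint stored in the table instead of the item itself
  int fingerprint(Object x) {
    return (x.hashCode() & 0xFF) | 1;                       // nonzero 8-bit fingerprint
  }

  int bucket1(Object x) {
    return (x.hashCode() * 0x9E3779B9) & (numBuckets - 1);  // h1(x)
  }

  // i2 = i1 XOR hash(fingerprint): computable from (i1, fp) alone, so an
  // existing fingerprint can be relocated without re-reading the original key.
  // Applying it twice returns i1, so the two buckets are mutual alternates.
  int bucket2(int i1, int fp) {
    return (i1 ^ (fp * 0x5BD1E995)) & (numBuckets - 1);
  }
}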
118. Cardinality
Distinct elements:
- Database systems/search engines: # distinct queries
- Network monitoring applications
- Natural language processing
- # distinct motifs in a DNA sequence
- # distinct elements in RFID/sensor networks
119. Cardinality
Previous work:
- Probabilistic counting [Flajolet and Martin, 1985]
- LogLog counting [Durand and Flajolet, 2003]
- HyperLogLog [Flajolet et al., 2007]
- Sliding HyperLogLog [Chabchoub and Hebrail, 2010]
- HyperLogLog in Practice [Heule et al., 2013]
- Self-Organizing Bitmap [Chen and Cao, 2009]
- Discrete Max-Count [Ting, 2014]: the sequence of sketches forms a Markov chain when h is a strong universal hash; estimate cardinality using a martingale
121. Cardinality
HyperLogLog:
- Apply a hash function h to every element in a multiset
- The cardinality of the multiset is estimated as 2^max(ϱ), where 0^(ϱ-1)1 is the bit pattern observed at the beginning of a hash value
- The above suffers from high variance, so employ stochastic averaging: partition the input stream into m = 2^p sub-streams S_i using the first p bits of the hash values, and combine the per-sub-stream maxima M_j via the bias-corrected harmonic mean α_m · m² · (Σ_j 2^(-M_j))^(-1)
A toy implementation follows below.
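A toy HyperLogLog sketch in Java (ours, not from the slides): the first p bits of a 64-bit hash pick a register, each register keeps the maximum leading-zero rank ϱ, and the estimate is the bias-corrected harmonic mean; the caller is assumed to supply a good 64-bit hash of each element, and small/large-range corrections are omitted.

public class HyperLogLog {
  private final int p;       // precision
  private final int m;       // number of registers, m = 2^p
  private final int[] registers;

  public HyperLogLog(int p) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new int[m];
  }

  public void add(long hash) {
    int idx = (int) (hash >>> (64 - p));          // first p bits choose the sub-stream
    long w = hash << p;                           // remaining bits
    int rho = Long.numberOfLeadingZeros(w) + 1;   // position of the leftmost 1-bit
    registers[idx] = Math.max(registers[idx], rho);
  }

  public double estimate() {
    double sum = 0;
    for (int r : registers) sum += Math.pow(2, -r);
    double alphaM = 0.7213 / (1 + 1.079 / m);     // bias-correction constant (large m)
    return alphaM * m * m / sum;                  // normalized harmonic mean
  }
}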
122. Cardinality
HyperLogLog optimizations [1, 2]:
- Use of a 64-bit hash function: total memory requirement goes from 5 · 2^p to 6 · 2^p, where p is the precision
- Empirical bias correction: uses empirically determined data for cardinalities smaller than 5m, and the unmodified raw estimate otherwise
- Sparse representation: for n ≪ m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w); use variable-length encoding (a variable number of bytes per integer) and difference encoding (store the difference between successive elements)
[1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
[2] http://antirez.com/news/75
123. Cardinality
Self-learning bitmap (S-bitmap) [1]:
- Achieves constant relative estimation error for unknown cardinalities in a wide range, say from 10s to > 10^6
- The bitmap is obtained via an adaptive sampling process: bits corresponding to sampled items are set to 1
- Sampling rates are learned from the # distinct items already seen, and reduced sequentially as more bits are set to 1
- For given input parameters Nmax and estimation precision ε, the size m of the bit mask is determined; with r = 1 - 2ε²(1 + ε²)^(-1) and sampling probability p_k = m(m + 1 - k)^(-1)(1 + ε²)r^k, where k ∈ [1, m], the relative error ≈ ε
[1] Chen et al. "Distinct counting with a self-learning bitmap". Journal of the American Statistical Association, 106(495):879-890, 2011.
124. Quantiles
Quantiles and histograms have a large set of real-world applications: database applications, sensor networks, operations.
Desired properties:
- Provide tunable and explicit guarantees on the precision of approximation
- Single pass
Early work:
- [Greenwald and Khanna, 2001]: worst-case space requirement
- [Arasu and Manku, 2004]: sliding-window-based model, worst-case space requirement
125. Quantiles
q-digest [1]:
- Groups values into variable-size buckets of almost equal weights; unlike a traditional histogram, buckets can overlap
Key features:
- Detailed information about frequent values is preserved; less frequent values are lumped into larger buckets
- Using a message of size m, answers quantile queries within an error of O(log(σ)/m)
- Except for the root and leaf nodes, a node v belongs to the q-digest iff count(v) ≤ ⌊n/k⌋ and count(v) + count(v.parent) + count(v.sibling) > ⌊n/k⌋
where σ = max signal value, n = # elements, k = compression factor, over a complete binary tree
[1] Shrivastava et al. Medians and Beyond: New Aggregation Techniques for Sensor Networks. In Proceedings of SenSys, 2004.
Quantiles
126. 126
q-‐digest
Building
a
q-‐digest
q-‐digests
can
be
constructed
in
a
distributed
fashion
Merge
q-‐digests
Quantiles
127. Quantiles
t-digest [1]: approximation of rank-based statistics
- Computes quantile q with an accuracy relative to max(q, 1-q)
- Computes hybrid statistics such as trimmed statistics
Key features:
- Robust with respect to highly skewed distributions
- Independent of the range of input values (unlike q-digest)
- Relative error is bounded
- Non-equal bin sizes: few samples contribute to the bins corresponding to the extreme quantiles
- Merging independent t-digests preserves reasonable accuracy
[1] T. Dunning and O. Ertl. "Computing Extremely Accurate Quantiles using t-digests", 2017. https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
128. Quantiles
t-digest:
- Group samples into sub-sequences: smaller sub-sequences near the ends, larger sub-sequences in the middle
- Scaling function: the mapping k is monotonic, with k(0) = 1 and k(1) = δ; the k-size of each sub-sequence is < 1
where k is the notional index, δ the compression parameter, and q the quantile
129. Quantiles
t-digest:
- Estimating a quantile via interpolation: sub-sequences contain the centroid of their samples; estimate the boundaries of the sub-sequences
- Error scales quadratically in the # samples: the small # samples in the sub-sequences near q = 0 and q = 1 improves accuracy there; accuracy is lower in the middle of the distribution, where sub-sequences are larger
- Two flavors: progressive merging (buffering based) and a clustering variant
130. Frequent Elements
Applications:
- Track bandwidth hogs
- Determine popular tourist destinations
- Itemset mining
- Entropy estimation
- Compressed sensing
- Search log mining
- Network data analysis
- DBMS optimization
131. Frequent Elements
Count-min sketch [1] (see the sketch implementation below):
- A two-dimensional array of counts with w columns and d rows; each entry of the array is initially zero
- d hash functions h_1, ..., h_d : {1, ..., n} -> {1, ..., w} are chosen uniformly at random from a pairwise independent family
- Update: for a new element i, for each row j and k = h_j(i), increment the kth column by one
- Point query: â_i = min_j sketch[j, h_j(i)], where sketch is the table
- Parameters: for an (ε, δ) guarantee, w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉
[1] Cormode, Graham; S. Muthukrishnan (2005). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29-38.
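A minimal count-min sketch in Java using the parameters above; the (a·x + b) mod p row hashes are an illustrative pairwise-independent family, and overflow in the hash arithmetic is ignored for brevity.

import java.util.Random;

public class CountMinSketch {
  private final int w, d;
  private final long[][] table;
  private final long[] a, b;
  private static final long P = (1L << 31) - 1;   // Mersenne prime for hashing

  public CountMinSketch(double epsilon, double delta) {
    this.w = (int) Math.ceil(Math.E / epsilon);       // w = ceil(e / eps)
    this.d = (int) Math.ceil(Math.log(1 / delta));    // d = ceil(ln(1 / delta))
    this.table = new long[d][w];
    this.a = new long[d];
    this.b = new long[d];
    Random rng = new Random();
    for (int j = 0; j < d; j++) {
      a[j] = 1 + rng.nextInt((int) P - 1);
      b[j] = rng.nextInt((int) P);
    }
  }

  private int bucket(int j, long item) {
    return (int) (Math.floorMod(a[j] * item + b[j], P) % w);  // h_j(item)
  }

  public void update(long item, long count) {
    for (int j = 0; j < d; j++) table[j][bucket(j, item)] += count;
  }

  // Over-estimates only: true count <= estimate <= true count + eps * N
  // with probability >= 1 - delta
  public long query(long item) {
    long min = Long.MAX_VALUE;
    for (int j = 0; j < d; j++) min = Math.min(min, table[j][bucket(j, item)]);
    return min;
  }
}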
132. Frequent Elements
Variants of the count-min sketch [1]:
- Count-min sketch with conservative update (CU sketch): when updating an item with frequency c, raise each counter only up to the current estimate plus c, i.e., sketch[j, h_j(i)] = max(sketch[j, h_j(i)], â_i + c); this avoids unnecessary updating of counter values and reduces over-estimation error, but remains prone to over-estimation error on low-frequency items (see the snippet below)
- Lossy conservative update (LCU) - SWS: divide the stream into windows; at window boundaries, for all 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i, j] if its value is positive and at most a small threshold
[1] Cormode, G. 2009. Encyclopedia entry on 'Count-Min Sketch'. In Encyclopedia of Database Systems. Springer, 511-516.
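The conservative update rule, sketched as a drop-in method for the CountMinSketch class above (our rendering of the rule the slide describes, not code from the slides):

// Conservative update (CU sketch): counters are raised only as far as
// needed to cover the new estimate, which reduces over-estimation.
public void conservativeUpdate(long item, long c) {
  long estimate = query(item);                  // current min-estimate for the item
  for (int j = 0; j < d; j++) {
    int k = bucket(j, item);
    table[j][k] = Math.max(table[j][k], estimate + c);
  }
}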
133. Open Source
- Data Sketches* (Yahoo!): Unique, Quantile, Histogram, Sampling, Theta Sketches, Tuple Sketches, Most Frequent
- Algebird# (Twitter): Filtering, Unique, Histogram, Most Frequent
- streamDM^ (Huawei): SGD Learner and Perceptron, Naive Bayes, CluStream, Hoeffding Decision Trees, Bagging, Stream KM++
- StreamLib**
* https://datasketches.github.io/
# https://github.com/twitter/algebird
^ http://huawei-noah.github.io/streamDM/
** https://github.com/jiecchen/StreamLib
134. Anomaly Detection
A very rich, over-150-year history:
- Manufacturing
- Statistics
- Econometrics, financial engineering
- Signal processing
- Control systems, autonomous systems: fault detection [1]
- Networking
- Computational biology (e.g., microarray analysis)
- Computer vision
[1] A. S. Willsky. "A survey of design methods for failure detection systems". Automatica, vol. 12, pp. 601-611, 1976.
135. Anomaly Detection
A very rich, over-150-year history; anomalies are contextual in nature:
"DISCORDANT observations may be defined as those which present the appearance of differing in respect of their law of frequency from other observations with which they are combined. In the treatment of such observations there is great diversity between authorities; but this discordance of methods may be reduced by the following reflection. Different methods are adapted to different hypotheses about the cause of a discordant observation; and different hypotheses are true, or appropriate, according as the subject-matter, or the degree of accuracy required, is different."
- F. Y. Edgeworth, "On Discordant Observations", 1887.
137. Anomaly Detection
Common approaches:
- Moving averages: SMA, EWMA, PEWMA (parameters: width, decay)
- Rule based: µ ± σ
Assumption: normal distribution - NOT VALID in real life
Domains: statistics, manufacturing, operations
Historical work: Stone 1868, Glaisher 1872, Edgeworth 1887, Stewart 1920, Irwin 1925, Jeffreys 1932, Rider 1933
138. Anomaly Detection
Robust measures:
- Median
- MAD [1]: Median Absolute Deviation
- MCD [2]: Minimum Covariance Determinant
- MVEE [3, 4]: Minimum Volume Enclosing Ellipsoid
[1] P. J. Rousseeuw and C. Croux. "Alternatives to the Median Absolute Deviation", 1993.
[2] http://onlinelibrary.wiley.com/wol1/doi/10.1002/wics.61/abstract
[3] P. J. Rousseeuw and A. M. Leroy. "Robust Regression and Outlier Detection", 1987.
[4] M. J. Todd and E. A. Yıldırım. "On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids", 2007.
140. Anomaly Detection
Challenges:
- Live data: multi-dimensional; low memory footprint; accuracy vs. speed trade-off
- Encoding the context
- Data types: video, audio, text
- Data veracity
- Wearables, smart cities, connected home, Internet of Things
144. Lambda Architecture
- Batch layer: accurate but delayed (HDFS/MapReduce)
- Fast layer: inexact but fast (Storm/Kafka)
- Query merge layer: merges results from the batch and fast layers at query time
145. Lambda Architecture
Characteristics:
- During ingestion, data is cloned into two copies: one goes to the batch layer, the other goes to the fast layer
- Processing is done at both layers: expressed as map-reduces in the batch layer, and as topologies in the speed layer
146. Lambda Architecture
Challenges:
- Inherently inefficient: data is replicated twice; computation is replicated twice
- Operationally inefficient: maintain both batch and streaming systems; tune topologies for both systems
147. Kappa Architecture
- Streaming is everything: computation is expressed in a topology
- Computation is mostly done only once, when the data arrives
- Data then moves into permanent storage
149. Kappa Architecture
Challenges:
- Data reprocessing could be very expensive on code/logic changes: either data needs to be brought back from storage to the bus, or the computation needs to be re-expressed to run on bulk storage
- Historic analysis: how to do data analytics over all of last year's data?
151. Observations
- Lambda is complicated and inefficient: replication of data and computation; multiple systems to operate and tune
- Kappa is too simplistic: data reprocessing is too expensive; historical analysis is not possible
152. Observations
- Computation across batch/realtime is similar: expressed as DAGs; run in parallel on the cluster; intermediate results need not be materialized; functional/declarative APIs
- Storage is the key: messaging and storage are two faces of the same coin; they serve the same data
153. Real-Time Storage Requirements
Requirements for a real-time storage platform:
- Write and read streams of records with low latency and storage durability
- Data storage should be durable, consistent, and fault tolerant
- Enable clients to stream or tail ledgers to propagate data as it is written
- Store and provide access to both historic and real-time data
154. Apache BookKeeper - Stream Storage
A storage system for log streams:
- Replicated, durable storage of log streams
- Fast tailing/streaming facility
- Optimized for immutable data
- Low-latency durability
- Simple repeatable read consistency
- High write and read availability
155. Record
The smallest I/O and address unit:
- A stream is a sequence of indivisible records
- A record is a sequence of bytes
- The smallest I/O unit, as well as the unit of address
- Each record contains sequence numbers for addressing
157. Ledger
A finite sequence of records. A ledger is terminated when:
- A client explicitly closes it, or
- The writer that writes records into it has crashed
Stream
Infinite
sequence
of
records
Stream:
An
unbounded,
infinite
sequence
of
records
Physically
comprised
of
mul@ple
ledgers
159. 159
Bookies
Stores
fragment
of
records
Bookie
-‐
A
storage
server
to
store
data
records
Ensemble:
A
group
of
bookies
storing
the
data
records
of
a
ledger
Individual
bookies
store
fragments
of
ledgers
161. 161
Tying it all together
A
typical
installaHon
of
Apache
BookKeeper
162. 162
BookKeeper - Use Cases
Combine
messaging
and
storage
Stream
Storage
combines
the
func@onality
of
streaming
and
storage
WAL
-‐
Write
Ahead
Log Message
Store Object
Store
SnapshotsStream
Processing
164. BookKeeper in Production
Enterprise-grade stream storage:
- 4+ years at Twitter and Yahoo, 2+ years at Salesforce
- Multiple use cases from messaging to storage: database replication, message store, stream computing, ...
- 600+ bookies in one single cluster
- Data is stored from days to a year
- Millions of log streams
- 1 trillion records/day, 17 PB/day
166. Real Time is Messy and Unpredictable
[Diagram: today's pipelines stitch together messaging systems, aggregation systems, a result engine, HDFS, and queryable engines]
167. Streamlio - Unified Architecture
- APIs: Storm API, Trident/Apache Beam, SQL, application builder; Pulsar API, Kafka API; BK/HDFS API
- Interactive querying
- Runs on Kubernetes
- Platform services: metadata management, operational monitoring, chargeback, security, authentication, quota management, rules engine
168. Resources
Sketching algorithms:
- https://www.cs.upc.edu/~gavalda/papers/portoschool.pdf
- https://mapr.com/blog/some-important-streaming-algorithms-you-should-know-about/
- https://gist.github.com/debasishg/8172796
Books and surveys:
- Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches - G. Cormode, M. Garofalakis, and P. J. Haas
- Data Streams: Models and Algorithms - Charu Aggarwal. http://www.springer.com/us/book/9780387287591
- Data Streams: Algorithms and Applications - Muthu Muthukrishnan. http://algo.research.googlepages.com/eight.ps
- Graph Streaming Algorithms - A. McGregor
- Sketching as a Tool for Numerical Linear Algebra - D. Woodruff
169. Readings
Streaming engines:
- Twitter Heron: Stream Processing at Scale - SIGMOD'15
- Twitter Heron: Towards Extensible Streaming Engines - ICDE'17
- Dhalion: Self-Regulating Stream Processing in Heron - VLDB'17
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale - VLDB'13
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency and Cost in Massive-Scale, Unbounded Out-of-Order Data Processing - VLDB'15
- Anomaly Detection in Real-Time Data Streams Using Heron - Strata San Jose'17
170. Readings
- Clustering Data Streams - FOCS'00
- Querying and mining data streams: You only get one look - SIGMOD'02
- Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams - SIAM Journal of Computing'09
- Models and Issues in Data Stream Systems - PODS'02
- Statistical Analysis of Sketch Estimators - SIGMOD'07
- An optimal algorithm for the distinct elements problem - PODS'10
Readings
SODA’10
Coresets and Sketches for high dimensional
subspace approximation problems
SIGMOD’16
Time Adaptive Sketches (Ada-Sketches) for
Summarizing Data Streams
SOSR’17
Heavy-Hitter Detection Entirely in the Data
Plane
PODS’12
Graph Sketches: Sparsification, Spanners, and
Subgraphs
Arxiv’16
Coresets and Sketches
ACM Queue’17
Data Sketching: The approximate approach is
often faster and more efficient
173. Get in Touch
Contact us:
- @arun_kejariwal
- @kramasamy, @sanjeerk
- @sijieg, @merlimat
- @nlu90
- karthik@streamlio.io
- arun_kejariwal@acm.org

175. The End
Enjoy the presentation!