In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
4. Internet of Things (IoT)
Large market potential:
- $1.9T in value by 2020 - Mfg (15%), Health Care (15%), Insurance (11%)
- 26B - 75B units [2, 3, 4, 5]
- Improve operational efficiencies, customer experience, new business models
- Beacons: retailers and bank branches - 60M units market by 2019 [6]
- Smart buildings: reduce energy costs, cut maintenance costs, increase safety & security
5. The Future
- Biostamps [2]
- Mobile sensor networks
- Exponential growth [1]
[1] http://opensignal.com/assets/pdf/reports/2015_08_fragmentation_report.pdf
[2] http://www.ericsson.com/thinkingahead/networked_society/stories/#/film/mc10-biostamp
6. Intelligent Health Care
Continuous monitoring:
- Tracking movements: measure the effect of social influences
- Google Lens: measure glucose level in tears
- Watch/wristband
- Smart textiles: skin temperature, perspiration
- Ingestible sensors: medication compliance [1], heart function
8. Increasingly Connected World
- Internet of Things: 30B connected devices by 2020
- Health care: 153 exabytes (2013) -> 2,314 exabytes (2020)
- Machine data: 40% of the digital universe by 2020
- Connected vehicles: data transferred per vehicle per month, 4 MB -> 5 GB
- Digital assistants (predictive analytics): $2B (2012) -> $6.5B (2019) [1]; Siri/Cortana/Google Now
- Augmented/virtual reality: $150B by 2020 [2]; Oculus/HoloLens/Magic Leap
10. Traditional Data Processing
Challenges:
- Introduces too much "decision latency"
- Responses are delivered "after the fact"
- Maximum value of the identified situation is lost
- Decisions are made on old and stale data
Data at rest: Store -> Analyze -> Act
11. The New Era: Streaming Data/Fast Data
- Events are analyzed and processed in real time as they arrive
- Decisions are timely, contextual, and based on fresh data
- Decision latency is eliminated
Data in motion
12. Real Time Use Cases
- Algorithmic trading
- Online fraud detection
- Geo fencing
- Proximity/location tracking
- Intrusion detection systems
- Traffic management
- Real-time recommendations
- Churn detection
- Internet of things
- Social media/data analytics
- Gaming data feeds
13. Requirements of Stream Processing
- In-stream: process data as it passes by
- Handle imperfections: delayed, missing, and out-of-order data
- Predictable and repeatable outcomes
- Performance and scalability
14. Requirements of Stream Processing (continued)
- High-level languages: SQL or a DSL
- Integrate stored and streaming data: for comparing the present with the past
- Data safety and availability: repeatable results
- Process and respond: the application should keep up at high volumes
17. Current Messaging Systems
ActiveMQ, RabbitMQ, Pulsar, RocketMQ, Azure Event Hub, Google Pub-Sub, Satori, Kafka
18. Why Apache Pulsar?
- Ordering: guaranteed ordering
- Multi-tenancy: a single cluster can support many tenants and use cases
- High throughput: can reach 1.8M messages/s in a single partition
- Durability: data replicated and synced to disk
- Geo-replication: out-of-the-box support for geographically distributed applications
- Unified messaging model: supports both topic & queue semantics in a single model
- Delivery guarantees: at least once, at most once, and effectively once
- Low latency: low publish latency of 5 ms at the 99th percentile
- Highly scalable: can support millions of topics
20. Pulsar Producer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");
// Send a message; the client handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
21. Pulsar Consumer
PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");
while (true) {
    // Wait for a message
    Message msg = consumer.receive();
    // getData() returns the raw bytes; decode before printing
    System.out.println("Received message: " + new String(msg.getData()));
    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}
22. Pulsar Architecture
[Diagram: producers and consumers connect to Pulsar brokers; brokers store data on a cluster of Apache BookKeeper bookies]
Stateless serving:
- Brokers: clients interact only with brokers; no state is stored in brokers
- Bookies: Apache BookKeeper as the storage; storage is append-only; provides high performance and low latency
- Durability: no data loss; fsync before acknowledgment
23. Pulsar Architecture
Separation of storage and serving:
- Serving: brokers can be added independently; traffic can be shifted quickly across brokers
- Storage: bookies can be added independently; new bookies will ramp up traffic quickly
24. Pulsar Architecture
Clients:
- Look up the correct broker through service discovery
- Establish connections to brokers
- Enforce authentication/authorization during connection establishment
- Establish producer/consumer sessions
- Reconnect with a backoff strategy
[Diagram: broker internals - dispatcher, load balancer, managed ledger, cache, global replication, service discovery - between producers/consumers and bookies]
25. Pulsar Architecture
Message dispatching:
- Dispatcher: end-to-end async message processing; messages relayed across producers, bookies, and consumers with no copies; pooled reference-count buffers
- Managed ledger: abstraction of single-topic storage; caches recent messages
26. Pulsar Architecture
Geo-replication:
- Asynchronous replication
- Integrated into the broker message flow
- Simple configuration to add/remove regions
[Diagram: topic T1 replicated across data centers A, B, and C, with producers P1-P3, consumers C1/C2, and subscription S1]
27. Pulsar Use Cases - Message Queue
[Diagram: online events published to a topic T, consumed by workers 1-3, decoupling online and offline processing]
Message queues:
- Decouple online and background processing
- High availability
- Reliable data transport
- Notifications
- Long-running tasks
- Low-latency publish (see the shared-subscription sketch below)
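To illustrate the queue semantics this use case relies on, here is a minimal sketch in the style of the earlier producer/consumer slides, using Pulsar's older client API with a shared subscription; the topic and subscription names are illustrative assumptions.

// With a Shared subscription, multiple workers attach to the same
// subscription and each message is delivered to only one of them.
ConsumerConfiguration conf = new ConsumerConfiguration();
conf.setSubscriptionType(SubscriptionType.Shared);   // queue semantics
Consumer worker = client.subscribe(
    "persistent://my-property/us-west/my-namespace/work-topic",  // hypothetical topic
    "work-queue-subscription",                                   // shared by all workers
    conf);
Message task = worker.receive();   // only one worker receives each task
worker.acknowledge(task);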
28. Pulsar Use Cases - Feedback System
[Diagram: serving systems publish events to a topic consumed by a controller, which propagates states back through another topic]
Feedback system:
- Coordinate a large number of machines
- Propagate states
Examples: state propagation, personalization, ad systems, feedback updates
29. Pulsar in Production
- 3+ years in production
- Serves 2.3 million topics
- 100 billion messages/day
- Average latency < 5 ms; 99th percentile 15 ms (with strong durability guarantees)
- Zero data loss
- 80+ applications
- Self-served provisioning
- Full-mesh cross-datacenter replication across 8+ data centers
33. Apache Beam
Promises:
- Abstracting the computation: express computation with expressive windowing/triggering and incremental processing for late data
- Selectable engine: select by criteria such as latency, resources, and cost
- Supported engines: Google DataFlow; Apache Spark, Apache Flink, Apache Apex
34. Apache Beam
Computation abstraction:
- All data is a 4-tuple: key, value, event time, and the window the tuple belongs to
- Core operators (see the sketch below):
  - ParDo: user-supplied DoFn; emits zero or more elements
  - GroupByKey: groups tuples by key within the window
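To make the abstraction concrete, here is a minimal windowed word-count sketch against Beam's Java SDK exercising both core operators; the input/output paths and the one-minute window are illustrative assumptions, not part of the original slides.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

public class WindowedWordCount {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(TextIO.read().from("input.txt"))                         // illustrative input
     // ParDo with a user-supplied DoFn: emits zero or more elements per input
     .apply(ParDo.of(new DoFn<String, KV<String, Long>>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         for (String word : c.element().split("\\s+")) {
           if (!word.isEmpty()) c.output(KV.of(word, 1L));
         }
       }
     }))
     // Assign each tuple to a fixed one-minute event-time window
     .apply(Window.<KV<String, Long>>into(FixedWindows.of(Duration.standardMinutes(1))))
     // GroupByKey: groups tuples by key within each window
     .apply(GroupByKey.<String, Long>create())
     .apply(ParDo.of(new DoFn<KV<String, Iterable<Long>>, String>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         long count = 0;
         for (long one : c.element().getValue()) count += one;
         c.output(c.element().getKey() + ": " + count);
       }
     }))
     .apply(TextIO.write().to("counts"));                            // illustrative output
    p.run().waitUntilFinish();
  }
}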
36. Apache Beam
Challenges:
- Multiple layers (API vs. execution): troubleshooting complexities
- Need higher-level APIs: multiple efforts are on their way
- Other cloud vendor buy-in? Azure/AWS?
37. IBM S-Store
Promises:
- Combine stream processing and transactions
- Extended an OLTP engine (H-Store), adding: tuple ordering, windowing, push-based processing, exactly-once semantics
38. IBM S-Store
Data and processing model:
- Tuples are grouped into atomic batches: groupings of non-overlapping tuples, treated like a transaction
- Atomic batches belong to one stream
- Processing is modeled as a DAG: nodes consume one or more streams and possibly output more; node logic is treated as a transaction
39. IBM S-Store
Exactly-once guarantees:
- Strong: inputs and outputs are logged at every DAG node; on component failure, the log is replayed from a snapshot
- Weak: distributed snapshotting
40. IBM S-Store
Challenges:
- Throughput: non-OLTP processing is much slower compared to modern systems
- Scalability: multi-node operation still in research (2016)
41. Heron Terminology
- Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples
- Spouts: sources of data tuples for the topology; examples - Pulsar/Kafka/MySQL/Postgres
- Bolts: process incoming tuples and emit outgoing tuples; examples - filtering/aggregation/join/any function
44. Heron Groupings
- Shuffle grouping: random distribution of tuples
- Fields grouping: group tuples by a field or multiple fields
- All grouping: replicates tuples to all tasks
- Global grouping: sends the entire stream to one task
A topology sketch wiring these groupings together follows below.
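A minimal sketch of wiring these groupings with Heron's Storm-compatible low-level API; the spout/bolt classes (SentenceSpout, SplitterBolt, CounterBolt) and parallelism values are hypothetical stand-ins, not code from the slides.

import com.twitter.heron.api.Config;
import com.twitter.heron.api.HeronSubmitter;
import com.twitter.heron.api.topology.TopologyBuilder;
import com.twitter.heron.api.tuple.Fields;

public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    // Spout: the source of tuples (hypothetical user class)
    builder.setSpout("sentences", new SentenceSpout(), 2);
    // Shuffle grouping: tuples are randomly distributed across splitter tasks
    builder.setBolt("splitter", new SplitterBolt(), 4)
           .shuffleGrouping("sentences");
    // Fields grouping: tuples with the same "word" field go to the same counter task
    builder.setBolt("counter", new CounterBolt(), 4)
           .fieldsGrouping("splitter", new Fields("word"));
    Config conf = new Config();
    HeronSubmitter.submitTopology("word-count", conf, builder.createTopology());
  }
}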
46. Writing Heron Topologies
- Procedural - low-level API: directly write your spouts and bolts
- Functional - mid-level API: use of maps, flat maps, transforms, windows
- Declarative - SQL (coming): use of a declarative language; specify what you want, and the system will figure it out
47. Heron Design Goals
- Efficiency: reduce resource consumption
- Support for diverse workloads: throughput- vs. latency-sensitive
- Support for multiple semantics: at most once, at least once, effectively once
- Native multi-language support: C++, Java, Python
- Task isolation: ease of debug-ability/isolation/profiling
- Support for back pressure: topologies should be self-adjusting
- Use of containers: runs on schedulers - Kubernetes & DCOS & many more
- Multi-level APIs: procedural, functional, and declarative for diverse applications
- Diverse deployment models: run as a service or as a pure library
61. Pulsar Operations
- Reacting to failures: brokers, bookies
- Common issues: consumer backlog
- I/O prioritization and throttling
- Multi-tenancy
62. Reacting to Failures - Brokers
- Brokers don't have durable state: easily replaceable; topics are immediately reassigned to healthy brokers
- Expanding capacity: simply add a new broker node; if other brokers are overloaded, traffic will be automatically reassigned
- Load manager: monitors traffic load on all brokers (CPU, memory, network, topics); initially places topics on the least loaded brokers; reassigns topics when a broker is overloaded
63. Reacting to Failures - Bookies
- When a bookie fails, brokers immediately continue on other bookies
- An auto-recovery mechanism re-establishes the replication factor in the background
- If a bookie keeps giving errors or timeouts, it will be "quarantined": not considered for new ledgers for some period of time
64. Consumer Backlog
Metrics are available to make assessments:
- When did the problem start?
- How big is the backlog? In messages? In disk space?
- How fast is it draining? What's the ETA to catch up with publishers?
Establish where the bottleneck is:
- The application is not fast enough
- Disk read I/O
65. I/O Prioritization and Throttling
Prioritize access to I/O:
- During an outage, many tenants might try to drain their backlog as fast as they can
- Read I/O becomes the bottleneck
Throttling can be used to prioritize draining:
- Critical use cases can recover quickly
- Fewer concurrent readers lead to higher throughput
- Once consumers catch up, messages will be dispatched from cache
66. Enforcing Multi-Tenancy
Ensure tenants don't cause performance issues for other tenants:
- Backlog quotas
- Soft isolation: flow control and throttling, for cases where user behavior is triggering performance degradation
- Hard isolation: a last resort for quick reaction while a proper fix is deployed; isolate the tenant on a subset of brokers; can also be applied at the BookKeeper level
67. Heron @Twitter
Largest cluster; 100's of topologies; billions of messages; 100's of terabytes; reduced incidents; good night's sleep
3x - 5x reduction in resource usage
73. Heron Happy Facts :)
- No more pages at midnight for the Heron team
- Very rare incidents for Heron customer teams
- Easy to debug during incidents for quick turnaround
- Reduced resource utilization, saving cost
79. Data Skew
- Multiple keys: several keys map to a single instance and their combined count is high
- Single key: a single key maps to an instance and its count is high
81. Self Regulating Streaming Systems
- Tuning: the manual, time-consuming, and error-prone task of tuning various system knobs to achieve SLOs
- SLO maintenance: maintaining SLOs in the face of unpredictable load variations and hardware or software performance degradation
- Self-regulating streaming systems: systems that adjust themselves to environmental changes and continue to produce results
82. Self Regulating Streaming Systems
- Self-tuning: there are several tuning knobs and a time-consuming tuning phase; the system should take an SLO as input and automatically configure the knobs
- Self-stabilizing: stream jobs are long-running and load variations are common; the system should react to external shocks and automatically reconfigure itself
- Self-healing: system performance can be affected by hardware or software delivering degraded quality of service; the system should identify internal faults and attempt to recover from them
83. Enter Dhalion
Dhalion is a policy-based framework integrated into Heron. Dhalion periodically executes well-specified policies that optimize execution based on some objective. We created policies that dynamically provision resources in the presence of load variations and auto-tune streaming applications so that a throughput SLO is met.
84. Dhalion Policy Framework
Metrics feed a pipeline of three phases:
- Symptom detection: symptom detectors 1..N observe metrics and emit symptoms
- Diagnosis generation: diagnosers 1..M turn symptoms into candidate diagnoses
- Resolution: resolver selection picks among resolvers 1..M and invokes one to act on the diagnosis
85. Dynamic Resource Provisioning
- Policy: reacts to unexpected load variations (workload spikes)
- Goal: scale the topology resources up and down as needed, while keeping the topology in a steady state where back pressure is not observed
86. Dynamic Resource Provisioning
Implementation:
- Symptom detection: pending tuples detector, backpressure detector, processing rate skew detector
- Diagnosis generation: resource overprovisioning, resource underprovisioning, data skew, and slow instances diagnosers
- Resolution: bolt scale-down, bolt scale-up, data skew, and restart instances resolvers
93. Experimental Setup
Topology: Spout -> Splitter Bolt (shuffle grouping) -> Counter Bolt (fields grouping)
Hardware and software configuration:
- Microsoft HDInsight
- Intel Xeon E5-2673 CPU @ 2.40 GHz
- 28 GB of memory
Evaluation metrics:
- Throughput of spouts (no. of tuples emitted over 1 min)
- Throughput of bolts (no. of tuples emitted over 1 min)
- Number of Heron instances provisioned
94. Dynamic Provisioning Profile
[Plots: normalized throughput of the spout, splitter bolt, and counter bolt over 120 minutes, with scale-up/scale-down events S1-S3; number of bolts over time]
- The dynamic resource provisioning policy is able to adjust the topology resources on the fly when workload spikes occur.
- The policy can correctly detect and resolve bottlenecks even on multi-stage topologies, where backpressure is gradually propagated from one stage of the topology to another.
- Heron instances are gradually scaled up and down according to the input load.
102. Data Sketches
Early work:
- Counting [Morris, 1977]
- Membership [Bloom, 1970]
- Median of a sequence [Munro and Paterson, 1980]
- Frequent elements [Misra and Gries, 1982]
- Counting [Flajolet and Martin, 1985]
- The space complexity of approximating the frequency moments [Alon et al., 1996]
- Computing on data streams [Henzinger et al., 1998]
107. Sampling
Obtain a representative sample from a data stream; maintain a dynamic sample:
- A data stream is a continuous process; it is not known in advance how many points may elapse before an analyst needs a representative sample
Reservoir sampling [1] (sketched below):
- Probabilistic insertions and deletions on arrival of new stream points
- The probability of inserting new points reduces with the progression of the stream
- An unbiased sample contains a larger and larger fraction of points from the distant history of the stream
Practical perspective: the data stream may evolve, so the majority of the points in the sample may represent stale history
[1] J. S. Vitter. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11(1):37-57, March 1985.
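A minimal sketch of reservoir sampling in Java (the class and method names are ours, not Vitter's): after the reservoir fills, each arriving point replaces a random slot with probability k/seen, which keeps the reservoir a uniform sample of the stream so far.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampler<T> {
  private final int k;                     // desired sample size
  private final List<T> reservoir;
  private final Random rng = new Random();
  private long seen = 0;                   // stream points observed so far

  public ReservoirSampler(int k) {
    this.k = k;
    this.reservoir = new ArrayList<>(k);
  }

  // Offer the next stream point; insertion probability decays as k/seen
  public void offer(T item) {
    seen++;
    if (reservoir.size() < k) {
      reservoir.add(item);                          // always keep the first k points
    } else {
      long j = (long) (rng.nextDouble() * seen);    // uniform index in [0, seen)
      if (j < k) reservoir.set((int) j, item);      // replace with probability k/seen
    }
  }

  // A uniform random sample of everything seen so far
  public List<T> sample() { return reservoir; }
}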
108. Sampling
Sliding-window approaches (sample size k, window width n):
- Sequence-based: replace the expired element with the newly arrived element; disadvantage: highly periodic
- Chain-sample approach: select the ith element with probability min(i, n)/n; select uniformly at random an index from [i+1, i+n] for the element that will replace the ith item; maintain k independent chain samples
- Timestamp-based: the number of elements in a moving window may vary over time; priority-sample approach
[Illustration: chain sampling over a sliding window of a numeric stream]
[1] B. Babcock. Sampling From a Moving Window Over Streaming Data. In Proceedings of SODA, 2002.
109. Sampling
Biased reservoir sampling [1]:
- Use a temporal bias function: recent points have a higher probability of being represented in the sample reservoir
- Memory-less bias functions: the future probability of retaining a current point in the reservoir is independent of its past history or arrival time
- The probability of the rth point belonging to the reservoir at time t is proportional to the bias function f(r, t)
- Exponential bias function: f(r, t) = e^(-λ(t-r)) for the rth data point at time t, where r ≤ t and λ ∈ [0, 1] is the bias rate
- The maximum reservoir requirement R(t) is bounded
[1] C. C. Aggarwal. On Biased Reservoir Sampling in the Presence of Stream Evolution. In Proceedings of VLDB, 2006.
110. Filtering
Set membership: determine, with some false-positive probability, if an item in a data stream has been seen before.
Applications: databases (e.g., speed up semi-join operations), caches, routers, storage systems:
- Reduce the space requirement of probabilistic routing tables
- Speed up longest-prefix matching of IP addresses
- Encode multicast forwarding information in packets
- Summarize content to aid collaborations in overlay and peer-to-peer networks
- Improve network state management and monitoring
111. Filtering
Set membership: application to hyphenation programs and early UNIX spell checkers [1]
[1] Illustration borrowed from http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf
Filtering
Set
Membership
Natural
generaliza@on
of
hashing
False
posi@ves
are
possible
No
false
nega@ves
No
dele@ons
allowed
For
false
posi@ve
rate
ε,
#
hash
func@ons
=
log2(1/ε)
where,
n
=
#
elements,
k
=
#
hash
func@ons
m
=
#
bits
in
the
array
113. Filtering
Set membership: minimizing the false positive rate ε w.r.t. k [1]:
- k = ln 2 · (m/n)
- ε = (1/2)^k ≈ (0.6185)^(m/n)
- 1.44 · log2(1/ε) bits per item, independent of item size or # items
- Information-theoretic minimum: log2(1/ε) bits per item, i.e., a 44% overhead
- X = # 0 bits, where E[X] = m · (1 - 1/m)^(kn) ≈ m · e^(-kn/m)
A code sketch of these sizing rules follows below.
[1] A. Broder and M. Mitzenmacher. Network Applications of Bloom Filters: A Survey. In Internet Mathematics, Vol. 1, No. 4, 2005.
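A minimal Bloom filter sketch in Java using the sizing rules above (m = n·ln(1/ε)/ln²2, k = (m/n)·ln 2); the double-hashing mixer is an illustrative choice of ours, not from the slides.

import java.util.BitSet;

public class BloomFilter {
  private final BitSet bits;
  private final int m, k;

  public BloomFilter(int n, double epsilon) {
    // m = n * ln(1/eps) / (ln 2)^2 bits; k = (m/n) * ln 2 hash functions
    this.m = (int) Math.ceil(n * Math.log(1 / epsilon) / (Math.log(2) * Math.log(2)));
    this.k = Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    this.bits = new BitSet(m);
  }

  // Derive the i-th hash from two base hashes: g_i(x) = h1(x) + i * h2(x)
  private int index(Object item, int i) {
    int h1 = item.hashCode();
    int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;   // illustrative second hash
    int idx = (h1 + i * h2) % m;
    return idx < 0 ? idx + m : idx;
  }

  public void add(Object item) {
    for (int i = 0; i < k; i++) bits.set(index(item, i));
  }

  // May return true for items never added (false-positive rate ~ eps);
  // never returns false for an added item (no false negatives)
  public boolean mightContain(Object item) {
    for (int i = 0; i < k; i++) if (!bits.get(index(item, i))) return false;
    return true;
  }
}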
114. Filtering
Set membership: cuckoo filter [1]
Key highlights:
- Add and remove items dynamically
- For false positive rate ε < 3%, more space efficient than a Bloom filter
- Higher performance than a Bloom filter for many real workloads
- Asymptotically worse than a Bloom filter: minimum fingerprint size ∝ log(# entries in table)
Overview:
- Stores only a fingerprint of each inserted item
- The original key and value bits of an item are not retrievable
- Set membership query for item x: search the hash table for the fingerprint of x
[1] Fan et al. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies, 2014.
115. Filtering
Set membership:
- Cuckoo hashing [1]: high space occupancy; practical implementations use multiple items per bucket; example uses: software-based Ethernet switches
- Cuckoo filter [2]: uses a multi-way associative cuckoo hash table; employs partial-key cuckoo hashing - store the fingerprint of an item and relocate existing fingerprints to their alternative locations
[Illustration of cuckoo hashing borrowed from [2]]
[1] R. Pagh and F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122-144, 2004.
[2] Fan et al. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies, 2014.
116. Filtering
Set membership: cuckoo filter (continued)
- Deletion: the item must have been previously inserted
- Partial-key cuckoo hashing: fingerprint hashing ensures a uniform distribution of items in the table
- The length of the fingerprint << the size of h1 or h2
- It is possible to have multiple entries of a fingerprint in a bucket
- Alternate bucket: derived from the fingerprint, which is significantly shorter than h1 and h2 (see the sketch below)
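A sketch of the bucket computation in partial-key cuckoo hashing following Fan et al., where the alternate bucket is i2 = i1 XOR hash(fingerprint); the hash mixers and the 8-bit fingerprint width are illustrative choices of ours.

public class PartialKeyBuckets {
  private final int numBuckets;   // assumed a power of two for the masks below

  public PartialKeyBuckets(int numBuckets) { this.numBuckets = numBuckets; }

  // Short fingerprint stored in the table instead of the item itself
  int fingerprint(Object x) {
    return (x.hashCode() & 0xFF) | 1;                       // nonzero 8-bit fingerprint
  }

  int bucket1(Object x) {
    return (x.hashCode() * 0x9E3779B9) & (numBuckets - 1);  // h1(x)
  }

  // i2 = i1 XOR hash(fingerprint): computable from (i1, fp) alone, so an
  // existing fingerprint can be relocated without re-reading the original key.
  // Applying it twice returns i1, so the two buckets are mutual alternates.
  int bucket2(int i1, int fp) {
    return (i1 ^ (fp * 0x5BD1E995)) & (numBuckets - 1);
  }
}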
118. Cardinality
Distinct elements:
- Database systems/search engines: # distinct queries
- Network monitoring applications
- Natural language processing
- # distinct motifs in a DNA sequence
- # distinct elements in RFID/sensor networks
119. Cardinality
Previous work:
- Probabilistic counting [Flajolet and Martin, 1985]
- LogLog counting [Durand and Flajolet, 2003]
- HyperLogLog [Flajolet et al., 2007]
- Sliding HyperLogLog [Chabchoub and Hebrail, 2010]
- HyperLogLog in Practice [Heule et al., 2013]
- Self-Organizing Bitmap [Chen and Cao, 2009]
- Discrete Max-Count [Ting, 2014]: the sequence of sketches forms a Markov chain when h is a strong universal hash; estimate cardinality using a martingale
121. Cardinality
HyperLogLog:
- Apply a hash function h to every element in a multiset
- The cardinality of the multiset is estimated as 2^max(ϱ), where 0^(ϱ-1)1 is the bit pattern observed at the beginning of a hash value
- The above suffers from high variance, so employ stochastic averaging: partition the input stream into m = 2^p sub-streams S_i using the first p bits of the hash values, and combine the per-sub-stream maxima M_j via the bias-corrected harmonic mean α_m · m² · (Σ_j 2^(-M_j))^(-1)
A toy implementation follows below.
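A toy HyperLogLog sketch in Java (ours, not from the slides): the first p bits of a 64-bit hash pick a register, each register keeps the maximum leading-zero rank ϱ, and the estimate is the bias-corrected harmonic mean; the caller is assumed to supply a good 64-bit hash of each element, and small/large-range corrections are omitted.

public class HyperLogLog {
  private final int p;       // precision
  private final int m;       // number of registers, m = 2^p
  private final int[] registers;

  public HyperLogLog(int p) {
    this.p = p;
    this.m = 1 << p;
    this.registers = new int[m];
  }

  public void add(long hash) {
    int idx = (int) (hash >>> (64 - p));          // first p bits choose the sub-stream
    long w = hash << p;                           // remaining bits
    int rho = Long.numberOfLeadingZeros(w) + 1;   // position of the leftmost 1-bit
    registers[idx] = Math.max(registers[idx], rho);
  }

  public double estimate() {
    double sum = 0;
    for (int r : registers) sum += Math.pow(2, -r);
    double alphaM = 0.7213 / (1 + 1.079 / m);     // bias-correction constant (large m)
    return alphaM * m * m / sum;                  // normalized harmonic mean
  }
}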
122. Cardinality
HyperLogLog optimizations [1, 2]:
- Use of a 64-bit hash function: total memory requirement goes from 5 · 2^p to 6 · 2^p, where p is the precision
- Empirical bias correction: uses empirically determined data for cardinalities smaller than 5m, and the unmodified raw estimate otherwise
- Sparse representation: for n ≪ m, store an integer obtained by concatenating the bit patterns for idx and ϱ(w); use variable-length encoding (a variable number of bytes per integer) and difference encoding (store the difference between successive elements)
[1] http://druid.io/blog/2014/02/18/hyperloglog-optimizations-for-real-world-systems.html
[2] http://antirez.com/news/75
123. Cardinality
Self-learning bitmap (S-bitmap) [1]:
- Achieves constant relative estimation error for unknown cardinalities in a wide range, say from 10s to > 10^6
- The bitmap is obtained via an adaptive sampling process: bits corresponding to sampled items are set to 1
- Sampling rates are learned from the # distinct items already seen, and reduced sequentially as more bits are set to 1
- For given input parameters Nmax and estimation precision ε, the size m of the bit mask is determined; with r = 1 - 2ε²(1 + ε²)^(-1) and sampling probability p_k = m(m + 1 - k)^(-1)(1 + ε²)r^k, where k ∈ [1, m], the relative error ≈ ε
[1] Chen et al. "Distinct counting with a self-learning bitmap". Journal of the American Statistical Association, 106(495):879-890, 2011.
124. Quantiles
Quantiles and histograms have a large set of real-world applications: database applications, sensor networks, operations.
Desired properties:
- Provide tunable and explicit guarantees on the precision of approximation
- Single pass
Early work:
- [Greenwald and Khanna, 2001]: worst-case space requirement
- [Arasu and Manku, 2004]: sliding-window-based model, worst-case space requirement
125. Quantiles
q-digest [1]:
- Groups values into variable-size buckets of almost equal weights; unlike a traditional histogram, buckets can overlap
Key features:
- Detailed information about frequent values is preserved; less frequent values are lumped into larger buckets
- Using a message of size m, answers quantile queries within an error of O(log(σ)/m)
- Except for the root and leaf nodes, a node v belongs to the q-digest iff count(v) ≤ ⌊n/k⌋ and count(v) + count(v.parent) + count(v.sibling) > ⌊n/k⌋
where σ = max signal value, n = # elements, k = compression factor, over a complete binary tree
[1] Shrivastava et al. Medians and Beyond: New Aggregation Techniques for Sensor Networks. In Proceedings of SenSys, 2004.
Quantiles
126. 126
q-‐digest
Building
a
q-‐digest
q-‐digests
can
be
constructed
in
a
distributed
fashion
Merge
q-‐digests
Quantiles
127. Quantiles
t-digest [1]: approximation of rank-based statistics
- Computes quantile q with an accuracy relative to max(q, 1-q)
- Computes hybrid statistics such as trimmed statistics
Key features:
- Robust with respect to highly skewed distributions
- Independent of the range of input values (unlike q-digest)
- Relative error is bounded
- Non-equal bin sizes: few samples contribute to the bins corresponding to the extreme quantiles
- Merging independent t-digests preserves reasonable accuracy
[1] T. Dunning and O. Ertl. "Computing Extremely Accurate Quantiles using t-digests", 2017. https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
128. Quantiles
t-digest:
- Group samples into sub-sequences: smaller sub-sequences near the ends, larger sub-sequences in the middle
- Scaling function: the mapping k is monotonic, with k(0) = 1 and k(1) = δ; the k-size of each sub-sequence is < 1
where k is the notional index, δ the compression parameter, and q the quantile
129. Quantiles
t-digest:
- Estimating a quantile via interpolation: sub-sequences contain the centroid of their samples; estimate the boundaries of the sub-sequences
- Error scales quadratically in the # samples: the small # samples in the sub-sequences near q = 0 and q = 1 improves accuracy there; accuracy is lower in the middle of the distribution, where sub-sequences are larger
- Two flavors: progressive merging (buffering based) and a clustering variant
130. Frequent Elements
Applications:
- Track bandwidth hogs
- Determine popular tourist destinations
- Itemset mining
- Entropy estimation
- Compressed sensing
- Search log mining
- Network data analysis
- DBMS optimization
131. Frequent Elements
Count-min sketch [1] (see the sketch implementation below):
- A two-dimensional array of counts with w columns and d rows; each entry of the array is initially zero
- d hash functions h_1, ..., h_d : {1, ..., n} -> {1, ..., w} are chosen uniformly at random from a pairwise independent family
- Update: for a new element i, for each row j and k = h_j(i), increment the kth column by one
- Point query: â_i = min_j sketch[j, h_j(i)], where sketch is the table
- Parameters: for an (ε, δ) guarantee, w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉
[1] Cormode, Graham; S. Muthukrishnan (2005). "An Improved Data Stream Summary: The Count-Min Sketch and its Applications". J. Algorithms 55: 29-38.
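A minimal count-min sketch in Java using the parameters above; the (a·x + b) mod p row hashes are an illustrative pairwise-independent family, and overflow in the hash arithmetic is ignored for brevity.

import java.util.Random;

public class CountMinSketch {
  private final int w, d;
  private final long[][] table;
  private final long[] a, b;
  private static final long P = (1L << 31) - 1;   // Mersenne prime for hashing

  public CountMinSketch(double epsilon, double delta) {
    this.w = (int) Math.ceil(Math.E / epsilon);       // w = ceil(e / eps)
    this.d = (int) Math.ceil(Math.log(1 / delta));    // d = ceil(ln(1 / delta))
    this.table = new long[d][w];
    this.a = new long[d];
    this.b = new long[d];
    Random rng = new Random();
    for (int j = 0; j < d; j++) {
      a[j] = 1 + rng.nextInt((int) P - 1);
      b[j] = rng.nextInt((int) P);
    }
  }

  private int bucket(int j, long item) {
    return (int) (Math.floorMod(a[j] * item + b[j], P) % w);  // h_j(item)
  }

  public void update(long item, long count) {
    for (int j = 0; j < d; j++) table[j][bucket(j, item)] += count;
  }

  // Over-estimates only: true count <= estimate <= true count + eps * N
  // with probability >= 1 - delta
  public long query(long item) {
    long min = Long.MAX_VALUE;
    for (int j = 0; j < d; j++) min = Math.min(min, table[j][bucket(j, item)]);
    return min;
  }
}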
132. Frequent Elements
Variants of the count-min sketch [1]:
- Count-min sketch with conservative update (CU sketch): when updating an item with frequency c, raise each counter only up to the current estimate plus c, i.e., sketch[j, h_j(i)] = max(sketch[j, h_j(i)], â_i + c); this avoids unnecessary updating of counter values and reduces over-estimation error, but remains prone to over-estimation error on low-frequency items (see the snippet below)
- Lossy conservative update (LCU) - SWS: divide the stream into windows; at window boundaries, for all 1 ≤ i ≤ w, 1 ≤ j ≤ d, decrement sketch[i, j] if its value is positive and at most a small threshold
[1] Cormode, G. 2009. Encyclopedia entry on 'Count-Min Sketch'. In Encyclopedia of Database Systems. Springer, 511-516.
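The conservative update rule, sketched as a drop-in method for the CountMinSketch class above (our rendering of the rule the slide describes, not code from the slides):

// Conservative update (CU sketch): counters are raised only as far as
// needed to cover the new estimate, which reduces over-estimation.
public void conservativeUpdate(long item, long c) {
  long estimate = query(item);                  // current min-estimate for the item
  for (int j = 0; j < d; j++) {
    int k = bucket(j, item);
    table[j][k] = Math.max(table[j][k], estimate + c);
  }
}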
133. Open Source
- Data Sketches* (Yahoo!): Unique, Quantile, Histogram, Sampling, Theta Sketches, Tuple Sketches, Most Frequent
- Algebird# (Twitter): Filtering, Unique, Histogram, Most Frequent
- streamDM^ (Huawei): SGD Learner and Perceptron, Naive Bayes, CluStream, Hoeffding Decision Trees, Bagging, Stream KM++
- StreamLib**
* https://datasketches.github.io/
# https://github.com/twitter/algebird
^ http://huawei-noah.github.io/streamDM/
** https://github.com/jiecchen/StreamLib
134. Anomaly Detection
A very rich, over-150-year history:
- Manufacturing
- Statistics
- Econometrics, financial engineering
- Signal processing
- Control systems, autonomous systems: fault detection [1]
- Networking
- Computational biology (e.g., microarray analysis)
- Computer vision
[1] A. S. Willsky. "A survey of design methods for failure detection systems". Automatica, vol. 12, pp. 601-611, 1976.
135. Anomaly Detection
A very rich, over-150-year history; anomalies are contextual in nature:
"DISCORDANT observations may be defined as those which present the appearance of differing in respect of their law of frequency from other observations with which they are combined. In the treatment of such observations there is great diversity between authorities; but this discordance of methods may be reduced by the following reflection. Different methods are adapted to different hypotheses about the cause of a discordant observation; and different hypotheses are true, or appropriate, according as the subject-matter, or the degree of accuracy required, is different."
- F. Y. Edgeworth, "On Discordant Observations", 1887.
137. Anomaly Detection
Common approaches:
- Moving averages: SMA, EWMA, PEWMA (parameters: width, decay)
- Rule based: µ ± σ
Assumption: normal distribution - NOT VALID in real life
Domains: statistics, manufacturing, operations
Historical work: Stone 1868, Glaisher 1872, Edgeworth 1887, Stewart 1920, Irwin 1925, Jeffreys 1932, Rider 1933
138. Anomaly Detection
Robust measures:
- Median
- MAD [1]: Median Absolute Deviation
- MCD [2]: Minimum Covariance Determinant
- MVEE [3, 4]: Minimum Volume Enclosing Ellipsoid
[1] P. J. Rousseeuw and C. Croux. "Alternatives to the Median Absolute Deviation", 1993.
[2] http://onlinelibrary.wiley.com/wol1/doi/10.1002/wics.61/abstract
[3] P. J. Rousseeuw and A. M. Leroy. "Robust Regression and Outlier Detection", 1987.
[4] M. J. Todd and E. A. Yıldırım. "On Khachiyan's algorithm for the computation of minimum-volume enclosing ellipsoids", 2007.
140. Anomaly Detection
Challenges:
- Live data: multi-dimensional; low memory footprint; accuracy vs. speed trade-off
- Encoding the context
- Data types: video, audio, text
- Data veracity
- Wearables, smart cities, connected home, Internet of Things
144. Lambda Architecture
- Batch layer: accurate but delayed (HDFS/MapReduce)
- Fast layer: inexact but fast (Storm/Kafka)
- Query merge layer: merges results from the batch and fast layers at query time
145. Lambda Architecture
Characteristics:
- During ingestion, data is cloned into two copies: one goes to the batch layer, the other goes to the fast layer
- Processing is done at both layers: expressed as map-reduces in the batch layer, and as topologies in the speed layer
146. Lambda Architecture
Challenges:
- Inherently inefficient: data is replicated twice; computation is replicated twice
- Operationally inefficient: maintain both batch and streaming systems; tune topologies for both systems
147. Kappa Architecture
- Streaming is everything: computation is expressed in a topology
- Computation is mostly done only once, when the data arrives
- Data then moves into permanent storage
149. Kappa Architecture
Challenges:
- Data reprocessing could be very expensive on code/logic changes: either data needs to be brought back from storage to the bus, or the computation needs to be re-expressed to run on bulk storage
- Historic analysis: how to do data analytics over all of last year's data?
151. Observations
- Lambda is complicated and inefficient: replication of data and computation; multiple systems to operate and tune
- Kappa is too simplistic: data reprocessing is too expensive; historical analysis is not possible
152. Observations
- Computation across batch/realtime is similar: expressed as DAGs; run in parallel on the cluster; intermediate results need not be materialized; functional/declarative APIs
- Storage is the key: messaging and storage are two faces of the same coin; they serve the same data
153. Real-Time Storage Requirements
Requirements for a real-time storage platform:
- Write and read streams of records with low latency and storage durability
- Data storage should be durable, consistent, and fault tolerant
- Enable clients to stream or tail ledgers to propagate data as it is written
- Store and provide access to both historic and real-time data
154. Apache BookKeeper - Stream Storage
A storage system for log streams:
- Replicated, durable storage of log streams
- Fast tailing/streaming facility
- Optimized for immutable data
- Low-latency durability
- Simple repeatable read consistency
- High write and read availability
155. Record
The smallest I/O and address unit:
- A stream is a sequence of indivisible records
- A record is a sequence of bytes
- The smallest I/O unit, as well as the unit of address
- Each record contains sequence numbers for addressing
157. Ledger
A finite sequence of records. A ledger is terminated when:
- A client explicitly closes it, or
- The writer that writes records into it has crashed
Stream
Infinite
sequence
of
records
Stream:
An
unbounded,
infinite
sequence
of
records
Physically
comprised
of
mul@ple
ledgers
159. 159
Bookies
Stores
fragment
of
records
Bookie
-‐
A
storage
server
to
store
data
records
Ensemble:
A
group
of
bookies
storing
the
data
records
of
a
ledger
Individual
bookies
store
fragments
of
ledgers
161. 161
Tying it all together
A
typical
installaHon
of
Apache
BookKeeper
162. 162
BookKeeper - Use Cases
Combine
messaging
and
storage
Stream
Storage
combines
the
func@onality
of
streaming
and
storage
WAL
-‐
Write
Ahead
Log Message
Store Object
Store
SnapshotsStream
Processing
164. BookKeeper in Production
Enterprise-grade stream storage:
- 4+ years at Twitter and Yahoo, 2+ years at Salesforce
- Multiple use cases from messaging to storage: database replication, message store, stream computing, ...
- 600+ bookies in one single cluster
- Data is stored from days to a year
- Millions of log streams
- 1 trillion records/day, 17 PB/day
166. Real Time is Messy and Unpredictable
[Diagram: today's pipelines stitch together messaging systems, aggregation systems, a result engine, HDFS, and queryable engines]
167. Streamlio - Unified Architecture
- APIs: Storm API, Trident/Apache Beam, SQL, application builder; Pulsar API, Kafka API; BK/HDFS API
- Interactive querying
- Runs on Kubernetes
- Platform services: metadata management, operational monitoring, chargeback, security, authentication, quota management, rules engine
168. Resources
Sketching algorithms:
- https://www.cs.upc.edu/~gavalda/papers/portoschool.pdf
- https://mapr.com/blog/some-important-streaming-algorithms-you-should-know-about/
- https://gist.github.com/debasishg/8172796
Books and surveys:
- Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches - G. Cormode, M. Garofalakis, and P. J. Haas
- Data Streams: Models and Algorithms - Charu Aggarwal. http://www.springer.com/us/book/9780387287591
- Data Streams: Algorithms and Applications - Muthu Muthukrishnan. http://algo.research.googlepages.com/eight.ps
- Graph Streaming Algorithms - A. McGregor
- Sketching as a Tool for Numerical Linear Algebra - D. Woodruff
169. Readings
Streaming engines:
- Twitter Heron: Stream Processing at Scale - SIGMOD'15
- Twitter Heron: Towards Extensible Streaming Engines - ICDE'17
- Dhalion: Self-Regulating Stream Processing in Heron - VLDB'17
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale - VLDB'13
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency and Cost in Massive-Scale, Unbounded Out-of-Order Data Processing - VLDB'15
- Anomaly Detection in Real-Time Data Streams Using Heron - Strata San Jose'17
170. Readings
- Clustering Data Streams - FOCS'00
- Querying and mining data streams: You only get one look - SIGMOD'02
- Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams - SIAM Journal of Computing'09
- Models and Issues in Data Stream Systems - PODS'02
- Statistical Analysis of Sketch Estimators - SIGMOD'07
- An optimal algorithm for the distinct elements problem - PODS'10
Readings
SODA’10
Coresets and Sketches for high dimensional
subspace approximation problems
SIGMOD’16
Time Adaptive Sketches (Ada-Sketches) for
Summarizing Data Streams
SOSR’17
Heavy-Hitter Detection Entirely in the Data
Plane
PODS’12
Graph Sketches: Sparsification, Spanners, and
Subgraphs
Arxiv’16
Coresets and Sketches
ACM Queue’17
Data Sketching: The approximate approach is
often faster and more efficient
173. Get in Touch
Contact us:
- @arun_kejariwal
- @kramasamy, @sanjeerk
- @sijieg, @merlimat
- @nlu90
- karthik@streamlio.io
- arun_kejariwal@acm.org

175. The End
Enjoy the presentation!