SlideShare a Scribd company logo
1 of 35
Download to read offline
Elastic Scaling of a High-Throughput
Content-Based Publish/Subscribe Engine
Raphaël Barazzutti, Thomas Heinze, André Martin,
Emanuel Onica, Pascal Felber, Christof Fetzer,
Zbigniew Jerzak, Marcelo Pasin and Etienne Rivière
University of Neuchâtel, Switzerland
SAP AG, Germany
TU Dresden, Germany
ICDCS 2014, Madrid, Spain
etienne.riviere@unine.ch
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Context
• Publish/Subscribe as a service
• Running on private or public clouds
• Decoupled communication
• Composition of applications from multiple domains
• Content-based filtering
• Subscriptions filter on the content of publications
• Canonical example: stock quotes filtering
2
Publisher Subscriber
Pub/Sub as a Service
Adm. domain A Adm. domain B Adm. domain C
PublisherSubscriber
sp
notify(pub)
p
s
register(sub)
pregister(pub)
s name = "IBM"
price > 120$
volume > 10,000
p name = "IBM"
price = 131$
volume = 12,312
p name = "IBM"
price = 112$
volume = 9,892
open = 109$
close = 113 $
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Pub/sub as a service: requirements
• Arbitrary representation of publications and subscriptions
• Not limited to attribute-based filtering
• Untrusted domains: encrypted filtering
• High-throughput and scalability
• Thousands subscriptions per second
• Thousands publications per second
• Thousands to millions notifications per second
• Availability, dependability and low delays
• StreamHub [DEBS 2013]
• Supports arbitrary filtering scheme, in particular
encrypted filtering (ASPE)
• Built on top of a Stream Processing Engine
3
Publisher Subscriber
P/Sa.a.S.
Trusted domain Trusted domain
Subscriber
?
p s
?
encryption encryption
?
Untrusted domain
?
p
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Problem: resource provisioning
• Pub/sub service resource usage is unpredictable and varies over time
• Example: Frankfurt stock exchange, Nov 18, 2011
• Throughput/delay requirements vs budget require appropriate provisioning
• This talk: can we make a high-throughput pub/sub service elastic?
• Scale in and scale out based on actual requirements
• Maintain service continuity and minimize reconfiguration impact
4
0
200
400
600
800
1000
1200
09:00 12:00 15:00 18:00 21:00
Tickspersecond
Time (hours)
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Outline
• Background
• StreamHub high-throughput content-based pub/sub engine
• StreamMine3G stream processing engine
• (elastic)e-StreamHub principles
• Components
• Slice migration
• Enforcing elasticity policies
• Evaluation
• Micro-benchmarks
• Trace-based using Frankfurt stock exchange workload
5
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
StreamHub: principles
• Tiered approach to pub/sub for cloud deployments
• Split pub/sub into three fundamental, consecutive operations
• Exploit massive data parallelism of each operation
• Supports arbitrary filtering mechanism
• Event flows do not depend on a filtering scheme characteristics
• This work: use computationally intensive encrypted filtering
• StreamHub engine = Stream Processing application
• Each of the 3 pub/sub operations mapped to an stream operator
• Operators supported by a Stream Processing Engine
6
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
StreamMine3G stream processing engine
• Distributed stream computation as a DAG of operators
• Each operator split in multiple slices
• Externalized state management, no state sharing between slices
• Support for unicast, anycast & broadcast primitives
7
slice 1
slice 2
slice n-1
slice n
...
Operator 1 Operator 2
Operator 3
?
Broadcast
Anycast
Unicast
slice 3
...
Operator 4
Each slice
supported
by multiple
cores
<key,value>
e
<k,v> (k%2=1)
e
Slice state
management
No shared
memory
e
e
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
StreamHub Engine [DEBS13]
8
Operator Access Point (AP) Matching (M) Exit Point (EP)
Function Subscriptions Partitioning Publications Filtering Publications Dispatching
State Stateless Stateful (persistent) Stateful (transient)
Role
➡ Decide where to store
subs
➡ Broadcast pubs to next
operator
➡ Store subs
➡ Filter incoming pubs,
create list of matching
subscribers ids
➡ Aggregate lists of
matching subscribers ids
➡ Prepare & dispatch
notifications
Subscriber
SUB
x > 3 &
y == 5
encrypts
SUB
C5F80
BA363
Publisher
PUB
x = 7
y = 10
encrypts
PUB
88F3B
2A09C
DCCP/ConnectionPoint
stores
sends
notifications of
matching encrypted publications
AP:1
AP operator M operator
p
s
Broadcast
(pubs)
Unicast
(subs)
AP:2
AP:3
AP:4
Unicast Multicast
AP:5
AP:6
Anycast
?
EP operator
M:1
M:2
M:3
M:4
M:5
M:6
EP:1
EP:2
EP:3
EP:4
EP:5
EP:6
DCCP/ConnectionPoint
e-StreamHub engine
ASPE
Public Cloud deployment
(same
DCCP)
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
e(elastic)-StreamHub principle
• Fixed number of slices for each operator
• Elastic scaling decisions based on experienced load
• Scale out: allocate new host(s), migrate slices
• Scale in: migrate slices, deallocate host(s)
• Redistribute slices upon load unbalance between hosts
9
EP:1
EP:2
EP:3
EP:4
EP:5
EP:6
AP:1
AP:2
AP:3
AP:4
AP:5
AP:6
M:1
M:2
M:3
M:4
M:5
M:6
Host 1
Host 1 Host 2 Host 3
EP:1
EP:3
EP:6
AP:1 M:1
M:2
EP:2
AP:5
AP:6
M:3
M:4
EP:4
EP:5
AP:3
AP:4
M:5
M:6
AP:2
scale
out
Host 1 Host 2
EP:1
EP:3
EP:6
AP:1 M:1
M:2
EP:2
AP:5
AP:6
M:3
M:4
AP:2
EP:4
EP:5AP:3
AP:4 M:5 M:6
scale
in
initial single host
placement
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
e-StreamHub: components
• StreamMine3G: support for slice migration (including state)
• Goal: minimal interruption of flow
• Manager: orchestrates migration and collect workload probes
• SM3G and manager state stored in ZooKeeper for dependability
• Elasticity enforcer: takes migration decisions based on observed load
• Based on elasticity policies
10
ZooKeeper
SM3G
Host 1
AP:1
M:1
EP:1
SM3G
Host 2
AP:2
M:2
AP:3 SM3G
Host 3
M:3
EP:2
EP:3
M:3
elasticity
policies
Elasticity
enforcer
Manager
aggregated
probes
migration
requests
collects probes,
orchestrates
migrations
configures
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Slice migration
1. Input events duplicated to a buffer on dest host
2. Slice halted; state is copied from src to dest host
• Incoming events are still buffered
• States associated with vectors of sequence numbers, one for each of previous
operator slices
• Allows knowing which events were accounted in
3. Missed events replayed on state, slice resumed
➡ “Downtime” of the slice is essentially equivalent to state copying time
11
OP:1
SM3G runtime
Host 1
OP:1
SM3G runtime
Host 1
(inactive)
SM3G runtime
Host 2
buffer
(inactive)
SM3G runtime
Host 1
(inactive)
SM3G runtime
Host 2
buffer
(inactive)
SM3G runtime
Host 1
SM3G runtime
Host 2
OP:1
SM3G runtime
Host 2
OP:1
events state
t t
t t+n
buffer
n
1 2 3 4 5
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Elasticity policy
• CPU utilization probes collected periodically
• Secondary metrics: network usage and memory usage
• Global rules: criteria on the average load
• Trigger scale-in and scale-out operations
• Define a target average CPU utilization
• Local rules: criteria on the load of a single host
• Grace period between migrations (30s)
• Minimize cost of migrations (sum of migrated states)
12
Rules CPU Effect
Global
average > 70% scale out
Global
average < 30% scale in
Local < 20% or > 80% redistribute slices
Target average CPU = 50%average CPU = 50%
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Elasticity policy enforcement
• Three-steps resolution when global or local rule is violated
1. Decide on the set of slices to migrate
• For each host, identify a set of slices with:
sum(CPU utilization) ≥ abs(current average utilization - target utilization)
• Subset sum problem: dynamic programming, pseudo-polynomial complexity
• Returns multiple solutions: select the one with less state transfer involved
2. Scaling out: add enough hosts for avg(load) to be ≤ target (50%)
Scaling in: mark enough least-loaded hosts for avg(load) to be ≥ target (50%)
3. Decide on new placement
• First-fit bin packing algorithm
• Start with current state without selected slices, greedily assign by decreasing weight
13
Host 1 Host 2
12%AP:1
11%AP:2
25%M:1
25%M:2
26%M:3
23%M:4
13%EP:1
12%EP:2
Host 1 Host 2
25%M:1
25%M:2
26%M:3
23%M:4
Host 3
12%AP:1
11%AP:2
13%EP:1
12%EP:2
before scale out after scale out
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Evaluation
14
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Experimental setup
• Prototype in C, C++ and Java
• 22 hosts, each with 8 cores and 8 GB
• 1 Gbps switched network
• 4 hosts with G and S: generators and sinks
• 3 hosts for infrastructure
• 1 to 15 hosts used for e-StreamHub
• 8 to 120 cores
• Number of slices is fixed
• micro-benchmarks: 4 AP, 8 M, 4 EP
• trace-based evaluation: 8 AP, 16 M, 8 EP
• Encrypted pubs and subs (ASPE)
• 100,000 encrypted subs
• Each sub is ~1.2 KB
• 1% matching ratio: 1,000 notifications / pub
15
EP:1
EP:2
EP:3
EP:4
EP:5
EP:6
AP:1
AP:2
AP:3
AP:4
AP:5
AP:6
M:1
M:2
M:3
M:4
M:5
M:6
Host 1/15
M:7
M:8
M:9
M:10
M:11
M:12
AP:7
AP:8
EP:7
EP:8
M:13
M:14
M:15
M:16
Host E
ZooKeeper
Host F
Manager
Host G
Elasticity
enforcer
Host D
G:4 S:4
Host C
G:3 S:3
Host B
G:2 S:2
Host A
G:1 S:1
(unused)
Host 2/15
(unused)
Host 15/15
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Static e-StreamHub: Performance
16
Delays(ms)
Number of hosts and attribution to operators (AP|M|EP)
Distribution of delays under half of max. throughput
Max 75
th
50
th
25
th Min
0
100
200
300
400
500
2:0.5|1|0.5
4:
1|2|1
6:1.5|3|1.5
8:
2|4|2
10:2.5|5|2.5
12:
3|6|3
Publications/s
Maximum throughput
0
100
200
300
400
500
Per second:
42.2 million matchings
422,000 notifications
Median delay
12 hosts:
3 hosts for AP slices
6 hosts for M slices
3 hosts for EP slices
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Slice migration: times
• Individual migration times, averaged over 25 migrations
• Under constant flow of 100 publications / second
• Worst-case scenario with “large” slices states
• 4 AP, 8 M and 4 EP with 100K and 400K subscriptions
• Migration time minimal for AP and EP, dominated by state size for M
17
AP
M
(12.500 subscriptions,
total of 100,000)
M
(50.000 subscriptions,
total of 400,000)
EP
average 232 ms 1.497 s 2.533 s 275 ms
standard
deviation
31 ms 354 ms 1.557 s 52 ms
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Impact of migrations on delay
18
0.0
0.5
1.0
1.5
2.0
2.5
100 120 140 160 180 200
delay(seconds)
time (seconds)
migrations: AP (1) AP (2) M (1) M (2) EP
min average max
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Steady increase/decrease of load
19
• Increase/decrease of
publication rate
• 0 to 350 publications
per second
• 0 to 35 millions
matching operations
per second
• CPU load
• average, max and min
over 30 seconds
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Steady increase/decrease of load
19
• Increase/decrease of
publication rate
• 0 to 350 publications
per second
• 0 to 35 millions
matching operations
per second
• CPU load
• average, max and min
over 30 seconds
publication rate (publications / second)
0
50
100
150
200
250
300
350
400
hosts
0
2
4
6
8
10
12
14
16
18
Time (seconds)
host CPU load: min avg max
0
20
40
60
80
100
0 300 600 900 1200 1500 1800
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Steady increase/decrease of load
19
• Increase/decrease of
publication rate
• 0 to 350 publications
per second
• 0 to 35 millions
matching operations
per second
• CPU load
• average, max and min
over 30 seconds
publication rate (publications / second)
0
50
100
150
200
250
300
350
400
hosts
0
2
4
6
8
10
12
14
16
18
Time (seconds)
average delays (seconds)
0
0.5
1
1.5
2
2.5
3
3.5
4
0 300 600 900 1200 1500 1800
• Notification delays
• average over 30
seconds
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Frankfurt stock exchange
20
publication rate (publications / second)
0
50
100
150
200
hosts
0
2
4
6
8
10
Time (seconds)
host CPU load: min avg max
0
20
40
60
80
100
0 500 1000 1500 2000 2500
• Replay of the Frankfurt
stock exchange
workload
• Sped up 10 times: about
40 minutes shown
• 100,000 encrypted
subscriptions
• CPU load
• average, max and min
over 30 seconds
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Frankfurt stock exchange
20
publication rate (publications / second)
0
50
100
150
200
hosts
0
2
4
6
8
10
Time (seconds)
average delays (seconds)
0
0.5
1
1.5
2
0 500 1000 1500 2000 2500
• Replay of the Frankfurt
stock exchange
workload
• Sped up 10 times: about
40 minutes shown
• 100,000 encrypted
subscriptions
• CPU load
• average, max and min
over 30 seconds
• Notification delays
• average over 30 seconds
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Conclusion
• Elastic scaling of a high-throughput content-based pub/sub engine
• Implemented at the level of the support stream processing engine
• Applicability to other long-lived stream processing applications
• Support of live operator slice migration
• Redistribution of slices according to monitored workload
• Allows deploying pub/sub as a service on a public cloud without
the need to provision for worst-case scenarios
• Evaluation using Frankfurt stock exchange traces
• Perspectives
• Monitor network flows, minimize inter-host communications in
migration plans
• Leverage active replication, used for dependability, to minimize impact
on delay while migrating (this requires deterministic execution)
21
Sunday 6 July 14
Elastic Scaling of a High-Throughput
Content-Based Publish/Subscribe Engine
Questions?
This work was supported by the EU FP7 program (SRT-15 project)
http://www.srt-15.eu
ICDCS 2014, Madrid, Spain
etienne.riviere@unine.ch
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Additional slides
23
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Independence from Filtering Scheme
• Attribute-based filtering widely studied
• Represent content using discrete attributes
• Subscriptions = conjunctions of discrete predicates on attributes values
• Broker overlays typically rely on containment & aggregation capabilities of
attribute-based filtering
• Alternative filtering schemes
• Encrypted filtering
• ASPE (Choi et al., DEXA10)
• Prefiltering (DEBS 2012)
➡ No guaranteed support for
containment or aggregation
• The architecture of a pub/sub engine
should be independent from the
filtering scheme(s) it supports
24
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Independence from Filtering Scheme
• Attribute-based filtering widely studied
• Represent content using discrete attributes
• Subscriptions = conjunctions of discrete predicates on attributes values
• Broker overlays typically rely on containment & aggregation capabilities of
attribute-based filtering
• Alternative filtering schemes
• Encrypted filtering
• ASPE (Choi et al., DEXA10)
• Prefiltering (DEBS 2012)
➡ No guaranteed support for
containment or aggregation
• The architecture of a pub/sub engine
should be independent from the
filtering scheme(s) it supports
24
Publisher Subscriber
P/Sa.a.S.
Trusted domain Trusted domain
Subscriber
p
s
Trusted (?) domain
p
s
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Independence from Filtering Scheme
• Attribute-based filtering widely studied
• Represent content using discrete attributes
• Subscriptions = conjunctions of discrete predicates on attributes values
• Broker overlays typically rely on containment & aggregation capabilities of
attribute-based filtering
• Alternative filtering schemes
• Encrypted filtering
• ASPE (Choi et al., DEXA10)
• Prefiltering (DEBS 2012)
➡ No guaranteed support for
containment or aggregation
• The architecture of a pub/sub engine
should be independent from the
filtering scheme(s) it supports
24
Publisher Subscriber
P/Sa.a.S.
Trusted domain Trusted domain
Subscriber
p
s
Trusted (?) domain
p
s
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Independence from Filtering Scheme
• Attribute-based filtering widely studied
• Represent content using discrete attributes
• Subscriptions = conjunctions of discrete predicates on attributes values
• Broker overlays typically rely on containment & aggregation capabilities of
attribute-based filtering
• Alternative filtering schemes
• Encrypted filtering
• ASPE (Choi et al., DEXA10)
• Prefiltering (DEBS 2012)
➡ No guaranteed support for
containment or aggregation
• The architecture of a pub/sub engine
should be independent from the
filtering scheme(s) it supports
24
Publisher Subscriber
P/Sa.a.S.
Trusted domain Trusted domain
Subscriber
?
p s
?
encryption encryption
?
Untrusted domain
?
p
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Events Paths and Support Libraries
25
libcluster
Subscriber
s
TCP
DCCP AP
Anycast
M
Determine key to
matching operator
using clustering
Store subscription
in operator
slice state
libfilter
Optional
lib Library
State
O Operator slice
StreamHub client
C Static component
M
Unicast
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Events Paths and Support Libraries
25
libcluster
Publisher
Subscriber
s
TCP
DCCP AP
Anycast
M
p
TCP
DCCP AP
Anycast
M
Determine key to
matching operator
using clustering
Store subscription
in operator
slice state
libfilter
Filter publication,
return matching
subscribers list
Broadcast
M
EP
EP
Unicast
DCCP
DCCP
Multicast
DCCP
Subscriber
p
TCP
p
p
...
Optional
p,{s}
Publication +
subscribers
lib Library
State
O Operator slice
StreamHub client
C Static component
M
Unicast
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
StreamHub interface
• Data Conversion and Connection Point(s) (DCCP)
• Stateless component(s) running on server(s) with direct WAN connectivity
• Maintain persistent TCP connections to/from publishers and subscribers
• Convert between external/internal format
26
Publisher
Subscriber
StreamHub engine
p
s
Data converter &
connection point
p
DCCP
p
sPersistent TCP
connections
GPB
format
Internal
format
p
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Use an Overlay of Brokers?
• Brokers organized in an overlay (mesh, tree) ;
each broker performs all pub/sub operations
• Storing subscriptions; matching, forwarding, notifying of publications
• Complex maintenance of routing tables between brokers
• Assumptions on the filtering scheme for inter-broker communication
and publication forwarding: containment & aggregation
• Scalability/throughput depend on workload, notification delays may vary
27
Publisher Subscriber
s
s
s
s
Subscriber Subscriber Subscriber
s s
Publisher Publisher
Broker
Broker
Broker
s
RT
updates
RT
updates
Sunday 6 July 14
ICDCS 2014 - Etienne Rivière
Use an Overlay of Brokers?
• Brokers organized in an overlay (mesh, tree) ;
each broker performs all pub/sub operations
• Storing subscriptions; matching, forwarding, notifying of publications
• Complex maintenance of routing tables between brokers
• Assumptions on the filtering scheme for inter-broker communication
and publication forwarding: containment & aggregation
• Scalability/throughput depend on workload, notification delays may vary
27
Publisher Subscriber
p
s
s
s
Subscriber Subscriber SubscriberPublisher Publisher
p
p
Broker
Broker
Broker
p p
Sunday 6 July 14

More Related Content

What's hot

Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
P. Taylor Goetz
 

What's hot (20)

Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Future of Apache Storm
Future of Apache StormFuture of Apache Storm
Future of Apache Storm
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Drill at the Chug 9-19-12
Drill at the Chug 9-19-12Drill at the Chug 9-19-12
Drill at the Chug 9-19-12
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Real Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & StormReal Time Data Streaming using Kafka & Storm
Real Time Data Streaming using Kafka & Storm
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Graph Everything
Graph EverythingGraph Everything
Graph Everything
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as example
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
OPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACKOPTIMIZING THE TICK STACK
OPTIMIZING THE TICK STACK
 
Kubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by ExampleKubernetes Observability with Prometheus by Example
Kubernetes Observability with Prometheus by Example
 
Opensample: A Low-latency, Sampling-based Measurement Platform for Software D...
Opensample: A Low-latency, Sampling-based Measurement Platform for Software D...Opensample: A Low-latency, Sampling-based Measurement Platform for Software D...
Opensample: A Low-latency, Sampling-based Measurement Platform for Software D...
 

Viewers also liked

Cambodian Dinner Night 15/11/08
Cambodian Dinner Night 15/11/08Cambodian Dinner Night 15/11/08
Cambodian Dinner Night 15/11/08
camkh12
 
Improve Your Health
Improve Your HealthImprove Your Health
Improve Your Health
henryvoc
 
Чести проблеми в сигурността на уеб проектите
Чести проблеми в сигурността на уеб проектитеЧести проблеми в сигурността на уеб проектите
Чести проблеми в сигурността на уеб проектите
Veselin Nikolov
 
Mukul's Wedding Invitation
Mukul's Wedding InvitationMukul's Wedding Invitation
Mukul's Wedding Invitation
Mukulbadonia
 
WordPress Security @ Vienna WordPress + Drupal Meetup
WordPress Security @ Vienna WordPress + Drupal MeetupWordPress Security @ Vienna WordPress + Drupal Meetup
WordPress Security @ Vienna WordPress + Drupal Meetup
Veselin Nikolov
 
Cutting tool
Cutting toolCutting tool
Cutting tool
ShdwClaw
 
Prefix Forwarding for Publish/Subscribe
Prefix Forwarding for Publish/SubscribePrefix Forwarding for Publish/Subscribe
Prefix Forwarding for Publish/Subscribe
Zbigniew Jerzak
 

Viewers also liked (20)

Accelerating Your Web Application with NGINX
Accelerating Your Web Application with NGINXAccelerating Your Web Application with NGINX
Accelerating Your Web Application with NGINX
 
Cositutti Chi Siamo
Cositutti   Chi SiamoCositutti   Chi Siamo
Cositutti Chi Siamo
 
PowerPoint Training - The power of visuals
PowerPoint Training - The power of visualsPowerPoint Training - The power of visuals
PowerPoint Training - The power of visuals
 
IPR Enforcement in India through Criminal Measures - By Vijay Pal Dalmia
IPR Enforcement in India through Criminal Measures - By Vijay Pal DalmiaIPR Enforcement in India through Criminal Measures - By Vijay Pal Dalmia
IPR Enforcement in India through Criminal Measures - By Vijay Pal Dalmia
 
The slumber buddy
The slumber buddyThe slumber buddy
The slumber buddy
 
Cambodian Dinner Night 15/11/08
Cambodian Dinner Night 15/11/08Cambodian Dinner Night 15/11/08
Cambodian Dinner Night 15/11/08
 
Integration
IntegrationIntegration
Integration
 
Imp Act Presentation
Imp Act PresentationImp Act Presentation
Imp Act Presentation
 
Community Engagement and Capacity Building Cultural Planning
Community Engagementand Capacity Building Cultural PlanningCommunity Engagementand Capacity Building Cultural Planning
Community Engagement and Capacity Building Cultural Planning
 
Improve Your Health
Improve Your HealthImprove Your Health
Improve Your Health
 
Чести проблеми в сигурността на уеб проектите
Чести проблеми в сигурността на уеб проектитеЧести проблеми в сигурността на уеб проектите
Чести проблеми в сигурността на уеб проектите
 
20 начина да си убиеш блога, без да се усетиш
20 начина да си убиеш блога, без да се усетиш20 начина да си убиеш блога, без да се усетиш
20 начина да си убиеш блога, без да се усетиш
 
Mukul's Wedding Invitation
Mukul's Wedding InvitationMukul's Wedding Invitation
Mukul's Wedding Invitation
 
WordPress Security @ Vienna WordPress + Drupal Meetup
WordPress Security @ Vienna WordPress + Drupal MeetupWordPress Security @ Vienna WordPress + Drupal Meetup
WordPress Security @ Vienna WordPress + Drupal Meetup
 
Guide for de mystifying law of trade mark litigation in India
Guide for de mystifying law of trade mark litigation in IndiaGuide for de mystifying law of trade mark litigation in India
Guide for de mystifying law of trade mark litigation in India
 
Cultural Asset Mapping in Niagara
Cultural Asset Mapping in NiagaraCultural Asset Mapping in Niagara
Cultural Asset Mapping in Niagara
 
Jim Crotty Photography Of Summer 2006
Jim Crotty Photography Of Summer 2006Jim Crotty Photography Of Summer 2006
Jim Crotty Photography Of Summer 2006
 
Cutting tool
Cutting toolCutting tool
Cutting tool
 
Prefix Forwarding for Publish/Subscribe
Prefix Forwarding for Publish/SubscribePrefix Forwarding for Publish/Subscribe
Prefix Forwarding for Publish/Subscribe
 
Go &amp; microservices
Go &amp; microservicesGo &amp; microservices
Go &amp; microservices
 

Similar to Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine

ddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptxddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptx
ssuser498be2
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
Apache Apex
 

Similar to Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine (20)

Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
ddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptxddsf-student-presentation_756205.pptx
ddsf-student-presentation_756205.pptx
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Exascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate AnalyticsExascale Deep Learning for Climate Analytics
Exascale Deep Learning for Climate Analytics
 
When a FILTER makes the di fference in continuously answering SPARQL queries ...
When a FILTER makes the difference in continuously answering SPARQL queries ...When a FILTER makes the difference in continuously answering SPARQL queries ...
When a FILTER makes the di fference in continuously answering SPARQL queries ...
 
Cassandra in xPatterns
Cassandra in xPatternsCassandra in xPatterns
Cassandra in xPatterns
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Building Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and KafkaBuilding Data Streaming Platforms using OpenShift and Kafka
Building Data Streaming Platforms using OpenShift and Kafka
 
Accelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet ProcessingAccelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet Processing
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
HPC Controls Future
HPC Controls FutureHPC Controls Future
HPC Controls Future
 
Stream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache ApexStream data from Apache Kafka for processing with Apache Apex
Stream data from Apache Kafka for processing with Apache Apex
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
 

More from Zbigniew Jerzak

Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
Zbigniew Jerzak
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data Aggregation
Zbigniew Jerzak
 
ThesisXSiena: The Content-Based Publish/Subscribe System
ThesisXSiena: The Content-Based Publish/Subscribe SystemThesisXSiena: The Content-Based Publish/Subscribe System
ThesisXSiena: The Content-Based Publish/Subscribe System
Zbigniew Jerzak
 
Highly Available Publish/Subscribe
Highly Available Publish/SubscribeHighly Available Publish/Subscribe
Highly Available Publish/Subscribe
Zbigniew Jerzak
 

More from Zbigniew Jerzak (14)

Adaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream ProcessingAdaptive Replication for Elastic Data Stream Processing
Adaptive Replication for Elastic Data Stream Processing
 
Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...Optimization of Continuous Queries in Federated Database and Stream Processin...
Optimization of Continuous Queries in Federated Database and Stream Processin...
 
Visualization-Driven Data Aggregation
Visualization-Driven Data AggregationVisualization-Driven Data Aggregation
Visualization-Driven Data Aggregation
 
Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing SystemsLatency-aware Elastic Scaling for Distributed Data Stream Processing Systems
Latency-aware Elastic Scaling for Distributed Data Stream Processing Systems
 
Auto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream ProcessingAuto-scaling Techniques for Elastic Data Stream Processing
Auto-scaling Techniques for Elastic Data Stream Processing
 
Cloud-based Data Stream Processing
Cloud-based Data Stream ProcessingCloud-based Data Stream Processing
Cloud-based Data Stream Processing
 
ThesisXSiena: The Content-Based Publish/Subscribe System
ThesisXSiena: The Content-Based Publish/Subscribe SystemThesisXSiena: The Content-Based Publish/Subscribe System
ThesisXSiena: The Content-Based Publish/Subscribe System
 
Clock Synchronization in Distributed Systems
Clock Synchronization in Distributed SystemsClock Synchronization in Distributed Systems
Clock Synchronization in Distributed Systems
 
XSiena: The Content-Based Publish/Subscribe System
XSiena: The Content-Based Publish/Subscribe SystemXSiena: The Content-Based Publish/Subscribe System
XSiena: The Content-Based Publish/Subscribe System
 
Soft State in Publish/Subscribe
Soft State in Publish/SubscribeSoft State in Publish/Subscribe
Soft State in Publish/Subscribe
 
Highly Available Publish/Subscribe
Highly Available Publish/SubscribeHighly Available Publish/Subscribe
Highly Available Publish/Subscribe
 
Fail-Aware Publish/Subscribe
Fail-Aware Publish/SubscribeFail-Aware Publish/Subscribe
Fail-Aware Publish/Subscribe
 
Bloom Filter Based Routing for Content-Based Publish/Subscribe
Bloom Filter Based Routing for Content-Based Publish/SubscribeBloom Filter Based Routing for Content-Based Publish/Subscribe
Bloom Filter Based Routing for Content-Based Publish/Subscribe
 
Adaptive Internal Clock Synchronization
Adaptive Internal Clock SynchronizationAdaptive Internal Clock Synchronization
Adaptive Internal Clock Synchronization
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 

Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine

  • 1. Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine Raphaël Barazzutti, Thomas Heinze, André Martin, Emanuel Onica, Pascal Felber, Christof Fetzer, Zbigniew Jerzak, Marcelo Pasin and Etienne Rivière University of Neuchâtel, Switzerland SAP AG, Germany TU Dresden, Germany ICDCS 2014, Madrid, Spain etienne.riviere@unine.ch Sunday 6 July 14
  • 2. ICDCS 2014 - Etienne Rivière Context • Publish/Subscribe as a service • Running on private or public clouds • Decoupled communication • Composition of applications from multiple domains • Content-based filtering • Subscriptions filter on the content of publications • Canonical example: stock quotes filtering 2 Publisher Subscriber Pub/Sub as a Service Adm. domain A Adm. domain B Adm. domain C PublisherSubscriber sp notify(pub) p s register(sub) pregister(pub) s name = "IBM" price > 120$ volume > 10,000 p name = "IBM" price = 131$ volume = 12,312 p name = "IBM" price = 112$ volume = 9,892 open = 109$ close = 113 $ Sunday 6 July 14
  • 3. ICDCS 2014 - Etienne Rivière Pub/sub as a service: requirements • Arbitrary representation of publications and subscriptions • Not limited to attribute-based filtering • Untrusted domains: encrypted filtering • High-throughput and scalability • Thousands subscriptions per second • Thousands publications per second • Thousands to millions notifications per second • Availability, dependability and low delays • StreamHub [DEBS 2013] • Supports arbitrary filtering scheme, in particular encrypted filtering (ASPE) • Built on top of a Stream Processing Engine 3 Publisher Subscriber P/Sa.a.S. Trusted domain Trusted domain Subscriber ? p s ? encryption encryption ? Untrusted domain ? p Sunday 6 July 14
  • 4. ICDCS 2014 - Etienne Rivière Problem: resource provisioning • Pub/sub service resource usage is unpredictable and varies over time • Example: Frankfurt stock exchange, Nov 18, 2011 • Throughput/delay requirements vs budget require appropriate provisioning • This talk: can we make a high-throughput pub/sub service elastic? • Scale in and scale out based on actual requirements • Maintain service continuity and minimize reconfiguration impact 4 0 200 400 600 800 1000 1200 09:00 12:00 15:00 18:00 21:00 Tickspersecond Time (hours) Sunday 6 July 14
  • 5. ICDCS 2014 - Etienne Rivière Outline • Background • StreamHub high-throughput content-based pub/sub engine • StreamMine3G stream processing engine • (elastic)e-StreamHub principles • Components • Slice migration • Enforcing elasticity policies • Evaluation • Micro-benchmarks • Trace-based using Frankfurt stock exchange workload 5 Sunday 6 July 14
  • 6. ICDCS 2014 - Etienne Rivière StreamHub: principles • Tiered approach to pub/sub for cloud deployments • Split pub/sub into three fundamental, consecutive operations • Exploit massive data parallelism of each operation • Supports arbitrary filtering mechanism • Event flows do not depend on a filtering scheme characteristics • This work: use computationally intensive encrypted filtering • StreamHub engine = Stream Processing application • Each of the 3 pub/sub operations mapped to an stream operator • Operators supported by a Stream Processing Engine 6 Sunday 6 July 14
  • 7. ICDCS 2014 - Etienne Rivière StreamMine3G stream processing engine • Distributed stream computation as a DAG of operators • Each operator split in multiple slices • Externalized state management, no state sharing between slices • Support for unicast, anycast & broadcast primitives 7 slice 1 slice 2 slice n-1 slice n ... Operator 1 Operator 2 Operator 3 ? Broadcast Anycast Unicast slice 3 ... Operator 4 Each slice supported by multiple cores <key,value> e <k,v> (k%2=1) e Slice state management No shared memory e e Sunday 6 July 14
  • 8. ICDCS 2014 - Etienne Rivière StreamHub Engine [DEBS13] 8 Operator Access Point (AP) Matching (M) Exit Point (EP) Function Subscriptions Partitioning Publications Filtering Publications Dispatching State Stateless Stateful (persistent) Stateful (transient) Role ➡ Decide where to store subs ➡ Broadcast pubs to next operator ➡ Store subs ➡ Filter incoming pubs, create list of matching subscribers ids ➡ Aggregate lists of matching subscribers ids ➡ Prepare & dispatch notifications Subscriber SUB x > 3 & y == 5 encrypts SUB C5F80 BA363 Publisher PUB x = 7 y = 10 encrypts PUB 88F3B 2A09C DCCP/ConnectionPoint stores sends notifications of matching encrypted publications AP:1 AP operator M operator p s Broadcast (pubs) Unicast (subs) AP:2 AP:3 AP:4 Unicast Multicast AP:5 AP:6 Anycast ? EP operator M:1 M:2 M:3 M:4 M:5 M:6 EP:1 EP:2 EP:3 EP:4 EP:5 EP:6 DCCP/ConnectionPoint e-StreamHub engine ASPE Public Cloud deployment (same DCCP) Sunday 6 July 14
  • 9. ICDCS 2014 - Etienne Rivière e(elastic)-StreamHub principle • Fixed number of slices for each operator • Elastic scaling decisions based on experienced load • Scale out: allocate new host(s), migrate slices • Scale in: migrate slices, deallocate host(s) • Redistribute slices upon load unbalance between hosts 9 EP:1 EP:2 EP:3 EP:4 EP:5 EP:6 AP:1 AP:2 AP:3 AP:4 AP:5 AP:6 M:1 M:2 M:3 M:4 M:5 M:6 Host 1 Host 1 Host 2 Host 3 EP:1 EP:3 EP:6 AP:1 M:1 M:2 EP:2 AP:5 AP:6 M:3 M:4 EP:4 EP:5 AP:3 AP:4 M:5 M:6 AP:2 scale out Host 1 Host 2 EP:1 EP:3 EP:6 AP:1 M:1 M:2 EP:2 AP:5 AP:6 M:3 M:4 AP:2 EP:4 EP:5AP:3 AP:4 M:5 M:6 scale in initial single host placement Sunday 6 July 14
  • 10. ICDCS 2014 - Etienne Rivière e-StreamHub: components • StreamMine3G: support for slice migration (including state) • Goal: minimal interruption of flow • Manager: orchestrates migration and collect workload probes • SM3G and manager state stored in ZooKeeper for dependability • Elasticity enforcer: takes migration decisions based on observed load • Based on elasticity policies 10 ZooKeeper SM3G Host 1 AP:1 M:1 EP:1 SM3G Host 2 AP:2 M:2 AP:3 SM3G Host 3 M:3 EP:2 EP:3 M:3 elasticity policies Elasticity enforcer Manager aggregated probes migration requests collects probes, orchestrates migrations configures Sunday 6 July 14
  • 11. ICDCS 2014 - Etienne Rivière Slice migration 1. Input events duplicated to a buffer on dest host 2. Slice halted; state is copied from src to dest host • Incoming events are still buffered • States associated with vectors of sequence numbers, one for each of previous operator slices • Allows knowing which events were accounted in 3. Missed events replayed on state, slice resumed ➡ “Downtime” of the slice is essentially equivalent to state copying time 11 OP:1 SM3G runtime Host 1 OP:1 SM3G runtime Host 1 (inactive) SM3G runtime Host 2 buffer (inactive) SM3G runtime Host 1 (inactive) SM3G runtime Host 2 buffer (inactive) SM3G runtime Host 1 SM3G runtime Host 2 OP:1 SM3G runtime Host 2 OP:1 events state t t t t+n buffer n 1 2 3 4 5 Sunday 6 July 14
  • 12. ICDCS 2014 - Etienne Rivière Elasticity policy • CPU utilization probes collected periodically • Secondary metrics: network usage and memory usage • Global rules: criteria on the average load • Trigger scale-in and scale-out operations • Define a target average CPU utilization • Local rules: criteria on the load of a single host • Grace period between migrations (30s) • Minimize cost of migrations (sum of migrated states) 12 Rules CPU Effect Global average > 70% scale out Global average < 30% scale in Local < 20% or > 80% redistribute slices Target average CPU = 50%average CPU = 50% Sunday 6 July 14
  • 13. ICDCS 2014 - Etienne Rivière Elasticity policy enforcement • Three-steps resolution when global or local rule is violated 1. Decide on the set of slices to migrate • For each host, identify a set of slices with: sum(CPU utilization) ≥ abs(current average utilization - target utilization) • Subset sum problem: dynamic programming, pseudo-polynomial complexity • Returns multiple solutions: select the one with less state transfer involved 2. Scaling out: add enough hosts for avg(load) to be ≤ target (50%) Scaling in: mark enough least-loaded hosts for avg(load) to be ≥ target (50%) 3. Decide on new placement • First-fit bin packing algorithm • Start with current state without selected slices, greedily assign by decreasing weight 13 Host 1 Host 2 12%AP:1 11%AP:2 25%M:1 25%M:2 26%M:3 23%M:4 13%EP:1 12%EP:2 Host 1 Host 2 25%M:1 25%M:2 26%M:3 23%M:4 Host 3 12%AP:1 11%AP:2 13%EP:1 12%EP:2 before scale out after scale out Sunday 6 July 14
  • 14. ICDCS 2014 - Etienne Rivière Evaluation 14 Sunday 6 July 14
  • 15. ICDCS 2014 - Etienne Rivière Experimental setup • Prototype in C, C++ and Java • 22 hosts, each with 8 cores and 8 GB • 1 Gbps switched network • 4 hosts with G and S: generators and sinks • 3 hosts for infrastructure • 1 to 15 hosts used for e-StreamHub • 8 to 120 cores • Number of slices is fixed • micro-benchmarks: 4 AP, 8 M, 4 EP • trace-based evaluation: 8 AP, 16 M, 8 EP • Encrypted pubs and subs (ASPE) • 100,000 encrypted subs • Each sub is ~1.2 KB • 1% matching ratio: 1,000 notifications / pub 15 EP:1 EP:2 EP:3 EP:4 EP:5 EP:6 AP:1 AP:2 AP:3 AP:4 AP:5 AP:6 M:1 M:2 M:3 M:4 M:5 M:6 Host 1/15 M:7 M:8 M:9 M:10 M:11 M:12 AP:7 AP:8 EP:7 EP:8 M:13 M:14 M:15 M:16 Host E ZooKeeper Host F Manager Host G Elasticity enforcer Host D G:4 S:4 Host C G:3 S:3 Host B G:2 S:2 Host A G:1 S:1 (unused) Host 2/15 (unused) Host 15/15 Sunday 6 July 14
  • 16. ICDCS 2014 - Etienne Rivière Static e-StreamHub: Performance 16 Delays(ms) Number of hosts and attribution to operators (AP|M|EP) Distribution of delays under half of max. throughput Max 75 th 50 th 25 th Min 0 100 200 300 400 500 2:0.5|1|0.5 4: 1|2|1 6:1.5|3|1.5 8: 2|4|2 10:2.5|5|2.5 12: 3|6|3 Publications/s Maximum throughput 0 100 200 300 400 500 Per second: 42.2 million matchings 422,000 notifications Median delay 12 hosts: 3 hosts for AP slices 6 hosts for M slices 3 hosts for EP slices Sunday 6 July 14
  • 17. ICDCS 2014 - Etienne Rivière Slice migration: times • Individual migration times, averaged over 25 migrations • Under constant flow of 100 publications / second • Worst-case scenario with “large” slices states • 4 AP, 8 M and 4 EP with 100K and 400K subscriptions • Migration time minimal for AP and EP, dominated by state size for M 17 AP M (12.500 subscriptions, total of 100,000) M (50.000 subscriptions, total of 400,000) EP average 232 ms 1.497 s 2.533 s 275 ms standard deviation 31 ms 354 ms 1.557 s 52 ms Sunday 6 July 14
  • 18. ICDCS 2014 - Etienne Rivière Impact of migrations on delay 18 0.0 0.5 1.0 1.5 2.0 2.5 100 120 140 160 180 200 delay(seconds) time (seconds) migrations: AP (1) AP (2) M (1) M (2) EP min average max Sunday 6 July 14
  • 19. ICDCS 2014 - Etienne Rivière Steady increase/decrease of load 19 • Increase/decrease of publication rate • 0 to 350 publications per second • 0 to 35 millions matching operations per second • CPU load • average, max and min over 30 seconds Sunday 6 July 14
  • 20. ICDCS 2014 - Etienne Rivière Steady increase/decrease of load 19 • Increase/decrease of publication rate • 0 to 350 publications per second • 0 to 35 millions matching operations per second • CPU load • average, max and min over 30 seconds publication rate (publications / second) 0 50 100 150 200 250 300 350 400 hosts 0 2 4 6 8 10 12 14 16 18 Time (seconds) host CPU load: min avg max 0 20 40 60 80 100 0 300 600 900 1200 1500 1800 Sunday 6 July 14
  • 21. ICDCS 2014 - Etienne Rivière Steady increase/decrease of load 19 • Increase/decrease of publication rate • 0 to 350 publications per second • 0 to 35 millions matching operations per second • CPU load • average, max and min over 30 seconds publication rate (publications / second) 0 50 100 150 200 250 300 350 400 hosts 0 2 4 6 8 10 12 14 16 18 Time (seconds) average delays (seconds) 0 0.5 1 1.5 2 2.5 3 3.5 4 0 300 600 900 1200 1500 1800 • Notification delays • average over 30 seconds Sunday 6 July 14
  • 22. ICDCS 2014 - Etienne Rivière Frankfurt stock exchange 20 publication rate (publications / second) 0 50 100 150 200 hosts 0 2 4 6 8 10 Time (seconds) host CPU load: min avg max 0 20 40 60 80 100 0 500 1000 1500 2000 2500 • Replay of the Frankfurt stock exchange workload • Sped up 10 times: about 40 minutes shown • 100,000 encrypted subscriptions • CPU load • average, max and min over 30 seconds Sunday 6 July 14
  • 23. ICDCS 2014 - Etienne Rivière Frankfurt stock exchange 20 publication rate (publications / second) 0 50 100 150 200 hosts 0 2 4 6 8 10 Time (seconds) average delays (seconds) 0 0.5 1 1.5 2 0 500 1000 1500 2000 2500 • Replay of the Frankfurt stock exchange workload • Sped up 10 times: about 40 minutes shown • 100,000 encrypted subscriptions • CPU load • average, max and min over 30 seconds • Notification delays • average over 30 seconds Sunday 6 July 14
  • 24. ICDCS 2014 - Etienne Rivière Conclusion • Elastic scaling of a high-throughput content-based pub/sub engine • Implemented at the level of the support stream processing engine • Applicability to other long-lived stream processing applications • Support of live operator slice migration • Redistribution of slices according to monitored workload • Allows deploying pub/sub as a service on a public cloud without the need to provision for worst-case scenarios • Evaluation using Frankfurt stock exchange traces • Perspectives • Monitor network flows, minimize inter-host communications in migration plans • Leverage active replication, used for dependability, to minimize impact on delay while migrating (this requires deterministic execution) 21 Sunday 6 July 14
  • 25. Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine Questions? This work was supported by the EU FP7 program (SRT-15 project) http://www.srt-15.eu ICDCS 2014, Madrid, Spain etienne.riviere@unine.ch Sunday 6 July 14
  • 26. ICDCS 2014 - Etienne Rivière Additional slides 23 Sunday 6 July 14
  • 27. ICDCS 2014 - Etienne Rivière Independence from Filtering Scheme • Attribute-based filtering widely studied • Represent content using discrete attributes • Subscriptions = conjunctions of discrete predicates on attributes values • Broker overlays typically rely on containment & aggregation capabilities of attribute-based filtering • Alternative filtering schemes • Encrypted filtering • ASPE (Choi et al., DEXA10) • Prefiltering (DEBS 2012) ➡ No guaranteed support for containment or aggregation • The architecture of a pub/sub engine should be independent from the filtering scheme(s) it supports 24 Sunday 6 July 14
  • 28. ICDCS 2014 - Etienne Rivière Independence from Filtering Scheme • Attribute-based filtering widely studied • Represent content using discrete attributes • Subscriptions = conjunctions of discrete predicates on attributes values • Broker overlays typically rely on containment & aggregation capabilities of attribute-based filtering • Alternative filtering schemes • Encrypted filtering • ASPE (Choi et al., DEXA10) • Prefiltering (DEBS 2012) ➡ No guaranteed support for containment or aggregation • The architecture of a pub/sub engine should be independent from the filtering scheme(s) it supports 24 Publisher Subscriber P/Sa.a.S. Trusted domain Trusted domain Subscriber p s Trusted (?) domain p s Sunday 6 July 14
  • 29. ICDCS 2014 - Etienne Rivière Independence from Filtering Scheme • Attribute-based filtering widely studied • Represent content using discrete attributes • Subscriptions = conjunctions of discrete predicates on attributes values • Broker overlays typically rely on containment & aggregation capabilities of attribute-based filtering • Alternative filtering schemes • Encrypted filtering • ASPE (Choi et al., DEXA10) • Prefiltering (DEBS 2012) ➡ No guaranteed support for containment or aggregation • The architecture of a pub/sub engine should be independent from the filtering scheme(s) it supports 24 Publisher Subscriber P/Sa.a.S. Trusted domain Trusted domain Subscriber p s Trusted (?) domain p s Sunday 6 July 14
  • 30. ICDCS 2014 - Etienne Rivière Independence from Filtering Scheme • Attribute-based filtering widely studied • Represent content using discrete attributes • Subscriptions = conjunctions of discrete predicates on attributes values • Broker overlays typically rely on containment & aggregation capabilities of attribute-based filtering • Alternative filtering schemes • Encrypted filtering • ASPE (Choi et al., DEXA10) • Prefiltering (DEBS 2012) ➡ No guaranteed support for containment or aggregation • The architecture of a pub/sub engine should be independent from the filtering scheme(s) it supports 24 Publisher Subscriber P/Sa.a.S. Trusted domain Trusted domain Subscriber ? p s ? encryption encryption ? Untrusted domain ? p Sunday 6 July 14
  • 31. ICDCS 2014 - Etienne Rivière Events Paths and Support Libraries 25 libcluster Subscriber s TCP DCCP AP Anycast M Determine key to matching operator using clustering Store subscription in operator slice state libfilter Optional lib Library State O Operator slice StreamHub client C Static component M Unicast Sunday 6 July 14
  • 32. ICDCS 2014 - Etienne Rivière Events Paths and Support Libraries 25 libcluster Publisher Subscriber s TCP DCCP AP Anycast M p TCP DCCP AP Anycast M Determine key to matching operator using clustering Store subscription in operator slice state libfilter Filter publication, return matching subscribers list Broadcast M EP EP Unicast DCCP DCCP Multicast DCCP Subscriber p TCP p p ... Optional p,{s} Publication + subscribers lib Library State O Operator slice StreamHub client C Static component M Unicast Sunday 6 July 14
  • 33. ICDCS 2014 - Etienne Rivière StreamHub interface • Data Conversion and Connection Point(s) (DCCP) • Stateless component(s) running on server(s) with direct WAN connectivity • Maintain persistent TCP connections to/from publishers and subscribers • Convert between external/internal format 26 Publisher Subscriber StreamHub engine p s Data converter & connection point p DCCP p sPersistent TCP connections GPB format Internal format p Sunday 6 July 14
  • 34. ICDCS 2014 - Etienne Rivière Use an Overlay of Brokers? • Brokers organized in an overlay (mesh, tree) ; each broker performs all pub/sub operations • Storing subscriptions; matching, forwarding, notifying of publications • Complex maintenance of routing tables between brokers • Assumptions on the filtering scheme for inter-broker communication and publication forwarding: containment & aggregation • Scalability/throughput depend on workload, notification delays may vary 27 Publisher Subscriber s s s s Subscriber Subscriber Subscriber s s Publisher Publisher Broker Broker Broker s RT updates RT updates Sunday 6 July 14
  • 35. ICDCS 2014 - Etienne Rivière Use an Overlay of Brokers? • Brokers organized in an overlay (mesh, tree) ; each broker performs all pub/sub operations • Storing subscriptions; matching, forwarding, notifying of publications • Complex maintenance of routing tables between brokers • Assumptions on the filtering scheme for inter-broker communication and publication forwarding: containment & aggregation • Scalability/throughput depend on workload, notification delays may vary 27 Publisher Subscriber p s s s Subscriber Subscriber SubscriberPublisher Publisher p p Broker Broker Broker p p Sunday 6 July 14