Publish/subscribe (pub/sub) infrastructures, running as a service in cloud environments, offer simplicity and flexibility for composing distributed applications. Provisioning them appropriately is, however, challenging. The amount of stored subscriptions and incoming publications varies over time, and the computational cost depends on the nature of the applications and in particular on the filtering operations they require, e.g., content-based vs. topic-based, encrypted vs. non-encrypted filtering. The ability to elastically adapt the amount of resources required to sustain given throughput and delay requirements is key to achieving cost-effectiveness for a pub/sub service running in a cloud environment. In this paper, we present the design and evaluation of an elastic content-based pub/sub system: eStreamHub. Specific contributions of this paper include: (1) a mechanism for dynamic scaling, both in and out, of stateful and stateless pub/sub operators, (2) a local and global elasticity policy enforcer maintaining high system utilization and stable end-to-end latencies, and (3) an evaluation using real-world tick workload from the Frankfurt Stock Exchange and encrypted content-based filtering.
Elastic Scaling of a High-Throughput Content-Based Publish/Subscribe Engine
1. Elastic Scaling of a High-Throughput
Content-Based Publish/Subscribe Engine
Raphaël Barazzutti, Thomas Heinze, André Martin,
Emanuel Onica, Pascal Felber, Christof Fetzer,
Zbigniew Jerzak, Marcelo Pasin and Etienne Rivière
University of Neuchâtel, Switzerland
SAP AG, Germany
TU Dresden, Germany
ICDCS 2014, Madrid, Spain
etienne.riviere@unine.ch
Sunday 6 July 14
2. ICDCS 2014 - Etienne Rivière
Context
• Publish/Subscribe as a service
• Running on private or public clouds
• Decoupled communication
• Composition of applications from multiple domains
• Content-based filtering
• Subscriptions filter on the content of publications
• Canonical example: stock quotes filtering
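The stock quote example can be sketched in a few lines. This is a minimal illustration, not the engine's filtering library: the `Predicate` class, the `matches` function and the restriction to three comparison operators are assumptions made for this sketch; the attribute values are those of the example subscription and publications on this slide.

```python
from dataclasses import dataclass
import operator

# Comparison operators supported by this sketch (an assumption, not the engine's set).
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


@dataclass(frozen=True)
class Predicate:
    attribute: str
    op: str
    value: object


def matches(subscription, publication):
    """Return True if every predicate of the subscription holds on the publication."""
    for pred in subscription:
        if pred.attribute not in publication:
            return False
        if not OPS[pred.op](publication[pred.attribute], pred.value):
            return False
    return True


# Subscription s of the slide: name = "IBM", price > 120$, volume > 10,000.
subscription = [
    Predicate("name", "=", "IBM"),
    Predicate("price", ">", 120),
    Predicate("volume", ">", 10_000),
]

# The two publications p of the slide.
pub_a = {"name": "IBM", "price": 131, "volume": 12_312}
pub_b = {"name": "IBM", "price": 112, "volume": 9_892, "open": 109, "close": 113}

print(matches(subscription, pub_a))  # True: all three predicates hold
print(matches(subscription, pub_b))  # False: price and volume are too low
```

Only the first publication would trigger a notification for this subscriber.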
[Figure: Pub/Sub as a Service connecting publishers and subscribers across administrative domains A, B and C. Publishers call register(pub); subscribers call register(sub) and receive notify(pub). Example subscription s: name = "IBM", price > 120$, volume > 10,000. Example publications p: (name = "IBM", price = 131$, volume = 12,312) and (name = "IBM", price = 112$, volume = 9,892, open = 109$, close = 113$)]
3. ICDCS 2014 - Etienne Rivière
Pub/sub as a service: requirements
• Arbitrary representation of publications and subscriptions
• Not limited to attribute-based filtering
• Untrusted domains: encrypted filtering
• High-throughput and scalability
• Thousands of subscriptions per second
• Thousands of publications per second
• Thousands to millions of notifications per second
• Availability, dependability and low delays
• StreamHub [DEBS 2013]
• Supports arbitrary filtering scheme, in particular encrypted filtering (ASPE)
• Built on top of a Stream Processing Engine
[Figure: the publisher and the subscribers, each in a trusted domain, encrypt publications (p) and subscriptions (s) before sending them to the Pub/Sub as a Service, which runs in an untrusted domain]
4. ICDCS 2014 - Etienne Rivière
Problem: resource provisioning
• Pub/sub service resource usage is unpredictable and varies over time
• Example: Frankfurt stock exchange, Nov 18, 2011
• Throughput/delay requirements vs budget require appropriate provisioning
• This talk: can we make a high-throughput pub/sub service elastic?
• Scale in and scale out based on actual requirements
• Maintain service continuity and minimize reconfiguration impact
[Figure: ticks per second (0 to 1,200) over the time of day (09:00 to 21:00), Frankfurt stock exchange, Nov 18, 2011]
6. ICDCS 2014 - Etienne Rivière
StreamHub: principles
• Tiered approach to pub/sub for cloud deployments
• Split pub/sub into three fundamental, consecutive operations
• Exploit massive data parallelism of each operation
• Supports arbitrary filtering mechanism
• Event flows do not depend on the characteristics of a filtering scheme
• This work: use computationally intensive encrypted filtering
• StreamHub engine = Stream Processing application
• Each of the 3 pub/sub operations mapped to a stream operator
• Operators supported by a Stream Processing Engine
7. ICDCS 2014 - Etienne Rivière
StreamMine3G stream processing engine
• Distributed stream computation as a DAG of operators
• Each operator split in multiple slices
• Externalized state management, no state sharing between slices
• Support for unicast, anycast & broadcast primitives
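The three communication primitives can be illustrated as functions that pick the destination slices of an event. This is a sketch under assumptions and not the StreamMine3G API: the function names are hypothetical, and selecting the unicast slice by hashing the event key modulo the number of slices is an assumed partitioning rule.

```python
import random
import zlib


def unicast(key, slices):
    """Route to exactly one slice, chosen deterministically from the event key."""
    index = zlib.crc32(key.encode("utf-8")) % len(slices)
    return [slices[index]]


def anycast(slices, rng=random):
    """Route to any single slice (here: picked at random)."""
    return [rng.choice(slices)]


def broadcast(slices):
    """Route to every slice of the next operator."""
    return list(slices)


slices = ["slice 1", "slice 2", "slice 3", "slice 4"]

print(unicast("IBM", slices))   # always the same single slice for key "IBM"
print(anycast(slices))          # one slice, no guarantee on which
print(broadcast(slices))        # all four slices
```

Because slices share no state, unicast by key is what lets a stateful operator keep all events for a given key in one place.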
[Figure: a DAG of four operators, each split into slices (slice 1 to slice n), connected by broadcast, anycast and unicast links. Events (e) carry <key,value> pairs, e.g. a slice receiving the events <k,v> with k%2=1. Each slice is supported by multiple cores; slice state management, no shared memory]
8. ICDCS 2014 - Etienne Rivière
StreamHub Engine [DEBS13]
Access Point (AP): subscriptions partitioning; stateless
➡ Decide where to store subs
➡ Broadcast pubs to next operator
Matching (M): publications filtering; stateful (persistent)
➡ Store subs
➡ Filter incoming pubs, create list of matching subscriber ids
Exit Point (EP): publications dispatching; stateful (transient)
➡ Aggregate lists of matching subscriber ids
➡ Prepare & dispatch notifications
[Figure: a subscriber encrypts a subscription (x > 3 & y == 5) and a publisher encrypts a publication (x = 7, y = 10); both reach the e-StreamHub engine (public cloud deployment) through a DCCP/ConnectionPoint. The engine stores the encrypted subscriptions (ASPE) and sends notifications of matching encrypted publications through the same DCCP. Inside the engine: anycast to the AP slices (AP:1 to AP:6), unicast (subs) and broadcast (pubs) to the M slices (M:1 to M:6), unicast to the EP slices (EP:1 to EP:6), multicast to the DCCP]
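The division of work between the three operators can be sketched as a toy, single-process pipeline. Everything here is illustrative: `MatchingSlice`, the function names and the use of `sub_id % NUM_M_SLICES` to choose the M slice are assumptions for this sketch (the engine relies on its own partitioning and on encrypted filtering), and plain Python predicates stand in for encrypted subscriptions.

```python
NUM_M_SLICES = 4  # hypothetical number of M slices for this sketch


class MatchingSlice:
    """M operator slice: stores subscriptions and filters publications."""

    def __init__(self):
        self.subscriptions = {}  # subscriber id -> predicate function

    def store(self, sub_id, predicate):
        self.subscriptions[sub_id] = predicate

    def filter(self, publication):
        # Partial list of matching subscriber ids held by this slice.
        return [sid for sid, pred in self.subscriptions.items() if pred(publication)]


m_slices = [MatchingSlice() for _ in range(NUM_M_SLICES)]


def ep_dispatch(publication, partial_lists):
    """EP: aggregate the partial lists and prepare one notification per subscriber."""
    subscribers = sorted(sid for partial in partial_lists for sid in partial)
    return [(sid, publication) for sid in subscribers]


def ap_handle_subscription(sub_id, predicate):
    """AP: decide where to store the subscription, then unicast it to that M slice."""
    target = sub_id % NUM_M_SLICES  # assumed placement rule
    m_slices[target].store(sub_id, predicate)


def ap_handle_publication(publication):
    """AP: broadcast the publication to all M slices; EP collects their answers."""
    partial_lists = [m.filter(publication) for m in m_slices]
    return ep_dispatch(publication, partial_lists)


# Eight subscribers with price thresholds 100, 110, ..., 170.
for sub_id in range(8):
    threshold = 100 + 10 * sub_id
    ap_handle_subscription(sub_id, lambda pub, t=threshold: pub["price"] > t)

notifications = ap_handle_publication({"name": "IBM", "price": 131})
print([sid for sid, _ in notifications])  # [0, 1, 2, 3]
```

Each subscription is stored in exactly one M slice, while each publication visits all of them, which is why the M operator carries most of the filtering cost.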
9. ICDCS 2014 - Etienne Rivière
e(elastic)-StreamHub principle
• Fixed number of slices for each operator
• Elastic scaling decisions based on experienced load
• Scale out: allocate new host(s), migrate slices
• Scale in: migrate slices, deallocate host(s)
• Redistribute slices upon load imbalance between hosts
[Figure: initial single host placement of slices AP:1 to AP:6, M:1 to M:6 and EP:1 to EP:6 on host 1; scale out redistributes the slices over hosts 1, 2 and 3; scale in consolidates them on hosts 1 and 2]
10. ICDCS 2014 - Etienne Rivière
e-StreamHub: components
• StreamMine3G: support for slice migration (including state)
• Goal: minimal interruption of flow
• Manager: orchestrates migrations and collects workload probes
• SM3G and manager state stored in ZooKeeper for dependability
• Elasticity enforcer: takes migration decisions based on observed load
• Based on elasticity policies
[Figure: hosts 1 to 3 each run an SM3G runtime hosting AP, M and EP slices. The manager collects probes and orchestrates migrations; the elasticity enforcer, configured with elasticity policies, receives aggregated probes from the manager and sends it migration requests; ZooKeeper stores SM3G and manager state]
11. ICDCS 2014 - Etienne Rivière
Slice migration
1. Input events duplicated to a buffer on dest host
2. Slice halted; state is copied from src to dest host
• Incoming events are still buffered
• State is associated with a vector of sequence numbers, one for each slice of the previous operator
• Allows knowing which events are already accounted for in the copied state
3. Missed events replayed on state, slice resumed
➡ “Downtime” of the slice is essentially equivalent to state copying time
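The three migration steps can be simulated in a single process to show why the sequence-number vector matters. This is a sketch under assumptions, not the StreamMine3G implementation: the `Slice` class, its toy per-key counter state, the `migrate` function and its `dup_from`/`halt_at` parameters are hypothetical, and events from one upstream slice are assumed to arrive in sequence order.

```python
import copy


class Slice:
    """Toy stateful slice: counts events per key."""

    def __init__(self, num_upstream):
        self.state = {}
        # Highest sequence number applied so far, one entry per upstream slice.
        self.applied = [0] * num_upstream

    def process(self, upstream, seq, key):
        if seq <= self.applied[upstream]:
            return  # event already accounted for in the state: skip it
        self.state[key] = self.state.get(key, 0) + 1
        self.applied[upstream] = seq


def migrate(src, events, dup_from, halt_at):
    """Migrate `src`; events[dup_from:] are duplicated, src halts before events[halt_at]."""
    # The source slice keeps processing until it is halted.
    for upstream, seq, key in events[:halt_at]:
        src.process(upstream, seq, key)
    # Step 1: input events are duplicated to a buffer on the destination host.
    buffer = list(events[dup_from:])
    # Step 2: slice halted; state and sequence-number vector copied to the destination.
    dest = Slice(len(src.applied))
    dest.state = copy.deepcopy(src.state)
    dest.applied = list(src.applied)
    # Step 3: buffered events are replayed; already-applied ones are skipped.
    for upstream, seq, key in buffer:
        dest.process(upstream, seq, key)
    return dest


# (upstream slice, sequence number, key) for two upstream slices.
events = [(0, 1, "a"), (1, 1, "b"), (0, 2, "a"), (1, 2, "c"), (0, 3, "b"), (1, 3, "a")]

reference = Slice(2)
for event in events:
    reference.process(*event)

migrated = migrate(Slice(2), events, dup_from=2, halt_at=4)
print(migrated.state == reference.state)  # True: no event lost or applied twice
```

Events 3 and 4 are both processed by the source and present in the buffer; the vector lets the destination skip them during replay.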
[Figure: migration of slice OP:1 in five steps (1 to 5). OP:1 runs in the SM3G runtime on host 1; an inactive copy with a buffer is created on host 2; the state at time t is copied while events from t to t+n are buffered; the n buffered events are replayed; OP:1 then runs on host 2]
12. ICDCS 2014 - Etienne Rivière
Elasticity policy
• CPU utilization probes collected periodically
• Secondary metrics: network usage and memory usage
• Global rules: criteria on the average load
• Trigger scale-in and scale-out operations
• Define a target average CPU utilization
• Local rules: criteria on the load of a single host
• Grace period between migrations (30s)
• Minimize cost of migrations (sum of migrated states)
Rules on CPU utilization and their effect:
• Global: average > 70% ➡ scale out
• Global: average < 30% ➡ scale in
• Local: < 20% or > 80% ➡ redistribute slices
• Target average CPU = 50%
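The rules of this slide can be written as a small decision function. This is a sketch, assuming that global rules take precedence over the local rule and that no decision is taken during the grace period; the function name `evaluate` and its return values are hypothetical, while the thresholds (70%, 30%, 20%, 80%) and the 30 s grace period come from the slide.

```python
SCALE_OUT_ABOVE = 70.0   # global rule: average CPU above this triggers scale out
SCALE_IN_BELOW = 30.0    # global rule: average CPU below this triggers scale in
LOCAL_LOW, LOCAL_HIGH = 20.0, 80.0  # local rule bounds for a single host
GRACE_PERIOD_S = 30.0    # grace period between migrations


def evaluate(host_cpu, now, last_migration):
    """Return 'wait', 'scale out', 'scale in', 'redistribute' or 'none'.

    host_cpu maps a host name to its CPU utilization in percent.
    """
    if now - last_migration < GRACE_PERIOD_S:
        return "wait"
    average = sum(host_cpu.values()) / len(host_cpu)
    # Global rules first (assumed precedence).
    if average > SCALE_OUT_ABOVE:
        return "scale out"
    if average < SCALE_IN_BELOW:
        return "scale in"
    # Local rule: any single host outside the [20%, 80%] band.
    if any(load < LOCAL_LOW or load > LOCAL_HIGH for load in host_cpu.values()):
        return "redistribute"
    return "none"


print(evaluate({"host1": 73.0, "host2": 74.0}, now=100.0, last_migration=0.0))  # scale out
print(evaluate({"host1": 85.0, "host2": 40.0}, now=100.0, last_migration=0.0))  # redistribute
print(evaluate({"host1": 73.0, "host2": 74.0}, now=100.0, last_migration=90.0))  # wait
```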
13. ICDCS 2014 - Etienne Rivière
Elasticity policy enforcement
• Three-step resolution when a global or local rule is violated
1. Decide on the set of slices to migrate
• For each host, identify a set of slices with:
sum(CPU utilization) ≥ abs(current average utilization - target utilization)
• Subset sum problem: dynamic programming, pseudo-polynomial complexity
• Returns multiple solutions: select the one involving the least state transfer
2. Scaling out: add enough hosts for avg(load) to be ≤ target (50%)
Scaling in: mark enough least-loaded hosts for avg(load) to be ≥ target (50%)
3. Decide on new placement
• First-fit bin packing algorithm
• Start with current state without selected slices, greedily assign by decreasing weight
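The three steps can be sketched end to end on a small scale-out case. This is an illustration under stated assumptions, not the enforcer's code: the function names, the state sizes (0 for AP, 15 for M, 1 for EP, in arbitrary units), the initial placement of slices on the two hosts, the use of the 50% target as bin capacity and the fallback to the least-loaded host are all hypothetical. Only the per-slice CPU percentages are taken from the scale-out figure of this slide, so the resulting placement need not coincide with that figure.

```python
import math

TARGET = 50  # target average CPU utilization (%), from the elasticity policy


def select_slices(slices, needed):
    """Step 1: subset-sum style dynamic programming over integer CPU percentages.

    slices maps a slice name to (cpu_percent, state_size). Returns the subset
    whose CPU sum is >= needed with the smallest total state size
    (ties broken by the smallest CPU sum).
    """
    best = {0: (0, ())}  # exact CPU sum -> (total state size, slice names)
    for name, (cpu, state) in slices.items():
        for total, (cost, names) in list(best.items()):
            reached = total + cpu
            if reached not in best or cost + state < best[reached][0]:
                best[reached] = (cost + state, names + (name,))
    feasible = [(cost, total, names) for total, (cost, names) in best.items()
                if total >= needed]
    return list(min(feasible)[2]) if feasible else []


def hosts_needed(total_load, target=TARGET):
    """Step 2 (scale out): number of hosts for avg(load) to be <= target."""
    return math.ceil(total_load / target)


def first_fit_decreasing(loads, to_place, capacity=TARGET):
    """Step 3: greedily assign slices by decreasing CPU weight to the first host that fits."""
    loads = dict(loads)
    placement = {}
    for name, cpu in sorted(to_place.items(), key=lambda item: -item[1]):
        host = next((h for h, load in loads.items() if load + cpu <= capacity), None)
        if host is None:  # assumed fallback: least-loaded host
            host = min(loads, key=loads.get)
        loads[host] += cpu
        placement[name] = host
    return placement, loads


host1 = {"AP:1": (12, 0), "AP:2": (11, 0), "M:1": (25, 15), "M:2": (25, 15)}
host2 = {"M:3": (26, 15), "M:4": (23, 15), "EP:1": (13, 1), "EP:2": (12, 1)}
total = sum(cpu for h in (host1, host2) for cpu, _ in h.values())  # 147
needed = abs(total / 2 - TARGET)                                   # 23.5

moved = select_slices(host1, needed) + select_slices(host2, needed)
n_hosts = hosts_needed(total)
remaining = {
    "host1": sum(cpu for n, (cpu, _) in host1.items() if n not in moved),
    "host2": sum(cpu for n, (cpu, _) in host2.items() if n not in moved),
    "host3": 0,
}
to_place = {n: {**host1, **host2}[n][0] for n in moved}
placement, loads = first_fit_decreasing(remaining, to_place)
print(moved, n_hosts, loads)
```

With these assumed inputs, the stateless or lightly stateful slices are preferred for migration whenever they free enough CPU, which is the point of minimizing the sum of migrated states.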
[Figure: before scale out, hosts 1 and 2 hold AP:1 (12%), AP:2 (11%), M:1 (25%), M:2 (25%), M:3 (26%), M:4 (23%), EP:1 (13%) and EP:2 (12%); after scale out, hosts 1 and 2 keep M:1 (25%), M:2 (25%), M:3 (26%) and M:4 (23%), and host 3 holds AP:1 (12%), AP:2 (11%), EP:1 (13%) and EP:2 (12%)]
15. ICDCS 2014 - Etienne Rivière
Experimental setup
• Prototype in C, C++ and Java
• 22 hosts, each with 8 cores and 8 GB
• 1 Gbps switched network
• 4 hosts with G and S: generators and sinks
• 3 hosts for infrastructure
• 1 to 15 hosts used for e-StreamHub
• 8 to 120 cores
• Number of slices is fixed
• micro-benchmarks: 4 AP, 8 M, 4 EP
• trace-based evaluation: 8 AP, 16 M, 8 EP
• Encrypted pubs and subs (ASPE)
• 100,000 encrypted subs
• Each sub is ~1.2 KB
• 1% matching ratio: 1,000 notifications / pub
[Figure: deployment. Hosts 1/15 to 15/15 are available to e-StreamHub: host 1/15 initially holds all AP, M and EP slices (AP:1 to AP:8, M:1 to M:16, EP:1 to EP:8), the other hosts are unused. Hosts A to D each run one generator and one sink (G:1 S:1 to G:4 S:4); host E runs ZooKeeper, host F the manager, host G the elasticity enforcer]
16. ICDCS 2014 - Etienne Rivière
Static e-StreamHub: Performance
[Figure: maximum throughput (publications/s, 0 to 500) and distribution of delays (ms, 0 to 500; min, 25th, 50th (median) and 75th percentiles, max) under half of the maximum throughput, against the number of hosts and their attribution to operators (AP|M|EP): 2: 0.5|1|0.5, 4: 1|2|1, 6: 1.5|3|1.5, 8: 2|4|2, 10: 2.5|5|2.5, 12: 3|6|3]
• 12 hosts: 3 hosts for AP slices, 6 hosts for M slices, 3 hosts for EP slices
• Per second: 42.2 million matchings, 422,000 notifications
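The per-second figures can be cross-checked against the experimental setup with a short calculation. This sketch assumes that one "matching" is one publication tested against one stored subscription, and that every publication is tested against all 100,000 subscriptions; the publication rate it derives is an inference from those assumptions, not a number stated on the slides.

```python
SUBSCRIPTIONS = 100_000            # encrypted subscriptions stored (experimental setup)
MATCHING_RATIO = 0.01              # 1% matching ratio (experimental setup)
MATCHINGS_PER_SECOND = 42_200_000  # reported matchings per second

# Each publication matches 1% of the subscriptions.
notifications_per_pub = round(SUBSCRIPTIONS * MATCHING_RATIO)

# Implied publication rate if every publication is tested against every subscription.
publications_per_second = MATCHINGS_PER_SECOND / SUBSCRIPTIONS

notifications_per_second = publications_per_second * notifications_per_pub

print(notifications_per_pub)      # 1000
print(publications_per_second)    # 422.0
print(notifications_per_second)   # 422000.0
```

Under these assumptions the two reported figures are consistent with each other: about 422 publications per second yield 42.2 million matchings and 422,000 notifications per second.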
17. ICDCS 2014 - Etienne Rivière
Slice migration: times
• Individual migration times, averaged over 25 migrations
• Under constant flow of 100 publications / second
• Worst-case scenario with “large” slice states
• 4 AP, 8 M and 4 EP with 100K and 400K subscriptions
• Migration time minimal for AP and EP, dominated by state size for M
Migration times per slice type (average; standard deviation):
• AP: 232 ms; 31 ms
• M (12,500 subscriptions per slice, total of 100,000): 1.497 s; 354 ms
• M (50,000 subscriptions per slice, total of 400,000): 2.533 s; 1.557 s
• EP: 275 ms; 52 ms
18. ICDCS 2014 - Etienne Rivière
Impact of migrations on delay
[Figure: notification delay (seconds, 0 to 2.5; min, average and max) over time (100 to 200 seconds) during successive migrations: AP (1), AP (2), M (1), M (2), EP]
19. ICDCS 2014 - Etienne Rivière
Steady increase/decrease of load
• Increase/decrease of publication rate
• 0 to 350 publications per second
• 0 to 35 million matching operations per second
• CPU load
• average, max and min over 30 seconds
[Figure: publication rate (publications/second, 0 to 400), number of hosts (0 to 18) and host CPU load (min, avg, max; 0 to 100) over time (0 to 1,800 seconds)]
[Figure: publication rate (publications/second, 0 to 400), number of hosts (0 to 18) and average delays (seconds, 0 to 4) over time (0 to 1,800 seconds)]
• Notification delays
• average over 30 seconds
22. ICDCS 2014 - Etienne Rivière
Frankfurt stock exchange
[Figure: publication rate (publications/second, 0 to 200), number of hosts (0 to 10) and host CPU load (min, avg, max; 0 to 100) over time (0 to 2,500 seconds)]
• Replay of the Frankfurt stock exchange workload
• Sped up 10 times: about 40 minutes shown
• 100,000 encrypted subscriptions
• CPU load
• average, max and min over 30 seconds
[Figure: publication rate (publications/second, 0 to 200), number of hosts (0 to 10) and average delays (seconds, 0 to 2) over time (0 to 2,500 seconds), same Frankfurt stock exchange replay]
• Notification delays
• average over 30 seconds
24. ICDCS 2014 - Etienne Rivière
Conclusion
• Elastic scaling of a high-throughput content-based pub/sub engine
• Implemented at the level of the support stream processing engine
• Applicability to other long-lived stream processing applications
• Support of live operator slice migration
• Redistribution of slices according to monitored workload
• Allows deploying pub/sub as a service on a public cloud without the need to provision for worst-case scenarios
• Evaluation using Frankfurt stock exchange traces
• Perspectives
• Monitor network flows, minimize inter-host communications in migration plans
• Leverage active replication, used for dependability, to minimize impact on delay while migrating (this requires deterministic execution)
25. Elastic Scaling of a High-Throughput
Content-Based Publish/Subscribe Engine
Questions?
This work was supported by the EU FP7 program (SRT-15 project)
http://www.srt-15.eu
ICDCS 2014, Madrid, Spain
etienne.riviere@unine.ch
27. ICDCS 2014 - Etienne Rivière
Independence from Filtering Scheme
• Attribute-based filtering widely studied
• Represent content using discrete attributes
• Subscriptions = conjunctions of discrete predicates on attributes values
• Broker overlays typically rely on containment & aggregation capabilities of attribute-based filtering
• Alternative filtering schemes
• Encrypted filtering
• ASPE (Choi et al., DEXA10)
• Prefiltering (DEBS 2012)
➡ No guaranteed support for containment or aggregation
• The architecture of a pub/sub engine should be independent from the filtering scheme(s) it supports
[Figure: the publisher and the subscribers, each in a trusted domain, exchange publications (p) and subscriptions (s) through the Pub/Sub as a Service, whose own domain is only "Trusted (?)". When that domain is untrusted, publications and subscriptions are encrypted before leaving the trusted domains]
32. ICDCS 2014 - Etienne Rivière
Events Paths and Support Libraries
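[Figure: event paths and support libraries. Subscription path: the subscriber sends s over TCP to a DCCP, which anycasts it to an AP slice; the AP determines the key to the matching operator using clustering (libcluster, optional) and unicasts s to an M slice, which stores the subscription in the operator slice state. Publication path: the publisher sends p over TCP to a DCCP, which anycasts it to an AP slice; the AP broadcasts p to the M slices; each M slice filters the publication and returns the matching subscribers list (libfilter); the publication and its subscribers (p,{s}) are unicast to an EP slice, which multicasts to the DCCPs; the DCCPs deliver p to the subscribers over TCP. Legend: lib = library, state, O = operator slice, C = static component, StreamHub client]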
33. ICDCS 2014 - Etienne Rivière
StreamHub interface
• Data Conversion and Connection Point(s) (DCCP)
• Stateless component(s) running on server(s) with direct WAN connectivity
• Maintain persistent TCP connections to/from publishers and subscribers
• Convert between external/internal format
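[Figure: publishers and subscribers keep persistent TCP connections to the DCCP (data converter & connection point), exchanging publications (p) and subscriptions (s) in GPB format; the DCCP converts them to and from the internal format of the StreamHub engine]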
34. ICDCS 2014 - Etienne Rivière
Use an Overlay of Brokers?
• Brokers organized in an overlay (mesh, tree); each broker performs all pub/sub operations
• Storing subscriptions; matching, forwarding, notifying of publications
• Complex maintenance of routing tables between brokers
• Assumptions on the filtering scheme for inter-broker communication and publication forwarding: containment & aggregation
• Scalability/throughput depend on workload, notification delays may vary
[Figure: subscribers send subscriptions (s) to brokers, which propagate them through the overlay as routing table (RT) updates]
[Figure: publishers send publications (p) to brokers, which forward them through the overlay towards the subscribers holding subscriptions (s)]