A Scalable and Reliable Matching Service for
Content-Based Publish/Subscribe Systems
Xingkong Ma, Student Member, IEEE, Yijie Wang, Member, IEEE, and Xiaoqiang Pei
Abstract—Characterized by an increasing arrival rate of live content, emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity to seamlessly expand the system to a massive size. However, most event matching services of existing pub/sub systems either suffer low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail. Cloud computing provides great opportunities to meet the requirements of complex computing and reliable communication. In this paper, we propose SREM, a scalable and reliable event matching service for content-based pub/sub systems in cloud computing environments. To achieve low routing latency and reliable links among servers, we propose a distributed overlay, SkipCloud, to organize the servers of SREM. Through a hybrid space partitioning technique, HPartition, large-scale skewed subscriptions are mapped into multiple subspaces, which ensures high matching throughput and provides multiple candidate servers for each event. Moreover, a series of dynamics maintenance mechanisms are extensively studied. To evaluate the performance of SREM, 64 servers are deployed and millions of live content items are tested in a CloudStack testbed. Under various parameter settings, the experimental results demonstrate that the traffic overhead of routing events in SkipCloud is at least 60 percent smaller than in the Chord overlay, and the matching rate in SREM is at least 3.7 times and at most 40.4 times that of the single-dimensional partitioning technique of BlueDove. Besides, SREM enables the event loss rate to drop back to 0 within tens of seconds even if a large number of servers fail simultaneously.
Index Terms—Publish/subscribe, event matching, overlay construction, content space partitioning, cloud computing
1 INTRODUCTION
BECAUSE of its importance in helping users make real-time decisions, data dissemination has become dramatically significant in many large-scale emergency applications, such as earthquake monitoring, disaster weather warning, and status updates in social networks. Recently, data dissemination in these emergency applications has presented a number of fresh trends. One is the rapid growth of live content. For instance, Facebook users publish over 600,000 pieces of content and Twitter users send over 100,000 tweets on average per minute [1]. The other is the highly dynamic network environment. For instance, measurement studies indicate that most users' sessions in social networks last only several minutes [2]. In emergency scenarios, sudden disasters like earthquakes or severe weather may cause a large number of users to fail instantaneously.
These characteristics require the data dissemination system to be scalable and reliable. Firstly, the system must be scalable to support the large amount of live content. The key is to offer a scalable event matching service to filter out irrelevant users. Otherwise, the content may have to traverse a large number of uninterested users before it reaches the interested ones. Secondly, given the dynamic network environment, it is necessary to provide reliable schemes to maintain a continuous data dissemination capacity. Otherwise, system interruptions may cause live content to become obsolete.
Driven by these requirements, the publish/subscribe (pub/sub) pattern is widely used to disseminate data due to its flexibility, scalability, and efficient support of complex event processing. In pub/sub systems (pub/subs), a
receiver (subscriber) registers its interest in the form of a
subscription. Events are published by senders to the pub/
sub system. The system matches events against subscrip-
tions and disseminates them to interested subscribers.
In traditional data dissemination applications, live content is generated by publishers at a low speed, which leads many pub/subs to adopt multi-hop routing techniques to disseminate events. A large body of broker-based pub/subs forward events and subscriptions by organizing nodes into diverse distributed overlays, such as tree-based designs [3], [4], [5], [6], cluster-based designs [7], [8] and DHT-based designs [9], [10], [11]. However, the multi-hop routing techniques in these broker-based systems lead to low matching throughput, which is inadequate for the current high arrival rate of live content.
Recently, cloud computing has provided great opportunities for applications requiring complex computing and high speed communication [12], [13]: the servers are connected by high speed networks and have powerful computing and storage capacities. A number of pub/sub services based on the cloud computing environment have been proposed, such as Move [14], BlueDove [15] and SEMAS [16]. However, most of them cannot fully meet the requirements of
both scalability and reliability when matching large-scale live content under highly dynamic environments. This mainly stems from the following facts: 1) Most of them are ill-suited to matching live content of high data dimensionality due to the limitation of their subscription space partitioning techniques, which brings either low matching throughput or high memory overhead. 2) These systems adopt the one-hop lookup technique [17] among servers to reduce routing latency. In spite of its high efficiency, it requires each dispatching server to have the same view of the matching servers. Otherwise, subscriptions or events may be assigned to the wrong matching servers, which causes availability problems when matching servers join or crash. A number of schemes can be used to keep a consistent view, such as periodically sending heartbeat messages to dispatching servers or exchanging messages among matching servers. However, these extra schemes may bring large traffic overhead or interrupt the event matching service.
Motivated by these factors, we propose SREM, a scalable and reliable matching service for content-based pub/sub systems in cloud computing environments. Specifically, we mainly focus on two problems: one is how to orga-
nize servers in the cloud computing environment to achieve
scalable and reliable routing. The other is how to manage
subscriptions and events to achieve parallel matching
among these servers. Generally speaking, we provide the
following contributions:

• We propose a distributed overlay protocol, called SkipCloud, to organize servers in the cloud computing environment. SkipCloud enables subscriptions and events to be forwarded among brokers in a scalable and reliable manner. It is also easy to implement and maintain.
• To achieve scalable and reliable event matching among multiple servers, we propose a hybrid multi-dimensional space partitioning technique, called HPartition. It allows similar subscriptions to be gathered on the same server and provides multiple candidate matching servers for each event. Moreover, it adaptively alleviates hot spots and keeps the workload balanced among all servers.
• We conduct extensive experiments on a CloudStack testbed to verify the performance of SREM under various parameter settings.
The rest of this paper is organized as follows. Section 2 introduces the content-based data model and the basic framework of SREM. Section 3 presents the SkipCloud overlay in detail. Section 4 describes HPartition in detail.
Section 5 discusses the dynamics maintenance mechanisms
of SREM. We evaluate the performance of SREM in Section
6. In Section 7, we review the related work on the matching
of existing content-based pub/subs. Finally, we conclude
the paper and outline our future work in Section 8.
2 DESIGN OF SREM
2.1 Content-Based Data Model
SREM uses a multi-dimensional content-based data model.
Consider a data model that consists of $k$ dimensions $A_1, A_2, \ldots, A_k$. Let $R_i$ be the ordered set of all possible values of $A_i$. Then $V = R_1 \times R_2 \times \cdots \times R_k$ is the entire content space. A subscription is a conjunction of predicates over one or more dimensions. Each predicate $P_i$ specifies a continuous range for a dimension $A_i$ and can be described by the tuple $(A_i, v_i, O_i)$, where $v_i \in R_i$ and $O_i$ is a relational operator ($<$, $\leq$, $\neq$, $\geq$, $>$, etc.). The general form of a subscription is $S = \wedge_{i=1}^{k} P_i$. An event is a point within the content space $V$ and can be represented as $k$ dimension-value pairs, i.e., $e = \wedge_{j=1}^{k} (A_j, v_j)$. A pair $(A_j, v_j)$ satisfies a predicate $(A_i, v_i, O_i)$ if $A_j = A_i$ and $v_j \, O_i \, v_i$ holds. By this definition, an event $e$ matches $S$ if each predicate of $S$ is satisfied by some pair of $e$.
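To make the model concrete, here is a minimal Java sketch (illustrative only, not part of the SREM prototype; predicates are simplified to closed ranges and all names are hypothetical) that encodes predicates, subscriptions, and events, and checks whether an event matches a subscription:

import java.util.List;
import java.util.Map;

// A predicate restricts one dimension to a continuous range [low, high].
record Predicate(String dimension, double low, double high) {
    boolean satisfiedBy(double value) {
        return value >= low && value <= high;
    }
}

// A subscription is a conjunction of predicates over one or more dimensions.
record Subscription(List<Predicate> predicates) {
    // An event (k dimension-value pairs) matches iff every predicate
    // is satisfied by the event's value on the corresponding dimension.
    boolean matches(Map<String, Double> event) {
        for (Predicate p : predicates) {
            Double v = event.get(p.dimension());
            if (v == null || !p.satisfiedBy(v)) {
                return false;
            }
        }
        return true;
    }
}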
2.2 Overview of SREM
To support large-scale users, we consider a cloud computing environment with a set of geographically distributed datacenters connected through the Internet. Each datacenter contains a
large number of servers (brokers), which are managed by a
datacenter management service such as Amazon EC2 or
OpenStack.
We illustrate a simple overview of SREM in Fig. 1. All brokers in SREM serve as the front-end and are exposed to the Internet, so any subscriber or publisher can connect to them directly. To achieve reliable connectivity and low routing latency, these brokers are connected through a distributed overlay, called SkipCloud. The entire content space is partitioned into disjoint subspaces, each of which is managed by a number of brokers. Subscriptions and events are dispatched through SkipCloud to the subspaces that overlap with them. Thus, subscriptions and events falling into the same subspace are matched on the same broker. After the matching process completes, events are broadcast to the corresponding interested subscribers. As shown in Fig. 1, the subscriptions generated by subscribers S1 and S2 are dispatched to brokers B2 and B5, respectively. Upon receiving events from publishers, B2 and B5 will send matched events to S1 and S2, respectively.
One may argue that different datacenters could each be responsible for a subset of the subscriptions according to geographical location, such that little collaboration among the servers would be needed [3], [4]. In this case, since the pub/sub system needs to find all the matched subscribers, each event would have to be matched in all datacenters, which leads to large traffic overhead as the number of datacenters and the arrival rate of live content increase. Besides, it is hard to achieve workload balance among the servers of all datacenters due to the various skewed distributions of users' interests.

Fig. 1. System framework.

Another question is why we need a distributed overlay like SkipCloud to ensure reliable logical connectivity in a datacenter environment, where servers are more stable than peers in P2P networks. This is because, as the number of servers in datacenters increases, node failures become the norm rather than a rare exception [18]. Node failures may lead to unreliable and inefficient routing among servers. To this end, we organize servers into SkipCloud to reduce the routing latency in a scalable and reliable manner.
Such a framework offers a number of advantages for
real-time and reliable data dissemination. First, it allows the system to group similar subscriptions into the same broker in a timely manner, due to the high bandwidth among brokers in the
cloud computing environment, such that the local searching
time can be greatly reduced. This is critical to reach high
matching throughput. Second, since each subspace is man-
aged by multiple brokers, this framework is fault-tolerant
even if a large number of brokers crash instantaneously.
Third, because the datacenter management service provides
scalable and elastic servers, the system can be easily
expanded to Internet-scale.
3 SKIPCLOUD
3.1 Topology Construction
Generally speaking, SkipCloud organizes all brokers into
levels of clusters. As shown in Fig. 2, the clusters at each
level of SkipCloud can be treated as a partition of the whole
broker set. Table 1 shows key notations used in this section.
At the top level, brokers are organized into multiple clusters whose topologies are complete graphs. Each cluster at this level is called a top cluster. It contains a leader broker, which generates a unique b-ary identifier of length m using a hash function (e.g., MD5). This identifier is called the ClusterID. Correspondingly, each broker's identifier is a unique string of length m + 1 that shares a common prefix of length m with its ClusterID. At this level, brokers in the same cluster are responsible for the same content subspaces, which provides multiple matching candidates for each event. Since brokers in the same top cluster communicate frequently among themselves, for example when updating subscriptions and dispatching events, they are organized into a complete graph so that they can reach each other in one hop.
After the top clusters have been organized, the clusters at the remaining levels can be generated level by level. Specifically, each broker decides which cluster to join at every level. The brokers whose identifiers share a common prefix of length i join the same cluster at level i, and this common prefix is referred to as the ClusterID at level i. That is, the clusters at level i + 1 can be regarded as a b-partition of the clusters at level i, so the number of clusters shrinks by a factor of b with each decreasing level. All brokers at level 0, whose ClusterID is the empty identifier, join one single cluster, called the global cluster. Therefore, there are b^i clusters at level i. Fig. 2 shows an example of how SkipCloud organizes eight brokers into three levels of clusters by binary identifiers.
To organize the clusters of the non-top levels, we employ a lightweight peer sampling protocol based on Cyclon [19], which provides robust connectivity within each cluster. Suppose there are m levels in SkipCloud. Each cluster runs Cyclon to keep reliable connectivity among the brokers in the cluster. Since each broker falls into one cluster at each level, it maintains m neighbor lists. The neighbor list for the cluster at level i samples an equal number of neighbors from its corresponding children clusters at level i + 1. This ensures that routing from the bottom level can always find a broker pointing to a higher level. Because brokers maintain several levels of neighbor lists, they update the neighbor list of one level and the subsequent ones in a ring to reduce the traffic cost. The pseudo-code of the view maintenance algorithm is shown in Algorithm 1. The topology formed by the multi-level neighbor lists is similar to Tapestry [20]. Compared with Tapestry, SkipCloud uses multiple brokers in top clusters as targets to ensure reliable routing.
Algorithm 1. Neighbor List Maintenance
Input: views: the neighbor lists;
       m: the total number of levels in SkipCloud;
       cycle: the current cycle.
1: j = cycle % (m + 1);
2: for each i in [0, m - 1] do
3:   update views[i] by the peer sampling service based on Cyclon;
4: for each i in [0, m - 1] do
5:   if views[i] contains empty slots then
6:     fill these empty slots with items from other levels that share a common prefix of length i with the ClusterID of views[i].
3.2 Prefix Routing
Prefix routing in SkipCloud is mainly used to efficiently route subscriptions and events to the top clusters. Note that the cluster identifiers at level i + 1 are generated by appending one b-ary digit to the identifiers of the corresponding clusters at level i. This relation between cluster identifiers is the foundation of routing to target clusters. Briefly, when receiving a routing request to a specific cluster, a broker examines its neighbor lists of all levels and chooses the neighbor that shares the longest common prefix with the target ClusterID as the next hop. The routing operation repeats until a broker cannot find a neighbor whose identifier is closer to the target than its own. Algorithm 2 describes the prefix routing algorithm in pseudo-code.

TABLE 1
Notations in SkipCloud
Nb: the number of brokers in SkipCloud
m: the number of levels in SkipCloud
Dc: the average degree in each cluster of SkipCloud
Nc: the number of top clusters in SkipCloud

Fig. 2. An example of SkipCloud with eight brokers and three levels. Each broker is identified by a binary string.
Algorithm 2. Prefix Routing
1: l = commonPrefixLength(self.ID, event.ClusterID);
2: if (l == m) then
3:   process(event);
4: else
5:   destB <- the broker whose identifier matches event.ClusterID with the longest common prefix, chosen from self.views;
6:   lmax = commonPrefixLength(destB.identifier, event.ClusterID);
7:   if (lmax <= l) then
8:     destB <- the broker whose identifier is closest to event.ClusterID, chosen from views[l];
9:     if (destB and myself are in the same cluster of level l) then
10:      process(event);
11:  forwardTo(destB, event);
Since a neighbor list in a cluster at level i is a uniform sample from the corresponding children clusters at level i + 1, each broker can find a neighbor whose identifier matches at least one more digit of the common prefix with the target identifier before reaching the target cluster. Therefore, the prefix routing in Algorithm 2 guarantees that any top cluster can be reached in at most $\log_b N_c$ hops, where $N_c$ is the number of top clusters. Assume the average degree of brokers in each cluster is $D_c$. Then each broker only needs to keep $D_c \cdot \log_b N_c$ neighbors.
4 HPARTITION
In order to take advantage of multiple distributed brokers,
SREM divides the entire content space among the top clus-
ters of SkipCloud, so that each top cluster only handles a
subset of the entire space and searches a small number of
candidate subscriptions. SREM employs a hybrid multi-
dimensional space partitioning technique, called HPartition, to
achieve scalable and reliable event matching. Generally
speaking, HPartition divides the entire content space into
disjoint subspaces (Section 4.1). Subscriptions and events
with overlapping subspaces are dispatched and matched
on the same top cluster of SkipCloud (Sections 4.2 and 4.3).
To keep workload balance among servers, HPartition
divides the hot spots into multiple cold spots in an adaptive
manner (Section 4.4). Table 2 shows key notations used in
this section.
4.1 Logical Space Construction
Our idea of logical space construction is inspired by existing
single-dimensional space partitioning (called SPartition) and
all-dimensional space partitioning (called APartition). Let k
be the number of dimensions in the entire space $V$ and $R_i$ be the ordered set of values of the dimension $A_i$. Each $R_i$ is split into $N_{seg}$ continuous and disjoint segments $\{R_i^j, j = 1, 2, \ldots, N_{seg}\}$, where $j$ is the segment identifier (called SegID) of the dimension $A_i$.
The basic idea of SPartition, as used by BlueDove [15], is to treat each dimension as a separate space, as shown in Fig. 3a. Specifically, the range of each dimension is divided into $N_{seg}$ segments, each of which is regarded as a separate subspace. Thus, the entire space $V$ is divided into $k \cdot N_{seg}$ subspaces. Subscriptions and events falling into the same subspace are matched against each other. Due to the coarse-grained partitioning of SPartition, each subscription falls into a small number of subspaces, which brings multiple candidate brokers for each event and low memory overhead. On the other hand, SPartition may easily form a hot spot if a large number of subscribers are interested in the same range of a dimension.
In contrast, the idea of APartition, as used by SEMAS [16], is to treat the combination of all dimensions as one space, as shown in Fig. 3b. Formally, the whole space $V$ is partitioned into $(N_{seg})^k$ subspaces. Compared with SPartition, APartition leads to a smaller number of hot spots, since a hot subspace forms only if all of its segments are subscribed by a large number of subscriptions. On the other hand, each event in APartition has only one candidate broker. Compared with SPartition, each subscription in APartition falls into more subspaces, which leads to a higher memory cost.
Inspired by both SPartition and APartition, HPartition provides a flexible manner to construct the logical space. Its basic idea is to divide all dimensions into a number of groups and treat each group as a separate space, as shown in Fig. 3c. Formally, the $k$ dimensions in HPartition are classified into $t$ groups, each of which contains $k_i$ dimensions, where $\sum_{i=1}^{t} k_i = k$. That is, $V$ is split into $t$ separate spaces $\{V_i, i \in [1, t]\}$. HPartition adopts APartition to divide each $V_i$. Thus, $V$ is divided into $\sum_{i=1}^{t} (N_{seg})^{k_i}$ subspaces. For each space $V_i$, let $G_{x_1 \cdots x_k}$, $x_i \in [0, N_{seg}]$, be its subspace, where $x_1 \cdots x_k$ is the concatenation of the SegIDs of the subspace, called its SubspaceID. For the dimensions that a subspace does not contain, the corresponding positions of the SubspaceID are set to 0.

TABLE 2
Notations in HPartition
V: the entire content space
k: the number of dimensions of V
Ai: the dimension i, i ∈ [1, k]
Ri: the range of Ai
Pi: the predicate on Ai
Nseg: the number of segments on Ri
N'seg: the number of segments used to split a hot spot
α: the threshold used to choose between the hot spot partitioning techniques
Nsub: the number of subscriptions
Gx1···xk: the subspace with SubspaceID x1···xk

Fig. 3. Comparison among three different logical space construction techniques.
Each subspace of HPartition is managed by a top cluster of SkipCloud. To dispatch each subspace $G_{x_1 \cdots x_k}$ to its corresponding top cluster of SkipCloud, HPartition uses a hash function such as MurmurHash [21] to map each SubspaceID $x_1 \cdots x_k$ to a b-ary identifier of length $m$, represented by $H(x_1 \cdots x_k)$. According to the prefix routing in Algorithm 2, each subspace will be forwarded to the top cluster whose ClusterID is nearest to $H(x_1 \cdots x_k)$. Note that the brokers in the same top cluster are in charge of the same set of subspaces, which ensures the reliability of each subspace.
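A minimal sketch of this mapping is shown below; it substitutes a generic string hash for MurmurHash and converts the hash into an m-digit b-ary identifier, so the concrete digits are illustrative rather than those produced by the actual prototype (it also assumes b <= 10):

final class SubspaceMapper {
    // Map a SubspaceID such as "1200" to an m-digit b-ary identifier.
    // The top cluster whose ClusterID is nearest to this identifier
    // becomes responsible for the subspace.
    static String toClusterIdentifier(String subspaceId, int b, int m) {
        long h = subspaceId.hashCode() & 0xffffffffL; // stand-in for MurmurHash
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < m; i++) {
            id.append((char) ('0' + (int) (h % b)));
            h /= b;
        }
        return id.reverse().toString();
    }
}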
4.2 Subscription Installation
A user can specify a subscription by defining ranges over one or more dimensions. We treat $S = \wedge_{i=1}^{k} P_i$ as the general form of a subscription, where $P_i$ is a predicate on the dimension $A_i$. If $S$ does not contain a dimension $A_i$, the corresponding $P_i$ is the entire range of $A_i$. When receiving a subscription $S$, the broker first obtains all subspaces $G_{x_1 \cdots x_k}$ that overlap with $S$. Based on the logical space construction of HPartition, all dimensions are classified into $t$ individual spaces $V_j, j \in [1, t]$. For each $G_{x_1 \cdots x_k}$ of $V_i$, it satisfies

$x_i = s$ if $P_i \neq R_i$ and $R_i^s \cap P_i \neq \emptyset$; $x_i = 0$ if $P_i = R_i$.   (1)

As shown in Fig. 4, $V$ consists of two groups, and each range is divided into two segments. For a subscription $S_1 = (20 \leq A_1 \leq 80) \wedge (80 \leq A_2 \leq 100) \wedge (105 \leq A_3 \leq 170) \wedge (92 \leq A_4 \leq 115)$, its matched subspaces are $G_{1200}$, $G_{1100}$ and $G_{0022}$.

For the subscription $S_1$, each of its matched subspaces $G_{x_1 \cdots x_k}$ is hashed to a b-ary identifier of length $m$, represented by $H(x_1 \cdots x_k)$. Thus, $S_1$ is forwarded to the top cluster whose ClusterID is nearest to $H(x_1 \cdots x_k)$. When a broker in the top cluster receives the subscription $S_1$, it broadcasts $S_1$ to the other brokers in the same cluster, such that the top cluster provides reliable event matching and balanced workloads among its brokers.
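The following Java sketch illustrates, for one dimension, how the SegIDs overlapped by a predicate can be enumerated in the spirit of Eq. (1); segment boundaries are assumed to be uniform and all names are hypothetical:

import java.util.ArrayList;
import java.util.List;

final class SubscriptionInstaller {
    // SegIDs of the segments that the predicate range [low, high] overlaps,
    // for a dimension whose full range is [0, rangeMax] cut into nSeg segments.
    // A predicate covering the whole range maps to the single SegID 0, as in Eq. (1).
    static List<Integer> overlappedSegments(double low, double high,
                                            double rangeMax, int nSeg) {
        if (low <= 0 && high >= rangeMax) {
            return List.of(0);
        }
        double segLen = rangeMax / nSeg;
        int first = Math.max(0, (int) Math.floor(low / segLen));
        int last = Math.min(nSeg - 1, (int) Math.floor(high / segLen));
        List<Integer> segIds = new ArrayList<>();
        for (int s = first; s <= last; s++) {
            segIds.add(s + 1); // SegIDs are numbered from 1
        }
        return segIds;
    }
}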
4.3 Event Assignment and Forwarding Strategy
Upon receiving an event, the broker forwards the event to its corresponding subspaces. Specifically, consider an event $e = \wedge_{i=1}^{k} (A_i, v_i)$. Each subspace $G_{x_1 \cdots x_k}$ that $e$ falls into should satisfy

$x_i \in \{0, s\}$, where $v_i \in R_i^s$.   (2)

For instance, let $e$ be $(A_1, 30) \wedge (A_2, 100) \wedge (A_3, 80) \wedge (A_4, 80)$. Based on the logical partitioning in Fig. 4, the matched subspaces of $e$ are $G_{1200}$ and $G_{0011}$.

Recall that HPartition divides the whole space $V$ into $t$ separate spaces $\{V_i\}$, which indicates that there are $t$ candidate subspaces $\{G_i, 1 \leq i \leq t\}$ for each event. Because of their different workloads, the strategy used to select one candidate subspace for each event may greatly affect the matching rate. For an event $e$, suppose each candidate subspace $G_i$ contains $N_i$ subscriptions. The most intuitive approach is the least first strategy, which chooses the subspace with the least number of candidate subscriptions. Correspondingly, its average searching subscription amount is $N_{least} = \min_{i \in [1,t]} N_i$. However, when a large number of skewed events fall into the same subspace, this approach may lead to unbalanced workloads among brokers. Another option is the random strategy, which selects each candidate subspace with equal probability. It ensures balanced workloads among brokers, but the average searching subscription amount increases to $N_{avg} = \frac{1}{t} \sum_{i=1}^{t} N_i$. In HPartition, we adopt a probability based forwarding strategy to dispatch events. Formally, the probability of forwarding $e$ to the space $V_i$ is $p_i = (1 - N_i / \sum_{j=1}^{t} N_j)/(t - 1)$. Thus, the average searching subscription amount is $N_{prob} = (N - \sum_{i=1}^{t} N_i^2 / N)/(t - 1)$, where $N = \sum_{i=1}^{t} N_i$. Note that when every subspace has the same size, $N_{prob}$ reaches its maximal value, i.e., $N_{prob} \leq N_{avg}$. Therefore, this approach has better workload balance than the least first strategy and lower searching latency than the random strategy.
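A sketch of this probability based selection among the t candidate subspaces is shown below (illustrative names; the random source and the way subscription counts are obtained are assumptions):

import java.util.Random;

final class CandidateSelector {
    private final Random random = new Random();

    // counts[i] is the number of subscriptions in candidate subspace G_i.
    // Candidate i is chosen with probability (1 - counts[i]/total) / (t - 1),
    // so lightly loaded candidates are preferred without starving the others.
    int selectCandidate(long[] counts) {
        int t = counts.length;
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (t == 1 || total == 0) {
            return random.nextInt(t);
        }
        double r = random.nextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < t; i++) {
            cumulative += (1.0 - (double) counts[i] / total) / (t - 1);
            if (r < cumulative) {
                return i;
            }
        }
        return t - 1; // guard against floating-point rounding
    }
}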
4.4 Hot Spots Alleviation
In real applications, a large number of subscriptions may fall into a small number of subspaces, which are called hot spots. Matching in hot spots leads to high searching latency. To alleviate hot spots, an obvious approach is to add brokers to the top clusters containing hot spots. However, this approach raises the total price of the service and wastes the resources of the clusters that only contain cold spots.
To utilize the multiple distributed clusters, a better solution
is to balance the workloads among clusters through parti-
tioning and migrating hot spots. The gain of the partitioning
technique is greatly affected by the distribution of subscrip-
tions of the hot spot. To this end, HPartition divides each
hot spot into a number of cold spots through two partition-
ing techniques: hierarchical subspace partitioning and sub-
scription set partitioning. The first aims to partition hot spots whose subscriptions are diffused over the whole space, and the second aims to partition hot spots whose subscriptions fall into a narrow space.

Fig. 4. An example of subscription assignment.
4.4.1 Hierarchical Subspace Partitioning
To alleviate hot spots whose subscriptions are diffused over the whole space, we propose a hierarchical subspace partitioning technique, called HSPartition. Its basic idea is to divide the space of a hot spot into a number of smaller subspaces and reassign its subscriptions to these newly generated subspaces.
Step one, each hot spot is divided along the ranges of its dimensions. Assume $V_i$ contains $k_i$ dimensions, and one of its hot spots is $G_{x_1 \cdots x_k}$. For each $x_i \neq 0$, the corresponding range $R_i^{x_i}$ is divided into $N'_{seg}$ segments, each of which is denoted by $R_i^{x_i y_i}$ ($1 \leq x_i \leq N_{seg}$, $1 \leq y_i \leq N'_{seg}$). Therefore, $G_{x_1 \cdots x_k}$ is divided into $(N'_{seg})^{k_i}$ subspaces, each of which is represented by $G_{x_1 \cdots x_k y_1 \cdots y_k}$.
Step two, subscriptions in $G_{x_1 \cdots x_k}$ are reassigned to these new subspaces according to the subscription installation in Section 4.2. Because each new subspace occupies a small piece of the space of $G_{x_1 \cdots x_k}$, a subscription $S$ of $G_{x_1 \cdots x_k}$ may fall into only a part of these subspaces, which reduces the number of subscriptions that each event has to be matched against on the hot spot. HSPartition enables hot spots to be divided into much smaller subspaces iteratively. As shown in the left top corner of Fig. 5, the hot spot $G_{1200}$ is divided into four smaller subspaces by HSPartition, and the maximum number of subscriptions in this space decreases from 9 to 4.
To avoid concurrent partitioning of one hot spot by multiple brokers in the same top cluster, the leader broker of each top cluster is responsible for periodically checking and partitioning hot spots. When a broker receives a subscription, it hierarchically divides the space of the subscription until each subspace cannot be divided further. Therefore, the leader broker in charge of dividing a hot spot $G_{x_1 \cdots x_k}$ disseminates an update message including a three-tuple {"Subspace Partition", $x_1 \cdots x_k$, $N'_{seg}$} to all the other brokers. Because the global cluster at level 0 of SkipCloud organizes all brokers into a random graph, we can utilize a gossip-based multicast method [22] to disseminate the update message to the other brokers in $O(\log N_b)$ hops, where $N_b$ is the total number of brokers.
4.4.2 Subscription Set Partitioning
To alleviate hot spots whose subscriptions fall into a narrow space, we propose a subscription set partitioning technique, called SSPartition. Its basic idea is to divide the subscription set of a hot spot into a number of subsets and scatter these subsets to multiple top clusters. Assume $G_{x_1 \cdots x_k}$ is a hot spot and $N_0$ is the size of each subscription subset.

Step one, divide the subscription set of $G_{x_1 \cdots x_k}$ into $n$ subsets, where $n = |G_{x_1 \cdots x_k}| / N_0$. Each subset is assigned a new SubspaceID $x_1 \cdots x_k\text{-}i$ ($1 \leq i \leq n$), and the subscription with identifier SubID is dispatched to the subset $G_{x_1 \cdots x_k\text{-}i}$ if $H(\mathrm{SubID}) \% n = i$.

Step two, each subset $G_{x_1 \cdots x_k\text{-}i}$ is forwarded to its corresponding top cluster with ClusterID $H(x_1 \cdots x_k\text{-}i)$. Since these non-overlapping subsets are scattered to multiple top clusters, events falling into the hot spot can be matched by multiple brokers in parallel, which brings much lower matching latency. Similar to the process of HSPartition, the leader broker responsible for dividing a hot spot $G_{x_1 \cdots x_k}$ disseminates a three-tuple {"Subscription Partition", $x_1 \cdots x_k$, $n$} to all the other brokers to support further partitioning operations. As shown in the right bottom corner of Fig. 5, the hot spot $G_{2100}$ is divided into three smaller subsets $G_{2100\text{-}1}$, $G_{2100\text{-}2}$ and $G_{2100\text{-}3}$ by SSPartition. Correspondingly, the maximal number of subscriptions of the subspace decreases from 11 to 4.
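The subset assignment of step one can be sketched as follows; the hash is a stand-in for the one used by the prototype, and the subset indices here run from 0 to n-1 rather than 1 to n:

final class SubsetAssigner {
    // Assign a subscription (identified by subId) to one of the n subsets
    // of the hot spot whose SubspaceID is hotSpotId; the subset's new
    // SubspaceID is "hotSpotId-i".
    static String subsetId(String hotSpotId, String subId, int n) {
        int i = Math.floorMod(subId.hashCode(), n); // stand-in for H(SubID) % n
        return hotSpotId + "-" + i;
    }
}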
4.4.3 Adaptive Selection Algorithm
Because of the diverse distributions of subscriptions, neither HSPartition nor SSPartition can replace the other. On one hand, HSPartition is attractive for dividing hot spots whose subscriptions are scattered uniformly over the space. However, it is inappropriate for dividing hot spots whose subscriptions all appear at the same exact point. On the other hand, SSPartition can divide any kind of hot spot into multiple subsets even if all subscriptions fall into the same single point. However, compared with HSPartition, it has to dispatch each event to multiple subspaces, which brings a higher traffic overhead.

To achieve balanced workloads among brokers, we propose an adaptive selection algorithm that selects either HSPartition or SSPartition to alleviate a hot spot. The selection is based on the similarity of the subscriptions in the same hot spot. Specifically, assume $G_{x_1 \cdots x_k y_1 \cdots y_k}$ is the subspace with the maximal number of subscriptions after HSPartition, and $\alpha$ is a threshold value that represents the similarity degree of subscriptions' spaces in a hot spot. We choose HSPartition as the partitioning algorithm if $|G_{x_1 \cdots x_k y_1 \cdots y_k}| / |G_{x_1 \cdots x_k}| < \alpha$; otherwise, we choose SSPartition. By combining both partitioning techniques, this selection algorithm can alleviate hot spots in an adaptive manner.

Fig. 5. An example of dividing hot spots. Two hot spots, G1200 and G2100, are divided by HSPartition and SSPartition, respectively.
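A compact sketch of this selection rule, assuming the size of the hot spot and the size of its largest sub-subspace after a trial HSPartition are already known (alpha corresponds to the similarity threshold above; all names are illustrative):

enum PartitionChoice { HS_PARTITION, SS_PARTITION }

final class AdaptiveSelector {
    // hotSpotSize is |G_{x1...xk}|; maxSubSubspaceSize is the size of the
    // largest subspace produced by a trial HSPartition. If that subspace
    // still holds a fraction >= alpha of the subscriptions, they are too
    // similar for space partitioning, so fall back to SSPartition.
    static PartitionChoice choose(long maxSubSubspaceSize, long hotSpotSize, double alpha) {
        double ratio = (double) maxSubSubspaceSize / hotSpotSize;
        return ratio < alpha ? PartitionChoice.HS_PARTITION
                             : PartitionChoice.SS_PARTITION;
    }
}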
4.5 Performance Analysis
4.5.1 The Average Searching Size
Since each event is matched in one of its candidate subspaces, the average number of subscriptions over all subspaces, called the average searching size, is critical to reducing the matching latency. We give a formal analysis as follows.

Theorem 1. Suppose the fraction of each dimension's range covered by each predicate of a subscription is $\rho$. Then the average searching size $N_{prob}$ in HPartition is not more than $\frac{N_{sub}}{t} \sum_{i=1}^{t} \rho^{k_i}$ as $N_{seg} \to \infty$, where $t$ is the number of groups in the entire space, $k_i$ is the number of dimensions in each group, and $N_{sub}$ is the number of subscriptions.

Proof. For each dimension $A_i$, $i \in [1, k]$, the range of $A_i$ is $R_i$, and the length of the corresponding predicate $P_i$ is $\rho \|R_i\|$. Since each dimension is divided into $N_{seg}$ segments, the length of each segment is $\|R_i\| / N_{seg}$. Thus, the expected number of segments that $P_i$ falls into is $\lceil \rho N_{seg} \rceil$. A space $V_i$ contains $(N_{seg})^{k_i}$ subspaces, so the average searching size in $V_i$ is $N_i = \frac{\lceil \rho N_{seg} \rceil^{k_i} N_{sub}}{(N_{seg})^{k_i}}$. As $N_{seg} \to \infty$, we have $\frac{\lceil \rho N_{seg} \rceil}{N_{seg}} \to \rho$, and thus $N_i = \rho^{k_i} N_{sub}$. According to the probability based forwarding strategy in Section 4.3, the average searching size is $N_{prob} \leq \frac{1}{t} \sum_{i=1}^{t} N_i$. That is, $N_{prob} \leq \frac{N_{sub}}{t} \sum_{i=1}^{t} \rho^{k_i}$. □

According to Theorem 1, the average searching size $N_{prob}$ decreases as $\rho$ decreases. That is, a smaller $\rho$ brings less matching time in HPartition. Fortunately, the subscription distribution in real-world applications is often skewed, and most predicates occupy small ranges, which guarantees a small average searching size. Besides, note that $\frac{N_{sub}}{t} \sum_{i=1}^{t} \rho^{k_i}$ reaches its minimal value $N_{sub} \rho^{k/t}$ if $k_i = k_j$ for all $i, j \in [1, t]$. This indicates that the upper bound of $N_{prob}$ can further decrease to $N_{sub} \rho^{k/t}$ if each group has the same number of dimensions.
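As a hypothetical numerical illustration of this bound (the numbers are chosen only for arithmetic convenience and are not taken from the experiments): with $N_{sub} = 40{,}000$ subscriptions, $t = 2$ groups of $k_i = 4$ dimensions each, and predicates covering a fraction $\rho = 0.1$ of each dimension's range, the bound gives $N_{prob} \leq \frac{40{,}000}{2}(0.1^4 + 0.1^4) = 4$, i.e., only a handful of subscriptions need to be searched per event on average.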
4.5.2 Event Matching Reliability
To achieve reliable event matching, each event should have multiple candidate brokers in the system. In this section, we give a formal analysis of the event matching availability as follows.

Theorem 2. SREM promises event matching availability in the face of concurrent crash failures of up to $d N_{top}(1 - e^{-t/N_{top}}) - 1$ brokers, where $d$ is the number of brokers in each top cluster of SkipCloud, $N_{top}$ is the number of top clusters, and $t$ is the number of groups in HPartition.

Proof. Based on HPartition, each event has $t$ candidate subspaces, which are diffused over the $N_{top}$ top clusters. When $m$ balls are distributed at random over $n$ boxes, the expected number of empty boxes is $n e^{-m/n}$. Similarly, when the $t$ candidate subspaces are distributed at random over the $N_{top}$ top clusters, the expected number of non-empty top clusters is $N_{top}(1 - e^{-t/N_{top}})$. Since each top cluster contains $d$ brokers which manage the same set of subspaces, the expected number of non-empty brokers for each event is $d N_{top}(1 - e^{-t/N_{top}})$. Thus, SREM ensures available event matching in the face of concurrent crash failures of up to $d N_{top}(1 - e^{-t/N_{top}}) - 1$ brokers. □

According to Theorem 2, the event matching availability of SREM is affected by $d$ and $t$. That is, both SkipCloud and HPartition provide flexible schemes to ensure reliable event matching.
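As a hypothetical numerical illustration (the value of $N_{top}$ is an assumption, since the number of top clusters is not fixed by the analysis): with $d = 2$ brokers per top cluster, $t = 2$ groups, and $N_{top} = 32$ top clusters, the bound evaluates to $2 \cdot 32 \cdot (1 - e^{-2/32}) - 1 \approx 2.9$, so in expectation event matching remains available under the concurrent crash of two brokers; increasing $d$ or $t$ raises this tolerance.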
5 PEER MANAGEMENT
In SREM, there are mainly three roles: clients, brokers,
and clusters. Brokers are responsible for managing all of
them. Since the joining or leaving of these roles may
lead to inefficient and unreliable data dissemination, we
will discuss the dynamics maintenance mechanisms used
by brokers in this section.
5.1 Subscriber Dynamics
To detect the status of subscribers, each subscriber establishes an affinity with a broker (called its home broker) and periodically sends its subscription as a heartbeat message to this home broker. The home broker maintains a timer for each of its buffered subscriptions. If the broker has not received a heartbeat message from a subscriber for more than Tout time, the subscriber is assumed to be offline. The home broker then removes this subscription from its buffer and notifies the brokers containing the failed subscription to remove it.
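A minimal sketch of the home broker's heartbeat bookkeeping under these rules (the data structures and the eviction hook are illustrative assumptions, not the prototype's API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class HomeBroker {
    private final long timeoutMillis; // corresponds to Tout
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    HomeBroker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Invoked whenever a subscriber re-sends its subscription as a heartbeat.
    void onHeartbeat(String subscriberId) {
        lastHeartbeat.put(subscriberId, System.currentTimeMillis());
    }

    // Invoked periodically: subscribers silent for longer than Tout are
    // treated as offline; in SREM the broker would additionally notify the
    // brokers that store the failed subscription (omitted in this sketch).
    void evictExpired() {
        long now = System.currentTimeMillis();
        lastHeartbeat.entrySet().removeIf(e -> now - e.getValue() > timeoutMillis);
    }
}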
5.2 Broker Dynamics
Broker dynamics may lead to new clusters joining or old clusters leaving. In this section, we mainly consider brokers joining or leaving existing clusters, rather than changes of the cluster size.
When a new broker is generated by its datacenter management service, it first sends a "Broker Join" message to the leader broker of its top cluster. The leader broker returns its top cluster identifier, the neighbor lists of all levels of SkipCloud, and all subspaces together with the corresponding subscriptions. The new broker generates its own identifier by appending a b-ary digit to its top cluster identifier and takes the received items of each level as its initial neighbors.
There is no particular mechanism to handle broker departure from a cluster. In the top cluster, the leader broker can easily monitor the status of the other brokers. For the clusters of the remaining levels, the sampling service guarantees that older items of each neighbor list are preferentially replaced by fresh ones during the view shuffling operation, so failed brokers are removed from the system quickly. From the perspective of event matching, all brokers in the same top cluster have the same subspaces of subscriptions, which indicates that a broker failure will not interrupt the event matching operation as long as at least one broker in each cluster stays alive.
5.3 Cluster Dynamics
Broker dynamics may lead to new clusters joining or old clusters leaving. Since each subspace is managed by the top cluster whose identifier is closest to that of the subspace, it is necessary to adaptively migrate a number of subspaces from old clusters to newly joining clusters. Specifically, the leader broker of the new cluster delivers its top ClusterID, carried in a "Cluster Join" message, to the other clusters. The leader brokers of all other clusters find the subspaces whose identifiers are closer to the new ClusterID than to their own cluster identifiers, and migrate them to the new cluster.
Since each subspace is stored in one cluster, a cluster departure incurs subscription loss. The peer sampling service of SkipCloud can be used to detect failed clusters. To recover lost subscriptions, a simple method is to redirect the lost subscriptions through their owners' heartbeat messages. Due to the unreliable links between subscribers and brokers, this approach may lead to long repair latency. To this end, we
store all subscriptions in a number of well-known servers in the datacenters. When these servers detect the failed clusters, they dispatch the subscriptions in these failed clusters to the corresponding live clusters.
Besides, the number of levels $m$ in SREM is adjusted adaptively with the change of the broker size $N_b$ to ensure $m = \lceil \log_b(N_b) \rceil$, where $b$ is the number of children clusters of each non-top cluster. Each leader broker runs a gossip-based aggregation service [23] at the global cluster to estimate the total broker size. When the estimated broker size $N_b^e$ changes enough to change $m$, each leader broker notifies the other brokers to update their neighbor lists through a gossip-based multicast with a probability of $1/N_b^e$. If a number of leader brokers disseminate the "Update Neighbor Lists" messages simultaneously, an earlier message is stopped when it collides with a later one, which ensures all leader brokers have the same view of $m$. When $m$ decreases, each broker removes its neighbor list of the top level directly. When $m$ increases, a new top level is initialized by filling appropriate neighbors from the neighbor lists at other levels. Due to the logarithmic relation between $m$ and $N_b$, only a significant change of $N_b$ can change $m$. So the level dynamics brings quite low traffic cost.
6 EXPERIMENT
6.1 Implementation
To take advantage of the reliable links and high bandwidth
among servers of the cloud computing environment, we
choose the CloudStack [24] testbed to design and implement
our prototype. To make the prototype modular and portable, we use ICE [25] as the fundamental communication platform. ICE provides a communication solution that is easy to program with and allows developers to focus only on their application logic. Based on ICE, we add about 11,000 lines of Java code.
To evaluate the performance of SkipCloud, we imple-
ment both SkipCloud and Chord to forward subscriptions
and messages. To evaluate the performance of HPartition,
the prototype supports different space partitioning policies.
Moreover, the prototype provides three different message
forwarding strategies, i.e., least subscription amount forwarding, random forwarding, and the probability based forwarding mentioned in Section 4.3.
6.2 Parameters and Metrics
We use a group of virtual machines (VMs) in the
CloudStack testbed to evaluate the performance of SREM.
Each VM is running in an exclusive physical machine, and
we use 64 VMs as brokers. For each VM, it is equipped with
four processor cores, 8 GB memory, and is connected to
Gigabit Ethernet switches.
In SkipCloud, the number of brokers in each top cluster d
is set to 2. To ensure reliable connectivity of clusters, the
average degree of brokers in each cluster Dc is 6. In HParti-
tion, the entire subscription space consists of eight dimen-
sions, each of which has a range from 0 to 500. The range of
each dimension is cut into 10 segments by default. For each
hot spot, the range of each of its dimensions is cut into two segments iteratively.
In most experiments, 40,000 subscriptions are generated and dispatched to their corresponding brokers. The ranges of the predicates of subscriptions follow a normal distribution with a standard deviation of 50, represented by σsub. Besides, two million events are generated by all brokers. For the events, the value of each pair follows a uniform distribution over the entire range of the corresponding dimension. A subspace is labeled as a hot spot if its subscription amount is over 1,000. Besides, the size of the basic subscription subset N0 in Section 4.4.2 is set to 500, and the threshold value α in Section 4.4.3 is set to 0.7. The default parameters are shown in Table 3.
Recall that HPartition divides all dimensions into t indi-
vidual spaces Vi, each of which contains ki dimensions. We
set ki to be 1, 2, and 4, respectively. These three partitioning
policies are called HPartition-1, HPartition-2, and HParti-
tion-4, respectively. Note that HPartition-1 represents the
single-dimensional partitioning (mentioned in Section 4.1),
which is adopted by BlueDove [15]. The details of imple-
mented methods are shown in Table 4, where SREM-ki and
Chord-ki represent HPartition-ki under SkipCloud and
Chord, respectively. In the following experiments, we eval-
uate the performance of SkipCloud through comparing
SREM-ki with Chord-ki, and evaluate the performance of
HPartition through comparing HPartition-4 and HParti-
tion-2 with the single-dimensional partitioning in Blue-
Dove, i.e., HPartition-1. Besides, we do not use APartition
(mentioned in Section 4.1) in the experiments, which is
mainly because its fine-grained partitioning technique leads
to extremely high memory cost.
We evaluate the performance of SREM through a number
of metrics.

• Subscription searching size: The number of subscriptions that need to be searched on each broker for matching a message.
• Matching rate: The number of matched events per second. Suppose the first event is matched at the moment T1 and the last one is matched at the moment T2. Then the matching rate is Ne/(T2 − T1), where Ne is the number of matched events.
• Event loss rate: The percentage of lost events in a specified time period.
TABLE 4
List of Implemented Methods
Name Overlay Partitioning Technique
SREM-4 SkipCloud HPartition-4
SREM-2 SkipCloud HPartition-2
SREM-1 SkipCloud HPartition-1 (SPartition, BlueDove)
Chord-4 Chord HPartition-4
Chord-2 Chord HPartition-2
Chord-1 Chord HPartition-1 (SPartition, BlueDove)
TABLE 3
Default Parameters in SREM
Parameter: Nb, d, Dc, k, ki, Nseg
Value: 64, 2, 6, 8, {1, 2, 4}, 10
Parameter: N'seg, Nsub, σsub, Nhot, N0, α
Value: 2, 40K, 50, 1,000, 500, 0.7
6.3 Space Partitioning Policy
The space partitioning policy determines the number of subscriptions to be searched for matching a message. It fur-
ther affects the matching rate and workload allocation
among brokers. In this section, we test three different
space partitioning policies: HPartition-1, HPartition-2, and HPartition-4.
Fig. 6a shows the cumulative distribution function (CDF)
of subscription searching sizes on all brokers. Most subscrip-
tion searching sizes of HPartition-1 are smaller than the
threshold value of Nhot. In Fig. 6a, the percentages of "cold" subspaces in HPartition-4, HPartition-2, and HPartition-1 are 99.7, 95.1 and 83.9 percent, respectively. As HPartition-4 splits the entire space into more fine-grained subspaces, subscriptions fall into the same subspace only if they are interested in the same ranges of four dimensions. In contrast, since HPartition-1 or
HPartition-2 splits the entire space into less fine-grained sub-
spaces, subscriptions are dispatched to the same subspace
with a higher probability. Besides, the CDF of each approach
shows a sharp increase at the subscription searching size of
500, which is caused by SSPartition in Section 4.4.2.
Fig. 6b shows the average subscription searching size of
each broker. The maximal average subscription sizes of
HPartition-4, HPartition-2, and HPartition-1 are 437, 666
and 980, respectively. Their corresponding normalized stan-
dard deviations (standard deviation divided by the aver-
age) of the average subscription sizes of each approach are
0.026, 0.057 and 0.093, respectively. This indicates that brokers of HPartition-4 have smaller subscription searching sizes and better workload balance, which mainly lies in its more fine-
grained space partitioning.
In conclusion, as the partitioning granularity increases, HPartition shows better workload balance among brokers at the cost of higher memory overhead.
6.4 Forwarding Policy and Workload Balance
6.4.1 Impact of Message Forwarding Policy
The forwarding policy determines to which broker each message is forwarded, which significantly affects the workload balance among brokers. Fig. 7 shows the matching rates of three policies: probability based forwarding, least subscription amount forwarding, and random forwarding.
As shown in Fig. 7, the probability based forwarding policy has the highest matching rate in various scenarios. This mainly lies in its better trade-off between workload balance and searching latency. For instance, when we use HPartition-4 under SkipCloud, the matching rate of the probability based policy is 1.27× and 1.51× that of the random policy and the least subscription amount forwarding policy, respectively. When we use HPartition-4 under Chord, the corresponding gains of the probability based policy become 1.26× and 1.57×, respectively.
6.4.2 Workload Balance of Brokers
We compare the workload balance of brokers under different partitioning strategies and overlays. In this experiment, we use the probability based forwarding policy to dispatch events. One million events are forwarded and matched against 40,000 subscriptions. We use an index $\beta(N_b)$ [26] to evaluate the load balance among brokers, where $\beta(N_b) = (\sum_{i=1}^{N_b} Load_i)^2 / (N_b \sum_{i=1}^{N_b} Load_i^2)$ and $Load_i$ is the workload of broker $i$. The value of $\beta(N_b)$ lies between 0 and 1, and a higher value means better workload balance among brokers.
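The balance index can be computed directly from the per-broker workloads; a small Java sketch (assuming the workloads have already been collected into an array):

final class LoadBalanceIndex {
    // beta(Nb) = (sum of loads)^2 / (Nb * sum of squared loads),
    // a value in (0, 1]; 1 means perfectly balanced workloads.
    static double beta(double[] loads) {
        double sum = 0.0;
        double sumSquares = 0.0;
        for (double load : loads) {
            sum += load;
            sumSquares += load * load;
        }
        if (sumSquares == 0.0) {
            return 1.0; // no work done anywhere counts as balanced
        }
        return (sum * sum) / (loads.length * sumSquares);
    }
}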
Fig. 9a shows the distributions of the number of forwarded events among brokers. The number of forwarded events in SREM-ki is less than that in Chord-ki, because SkipCloud needs fewer routing hops than Chord. The corresponding values of β(Nb) of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 are 0.99, 0.96, 0.98, 0.99, 0.99 and 0.98, respectively. These values indicate quite balanced workloads of forwarding events among brokers. Besides, the number of forwarded events in SREM is at least 60 percent smaller than in Chord. Fig. 9b shows the distributions of matching rates among brokers. Their corresponding values of β(Nb) are 0.98, 0.99, 0.99, 0.99, 0.97 and 0.89, respectively. The balanced matching rates among brokers are mainly due to the fine-grained partitioning of HPartition and the probability based forwarding policy.
6.5 Scalability
In this section, we evaluate the scalability of all approaches by measuring how the matching rate changes with different values of Nb, Nsub, k and Nseg. In each experiment, only one parameter of Table 3 is changed to validate its impact.
We first change the number of brokers Nb. As shown in Fig. 10a, the matching rate of each approach increases linearly with the growth of Nb. As Nb increases from 4 to 64, the gains of the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 are 12.8×, 11.2×, 9.6×, 10.0×, 8.4× and 5.8×, respectively. Compared with Chord-ki, the higher increasing rate of SREM-ki is mainly caused by the fewer routing hops of SkipCloud. Besides, SREM-4 presents the highest matching rate for various values of Nb due to the fine-grained partitioning technique of HPartition and the fewer forwarding hops of SkipCloud.
Fig. 6. Distribution of subscription searching sizes.
Fig. 7. Forwarding policy.

We change the number of subscriptions Nsub in Fig. 10b. As Nsub increases from 40K to 80K, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 61.2, 58.7, 39.3, 51.5, 51.0 and 38.6 percent, respectively. Compared with SREM-1, each subscription in SREM-4 may fall into more subspaces due to its fine-grained partitioning, which leads to a higher increasing rate of the average subscription searching size in SREM-4. That is why the decreasing percentages of the matching rates in SREM-ki and Chord-ki increase with the growth of ki. In spite of the higher decreasing percentage, the matching rates of SREM-4 and Chord-4 are 27.4× and 33.7× that of SREM-1 and Chord-1, respectively, when Nsub equals 80,000.
We increase the number of dimensions k from 8 to 20 in Fig. 10c. The matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 83.4, 31.3, 25.7, 81.2, 44.0 and 15.0 percent, respectively. Similar to the phenomenon in Fig. 10b, the decreasing percentages of the matching rates in SREM-ki and Chord-ki increase with the growth of ki. This is because the fine-grained partitioning of HPartition-4 leads to faster growth of the average subscription searching size. When k equals 20, the matching rates of SREM-4 and Chord-4 are 8.7× and 9.4× that of SREM-1 and Chord-1, respectively.
We evaluate the impact of different numbers of segments Nseg in Fig. 10d. As Nseg increases from 4 to 10, the corresponding matching rates of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 increase by 82.7, 179.1, 44.7, 98.3, 286.1 and 48.4 percent, respectively. These numbers indicate that a bigger Nseg brings more fine-grained partitioning and a smaller average subscription searching size.
In conclusion, the matching rate of SREM increases linearly with the growing Nb, and SREM-4 presents the highest matching rate in various scenarios due to its more fine-grained partitioning technique and the fewer routing hops of SkipCloud.
6.6 Reliability
In this section, we evaluate the reliability of SREM-4 by testing its ability to recover from server failures. During the experiment, 64 brokers generate 120 million messages in total and dispatch them to their corresponding subspaces. After 10 seconds, a subset of the brokers is shut down simultaneously.
Fig. 11a shows how the event loss rates of SREM change when four, eight, 16 and 32 brokers fail. From the moment when brokers fail, the corresponding event loss rates increase to 3, 8, 18 and 37 percent respectively within 10 seconds, and drop back to 0 within 20 seconds. This recovery ability mainly lies in the quick failure detection of the peer sampling service mentioned in Section 3.1 and the quick reassignment of lost subscriptions by the well-known servers mentioned in Section 5.3. Note that the maximal event loss rate in each case is less than the percentage of lost brokers. This is because each top cluster of SkipCloud initially has two brokers, and an event will not be dropped if it is dispatched to an alive broker of its top cluster.
Fig. 11b shows how the matching rates change when a number of brokers fail. When a number of brokers fail at the moment of 10 seconds, there is an apparent drop of the matching rate in the following tens of seconds. Note that the dropping interval increases with the number of failed brokers, because the failure of more brokers leads to a higher latency of detecting these brokers and recovering the lost subscriptions. After the failed brokers are handled, the matching rate of each situation increases to a higher level. This indicates SREM can function normally even if a large number of brokers fail simultaneously.
In conclusion, the peer sampling service of SkipCloud ensures continuous matching service even if a large number of brokers fail simultaneously. By buffering subscriptions in the well-known servers, failed subscriptions can be dispatched to the corresponding alive brokers within tens of seconds.
Fig. 9. Workload balance. Fig. 10. Scalability.
Fig. 8. The impact of workload characteristics.
6.7 Workload Characteristics
In this section, we evaluate how workload characteristics affect the matching rate of each approach.
First, we evaluate the influence of different subscription distributions by changing the standard deviation σsub from 50 to 200. Fig. 8a shows that the matching rate of each approach decreases as σsub increases. As σsub increases, the predicates of each subscription occupy larger ranges of the corresponding dimensions, which causes the subscription to be dispatched to more subspaces.
We then evaluate the performance of each approach under different event distributions. In this experiment, the standard deviation of the event distribution σe changes from 50 to 200. Note that skewed events lead to two scenarios. One is that the events are skewed in the same way as the subscriptions; then the hot events coincide with the hot spots, which severely hurts the matching rate. The other is that the events are skewed in the opposite way from the subscriptions, which benefits the matching rate. Fig. 8b shows how the adversely skewed events affect the performance of each approach. As σe reduces from 200 to 50, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 79.4, 63.2, 47.0, 77.2, 59.1 and 52.6 percent, respectively. Although the matching rate of SREM-4 decreases greatly as σe reduces, it is still higher than that of the other approaches under different σe.
We evaluate the performance of each approach under different
combinations of dimensions (called subscription patterns) in Fig. 8c.
All experiments mentioned so far generate subscriptions from a single
subscription pattern, in which the predicate of each dimension follows
the normal distribution. Here we uniformly select a group of patterns
to generate subscriptions. For the dimensions that a subscription does
not constrain, its predicates cover the whole ranges of those
dimensions. Fig. 8c shows that the matching rate decreases with the
number of subscription patterns. As the number of subscription
patterns grows from 1 to 16, the matching rates of SREM-4, SREM-2,
SREM-1, Chord-4, Chord-2 and Chord-1 decrease by 86.7, 63.8, 2.3,
90.3, 65.9 and 10.3 percent, respectively. The sharp decline of the
matching rates of SREM-4 and Chord-4 is mainly caused by the rapid
increase of the average searching size. However, the matching rates of
SREM-4 and Chord-4 are still much higher than those of the other
approaches.
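A workload generator for this experiment only has to fill the unconstrained dimensions of a pattern with full-range predicates, as stated above. The fragment below sketches this under assumed parameter names; the normal-distribution width used for constrained dimensions is an illustrative choice, not the exact generator used in the paper.

    import java.util.Random;

    final class PatternGenerator {
        private final Random rnd = new Random();

        // ranges[i] is the upper bound of dimension i (lower bound 0);
        // pattern[i] says whether this subscription pattern constrains dimension i.
        // Returns one predicate [low, high] per dimension.
        double[][] generate(double[] ranges, boolean[] pattern, double sigma) {
            int k = ranges.length;
            double[][] predicates = new double[k][2];
            for (int i = 0; i < k; i++) {
                if (pattern[i]) {
                    // Constrained dimension: a range centred on a normally
                    // distributed value, clipped to the dimension's bounds.
                    double centre = ranges[i] / 2 + sigma * rnd.nextGaussian();
                    predicates[i][0] = Math.max(0, centre - sigma);
                    predicates[i][1] = Math.min(ranges[i], centre + sigma);
                } else {
                    // Unconstrained dimension: the predicate spans the whole range.
                    predicates[i][0] = 0;
                    predicates[i][1] = ranges[i];
                }
            }
            return predicates;
        }
    }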
In conclusion, the matching throughput of each approach
decreases greatly as the skewness of subscriptions or events
increases. Compared with other approaches, the fine-
grained partitioning of SREM-4 ensures higher matching
throughput under various parameter settings.
6.8 Memory Overhead
Storing subscriptions is the main memory overhead of each broker.
Since a subscription may fall into several subspaces that are managed
by the same broker, each broker only needs to store one real copy of
the subscription together with a group of its identifications, which
reduces the memory overhead. We use the average number of subscription
identifications that each broker stores, denoted by N_sid, as a
criterion. Recall that a predicate specifies a continuous range of a
dimension. We represent a continuous range by two numbers, each of
which occupies 8 bytes, and suppose that each subscription
identification uses 8 bytes. Thus, the maximal average memory overhead
of managing subscriptions on each broker is 16kN_sub + 8N_sid, where k
is the number of dimensions and N_sub is the total number of
subscriptions. As mentioned in Section 4.1, HPartition-k_i brings
large memory cost and partitioning latency as k_i increases. When we
test the performance of HPartition-8, the Java heap space runs out of
memory.
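The deduplicated layout and the 16kN_sub + 8N_sid accounting above can be captured in a few lines. The sketch below is illustrative rather than SREM's implementation; class and field names are made up, and the estimate uses the broker-local subscription count where the formula above uses the total N_sub as an upper bound.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Per-broker subscription store: the full subscription (k ranges of
    // 2 x 8 bytes) is kept once, and each subspace it falls into keeps
    // only an 8-byte subscription identification.
    final class SubscriptionStore {
        private final Map<Long, double[][]> subscriptions = new HashMap<>();
        private final Map<String, List<Long>> idsPerSubspace = new HashMap<>();

        void add(long subId, double[][] predicates, List<String> subspaceIds) {
            subscriptions.putIfAbsent(subId, predicates); // stored once per broker
            for (String s : subspaceIds) {
                idsPerSubspace.computeIfAbsent(s, key -> new ArrayList<>()).add(subId);
            }
        }

        // Estimated bytes used for subscriptions on this broker:
        // 16 bytes per dimension per stored subscription plus 8 bytes
        // per subscription identification.
        long estimatedBytes(int k) {
            long stored = subscriptions.size();
            long nSid = idsPerSubspace.values().stream()
                                       .mapToLong(list -> list.size()).sum();
            return 16L * k * stored + 8L * nSid;
        }
    }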
We first evaluate the average memory cost of brokers as N_sub changes
in Fig. 12a. The value of N_sid of each partitioning technique
increases linearly with the growth of N_sub. As N_sub increases from
40K to 100K, the value of N_sid in HPartition-4, HPartition-2 and
HPartition-1 grows by 143, 151 and 150 percent, respectively.
HPartition-4 presents a higher N_sid than the other approaches,
because the number of subspaces that a subscription falls into
increases as each subspace becomes smaller. In spite of the high N_sid
in HPartition-4, its maximal average overhead on each broker is only
20.2 MB when N_sub reaches 100K.
We then evaluate how the memory overhead of brokers changes with
different values of σ_sub. As σ_sub increases, each predicate of a
subscription spans a wider range, which leads to more memory overhead.
As σ_sub increases from 50 to 200, the value of N_sid in HPartition-4,
HPartition-2 and HPartition-1 grows by 86, 60 and 27 percent,
respectively. Unsurprisingly, N_sid in HPartition-4 still has the
largest value. When σ_sub equals 200, the maximal average memory
overhead of HPartition-4 is 6.2 MB, which is a small memory overhead
for each broker.
In conclusion, the memory overhead of HPartition-k_i increases slowly
with the growth of N_sub and σ_sub, as long as k_i is no more than 4.
Fig. 11. Reliability.
Fig. 12. Memory overhead.
7 RELATED WORK
A large body of work on broker-based pub/subs has been proposed in
recent years. One method is to organize brokers into a tree overlay,
such that events can be delivered to all relevant brokers without
duplicate transmissions. Besides, data replication schemes [27] are
employed to ensure
reliable event matching. For instance, Siena [3] advertises
subscriptions to the whole network. When receiving an event, each
broker decides whether to forward the event to the corresponding
brokers according to its routing table. Atmosphere [4] dynamically
identifies entourages of publishers and subscribers to transmit events
with low latency, which is appropriate for scenarios with a small
number of subscribers. As the number of subscribers increases, the
overlays constructed in Atmosphere probably exhibit latency similar to
that of Siena. To ensure reliable routing, Kazemzadeh and Jacobsen [5]
propose a d-fault-tolerance algorithm that handles concurrent crash
failures of up to d brokers, where brokers are required to maintain a
partial view of the tree that includes all brokers within distance
d + 1. The multi-hop routing techniques in these tree-based pub/subs
lead to a high routing latency. Besides, skewed subscriptions and
events lead to unbalanced workloads among brokers, which may severely
reduce the matching throughput. In contrast, SREM uses SkipCloud to
reduce the routing latency and HPartition to balance the workloads of
brokers.
Another method is to divide brokers into multiple clusters through
unstructured overlays, where the brokers in each cluster are connected
by reliable topologies. For instance, the brokers in Kyra [7] are
grouped into cliques based on their network proximity. Each clique
divides the whole content space into non-overlapping zones according
to the number of its brokers. After that, the brokers of different
cliques that are responsible for similar zones are connected by a
multicast tree, and events are forwarded through their corresponding
multicast trees. Sub-2-Sub [8] implements epidemic-based clustering to
partition all subscriptions into disjoint subspaces, and the nodes in
each subspace are organized into a bidirectional ring. Due to the long
delay of routing events in unstructured overlays, most of these
approaches are inadequate for scalable event matching. In contrast,
SREM organizes brokers through SkipCloud, which uses a prefix routing
technique to achieve low routing latency.
To reduce the number of routing hops, a number of methods organize
brokers through structured overlays, which commonly need Θ(log N) hops
to locate a broker. Subscriptions and events falling into the same
subspace are sent to and matched on a rendezvous broker. For instance,
PastryString [9] constructs a distributed index tree for each
dimension to support both numerical and string dimensions. The
resource discovery service proposed by Ranjan et al. [10] maps events
and subscriptions into d-dimensional indexes and hashes these indexes
onto a DHT network. To ensure a reliable pub/sub service, each broker
in Meghdoot [11] has a backup that is used when the primary broker
fails. Compared with these DHT-based approaches, SREM ensures smaller
forwarding latency through the prefix routing of SkipCloud, and higher
event matching reliability through the multiple brokers in each top
cluster of SkipCloud and the multiple candidate groups of HPartition.
Recently, a number of cloud providers have offered pub/sub services.
For instance, Move [14] provides highly available key-value storage
and matching based on one-hop lookup [17]. BlueDove [15] adopts a
single-dimensional partitioning technique to divide the entire space
and a performance-aware forwarding scheme to select a candidate
matcher for each event; its scalability is limited by this
coarse-grained clustering technique. SEMAS [16] proposes a
fine-grained partitioning technique to achieve a high matching rate.
However, this partitioning technique only provides one candidate for
each event and may lead to a large memory cost as the number of data
dimensions increases. In contrast, HPartition makes a better trade-off
between matching throughput and reliability through a flexible manner
of constructing the logical space.
8 CONCLUSIONS AND FUTURE WORK
This paper introduces SREM, a scalable and reliable event matching
service for content-based pub/sub systems in a cloud computing
environment. SREM connects brokers through a distributed overlay,
SkipCloud, which ensures reliable connectivity among brokers through
its multi-level clusters and achieves low routing latency through a
prefix routing algorithm. Through a hybrid multi-dimensional space
partitioning technique, SREM achieves scalable and balanced clustering
of high-dimensional skewed subscriptions, and each event can be
matched on any of its candidate servers. Extensive experiments with a
real deployment on a CloudStack testbed demonstrate that SREM is
effective and practical, and that it presents good workload balance,
scalability and reliability under various parameter settings.
Although our proposed event matching service can efficiently filter
out irrelevant users from a large volume of data, a number of problems
remain to be solved. Firstly, this paper does not provide elastic
resource provisioning strategies to obtain a good performance-price
ratio. We plan to design and implement elastic strategies that adjust
the scale of servers based on churning workloads. Secondly, SREM does
not guarantee that brokers disseminate large live content of various
data sizes to the corresponding subscribers in a real-time manner; for
the dissemination of bulk content, the upload capacity becomes the
main bottleneck. Based on our proposed event matching service, we will
consider utilizing a cloud-assisted technique to realize a general and
scalable data dissemination service for live content of various data
sizes.
ACKNOWLEDGMENTS
This work was supported by the National Grand Fundamental Research 973
Program of China (Grant No. 2011CB302601), the National Natural
Science Foundation of China (Grant No. 61379052), the National High
Technology Research and Development 863 Program of China (Grant
No. 2013AA01A213), the Natural Science Foundation for Distinguished
Young Scholars of Hunan Province (Grant No. 14JJ1026), and the
Specialized Research Fund for the Doctoral Program of Higher Education
(Grant No. 20124307110015).
Yijie Wang is the corresponding author.
REFERENCES
[1] Data per minute. (2014). [Online]. Available: http://www.domo.
com/blog/2012/06/how-much-data-is-created-every-minute/
[2] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida,
“Characterizing user behavior in online social networks,” in Proc.
9th ACM SIGCOMM Conf. Internet Meas. Conf., 2009, pp. 49–62.
[3] A. Carzaniga, “Architectures for an event notification service
scalable to wide-area networks,” Ph.D. dissertation, Ingegneria
Informatica e Automatica, Politecnico di Milano, Milan, Italy, 1998.
[4] P. Eugster and J. Stephen, “Universal cross-cloud communication,”
IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 103–116, 2014.
[5] R. S. Kazemzadeh and H.-A Jacobsen, “Reliable and highly avail-
able distributed publish/subscribe service,” in Proc. 28th IEEE Int.
Symp. Reliable Distrib. Syst., 2009, pp. 41–50.
[6] Y. Zhao and J. Wu, “Building a reliable and high-performance
content-based publish/subscribe system,” J. Parallel Distrib.
Comput., vol. 73, no. 4, pp. 371–382, 2013.
[7] F. Cao and J. P. Singh, “Efficient event routing in content-based
publish/subscribe service network,” in Proc. IEEE INFOCOM,
2004, pp. 929–940.
[8] S. Voulgaris, E. Riviere, A. Kermarrec, and M. Van Steen, “Sub-2-
sub: Self-organizing content-based publish and subscribe for
dynamic and large scale collaborative networks,” Res. Rep.
RR5772, INRIA, Rennes, France, 2005.
[9] I. Aekaterinidis and P. Triantafillou, “Pastrystrings: A comprehen-
sive content-based publish/subscribe DHT network,” in Proc.
IEEE Int. Conf. Distrib. Comput. Syst., 2006, pp. 23–32.
[10] R. Ranjan, L. Chan, A. Harwood, S. Karunasekera, and R. Buyya,
“Decentralised resource discovery service for large scale federated
grids,” in Proc. IEEE Int. Conf. e-Sci. Grid Comput., 2007, pp. 379–
387.
[11] A. Gupta, O. D. Sahin, D. Agrawal, and A. El Abbadi, “Meghdoot:
Content-based publish/subscribe over p2p networks,” in Proc. 5th
ACM/IFIP/USENIX Int. Conf. Middleware, 2004, pp. 254–273.
[12] X. Lu, H. Wang, J. Wang, J. Xu, and D. Li, “Internet-based virtual
computing environment: Beyond the data center as a computer,”
Future Gener. Comput. Syst., vol. 29, pp. 309–322, 2011.
[13] Y. Wang, X. Li, X. Li, and Y. Wang, “A survey of queries over
uncertain data,” Knowl. Inf. Syst., vol. 37, no. 3, pp. 485–530, 2013.
[14] W. Rao, L. Chen, P. Hui, and S. Tarkoma, “Move: A large scale
keyword-based content filtering and dissemination system,” in
Proc. IEEE 32nd Int. Conf. Distrib. Comput. Syst., 2012, pp. 445–454.
[15] M. Li, F. Ye, M. Kim, H. Chen, and H. Lei, “A scalable and elastic
publish/subscribe service,” in Proc. IEEE Int. Parallel Distrib. Pro-
cess. Symp., 2011, pp. 1254–1265.
[16] X. Ma, Y. Wang, Q. Qiu, W. Sun, and X. Pei, “Scalable and elastic
event matching for attribute-based publish/subscribe systems,”
Future Gener. Comput. Syst., vol. 36, pp. 102–119, 2013.
[17] A. Lakshman and P. Malik, “Cassandra: A decentralized struc-
tured storage system,” Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40,
2010.
[18] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis,
R. Vadali, S. Chen, and D. Borthakur, “Xoring elephants: Novel
erasure codes for big data,” in Proc. 39th Int. Conf. Very Large Data
Bases, 2013, pp. 325–336.
[19] S. Voulgaris, D. Gavidia, and M. van Steen, “Cyclon: Inexpensive
membership management for unstructured p2p overlays,”
J. Netw. Syst. Manage., vol. 13, no. 2, pp. 197–217, 2005.
[20] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and
J. D. Kubiatowicz, “Tapestry: A resilient global-scale overlay for
service deployment,” IEEE J. Sel. Areas Commun., vol. 22, no. 1,
pp. 41–53, Jan. 2004.
[21] Murmurhash. (2014). [Online]. Available: http://burtleburtle.
net/bob/hash/doobs.html
[22] A.-M. Kermarrec, L. Massoulié, and A. J. Ganesh, "Probabilistic
reliable dissemination in large-scale systems," IEEE Trans. Parallel
Distrib. Syst., vol. 14, no. 3, pp. 248–258, Mar. 2003.
[23] M. Jelasity, A. Montresor, and Ö. Babaoglu, "Gossip-based aggre-
gation in large dynamic networks," ACM Trans. Comput. Syst.,
vol. 23, no. 3, pp. 219–252, 2005.
[24] CloudStack. (2014). [Online]. Available: http://cloudstack.apache.org/
[25] ZeroC. (2014). [Online]. Available: http://www.zeroc.com/
[26] B. He, D. Sun, and D. P. Agrawal, “Diffusion based distributed
internet gateway load balancing in a wireless mesh network,” in
Proc. IEEE Global Telecommun. Conf., 2009, pp. 1–6.
[27] Y. Wang and S. Li, “Research and performance evaluation of data
replication technology in distributed storage systems,” Comput.
Math. Appl., vol. 51, no. 11, pp. 1625–1632, 2006.
Xingkong Ma received the BS degree in computer
science and technology from the School of Com-
puter of Shandong University, China, in 2007, and
the MS degree in computer science and technol-
ogy from National University of Defense Technol-
ogy, China, in 2009. He is currently working toward
the PhD degree at the School of Computer of
National University of Defense Technology. His
current research interests lie in the areas of data dissemination and
publish/subscribe systems. He is a student
member of the IEEE.
Yijie Wang received the PhD degree in computer
science and technology from the National Univer-
sity of Defense Technology, China, in 1998. She
is currently a professor at the Science and Tech-
nology on Parallel and Distributed Processing
Laboratory, National University of Defense Tech-
nology. Her research interests include network
computing, massive data processing, and parallel
and distributed processing. She is a member of
the IEEE.
Xiaoqiang Pei received the BS and MS degrees
in computer science and technology from
National University of Defense Technology,
China, in 2009 and 2011, respectively. He is cur-
rently working toward the PhD degree at the
School of Computer of National University of
Defense Technology. His current research inter-
ests lie in the areas of fault-tolerant storage and
cloud computing.
More Related Content

What's hot

IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONIEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONranjith kumar
 
IRJET- HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...
IRJET-  	  HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...IRJET-  	  HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...
IRJET- HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...IRJET Journal
 
A Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemA Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemZhenyun Zhuang
 
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksHybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksZhenyun Zhuang
 
Dynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesDynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesZhenyun Zhuang
 
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.IRJET Journal
 
Arun prjct dox
Arun prjct doxArun prjct dox
Arun prjct doxBaig Mirza
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeSumant Tambe
 
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...Robert Chang
 
iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissionIaetsd Iaetsd
 
Unit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed SystemsUnit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed SystemsNandakumar P
 
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...dbpublications
 
F233842
F233842F233842
F233842irjes
 
Data Dissemination in VANET: A Review
Data Dissemination in VANET: A ReviewData Dissemination in VANET: A Review
Data Dissemination in VANET: A ReviewIRJET Journal
 
A Review on Congestion Control Approaches for Real-Time Streaming Application...
A Review on Congestion Control Approaches for Real-Time Streaming Application...A Review on Congestion Control Approaches for Real-Time Streaming Application...
A Review on Congestion Control Approaches for Real-Time Streaming Application...IJCSIS Research Publications
 
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATION
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATIONROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATION
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATIONijwmn
 
Multicasting in Delay Tolerant Networks: Implementation and Performance Analysis
Multicasting in Delay Tolerant Networks: Implementation and Performance AnalysisMulticasting in Delay Tolerant Networks: Implementation and Performance Analysis
Multicasting in Delay Tolerant Networks: Implementation and Performance AnalysisNagendra Posani
 

What's hot (18)

IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONIEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
 
IRJET- HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...
IRJET-  	  HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...IRJET-  	  HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...
IRJET- HHH- A Hyped-up Handling of Hadoop based SAMR-MST for DDOS Attacks...
 
Mn3422372248
Mn3422372248Mn3422372248
Mn3422372248
 
A Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching ProblemA Distributed Approach to Solving Overlay Mismatching Problem
A Distributed Approach to Solving Overlay Mismatching Problem
 
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer NetworksHybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
 
Dynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer ArchitecturesDynamic Layer Management in Super-Peer Architectures
Dynamic Layer Management in Super-Peer Architectures
 
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.
Efficient Dynamic Load-Balance Flow Scheduling in cloud for Big Data Centers.
 
Arun prjct dox
Arun prjct doxArun prjct dox
Arun prjct dox
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
 
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...
Pournaghshband_End-to-End-Detection-of-Compression-of-Traffic-Flows-by-interm...
 
iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmission
 
Unit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed SystemsUnit 3 cs6601 Distributed Systems
Unit 3 cs6601 Distributed Systems
 
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
 
F233842
F233842F233842
F233842
 
Data Dissemination in VANET: A Review
Data Dissemination in VANET: A ReviewData Dissemination in VANET: A Review
Data Dissemination in VANET: A Review
 
A Review on Congestion Control Approaches for Real-Time Streaming Application...
A Review on Congestion Control Approaches for Real-Time Streaming Application...A Review on Congestion Control Approaches for Real-Time Streaming Application...
A Review on Congestion Control Approaches for Real-Time Streaming Application...
 
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATION
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATIONROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATION
ROUTING PROTOCOLS FOR DELAY TOLERANT NETWORKS: SURVEY AND PERFORMANCE EVALUATION
 
Multicasting in Delay Tolerant Networks: Implementation and Performance Analysis
Multicasting in Delay Tolerant Networks: Implementation and Performance AnalysisMulticasting in Delay Tolerant Networks: Implementation and Performance Analysis
Multicasting in Delay Tolerant Networks: Implementation and Performance Analysis
 

Viewers also liked

Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...
 Godatu.com - The One Stop for Dance Studio near you and Dancers across the W... Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...
Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...Ankit Pahuja
 
новости. 16 сентября день здоровья.
новости. 16 сентября   день здоровья.новости. 16 сентября   день здоровья.
новости. 16 сентября день здоровья.virtualtaganrog
 
5 verrassende presentatietips2016
5 verrassende presentatietips20165 verrassende presentatietips2016
5 verrassende presentatietips2016Jannie Jansen
 
Voorbeeld concreet leerdoel 2016
Voorbeeld concreet leerdoel 2016Voorbeeld concreet leerdoel 2016
Voorbeeld concreet leerdoel 2016Jannie Jansen
 
Keynote blended learning vives
Keynote blended learning vivesKeynote blended learning vives
Keynote blended learning vivesWilfredRubens.com
 
Political science part viii
Political science part viiiPolitical science part viii
Political science part viiiAlona Salva
 
A secure anti collusion data sharing scheme for dynamic groups in the cloud
A secure anti collusion data sharing scheme for dynamic groups in the cloudA secure anti collusion data sharing scheme for dynamic groups in the cloud
A secure anti collusion data sharing scheme for dynamic groups in the cloudNinad Samel
 

Viewers also liked (8)

Presentación adriana
Presentación adrianaPresentación adriana
Presentación adriana
 
Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...
 Godatu.com - The One Stop for Dance Studio near you and Dancers across the W... Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...
Godatu.com - The One Stop for Dance Studio near you and Dancers across the W...
 
новости. 16 сентября день здоровья.
новости. 16 сентября   день здоровья.новости. 16 сентября   день здоровья.
новости. 16 сентября день здоровья.
 
5 verrassende presentatietips2016
5 verrassende presentatietips20165 verrassende presentatietips2016
5 verrassende presentatietips2016
 
Voorbeeld concreet leerdoel 2016
Voorbeeld concreet leerdoel 2016Voorbeeld concreet leerdoel 2016
Voorbeeld concreet leerdoel 2016
 
Keynote blended learning vives
Keynote blended learning vivesKeynote blended learning vives
Keynote blended learning vives
 
Political science part viii
Political science part viiiPolitical science part viii
Political science part viii
 
A secure anti collusion data sharing scheme for dynamic groups in the cloud
A secure anti collusion data sharing scheme for dynamic groups in the cloudA secure anti collusion data sharing scheme for dynamic groups in the cloud
A secure anti collusion data sharing scheme for dynamic groups in the cloud
 

Similar to A scalable and reliable matching service for

A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...
A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...
A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...I3E Technologies
 
A scheme for maximal resource
A scheme for maximal resourceA scheme for maximal resource
A scheme for maximal resourceIJCNCJournal
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeYogeshIJTSRD
 
Classroom Shared Whiteboard System using Multicast Protocol
Classroom Shared Whiteboard System using Multicast ProtocolClassroom Shared Whiteboard System using Multicast Protocol
Classroom Shared Whiteboard System using Multicast Protocolijtsrd
 
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...Darshan Gorasiya
 
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...Ericsson
 
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...IOSR Journals
 
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...NAbderrahim
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
dynamic resource allocation using virtual machines for cloud computing enviro...
dynamic resource allocation using virtual machines for cloud computing enviro...dynamic resource allocation using virtual machines for cloud computing enviro...
dynamic resource allocation using virtual machines for cloud computing enviro...Kumar Goud
 
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...birdsking
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Dataneirew J
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Dataijccsa
 
A Component-Based Approach For Service Distribution In Sensor Networks
A Component-Based Approach For Service Distribution In Sensor NetworksA Component-Based Approach For Service Distribution In Sensor Networks
A Component-Based Approach For Service Distribution In Sensor NetworksKim Daniels
 
Cooperative Architectures and Algorithms for Discovery and ...
Cooperative Architectures and Algorithms for Discovery and ...Cooperative Architectures and Algorithms for Discovery and ...
Cooperative Architectures and Algorithms for Discovery and ...Videoguy
 
Edge device multi-unicasting for video streaming
Edge device multi-unicasting for video streamingEdge device multi-unicasting for video streaming
Edge device multi-unicasting for video streamingTal Lavian Ph.D.
 

Similar to A scalable and reliable matching service for (20)

A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...
A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...
A SCALABLE AND RELIABLE MATCHING SERVICE FOR CONTENT-BASED PUBLISH/SUBSCRIBE ...
 
A scheme for maximal resource
A scheme for maximal resourceA scheme for maximal resource
A scheme for maximal resource
 
A Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System UptimeA Study on Replication and Failover Cluster to Maximize System Uptime
A Study on Replication and Failover Cluster to Maximize System Uptime
 
Classroom Shared Whiteboard System using Multicast Protocol
Classroom Shared Whiteboard System using Multicast ProtocolClassroom Shared Whiteboard System using Multicast Protocol
Classroom Shared Whiteboard System using Multicast Protocol
 
B03410609
B03410609B03410609
B03410609
 
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...
Comparison of Open-Source Data Stream Processing Engines: Spark Streaming, Fl...
 
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...
Conference Paper: CHASE: Component High-Availability Scheduler in Cloud Compu...
 
Ax34298305
Ax34298305Ax34298305
Ax34298305
 
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...
Cataloging Of Sessions in Genuine Traffic by Packet Size Distribution and Ses...
 
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...
DLTSR_A_Deep_Learning_Framework_for_Recommendations_of_Long-Tail_Web_Services...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
dynamic resource allocation using virtual machines for cloud computing enviro...
dynamic resource allocation using virtual machines for cloud computing enviro...dynamic resource allocation using virtual machines for cloud computing enviro...
dynamic resource allocation using virtual machines for cloud computing enviro...
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
Ieee projects-2014-bulk-ieee-projects-2015-title-list-for-me-be-mphil-final-y...
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
A Component-Based Approach For Service Distribution In Sensor Networks
A Component-Based Approach For Service Distribution In Sensor NetworksA Component-Based Approach For Service Distribution In Sensor Networks
A Component-Based Approach For Service Distribution In Sensor Networks
 
Cooperative Architectures and Algorithms for Discovery and ...
Cooperative Architectures and Algorithms for Discovery and ...Cooperative Architectures and Algorithms for Discovery and ...
Cooperative Architectures and Algorithms for Discovery and ...
 
Edge device multi-unicasting for video streaming
Edge device multi-unicasting for video streamingEdge device multi-unicasting for video streaming
Edge device multi-unicasting for video streaming
 
A 01
A 01A 01
A 01
 

Recently uploaded

Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024samlnance
 
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson
 
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...wdefrd
 
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Servicesonnydelhi1992
 
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...anilsa9823
 
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad EscortsIslamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escortswdefrd
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboardthephillipta
 
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...anilsa9823
 
Gomti Nagar & High Profile Call Girls in Lucknow (Adult Only) 8923113531 Esc...
Gomti Nagar & High Profile Call Girls in Lucknow  (Adult Only) 8923113531 Esc...Gomti Nagar & High Profile Call Girls in Lucknow  (Adult Only) 8923113531 Esc...
Gomti Nagar & High Profile Call Girls in Lucknow (Adult Only) 8923113531 Esc...gurkirankumar98700
 
The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)thephillipta
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...akbard9823
 
Best Call girls in Lucknow - 9548086042 - with hotel room
Best Call girls in Lucknow - 9548086042 - with hotel roomBest Call girls in Lucknow - 9548086042 - with hotel room
Best Call girls in Lucknow - 9548086042 - with hotel roomdiscovermytutordmt
 
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Servicesonnydelhi1992
 
Lucknow 💋 Escort Service in Lucknow (Adult Only) 8923113531 Escort Service 2...
Lucknow 💋 Escort Service in Lucknow  (Adult Only) 8923113531 Escort Service 2...Lucknow 💋 Escort Service in Lucknow  (Adult Only) 8923113531 Escort Service 2...
Lucknow 💋 Escort Service in Lucknow (Adult Only) 8923113531 Escort Service 2...anilsa9823
 
Jeremy Casson - Top Tips for Pottery Wheel Throwing
Jeremy Casson - Top Tips for Pottery Wheel ThrowingJeremy Casson - Top Tips for Pottery Wheel Throwing
Jeremy Casson - Top Tips for Pottery Wheel ThrowingJeremy Casson
 
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...anilsa9823
 

Recently uploaded (20)

Indian Deira Call Girls # 0522916705 # Indian Call Girls In Deira Dubai || (UAE)
Indian Deira Call Girls # 0522916705 # Indian Call Girls In Deira Dubai || (UAE)Indian Deira Call Girls # 0522916705 # Indian Call Girls In Deira Dubai || (UAE)
Indian Deira Call Girls # 0522916705 # Indian Call Girls In Deira Dubai || (UAE)
 
Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024Deconstructing Gendered Language; Feminist World-Making 2024
Deconstructing Gendered Language; Feminist World-Making 2024
 
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
Jeremy Casson - How Painstaking Restoration Has Revealed the Beauty of an Imp...
 
Independent UAE Call Girls # 00971528675665 # Independent Call Girls In Dubai...
Independent UAE Call Girls # 00971528675665 # Independent Call Girls In Dubai...Independent UAE Call Girls # 00971528675665 # Independent Call Girls In Dubai...
Independent UAE Call Girls # 00971528675665 # Independent Call Girls In Dubai...
 
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
Islamabad Escorts # 03080115551 # Escorts in Islamabad || Call Girls in Islam...
 
RAJKOT CALL GIRL 76313*77252 CALL GIRL IN RAJKOT
RAJKOT CALL GIRL 76313*77252 CALL GIRL IN RAJKOTRAJKOT CALL GIRL 76313*77252 CALL GIRL IN RAJKOT
RAJKOT CALL GIRL 76313*77252 CALL GIRL IN RAJKOT
 
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Lajpat Nagar Delhi >༒9667401043 Escort Service
 
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
Lucknow 💋 Call Girls in Lucknow | Service-oriented sexy call girls 8923113531...
 
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad EscortsIslamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
Islamabad Call Girls # 03091665556 # Call Girls in Islamabad | Islamabad Escorts
 
Alex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson StoryboardAlex and Chloe by Daniel Johnson Storyboard
Alex and Chloe by Daniel Johnson Storyboard
 
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
Lucknow 💋 Call Girls in Lucknow ₹7.5k Pick Up & Drop With Cash Payment 892311...
 
Gomti Nagar & High Profile Call Girls in Lucknow (Adult Only) 8923113531 Esc...
Gomti Nagar & High Profile Call Girls in Lucknow  (Adult Only) 8923113531 Esc...Gomti Nagar & High Profile Call Girls in Lucknow  (Adult Only) 8923113531 Esc...
Gomti Nagar & High Profile Call Girls in Lucknow (Adult Only) 8923113531 Esc...
 
The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)The First Date by Daniel Johnson (Inspired By True Events)
The First Date by Daniel Johnson (Inspired By True Events)
 
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
Hazratganj / Call Girl in Lucknow - Phone 🫗 8923113531 ☛ Escorts Service at 6...
 
Best Call girls in Lucknow - 9548086042 - with hotel room
Best Call girls in Lucknow - 9548086042 - with hotel roomBest Call girls in Lucknow - 9548086042 - with hotel room
Best Call girls in Lucknow - 9548086042 - with hotel room
 
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort ServiceYoung⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
Young⚡Call Girls in Uttam Nagar Delhi >༒9667401043 Escort Service
 
Lucknow 💋 Escort Service in Lucknow (Adult Only) 8923113531 Escort Service 2...
Lucknow 💋 Escort Service in Lucknow  (Adult Only) 8923113531 Escort Service 2...Lucknow 💋 Escort Service in Lucknow  (Adult Only) 8923113531 Escort Service 2...
Lucknow 💋 Escort Service in Lucknow (Adult Only) 8923113531 Escort Service 2...
 
Pakistani Deira Call Girls # 00971589162217 # Pakistani Call Girls In Deira D...
Pakistani Deira Call Girls # 00971589162217 # Pakistani Call Girls In Deira D...Pakistani Deira Call Girls # 00971589162217 # Pakistani Call Girls In Deira D...
Pakistani Deira Call Girls # 00971589162217 # Pakistani Call Girls In Deira D...
 
Jeremy Casson - Top Tips for Pottery Wheel Throwing
Jeremy Casson - Top Tips for Pottery Wheel ThrowingJeremy Casson - Top Tips for Pottery Wheel Throwing
Jeremy Casson - Top Tips for Pottery Wheel Throwing
 
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
Lucknow 💋 Call Girls Service Lucknow ₹7.5k Pick Up & Drop With Cash Payment 8...
 

A scalable and reliable matching service for

  • 1. A Scalable and Reliable Matching Service for Content-Based Publish/Subscribe Systems Xingkong Ma, Student Member, IEEE, Yijie Wang, Member, IEEE, and Xiaoqiang Pei Abstract—Characterized by the increasing arrival rate of live content, the emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity of seamlessly expanding the system to massive size. However, most event matching services of existing pub/sub systems either lead to low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail. The cloud computing provides great opportunities for the requirements of complex computing and reliable communication. In this paper, we propose SREM, a scalable and reliable event matching service for content-based pub/sub systems in cloud computing environment. To achieve low routing latency and reliable links among servers, we propose a distributed overlay SkipCloud to organize servers of SREM. Through a hybrid space partitioning technique HPartition, large-scale skewed subscriptions are mapped into multiple subspaces, which ensures high matching throughput and provides multiple candidate servers for each event. Moreover, a series of dynamics maintenance mechanisms are extensively studied. To evaluate the performance of SREM, 64 servers are deployed and millions of live content items are tested in a CloudStack testbed. Under various parameter settings, the experimental results demonstrate that the traffic overhead of routing events in SkipCloud is at least 60 percent smaller than in Chord overlay, the matching rate in SREM is at least 3.7 times and at most 40.4 times larger than the single-dimensional partitioning technique of BlueDove. Besides, SREM enables the event loss rate to drop back to 0 in tens of seconds even if a large number of servers fail simultaneously. Index Terms—Publish/subscribe, event matching, overlay construction, content space partitioning, cloud computing Ç 1 INTRODUCTION BECAUSE of the importance in helping users to make real- time decisions, data dissemination has become dramati- cally significant in many large-scale emergency applications, such as earthquake monitoring, disaster weather warning, and status update in social networks. Recently, data dissemi- nation in these emergency applications presents a number of fresh trends. One is the rapid growth of live content. For instance, Facebook users publish over 600,000 pieces of con- tent and Twitter users send over 100,000 tweets on average per minute [1]. The other is the highly dynamic network environment. For instance, the measurement studies indi- cates that most users’ sessions in social networks only last several minutes [2]. In emergency scenarios, the sudden dis- asters like earthquake or bad weather may lead to the failure of a large number of users instantaneously. These characteristics require the data dissemination system to be scalable and reliable. Firstly, the system must be scalable to support the large amount of live con- tent. The key is to offer a scalable event matching service to filter out irrelevant users. Otherwise, the content may have to traverse a large number of uninterested users before they reach interested users. 
Secondly, with the dynamic network environment, it’s quite necessary to provide reliable schemes to keep continuous data dissem- ination capacity. Otherwise, the system interruption may cause the live content becomes obsolete content. Driven by these requirements, publish/subscribe (pub/ sub) pattern is widely used to disseminate data due to its flexibility, scalability, and efficient support of com- plex event processing. In pub/sub systems (pub/subs), a receiver (subscriber) registers its interest in the form of a subscription. Events are published by senders to the pub/ sub system. The system matches events against subscrip- tions and disseminates them to interested subscribers. In traditional data dissemination applications, the live content are generated by publishers at a low speed, which makes many pub/subs adopt the multi-hop routing techni- ques to disseminate events. A large body of broker-based pub/subs forward events and subscriptions through orga- nizing nodes into diverse distributed overlays, such as tree- based design [3], [4], [5], [6], cluster-based design [7], [8] and DHT-based design [9], [10], [11]. However, the multi- hop routing techniques in these broker-based systems lead to a low matching throughput, which is inadequate to apply to current high arrival rate of live content. Recently, cloud computing provides great opportunities for the applications of complex computing and high speed communication [12], [13], where the servers are connected by high speed networks, and have powerful computing and storage capacities. A number of pub/sub services based on the cloud computing environment have been proposed, such as Move [14], BlueDove [15] and SEMAS [16]. However, most of them can not completely meet the requirements of The authors are with the Science and Technology on Parallel and Distrib- uted Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, P.R. China. E-mail: {maxingkong, wangyijie, xiaoqiangpei}@nudt.edu.cn. Manuscript received 7 Mar. 2014; revised 22 May 2014; accepted 18 June 2014. Date of publication 28 July 2014; date of current version 11 Mar. 2015. Recommended for acceptance by I. Bojanova, R.C.H. Hua, O. Rana, and M. Parashar. For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2014.2338327 IEEE TRANSACTIONS ON CLOUD COMPUTING, VOL. 3, NO. 1, JANUARY-MARCH 2015 1 2168-7161 ß 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
  • 2. both scalability and reliability when matching large-scale live content under highly dynamic environments. This mainly stems from the following facts: 1) Most of them are inappropriate to the matching of live content with high data dimensionality due to the limitation of their subscription space partitioning techniques, which bring either low match- ing throughput or high memory overhead. 2) These systems adopt the one-hop lookup technique [17] among servers to reduce routing latency. In spite of its high efficiency, it requires each dispatching server to have the same view of matching servers. Otherwise, the subscriptions or events may be assigned to the wrong matching servers, which brings the availability problem in the face of current joining or crash of matching servers. A number of schemes can be used to keep the consistent view, like periodically sending heartbeat messages to dispatching servers or exchanging messages among matching servers. However, these extra schemes may bring a large traffic overhead or the interrup- tion of event matching service. Motivated by these factors, we propose a scalable and reliable matching service for content-based pub/sub service in cloud computing environments, called SREM. Specifi- cally, we mainly focus on two problems: one is how to orga- nize servers in the cloud computing environment to achieve scalable and reliable routing. The other is how to manage subscriptions and events to achieve parallel matching among these servers. Generally speaking, we provide the following contributions: We propose a distributed overlay protocol, called SkipCloud, to organize servers in the cloud comput- ing environment. SkipCloud enables subscriptions and events to be forwarded among brokers in a scal- able and reliable manner. Also it is easy to imple- ment and maintain. To achieve scalable and reliable event matching among multiple servers, we propose a hybrid multi- dimensional space partitioning technique, called HPartition. It allows similar subscriptions to be divided into the same server and provides multiple candidate matching servers for each event. More- over, it adaptively alleviates hot spots and keeps workload balance among all servers. We implement extensive experiments based on a CloudStack testbed to verify the performance of SREM under various parameter settings. The rest of this paper is organized as follows. Section 2 introduces content-based data model and an essential framework of SREM. Section 3 presents the SkipCloud overlay in detail. Section 4 describes HPartition in detail. Section 5 discusses the dynamics maintenance mechanisms of SREM. We evaluate the performance of SREM in Section 6. In Section 7, we review the related work on the matching of existing content-based pub/subs. Finally, we conclude the paper and outline our future work in Section 8. 2 DESIGN OF SREM 2.1 Content-Based Data Model SREM uses a multi-dimensional content-based data model. Consider our data model consists of k dimensions A1; A2; . . . ; Ak. Let Ri be the ordered set of all possible val- ues of Ai. So, V ¼ R1 Â R2 Â Á Á Á Â Rk is the entire content space. A subscription is a conjunction of predicates over one or more dimensions. Each predicate Pi specifies a continu- ous range for a dimension Ai, and it can be described by the tuple ðAi; vi; OiÞ, where vi 2 Ri and Oi represents a rela- tional operator ( ; ; 6¼; !; , etc). The general form of a subscription is S ¼ ^k i¼1Pi. An event is a point within the content space V. 
2.2 Overview of SREM

To support large-scale users, we consider a cloud computing environment with a set of geographically distributed datacenters connected through the Internet. Each datacenter contains a large number of servers (brokers), which are managed by a datacenter management service such as Amazon EC2 or OpenStack.

We illustrate a simple overview of SREM in Fig. 1. All brokers in SREM are exposed to the Internet as the front-end, and any subscriber and publisher can connect to them directly. To achieve reliable connectivity and low routing latency, these brokers are connected through a distributed overlay, called SkipCloud. The entire content space is partitioned into disjoint subspaces, each of which is managed by a number of brokers. Subscriptions and events are dispatched through SkipCloud to the subspaces that overlap with them. Thus, subscriptions and events falling into the same subspace are matched on the same broker. After the matching process completes, events are broadcast to the corresponding interested subscribers. As shown in Fig. 1, the subscriptions generated by subscribers S1 and S2 are dispatched to brokers B2 and B5, respectively. Upon receiving events from publishers, B2 and B5 send the matched events to S1 and S2, respectively.

Fig. 1. System framework.

One may argue that different datacenters could be responsible for subsets of the subscriptions according to geographical location, such that little collaboration among the servers would be needed [3], [4]. In this case, since the pub/sub system needs to find all the matched subscribers, each event has to be matched in all datacenters, which leads to a large traffic overhead as the number of datacenters and the arrival rate of live content increase. Besides, it is hard to achieve workload balance among the servers of all datacenters due to the various skewed distributions of users' interests.
Another question is why we need a distributed overlay like SkipCloud to ensure reliable logical connectivity in a datacenter environment, where servers are more stable than the peers in P2P networks. The reason is that as the number of servers in datacenters increases, node failure becomes the norm rather than a rare exception [18]. Node failure may lead to unreliable and inefficient routing among servers. To this end, we organize servers into SkipCloud to reduce the routing latency in a scalable and reliable manner.

Such a framework offers a number of advantages for real-time and reliable data dissemination. First, it allows the system to group similar subscriptions into the same broker in a timely manner thanks to the high bandwidth among brokers in the cloud computing environment, so that the local searching time can be greatly reduced. This is critical for reaching a high matching throughput. Second, since each subspace is managed by multiple brokers, the framework is fault-tolerant even if a large number of brokers crash instantaneously. Third, because the datacenter management service provides scalable and elastic servers, the system can easily be expanded to Internet scale.

3 SKIPCLOUD

3.1 Topology Construction

Generally speaking, SkipCloud organizes all brokers into levels of clusters. As shown in Fig. 2, the clusters at each level of SkipCloud can be treated as a partition of the whole broker set. Table 1 shows the key notations used in this section.

TABLE 1
Notations in SkipCloud
N_b: the number of brokers in SkipCloud
m: the number of levels in SkipCloud
D_c: the average degree in each cluster of SkipCloud
N_c: the number of top clusters in SkipCloud

At the top level, brokers are organized into multiple clusters whose topologies are complete graphs. Each cluster at this level is called a top cluster. It contains a leader broker, which generates a unique b-ary identifier of length m using a hash function (e.g., MD-5). This identifier is called the ClusterID. Correspondingly, each broker's identifier is a unique string of length m + 1 that shares a common prefix of length m with its ClusterID. At this level, brokers in the same cluster are responsible for the same content subspaces, which provides multiple matching candidates for each event. Since brokers in the same top cluster communicate frequently among themselves, e.g., when updating subscriptions and dispatching events, they are organized into a complete graph so that they can reach each other in one hop.

After the top clusters have been organized, the clusters at the remaining levels can be generated level by level. Specifically, each broker decides which cluster to join at every level. The brokers whose identifiers share a common prefix of length i join the same cluster at level i, and that common prefix is referred to as the ClusterID at level i. That is, the clusters at level i + 1 can be regarded as a b-partition of the clusters at level i, so the number of clusters shrinks as the level decreases. All brokers at level 0 join one single cluster, called the global cluster, whose ClusterID is the empty identifier. Therefore, there are b^i clusters at level i. Fig. 2 shows an example of how SkipCloud organizes eight brokers into three levels of clusters using binary identifiers.

Fig. 2. An example of SkipCloud with eight brokers and three levels. Each broker is identified by a binary string.
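As a rough illustration of the identifier scheme described above, the sketch below hashes a seed to an m-digit b-ary ClusterID and derives the level-i ClusterID as a length-i prefix; the use of MD5 follows the text, while the helper names and the seed choice are assumptions made for illustration.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    final class ClusterIds {
        // Hash a seed (e.g., the leader broker's address) to an m-digit base-b ClusterID.
        static String topClusterId(String seed, int b, int m) throws Exception {
            byte[] d = MessageDigest.getInstance("MD5").digest(seed.getBytes(StandardCharsets.UTF_8));
            StringBuilder id = new StringBuilder();
            for (int i = 0; i < m; i++) {
                id.append(Character.forDigit(Math.floorMod(d[i % d.length], b), b));
            }
            return id.toString();
        }

        // The ClusterID of the cluster a broker joins at level i is the length-i prefix of
        // its identifier; level 0 uses the empty prefix (the global cluster).
        static String clusterIdAtLevel(String brokerId, int level) {
            return brokerId.substring(0, level);
        }

        // Length of the common identifier prefix, the basic primitive of prefix routing.
        static int commonPrefixLength(String a, String c) {
            int n = Math.min(a.length(), c.length()), i = 0;
            while (i < n && a.charAt(i) == c.charAt(i)) i++;
            return i;
        }

        public static void main(String[] args) throws Exception {
            String cluster = topClusterId("broker-10.0.0.5:5000", 2, 3); // e.g., "101"
            String broker = cluster + "0";  // broker identifier: ClusterID plus one extra digit
            System.out.println(clusterIdAtLevel(broker, 2) + " " + commonPrefixLength(broker, "100"));
        }
    }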
To organize the clusters of the non-top levels, we employ a lightweight peer sampling protocol based on Cyclon [19], which provides robust connectivity within each cluster. Suppose there are m levels in SkipCloud. Each cluster runs Cyclon to keep reliable connectivity among the brokers in the cluster. Since each broker falls into one cluster at each level, it maintains m neighbor lists. The neighbor list of the cluster at level i samples an equal number of neighbors from each of the corresponding children clusters at level i + 1. This ensures that routing from the bottom level can always find a broker pointing to a higher level. Because brokers maintain neighbor lists of all levels, they update the neighbor list of one level and then the subsequent ones in a round-robin fashion to reduce the traffic cost. The pseudo-code of the view maintenance algorithm is shown in Algorithm 1. This topology of multi-level neighbor lists is similar to Tapestry [20]. Compared with Tapestry, SkipCloud uses multiple brokers in the top clusters as targets to ensure reliable routing.

Algorithm 1. Neighbor List Maintenance
Input: views: the neighbor lists; m: the total number of levels in SkipCloud; cycle: the current cycle.
1: j = cycle % (m + 1)
2: for each i in [0, m-1] do
3:   update views[i] by the peer sampling service based on Cyclon
4: for each i in [0, m-1] do
5:   if views[i] contains empty slots then
6:     fill these empty slots with items of other levels that share a common prefix of length i with the ClusterID of views[i]
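A minimal Java sketch of the maintenance cycle of Algorithm 1 is shown below. It assumes a peer-sampling hook (cyclonShuffle) whose network exchange is omitted, and simple list-based views; the field and method names are illustrative assumptions rather than the prototype's API.

    import java.util.ArrayList;
    import java.util.List;

    final class NeighborMaintenance {
        final List<List<String>> views; // views.get(i): neighbor identifiers at level i
        final int m;                    // number of levels
        final int viewSize;             // target number of neighbors per level
        final String selfId;            // this broker's identifier

        NeighborMaintenance(String selfId, int m, int viewSize) {
            this.selfId = selfId; this.m = m; this.viewSize = viewSize;
            this.views = new ArrayList<>();
            for (int i = 0; i < m; i++) views.add(new ArrayList<>());
        }

        // One Cyclon shuffle round at the given level (exchange with a random neighbor omitted).
        void cyclonShuffle(int level) { }

        static int commonPrefixLength(String a, String b) {
            int n = Math.min(a.length(), b.length()), i = 0;
            while (i < n && a.charAt(i) == b.charAt(i)) i++;
            return i;
        }

        // One maintenance cycle, mirroring Algorithm 1: refresh every level's view, then
        // repair under-full views with entries borrowed from other levels that share a
        // common prefix of length i with this broker's level-i ClusterID.
        void runCycle() {
            for (int i = 0; i < m; i++) cyclonShuffle(i);
            for (int i = 0; i < m; i++) {
                List<String> view = views.get(i);
                for (int j = 0; j < m && view.size() < viewSize; j++) {
                    if (j == i) continue;
                    for (String cand : views.get(j)) {
                        if (view.size() < viewSize && !view.contains(cand)
                                && commonPrefixLength(cand, selfId) >= i) {
                            view.add(cand);
                        }
                    }
                }
            }
        }
    }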
3.2 Prefix Routing

Prefix routing in SkipCloud is mainly used to efficiently route subscriptions and events to the top clusters. Note that the cluster identifiers at level i + 1 are generated by appending one b-ary digit to those of the corresponding clusters at level i. This relation between cluster identifiers is the foundation of routing to target clusters. Briefly, when receiving a routing request to a specific cluster, a broker examines its neighbor lists of all levels and chooses as the next hop the neighbor that shares the longest common prefix with the target ClusterID. The routing operation repeats until a broker cannot find a neighbor whose identifier is closer to the target than its own. Algorithm 2 describes the prefix routing algorithm in pseudo-code.

Algorithm 2. Prefix Routing
1: l = commonPrefixLength(self.ID, event.ClusterID)
2: if (l == m) then
3:   process(event); return
4: else
5:   destB ← the broker from self.views whose identifier matches event.ClusterID with the longest common prefix
6:   lmax = commonPrefixLength(destB.identifier, event.ClusterID)
7:   if (lmax ≤ l) then
8:     destB ← the broker from views[l] whose identifier is closest to event.ClusterID
9:     if (destB and myself are in the same cluster of level l) then
10:      process(event); return
11:  forwardTo(destB, event)

Since a neighbor list of the cluster at level i is a uniform sample of the corresponding children clusters at level i + 1, each broker can find a neighbor whose identifier matches at least one more digit of the common prefix with the target identifier before the target cluster is reached. Therefore, the prefix routing in Algorithm 2 guarantees that any top cluster can be reached in at most log_b N_c hops, where N_c is the number of top clusters. Assume the average degree of brokers in each cluster is D_c. Then each broker only needs to keep D_c · log_b N_c neighbors.
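The next-hop selection of Algorithm 2 can be sketched in Java as follows; it returns null when the request should be processed locally. The view layout, the base-2 identifier parsing, and the numeric "closest identifier" tie-break are illustrative assumptions rather than the exact rule used in SREM.

    import java.util.List;

    final class PrefixRouter {
        final String selfId;            // this broker's identifier
        final int m;                    // prefix length of top-cluster identifiers
        final List<List<String>> views; // views.get(i): neighbor identifiers at level i

        PrefixRouter(String selfId, int m, List<List<String>> views) {
            this.selfId = selfId; this.m = m; this.views = views;
        }

        static int commonPrefixLength(String a, String b) {
            int n = Math.min(a.length(), b.length()), i = 0;
            while (i < n && a.charAt(i) == b.charAt(i)) i++;
            return i;
        }

        // Illustrative closeness between identifiers (read here as binary integers).
        static long distance(String a, String b) {
            return Math.abs(Long.parseLong(a, 2) - Long.parseLong(b, 2));
        }

        // Returns the next hop for a request addressed to targetClusterId, or null
        // if this broker should process the request itself.
        String nextHop(String targetClusterId) {
            int l = commonPrefixLength(selfId, targetClusterId);
            if (l >= m) return null;                 // already inside the target top cluster

            // Neighbor with the longest common prefix over all levels (Algorithm 2, line 5).
            String best = null; int lmax = -1;
            for (List<String> view : views) {
                for (String nb : view) {
                    int c = commonPrefixLength(nb, targetClusterId);
                    if (c > lmax) { lmax = c; best = nb; }
                }
            }
            if (lmax > l) return best;               // strictly closer by prefix: forward

            // No neighbor improves the prefix: hand over to the identifier closest to the
            // target within the level-l view, or process locally if none is closer.
            String closest = selfId;
            for (String nb : views.get(l)) {
                if (distance(nb, targetClusterId) < distance(closest, targetClusterId)) closest = nb;
            }
            return closest.equals(selfId) ? null : closest;
        }
    }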
4 HPARTITION

In order to take advantage of multiple distributed brokers, SREM divides the entire content space among the top clusters of SkipCloud, so that each top cluster only handles a subset of the entire space and searches a small number of candidate subscriptions. SREM employs a hybrid multi-dimensional space partitioning technique, called HPartition, to achieve scalable and reliable event matching. Generally speaking, HPartition divides the entire content space into disjoint subspaces (Section 4.1). Subscriptions and events with overlapping subspaces are dispatched to and matched on the same top cluster of SkipCloud (Sections 4.2 and 4.3). To keep the workload balanced among servers, HPartition divides hot spots into multiple cold spots in an adaptive manner (Section 4.4). Table 2 shows the key notations used in this section.

TABLE 2
Notations in HPartition
V: the entire content space
k: the number of dimensions of V
A_i: dimension i, i ∈ [1, k]
R_i: the range of A_i
P_i: the predicate on A_i
N_seg: the number of segments on R_i
N'_seg: the number of segments of hot spots
α: the minimum size of the hot cluster
N_sub: the number of subscriptions
G_{x1···xk}: the subspace with SubspaceID x1···xk

4.1 Logical Space Construction

Our idea of logical space construction is inspired by existing single-dimensional space partitioning (called SPartition) and all-dimensional space partitioning (called APartition). Let k be the number of dimensions in the entire space V and R_i be the ordered set of values of dimension A_i. Each R_i is split into N_seg continuous and disjoint segments {R_i^j, j = 1, 2, ..., N_seg}, where j is the segment identifier (called the SegID) of dimension A_i.

The basic idea of SPartition, as used by BlueDove [15], is to treat each dimension as a separate space, as shown in Fig. 3a. Specifically, the range of each dimension is divided into N_seg segments, each of which is regarded as a separate subspace. Thus, the entire space V is divided into k · N_seg subspaces. Subscriptions and events falling into the same subspace are matched against each other. Due to the coarse-grained partitioning of SPartition, each subscription falls into a small number of subspaces, which brings multiple candidate brokers for each event and a low memory overhead. On the other hand, SPartition may easily form a hot spot if a large number of subscribers are interested in the same range of a dimension.

The idea of APartition, as used by SEMAS [16], is to treat the combination of all dimensions as a single space, as shown in Fig. 3b. Formally, the whole space V is partitioned into (N_seg)^k subspaces. Compared with SPartition, APartition leads to a smaller number of hot spots, since a hot subspace is formed only if all of its segments are subscribed by a large number of subscriptions. On the other hand, each event in APartition has only one candidate broker, and each subscription falls into more subspaces, which leads to a higher memory cost.

Fig. 3. Comparison among three different logical space construction techniques.

Inspired by both SPartition and APartition, HPartition provides a flexible manner of constructing the logical space. Its basic idea is to divide all dimensions into a number of groups and treat each group as a separate space, as shown in Fig. 3c. Formally, the k dimensions in HPartition are classified into t groups, each of which contains k_i dimensions, where ∑_{i=1}^{t} k_i = k. That is, V is split into t separate spaces {V_i, i ∈ [1, t]}. HPartition adopts APartition to divide each V_i. Thus, V is divided into ∑_{i=1}^{t} (N_seg)^{k_i} subspaces. For each space V_i, let G_{x1···xk}, x_i ∈ [0, N_seg], be one of its subspaces, where x1···xk is the concatenation of the SegIDs of the subspace, called the SubspaceID. For the dimensions that a subspace does not contain, the corresponding positions of the SubspaceID are set to 0.

Each subspace of HPartition is managed by a top cluster of SkipCloud. To dispatch a subspace G_{x1···xk} to its corresponding top cluster, HPartition uses a hash function like MurmurHash [21] to map the SubspaceID x1···xk to a b-ary identifier of length m, represented by H(x1···xk). According to the prefix routing in Algorithm 2, each subspace is forwarded to the top cluster whose ClusterID is nearest to H(x1···xk). Note that the brokers in the same top cluster are in charge of the same set of subspaces, which ensures the reliability of each subspace.
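For a rough sense of scale (illustrative arithmetic only, using the default settings of the later experiments, k = 8 dimensions and N_seg = 10 segments): SPartition yields k·N_seg = 80 subspaces, APartition yields (N_seg)^k = 10^8 subspaces, and HPartition with t = 2 groups of k_i = 4 dimensions yields 2·10^4 subspaces. The granularity thus trades memory cost against the number of hot spots and the number of candidate brokers per event.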
4.2 Subscription Installation

A user can specify a subscription by defining ranges over one or more dimensions. We treat S = ∧_{i=1}^{k} P_i as the general form of a subscription, where P_i is the predicate on dimension A_i. If S does not contain a dimension A_i, the corresponding P_i is the entire range of A_i.

When receiving a subscription S, the broker first obtains all subspaces G_{x1···xk} that overlap with S. Based on the logical space construction of HPartition, all dimensions are classified into t individual spaces V_j, j ∈ [1, t]. Each overlapping subspace G_{x1···xk} of V_i satisfies

x_i = s, if P_i ≠ R_i and R_i^s ∩ P_i ≠ ∅;   x_i = 0, if P_i = R_i.   (1)

As shown in Fig. 4, V consists of two groups, and each range is divided into two segments. For a subscription S1 = (20 ≤ A1 ≤ 80) ∧ (80 ≤ A2 ≤ 100) ∧ (105 ≤ A3 ≤ 170) ∧ (92 ≤ A4 ≤ 115), the matched subspaces are G1200, G1100 and G0022.

Fig. 4. An example of subscription assignment.

For the subscription S1, every matched subspace G_{x1···xk} is hashed to a b-ary identifier of length m, represented by H(x1···xk). Thus, S1 is forwarded to the top cluster whose ClusterID is nearest to H(x1···xk). When a broker in the top cluster receives the subscription S1, it broadcasts S1 to the other brokers in the same cluster, such that the top cluster provides reliable event matching and balanced workloads among its brokers.
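A simplified sketch of the subscription installation step is shown below, under the assumptions of equal-width segments per dimension and single-digit SegIDs; the helper names are illustrative, and the final hash is a stand-in for the MurmurHash mapping used in the paper.

    import java.util.ArrayList;
    import java.util.List;

    final class SubscriptionInstaller {
        final double rangeMin, rangeMax; // every dimension assumed to span [rangeMin, rangeMax]
        final int nSeg;                  // number of segments per dimension (assumed <= 9 here)

        SubscriptionInstaller(double rangeMin, double rangeMax, int nSeg) {
            this.rangeMin = rangeMin; this.rangeMax = rangeMax; this.nSeg = nSeg;
        }

        // SegIDs (1..nSeg) of the segments overlapping the predicate [low, high]; a predicate
        // covering the whole range yields SegID 0, the "uninterested" marker of Equation (1).
        List<Integer> overlappingSegments(double low, double high) {
            List<Integer> segs = new ArrayList<>();
            if (low <= rangeMin && high >= rangeMax) { segs.add(0); return segs; }
            double width = (rangeMax - rangeMin) / nSeg;
            int first = Math.max(0, (int) Math.floor((low - rangeMin) / width));
            int last = Math.min(nSeg - 1, (int) Math.floor((high - rangeMin) / width));
            for (int s = first; s <= last; s++) segs.add(s + 1);
            return segs;
        }

        // SubspaceIDs of one group of dimensions (positions of other groups would be fixed to 0),
        // built as the cross product of the per-dimension SegID lists.
        List<String> subspaceIds(double[] lows, double[] highs) {
            List<String> ids = new ArrayList<>();
            ids.add("");
            for (int d = 0; d < lows.length; d++) {
                List<String> next = new ArrayList<>();
                for (String prefix : ids)
                    for (int seg : overlappingSegments(lows[d], highs[d]))
                        next.add(prefix + seg);
                ids = next;
            }
            return ids;
        }

        // Map a SubspaceID to an m-digit base-b ClusterID (stand-in for MurmurHash).
        static String clusterIdFor(String subspaceId, int b, int m) {
            int h = subspaceId.hashCode();
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < m; i++) { sb.append(Character.forDigit(Math.floorMod(h, b), b)); h >>>= 3; }
            return sb.toString();
        }
    }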
4.3 Event Assignment and Forwarding Strategy

Upon receiving an event, a broker forwards the event to its corresponding subspaces. Specifically, consider an event e = ∧_{i=1}^{k} (A_i, v_i). Each subspace G_{x1···xk} that e falls into satisfies

x_i ∈ {0, s}, where v_i ∈ R_i^s.   (2)

For instance, let e be (A1, 30) ∧ (A2, 100) ∧ (A3, 80) ∧ (A4, 80). Based on the logical partitioning in Fig. 4, the matched subspaces of e are G1200 and G0011.

Recall that HPartition divides the whole space V into t separate spaces {V_i}, which means there are t candidate subspaces {G_i, 1 ≤ i ≤ t} for each event. Because of their different workloads, the strategy of selecting one candidate subspace for each event may greatly affect the matching rate. For an event e, suppose each candidate subspace G_i contains N_i subscriptions. The most intuitive approach is the least-first strategy, which chooses the subspace with the least number of candidate subscriptions. Correspondingly, its average searching subscription amount is N_least = min_{i∈[1,t]} N_i. However, when a large number of skewed events fall into the same subspace, this approach may lead to unbalanced workloads among brokers. Another option is the random strategy, which selects each candidate subspace with equal probability. It ensures balanced workloads among brokers, but the average searching subscription amount increases to N_avg = (1/t) ∑_{i=1}^{t} N_i.

In HPartition, we adopt a probability-based forwarding strategy to dispatch events. Formally, the probability of forwarding e to the space V_i is p_i = (1 − N_i / ∑_{j=1}^{t} N_j) / (t − 1). Thus, the average searching subscription amount is N_prob = (N − ∑_{i=1}^{t} N_i^2 / N) / (t − 1), where N = ∑_{i=1}^{t} N_i. Note that N_prob reaches its maximal value when every subspace has the same size, i.e., N_prob ≤ N_avg. Therefore, this approach has better workload balance than the least-first strategy and less searching latency than the random strategy.
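The probability-based choice among the t candidate subspaces reduces to sampling an index with probability p_i = (1 − N_i/∑N_j)/(t − 1); a small sketch follows, in which the subscription counts would in practice be reported by the brokers and the class name is an assumption.

    import java.util.Random;

    final class CandidateSelector {
        private final Random rnd = new Random();

        // counts[i] = N_i, the number of subscriptions in candidate subspace G_i.
        int choose(long[] counts) {
            int t = counts.length;
            if (t == 1) return 0;
            double sum = 0;
            for (long c : counts) sum += c;
            if (sum == 0) return rnd.nextInt(t);   // no load information: plain random choice

            double r = rnd.nextDouble(), acc = 0;
            for (int i = 0; i < t; i++) {
                double p = (1.0 - counts[i] / sum) / (t - 1); // lighter subspaces get higher probability
                acc += p;
                if (r < acc) return i;
            }
            return t - 1; // guard against floating-point rounding
        }
    }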
4.4 Hot Spots Alleviation

In real applications, a large number of subscriptions may fall into a small number of subspaces, which are called hot spots. Matching in hot spots leads to a high searching latency. To alleviate hot spots, an obvious approach is to add brokers to the top clusters containing them. However, this approach raises the total price of the service and wastes the resources of the clusters that only contain cold spots. To utilize the multiple distributed clusters, a better solution is to balance the workloads among clusters by partitioning and migrating hot spots. The gain of the partitioning technique is greatly affected by the distribution of the subscriptions in the hot spot. To this end, HPartition divides each hot spot into a number of cold spots through two partitioning techniques: hierarchical subspace partitioning, which targets hot spots whose subscriptions are diffused over the whole space, and subscription set partitioning, which targets hot spots whose subscriptions fall into a narrow space.

4.4.1 Hierarchical Subspace Partitioning

To alleviate hot spots whose subscriptions are diffused over the whole space, we propose a hierarchical subspace partitioning technique, called HSPartition. Its basic idea is to divide the space of a hot spot into a number of smaller subspaces and reassign its subscriptions to these newly generated subspaces.

Step one, each hot spot is divided along the ranges of its dimensions. Assume V_i contains k_i dimensions, and one of its hot spots is G_{x1···xk}. For each x_i ≠ 0, the corresponding range R_i^{x_i} is divided into N'_seg segments, each of which is denoted by R_i^{x_i y_i} (1 ≤ x_i ≤ N_seg, 1 ≤ y_i ≤ N'_seg). Therefore, G_{x1···xk} is divided into (N'_seg)^{k_i} subspaces, each of which is represented by G_{x1···xk y1···yk}.

Step two, the subscriptions in G_{x1···xk} are reassigned to these new subspaces according to the subscription installation in Section 4.2. Because each new subspace occupies only a small piece of the space of G_{x1···xk}, a subscription S of G_{x1···xk} may fall into only a part of these subspaces, which reduces the number of subscriptions that each event has to be matched against on the hot spot. HSPartition allows hot spots to be divided into much smaller subspaces iteratively. As shown in the top left corner of Fig. 5, the hot spot G1200 is divided into four smaller subspaces by HSPartition, and the maximum number of subscriptions in this space decreases from 9 to 4.

To avoid concurrent partitioning of one hot spot by multiple brokers in the same top cluster, the leader broker of each top cluster is responsible for periodically checking and partitioning hot spots. When a broker receives a subscription, it hierarchically divides the space of the subscription until each subspace cannot be divided further. Therefore, the leader broker in charge of dividing a hot spot G_{x1···xk} disseminates an update message consisting of a three-tuple {"Subspace Partition", x1···xk, N'_seg} to all the other brokers. Because the global cluster at level 0 of SkipCloud organizes all brokers into a random graph, we can utilize a gossip-based multicast method [22] to disseminate the update message to the other brokers in O(log N_b) hops, where N_b is the total number of brokers.

4.4.2 Subscription Set Partitioning

To alleviate hot spots whose subscriptions fall into a narrow space, we propose subscription set partitioning, called SSPartition. Its basic idea is to divide the subscription set of a hot spot into a number of subsets and scatter these subsets over multiple top clusters. Assume G_{x1···xk} is a hot spot and N_0 is the size of each subscription subset.

Step one, divide the subscription set of G_{x1···xk} into n subsets, where n = |G_{x1···xk}| / N_0. Each subset is assigned a new SubspaceID x1···xk-i (1 ≤ i ≤ |G_{x1···xk}| / N_0), and the subscription with identifier SubID is dispatched to the subset G_{x1···xk-i} if H(SubID) % n = i.

Step two, each subset G_{x1···xk-i} is forwarded to its corresponding top cluster with ClusterID H(x1···xk-i). Since these non-overlapping subsets are scattered over multiple top clusters, events falling into the hot spot can be matched in parallel on multiple brokers, which brings a much lower matching latency. Similar to the process of HSPartition, the leader broker responsible for dividing a hot spot G_{x1···xk} disseminates a three-tuple {"Subscription Partition", x1···xk, n} to all the other brokers to support further partitioning operations. As shown in the bottom right corner of Fig. 5, the hot spot G2100 is divided into three smaller subsets G2100-1, G2100-2 and G2100-3 by SSPartition. Correspondingly, the maximal number of subscriptions of the subspace decreases from 11 to 4.

Fig. 5. An example of dividing hot spots. Two hot spots, G1200 and G2100, are divided by HSPartition and SSPartition, respectively.
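The subset assignment of SSPartition amounts to a modulo over a hash of the subscription identifier; the tiny sketch below uses String.hashCode as an illustrative stand-in for the hash function H.

    final class SSPartitionAssigner {
        // Number of subsets for a hot spot holding hotSpotSize subscriptions, with N0 the
        // basic subset size from Section 4.4.2.
        static int subsetCount(int hotSpotSize, int n0) {
            return Math.max(1, hotSpotSize / n0);
        }

        // New SubspaceID of the subset a subscription is dispatched to: the i-th subset of
        // the hot spot x1...xk is identified by "x1...xk-i".
        static String subsetSubspaceId(String hotSpotId, String subId, int n) {
            int i = Math.floorMod(subId.hashCode(), n) + 1; // stand-in for H(SubID) % n
            return hotSpotId + "-" + i;
        }
    }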
4.4.3 Adaptive Selection Algorithm

Because of the diverse distributions of subscriptions, HSPartition and SSPartition cannot replace each other. On one hand, HSPartition is attractive for dividing hot spots whose subscriptions are scattered uniformly over their regions. However, it is inappropriate for hot spots whose subscriptions all appear at the same exact point. On the other hand, SSPartition can divide any kind of hot spot into multiple subsets even if all subscriptions fall into a single point. However, compared with HSPartition, it has to dispatch each event to multiple subspaces, which brings a higher traffic overhead.

To achieve balanced workloads among brokers, we propose an adaptive selection algorithm that selects either HSPartition or SSPartition to alleviate a hot spot. The selection is based on the similarity of the subscriptions in the hot spot. Specifically, assume G_{x1···xk y1···yk} is the subspace with the maximal number of subscriptions under HSPartition, and let α be a threshold value that represents the similarity degree of the subscriptions' spaces in a hot spot. We choose HSPartition as the partitioning algorithm if |G_{x1···xk y1···yk}| / |G_{x1···xk}| ≤ α; otherwise, we choose SSPartition. By combining both partitioning techniques, this selection algorithm can alleviate hot spots in an adaptive manner.
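The adaptive choice between the two techniques is a single threshold test on how concentrated the hot spot's subscriptions are; a sketch follows, assuming the size of the largest child subspace of a trial hierarchical split is known.

    enum PartitionChoice { HSPARTITION, SSPARTITION }

    final class AdaptiveSelector {
        final double alpha; // similarity threshold of Section 4.4.3 (0.7 in the experiments)

        AdaptiveSelector(double alpha) { this.alpha = alpha; }

        // maxChildSize: subscriptions in the largest subspace a hierarchical split would
        // produce; hotSpotSize: subscriptions in the hot spot itself.
        PartitionChoice choose(int maxChildSize, int hotSpotSize) {
            // Diffuse subscriptions (no child keeps more than an alpha fraction): split the
            // space with HSPartition; otherwise scatter the subscription set with SSPartition.
            return ((double) maxChildSize / hotSpotSize) <= alpha
                    ? PartitionChoice.HSPARTITION
                    : PartitionChoice.SSPARTITION;
        }
    }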
4.5 Performance Analysis

4.5.1 The Average Searching Size

Since each event is matched in one of its candidate subspaces, the average number of subscriptions over all subspaces, called the average searching size, is critical for reducing the matching latency. We give a formal analysis as follows.

Theorem 1. Suppose each predicate of a subscription covers a fraction f of the range of its dimension. Then the average searching size N_prob in HPartition is not more than (N_sub / t) ∑_{i=1}^{t} f^{k_i} as N_seg → ∞, where t is the number of groups in the entire space, k_i is the number of dimensions in each group, and N_sub is the number of subscriptions.

Proof. For each dimension A_i, i ∈ [1, k], the range of A_i is R_i, and the length of the corresponding predicate P_i is f·‖R_i‖. Since each dimension is divided into N_seg segments, the length of each segment is ‖R_i‖ / N_seg. Thus, the expected number of segments that P_i falls into is ⌈f·N_seg⌉. A space V_i contains (N_seg)^{k_i} subspaces, so the average searching size in V_i is N_i = (⌈f·N_seg⌉ / N_seg)^{k_i} N_sub. When N_seg → ∞, we have ⌈f·N_seg⌉ / N_seg → f, and N_i = f^{k_i} N_sub. According to the probability-based forwarding strategy in Section 4.3, the average searching size is N_prob ≤ (1/t) ∑_{i=1}^{t} N_i. That is, N_prob ≤ (N_sub / t) ∑_{i=1}^{t} f^{k_i}. □

According to Theorem 1, the average searching size N_prob decreases as f decreases; that is, a smaller f brings less matching time in HPartition. Fortunately, the subscription distribution in real-world applications is often skewed, and most predicates occupy small ranges, which guarantees a small average searching size. Besides, note that (N_sub / t) ∑_{i=1}^{t} f^{k_i} reaches its minimal value N_sub · f^{k/t} if k_i = k_j for all i, j ∈ [1, t]. This indicates that the upper bound of N_prob can further decrease to N_sub · f^{k/t} if every group has the same number of dimensions.

4.5.2 Event Matching Reliability

To achieve reliable event matching, each event should have multiple candidates in the system. In this section, we give a formal analysis of the event matching availability.

Theorem 2. SREM provides event matching availability in the face of concurrent crash failures of up to d·N_top(1 − e^{−t/N_top}) − 1 brokers, where d is the number of brokers in each top cluster of SkipCloud, N_top is the number of top clusters, and t is the number of groups in HPartition.

Proof. Based on HPartition, each event has t candidate subspaces, which are diffused over the N_top top clusters. When m balls are distributed at random over n boxes, a given box stays empty with probability (1 − 1/n)^m ≈ e^{−m/n}, so the expected number of empty boxes is n·e^{−m/n}. Similarly, when t candidate subspaces are distributed at random over N_top top clusters, the expected number of non-empty top clusters is N_top(1 − e^{−t/N_top}). Since each top cluster contains d brokers that manage the same set of subspaces, the expected number of non-empty brokers for each event is d·N_top(1 − e^{−t/N_top}). Thus, SREM ensures available event matching in the face of concurrent crash failures of up to d·N_top(1 − e^{−t/N_top}) − 1 brokers. □

According to Theorem 2, the event matching availability of SREM is affected by d and t. That is, both SkipCloud and HPartition provide flexible schemes to ensure reliable event matching.

5 PEER MANAGEMENT

In SREM, there are mainly three roles: clients, brokers, and clusters. Brokers are responsible for managing all of them. Since the joining or leaving of these roles may lead to inefficient and unreliable data dissemination, we discuss the dynamics maintenance mechanisms used by brokers in this section.

5.1 Subscriber Dynamics

To detect the status of subscribers, each subscriber establishes an affinity with a broker (called its home broker) and periodically sends its subscription as a heartbeat message to the home broker. The home broker maintains a timer for every buffered subscription. If the broker has not received a heartbeat message from a subscriber for more than T_out time, the subscriber is assumed to be offline. The home broker then removes this subscription from its buffer and notifies the brokers containing the failed subscription to remove it.
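Subscriber liveness tracking at a home broker comes down to refreshing a per-subscriber timestamp on every heartbeat and expiring stale entries; the sketch below keeps the state in an in-memory map and treats T_out as milliseconds, both illustrative assumptions.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class HomeBroker {
        private final long timeoutMillis; // T_out
        private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

        HomeBroker(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

        // Called whenever a subscriber re-sends its subscription as a heartbeat.
        void onHeartbeat(String subscriberId) {
            lastHeartbeat.put(subscriberId, System.currentTimeMillis());
        }

        // Called periodically; returns the subscribers now considered offline so that the
        // brokers holding their subscriptions can be told to remove them.
        List<String> expireOffline() {
            long now = System.currentTimeMillis();
            List<String> offline = new ArrayList<>();
            for (Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator(); it.hasNext(); ) {
                Map.Entry<String, Long> e = it.next();
                if (now - e.getValue() > timeoutMillis) { offline.add(e.getKey()); it.remove(); }
            }
            return offline;
        }
    }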
5.2 Broker Dynamics

Broker dynamics may lead to new brokers joining or old brokers leaving. In this section, we mainly consider brokers joining or leaving existing clusters, rather than changes in the number of clusters.

When a new broker is created by the datacenter management service, it first sends a "Broker Join" message to the leader broker of its top cluster. The leader broker returns its top cluster identifier, its neighbor lists at all levels of SkipCloud, and all of its subspaces together with the corresponding subscriptions. The new broker generates its own identifier by appending one b-ary digit to the top cluster identifier and takes the received items of each level as its initial neighbors.

There is no particular mechanism to handle broker departure from a cluster. In a top cluster, the leader broker can easily monitor the status of the other brokers. For the clusters at the remaining levels, the sampling service guarantees that the older items of each neighbor list are the first to be replaced by fresh ones during the view shuffling operation, so failed brokers are removed from the system quickly. From the perspective of event matching, all brokers in the same top cluster hold the same subspaces of subscriptions, which means that broker failures do not interrupt the event matching operation as long as at least one broker is alive in each cluster.

5.3 Cluster Dynamics

Broker dynamics may also lead to new clusters joining or old clusters leaving. Since each subspace is managed by the top cluster whose identifier is closest to that of the subspace, it is necessary to adaptively migrate a number of subspaces from old clusters to newly joining clusters. Specifically, the leader broker of a new cluster delivers its top ClusterID in a "Cluster Join" message to the other clusters. The leader brokers of all other clusters find the subspaces whose identifiers are closer to the new ClusterID than to their own cluster identifiers, and migrate them to the new cluster.

Since each subspace is stored in one cluster, a cluster departure incurs subscription loss. The peer sampling service of SkipCloud can be used to detect failed clusters. To recover the lost subscriptions, a simple method is to redirect them through their owners' heartbeat messages. Due to the unreliable links between subscribers and brokers, however, this approach may lead to a long repair latency.
To this end, we store all subscriptions on a number of well-known servers in the datacenters. When these servers learn of failed clusters, they dispatch the subscriptions of the failed clusters to the corresponding live clusters.

Besides, the number of levels m in SREM is adjusted adaptively with the change of the broker size N_b to ensure m = ⌈log_b(N_b)⌉, where b is the number of children clusters of each non-top cluster. Each leader broker runs a gossip-based aggregation service [23] in the global cluster to estimate the total number of brokers. When the estimated broker size is large enough (or small enough) to change m, each leader broker notifies the other brokers to update their neighbor lists through a gossip-based multicast, initiating the multicast with a probability inversely proportional to the estimated broker size. If several leader brokers disseminate "Update Neighbor Lists" messages simultaneously, an earlier message is stopped when it collides with a later one, which ensures that all leader brokers obtain the same view of m. When m decreases, each broker directly removes its neighbor list of the top level. When m increases, a new top level is initialized by filling in appropriate neighbors from the neighbor lists of the other levels. Due to the logarithmic relation between m and N_b, only a significant change of N_b can change m, so the level dynamics bring quite a low traffic cost.

6 EXPERIMENT

6.1 Implementation

To take advantage of the reliable links and high bandwidth among servers in the cloud computing environment, we choose the CloudStack [24] testbed to design and implement our prototype. To keep the prototype modular and portable, we use ICE [25] as the fundamental communication platform. ICE provides a communication solution that is easy to program with and allows developers to focus only on their application logic. Based on ICE, we add about 11,000 lines of Java code.

To evaluate the performance of SkipCloud, we implement both SkipCloud and Chord to forward subscriptions and messages. To evaluate the performance of HPartition, the prototype supports different space partitioning policies. Moreover, the prototype provides three different message forwarding strategies, i.e., least subscription amount forwarding, random forwarding, and the probability-based forwarding described in Section 4.3.

6.2 Parameters and Metrics

We use a group of virtual machines (VMs) in the CloudStack testbed to evaluate the performance of SREM. Each VM runs on an exclusive physical machine, and we use 64 VMs as brokers. Each VM is equipped with four processor cores and 8 GB of memory, and is connected to Gigabit Ethernet switches. In SkipCloud, the number of brokers in each top cluster, d, is set to 2. To ensure reliable connectivity of clusters, the average degree of brokers in each cluster, D_c, is 6. In HPartition, the entire subscription space consists of eight dimensions, each of which has a range from 0 to 500. The range of each dimension is cut into 10 segments by default. For each hot spot, the range of every dimension is cut into two segments iteratively. In most experiments, 40,000 subscriptions are generated and dispatched to their corresponding brokers. The ranges of the predicates of subscriptions follow a normal distribution with a standard deviation of 50, denoted by σ_sub. Besides, two million events are generated by all brokers. For the events, the value of each pair follows a uniform distribution over the entire range of the corresponding dimension. A subspace is labeled as a hot spot if its subscription amount is over 1,000.
Besides, the size of the basic subscription subset N_0 in Section 4.4.2 is set to 500, and the threshold value α in Section 4.4.3 is set to 0.7. The default parameters are shown in Table 3.

TABLE 3
Default Parameters in SREM
N_b = 64, d = 2, D_c = 6, k = 8, k_i ∈ {1, 2, 4}, N_seg = 10
N'_seg = 2, N_sub = 40K, σ_sub = 50, N_hot = 1,000, N_0 = 500, α = 0.7

Recall that HPartition divides all dimensions into t individual spaces V_i, each of which contains k_i dimensions. We set k_i to 1, 2, and 4, respectively. These three partitioning policies are called HPartition-1, HPartition-2, and HPartition-4, respectively. Note that HPartition-1 represents the single-dimensional partitioning (mentioned in Section 4.1) adopted by BlueDove [15]. The implemented methods are listed in Table 4, where SREM-k_i and Chord-k_i represent HPartition-k_i over SkipCloud and Chord, respectively. In the following experiments, we evaluate the performance of SkipCloud by comparing SREM-k_i with Chord-k_i, and evaluate the performance of HPartition by comparing HPartition-4 and HPartition-2 with the single-dimensional partitioning of BlueDove, i.e., HPartition-1. We do not use APartition (mentioned in Section 4.1) in the experiments, mainly because its fine-grained partitioning technique leads to an extremely high memory cost.

TABLE 4
List of Implemented Methods
SREM-4: SkipCloud overlay, HPartition-4
SREM-2: SkipCloud overlay, HPartition-2
SREM-1: SkipCloud overlay, HPartition-1 (SPartition, BlueDove)
Chord-4: Chord overlay, HPartition-4
Chord-2: Chord overlay, HPartition-2
Chord-1: Chord overlay, HPartition-1 (SPartition, BlueDove)

We evaluate the performance of SREM through a number of metrics.

Subscription searching size: the number of subscriptions that need to be searched on each broker for matching a message.

Matching rate: the number of matched events per second. Suppose the first event is matched at moment T_1 and the last one is matched at moment T_2; then the matching rate is N_e / (T_2 − T_1).

Event loss rate: the percentage of lost events in a specified time period.
6.3 Space Partitioning Policy

The space partitioning policy determines the number of subscriptions searched to match a message. It further affects the matching rate and the workload allocation among brokers. In this section, we test three different space partitioning policies: HPartition-1, HPartition-2, and HPartition-4.

Fig. 6a shows the cumulative distribution function (CDF) of the subscription searching sizes on all brokers. Most subscription searching sizes of HPartition-1 are smaller than the threshold value N_hot. In Fig. 6a, the percentage of "cold" subspaces in HPartition-4, HPartition-2, and HPartition-1 is 99.7, 95.1 and 83.9 percent, respectively. As HPartition-4 splits the entire space into more fine-grained subspaces, subscriptions fall into the same subspace only if they are interested in the same ranges of four dimensions. In contrast, since HPartition-1 and HPartition-2 split the entire space into less fine-grained subspaces, subscriptions are dispatched to the same subspace with a higher probability. Besides, the CDF of each approach shows a sharp increase at a subscription searching size of 500, which is caused by the SSPartition of Section 4.4.2.

Fig. 6. Distribution of subscription searching sizes.

Fig. 6b shows the average subscription searching size of each broker. The maximal average subscription searching sizes of HPartition-4, HPartition-2, and HPartition-1 are 437, 666 and 980, respectively. The corresponding normalized standard deviations (standard deviation divided by the average) are 0.026, 0.057 and 0.093, respectively. This indicates that the brokers of HPartition-4 have a smaller subscription searching size and better workload balance, which mainly lies in its more fine-grained space partitioning. In conclusion, as the partitioning granularity increases, HPartition shows better workload balance among brokers at the cost of a growing memory cost.

6.4 Forwarding Policy and Workload Balance

6.4.1 Impact of Message Forwarding Policy

The forwarding policy determines to which broker each message is forwarded, which significantly affects the workload balance among brokers. Fig. 7 shows the matching rates of three policies: probability-based forwarding, least subscription amount forwarding, and random forwarding.

Fig. 7. Forwarding policy.

As shown in Fig. 7, the probability-based forwarding policy has the highest matching rate in various scenarios. This mainly lies in its better trade-off between workload balance and searching latency. For instance, when we use HPartition-4 over SkipCloud, the matching rate of the probability-based policy is 1.27× and 1.51× that of the random policy and the least subscription amount forwarding policy, respectively. When we use HPartition-4 over Chord, the corresponding gains of the probability-based policy become 1.26× and 1.57×, respectively.

6.4.2 Workload Balance of Brokers

We compare the workload balance of brokers under different partitioning strategies and overlays. In this experiment, we use the probability-based forwarding policy to dispatch events. One million events are forwarded and matched against 40,000 subscriptions. We use the index β(N_b) [26] to evaluate the load balance among brokers, where β(N_b) = (∑_{i=1}^{N_b} Load_i)^2 / (N_b ∑_{i=1}^{N_b} Load_i^2) and Load_i is the workload of broker i. The value of β(N_b) is between 0 and 1, and a higher value means better workload balance among brokers.
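The load-balance index is straightforward to compute from the per-broker workloads; a short sketch with two illustrative inputs follows.

    final class LoadBalanceIndex {
        // beta(N_b) = (sum of loads)^2 / (N_b * sum of squared loads); 1 means perfect balance.
        static double beta(double[] loads) {
            double sum = 0, sumSq = 0;
            for (double load : loads) { sum += load; sumSq += load * load; }
            if (sumSq == 0) return 1.0; // no work anywhere counts as balanced
            return (sum * sum) / (loads.length * sumSq);
        }

        public static void main(String[] args) {
            System.out.println(beta(new double[]{100, 100, 100, 100})); // 1.0
            System.out.println(beta(new double[]{400, 0, 0, 0}));       // 0.25
        }
    }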
Fig. 9a shows the distributions of the number of forwarded events among brokers. The number of forwarded events in SREM-k_i is smaller than in Chord-k_i, because SkipCloud requires fewer routing hops than Chord. The corresponding values of β(N_b) for SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 are 0.99, 0.96, 0.98, 0.99, 0.99 and 0.98, respectively. These values indicate quite balanced workloads of forwarding events among brokers. Besides, the number of forwarded events in SREM is at least 60 percent smaller than in Chord. Fig. 9b shows the distributions of matching rates among brokers. The corresponding values of β(N_b) are 0.98, 0.99, 0.99, 0.99, 0.97 and 0.89, respectively. The balanced matching rates among brokers are mainly due to the fine-grained partitioning of HPartition and the probability-based forwarding policy.

Fig. 9. Workload balance.

6.5 Scalability

In this section, we evaluate the scalability of all approaches by measuring how the matching rate changes with different values of N_b, N_sub, k and N_seg. In each experiment, only one parameter of Table 3 is changed to validate its impact.

We first change the number of brokers N_b. As shown in Fig. 10a, the matching rate of each approach increases linearly with the growth of N_b. As N_b increases from 4 to 64, the gain in the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 is 12.8×, 11.2×, 9.6×, 10.0×, 8.4× and 5.8×, respectively. Compared with Chord-k_i, the higher increasing rate of SREM-k_i is mainly caused by the smaller number of routing hops of SkipCloud. Besides, SREM-4 presents the highest matching rate for all values of N_b due to the fine-grained partitioning technique of HPartition and the lower forwarding hops of SkipCloud.

We change the number of subscriptions N_sub in Fig. 10b. As N_sub increases from 40K to 80K, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 61.2, 58.7, 39.3, 51.5, 51.0 and 38.6 percent, respectively.
Compared with SREM-1, each subscription in SREM-4 may fall into more subspaces due to its fine-grained partitioning, which leads to a higher increasing rate of the average subscription searching size in SREM-4. That is why the decreasing percentages of the matching rates in SREM-k_i and Chord-k_i increase with the growth of k_i. In spite of the higher decreasing percentage, the matching rates of SREM-4 and Chord-4 are 27.4× and 33.7× those of SREM-1 and Chord-1, respectively, when N_sub equals 80,000.

We increase the number of dimensions k from 8 to 20 in Fig. 10c. The matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 83.4, 31.3, 25.7, 81.2, 44.0 and 15.0 percent, respectively. Similar to the phenomenon of Fig. 10b, the decreasing percentages of the matching rates in SREM-k_i and Chord-k_i increase with the growth of k_i. This is because the fine-grained partitioning of HPartition-4 leads to faster growth of the average subscription searching size. When k equals 20, the matching rates of SREM-4 and Chord-4 are 8.7× and 9.4× those of SREM-1 and Chord-1, respectively.

We evaluate the impact of the number of segments N_seg in Fig. 10d. As N_seg increases from 4 to 10, the corresponding matching rates of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 increase by 82.7, 179.1, 44.7, 98.3, 286.1 and 48.4 percent, respectively. These numbers indicate that a bigger N_seg brings more fine-grained partitioning and a smaller average subscription searching size.

Fig. 10. Scalability.

In conclusion, the matching rate of SREM increases linearly with a growing N_b, and SREM-4 presents the highest matching rate in various scenarios due to its more fine-grained partitioning technique and the smaller number of routing hops of SkipCloud.

6.6 Reliability

In this section, we evaluate the reliability of SREM-4 by testing its ability to recover from server failures. During the experiment, the 64 brokers in total generate 120 million messages and dispatch them to their corresponding subspaces. After 10 seconds, a part of the brokers are shut down simultaneously.

Fig. 11a shows how the event loss rate of SREM changes when four, eight, 16 and 32 brokers fail. From the moment the brokers fail, the corresponding event loss rates increase to 3, 8, 18 and 37 percent, respectively, within 10 seconds, and drop back to 0 within 20 seconds. This recovery ability mainly lies in the quick failure detection of the peer sampling service described in Section 3.1 and the quick reassignment of lost subscriptions by the well-known servers described in Section 5.3. Note that the maximal event loss rate in each case is less than the percentage of lost brokers. This is because each top cluster of SkipCloud initially has two brokers, and an event is not dropped if it is dispatched to an alive broker of its top cluster.

Fig. 11. Reliability.

Fig. 11b shows how the matching rate changes when a number of brokers fail. When the brokers fail at the moment of 10 seconds, there is an apparent drop of the matching rate in the following tens of seconds. Note that the drop interval increases with the number of failed brokers, because the failure of more brokers leads to a higher latency of detecting these brokers and recovering the lost subscriptions. After the failed brokers have been handled, the matching rate of each situation increases to a higher level.
This indicates that SREM can function normally even if a large number of brokers fail simultaneously. In conclusion, the peer sampling service of SkipCloud ensures a continuous matching service even if a large number of brokers fail simultaneously. By buffering subscriptions in the well-known servers, failed subscriptions can be dispatched to the corresponding alive brokers in tens of seconds.
6.7 Workload Characteristics

In this section, we evaluate how workload characteristics affect the matching rate of each approach. First, we evaluate the influence of different subscription distributions by changing the standard deviation σ_sub from 50 to 200. Fig. 8a shows that the matching rate of each approach decreases as σ_sub increases. As σ_sub increases, the predicates of each subscription occupy larger ranges of the corresponding dimensions, which causes each subscription to be dispatched to more subspaces.

Fig. 8. The impact of workload characteristics.

We then evaluate the performance of each approach under different event distributions. In this experiment, the standard deviation of the event distribution σ_e changes from 50 to 200. Note that skewed events lead to two scenarios. One is that the events are skewed in the same way as the subscriptions; the hot events then coincide with the hot spots, which severely hurts the matching rate. The other is that the events are skewed in the opposite way to the subscriptions, which benefits the matching rate. Fig. 8b shows how the adversely skewed events affect the performance of each approach. As σ_e reduces from 200 to 50, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 79.4, 63.2, 47.0, 77.2, 59.1 and 52.6 percent, respectively. Although the matching rate of SREM-4 decreases greatly as σ_e reduces, it is still higher than that of the other approaches for all values of σ_e.

We evaluate the performance of each approach under different combinations of dimensions (called subscription patterns) in Fig. 8c. Until now, all of the above experiments generated subscriptions using one subscription pattern, where the predicate of each dimension follows the normal distribution. Here we uniformly select a group of patterns to generate subscriptions. For the predicates that a subscription does not contain, the ranges are the whole ranges of the corresponding dimensions. Fig. 8c shows that the matching rate decreases with the number of subscription patterns. As the number of subscription patterns grows from 1 to 16, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 86.7, 63.8, 2.3, 90.3, 65.9 and 10.3 percent, respectively. The sharp decline of the matching rate of SREM-4 and Chord-4 is mainly caused by the quick increase of the average searching size. However, the matching rates of SREM-4 and Chord-4 are still much higher than those of the other approaches.

In conclusion, the matching throughput of each approach decreases greatly as the skewness of subscriptions or events increases. Compared with the other approaches, the fine-grained partitioning of SREM-4 ensures a higher matching throughput under various parameter settings.

6.8 Memory Overhead

Storing subscriptions is the main memory overhead of each broker. Since a subscription may fall into several subspaces that are managed by the same broker, each broker only needs to store one real copy of the subscription and a group of its identifications to reduce the memory overhead. We use the average number of subscription identifications that each broker stores, denoted by N_sid, as a criterion. Recall that a predicate specifies a continuous range of a dimension. We represent a continuous range by two numerals, each of which occupies 8 bytes. Suppose that each subscription identification also uses 8 bytes. Then the maximal average memory overhead of managing subscriptions on each broker is 16·k·N_sub + 8·N_sid bytes, where k is the number of dimensions and N_sub is the total number of subscriptions.
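As an illustrative calculation of this bound (not a measurement), with k = 8 dimensions and N_sub = 100K subscriptions the subscription bodies alone account for 16·8·100,000 bytes ≈ 12.8 MB per broker, and every additional million stored identifications adds another 8 MB, which matches the order of magnitude of the per-broker overheads reported below.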
As mentioned in Section 4.1, HPartition-k_i brings a large memory cost and partitioning latency as k_i increases. When we test the performance of HPartition-8, the Java heap space runs out of memory.

We first evaluate the average memory cost of brokers as N_sub changes in Fig. 12a. The value of N_sid of each partitioning technique increases linearly with the growth of N_sub. As N_sub increases from 40K to 100K, the value of N_sid in HPartition-4, HPartition-2 and HPartition-1 grows by 143, 151 and 150 percent, respectively. HPartition-4 presents a higher N_sid than the other approaches, because the number of subspaces that a subscription falls into increases as the room of each subspace decreases. In spite of the high N_sid in HPartition-4, its maximal average overhead on each broker only costs 20.2 MB when N_sub reaches 100K.

Fig. 12. Memory overhead.

We then evaluate how the memory overhead of brokers changes with different values of σ_sub. As σ_sub increases, each predicate of a subscription spans a wider range, which leads to much more memory overhead. As σ_sub increases from 50 to 200, the value of N_sid in HPartition-4, HPartition-2 and HPartition-1 grows by 86, 60 and 27 percent, respectively. Unsurprisingly, N_sid in HPartition-4 still has the largest value. When σ_sub equals 200, the maximal average memory overhead of HPartition-4 is 6.2 MB, which is a small memory overhead for each broker. In conclusion, the memory overhead of HPartition-k_i increases slowly with the growth of N_sub and σ_sub, as long as k_i is no more than 4.

7 RELATED WORK

A large body of work on broker-based pub/subs has been proposed in recent years. One method is to organize brokers into a tree overlay, such that events can be delivered to all relevant brokers without duplicate transmissions. Besides, data replication schemes [27] are employed to ensure reliable event matching.
For instance, Siena [3] advertises subscriptions to the whole network. When receiving an event, each broker determines which broker to forward the event to according to its routing table. Atmosphere [4] dynamically identifies entourages of publishers and subscribers to transmit events with low latency. It is appropriate for scenarios with a small number of subscribers; as the number of subscribers increases, the overlays constructed in Atmosphere probably exhibit latency similar to that of Siena. To ensure reliable routing, Kazemzadeh and Jacobsen [5] propose a d-fault-tolerance algorithm to handle concurrent crash failures of up to d brokers. Brokers are required to maintain a partial view of the tree that includes all brokers within distance d + 1. The multi-hop routing techniques in these tree-based pub/subs lead to a high routing latency. Besides, skewed subscriptions and events lead to unbalanced workloads among brokers, which may severely reduce the matching throughput. In contrast, SREM uses SkipCloud to reduce the routing latency and HPartition to balance the workloads of brokers.

Another method is to divide brokers into multiple clusters through unstructured overlays. Brokers in each cluster are connected through reliable topologies. For instance, brokers in Kyra [7] are grouped into cliques based on their network proximity. Each clique divides the whole content space into non-overlapping zones based on the number of its brokers. After that, the brokers of different cliques that are responsible for similar zones are connected by a multicast tree, and events are forwarded through their corresponding multicast trees. Sub-2-Sub [8] implements epidemic-based clustering to partition all subscriptions into disjoint subspaces, and the nodes in each subspace are organized into a bidirectional ring. Due to the long delay of routing events in unstructured overlays, most of these approaches are inadequate for scalable event matching. In contrast, SREM uses SkipCloud to organize brokers, which uses the prefix routing technique to achieve low routing latency.

To reduce the routing hops, a number of methods organize brokers through structured overlays, which commonly need Θ(log N) hops to locate a broker. Subscriptions and events falling into the same subspace are sent to and matched on a rendezvous broker. For instance, PastryString [9] constructs a distributed index tree for each dimension to support both numerical and string dimensions. The resource discovery service proposed by Ranjan et al. [10] maps events and subscriptions into d-dimensional indexes and hashes these indexes onto a DHT network. To ensure a reliable pub/sub service, each broker in Meghdoot [11] has a backup which is used when the primary broker fails. Compared with these DHT-based approaches, SREM ensures a smaller forwarding latency through the prefix routing of SkipCloud, and a higher event matching reliability through the multiple brokers in each top cluster of SkipCloud and the multiple candidate groups of HPartition.

Recently, a number of cloud providers have offered pub/sub services. For instance, Move [14] provides highly available key-value storage and matching based on one-hop lookup [17]. BlueDove [15] adopts a single-dimensional partitioning technique to divide the entire space and a performance-aware forwarding scheme to select a candidate matcher for each event.
Its scalability is limited by the coarse-grained clustering technique. SEMAS [16] proposes a fine-grained partitioning technique to achieve a high matching rate. However, this partitioning technique only provides one candidate for each event and may lead to a large memory cost as the number of data dimensions increases. In contrast, HPartition makes a better trade-off between matching throughput and reliability through a flexible manner of constructing the logical space.

8 CONCLUSIONS AND FUTURE WORK

This paper introduces SREM, a scalable and reliable event matching service for content-based pub/sub systems in the cloud computing environment. SREM connects the brokers through a distributed overlay, SkipCloud, which ensures reliable connectivity among brokers through its multi-level clusters and brings a low routing latency through a prefix routing algorithm. Through a hybrid multi-dimensional space partitioning technique, SREM achieves scalable and balanced clustering of high-dimensional skewed subscriptions, and each event is allowed to be matched on any of its candidate servers. Extensive experiments with a real deployment on a CloudStack testbed are conducted, producing results which demonstrate that SREM is effective and practical, and also presents good workload balance, scalability and reliability under various parameter settings.

Although the proposed event matching service can efficiently filter out irrelevant users from a big data volume, there are still a number of problems to solve. Firstly, we do not provide elastic resource provisioning strategies in this paper to obtain a good performance-price ratio. We plan to design and implement elastic strategies that adjust the scale of servers based on the churn of workloads. Secondly, the service does not guarantee that the brokers disseminate large live content with various data sizes to the corresponding subscribers in a real-time manner. For the dissemination of bulk content, the upload capacity becomes the main bottleneck. Based on the proposed event matching service, we will consider utilizing a cloud-assisted technique to realize a general and scalable data dissemination service over live content with various data sizes.

ACKNOWLEDGMENTS

This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National Natural Science Foundation of China (Grant No. 61379052), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015). Yijie Wang is the corresponding author.

REFERENCES

[1] Data per minute. (2014). [Online]. Available: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/
[2] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, "Characterizing user behavior in online social networks," in Proc. 9th ACM SIGCOMM Conf. Internet Meas. Conf., 2009, pp. 49–62.
[3] A. Carzaniga, "Architectures for an event notification service scalable to wide-area networks," Ph.D. dissertation, Ingegneria Informatica e Automatica, Politecnico di Milano, Milan, Italy, 1998.
[4] P. Eugster and J. Stephen, "Universal cross-cloud communication," IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 103–116, 2014.
[5] R. S. Kazemzadeh and H.-A. Jacobsen, "Reliable and highly available distributed publish/subscribe service," in Proc. 28th IEEE Int. Symp. Reliable Distrib. Syst., 2009, pp. 41–50.
[6] Y. Zhao and J. Wu, "Building a reliable and high-performance content-based publish/subscribe system," J. Parallel Distrib. Comput., vol. 73, no. 4, pp. 371–382, 2013.
[7] F. Cao and J. P. Singh, "Efficient event routing in content-based publish/subscribe service network," in Proc. IEEE INFOCOM, 2004, pp. 929–940.
[8] S. Voulgaris, E. Riviere, A. Kermarrec, and M. Van Steen, "Sub-2-sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks," Res. Rep. RR5772, INRIA, Rennes, France, 2005.
[9] I. Aekaterinidis and P. Triantafillou, "Pastrystrings: A comprehensive content-based publish/subscribe DHT network," in Proc. IEEE Int. Conf. Distrib. Comput. Syst., 2006, pp. 23–32.
[10] R. Ranjan, L. Chan, A. Harwood, S. Karunasekera, and R. Buyya, "Decentralised resource discovery service for large scale federated grids," in Proc. IEEE Int. Conf. e-Sci. Grid Comput., 2007, pp. 379–387.
[11] A. Gupta, O. D. Sahin, D. Agrawal, and A. El Abbadi, "Meghdoot: Content-based publish/subscribe over p2p networks," in Proc. 5th ACM/IFIP/USENIX Int. Conf. Middleware, 2004, pp. 254–273.
[12] X. Lu, H. Wang, J. Wang, J. Xu, and D. Li, "Internet-based virtual computing environment: Beyond the data center as a computer," Future Gener. Comput. Syst., vol. 29, pp. 309–322, 2011.
[13] Y. Wang, X. Li, X. Li, and Y. Wang, "A survey of queries over uncertain data," Knowl. Inf. Syst., vol. 37, no. 3, pp. 485–530, 2013.
[14] W. Rao, L. Chen, P. Hui, and S. Tarkoma, "Move: A large scale keyword-based content filtering and dissemination system," in Proc. IEEE 32nd Int. Conf. Distrib. Comput. Syst., 2012, pp. 445–454.
[15] M. Li, F. Ye, M. Kim, H. Chen, and H. Lei, "A scalable and elastic publish/subscribe service," in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2011, pp. 1254–1265.
[16] X. Ma, Y. Wang, Q. Qiu, W. Sun, and X. Pei, "Scalable and elastic event matching for attribute-based publish/subscribe systems," Future Gener. Comput. Syst., vol. 36, pp. 102–119, 2013.
[17] A. Lakshman and P. Malik, "Cassandra: A decentralized structured storage system," Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, 2010.
[18] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, "Xoring elephants: Novel erasure codes for big data," in Proc. 39th Int. Conf. Very Large Data Bases, 2013, pp. 325–336.
[19] S. Voulgaris, D. Gavidia, and M. van Steen, "Cyclon: Inexpensive membership management for unstructured p2p overlays," J. Netw. Syst. Manage., vol. 13, no. 2, pp. 197–217, 2005.
[20] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, "Tapestry: A resilient global-scale overlay for service deployment," IEEE J. Sel. Areas Commun., vol. 22, no. 1, pp. 41–53, Jan. 2004.
[21] Murmurhash. (2014). [Online]. Available: http://burtleburtle.net/bob/hash/doobs.html
[22] A.-M. Kermarrec, L. Massoulie, and A. J. Ganesh, "Probabilistic reliable dissemination in large-scale systems," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 3, pp. 248–258, Mar. 2003.
[23] M. Jelasity, A. Montresor, and Ö. Babaoglu, "Gossip-based aggregation in large dynamic networks," ACM Trans. Comput. Syst., vol. 23, no. 3, pp. 219–252, 2005.
[24] Cloudstack. (2014). [Online]. Available: http://cloudstack.apache.org/
[25] Zeroc. (2014). [Online]. Available: http://www.zeroc.com/
[26] B. He, D. Sun, and D. P. Agrawal, "Diffusion based distributed internet gateway load balancing in a wireless mesh network," in Proc. IEEE Global Telecommun. Conf., 2009, pp. 1–6.
[27] Y. Wang and S. Li, "Research and performance evaluation of data replication technology in distributed storage systems," Comput. Math. Appl., vol. 51, no. 11, pp. 1625–1632, 2006.

Xingkong Ma received the BS degree in computer science and technology from the School of Computer of Shandong University, China, in 2007, and the MS degree in computer science and technology from the National University of Defense Technology, China, in 2009. He is currently working toward the PhD degree at the School of Computer of the National University of Defense Technology. His current research interests lie in the areas of data dissemination and publish/subscribe. He is a student member of the IEEE.

Yijie Wang received the PhD degree in computer science and technology from the National University of Defense Technology, China, in 1998. She is currently a professor at the Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology. Her research interests include network computing, massive data processing, and parallel and distributed processing. She is a member of the IEEE.

Xiaoqiang Pei received the BS and MS degrees in computer science and technology from the National University of Defense Technology, China, in 2009 and 2011, respectively. He is currently working toward the PhD degree at the School of Computer of the National University of Defense Technology. His current research interests lie in the areas of fault-tolerant storage and cloud computing.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.