TTA
FINAL YEAR PROJECTS TITLES
WITH ABSTRACT
www.ttafinalyearprojects.com
IEEE 2015, 2014, 2013, 2012, etc.
Projects for B.E./B.Tech/M.E./MCA/B.Sc/M.Sc
For complete base paper, call now and talk
to our expert
90942066260 | 9042066280 | 044 4353 3393
DOMAIN : NETWORKING
CODE PROJECT TITLE DESCRIPTION REFERENCE
TTA-DN-
C1501
Delay Analysis of
Multichannel
Opportunistic Spectrum
Access MAC Protocols
We provide a comprehensive delay and
queuing analysis for two baseline
medium access control protocols for multi-user
cognitive radio networks with homogeneous
users and channels and investigate the impact
of different network parameters on the system
performance. In addition to an accurate
Markov chain, which follows the queue status
of all users, several lower complexity queuing
theory approximations are provided. Accuracy
and performance of the proposed analytical
approximations are verified with extensive
simulations. It is observed that, using an Aloha-type access to the control channel, a buffering MAC protocol, where in case of interruption the CR user waits for the primary user to vacate the channel before resuming the transmission, outperforms a switching MAC protocol, where the CR user vacates the channel when primary users appear and then competes again to gain access to a new channel. The reason is that
the delay bottleneck for both protocols is the
time required to successfully access the control
channel, which occurs more frequently for the
switching MAC protocol. It is thus shown that
a clustering approach, where users are divided
into clusters with a separate control channel
per cluster, can significantly improve the
performance by reducing the competition over the control channel.
IEEE 2015
TTA-DN-
C1502
LEISURE A Framework
for Load-Balanced
Network - Wide Traffic
Measurement
Network-wide traffic measurement is of
interest to network operators to uncover
global network behavior for the management
tasks of traffic accounting, debugging or
troubleshooting, security, and
traffic engineering. Increasingly,
sophisticated network measurement tasks such
as anomaly detection and security forensic
analysis are requiring in-depth fine-grained
flow-level measurements. However,
performing in-depth per-
flow measurements (e.g., detailed payload
analysis) is often an expensive process. Given
the fast-changing Internet traffic landscape and
large traffic volume, a single monitor is not
capable of accomplishing
the measurement tasks for all applications of
interest due to its resource constraint.
Moreover, uncovering global network behavior
requires network-wide traffic measurements at
multiple monitors across
the network since traffic measured at any
single monitor only provides a partial view and
may not be sufficient or accurate. These
factors call for coordinated measurements
among multiple distributed monitors. In this
paper, we present a centralized
optimization framework, LEISURE (Load-
Equalized measurement), for load-
balancing network measurement workloads
across distributed monitors. Specifically, we
consider various load-balancing problems
under different objectives and study their
extensions to support different deployment
scenarios. We evaluate LEISURE via detailed
simulations on Abilene and
GEANT network traces to show
that LEISURE can achieve much better load-
balanced performance (e.g., 4.75X smaller
peak workload and 70X smaller variance in
workloads) across all coordinated monitors in
comparison to a naive solution (uniform assignment) for accomplishing network-wide traffic measurement tasks.
IEEE 2015
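As a rough, generic illustration of the min-max load-balancing objective described above (not the paper's LEISURE optimization itself; all names and units here are assumed), a greedy longest-processing-time heuristic assigns each measurement task to the currently least-loaded monitor:

import heapq

def balance_tasks(task_loads, num_monitors):
    # Greedy LPT heuristic: largest tasks first, each to the least-loaded
    # monitor. task_loads is a list of per-task workloads (hypothetical units).
    heap = [(0.0, m) for m in range(num_monitors)]  # (current load, monitor)
    assignment = {m: [] for m in range(num_monitors)}
    for task, load in sorted(enumerate(task_loads), key=lambda t: -t[1]):
        cur, m = heapq.heappop(heap)
        assignment[m].append(task)
        heapq.heappush(heap, (cur + load, m))
    peak = max(load for load, _ in heap)
    return assignment, peak

A uniform assignment corresponds to handing tasks out round-robin regardless of load, which is the naive baseline the paper compares against.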
TTA-DN-
C1503
Authenticated Key
Exchange Protocols for
Parallel Network File
Systems
We study the problem of key establishment for
secure many-to-many communications. The
problem is inspired by the proliferation of
large-scale
distributed file systems supporting parallel acc
ess to multiple storage devices. Our work
focuses on the current Internet standard for
such file systems, i.e., parallel
Network File System (pNFS), which makes
use of Kerberos to establish parallel session keys between clients
and storage devices. Our review of the existing
Kerberos-based protocol shows that it has a
number of limitations: (i) a metadata server
facilitating key exchange between the clients
and the storage devices has heavy workload
that restricts the scalability of the protocol; (ii)
the protocol does not provide forward secrecy;
(iii) the metadata server itself generates all the session keys that are used between the clients and storage devices, and this inherently leads
to key escrow. In this paper, we propose a
variety
of authenticated key exchange protocols that
are designed to address the above issues. We
show that our protocols are capable of
reducing up to approximately 54% of the
workload of the metadata server and
concurrently supporting forward secrecy and
escrow-freeness. All this requires only a small
fraction of increased computation overhead at
the client.
IEEE 2015
TTA-DN-
C1504
Diversifying Web
Service
Recommendation
Results via Exploring
Service Usage History
The last decade has witnessed a tremendous
growth of Web services as a major technology
for sharing data, computing resources, and
programs on the Web. With the increasing
adoption and presence of Web services, design
of novel approaches for
effective Web service recommendation to
satisfy users’ potential requirements has
become of paramount importance.
Existing Web service recommendation approaches mainly focus on predicting the missing QoS values of Web service candidates that are interesting to a user, using a collaborative filtering approach, a content-based approach, or a hybrid of the two.
These recommendation approaches assume
that recommended Web services are
independent of each other, which sometimes
may not be true. As a result, many similar or
redundant Web services may exist in
a recommendation list. In this paper, we
propose a novel Web
service recommendation approach incorporating a user's potential QoS preferences and the diversity feature of user
interests on Web services. User’s interests and
QoS preferences on Web services are first
mined
by exploring the Web service usage history.
Then we compute scores of Web service
candidates by measuring their relevance with
historical and potential user interests, and their
QoS utility. We also construct
a Web service graph based on the functional
similarity between Web services. Finally, we
present an innovative diversity-
aware Web service ranking algorithm to rank
the Web service candidates based on their
scores, and diversity degrees derived from
the Web service graph. Extensive experiments
are conducted based on a real
world Web service dataset, indicating that our proposed Web service recommendation approach significantly improves the quality of the recommendation results compared with existing methods.
IEEE 2015
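As a sketch of what diversity-aware re-ranking of this kind can look like (a maximal-marginal-relevance-style heuristic, not the paper's ranking algorithm; the trade_off parameter and similarity inputs are assumptions):

def diversity_rank(scores, similarity, k, trade_off=0.7):
    # scores: service -> relevance/QoS utility; similarity: (s, t) -> [0, 1].
    # Greedily pick the service with the best relevance-minus-redundancy margin.
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < k:
        def marginal(s):
            redundancy = max((similarity.get((s, t), 0.0) for t in selected),
                             default=0.0)
            return trade_off * scores[s] - (1 - trade_off) * redundancy
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected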
TTA-DN-
C1505
Virtual Servers Co-
Migration for Mobile
Accesses Online vs.
Offline
In this paper, we study the problem of co-
migrating a set of service replicas residing on
one or more redundant virtual servers in clouds
in order to satisfy a sequence of mobile batch-
request demands in a cost effective way. With
such a migration, we can not only reduce the
service access latency for end users but also
minimize the network costs for service
providers. The co-migration can be achieved at
the cost of bulk-data transfer and increases the
overall monetary costs for the service
providers. To gain the benefits of
service migration while minimizing the overall
costs, we propose a co-migration algorithm
Migk for multiple servers, each hosting a
service replica. Migk is a randomized
algorithm with a competitive cost of O(γ log n / min{1/κ, μ/(λ+μ)}) to migrate κ services in a static n-node network, where γ is the maximal ratio of the migration costs between any pair of neighboring nodes in the network, and where λ and μ represent the maximum wired transmission cost and the wireless link cost,
respectively. For comparison, we also study
this problem in its static off-line form by
proposing a parallel dynamic programming
(hereafter DP) based algorithm that integrates
the branch & bound strategy with sampling
techniques in order to approximate the optimal
DP results. We validate the advantage of the
proposed algorithms via extensive simulation
studies using various request patterns and
cloud network topologies. Our simulation
results show that the proposed algorithms can
effectively adapt to mobile access patterns to
satisfy the service request sequences in a cost-
effective way.
IEEE 2015
TTA-DN-
C1506
Anomaly-Based
Network Intrusion
Detection System
We present POSEIDON, a new anomaly-
based network intrusion detection system.
POSEIDON is payload-based, and has a two-
tier architecture: the first stage consists of a
self-organizing map, while the second one is a
modified PAYL system. Our benchmarks on
the 1999 DARPA data set show a
higher detection rate and lower number of false
positives than PAYL and PHAD.
IEEE 2015
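POSEIDON's first tier is a self-organizing map (SOM), a standard unsupervised grid of prototype vectors. A minimal SOM trainer is sketched below; the grid size, learning-rate schedule, and feature extraction are illustrative assumptions, not POSEIDON's actual configuration:

import numpy as np

def train_som(data, grid=(8, 8), epochs=10, lr0=0.5, sigma0=3.0, seed=0):
    # data: (n_samples, dim) array of payload feature vectors.
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            lr = lr0 * (1 - t / steps)
            sigma = sigma0 * (1 - t / steps) + 1e-3
            # Best-matching unit: grid cell whose prototype is closest to x.
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=2)), (h, w))
            # Pull the BMU and its grid neighborhood toward the input.
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            t += 1
    return weights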
TTA-DN-
C1507
CEDAR A Low-Latency
and Distributed
Strategy for Packet
Recovery in Wireless
Networks
Underlying link-layer protocols of well-
established wireless networks that use the
conventional “store-and-forward” design
paradigm cannot provide highly sustainable
reliability and stability in wireless communication, which introduces significant barriers and setbacks to the scalability and deployment of wireless networks. In this
paper, we propose a Code
Embedded Distributed Adaptive and Reliable
(CEDAR) link-layer framework that targets low latency and balances the en/decoding load among nodes. CEDAR is the first
comprehensive theoretical framework for
analyzing and designing distributed and
adaptive error recovery for wireless networks.
It employs a theoretically sound framework for
embedding channel codes in each packet and
performs the error correcting process in
selected intermediate nodes in a packet's route.
To identify the intermediate nodes for the
decoding, we mathematically calculate the
average packet delay and formalize the
problem as a nonlinear integer programming
problem. By minimizing the delays, we derive
three propositions that: 1) can identify the
intermediate nodes that minimize the
propagation and transmission delay of
a packet; and 2) and 3) can identify the
intermediate nodes that simultaneously
minimize the queuing delay and maximize the
fairness of en/decoding load of all the nodes.
Guided by the propositions, we then propose a
scalable and distributed scheme in CEDAR to
choose the intermediate en/decoding nodes in a
route to achieve its objective. The results from
real-world test bed “NESTbed” and simulation
with MATLAB prove that CEDAR is superior
to schemes using hop-by-hop decoding and
destination decoding not only in packet delay
and throughput but also in energy-consumption
and load-distribution balance.
IEEE 2015
TTA-DN-
C1508
CoCoWa A
Collaborative Contact-
Based Watchdog for
Detecting Selfish Nodes
Mobile ad hoc networks (MANETs) assume that mobile nodes voluntarily cooperate in order to work properly. This cooperation is a cost-intensive activity, and some nodes can refuse to cooperate, leading to selfish node behavior.
Thus, the overall network performance could
be seriously affected. The use of watchdogs is
a well-known mechanism
to detect selfish nodes. However, the detection
process performed by watchdogs can fail,
generating false positives and false negatives
that can induce wrong operations.
Moreover, relying on local watchdogs alone
can lead to poor performance when detecting selfish nodes, in terms of precision and speed. This is especially important in networks with sporadic contacts, such as delay-tolerant networks (DTNs), where watchdogs sometimes lack enough time or information to detect the selfish nodes. Thus,
we propose collaborative contact-based
watchdog (CoCoWa) as
a collaborative approach based on the diffusion
of local selfish nodes awareness when
a contact occurs, so that information
about selfish nodes is quickly propagated. As
shown in the paper, this collaborative approach
reduces the time and increases the precision
when detecting selfish nodes.
IEEE 2015
TTA-DN-
C1509
Distributed
Opportunistic
Scheduling for Energy-Harvesting-Based Wireless Networks A Two-Stage Probing Approach
This paper considers a heterogeneous ad
hoc network with multiple transmitter-receiver
pairs, in which all transmitters are capable of
harvesting renewable energy from the
environment and compete for one shared
channel by random access. In particular, we
focus on two different scenarios: the constant
energy harvesting (EH) rate model where the
EH rate remains constant within the time of
interest and the i.i.d. EH rate model where the
EH rates are independent and
identically distributed across different
contention slots. To quantify the roles of both
the energy state information (ESI) and the
channel state information (CSI),
a distributed opportunistic scheduling (DOS)
framework with two-stage probing and save-
then-transmit energy utilization is proposed.
Then, the optimal throughput and the optimal scheduling strategy are obtained via a one-dimensional search, i.e., an iterative algorithm
consisting of the following two steps in each
iteration: First, assuming that the stored energy
level at each transmitter is stationary with a
given distribution, the expected throughput
maximization problem is formulated as an
optimal stopping problem, whose solution is
proven to exist and then derived for both
models; second, for a fixed stopping rule, the
energy level at each transmitter is shown to be
stationary and an efficient iterative algorithm
is proposed to compute its steady-state
distribution. Finally, we validate our analysis
by numerical results and quantify the
throughput gain compared with the best-effort
delivery scheme.
IEEE 2015
TTA-DN-
C1510
Enabling Efficient Multi-Keyword Ranked Search Over Encrypted Mobile Cloud Data Through Blind Storage
In mobile cloud computing, a fundamental
application is to outsource the mobile data to
external cloud servers for scalable data storage.
The outsourced data, however, need to be encrypted due to the privacy and
confidentiality concerns of their owner. This
results in significant difficulties for accurate search over
the encrypted mobile cloud data. To tackle this
issue, in this paper, we develop the searchable
encryption for multi-
keyword ranked search over the storage data.
Specifically, by considering the large number
of outsourced documents (data) in the cloud,
we utilize the relevance score and k-nearest
neighbor techniques to develop
an efficient multi-keyword search scheme that
can return the ranked search results based on
the accuracy. Within this framework, we
leverage an efficient index to further improve
the search efficiency, and adopt
the blind storage system to conceal access
pattern of the search user. Security analysis
demonstrates that our scheme can achieve
confidentiality of documents and index,
trapdoor privacy, trapdoor unlinkability, and
concealing access pattern of the search user.
Finally, using extensive simulations, we show
that our proposal can achieve much improved
efficiency in terms of search functionality
and search time compared with the existing
proposals.
IEEE 2015
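The relevance-score side of such schemes can be pictured with plaintext TF-IDF ranking, as sketched below; the paper's construction additionally encrypts the index and queries (e.g., with secure k-nearest-neighbor techniques) and hides access patterns via blind storage, all of which is omitted here:

import math
from collections import Counter

def rank_documents(docs, query_keywords, top_k=5):
    # docs: list of plaintext strings (assumed non-empty). Returns the top-k
    # (score, doc_index) pairs under a standard TF-IDF relevance score.
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))
    scored = []
    for i, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = sum((tf[k] / len(terms)) * math.log(n / (1 + df[k]))
                    for k in query_keywords if k in tf)
        scored.append((score, i))
    return sorted(scored, reverse=True)[:top_k]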
TTA-DN-
C1511
Energy-Efficient Group
Key Agreement for
Wireless Networks
Advances in lattice-based cryptography are
enabling the use of public key algorithms
(PKAs) in power-constrained ad hoc and
sensor network devices. Unfortunately, while
many wireless networks are dominated
by group communications, PKAs are inherently unicast, i.e., public/private key pairs are generated by data destinations. To fully
realize public key cryptography in
these networks, lightweight PKAs should be
augmented with energy-efficient mechanisms
for group key agreement. We consider a
setting where master keys are loaded on clients
according to an arbitrary distribution. We
present a protocol that uses
session keys derived from those master keys to
establish a group key that is information-theoretically secure. When master keys are
distributed randomly, our protocol requires O(log_b t) multicasts, where 1 − 1/b is the probability that a given client possesses a given master key. The minimum number of
public multicast transmissions required for a
set of clients to agree on a secret key in our
setting was recently characterized. The
proposed protocol achieves the best possible
approximation to that optimum that is
computable in polynomial time. Moreover, the
computational requirements of our protocol
compare favorably to multi-party extensions of
Diffie-Hellman key exchange.
IEEE 2015
TTA-DN-
C1512
iPath Path Inference in
Wireless Sensor
Networks
Recent wireless sensor networks (WSNs) are
becoming increasingly complex with the
growing network scale and the dynamic nature
of wireless communications. Many
measurement and diagnostic approaches
depend on per-packet routing paths for
accurate and fine-grained analysis of the
complex network behaviors. In this paper, we
propose iPath, a novel path inference approach
to reconstructing the per-packet
routing paths in dynamic and large-
scale networks. The basic idea of iPath is to
exploit high path similarity to iteratively infer
long paths from short ones. iPath starts with an
initial known set of paths and performs path inference iteratively. iPath includes a novel design of a lightweight hash function for verification of the inferred paths.
In order to further improve
the inference capability as well as the
execution efficiency, iPath includes a fast
bootstrapping algorithm to reconstruct the initial set of paths. We also
implement iPath and evaluate its performance
using traces from large-scale WSN
deployments as well as extensive simulations.
Results show that iPath achieves much higher
reconstruction ratios under
different network settings compared to other
state-of-the-art approaches.
IEEE 2015
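The hash-based verification idea can be illustrated with a toy scheme: each forwarder folds its node ID into a small order-sensitive hash carried in the packet, and the sink accepts an inferred path only if recomputing the hash over it matches. This sketches the mechanism only; iPath's actual lightweight hash function is defined in the paper:

def path_hash(node_ids, bits=16):
    # Order-sensitive fold of node IDs into a `bits`-wide value (illustrative).
    h = 0
    for node in node_ids:
        h = ((h << 5) ^ (h >> 11) ^ node) & ((1 << bits) - 1)
    return h

def verify_candidate(reported_hash, candidate_path):
    # The sink checks an inferred path against the hash carried in the packet.
    return path_hash(candidate_path) == reported_hash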
TTA-DN-
C1513
Joint Static and
Dynamic Traffic
Scheduling in Data
Center Networks
The advent and continued growth of large data centers have led to much interest in
switch architectures that can economically
meet the high capacities needed for
interconnecting the thousands of servers in
these data centers. Various multilayer
architectures employing thousands of switches
have been proposed in the literature. We make
use of the observation that the traffic in
a data center is a mixture of
relatively static and rapidly fluctuating
components, and develop a combined
scheduler for both these components using a
generalization of the load-balanced scheduler.
The presence of the known static component
introduces asymmetries in the ingress-egress
capacities, which preclude the use of a load-
balanced scheduler as is. We generalize the
load-balanced scheduler and also incorporate
an opportunistic scheduler that sends traffic on
a direct path when feasible to enhance the
overall switch throughput. Our evaluations
show that this scheduler works very well
despite avoiding the use of a central scheduler
for making packet-by-
packet scheduling decisions.
IEEE 2015
TTA-DN-
C1514
On Downlink Beamforming with Small Cells in Wireless Heterogeneous Systems
In this letter, we study downlink beamforming for wireless heterogeneous networks
with two groups of users. The users in one
group (group 1) are supported by the small cell
base station (SBS) as well as the macro cell
base station (MBS), while the users in the
other group (group 2) are supported by the
MBS only. The MBS is equipped with an antenna array for downlink beamforming. We formulate a convex optimization problem, which can be solved by semidefinite programming (SDP) relaxation, for downlink beamforming that takes advantage
of the presence of the SBS for group 1, but
also takes into account the interfering signal
from the SBS for group 2.
IEEE 2015
TTA-DN-
C1515
On-Demand Discovery
of Software Service
Dependencies in
MANETs
The dependencies among the components
of service-oriented software applications
hosted in a mobile ad hoc network (MANET)
are difficult to determine due to the inherent
loose coupling of the services and the transient
communication topologies of the network. Yet
understanding these dependencies is critical to
making good management decisions, since
dependence data underlie important analyses
such as fault localization and impact analysis.
Current methods for discovering dependencies,
developed primarily for fixed networks,
assume that dependencies change only slowly
and require relatively long monitoring periods
as well as substantial memory and
communication resources, all of which are
impractical in the MANET environment. We
describe a new dynamic dependence discovery
method designed specifically for this
environment, yielding dynamic snapshots of
dependence relationships discovered through
observations of service interactions. We
evaluate the performance of our method in
terms of the accuracy of the
discovered dependencies, and draw insights on
the selection of critical parameters under
various operational conditions. Although
operated under more stringent conditions, our
method is shown to provide results comparable
to or better than existing methods.
IEEE 2015
TTA-DN-
C1516
PWDGR Pair-Wise
Directional
Geographical Routing
Based on Wireless
Sensor Network
Multipath routing in wireless multimedia sensor networks makes it possible to transfer data simultaneously so as to reduce delay and congestion, and it is worth researching. However, the current multipath routing strategy may cause the energy consumption of nodes near the sink to become markedly higher than that of other nodes, which makes the network invalid and dead. It also has a serious impact on the performance of wireless multimedia sensor networks (WMSNs). In this paper, we propose a pair-wise
directional geographical routing (PWDGR)
strategy to solve the energy bottleneck
problem. First, the source node can send the
data to the pair-wise node around the sink node
in accordance with certain algorithm and then
it will send the data to the sink node.
These pair-wise nodes are selected uniformly within a 360° scope around the sink according to a certain algorithm. Therefore, the strategy can effectively relieve the serious energy burden around the sink and also strike a balance between energy consumption
and end-to-end delay. Theoretical analysis and extensive simulation experiments on PWDGR have been carried out, and the results indicate that PWDGR is superior to similar existing strategies, both in theory and in the simulation results. With respect to strategies of the same kind, PWDGR is able to prolong the network lifetime by 70%. The delay is also measured, and it is increased by only 8.1% compared with similar strategies.
IEEE 2015
TTA-DN-
C1517
REAL A Reciprocal
Protocol for Location
Privacy in Wireless
Sensor Networks
K-anonymity has been used to
protect location privacy for location monitorin
g services in wireless
sensor networks (WSNs), where sensor nodes
work together to report k-anonymized
aggregate locations to a server. Each k-
anonymized aggregate location is a cloaked
area that contains at least k persons. However, we identify an attack model to show that overlapping aggregate locations still pose privacy risks because an adversary can infer some overlapping areas with fewer than k persons, which violates the k-anonymity privacy requirement. In this paper,
we propose a reciprocal protocol for
location privacy (REAL) in WSNs.
In REAL, sensor nodes are required to
autonomously organize their sensing areas into
a set of non-overlapping and highly accurate k-
anonymized aggregate locations. To confront
the three key challenges in REAL, namely,
self-organization, reciprocity property and high
accuracy, we design a state transition process,
a locking mechanism and a time delay
mechanism, respectively. We compare the
performance of REAL with
current protocols through simulated
experiments. The results show
that REAL protects location privacy, provides
more accurate query answers, and reduces
communication and computational costs.
IEEE 2015
TTA-DN-
C1518
SanGA A Self-Adaptive
Network-Aware
Approach to Service
Composition
Service-Oriented Computing enables
the composition of loosely
coupled services provided with varying
Quality of Service (QoS) levels. Selecting a
near-optimal set of services for
a composition in terms of QoS is crucial when
many functionally equivalent services are
available. As the number of distributed
services, particularly in the cloud, is rising
rapidly, the impact of the network on the QoS
keeps increasing. Despite this,
current approaches do not differentiate
between the QoS of services themselves and
the network. Therefore, the computed latency
differs from the actual latency, resulting in
suboptimal QoS. Thus, we propose a network-
aware approach that handles the QoS
of services and the QoS of
the network independently. First, we build
a network model in order to estimate
the network latency between
arbitrary services and potential users. Our
selection algorithm then leverages this
model to find compositions with a low latency
for a given execution policy. We employ
a self-adaptive genetic algorithm which
balances the optimization of latency and other
QoS as needed and improves the convergence
speed. In our evaluation, we show that
our approach works under realistic network
conditions, efficiently
computing compositions with much lower
latency and otherwise equivalent QoS
compared to current approaches.
IEEE 2015
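A minimal genetic algorithm over per-task candidate sets conveys the selection idea; this sketch scores plans only by a latency estimate from a network model and uses fixed crossover and mutation rates, whereas SanGA self-adapts these and balances further QoS dimensions (all names below are illustrative):

import random

def ga_compose(candidates, latency, generations=50, pop_size=30, p_mut=0.1):
    # candidates[i]: list of service IDs able to perform task i (two or more
    # tasks assumed); latency: service ID -> estimated latency. Lower is better.
    def fitness(plan):
        return sum(latency[s] for s in plan)

    pop = [[random.choice(c) for c in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(candidates))   # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):                  # per-gene mutation
                if random.random() < p_mut:
                    child[i] = random.choice(candidates[i])
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)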
TTA-DN-
C1519
Secure Binary Image Steganography Based on Minimizing the Distortion on the Texture
Most state-of-the-
art binary image steganographic techniques
only consider the flipping distortion according to the human visual system, which makes them insecure when attacked by steganalyzers. In this paper,
a binary image steganographic scheme that
aims to minimize the embedding distortion on
the texture is presented. We extract the
complement, rotation, and mirroring-invariant
local texture patterns (crmiLTPs) from
the binary image first. The weighted sum of
crmiLTP changes when flipping one pixel is
then employed to measure the flipping
distortion corresponding to that pixel. By
testing on both simple binary images and the
constructed image data set, we show that the
proposed measurement can well describe the
distortions on both visual quality and
statistics. Based on the proposed measurement,
a practical steganographic scheme is
developed. The steganographic scheme
generates the cover vector by dividing the
scrambled image into super pixels. Thereafter,
the syndrome-trellis code is employed
to minimize the designed embedding
distortion. Experimental results have
demonstrated that the proposed steganographic
scheme can achieve statistical security without
degrading the image quality or the embedding
capacity.
IEEE 2015
TTA-DN-
C1520
Software Puzzle A Countermeasure to Resource-Inflated Denial-of-Service Attacks
Denial-of-service (DoS) and distributed DoS
(DDoS) are among the major threats to cyber-
security, and client puzzle, which demands a
client to perform computationally expensive
operations before being granted services from
a server, is a well-
known countermeasure to them. However, an
attacker can inflate its capability of
DoS/DDoS attacks with fast puzzle-
solving software and/or built-in graphics
processing unit (GPU)
hardware to significantly weaken the
effectiveness of client puzzles. In this paper,
we study how to prevent DoS/DDoS attackers
from inflating their puzzle-solving
capabilities. To this end, we introduce a new
client puzzle referred to as software puzzle.
Unlike the existing client puzzle schemes,
which publish their puzzle algorithms in
advance, a puzzle algorithm in the present
software puzzle scheme is randomly generated
only after a client request is received at the
server side and the algorithm is generated such
that: 1) an attacker is unable to prepare an
implementation to solve the puzzle in advance
and 2) the attacker needs considerable effort in
translating a central processing
unit puzzle software to its functionally
equivalent GPU version such that the
translation cannot be done in real time.
Moreover, we show
how to implement software puzzle in the
generic server-browser model.
IEEE 2015
TTA-DN-
C1521
Task Allocation for
Wireless Sensor
Network Using Modified
Binary Particle Swarm
Optimization
Many applications
of wireless sensor network (WSN) require the
execution of several computationally intense
in-network processing tasks. Collaborative in-
network processing among multiple nodes is
essential when executing such a task due to the
strictly constrained energy and resources in
single node. Task allocation is essential to
allocate the workload of each task to proper
nodes in an efficient manner. In this paper, a modified version of binary particle swarm optimization (MBPSO), which adopts a different transfer function
and a new position updating procedure with
mutation, is proposed for the
task allocation problem to obtain the best
solution. Each particle in MBPSO is encoded
to represent a complete potential solution
for task allocation. The task workload and
connectivity are ensured by taking them as
constraints for the problem. Multiple metrics,
including task execution time, energy
consumption, and network lifetime, are considered as a whole by designing a hybrid
fitness function to achieve the best overall
performance. Simulation results show the
feasibility of the proposed MBPSO-based
approach for task allocation problem in WSN.
The proposed MBPSO-based approach also
outperforms the approaches based on genetic
algorithm and BPSO in the comparative analysis.
IEEE 2015
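For reference, the baseline that MBPSO modifies looks roughly like the sketch below: a binary PSO in which velocities pass through a sigmoid transfer function to become bit-flip probabilities, plus a small mutation rate. The paper's modified transfer function and position-updating procedure differ from this generic version:

import math
import random

def bpso(fitness, dim, particles=20, iters=100,
         w=0.7, c1=1.5, c2=1.5, p_mut=0.02):
    # fitness: function mapping a 0/1 list to a score (higher is better).
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    X = [[random.randint(0, 1) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in X]
    gbest = max(X, key=fitness)[:]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-6.0, min(6.0, V[i][d]))  # clamp velocity
                X[i][d] = 1 if random.random() < sig(V[i][d]) else 0
                if random.random() < p_mut:             # mutation step
                    X[i][d] ^= 1
            if fitness(X[i]) > fitness(pbest[i]):
                pbest[i] = X[i][:]
        gbest = max(pbest, key=fitness)[:]
    return gbest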
TTA-DN-
C1522
Towards Distributed
Optimal Movement
Strategy for Data
Gathering in Wireless
Sensor Network
In this paper, we address how to design
a distributed movement strategy for mobile
collectors, which can be either physical mobile
agents or query/collector packets periodically
launched by the sink, to achieve
successful data gathering in wireless sensor net
works. Formulating the problem as general
random walks on a graph composed
of sensor nodes, we analyze how
much data can be successfully gathered in time
under any Markovian random-
walk movement strategies for mobile
collectors moving over a graph (or network),
while each sensor node is equipped with
limited buffer space and data arrival rates are
heterogeneous over different sensor nodes. In
particular, from the analysis, we obtain the
optimal movement strategy among a class of
Markovian strategies so as to minimize
the data loss rate over all sensor nodes, and
explain how such
an optimal movement strategy can be made to
work in a distributed fashion. We demonstrate
that
our distributed optimal movement strategy can
lead to about 2 times smaller loss rate than a
standard random walk strategy under diverse
scenarios. In particular, our strategy results in
up to 70% cost savings for the deployment of
multiple collectors to achieve the target
data loss rate than the standard random
walk strategy.
IEEE 2015
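One standard way to realize a Markovian movement strategy distributedly is a Metropolis-Hastings walk: each node computes its own transition probabilities from purely local information so that the walk's stationary distribution matches desired per-node weights (for instance, data arrival rates). The sketch illustrates the mechanism, not necessarily the paper's exact optimal strategy:

def mh_transition_probs(graph, weight, node):
    # graph: node -> list of neighbors; weight: node -> desired stationary
    # weight (e.g., data arrival rate). Only local degrees and weights of the
    # node and its neighbors are needed, hence the distributed computability.
    deg = lambda v: len(graph[v])
    probs = {}
    for nbr in graph[node]:
        ratio = (weight[nbr] * deg(node)) / (weight[node] * deg(nbr))
        probs[nbr] = (1.0 / deg(node)) * min(1.0, ratio)
    probs[node] = 1.0 - sum(probs.values())   # lazy self-loop mass
    return probs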
TTA-DN-
C1523
Universal Network
Coding-Based
Opportunistic Routing
for Unicast
Network coding-
based opportunistic routing has emerged as an
elegant way to optimize the capacity of lossy
wireless multihop networks by reducing the
amount of required feedback messages. Most
of the works on network coding-
based opportunistic routing in the literature
assume that the links are independent. This
assumption has been invalidated by the recent
empirical studies that showed that the
correlation among the links can be arbitrary. In
this work, we show that the performance
of network coding-
based opportunistic routing is greatly impacted
by the correlation among the links. We
formulate the problem of maximizing the
throughput while achieving fairness under
arbitrary channel conditions, and we identify
the structure of its optimal solution. As is
typical in the literature, the optimal solution
requires a large amount of immediate feedback
messages, which is unrealistic. We propose the
idea of performing network coding on the
feedback messages and show that if the
intermediate node waits until receiving only
one feedback message from each next-hop
node, the optimal level of network coding
redundancy can be computed in a distributed
manner. The coded feedback messages require
a small amount of overhead, as they can be
integrated with the packets. Our approach is
also oblivious to losses and correlations among
the links, as it optimizes the performance
without the explicit knowledge of these two
factors.
IEEE 2015
TTA-JN-
C1524
VEGAS Visual influEnce
GrAph Summarization
on Citation Networks
Visually analyzing citation networks poses
challenges to many fields of the data mining
research. How can we summarize a
large citation graph according to the user's
interest? In particular, how can we illustrate
the impact of a highly influential paper through
the summarization? Can we maintain the
sensory node-link graph structure while
revealing the flow-based influence patterns and
preserving a fine readability? The state-of-the-
art influence maximization algorithms can
detect the most influential node in
a citation network, but fail to summarize
a graph structure to account for its influence.
On the other hand, existing graph summarization methods fold large graphs into clustered views, but cannot reveal the hidden influence patterns underneath
the citation network. In this paper, we first formally define the Influence Graph Summarization (IGS) problem on citation networks. Second, we propose a
matrix decomposition based algorithm pipeline
to solve the IGS problem. Our method can not
only highlight the flow-
based influence patterns, but also easily extend
to support the rich attribute information. A
prototype system called VEGAS implementing
this pipeline is also developed. Third, we
present a theoretical analysis of our main algorithm, which is equivalent to kernel k-means clustering. It can be proved that the
matrix decomposition based algorithm can
approximate the objective of the proposed IGS
problem. Last, we conduct comprehensive
experiments with real-
world citation networks to compare the
proposed algorithm with
classical graph summarization methods.
Evaluation results demonstrate that our method
significantly outperforms the previous ones in
optimizing both the quantitative IGS objective
and the quality of the visual summarizations.
IEEE 2015
TTA-JN-
C1525
Privacy Protection for
Wireless Medical
Sensor Data
In recent
years, wireless sensor networks have
been widely used in healthcare
applications, such as hospital and home
patient
monitoring. Wireless medical sensor networks are more vulnerable to
eavesdropping, modification,
impersonation, and replaying attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the
patient data during transmission, but
cannot stop the inside attack where the
administrator of the patient database
reveals the sensitive patient data. In this
paper, we propose a practical approach
to prevent the inside attack by using
multiple data servers to store
patient data. The main contribution of
this paper is securely distributing the
patient data in multiple data servers and
employing the Paillier and ElGamal
cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy.
IEEE 2015
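The Paillier cryptosystem named above is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what allows the data servers to aggregate readings without decrypting them. A toy-parameter sketch (deliberately insecure demo primes; real deployments use 2048-bit moduli and a vetted library):

import math
import random

p, q = 293, 433                       # demo primes only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # valid because g = n + 1 below

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: the product of ciphertexts decrypts to the sum.
c = encrypt(72) * encrypt(68) % n2    # e.g., two heart-rate readings
assert decrypt(c) == 140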
TTA-JN-
C1526
A Decentralized Cloud
Firewall Framework
with Resources
Provisioning Cost
Optimization
Cloud computing is becoming popular as the
next infrastructure of computing platform.
Despite the promising model and the hype surrounding it, security has become the major concern that makes people hesitate to transfer their applications to clouds. Concretely, the cloud platform is under numerous attacks. As a result, a firewall is clearly needed to protect the cloud from these attacks. However, setting up a
centralized firewall for a whole cloud data
center is infeasible from both performance and
financial aspects. In this paper, we propose
a decentralized cloud firewall framework for
individual cloud customers. We investigate
how to dynamically allocate resources to
optimize resources provisioning cost, while
satisfying QoS requirement specified by
individual customers simultaneously.
Moreover, we establish novel queuing-theory-based M/Geo/1 and M/Geo/m models for
quantitative system analysis, where the service
times follow a geometric distribution. By
employing Z-transform and embedded Markov
chain techniques, we obtain a closed-form
expression of mean packet response time.
Through extensive simulations and
experiments, we conclude that an M/Geo/1
model reflects the cloud firewall real system
much better than a traditional M/M/1 model.
Our numerical results also indicate that we are able to set up a cloud firewall at a cost affordable to cloud customers.
IEEE 2015
TTA-JN-
C1527
A Privacy-Aware
Authentication Scheme
for Distributed Mobile
Cloud Computing
Services
In modern societies, the number
of mobile users has dramatically risen in recent
years. In this paper, an
efficient authentication scheme for distributed
mobile cloud computing services is proposed.
The proposed scheme provides security and
convenience for mobile users to access
multiple mobile cloud
computing services from
multiple service providers using only a single
private key. The security strength of the
proposed scheme is based on bilinear pairing
cryptosystem and dynamic nonce generation.
In addition, the scheme supports
mutual authentication, key exchange, user
anonymity, and user untraceability. From a system implementation point of view,
verification tables are not required for the
trusted smart card generator
(SCG) service and cloud computing service providers when adopting the proposed scheme. Consequently, this scheme reduces the usage
of memory spaces on these
corresponding service providers. In
one mobile user authentication session, only
the targeted cloud service provider needs to
interact with the service requestor (user). The
trusted SCG serves as the secure key
distributor
for distributed cloud service providers
and mobile clients. In the proposed scheme,
the trusted SCG service is not involved in
individual user authentication process. With
this design,
our scheme reduces authentication processing
time required by communication and
computation between cloud service providers
and traditional trusted third party service.
Formal security proof and performance
analyses are conducted to show that
the scheme is both secure and efficient.
IEEE 2015
TTA-JN-
C1528
CPCDN Content
Delivery Powered by
Context and User
Intelligence
There is an unprecedented trend
that content providers (CPs) are building their
own content delivery networks (CDNs) to
provide a variety of content services to
their users. By exploiting powerful CP-level
information in content distribution, these CP-
built CDNs open up a whole new design space
and are changing
the content delivery landscape. In this paper,
we adopt a measurement-based approach to
understanding why, how, and how much CP-
level intelligences can help content delivery.
We first present a measurement study of the
CDN built by Tencent, one of the largest content providers in China. We
observe new characteristics and trends
in content delivery which pose great
challenges to the
conventional content delivery paradigm and
motivate the proposal of CPCDN, a
CDN powered by CP-aware information. We
then reveal the benefits obtained by exploiting
two indispensable CP-level intelligences,
namely context intelligence and user intelligen
ce, in content delivery. Inspired by the insights
learnt from the measurement studies, we
systematically explore the design space of CPCDN and present the novel architecture
and algorithms to address the
new content delivery challenges that have
arisen. Our results not only demonstrate the
potential of CPCDN in
pushing content delivery performance to the
next level, but also identify new research
problems calling for further investigation.
IEEE 2015
TTA-JN-
C1529
QoS Evaluation for Web
Service
Recommendation
Web service recommendation is one of the
most important fields of research in the area
of service computing. The two core problems
of Web service recommendation are the
prediction of unknown QoS property values
and the evaluation of overall QoS according to
user preferences. Aiming to address these two
problems and their current challenges, we
propose two efficient approaches to solve these
problems. First, unknown QoS property values
were predicted by modeling the high-dimensional QoS data as tensors and utilizing an important tensor operation, i.e., tensor composition, to predict these QoS values. Our method, which considers all QoS dimensions
integrally and uniformly, allows us to predict
multi-dimensional QoS values accurately and
easily. Second, the overall QoS was evaluated
by proposing an efficient user preference
learning method, which learns user preferences
based on users' ratings history data, allowing
us to obtain user preferences quantifiably and
accurately. By solving these two core
problems, it became possible to compute a
realistic value for the overall QoS. The
experimental results showed our proposed
methods to be more efficient than existing
methods.
IEEE 2015
TTA-JN-
C1530
Towards Information
Diffusion in Mobile
Social Networks
The emergence of mobile social networks opens opportunities for viral marketing. However,
before fully utilizing mobile social networks as
a platform for viral marketing, many
challenges have to be addressed. In this paper,
we address the problem of identifying a small
number of individuals through whom
the information can be diffused to
the network as soon as possible, referred to as
the diffusion minimization
problem. Diffusion minimization under the probabilistic diffusion model can be formulated as an asymmetric k-center problem, which is NP-hard; the best known approximation algorithm for the asymmetric k-center problem has an approximation ratio of log* n and time complexity O(n^5). Clearly, the performance and the time complexity of the approximation algorithm are not satisfactory in large-scale mobile social networks. To deal
with this problem, we propose a community
based algorithm and a distributed set-cover
algorithm. The performance of the proposed
algorithms is evaluated by extensive
experiments on both synthetic networks and a
real trace. The results show that the
community based algorithm has the best
performance in both synthetic networks and
the real trace compared to existing algorithms,
and the distributed set-cover algorithm
outperforms the approximation algorithm in the real trace in terms of diffusion time.
IEEE 2015
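The set-cover flavor of seed selection can be sketched with the classic greedy rule: repeatedly pick the node whose diffusion range covers the most still-uncovered nodes. The version below is centralized and purely illustrative; the paper's algorithm runs in a distributed manner:

def greedy_seed_selection(coverage):
    # coverage: node -> set of nodes reachable from it within the deadline.
    universe = set().union(*coverage.values())
    uncovered, seeds = set(universe), []
    while uncovered:
        best = max(coverage, key=lambda v: len(coverage[v] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break                     # remaining nodes are unreachable
        seeds.append(best)
        uncovered -= gained
    return seeds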
TTA-JN-
C1531
Location-Sharing
Systems With
Enhanced Privacy in
Mobile Online Social
Networks
Location sharing is one of the critical
components
in mobile online social networks (mOSNs),
which has attracted much attention recently.
With the advent of mOSNs, more and more
users' location information will be collected by
the service providers in mOSN. However, the
users' privacy, including
location privacy and social network privacy,
cannot be guaranteed in the previous work
without the trust assumption on the service
providers. In this paper, aiming at
achieving enhanced privacy against the insider
attack launched by the service providers in
mOSNs, we introduce a new architecture with
multiple location servers for the first time and
propose a secure solution
supporting location sharing among friends and
strangers in location-based applications. In our
construction, the user's friend set in each
friend’s query submitted to the location servers
is divided into multiple subsets by the social
network server randomly. Each location server
can only get a subset of friends, instead of the whole friend set of the user as in previous work. In addition, for the first time, we propose a location-sharing construction which provides checkability of the search results returned from location servers in an efficient way. We also prove that the new construction
is secure under the stronger security model
with enhanced privacy. Finally, we provide
extensive experimental results to demonstrate
the efficiency of our proposed construction.
IEEE 2015
TTA-JN-
C1532
Mobile Based
Healthcare
Management Using
Artificial Intelligence
In this growing age of technology, it is necessary to have a proper healthcare management system which should not only be completely accurate but also portable, so that every person can carry it as a personalized healthcare system. The proposed healthcare management system will consist of mobile-based heart rate measurement, so that the data can be transferred and a diagnosis based on heart rate can be provided quickly at the click of a button. The system will
consist of video conferencing to connect
remotely with the Doctor. The Doc-Bot which
was developed earlier is now being transferred
to mobile platform and will be further
advanced for diagnosis of common diseases.
The system will also consist of Online Blood
Bank which will provide up-to-date details
about availability of blood in different
hospitals.
IEEE 2015
TTA-JN-
C1533
PSMPA Patient Self-
Controllable and Multi-
Level Privacy-
Preserving Cooperative
Authentication in
Distributed m-
Healthcare Cloud
Computing System
Distributed m-healthcare cloud computing systems significantly facilitate efficient patient treatment for medical consultation by sharing personal health information among healthcare providers. However, this brings about the challenge of keeping both the data confidentiality and patients' identity privacy simultaneously. Many existing
access control and
anonymous authentication schemes cannot be
straightforwardly exploited. To solve the
problem, in this paper, a novel authorized
accessible privacy model (AAPM) is
established. Patients can authorize physicians
by setting an access tree supporting flexible
threshold predicates. Then, based on it, by
devising a new technique of attribute-based
designated verifier signature, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system is
proposed. The directly authorized physicians,
the indirectly authorized physicians and the
unauthorized persons in medical consultation
can respectively decipher the personal health
information and/or verify patients' identities by
satisfying the access tree with their own
attribute sets. Finally, the formal security proof
and simulation results illustrate our scheme
can resist various kinds of attacks and far
outperforms the previous ones in terms of
computational, communication and storage
overhead.
IEEE 2015
TTA-JN-
C1534
Secure and Distributed
Data Discovery and
Dissemination in
Wireless Sensor
Networks
A data discovery and dissemination protocol
for wireless sensor networks (WSNs) is
responsible for updating configuration
parameters of, and distributing management
commands to, the sensor nodes. All
existing data discovery and dissemination protocols suffer from two drawbacks. First, they
are based on the centralized approach; only the
base station can distribute data items. Such an
approach is not suitable for emergent multi-
owner-multi-user WSNs. Second, those
protocols were not designed with security in
mind and hence adversaries can easily launch
attacks to harm the network. This paper
proposes the
first secure and distributed data discovery and
dissemination protocol named DiDrip. It
allows the network owners to authorize
multiple network users with different
privileges to simultaneously and directly
disseminate data items to the sensor nodes.
Moreover, as demonstrated by our theoretical
analysis, it addresses a number of possible
security vulnerabilities that we have identified.
Extensive security analysis shows that DiDrip is provably secure. We also implement DiDrip in
an experimental network of resource-
limited sensor nodes to show its high
efficiency in practice.
IEEE 2015
TTA-JN-
C1535
DDSGA A Data-Driven
Semi-Global Alignment
Approach for Detecting
Masquerade Attacks
A masquerade attacker impersonates a legal
user to utilize the user services and privileges.
The semi-global alignment algorithm (SGA) is
one of the most effective and efficient
techniques to detect these attacks, but it has not yet reached the accuracy and performance required by large-scale, multiuser systems. To improve both the effectiveness and the performance of this algorithm, we propose the Data-Driven Semi-Global Alignment (DDSGA) approach. From the security effectiveness viewpoint,
DDSGA improves the scoring systems by
adopting distinct alignment parameters for
each user. Furthermore, it tolerates small
mutations in user command sequences by
allowing small changes in the low-level
representation of the commands functionality.
It also adapts to changes in the user behavior
by updating the signature of a user according
to its current behavior. To optimize the
runtime overhead, DDSGA minimizes
the alignment overhead and parallelizes the
detection and the update. After describing
the DDSGA phases, we present the
experimental results that show that DDSGA
achieves a high hit ratio of 88.4 percent with a
low false positive rate of 1.7 percent. It
improves the hit ratio of the enhanced SGA by
about 21.9 percent and reduces Maxion-
Townsend cost by 22.5 percent.
Hence, DDSGA results in improving both the
hit ratio and false positive rates with an
acceptable computational overhead.
IEEE 2015
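The core of SGA-style detection is a dynamic-programming alignment of a session's command sequence against the user's signature sequence, with end gaps left free. Below is a simplified overlap-alignment scorer with illustrative, not DDSGA-tuned, parameters; a higher score means the session more closely resembles the signature:

def align_score(signature, session, match=2, mismatch=-1, gap=-1):
    # Row 0 and column 0 stay at zero, so leading gaps are free; taking the
    # maximum over the final row makes trailing gaps free as well.
    m = len(session)
    prev = [0] * (m + 1)
    for a in signature:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a == session[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(prev)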
TTA-JN-
C1536
Revisiting Attribute-
Based Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) is a promising technique for fine-grained access control of encrypted data in cloud storage; however, the decryption involved in ABE is usually too expensive for resource-constrained front-end users, which greatly hinders its practical popularity. In order to reduce
the decryption overhead for a user to recover
the plaintext, Green et al. suggested
to outsource the majority of
the decryption work without revealing the actual data or private keys. To ensure that the third-party
service honestly computes
the outsourced work, Lai et al. provided a
requirement of verifiability to the
decryption of ABE, but their scheme doubled
the size of the underlying ABE ciphertext and
the computation costs. Roughly speaking, their
main idea is to use a
parallel encryption technique, while one of
the encryption components is used for the
verification purpose. Hence, the bandwidth and
the computation cost are doubled. In this
paper, we investigate the same problem. In
particular, we propose a more efficient and
generic construction of ABE
with verifiable outsourced decryption based on
an attribute-based key encapsulation
mechanism, a symmetric-
key encryption scheme and a commitment
scheme. Then, we prove the security and the
verification soundness of our constructed ABE
scheme in the standard model. Finally, we
instantiate our scheme with concrete building
blocks. Compared with Lai et al.'s scheme, our
scheme reduces the bandwidth and the
computation costs almost by half.
IEEE 2015
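The verifiability mechanism in such constructions can be pictured with a hash-based commitment: the encryptor commits to the KEM session key, and the user checks the key recovered from the cloud's transformed ciphertext against that commitment, catching a cheating cloud. This is a schematic stand-in; the paper's concrete commitment and KEM primitives differ:

import hashlib
import hmac
import os

def commit(key, nonce):
    # Hash commitment to the session key (illustrative primitive).
    return hashlib.sha256(nonce + key).digest()

def publish_commitment(session_key):
    # Run by the encryptor alongside producing the ABE ciphertext.
    nonce = os.urandom(16)
    return commit(session_key, nonce), nonce

def verify_outsourced_decryption(recovered_key, nonce, commitment):
    # Run by the user after the cloud returns the transformed ciphertext.
    return hmac.compare_digest(commit(recovered_key, nonce), commitment)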
TTA-JN-
C1537
A Strategy of
Clustering Modification
Directions in Spatial
Image Steganography
Most of the recently proposed
steganographic schemes are based on
minimizing an additive distortion
function defined as the sum of
embedding costs for individual pixels.
In such an approach, mutual embedding
impacts are often ignored. In this paper,
we present an approach that can exploit
the interactions among embedding
changes in order to reduce the risk of
detection by steganalysis. It employs a
novel strategy,
called clustering modification directions
(CMDs), based on the assumption that
when embedding modifications in
heavily textured regions are locally
heading toward the same direction, the
steganographic security might be
improved. To implement the strategy, a
cover image is decomposed into several subimages, in which message segments
are embedded with well-known
schemes using additive distortion
functions. The costs of pixels are
updated dynamically to take mutual
embedding impacts into account.
Specifically, when neighboring pixels
are changed toward a
positive/negative direction, the cost of
the considered pixel is biased toward
the same direction. Experimental results
show that our proposed CMD strategy,
incorporated into existing
steganographic schemes, can effectively
overcome the challenges posed by the
modern steganalyzers with high-
dimensional features.
IEEE 2015
TTA-JN-
C1538
An Access Control
Model for Online Social
Networks Using User-
to-User Relationships
Users and resources
in online social networks (OSNs) are
interconnected via various types of
relationships. In particular, user-to-
user relationships form the basis of the OSN
structure, and play a significant role in
specifying and enforcing access control.
Individual users and the OSN provider should
be enabled to specify which access can be
granted in terms of existing relationships. In
this paper, we propose a novel user-to-
user relationship-
based access control (UURAC) model for
OSN systems that utilizes regular expression
notation for such policy
specification. Access control policies on users
and resources are composed in terms of
requested action, multiple relationship types,
the starting point of the evaluation, and the
number of hops on the path. We present two
path checking algorithms to determine whether
the required relationship path between users
for a given access request exists. We validate
the feasibility of our approach by
implementing a prototype system and
evaluating the performance of these two
algorithms.
IEEE 2015
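Path checking of this kind can be pictured as searching for a path whose sequence of relationship-type labels matches a policy regular expression. A naive breadth-first variant is sketched below (it assumes single-character type labels and relies on the hop bound to stay tractable; the paper's two algorithms are more refined):

import re
from collections import deque

def path_exists(graph, start, target, pattern, max_hops):
    # graph: user -> list of (neighbor, relationship_type) edges;
    # pattern: e.g. r"f{1,3}" for friend paths of at most three hops.
    policy = re.compile(pattern)
    queue = deque([(start, "")])
    while queue:
        user, types = queue.popleft()
        if user == target and policy.fullmatch(types):
            return True
        if len(types) < max_hops:
            for nbr, rel in graph.get(user, []):
                queue.append((nbr, types + rel))
    return False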
TTA-JN-
C1539
An Authenticated Trust
and Reputation
Calculation and
Management System
for Cloud and Sensor
Networks Integration
Enabled by incorporating the powerful data storage and data processing abilities of cloud computing (CC) as well as the ubiquitous data-gathering capability of wireless sensor networks (WSNs), CC-WSN integration has received a lot of attention from both academia and industry. However,
authentication as well as trust and reputation calculation and management of cloud service providers (CSPs) and sensor network providers (SNPs) are two very critical and barely explored issues for this
new paradigm. To fill the gap, this paper
proposes a
novel authenticated trust and reputation calcula
tion and management (ATRCM) system for
CC-WSN integration. Considering the
authenticity of CSP and SNP, the attribute
requirement of cloud service user (CSU) and
CSP, the cost, trust, and reputation of the
service of CSP and SNP, the proposed
ATRCM system achieves the following three
functions: 1) authenticating CSP and SNP to
avoid malicious impersonation attacks; 2)
calculating and managing trust and reputation
regarding the service of CSP and SNP; and 3)
helping CSU choose desirable CSP and
assisting CSP in selecting appropriate SNP.
Detailed analysis and design as well as further
functionality evaluation results are presented to
demonstrate the effectiveness of ATRCM,
followed by a system security analysis.
IEEE 2015
TTA-JN-
C1540
An Efficient Privacy-
Preserving Ranked
Keyword Search
Method
Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preservation. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is
that the relationship between documents will
be normally concealed in the process of
encryption, which will lead to
significant search accuracy performance
degradation. Also the volume of data in data
centers has experienced a dramatic growth.
This will make it even more challenging to
design ciphertext search schemes that can
provide efficient and reliable online
information retrieval on large volume of
encrypted data. In this paper, a hierarchical
clustering method is proposed to support
more search semantics and also to meet the
demand for fast ciphertext search within a big-data environment. The proposed hierarchical
approach clusters the documents based on the
minimum relevance threshold, and then
partitions the resulting clusters into sub-
clusters until the constraint on the maximum
size of cluster is reached. In the search phase,
this approach can reach a linear computational
complexity against an exponential size
increase of document collection. In order to
verify the authenticity of search results, a
structure called minimum hash sub-tree is
designed in this paper. Experiments have been
conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase of documents in the dataset, the search time of the proposed method increases linearly whereas the search time of the traditional method increases exponentially.
Furthermore, the proposed method has an
advantage over the traditional method in
the rank privacy and relevance of retrieved
documents.
IEEE 2015
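As a rough illustration of the threshold-then-split idea, the sketch below recursively splits a set of document vectors until every cluster is both small enough and cohesive enough. The cosine cohesion test, the 2-means splitting step, and all parameter names are assumptions for illustration, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

def build_clusters(vectors, min_relevance, max_size):
    # Normalize rows so dot products are cosine similarities.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    leaves, queue = [], [np.arange(len(vectors))]
    while queue:
        members = queue.pop()
        centroid = vectors[members].mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        cohesion = float((vectors[members] @ centroid).mean())
        if len(members) < 2 or (len(members) <= max_size
                                and cohesion >= min_relevance):
            leaves.append(members.tolist())
            continue
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors[members])
        left, right = members[labels == 0], members[labels == 1]
        if len(left) == 0 or len(right) == 0:   # degenerate split; stop here
            leaves.append(members.tolist())
        else:
            queue.extend([left, right])
    return leaves

Because every accepted cluster is bounded by max_size, a search only has to scan small clusters along one root-to-leaf path, which is consistent with the linear search-time behavior the abstract reports.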
TTA-JN-
C1541
An Internal Intrusion
Detection and
Protection System by
Using Data Mining and
Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and ask these coworkers to assist with shared tasks, thereby making the pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect since most intrusion detection systems and firewalls identify and isolate malicious behaviors launched from outside the system only. In addition, some studies have claimed that analyzing system calls (SCs) generated by commands can identify these commands, making it possible to accurately detect attacks, with attack patterns as the features of an attack. Therefore, in this paper, a security system, named the Internal Intrusion Detection and Protection System (IIDPS), is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of users' usage habits as their forensic features and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29%, whereas the response time is less than 0.45 s, implying that it can protect a system against insider attacks effectively and efficiently.
IEEE 2015
TTA-JN-
C1542
Cloud-Assisted Safety
Message Dissemination
in VANET–Cellular
Heterogeneous
Wireless Network
In vehicular ad hoc networks (VANETs),
efficient message dissemination is critical to
road safety and traffic efficiency. Since many
VANET-based schemes suffer from high
transmission delay and data redundancy, the
integrated VANET–
cellular heterogeneous network has been
proposed recently and attracted significant
attention. However, most existing studies focus
on selecting suitable gateways to deliver safety messages from the source vehicle
to a remote server, whereas
rapid safety message dissemination from the
remote server to a targeted area has not been
well studied. In this paper, we propose a
framework for
rapid message dissemination that combines the
advantages of diverse communication
and cloud computing technologies.
Specifically, we propose a novel Cloud-
assisted
Message Downlink dissemination Scheme
(CMDS), with which the safety messages in
the cloud server are first delivered to the
suitable mobile gateways on relevant roads
with the help of cloud computing (where
gateways are buses with both cellular and
VANET interfaces), and then disseminated among neighboring vehicles via vehicle-to-vehicle (V2V) communication. To evaluate the proposed scheme, we mathematically analyze its performance and conduct extensive simulation experiments. Numerical results confirm the efficiency of CMDS in various urban scenarios.
IEEE 2015
TTA-JN-
C1543
Collaborative Task
Execution in Mobile
Cloud Computing
Under a Stochastic
Wireless Channel
This paper investigates collaborative task execution between a mobile device and a cloud clone for mobile applications under a stochastic wireless channel.
A mobile application is modeled as a sequence
of tasks that can be executed on
the mobile device or on the cloud clone. We
aim to minimize the energy consumption on
the mobile device while meeting a time
deadline, by strategically offloading tasks to
the cloud. We formulate
the collaborative task execution as a
constrained shortest path problem. We derive a
one-climb policy by characterizing the optimal
solution and then propose an enumeration
algorithm for
the collaborative task execution in polynomial
time. Further, we apply the LARAC algorithm
to solving the optimization problem
approximately, which has lower complexity
than the enumeration algorithm. Simulation
results show that the approximate solution of
the LARAC algorithm is close to the optimal
solution of the enumeration algorithm. In
addition, we consider a probabilistic time deadline, which is transformed into a hard deadline via the Markov inequality. Moreover,
compared to the local execution and the
remote execution,
the collaborative task execution can
significantly save the energy consumption on
the mobile device, prolonging its battery life.
IEEE 2015
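The probabilistic-deadline transformation admits a one-line worked derivation. Assuming the completion time T has finite mean, the deadline is t_d, and the allowed violation probability is rho (illustrative symbols, not the paper's notation), the Markov inequality yields a sufficient hard constraint:

\[
\Pr(T > t_d) \;\le\; \frac{\mathbb{E}[T]}{t_d},
\qquad\text{so}\qquad
\mathbb{E}[T] \le \rho\, t_d \;\Longrightarrow\; \Pr(T > t_d) \le \rho .
\]

Enforcing the deterministic constraint on the mean therefore guarantees the probabilistic deadline, at the cost of some conservatism.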
TTA-JN-
C1544
Contact-Aware Data
Replication in Roadside
Unit Aided Vehicular
Delay Tolerant
Networks
Roadside units (RSUs), which enable vehicle-to-infrastructure communications, are deployed along roadsides to handle the ever-growing communication demands caused by the explosive increase of vehicular traffic. How to efficiently utilize them to enhance vehicular delay tolerant network (VDTN) performance is an important problem in designing RSU-aided VDTNs. In this work,
we implement an extensive experiment
involving tens of thousands of operational
vehicles in Beijing city. Based on this newly
collected Beijing trace and the existing
Shanghai trace, we obtain some invariant
properties for communication contacts of large
scale RSU-aided VDTNs. Specifically, we find
that the contact time between RSUs and
vehicles obeys an exponential distribution,
while the contact rate between them follows a
Poisson distribution. According to these
observations, we investigate the problem of
communication contact-
aware mobile data replication for RSU-
aided VDTNs by considering the mobile
data dissemination system that
transmits data from the Internet to vehicles via
RSUs through opportunistic communications.
In particular, we formulate the communication contact-aware RSU-aided vehicular mobile data dissemination
problem as an optimization problem with
realistic VDTN settings, and we provide an
efficient heuristic solution for this NP-hard
problem. By carrying out extensive simulation
using realistic vehicular traces, we demonstrate
the effectiveness of our proposed heuristic
contact-aware data replication scheme, in
comparison with the optimal solution and other
existing schemes.
IEEE 2015
TTA-JN-
C1545
Cost-Aware SEcure
Routing (CASER)
Protocol Design for
Wireless Sensor
Networks
Lifetime optimization and security are two
conflicting design issues for multi-
hop wireless sensor networks (WSNs) with
non-replenishable energy resources. In this
paper, we first propose a novel secure and
efficient Cost-
Aware Secure Routing (CASER) protocol to
address these two conflicting issues through
two adjustable parameters: energy balance
control (EBC) and probabilistic-based random
walking. We then discover that the energy
consumption is severely disproportional to the
uniform energy deployment for the
given network topology, which greatly reduces the lifetime of the sensor networks. To solve
this problem, we propose an efficient non-
uniform energy deployment strategy to
optimize the lifetime and message delivery
ratio under the same energy resource and
security requirement. We also provide a
quantitative security analysis on the
proposed routing protocol. Our theoretical
analysis and OPNET simulation results
demonstrate that the
proposed CASER protocol can provide an
excellent tradeoff between routing efficiency
and energy balance, and can significantly
extend the lifetime of the sensor networks in
all scenarios. For the non-uniform energy
deployment, our analysis shows that we can
increase the lifetime and the total number of
messages that can be delivered by more than
four times under the same assumption. We also
demonstrate that the proposed
CASER protocol can achieve a high message
delivery ratio while preventing routing traceback attacks.
IEEE 2015
TTA-JN-
C1546
Deleting Secret Data
with Public Verifiability
Existing software-based data erasure programs
can be summarized as following the same one-
bit-return protocol: the deletion program
performs data erasure and returns either
success or failure. However, such a one-bit-return protocol turns the data deletion system
into a black box – the user has to trust the
outcome but cannot easily verify it. This is
especially problematic when the deletion
program is encapsulated within a Trusted
Platform Module (TPM), and the user has no
access to the code inside. In this paper, we
present a cryptographic solution that aims to
make the data deletion process more
transparent and verifiable. In contrast to the
conventional black/white assumptions about
TPM (i.e., either completely trust or distrust),
we introduce a third assumption that sits in
between: namely, “trust-but-verify”. Our
solution enables a user to verify the correct
implementation of two important operations inside a TPM without accessing its source code: i.e., the correct encryption of data and
the faithful deletion of the key. Finally, we
present a proof-of-concept implementation of
the SSE system on a resource-constrained Java
card to demonstrate its practical feasibility. To
our knowledge, this is the first systematic
solution to the secure data deletion problem
based on a “trust-but-verify” paradigm,
together with a concrete prototype
implementation.
IEEE 2015
TTA-JN-
C1547
Design and Evaluation
of the Optimal Cache
Allocation for Content-
Centric Networking
Content-Centric Networking (CCN) is a
promising framework to rebuild the Internet’s
forwarding substrate around the concept
of content. CCN advocates ubiquitous in-
network caching to enhance content delivery
and thus each router has storage space
to cache frequently requested content. In this
work, we focus on
the cache allocation problem, namely, how to
distribute the cache capacity across routers
under a constrained total storage budget for
the network. We first formulate this problem
as a content placement problem and obtain
the optimal solution by a two-step method. We
then propose a suboptimal heuristic method
based on node centrality, which is more
practical in dynamic networks with
frequent content publishing. We investigate
through simulations the factors that affect
the optimal cache allocation, and perhaps more
importantly we use a real-life Internet topology
and video access logs from a large scale
Internet video provider to evaluate the
performance of various cache allocation
methods. We observe that network topology
and content popularity are two important
factors that affect where exactly cache capacity should be placed. Further, the
heuristic method comes with only a very
limited performance penalty compared to
the optimal allocation. Finally, using our
findings, we provide recommendations
for network operators on the best deployment of CCN cache capacity over routers.
IEEE 2015
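The node-centrality heuristic lends itself to a short sketch: split a total cache budget across routers in proportion to a centrality score. Degree centrality, the budget model, and the example topology below are assumptions; the paper does not commit to this exact recipe.

import networkx as nx

def allocate_cache(graph, total_budget):
    # Give each router a share of the budget proportional to its
    # degree centrality, so well-connected routers get larger caches.
    centrality = nx.degree_centrality(graph)
    total = sum(centrality.values())
    return {node: total_budget * c / total for node, c in centrality.items()}

g = nx.barabasi_albert_graph(20, 2)          # toy scale-free topology
alloc = allocate_cache(g, total_budget=1000)
print(sorted(alloc.items(), key=lambda kv: -kv[1])[:3])  # hubs get the most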
TTA-JN-
C1548
Designing High
Performance Web-
Based Computing
Services to Promote
Telemedicine Database
Management System
Many web computing systems are running real-time database services whose information changes continuously and expands incrementally. In this
context, web data services have a major role
and draw significant improvements in
monitoring and controlling the information
truthfulness and data propagation.
Currently, web telemedicine database services
are of central
importance to distributed systems. However, the increasing complexity and the rapid growth of challenging real-world healthcare applications make the work of the database administrative staff difficult. In this paper, we build integrated web data services that achieve fast response times for large-scale Tele-health database management systems. Our
focus will be on database management with
application scenarios in
dynamic telemedicine systems to increase care
admissions and decrease care difficulties such
as distance, travel, and time limitations. We
propose a three-fold approach based on data
fragmentation, database websites clustering
and intelligent data distribution. This approach
reduces the amount of data migrated between
websites during applications' execution;
achieves cost-effective communications during
applications' processing and improves
applications' response time and throughput.
The proposed approach is validated internally
by measuring the impact of using
our computing services' techniques on
various performance features like
communications cost, response time, and
throughput. The external validation is achieved
by comparing the performance of our
approach to that of other techniques in the
literature. The results show that our integrated
approach significantly improves the
performance of web database systems and
outperforms its counterparts.
IEEE 2015
TTA-JN-
C1549
Distributed Database
Management
Techniques for Wireless
Sensor Networks
In sensor networks, the large amount of data
generated by sensors greatly influences the
lifetime of the network. To manage this
amount of sensed data in an energy-efficient
way, new methods of storage and data query
are needed. To this end, the distributed database approach for sensor networks has proved to be one of the
most energy-efficient data storage and
query techniques. This paper surveys the state
of the art of the techniques used to manage
data and queries
in wireless sensor networks based on
the distributed paradigm. A classification of
these techniques is also proposed. The goal of
this work is not only to present how data and
query management techniques have advanced
nowadays, but also to show their benefits and
drawbacks, and to identify open issues
providing guidelines for further contributions
in this type of distributed architectures.
IEEE 2015
TTA-JN-
C1550
Diversifying Web
Service
Recommendation
Results via Exploring
Service Usage History
The last decade has witnessed a tremendous
growth of Web services as a major technology
for sharing data, computing resources, and
programs on the Web. With the increasing
adoption and presence of Web services, design
of novel approaches for
effective Web service recommendation to
satisfy users’ potential requirements has
become of paramount importance.
Existing Web service
recommendation approaches mainly focus on
predicting missing QoS values of Web service
candidates which are interesting to a user using
collaborative filtering approach, content-based
approach, or their hybrid.
These recommendation approaches assume
that recommended Web services are independent of each other, which sometimes
may not be true. As a result, many similar or
redundant Web services may exist in
a recommendation list. In this paper, we propose a novel Web
service recommendation approach
incorporating a user’s potential QoS
preferences and diversity feature of user
interests on Web services. User’s interests and
QoS preferences on Web services are first
mined
by exploring the Web service usage history.
Then we compute scores of Web service
candidates by measuring their relevance with
historical and potential user interests, and their
QoS utility. We also construct
a Web service graph based on the functional
similarity between Web services. Finally, we
present an innovative diversity-
aware Web service ranking algorithm to rank
the Web service candidates based on their
scores, and diversity degrees derived from
the Web service graph. Extensive experiments
are conducted based on a real
world Web service dataset, indicating that our
proposed Web service recommendation approach significantly improves the quality of the recommendation results compared with existing methods.
IEEE 2015
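A standard way to operationalize this kind of diversity-aware ranking is maximal marginal relevance (MMR); the sketch below uses MMR as a stand-in for the paper's own ranking algorithm, trading each candidate's relevance score against its similarity to services already selected. The score and similarity inputs and the lambda weight are illustrative.

def diversify(scores, similarity, k, lam=0.7):
    """scores: dict service -> relevance; similarity(a, b) in [0, 1]."""
    selected, candidates = [], set(scores)
    while candidates and len(selected) < k:
        def mmr(s):
            # Penalize candidates that duplicate an already-chosen service.
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * scores[s] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected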
TTA-JN-
C1551
DROPS Division and
Replication of Data in
Cloud for Optimal
Performance and
Security
Outsourcing data to a third-party
administrative control, as is done
in cloud computing, gives rise to
security concerns. The data compromise may
occur due to attacks by other users and nodes
within the cloud. Therefore,
high security measures are required to
protect data within the cloud. However, the
employed security strategy must also take into
account the optimization of the data retrieval
time. In this paper, we
propose Division and Replication of Data in
the Cloud for Optimal Performance and Securi
ty (DROPS) that collectively approaches
the security and performance issues. In
the DROPS methodology, we divide a file into
fragments, and replicate the
fragmented data over the cloud nodes. Each of
the nodes stores only a single fragment of a
particular data file, which ensures that even in case of a successful attack, no meaningful information is revealed to the attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring to prohibit an attacker from guessing the locations of the fragments.
Furthermore, the DROPS methodology does
not rely on the traditional cryptographic
techniques for the data security; thereby
relieving the system of computationally
expensive methodologies. We show that the
probability to locate and compromise all of the
nodes storing the fragments of a single file is
extremely low. We also compare
the performance of the DROPS methodology
with ten other schemes. A higher level of security with only a slight performance overhead was observed.
IEEE 2015
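The fragment-and-separate idea can be sketched as follows: split a file into fragments and greedily pick storage nodes that are pairwise at least min_sep hops apart. This greedy hop-distance rule is an assumed simplification of the paper's graph T-coloring placement, and the example topology is invented.

import networkx as nx

def place_fragments(data, graph, n_fragments, min_sep):
    size = -(-len(data) // n_fragments)          # ceiling division
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    dist = dict(nx.all_pairs_shortest_path_length(graph))
    placement, used = {}, []
    for frag in fragments:
        # Raises StopIteration if the topology cannot honor the separation.
        node = next(n for n in graph.nodes
                    if all(dist[n][u] >= min_sep for u in used))
        placement[node] = frag
        used.append(node)
    return placement

g = nx.cycle_graph(10)
print(place_fragments(b"supersecretpayload", g, n_fragments=3, min_sep=3))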
TTA-JN-
C1552
Dynamic Bin Packing
for On-Demand Cloud
Resource Allocation
Dynamic Bin Packing (DBP) is a variant of
classical bin packing, which assumes that
items may arrive and depart at arbitrary times.
Existing works on DBP generally aim to
minimize the maximum number of bins ever
used in the packing. In this paper, we consider
a new version of the DBP problem, namely,
the MinTotal DBP problem which targets at
minimizing the total cost of the bins used over
time. It is motivated by the request dispatching
problem arising from cloud gaming systems.
We analyze the competitive ratios of the
modified versions of the commonly used First
Fit, Best Fit, and Any Fit packing (the family of packing algorithms that open a new bin only when no currently open bin can accommodate the item to be packed) algorithms for the MinTotal DBP problem. We show that the competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (β > 1 is a constant), the competitive ratio has an upper bound of (β/(β - 1))μ + 3β/(β - 1) + 1. For the general case, the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First Fit packing algorithm that can achieve a competitive ratio no larger than (5/4)μ + 19/4 when μ is not known and can achieve a competitive ratio no larger than μ + 5 when μ is known.
IEEE 2015
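First Fit in the dynamic setting fits in a few lines: items arrive and depart over time, and a new bin opens only when no currently open bin can accommodate the arriving item. The MinTotal cost accounting and the Hybrid variant are not modeled in this sketch.

class Bin:
    def __init__(self, capacity=1.0):
        self.capacity, self.items = capacity, {}
    def free(self):
        return self.capacity - sum(self.items.values())

def first_fit_insert(bins, item_id, size):
    for b in bins:                    # scan bins in opening order
        if b.free() >= size:
            b.items[item_id] = size
            return b
    b = Bin()                         # no open bin fits: open a new one
    b.items[item_id] = size
    bins.append(b)
    return b

def depart(bins, item_id):
    for b in bins:
        b.items.pop(item_id, None)

bins = []
first_fit_insert(bins, "a", 0.6)
first_fit_insert(bins, "b", 0.5)      # does not fit; opens a second bin
depart(bins, "a")
first_fit_insert(bins, "c", 0.4)      # reuses the freed space in bin one
print(len(bins))                      # 2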
TTA-JN-
C1553
Location-Aware and
Personalized
Collaborative Filtering
for Web Service
Recommendation
Collaborative Filtering (CF) is widely
employed for
making Web service recommendation. CF-
based Web service recommendation aims to
predict missing QoS (Quality-of-Service) values of Web services. Although several CF-
based Web service QoS prediction methods
have been proposed in recent years, the
performance still needs significant
improvement. Firstly, existing QoS prediction
methods seldom
consider personalized influence of users
and services when measuring the similarity
between users and between services.
Secondly, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users. However, existing Web service QoS prediction methods have seldom taken this observation into consideration. In this paper, we propose
a location-aware personalized CF method
for Web service recommendation. The
proposed method leverages both locations of
users and Web services when selecting similar
neighbors for the target user or service. The
method also includes an enhanced similarity
measurement for users and Web services, by
taking into account the personalized influence
of them. To evaluate the performance of our
proposed method, we conduct a set of
comprehensive experiments using a real-world Web service dataset. The experimental
results indicate that our approach improves the
QoS prediction accuracy and computational
efficiency significantly, compared to previous
CF-based methods.
IEEE 2015
TTA-JN-
C1554
Location-Based Key
Management Strong
Against Insider Threats
in Wireless Sensor
Networks
To achieve secure communications in wireless sensor networks (WSNs), sensor nodes (SNs) must establish secret shared keys with neighboring nodes.
Moreover, those keys must be updated by
defeating the insider threats of corrupted
nodes. In this paper, we propose a location-
based key management scheme for WSNs,
with special considerations of insider threats.
After reviewing existing location-
based key management schemes and studying
their advantages and disadvantages, we
selected location-
dependent key management (LDK) as a
suitable scheme for our study. To solve a
communication interference problem in LDK
and similar methods, we have devised a
new key revision process that incorporates
grid-based location information. We also
propose a key establishment process using grid
information. Furthermore, we
construct key update and revocation processes
to effectively resist inside attackers. For
analysis, we conducted a rigorous simulation
and confirmed that our method can increase
connectivity while decreasing the compromise
ratio when the minimum number of
common keys required for key establishment is
high. When there was a corrupted node
leveraging insider threats, it was also possible
to effectively rekey every SN except for the
corrupted node using our method. Finally, the
hexagonal deployment of anchor nodes could
reduce network costs.
IEEE 2015
TTA-JN-
C1555
Malware Propagation in
Large-Scale Networks
Malware is pervasive in networks, and poses a
critical threat to network security. However,
we have very limited understanding
of malware behavior in networks to date. In
this paper, we investigate how
malware propagates in networks from a global
perspective. We formulate the problem, and
establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution, a power law distribution with a short exponential tail, and a power law distribution at its early, late, and final stages,
respectively. Extensive experiments have been
performed through two real-world
global scale malware data sets, and the results
confirm our theoretical findings.
IEEE 2015
TTA-JN-
C1556
Optimal Cloudlet
Placement and User to
Cloudlet Allocation in
Wireless Metropolitan
Area Networks
Mobile applications are becoming increasingly
computation-intensive, while the computing
capability of portable mobile devices is
limited. A powerful way to reduce the
completion time of an application in a mobile
device is to offload its tasks to nearby
cloudlets, which consist of clusters of
computers. Although there is a significant
body of research in mobile cloudlet offloading
technology, there has been very little attention
paid to how cloudlets should be placed in a
given network to optimize mobile application
performance. In this paper, we
study cloudlet placement and
mobile user allocation to the cloudlets in
a wireless metropolitan area network (WMAN)
. We devise an algorithm for the problem,
which enables the placement of the cloudlets
at user dense regions of the WMAN, and
assigns mobile users to the placed cloudlets
while balancing their workload. We also
conduct experiments through simulation. The
simulation results indicate that the
performance of the proposed algorithm is very
promising.
IEEE 2015
TTA-JN-
C1557
Predistribution Scheme
for Establishing Group
Keys in Wireless
Sensor Networks
Establishing group keys is challenging in wireless sensor networks (WSNs) because sensor nodes are limited in memory storage and computational power. In 1992,
Blundo et al. proposed a non-interactive group key establishment scheme
using a multivariate polynomial.
Their scheme can establish a group key of
m sensors. Since each share is a polynomial involving m - 1 variables and having degree k, each sensor needs to store (k + 1)^(m-1) coefficients from GF(p), which is exponentially proportional to the size of the group. This makes their scheme practical only when m = 2, for peer-to-peer communication. So far,
most existing predistribution schemes in
WSNs establish pairwise keys for sensor nodes. In this paper, we propose a novel predistribution scheme for establishing group keys in WSNs. Our design uses a special type of multivariate polynomial in Z_N, where N is an RSA modulus. The advantage of using this type of multivariate polynomial is that it limits the storage space of each sensor to m(k + 1), which is linearly proportional to the size of the group communication. In addition, we prove
the security of the proposed scheme and show that it is computationally efficient.
IEEE 2015
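For the m = 2 case that the abstract identifies as the practical regime of Blundo et al.'s construction, the mechanics fit in a few lines: each node stores a univariate share of a symmetric bivariate polynomial, and any two nodes derive the same pairwise key. The prime and coefficients below are toy values, and the paper's own construction over Z_N with an RSA modulus is different.

P = 7919  # small prime, for illustration only

# f(x, y) = 3 + 5(x + y) + 2xy (mod P) is symmetric: f(x, y) = f(y, x).
def f(x, y):
    return (3 + 5 * (x + y) + 2 * x * y) % P

def share(i):
    # Node i stores f(i, y) as two coefficients: constant and y-term.
    return ((3 + 5 * i) % P, (5 + 2 * i) % P)

def key(my_share, other_id):
    c0, c1 = my_share
    return (c0 + c1 * other_id) % P

alice, bob = 17, 42
assert key(share(alice), bob) == key(share(bob), alice) == f(alice, bob)

For general m, the share becomes a polynomial in m - 1 variables, which is exactly where the (k + 1)^(m-1) storage blow-up noted above comes from.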
TTA-JN-
C1558
Privacy-Preserving
Detection of Sensitive
Data Exposure
Statistics from security firms, research
institutions and government organizations
show that the number of data-leak instances has grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. Solutions exist that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common
approach is to screen content in storage and
transmission for exposed sensitive information.
Such an approach usually requires
the detection operation to be conducted in
secrecy. However, this secrecy requirement is
challenging to satisfy in practice,
as detection servers may be compromised or
outsourced. In this paper, we present a privacy-
preserving data-leak detection (DLD) solution
to solve the issue where a special set
of sensitive data digests is used in detection.
The advantage of our method is that it enables
the data owner to safely delegate
the detection operation to a semi-honest
provider without revealing the sensitive data to
the provider. We describe how Internet service
providers can offer their customers DLD as an
add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with a very small number of false alarms under various data-leak scenarios.
IEEE 2015
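The digest idea can be sketched with shingle hashing: the data owner releases only hashes of short substrings of the sensitive data, and the detection provider screens content for matching hashes without holding the sensitive plaintext. Shingle length, the scoring rule, and the example strings are assumptions; the paper's construction adds privacy machinery not shown here.

import hashlib

def shingle_digests(text, n=8):
    return {hashlib.sha256(text[i:i + n].encode()).hexdigest()
            for i in range(max(len(text) - n + 1, 1))}

def leak_score(content, sensitive_digests, n=8):
    # Fraction of sensitive shingles that reappear in the content.
    seen = shingle_digests(content, n)
    return len(seen & sensitive_digests) / max(len(sensitive_digests), 1)

digests = shingle_digests("ssn=123-45-6789")
print(leak_score("mail body ... ssn=123-45-6789 ...", digests))  # 1.0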
TTA-JN-
C1559
Providing Privacy-
Aware Incentives in
Mobile Sensing
Systems
Mobile sensing relies on data contributed by
users through their mobile device (e.g., smart
phone) to obtain useful information about
people and their surroundings. However, users
may not want to contribute due to a lack of incentives and concerns about possible privacy leakage. To effectively
promote user participation,
both incentive and privacy issues should be
addressed. Although incentive and
privacy have been addressed separately
in mobile sensing, it is still an open problem to
address them simultaneously. In this paper, we
propose two credit-based privacy-
aware incentive schemes for
mobile sensing systems, where the focus is
on privacy protection instead of on the design
of incentive mechanisms. Our schemes
enable mobile users to earn credits by
contributing data without leaking which data
they have contributed, and ensure that
malicious users cannot abuse the system to
earn unlimited credits. Specifically, the first
scheme considers scenarios where an online
trusted third party (TTP) is available, and
relies on the TTP to protect user privacy and
prevent abuse attacks. The second scheme
considers scenarios where no online TTP is
available. It applies blind signature, partially
blind signature, and a novel extended Merkle
tree technique to protect user privacy and
prevent abuse attacks. Security analysis and
cost evaluations show that our schemes are
secure and efficient.
IEEE 2015
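Since the second scheme builds on a Merkle tree, a plain Merkle commitment sketch shows the flavor: a batch of contributions is committed by a single root, and membership of one leaf can be proven without revealing the others. The paper's extended tree adds machinery not reproduced here.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])               # duplicate odd node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, leaf_is_left in proof:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

data = [b"reading-1", b"reading-2", b"reading-3", b"reading-4"]
assert verify(data[2], merkle_proof(data, 2), merkle_root(data))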
TTA-JN-
C1560
Response Time Based
Optimal Web Service
Selection
Selecting an optimal web service among a list
of functionally equivalent web services still
remains a challenging issue. For
Internet services, the presence of low-
performance servers, high latency or overall
poor service quality can translate into lost sales, user frustration, and lost customers. In
this paper, we propose a novel method for QoS
metrification based on Hidden Markov Models
(HMM), which further suggests
an optimal path for the execution of user
requests. The technique we show can be used
to measure and predict the behavior
of Web Services in terms of response time, and
can thus be used to rank services quantitatively
rather than just qualitatively. We demonstrate
the feasibility and usefulness of our
methodology by conducting experiments on real-world data. The results have shown how our
proposed method can help the user to
automatically select the most reliable Web Service, taking into account several metrics, among them system predictability and response time variability. An ROC curve analysis further shows a 12 percent improvement in prediction accuracy using HMM.
IEEE 2015
TTA-JN-
C1561
Robust cloud
management of MANET
checkpoint sessions
In a traditional mobile ad-hoc network
(MANET), if two nodes are engaged in
a session and one of them departs suddenly,
their communication is aborted. The session is
not active any more, work is lost and,
consequently, the energy of the batteries has
been wasted. This paper proposes a model that
uses a cloud service to register, save, pause
and
resume sessions between MANET member
nodes so that both work in progress and energy
are saved. A checkpoint technique is
introduced to capture the progress of
a session and allow it to be resumed. This is an
additional service to our cloud management of
the MANET. The model proposed in this paper
was tested on Android-based devices and an
Amazon cloud instance. Experimental results
show that the model is feasible, robust, saves
time and, more importantly, energy
if session breaks occur frequently.
IEEE 2015
TTA-JN-
C1562
Secure Anonymous Key
Distribution Scheme for
Smart Grid
To fully support information management
among various stakeholders
in smart grid domains, how to
establish secure communication sessions has
become an important issue
for smart grid environments. In order to
support secure communications
between smart meters and service
providers, key management for authentication
becomes a crucial security topic. Recently,
several key distribution schemes have been
proposed to provide secure communications
for smart grid. However, these schemes do not
support smart meter anonymity and possess
security weaknesses. This paper utilizes an
identity-based signature scheme and an
identity-based encryption scheme to propose a new anonymous key distribution scheme for smart grid environments. In the proposed scheme,
a smart meter can anonymously access
services provided by service providers using
one private key without the help of the trusted
anchor during authentication. In addition, the
proposed scheme requires only a few computation operations at the smart meter side.
Security analysis is conducted to prove the
proposed scheme is secure under the random oracle model.
IEEE 2015
TTA-JN-
C1563
Secure Data
Aggregation Technique
for Wireless Sensor
Networks in the
Presence of Collusion
Attacks
Due to limited computational power and
energy resources, aggregation of data from
multiple sensor nodes done at the aggregating
node is usually accomplished by simple
methods such as averaging. However
such aggregation is known to be highly
vulnerable to node compromising attacks.
Since WSNs are usually unattended and without
tamper resistant hardware, they are highly
susceptible to such attacks. Thus, ascertaining
trustworthiness of data and reputation
of sensor nodes is crucial for WSN. As the
performance of very low power processors
dramatically improves, future aggregator nodes
will be capable of performing more
sophisticated data aggregation algorithms, thus
making WSNs less vulnerable. Iterative filtering algorithms hold great promise for
such a purpose. Such algorithms
simultaneously aggregate data from multiple
sources and provide trust assessment of these
sources, usually in a form of corresponding
weight factors assigned to data provided by
each source. In this paper we demonstrate that
several existing iterative filtering algorithms,
while significantly more robust
against collusion attacks than the simple
averaging methods, are nevertheless susceptible
to a novel sophisticated collusion attack we
introduce. To address this security issue, we
propose an improvement for iterative
filtering techniques by providing an initial
approximation for such algorithms which
makes them not only collusion robust, but also
more accurate and faster converging.
IEEE 2015
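The iterative filtering loop itself is easy to sketch: alternate between a weighted-mean estimate and re-weighting each sensor inversely to its distance from that estimate. The reciprocal weight function is one common choice, and the paper's contribution, a robust initial approximation, is deliberately not included.

import numpy as np

def iterative_filtering(readings, iters=20, eps=1e-6):
    """readings: array of shape (n_sensors, n_measurements)."""
    weights = np.ones(len(readings))
    for _ in range(iters):
        estimate = weights @ readings / weights.sum()     # weighted mean
        distances = ((readings - estimate) ** 2).mean(axis=1)
        weights = 1.0 / (distances + eps)                 # distrust outliers
    return estimate, weights

honest = np.random.normal(25.0, 0.5, size=(8, 10))
colluders = np.full((2, 10), 35.0)            # coordinated, skewed reports
estimate, weights = iterative_filtering(np.vstack([honest, colluders]))
print(estimate.mean().round(2), weights.round(3))  # colluders get tiny weights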
TTA-JN-
C1564
Secure Distributed
Deduplication Systems
with Improved
Reliability
Data deduplication is a technique for
eliminating duplicate copies of data, and has
been widely used in cloud storage to reduce
storage space and upload bandwidth. However,
there is only one copy of each file stored in the cloud even if such a file is owned by a huge
number of users. As a result, deduplication
system improves storage utilization while
reducing reliability. Furthermore, the challenge
of privacy for sensitive data also arises when they are outsourced by users to the cloud. Aiming
to address the above security challenges, this
paper makes the first attempt to formalize the
notion of distributed reliable
deduplication system. We propose
new distributed deduplication systems with
higher reliability in which the data chunks
are distributed across multiple cloud servers.
The security requirements of data
confidentiality and tag consistency are also
achieved by introducing a deterministic secret
sharing scheme in distributed storage systems,
instead of using convergent encryption as in
previous deduplication systems. Security
analysis demonstrates that
our deduplication systems are secure in terms of the definitions specified in the proposed
security model. As a proof of concept, we
implement the proposed systems and
demonstrate that the incurred overhead is very
limited in realistic environments.
IEEE 2015
TTA-JN-
C1565
TEES An Efficient
Search Scheme over
Encrypted Data on
Mobile Cloud
Cloud storage provides a convenient, massive,
and scalable storage at low cost,
but data privacy is a major concern that
prevents users from storing files on
the cloud trustingly. One way of enhancing
privacy from data owner point of view is
to encrypt the files before outsourcing them
onto the cloud and decrypt the files after
downloading them. However, data encryption
is a heavy overhead for the mobile devices,
and data retrieval process incurs a complicated
communication between the data user and
cloud. Normally with limited bandwidth
capacity and limited battery life, these issues
introduce heavy overhead to computing and
communication as well as a higher power
consumption for mobile device users, which
makes
the encrypted search over mobile cloud very
challenging. In this paper, we
propose TEES (Traffic and Energy
saving Encrypted Search), a bandwidth and
energy efficient encrypted search architecture
over mobile cloud. The proposed architecture
offloads the computation from mobile devices
to the cloud, and we further optimize the
communication between the mobile clients and
the cloud. It is demonstrated that
the data privacy does not degrade when the
performance enhancement methods are
applied. Our experiments show
that TEES reduces the computation time by
23% to 46% and saves the energy consumption by 35% to 55% per file retrieval; meanwhile, the network traffic during file retrievals is also significantly reduced.
IEEE 2015
TTA-JN-
C1566
Transparent Real-Time
Task Scheduling on
Temporal Resource
Partitions
The Hierarchical Real-
Time Scheduling (HiRTS) technique helps
improve overall resource utilization in real-time embedded systems. With HiRTS, a
computation resource is divided into a group
of temporal resource partitions, each of which
accommodates multiple real-time tasks.
Besides the computation resource partitioning problem, real-time task scheduling on resource partitions is
also a major problem of HiRTS. The
existing scheduling techniques for
dedicated resources, like schedulability tests
and utilization bounds, are unable to work
without changes
on temporal resource partitions in most cases.
In this paper, we show how to achieve
maximal transparency for task scheduling on
Regular Partitions, a type
of resource partition introduced by the
Regularity-based Resource Partition (RRP)
Model. We show that several classes of real-
time scheduling problems on a
regular partition can be transformed into
equivalent problems on a dedicated
single resource, such that comprehensive
single-resource scheduling techniques provide
optimal solutions. Furthermore, this
transformation method could be applied to
different types of real-time tasks such as periodic tasks, sporadic tasks, and aperiodic tasks.
IEEE 2015
TTA-JN-
C1567
User-Defined Privacy
Grid System for
Continuous Location-
Based Services
Location-based services (LBS) require users to
continuously report their location to a
potentially untrusted server to
obtain services based on their location, which
can expose them to privacy risks.
Unfortunately, existing privacy-preserving
techniques for LBS have several limitations,
such as requiring a fully-trusted third party,
offering limited privacy guarantees and
incurring high communication overhead. In
this paper, we propose a user-
defined privacy grid system called dynamic grid system (DGS), the first holistic system that fulfills four essential requirements for privacy-preserving snapshot and continuous LBS. (1) The system only requires a semi-trusted third party, responsible
for carrying out simple matching operations
correctly. This semi-trusted third party does
not have any information about
a user's location. (2) Secure snapshot
and continuous location privacy is guaranteed
under our defined adversary models. (3) The
communication cost for the user does not
depend on the user's desired privacy level; it
only depends on the number of relevant points
of interest in the vicinity of the user. (4)
Although we only focus on range and k-
nearest-neighbor queries in this work,
our system can be easily extended to support
other spatial queries without changing the
algorithms run by the semi-trusted third party
and the database server, provided the required
search area of a spatial query can be abstracted
into spatial regions. Experimental results show
that our DGS is more efficient than the state-
of-the-art privacy-preserving technique
for continuous LBS.
IEEE 2015
TTA-JN-
C1568
VoteTrust Leveraging
Friend Invitation Graph
to Defend against
Social Network Sybils
Online social networks (OSNs) suffer from the
creation of fake accounts that introduce fake
product reviews, malware and spam. Existing
defenses focus on using
the social graph structure to isolate fakes.
However, our work shows that Sybils could
befriend a large number of real users,
invalidating the assumption behind social-
graph-based detection. In this paper, we
present VoteTrust, a scalable defense system
that further leverages user-level
activities. VoteTrust models
the friend invitation interactions among users
as a directed, signed graph, and uses two key mechanisms to detect Sybils over the graph: a voting-based Sybil detection to find Sybils that
users vote to reject, and a Sybil community
detection to find other colluding Sybils around
identified Sybils. Through evaluation on the Renren social network, we show that VoteTrust is able to prevent Sybils from generating many unsolicited friend requests. We also deploy VoteTrust in Renren, and our real experience demonstrates that VoteTrust can detect large-scale collusion among Sybils.
IEEE 2015
DOMAIN : DATA MINING
TTA-DD-
C1501
CrowdOp Query
Optimization for
Declarative
Crowdsourcing
Systems
We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the
complexities and relieve the user of the burden
of dealing with the crowd. The user is only
required to submit an SQL-like query and
the system takes the responsibility of
compiling the query, generating the execution plan, and evaluating it in the crowdsourcing marketplace. A given query can have many
alternative execution plans and the difference
in crowdsourcing cost between the best and
the worst plans may be several orders of
magnitude. Therefore, as in relational
database systems, query optimization is important to crowdsourcing systems that
provide declarative query interfaces. In this
paper, we propose CROWDOP, a cost-
based query optimization approach
for declarative crowdsourcing systems. CROWDOP considers both
cost and latency
in query optimization objectives and
generates query plans that provide a good
balance between the cost and latency. We
develop efficient algorithms in
the CROWDOP for optimizing three types
of queries: selection queries, join queries, and
complex selection-join queries. We validate
our approach via extensive experiments by
simulation as well as with the real crowd on
Amazon Mechanical Turk.
IEEE 2015
TTA-DD-
C1502
Time-Series
Classification with
COTE The Collective of
Transformation-Based
Ensembles
Recently, two ideas have been explored that
lead to more accurate algorithms for time-
series classification (TSC). First, it has been
shown that the simplest way to gain
improvement on TSC problems is to transform
into an alternative data space where
discriminatory features are more easily detected. Second, it was demonstrated that
with a single data representation, improved
accuracy can be achieved through
simple ensemble schemes. We combine these
two principles to test the hypothesis that
forming a collective of ensembles of classifiers
on different data transformations improves the
accuracy of time-series classification.
The collective contains classifiers constructed
in the time, frequency, change, and
shapelet transformation domains. For
the time domain, we use a set of elastic
distance measures. For the other domains, we
use a range of standard classifiers. Through
extensive experimentation on 72 datasets,
including all of the 46 UCR datasets, we
demonstrate that the simple collective formed
by including all classifiers in one ensemble is
significantly more accurate than any of its
components and any other previously
published TSC algorithm. We investigate
alternative hierarchical collective structures
and demonstrate the utility of the approach on
a new problem involving classifying Caenorhabditis elegans mutant types.
IEEE 2015
TTA-DD-
C1503
PruDent A Pruned and
Confident Stacking
Approach for Multi-
label Classification
Over the past decade or so, several research
groups have addressed the problem of multi-
label classification where each example can
belong to more than one class at the same time.
A common approach, called Binary Relevance
(BR), addresses this problem by inducing a
separate classifier for each class. Research has
shown that this framework can be improved if
mutual class dependence is exploited: an
example that belongs to class X is likely to belong also to class Y; conversely, belonging to X can make an example less likely to belong
to Z. Several works sought to model this
information by using the vector of class labels
as additional example attributes. To fill the
unknown values of these attributes during
prediction, existing methods resort to using
outputs of other classifiers, and this makes
them prone to errors. This is where our paper wants to contribute. We identified two
potential ways to prune unnecessary
dependencies and to reduce error-propagation
in our new classifier-stacking technique, which
is named PruDent. Experimental results
indicate that the classification performance of
PruDent compares favorably with that of other
state-of-the-art approaches over a broad range
of testbeds. Moreover, its computational costs
grow only linearly in the number of classes.
IEEE 2015
TTA-DD-
C1504
Raw Wind Data
Preprocessing A Data-
Mining Approach
Wind energy integration research generally
relies on complex sensors located at remote
sites. The procedure for generating high-level
synthetic information from databases
containing large amounts of low-
level data must therefore account for possible
sensor failures and imperfect input data.
The data input is highly sensitive
to data quality. To address this problem, this
paper presents an empirical methodology that
can efficiently preprocess and filter
the raw wind data using only aggregated active
power output and the
corresponding wind speed values at
the wind farm. First, raw wind data properties
are analyzed, and all the data are divided into
six categories according to their attribute
magnitudes from a statistical perspective.
Next, the weighted distance, a novel measure of the degree of similarity between individual objects in the wind database, and the local outlier factor (LOF) algorithm are incorporated to compute the outlier factor of every
individual object, and this outlier factor is then
used to assess which category an object
belongs to. Finally, the methodology was
tested successfully on the data collected from a
large wind farm in northwest China.
IEEE 2015
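The outlier-factor step can be tried directly with scikit-learn's LOF implementation on synthetic (wind speed, active power) pairs; the data below is invented, and the paper's weighted-distance variant of the outlier factor is not reproduced.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
speed = rng.uniform(3, 15, 500)
power = 80 * speed + rng.normal(0, 20, 500)   # crude stand-in power curve
power[:10] = 0                                # stuck-sensor records
data = np.column_stack([speed, power])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(data)                # -1 marks outliers
print("flagged:", np.where(labels == -1)[0][:10])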
TTA-DD-
C1505
Removing DUST Using
Multiple Alignment of
Sequences
A large number of URLs collected by web
crawlers correspond to pages with duplicate or
near-duplicate contents. To crawl, store,
and use such duplicated data implies a waste of
resources, the building of low quality rankings,
and poor user experiences. To deal with this problem, several methods have been proposed to
detect and remove duplicate documents
without fetching their contents. To accomplish
this, the proposed methods learn normalization
rules to transform all duplicate URLs into the
same canonical form. A challenging aspect of
this strategy is deriving a set of general and
precise rules. In this work, we present
DUSTER, a new approach to derive quality
rules that take advantage of a multi-
sequence alignment strategy. We demonstrate
that a full multi-sequence alignment of URLs
with duplicated content, before the generation
of the rules, can lead to the deployment of very
effective rules. By evaluating our method, we
observed it achieved larger reductions in the
number of duplicate URLs than our best
baseline, with gains of 82 and 140.74 percent
in two different web collections.
IEEE 2015
TTA-DD-
C1506
Keyword Extraction
and Clustering for
Document
Recommendation in
Conversations
This paper addresses the problem
of keyword extraction from conversations,
with the goal of using these keywords to
retrieve, for each short conversation fragment,
a small number of potentially relevant
documents, which can be recommended to
participants. However, even a short fragment
contains a variety of words, which are
potentially related to several topics; moreover,
using an automatic speech recognition (ASR)
system introduces errors among them.
Therefore, it is difficult to infer precisely the
information needs of
the conversation participants. We first propose
an algorithm to extract keywords from the
output of an ASR system (or a manual
transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors
diversity in the keyword set, to match the
potential diversity of topics and reduce ASR
noise. Then, we propose a method to derive
multiple topically separated queries from
this keyword set, in order to maximize the
chances of making at least one
relevant recommendation when using these queries to search over the English Wikipedia.
The proposed methods are evaluated in terms
of relevance with respect
to conversation fragments from the Fisher,
AMI, and ELEA conversational corpora, rated
by several human judges. The scores show that
our proposal improves over previous methods
that consider only word frequency or topic
similarity, and represents a promising solution
for a document recommender system to be
used in conversations.
IEEE 2015
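The diversity-favoring selection can be sketched with a greedy maximizer of a submodular coverage objective: a concave function of accumulated per-topic weight yields diminishing returns once a topic is already covered, so a second keyword from the same topic adds little. The topic weights and keywords below are invented; in the real system they would come from the topic model.

import math

def coverage(selected, topic_weights):
    totals = {}
    for kw in selected:
        for topic, w in topic_weights[kw].items():
            totals[topic] = totals.get(topic, 0.0) + w
    # sqrt is concave, so the objective is monotone submodular.
    return sum(math.sqrt(v) for v in totals.values())

def pick_keywords(topic_weights, k):
    selected = []
    for _ in range(k):
        best = max((kw for kw in topic_weights if kw not in selected),
                   key=lambda kw: coverage(selected + [kw], topic_weights))
        selected.append(best)
    return selected

tw = {"piano":  {"music": 0.9},
      "guitar": {"music": 0.8},
      "budget": {"finance": 0.7}}
print(pick_keywords(tw, 2))   # ['piano', 'budget']: diverse topics win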
TTA-DD-
C1507
An Internal Intrusion
Detection and
Protection System by
Using Data Mining and
Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and ask these coworkers to assist with shared tasks, thereby making the pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect since most intrusion detection systems and firewalls identify and isolate malicious behaviors launched from outside the system only. In addition, some studies have claimed that analyzing system calls (SCs) generated by commands can identify these commands, making it possible to accurately detect attacks, with attack patterns as the features of an attack. Therefore, in this paper, a security system, named the Internal Intrusion Detection and Protection System (IIDPS), is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of users' usage habits as their forensic features and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29%, whereas the response time is less than 0.45 s, implying that it can protect a system against insider attacks effectively and efficiently.
IEEE 2015
TTA-DD-
C1508
A Critical-time-point
Approach to All-
departure-time
Lagrangian Shortest
Paths
Given a spatio-temporal network, a source, a
destination, and a
desired departure time interval, the All-
departure-
time Lagrangian Shortest Paths (ALSP)
problem determines a set which includes the
shortest path for every departure time in the
given interval. ALSP is important
for critical societal applications such as eco-
routing. However, ALSP is computationally
challenging due to the non-stationary ranking
of the candidate paths across
distinct departure-times. Current related work
for reducing the redundant work, across
consecutive departure-times sharing a common
solution, exploits only partial information, e.g.,
the earliest feasible arrival time of a path. In
contrast, our approach uses all available
information, e.g., the entire time series of
arrival times for all departure-times. This
allows elimination of all knowable redundant
computation based on complete information
available at hand. We operationalize this idea
through the concept of critical-time-points
(CTP), i.e., departure-times before which
ranking among candidate paths cannot change.
In our preliminary work, we proposed a CTP
based forward search strategy. In this paper,
we propose a CTP based temporal bi-
directional search for the ALSP problem via a
novel impromptu rendezvous termination
condition. Theoretical and experimental
analysis show that the
proposed approach outperforms the related
work approaches particularly when there are
few critical-time-points.
IEEE 2015
TTA-DD-
C1509
Co-ClusterD A
Distributed Framework
for Data Co-Clustering
with Sequential
Updates
Co-clustering has emerged to be a
powerful data mining tool for two-
dimensional co-occurrence and dyadic data.
However, co-clustering algorithms often
require significant computational resources
and have been dismissed as impractical for large data sets. Existing studies have provided
strong empirical evidence that expectation-
maximization (EM) algorithms (e.g., k-means
algorithm) with sequential updates can
significantly reduce the computational cost
without degrading the resulting solution.
Motivated by this observation, we
introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms
which are variants of EM algorithms, and also
show that AMCC algorithms with
sequential updates converge. We then propose
two approaches to parallelize AMCC
algorithms with sequential updates in
a distributed environment. Both approaches are
proved to maintain the convergence properties
of AMCC algorithms. Based on these two
approaches, we present a new
distributed framework, Co-ClusterD, which
supports efficient implementations of AMCC
algorithms with sequential updates. We design
and implement Co-ClusterD, and show its
efficiency through two AMCC algorithms: fast
nonnegative matrix tri-factorization (FNMTF)
and information theoretic co-clustering (ITCC).
We evaluate our framework on both a local
cluster of machines and the Amazon EC2
cloud. Empirical results show that AMCC
algorithms implemented in Co-ClusterD can
achieve a much faster convergence and often
obtain better results than their traditional
concurrent counterparts.
IEEE 2015
TTA-DD-
C1511
Differentially Private
Frequent Itemset
Mining via Transaction
Splitting
Recently, there has been a growing interest in
designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one
of the most fundamental problems in
data mining. In this paper, we explore the
possibility of designing
a differentially private FIM algorithm which
can not only achieve high data utility and a
high degree of privacy, but also offer high time
efficiency. To this end, we propose a
differentially private FIM algorithm based on
the FP-growth algorithm, which is referred to
as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and
a mining phase. In the preprocessing phase, to
improve the utility and privacy tradeoff, a
novel smart splitting method is proposed to
transform the database. For a given database,
the preprocessing phase needs to be performed
only once. In the mining phase, to offset the
information loss caused
by transaction splitting, we devise a run-time
estimation method to estimate the actual support of itemsets in the original database. In
addition, by leveraging the downward closure
property, we put forward a dynamic reduction
method to dynamically reduce the amount of
noise added to guarantee privacy during the
mining process. Through formal privacy
analysis, we show that our PFP-growth
algorithm is ε-differentially private. Extensive
experiments on real datasets illustrate that our
PFP-growth algorithm substantially
outperforms the state-of-the-art techniques.
IEEE 2015
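The primitive underneath such differentially private counts is the Laplace mechanism: adding Laplace(sensitivity/ε) noise to a support count makes that single query ε-differentially private. PFP-growth budgets and reduces this noise far more carefully; this sketch shows only the primitive, with illustrative numbers.

import numpy as np

def noisy_support(true_support, epsilon, sensitivity=1.0):
    # One transaction changes a support count by at most `sensitivity`.
    return true_support + np.random.laplace(scale=sensitivity / epsilon)

print(noisy_support(1400, epsilon=0.5))   # e.g. 1398.3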
TTA-DD-
C1512
Efficient Algorithms for
Mining Top-K High
Utility Itemsets
High utility itemsets (HUIs) mining is an
emerging topic in data mining, which refers to
discovering all itemsets having
a utility meeting a user-specified
minimum utility threshold min_util. However,
setting min_util appropriately is a difficult
problem for users. Generally speaking, finding
an appropriate minimum utility threshold by
trial and error is a tedious process for users. If
min_util is set too low, too many HUIs will be
generated, which may cause
the mining process to be very inefficient. On
the other hand, if min_util is set too high, it is
likely that no HUIs will be found. In this
paper, we address the above issues by
proposing a new framework for top-
k high utility itemset mining, where k is the
desired number of HUIs to be mined. Two
types of efficient algorithms named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase) are proposed for mining such itemsets without
the need to set min_util. We provide a
structural comparison of the two algorithms with discussions on their
advantages and limitations. Empirical
evaluations on both real and synthetic datasets
show that the performance of the
proposed algorithms is close to that of the
optimal case of state-of-the-
art utility mining algorithms.
TTA-DD-
C1513
k-Nearest Neighbor
Classification over
Semantically Secure
Encrypted Relational
Data
Data Mining has wide applications in many
areas such as banking, medicine, scientific
research and among government
agencies. Classification is one of the
commonly used tasks in data mining
applications. For the past decade, due to the
rise of various privacy issues, many theoretical
and practical solutions to
the classification problem have been proposed
under different security models. However, with
the recent popularity of cloud computing, users
now have the opportunity to outsource
their data, in encrypted form, as well as
the data mining tasks to the cloud. Since
the data on the cloud is in encrypted form,
existing privacy-
preserving classification techniques are not
applicable. In this paper, we focus on solving
the classification problem over encrypted data.
In particular, we propose a secure k-NN
classifier over encrypted data in the cloud. The
proposed protocol protects the confidentiality
of data, privacy of user's input query, and hides
the data access patterns. To the best of our
knowledge, our work is the first to develop
a secure k-NN classifier
over encrypted data under the semi-honest
model. Also, we empirically analyze the
efficiency of our proposed protocol using a
real-world dataset under different parameter
settings.
IEEE 2015
TTA-DD-
C1514
Location Aware
Keyword Query
Suggestion Based on
Document Proximity
Keyword suggestion in web search helps users
to access relevant information without having
to know how to precisely express their queries.
Existing keyword suggestion techniques do not
consider the locations of the users and
the query results; i.e., the spatial proximity of a
user to the retrieved results is not taken as a
factor in the recommendation. However, the
relevance of search results in many
applications (e.g., location-based services) is
known to be correlated with their
spatial proximity to the query issuer. In this
paper, we design a location-
aware keyword query suggestion framework.
We propose a weighted keyword-
document graph, which captures both the
semantic relevance between
keyword queries and the spatial distance
between the resulting documents and the
user location. The graph is browsed in a
random-walk-with-restart fashion, to select
the keyword queries with the highest scores
as suggestions. To make our framework
scalable, we propose a partition-
based approach that outperforms the baseline
algorithm by up to an order of magnitude. The
appropriateness of our framework and the
performance of the algorithms are evaluated
using real data.
IEEE 2015
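The random-walk-with-restart scoring at the heart of the framework is easy to reproduce on a toy keyword-document graph. The sketch below runs the usual power iteration p = (1 - c) * W^T p + c * e; the four-node transition matrix is invented, and the real framework weights edges by semantic relevance and spatial distance.

    import numpy as np

    # Row-stochastic transitions over nodes {q1, q2, d1, d2} (toy weights).
    W = np.array([[0.0, 0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.0, 0.5],
                  [0.5, 0.0, 0.0, 0.5],
                  [0.0, 0.5, 0.5, 0.0]])

    def rwr(W, start, c=0.15, iters=100):
        # Random walk with restart: repeatedly mix walking with jumping home.
        e = np.zeros(W.shape[0])
        e[start] = 1.0
        p = e.copy()
        for _ in range(iters):
            p = (1 - c) * W.T @ p + c * e
        return p

    print(rwr(W, start=0))  # rank keyword-query nodes by score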
TTA-DD-
C1515
Rank-Based Similarity
Search Reducing the
Dimensional
Dependence
This paper introduces a data structure for k-
NN search, the Rank Cover Tree (RCT),
whose pruning tests rely solely on the
comparison of similarity values; other
properties of the underlying space, such as the
triangle inequality, are not employed. Objects
are selected according to their ranks with
respect to the query object, allowing much
tighter control on the overall execution costs.
A formal theoretical analysis shows that with
very high probability, the RCT returns a
correct query result in time that depends very
competitively on a measure of the intrinsic
dimensionality of the data set. The
experimental results for the RCT show that
non-metric pruning strategies
for similarity search can be practical even
when the representational dimension of the
data is extremely high. They also show that the
RCT is capable of meeting or exceeding the
level of performance of state-of-the-art
methods that make use of metric pruning or
other selection tests involving numerical
constraints on distance values.
IEEE 2015
TTA-DD-
C1516
RANWAR Rank-Based
Weighted Association
Rule Mining from Gene
Expression and
Methylation Data
Ranking of association rules is currently an
interesting topic in data mining and
bioinformatics. The huge number of rules over
items (or genes) produced by association rule
mining (ARM) algorithms confuses the
decision maker. In this
article, we propose a weighted rule-
mining technique (say, RANWAR or rank-
based weighted association rule-mining)
to rank the rules using two novel rule-
interestingness measures, viz., rank-
based weighted condensed support (wcs)
and weighted condensed confidence (wcc)
measures to bypass the problem. These
measures depend on the ranks of the
items (genes). Using the rank, we
assign a weight to each item. RANWAR
generates far fewer frequent itemsets
than the state-of-the-
art association rule mining algorithms. Thus, it
reduces the execution time of the algorithm. We
run RANWAR on gene expression and
methylation datasets. The genes of the
top rules are biologically validated
by Gene Ontologies (GOs) and KEGG
pathway analyses. Many top-
ranked rules extracted by RANWAR that
hold poor ranks under traditional Apriori are
highly biologically significant to the related
diseases. Finally, the top rules produced
by RANWAR that are not found by Apriori are
reported.
IEEE 2015
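The mechanics of rank-derived weighting can be illustrated in a few lines. The sketch below assigns larger weights to better-ranked genes and computes a weighted support as the itemset's mean weight times its relative frequency; this exact formula is an assumption for illustration, not the paper's wcs definition, and the gene list and database are toy data.

    def item_weights(ranked_genes):
        # Rank 0 is best; weights decay linearly with rank (assumed scheme).
        n = len(ranked_genes)
        return {g: (n - r) / n for r, g in enumerate(ranked_genes)}

    def weighted_support(itemset, db, w):
        mean_w = sum(w[i] for i in itemset) / len(itemset)
        count = sum(1 for t in db if itemset <= t)
        return mean_w * count / len(db)

    w = item_weights(["g1", "g2", "g3", "g4"])
    db = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g3", "g4"}]
    print(weighted_support({"g1", "g2"}, db, w))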
TTA-DD-
C1517
Towards Effective Bug
Triage with Software
Data Reduction
Techniques
Software companies spend over 45 percent of
their cost on dealing with software bugs. An
inevitable step of fixing bugs is bug triage,
which aims to correctly assign a developer to a
new bug. To decrease the time cost in manual
work, text classification techniques are applied
to conduct automatic bug triage. In this paper,
we address the problem
of data reduction for bug triage, i.e., how to
reduce the scale and improve the quality
of bug data. We combine instance selection
with feature selection to simultaneously
reduce data scale on the bug dimension and the
word dimension. To determine the order of
applying instance selection and feature
selection, we extract attributes from
historical bug data sets and build a predictive
model for a new bug data set. We empirically
investigate the performance of data reduction
on a total of 600,000 bug reports from two large
open source projects, namely Eclipse and
Mozilla. The results show that
our data reduction can effectively reduce
the data scale and improve the accuracy
of bug triage. Our work provides an approach to
leveraging data processing techniques to
form reduced and high-
quality bug data in software development and
maintenance.
IEEE 2015
TTA-DD-
C1518
Towards Open-World
Person Re-
Identification by One-
Shot Group-based
Verification
Solving the problem of matching people across
non-overlapping multi-camera views, known
as person re-identification (re-id), has received
increasing interest in computer vision. In a
real-world application scenario, a watch-list
(gallery set) of a handful of known target
people are provided with very few (in many
cases only a single) image(s) (shots) per target.
Existing re-id methods are largely unsuitable
to address this open-world re-id challenge
because they are designed for (1) a closed-
world scenario where the gallery and probe
sets are assumed to contain exactly the same
people, (2) person-wise identification whereby
the model attempts to verify exhaustively
against each individual in the gallery set, and
(3) learning a matching model using multi-
shots. In this paper, a novel transfer local
relative distance comparison (t-LRDC) model
is formulated to address the open-
world person re-identification problem by one-
shot group-based verification. The model is
designed to mine and transfer useful
information from a labelled open-world non-
target dataset. Extensive experiments
demonstrate that the proposed approach
outperforms both non-transfer learning and
existing transfer learning based re-id methods.
IEEE 2015
TTA-DD-
C1519
Improving Accuracy
and Robustness of Self-
Tuning Histograms by
Subspace Clustering
In large databases, the amount and the
complexity of the data call for data
summarization techniques. Such summaries
are used to assist fast approximate query
answering or query optimization.
Histograms are a prominent class of model-
free data summaries and are widely used in
database systems. So-called self-
tuning histograms look at query-execution
results to refine themselves. An assumption
with such histograms, which has not been
questioned so far, is that they can learn the
dataset from scratch, that is, starting with an
empty bucket configuration. We show that this
is not the case. Self-tuning methods are very
sensitive to the initial configuration. Three
major problems stem from this.
Traditional self-tuning is unable to learn
projections of multi-dimensional data, is
sensitive to the order of queries, and reaches
only local optima with high estimation errors.
We show how to improve a self-tuning method
significantly by starting with a carefully
chosen initial configuration. We propose
initialization by dense subspace clusters in
projections of the data,
which improves both accuracy and robustness
of self-tuning. Our experiments on different
datasets show that the error rate is typically
halved compared to the uninitialized version.
IEEE 2015
TTA-JD-
C1520
TRIP An Interactive
Retrieving-Inferring
Data Imputation
Approach
Data imputation aims at filling in missing
attribute values in databases. Most
existing imputation methods for string attribute
values are inferring-based approaches, which
usually fail to reach a high imputation recall by
just inferring missing values from the complete
part of the data set. Recently, some retrieving-
based methods are proposed to retrieve
missing values from external resources such as
the World Wide Web, which tend to reach a
much higher imputation recall, but inevitably
bring a large overhead by issuing a large
number of search queries. In this paper, we
investigate the interaction between
the inferring-based methods and the retrieving-
based methods. We show that retrieving a
small number of selected missing values can
greatly improve the imputation recall of
the inferring-based methods. With this
intuition, we propose an interactive Retrieving-
Inferring data imPutation approach (TRIP),
which
performs retrieving and inferring alternately in
filling in missing attribute values in a dataset.
To ensure the high recall at the minimum
cost, TRIP faces a challenge of selecting the
least number of missing values for retrieving to
maximize the number of inferable values. Our
proposed solution is able to identify an
optimal retrieving-inferring scheduling scheme
in deterministic data imputation, and the
optimality of the generated scheme is
theoretically analyzed with proofs. We also
analyze with an example that the optimal
scheme is not feasible to be achieved in τ-
constrained stochastic data imputation (τ-SDI),
but still, our proposed solution identifies an
expected-optimal scheme in τ-SDI. Extensive
experiments on four data collections show
that TRIP retrieves on average only 20 percent of the
missing values yet achieves the same high
recall as the retrieving-based approach.
IEEE 2015
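The retrieve/infer alternation can be written down as a small control loop. The sketch below is only a skeleton under assumed interfaces: retrieve stands for a costly external (e.g., Web) lookup, infer for a completion step over the already-filled part, and the scheduling choice of which value to retrieve, which is the paper's actual contribution, is reduced here to an arbitrary pick.

    def trip(missing, retrieve, infer, budget):
        # Alternate: infer what is now inferable, then buy one retrieval.
        filled = {}
        while missing:
            for cell, value in infer(missing, filled).items():
                filled[cell] = value
                missing.discard(cell)
            if not missing or budget == 0:
                break
            cell = next(iter(missing))       # real TRIP: optimal scheduling here
            filled[cell] = retrieve(cell)    # expensive external lookup
            missing.discard(cell)
            budget -= 1
        return filled

    print(trip({"r1.city", "r2.city"},
               retrieve=lambda c: "retrieved:" + c,
               infer=lambda m, f: {},        # stub inference model
               budget=1))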
TTA-JD-
C1521
Pattern-Aided
Regression Modeling
and Prediction Model
Analysis
This paper first
introduces pattern aided regression (PXR)
models, a new type of regression models designed
to represent accurate and
interpretable prediction models. This was
motivated by two observations:
(1) Regression modeling applications often
involve complex diverse predictor-response
relationships, which occur when the
optimal regression models (of
given regression model type) fitting two or
more distinct logical groups of data are highly
different. (2) State-of-the-
art regression methods are often unable to
adequately model such relationships. This
paper defines PXR models using several
patterns and local regression models, which
respectively serve as logical and behavioral
characterizations of distinct predictor-response
relationships. The paper also introduces a
contrast pattern aided regression (CPXR)
method, to build accurate PXR models. In
experiments, the PXR models built by CPXR
are very accurate in general, often
outperforming state-of-the-art regression
methods by large margins. Usually using (a)
around seven simple patterns and (b) linear
local regression models, those PXR models are
easy to interpret; in fact, their complexity is
just a bit higher than that of (piecewise)
linear regression models and is significantly
lower than that of traditional ensemble based
regression models. CPXR is especially
effective for high-dimensional data. The paper
also discusses how to use CPXR methodology
for analyzing prediction models and correcting
their prediction errors.
IEEE 2015
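How a PXR model predicts can be shown compactly: each pattern carries a local regression model, and an instance is scored by the local models of the patterns it matches, with a default model as fallback. The plain averaging below and the toy patterns are simplifying assumptions; the paper weights the local predictions.

    def pxr_predict(x, pattern_models, default_model):
        # pattern_models: list of (pattern predicate, local model) pairs.
        preds = [model(x) for matches, model in pattern_models if matches(x)]
        return sum(preds) / len(preds) if preds else default_model(x)

    # Toy: two logical groups with different linear behaviour.
    pattern_models = [
        (lambda x: x["age"] >= 50, lambda x: 2.0 * x["bmi"] + 10),
        (lambda x: x["age"] < 50,  lambda x: 0.5 * x["bmi"] + 1),
    ]
    print(pxr_predict({"age": 63, "bmi": 30}, pattern_models,
                      default_model=lambda x: 0.0))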
TTA-JD-
C1522
A Set of Complexity
Measures Designed for
Applying Meta-Learning
to Instance Selection
In recent years, some authors have approached
the instance selection problem from a meta-
learning perspective. In their work, they
try to find relationships between the
performance of some methods from this field
and the values of some data-
complexity measures, with the aim of
determining the best performing method given
a data set, using only the values of
the measures computed on this data.
Nevertheless, most of the data-
complexity measures existing in the literature
were not conceived for this purpose and the
feasibility of their use in this field is yet to be
determined. In this paper, we revise the
definition of some measures that we presented
in previous work, which
were designed for meta-
learning based instance selection. Also, we
assess them in an experimental study involving
three sets of measures, 59 databases,
16 instance selection methods, two classifiers,
and eight regression learners used as meta-
learners. The results suggest that
our measures are more efficient and effective
than those traditionally used by researchers
who have addressed instance selection from
a meta-learning perspective.
IEEE 2015
TTA-JD-
C1522
Efficient Algorithms for
Mining the Concise and
Lossless
Representation of High
Utility Itemsets
Mining high utility itemsets (HUIs) from
databases is an important data mining task,
which refers to the discovery of itemsets
with high utilities (e.g. high profits). However,
it may present too many HUIs to users, which
also degrades the efficiency of
the mining process. To achieve high efficiency
for the mining task and provide
a concise mining result to users, we propose a
novel framework in this paper
for mining closed+ high utility itemsets
(CHUIs), which serves as a compact
and lossless representation of HUIs. We
propose three efficient algorithms named
AprioriHC (Apriori-
based algorithm for mining High utility Closed+
itemsets), AprioriHC-D
(AprioriHC algorithm with Discarding
unpromising and isolated items) and CHUD
(Closed+ High Utility Itemset Discovery) to
find this representation. Further, a method
called DAHU (Derive
All High Utility Itemsets) is proposed to
recover all HUIs from the set of CHUIs
without accessing the original database.
Results on real and synthetic datasets show
that the proposed algorithms are
very efficient and that our approaches achieve
a massive reduction in the number of HUIs. In
addition, when all HUIs can be recovered by
DAHU, the combination of CHUD and DAHU
outperforms the state-of-the-
art algorithms for mining HUIs.
IEEE 2015
TTA-JD-
C1523
Keyword Extraction
and Clustering for
Document
Recommendation in
Conversations
This paper addresses the problem
of keyword extraction from conversations,
with the goal of using these keywords to
retrieve, for each short conversation fragment,
a small number of potentially relevant
documents, which can be recommended to
participants. However, even a short fragment
contains a variety of words, which are
potentially related to several topics; moreover,
using an automatic speech recognition (ASR)
system introduces errors among them.
Therefore, it is difficult to infer precisely the
information needs of
the conversation participants. We first propose
an algorithm to extract keywords from the
output of an ASR system (or a manual
transcript for testing), which makes use of
topic modeling techniques and of a
submodular reward function which favors
diversity in the keyword set, to match the
potential diversity of topics and reduce ASR
noise. Then, we propose a method to derive
multiple topically separated queries from
this keyword set, in order to maximize the
chances of making at least one
relevant recommendation when using these
queries to search over the English Wikipedia.
The proposed methods are evaluated in terms
of relevance with respect
to conversation fragments from the Fisher,
AMI, and ELEA conversational corpora, rated
by several human judges. The scores show that
our proposal improves over previous methods
that consider only word frequency or topic
similarity, and represents a promising solution
for a document recommender system to be
used in conversations.
IEEE 2015
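The diversity-favoring selection can be prototyped with a concave (hence submodular) reward and plain greedy maximization: taking the square root of per-topic coverage makes a second keyword from an already-covered topic worth little. The keyword-to-topic weights below are invented stand-ins for topic-model output.

    import math

    kw = {  # keyword -> topic weights (toy values)
        "match": {"sport": 0.9}, "goal": {"sport": 0.8},
        "election": {"politics": 0.9}, "vote": {"politics": 0.7},
    }

    def reward(selected):
        topics = {t for k in selected for t in kw[k]}
        # sqrt of per-topic mass => diminishing returns per topic
        return sum(math.sqrt(sum(kw[k].get(t, 0.0) for k in selected))
                   for t in topics)

    def greedy_keywords(k):
        chosen = []
        while len(chosen) < k:
            chosen.append(max((w for w in kw if w not in chosen),
                              key=lambda w: reward(chosen + [w]) - reward(chosen)))
        return chosen

    print(greedy_keywords(2))  # one keyword per topic, not two from one topic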
TTA-JD-
C1524
Top-k Similarity Join in
Heterogeneous
Information Networks
As a newly
emerging network model, heterogeneous
information networks (HINs) have received
growing attention. Many data mining tasks
have been explored in HINs, including
clustering, classification,
and similarity search. Similarity join is a
fundamental operation required for many
problems. It is attracting attention from various
applications on network data, such as friend
recommendation, link prediction, and online
advertising. Although similarity join has been
well studied in homogeneous networks, it has
not yet been studied
in heterogeneous networks. In particular, none of
the existing research on similarity join takes the
different semantic meanings behind paths into
consideration, and almost all of it ignores
the heterogeneity and diversity of HINs. In
this paper, we propose a path-
based similarity join (PS-join) method to
return the top k similar pairs of objects based
on any user specified join path in
a heterogeneous information network. We
study how to prune expensive
similarity computation by introducing bucket
pruning based locality sensitive hashing
(BPLSH) indexing. Compared with existing
Link-based Similarity join (LS-join) method,
PS-join can derive various similarity
semantics. Experimental results on real data
sets show the efficiency and effectiveness of
the proposed approach.
IEEE 2015
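The generic bucket-pruning idea behind an LSH index can be sketched with random hyperplanes: objects whose similarity vectors fall in the same bucket become candidate pairs, and everything else is pruned before any exact similarity computation. This is a textbook construction for illustration, not the BPLSH index itself, and the vectors are random toy data.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_planes = 8, 4
    planes = rng.normal(size=(n_planes, dim))

    def signature(v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple((planes @ v > 0).astype(int))

    vectors = {i: rng.normal(size=dim) for i in range(100)}
    buckets = {}
    for i, v in vectors.items():
        buckets.setdefault(signature(v), []).append(i)

    # Only pairs sharing a bucket are scored exactly; the rest are pruned.
    candidates = [(a, b) for ids in buckets.values()
                  for k, a in enumerate(ids) for b in ids[k + 1:]]
    print(len(candidates), "candidate pairs instead of", 100 * 99 // 2)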
TTA-JD-
C1525
Active Learning for
Ranking through
Expected Loss
Optimization
Learning to rank arises in many data mining
applications, ranging from web search engines
and online advertising to recommender systems.
In learning to rank, the performance of
a ranking model is strongly affected by the
number of labeled examples in the training set;
on the other hand, obtaining labeled examples
for training data is very expensive and time-
consuming. This presents a great need for
the active learning approaches to select most
informative examples for ranking learning;
however, in the literature there is still very
limited work to
address active learning for ranking. In this
paper, we propose a
general active learning framework, expected
loss optimization (ELO), for ranking. The ELO
framework is applicable to a wide range
of ranking functions. Under this framework,
we derive a novel
algorithm, expected discounted cumulative
gain (DCG) loss optimization (ELO-DCG), to
select most informative examples. Then, we
investigate both query- and document-
level active learning for ranking and propose a
two-stage ELO-DCG algorithm that
incorporates both query and document selection
into active learning. Furthermore, we show
that it is flexible for the algorithm to deal with
the skewed grade distribution problem with the
modification of the loss function. Extensive
experiments on real-world web search data sets
have demonstrated great potential and
effectiveness of the proposed framework and
algorithms.
IEEE 2015
TTA-JD-
C1526
Relational Collaborative
Topic Regression for
Recommender Systems
Due to its successful application
in recommender systems, collaborative
filtering (CF) has become a hot research topic in data
mining and information retrieval. In traditional
CF methods, only the feedback matrix, which
contains either explicit feedback (also called
ratings) or implicit feedback on the items given
by users, is used for training and prediction.
Typically, the feedback matrix is sparse, which
means that most users interact with few items.
Due to this sparsity problem, traditional CF
with only feedback information will suffer
from unsatisfactory performance. Recently,
many researchers have proposed to utilize
auxiliary information, such as item content
(attributes), to alleviate the data sparsity
problem in
CF. Collaborative topic regression (CTR) is
one of these methods which has achieved
promising performance by successfully
integrating both feedback information and item
content information. In many real applications,
besides the feedback and item content
information, there may exist relations (also
known as networks) among the items which
can be helpful for recommendation. In this
paper, we develop a novel hierarchical
Bayesian model
called Relational Collaborative Topic
Regression (RCTR), which extends CTR by
seamlessly integrating the user-item feedback
information, item content information, and
network structure among items into the same
model. Experiments on real-world datasets
show that our model can achieve better
prediction accuracy than the state-of-the-art
methods with lower empirical training time.
Moreover, RCTR can learn good interpretable
latent structures which are useful for
recommendation.
IEEE 2015
TTA-JD-
C1527
Relevance Feature
Discovery for Text
Mining
It is a big challenge to guarantee the quality of
discovered relevance features in text
documents for describing user preferences because of
the large scale of terms and data patterns. Most
existing popular text mining and classification
methods have adopted term-based approaches.
However, they have all suffered from the
problems of polysemy and synonymy. Over
the years, it has often been hypothesized
that pattern-based methods should
perform better than term-based ones in
describing user preferences; yet, how to
effectively use large scale patterns remains a
hard problem in text mining. To make a
breakthrough in this challenging issue, this
paper presents an innovative model
for relevance feature discovery. It discovers
both positive and negative patterns
in text documents as higher level features and
deploys them over low-level features (terms).
It also classifies terms into categories and
updates term weights based on their specificity
and their distributions in patterns. Substantial
experiments using this model on RCV1, TREC
topics and Reuters-21578 show that the
proposed model significantly outperforms both
the state-of-the-art term-based methods and the
pattern based methods.
IEEE 2015
TTA-JD-
C1528
Differentially Private
Frequent Itemset
Mining via Transaction
Splitting
Recently, there has been a growing interest in
designing differentially private data mining
algorithms. Frequent itemset mining (FIM) is one
of the most fundamental problems in
data mining. In this paper, we explore the
possibility of designing
a differentially private FIM algorithm which
can not only achieve high data utility and a
high degree of privacy, but also offer high time
efficiency. To this end, we propose a
differentially private FIM algorithm based on
the FP-growth algorithm, which is referred to
as PFP-growth. The PFP-growth algorithm
consists of a preprocessing phase and
a mining phase. In the preprocessing phase, to
improve the utility and privacy tradeoff, a
novel smart splitting method is proposed to
transform the database. For a given database,
the preprocessing phase needs to be performed
only once. In the mining phase, to offset the
information loss caused
by transaction splitting, we devise a run-time
estimation method to estimate the actual
support of item sets in the original database. In
addition, by leveraging the downward closure
property, we put forward a dynamic reduction
method to dynamically reduce the amount of
noise added to guarantee privacy during the
mining process. Through formal privacy
analysis, we show that our PFP-growth
algorithm is ε-differentially private. Extensive
experiments on real datasets illustrate that our
PFP-growth algorithm substantially
outperforms the state-of-the-art techniques.
IEEE 2015
TTA-JD-
C1529
Backward Path Growth
for Efficient Mobile
Sequential
Recommendation
The problem
of mobile sequential recommendation is to
suggest a route connecting a set of pick-up
points for a taxi driver so that he/she is more
likely to get passengers with less travel cost.
Essentially, a key challenge of this problem is
its high computational complexity. In this
paper, we propose a novel dynamic
programming based method to solve
the mobile sequential recommendation
problem consisting of two separate stages: an offline
pre-processing stage and an online search
stage. The offline stage pre-computes potential
candidate sequences from a set of pick-up
points. A backward incremental sequence
generation algorithm is proposed based on the
identified iterative property of the cost
function. Simultaneously, an incremental
pruning policy is adopted in the process of
sequence generation to reduce the search space
of the potential sequences effectively. In
addition, a batch pruning algorithm is further
applied to the generated potential sequences to
remove some non-optimal sequences of a
given length. Since the pruning effectiveness
keeps growing with the increase of the
sequence length, at the online stage, our
method can efficiently find the optimal driving
route for an unloaded taxi in the remaining
candidate sequences. Moreover, our method
can handle the problem of optimal route search
with a maximum cruising distance or a
destination constraint. Experimental results on
real and synthetic data sets show that both the
pruning ability and the efficiency of our
method surpass the state-of-the-art methods.
Our techniques can therefore be effectively
employed to address the problem
of mobile sequential recommendation with
many pick-up points in real-world
applications.
IEEE 2015
TTA-JD-
C1530
Mining Partially-
Ordered Sequential
Rules Common to
Multiple Sequences
Sequential rule mining is an important
data mining problem
with multiple applications. An important
limitation of algorithms
for mining sequential rules common to multipl
e sequences is that rules are very specific and
therefore many similar rules may represent the
same situation. This can cause three major
problems: (1) similar rules can be rated quite
differently, (2) rules may not be found because
they are individually considered uninteresting,
and (3) rules that are too specific are less
likely to be used for making
predictions. To address these issues, we
explore the idea of mining “partially-ordered
sequential rules” (POSR), a more general form
of sequential rules such that items in the
antecedent and the consequent of each rule are
unordered. To mine POSR, we propose the
RuleGrowth algorithm, which is efficient and
easily extendable. In particular, we present an
extension (TRuleGrowth) that accepts a
sliding-window
constraint to find rules occurring within a
maximum amount of time. A performance
study with four real-life datasets shows that
RuleGrowth and TRuleGrowth have excellent
performance and scalability
compared to baseline algorithms and that the
number of rules discovered can be several
orders of magnitude smaller when the sliding-
window constraint is applied. Furthermore, we
also report results from a real application
showing that POSR can provide much higher
prediction accuracy than
regular sequential rules for sequence prediction.
IEEE 2015
TTA-JD-
C1531
CRoM and HuspExt
Improving Efficiency of
High Utility Sequential
Pattern Extraction
High utility sequential pattern mining has been
considered as an important research problem
and a number of relevant algorithms have been
proposed for this topic. The main challenge
of high utility sequential pattern mining is that
the search space is large and the efficiency of
the solutions is directly affected by the degree
at which they can eliminate the
candidate patterns. Therefore, the efficiency of
any high utility sequential pattern mining
solution depends on its ability to reduce this
big search space, and as a result, lower the
computational complexity of calculating
the utilities of the candidate patterns. In this
paper, we propose efficient data structures and
a pruning technique based on a
Cumulated Rest of Match (CRoM)
upper bound. CRoM, by defining a tighter
upper bound on the utility of the candidates,
allows more conservative pruning before
candidate pattern generation in comparison to
the existing techniques. In addition, we have
developed an efficient
algorithm, High Utility Sequential
Pattern Extraction (HuspExt), which calculates
the utilities of the child patterns based on those
of their parents. Substantial experiments on
both synthetic and real datasets from different
domains show that the proposed solution
efficiently
discovers high utility sequential patterns from
large scale datasets with different data
characteristics, under low utility thresholds.
IEEE 2015
TTA-JD-
C1532
Co-Extracting Opinion
Targets and Opinion
Words from Online
Reviews Based on the
Word Alignment Model
Mining opinion targets and opinion words
from online reviews are important tasks for fine-
grained opinion mining, the key component of
which involves detecting opinion relations
among words. To this end, this paper proposes
a novel approach based on the partially-
supervised alignment model, which regards
identifying opinion relations as
an alignment process. Then, a graph-based co-
ranking algorithm is exploited to estimate the
confidence of each candidate. Finally,
candidates with higher confidence are
extracted as opinion targets or opinion words.
Compared to previous methods based on the
nearest-neighbor rules,
our model captures opinion relations more
precisely, especially for long-span relations.
Compared to syntax-based methods,
our word alignment model effectively
alleviates the negative effects of parsing errors
when dealing with informal online texts. In
particular, compared to the traditional
unsupervised alignment model, the
proposed model obtains better precision
because of the usage of partial supervision. In
addition, when estimating candidate
confidence, we penalize higher-degree vertices
in our graph-based co-ranking algorithm to
decrease the probability of error generation.
Our experimental results on three corpora with
different sizes and languages show that our
approach effectively outperforms state-of-the-
art methods.
IEEE 2015
TTA-JD-
C1533
Global Redundancy
Minimization for
Feature Ranking
Feature selection has been an important
research topic in data mining, because the real
data sets often have high-dimensional features,
such as the bioinformatics and text mining
applications. Many existing
filter feature selection
methods rank features by optimizing
certain feature ranking criterions, such that
correlated features often have similar rankings.
These correlated features are redundant and
do not provide much additional mutual information to help
data mining. Thus, when we select a limited
number of features, we hope to select the top
non-redundant features such that the useful
mutual information can be maximized. In
previous research, Ding et al. recognized this
important issue and proposed the minimum
Redundancy Maximum
Relevance Feature Selection (mRMR) model
to minimize the redundancy between
sequentially selected features. However, this
method used the greedy search, thus the global
feature redundancy wasn't considered and the
results are not optimal. In this paper, we
propose a new feature selection framework to
globally minimize the feature redundancy with
maximizing the given feature ranking scores,
which can come from any supervised or
unsupervised methods. Our new model has no
parameters, making it especially suitable for
practical data mining applications.
Experimental results on benchmark data sets
show that the proposed method consistently
improves the feature selection results
compared to the original methods. Meanwhile,
we introduce a new unsupervised global and
local discriminative feature selection method
which can be unified with the
global feature redundancy minimization
framework and shows superior performance.
IEEE 2015
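To make the contrast concrete, the sketch below is the greedy mRMR baseline the abstract refers to: each step picks the feature with the best relevance-minus-redundancy score. Absolute Pearson correlation stands in for mutual information purely to keep the example short; the data are synthetic.

    import numpy as np

    def mrmr(X, y, k):
        # Greedy: maximize relevance(f, y) minus mean redundancy to selected.
        corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
        selected = []
        remaining = list(range(X.shape[1]))
        while len(selected) < k:
            def score(f):
                red = (np.mean([corr(X[:, f], X[:, s]) for s in selected])
                       if selected else 0.0)
                return corr(X[:, f], y) - red
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # feature 1 duplicates 0
    y = X[:, 0] + X[:, 2]
    print(mrmr(X, y, 2))   # selects one of the twins (0/1) plus 2, never both twins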
TTA-JD-
C1534
Review Selection Using
Micro-Reviews
Given the proliferation of review content, and
the fact that reviews are highly diverse and
often unnecessarily verbose, users frequently
face the problem of selecting the
appropriate reviews to consume. Micro-
reviews are emerging as a new type of
online review content in the social media.
Micro-reviews are posted by users of check-in
services such as Foursquare. They are concise
(up to 200 characters long) and highly focused,
in contrast to the comprehensive and
verbose reviews. In this paper, we propose a
novel mining problem, which brings together
these two disparate sources of review content.
Specifically, we use coverage of micro-
reviews as an objective for selecting a set of
reviews that cover efficiently the salient
aspects of an entity. Our approach consists of a
two-step process: matching review sentences
to micro-reviews, and selecting a small set
of reviews that cover as many micro-
reviews as possible, with few sentences. We
formulate this objective as a combinatorial
optimization problem, and show how to derive
an optimal solution using Integer Linear
Programming. We also propose an efficient
heuristic algorithm that approximates the
optimal solution. Finally, we perform a
detailed evaluation of all the steps of our
methodology using data collected from
Foursquare and Yelp.
IEEE 2015
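Beside the ILP formulation, a natural greedy heuristic for the coverage objective is easy to state: repeatedly take the review whose sentences cover the most not-yet-covered micro-reviews. The review-to-micro-review matching below is a toy stand-in for the paper's matching step.

    reviews = {  # review id -> micro-review ids its sentences match (toy)
        "r1": {1, 2, 3},
        "r2": {3, 4},
        "r3": {4, 5, 6},
        "r4": {2},
    }

    def greedy_cover(reviews, budget):
        covered, chosen = set(), []
        for _ in range(budget):
            best = max(reviews, key=lambda r: len(reviews[r] - covered))
            if not reviews[best] - covered:
                break                      # nothing new left to cover
            chosen.append(best)
            covered |= reviews[best]
        return chosen, covered

    print(greedy_cover(reviews, budget=2))   # ['r1', 'r3'] cover all six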
TTA-JD-
C1535
A Trust Management
Scheme Based on
Behavior Feedback for
Opportunistic Networks
In the harsh environment where node density is
sparse, the slow-moving nodes cannot
effectively utilize the encountering
opportunities to realize the self-organized
identity authentications, and do not have the
chance to join the network routing. However,
considering most of the communications in
opportunistic networks are caused by
forwarding operations, there is no need to
establish the complete mutual authentications
for each conversation. Accordingly, a
novel trust management scheme is
presented based on the information
of behavior feedback, in order to complement
the insufficiency of identity authentications.
By utilizing the certificate chains based on
social attributes, the mobile nodes build the
local certificate graphs gradually to realize the
web of “Identity Trust” relationship.
Meanwhile, the successors generate
Verified Feedback Packets for each
positive behavior, and consequently the
“Behavior Trust” relationship is formed for
slow-moving nodes. Simulation result shows
that, by implementing our trust scheme, the
delivery probability and trust reconstruction
ratio can be effectively improved when there
are large numbers of compromised nodes, and
it means that our trust management scheme can
efficiently explore and filter the trust nodes for
secure forwarding in opportunistic networks.
IEEE 2015
TTA-JD-
C1536
Extending Association
Rule Summarization
Techniques to Assess
Risk of Diabetes
Mellitus
Early detection of patients with elevated risk of
developing diabetes mellitus is critical to the
improved prevention and overall clinical
management of these patients. We
aim to apply association rule mining to
electronic medical records (EMR) to discover
sets of risk factors and their corresponding
subpopulations that represent patients at
particularly high risk of developing diabetes.
Given the high dimensionality of
EMRs, association rule mining generates a
very large set of rules which we need to
summarize for easy clinical use. We reviewed
four association rule set summarization
techniques and conducted a comparative
evaluation to provide guidance regarding their
applicability, strengths and weaknesses. We
proposed
extensions to incorporate risk of diabetes into
the process of finding an optimal summary.
We evaluated these modified techniques on a
real-world prediabetic patient cohort. We
found that all four methods produced
summaries that described subpopulations at
high risk of diabetes with each method having
its clear strength. For our purpose, our
extension to the Bottom-Up
Summarization (BUS) algorithm produced the
most suitable summary. The subpopulations
identified by this summary covered most high-
risk patients, had low overlap and were at very
high risk of diabetes.
IEEE 2015
TTA-JD-
C1537
A decision-theoretic
rough set approach for
dynamic data mining
Uncertainty and fuzziness generally exist in
real-life data. Approximations are employed to
describe the uncertain information
approximately in rough set theory. Certain and
uncertain rules are induced directly from
different regions partitioned by
approximations. Approximations can further be
applied to data-mining-related tasks, e.g.,
attribute reduction. Nowadays, different types
of data collected from different applications
evolve with time, especially new attributes
may appear while new objects are added. This
paper presents
an approach for dynamic maintenance of
approximations when objects and attributes are
added simultaneously, under the framework
of decision-theoretic rough set (DTRS).
Equivalence feature vector and matrix are
defined first to update approximations of
DTRS in different levels of granularity. Then,
the information system is decomposed into
subspaces, and the equivalence feature matrix
is updated in different subspaces
incrementally. Finally, the approximations of
DTRS are renewed during the process of
updating the equivalence feature matrix.
Extensive experimental results verify the
effectiveness of the proposed methods.
IEEE 2015
TTA-JD-
C1538
A Joint Segmentation
and Classification
Framework for
Sentence Level
Sentiment
Classification
In this paper, we propose
a joint segmentation and classification
framework for sentence-level sentiment classification.
It is widely recognized that phrasal
information is crucial for sentiment
classification. However,
existing sentiment classification algorithms
typically split a sentence as a word sequence,
which does not effectively handle the
inconsistent sentiment polarity between a
phrase and the words it contains, such as {“not
bad,” “bad”} and {“a great deal of,” “great”}.
We address this issue by developing
a joint framework for sentence-
level sentiment classification. It
simultaneously generates
useful segmentations and predicts sentence-
level polarity based on
the segmentation results. Specifically, we
develop a candidate generation model to
produce segmentation candidates of a
sentence; a segmentation ranking model to
score the usefulness of
a segmentation candidate for
sentiment classification; and
a classification model for predicting
the sentiment polarity of a segmentation. We
train the joint framework directly
from sentences annotated with only sentiment
polarity, without using any syntactic
or sentiment annotations in segmentation level.
We conduct experiments
for sentiment classification on two benchmark
datasets: a tweet dataset and a review dataset.
Experimental results show that: 1) our method
performs comparably with state-of-the-art
methods on both datasets;
2) jointly modeling segmentation and
classification outperforms pipelined baseline methods in
various experimental settings.
IEEE 2015
TTA-JD-
C1539
A Similarity-Based
Learning Algorithm
Using Distance
Transformation
Numerous theories and algorithms have been
developed to solve vectorial
data learning problems by searching for the
hypothesis that best fits the observed training
sample. However, many real-world
applications involve samples that are not
described as feature vectors, but as
(dis)similarity data. Converting vectorial data
into (dis)similarity data is more easily
performed than converting (dis)similarity data
into vectorial data. This study proposes a
stochastic
iterative distance transformation model for
similarity-based learning. The proposed model
can be used to identify a clear class boundary
in data by modifying the (dis)similarities
between examples. The experimental results
indicate that the performance of the proposed
method is comparable with those of various
vector-based and proximity-
based learning algorithms.
IEEE 2015
TTA-JD-
C1540
Active Learning from
Relative Comparisons
This work focuses
on active learning from relative comparison
information. A relative comparison specifies, for
a data triplet (xi, xj, xk), that instance xi is
more similar to xj than to xk. Such constraints,
when available, have been shown to be useful
toward learning tasks such as defining
appropriate distance metrics or finding good
clustering solutions. In real-world applications,
acquiring constraints often involves
considerable human effort, as it requires the
user to manually inspect the instances. This
motivates us to study how to select and query
the most useful relative comparisons to
achieve effective learning with minimum user
effort. Given an underlying class concept that
is employed by the user to provide such
constraints, we present an information-
theoretic criterion that selects the triplet whose
answer leads to the highest expected
information gain about the classes of a set of
examples. Directly applying the proposed
criterion requires examining O(n^3) triplets
with n instances, which is prohibitive even for
datasets of moderate size. We show that a
randomized selection strategy can be used to
reduce the selection pool from O(n^3) to O(n)
with minimal loss in efficiency, allowing us to
scale up to considerably larger problems.
Experiments show that the proposed method
consistently outperforms baseline policies.
IEEE 2015
TTA-JD-
C1541
Adaptive Processing for
Distributed Skyline
Queries over Uncertain
Data
Query processing over uncertain data has
gained growing attention, because it is
necessary to deal with uncertain data in many
real-life applications. In this paper, we
investigate skyline queries over uncertain data
in distributed environments (DSUD query)
whose research is only in an early stage. The
state-of-the-art algorithm, called e-DSUD
algorithm, is designed
for processing this query. It has the desirable
characteristics of progressiveness and
minimum bandwidth consumption. However,
it still needs to be perfected in three aspects.
(1) Progressiveness. Each time it only returns
one query result at most. (2) Efficiency. There
is a significant amount of redundant I/O cost,
and numerous iterations cause a long
total query time. (3) Universality. It is
restricted to the case where local skyline tuples
are incomparable. To address these
concerns, we first present a detailed analysis of
the e-DSUD algorithm and then develop an
improved framework for the DSUD query,
namely IDSUD. Based on the new framework,
we propose an adaptive algorithm, called
ADSUD, for the DSUD query. In the
algorithm, we redefine the approximate
global skyline probability and adaptively choose local
representative tuples based on the minimum
probabilistic bounding rectangle.
Furthermore, we design a progressive pruning
method and apply the reuse mechanism to
improve its efficiency. The results of extensive
experiments verify the better overall
performance of our algorithm than the e-
DSUD algorithm.
IEEE 2015
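For orientation, the sketch below computes a plain skyline on certain, centralized data: a point survives if no other point is at least as good in every dimension and strictly better in one. The DSUD setting layers per-tuple probabilities and distribution across sites on top of this, which the sketch deliberately leaves out.

    def dominates(a, b):
        # a dominates b: no worse everywhere, strictly better somewhere.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def skyline(points):
        return [p for p in points
                if not any(dominates(q, p) for q in points if q != p)]

    hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 2.5)]  # (price, distance)
    print(skyline(hotels))   # (90, 2.5) is dominated and drops out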
TTA-JD-
C1542
Adding Geospatial Data
Provenance into SDI—
A Service-Oriented
Approach
Geospatial data provenance records the
derivation history of a geospatial data product.
It is important in evaluating the quality
of data products. In
a Geospatial Web Service environment
where data are often disseminated and
processed widely and frequently in an
unpredictable way, it is even more important in
identifying original data sources, tracing
workflows, updating or reproducing scientific
results, and evaluating reliability and quality
of geospatial data products. Geospatial data
provenance has become a fundamental issue in
establishing the spatial data infrastructure
(SDI). This paper investigates how to
support provenance awareness in SDI. It
addresses key issues
including provenance modeling, capturing, and
sharing in a SDI enabled by
interoperable geospatial services. A reference
architecture for provenance tracking is
proposed, which can
accommodate geospatial feature provenance at
different levels of granularity. Open standards
from ISO, World Wide Web Consortium
(W3C), and OGC are leveraged to facilitate the
interoperability. At the feature type level, this
paper proposes extensions of W3C PROV-
XML for ISO 19115 lineage and “Parent
Level” provenance registration in
the geospatial catalog service. At the feature
instance level, light-weight lineage information
entities for feature provenance are proposed
and managed by Web Feature Services.
Experiments demonstrate the applicability of
the approach for
creating provenance awareness in an
interoperable geospatial service-
oriented environment.
IEEE 2015
TTA-JD-
C1543
Answering Pattern
Queries Using Views
Answering queries using views has proven
effective for querying relational and
semistructured data. This paper investigates
this issue for graph pattern queries based on
graph simulation. We propose a notion
of pattern containment to characterize
graph pattern matching using graph pattern vie
ws. We show that a pattern query can
be answered using a set of views if and only if
it is contained in the views. Based on this
characterization, we develop efficient
algorithms to answer graph pattern queries. We
also study problems for determining (minimal,
minimum) containment of pattern queries. We
establish their complexity (from cubic-time to
NP-complete) and provide efficient checking
algorithms (approximation when the problem
is intractable). In addition, when
a pattern query is not contained in the views,
we study maximally contained rewriting to
find approximate answers; we show that such a
rewriting can be computed in cubic time, and
present a rewriting algorithm. We
experimentally verify that these methods are
able to efficiently answer pattern queries on
large real-world graphs.
IEEE 2015
TTA-JD-
C1544
CloudKeyBank Privacy
and Owner
Authorization Enforced
Key Management
Framework
Explosive growth in the number of passwords
for Web based applications and
encryption keys for outsourced data storage
well exceeds the management limit of users.
Therefore, outsourcing keys (including
passwords and data encryption keys) to
professional password managers (honest-but-
curious service providers) is attracting the
attention of many users. However, existing
solutions in a traditional data outsourcing
scenario are unable to simultaneously meet the
following three security requirements
for keys outsourcing: (1) Confidentiality
and privacy of keys; (2) Search privacy on
identity attributes tied to keys;
(3) Owner controllable authorization over
his/her shared keys. In this paper, we
propose CloudKeyBank, the first
unified key management framework that
addresses all the three goals above. Under
our framework, the key owner can
perform privacy and controllable
authorization enforced encryption with
minimum information leakage. To
implement CloudKeyBank efficiently, we
propose a new cryptographic primitive named
Searchable Conditional Proxy Re-Encryption
(SC-PRE) which combines the techniques of
Hidden Vector Encryption (HVE) and Proxy
Re-Encryption (PRE) seamlessly, and propose
a concrete SC-PRE scheme based on existing
HVE and PRE schemes. Our experimental
results and security analysis show the
efficiency and security goals are well achieved.
IEEE 2015
TTA-JD-
C1545
Clustering Deviations
for Black Box
Regression Testing of
Database Applications
Regression tests often result in
many deviations (differences between two
system versions), either due to changes
or regression faults. For the tester to analyze
such deviations efficiently, it would be helpful
to accurately group them, such that each group
contains deviations representing one unique
change or regression fault. Because it is
unlikely that a general solution to the above
problem can be found, we focus our work on a
common type of software
system: database applications. We investigate
the use of clustering, based
on database manipulations
and test specifications (from test models), to
group regression test deviations according to
the faults or changes causing them. We also
propose assessment criteria based on the
concept of entropy to compare
alternative clustering strategies. To validate
our approach, we ran a large scale industrial
case study, and our results show that our
clustering approach can indeed serve as an
accurate strategy for
grouping regression test deviations. Among the
four test campaigns
assessed, deviations were clustered perfectly
for two of them, while for the other two,
the clusters were all homogeneous. Our analysis
suggests that this approach can significantly
reduce the effort spent by testers in
analyzing regression test deviations, increase
their level of confidence, and therefore
make regression testing more scalable.
IEEE 2015
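An entropy-style assessment of a deviation clustering can be sketched directly: given ground-truth labels for which fault or change caused each deviation, a perfectly homogeneous cluster has entropy 0 and mixed clusters score higher. The label data below are invented.

    import math
    from collections import Counter

    def cluster_entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    # Deviations labelled by the true fault/change that caused them (toy).
    clusters = [["faultA", "faultA", "faultA"], ["faultB", "changeC"]]
    for i, c in enumerate(clusters):
        print(i, cluster_entropy(c))   # 0.0 (homogeneous) vs 1.0 (mixed)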
TTA-JD-
C1546
Context-based
Collaborative Filtering
for Citation
Recommendation
Citation recommendation is an interesting and
significant research area as it solves the
information overload in academia by
automatically suggesting relevant references
for a research paper. Recently, with the rapid
proliferation of information technology,
research papers are rapidly published in
various conferences and journals. This
makes citation recommendation a highly
important and challenging discipline. In this
paper, we propose a
novel citation recommendation method that
uses only easily obtained citation relations as
source data. The rationale underlying this
method is that, if two citing papers are
significantly co-occurring with the same citing
paper(s), they should be similar to some
extent. Based on the above rationale, an
association mining technique is employed to
obtain the paper representation of each citing
paper from the citation context. Then, these
paper representations are pairwise compared to
compute similarities between the citing papers
for collaborative filtering. We evaluate our
proposed method through two relevant real-
world data sets. Our experimental results
demonstrate that the proposed method
significantly outperforms the baseline method
in terms of precision, recall, and F1, as well as
mean average precision and mean reciprocal
rank, which are metrics related to the rank
information in the recommendation list.
IEEE 2015
TTA-JD-
C1547
Crowdsourcing for Top-
K Query Processing
over Uncertain Data
Querying uncertain data has become a
prominent application due to the proliferation
of user-generated content from social media
and of data streams from sensors.
When data ambiguity cannot be reduced
algorithmically, crowdsourcing proves a viable
approach, which consists in posting tasks to
humans and harnessing their judgment for
improving the confidence about data values or
relationships. This paper tackles the problem
of processing top-
K queries over uncertain data with the help
of crowdsourcing for quickly converging to the
real ordering of relevant results. Several offline
and online approaches for addressing questions
to a crowd are defined and contrasted on both
synthetic and real data sets, with the aim of
minimizing the crowd interactions necessary to
find the real ordering of the result set.
IEEE 2015
TTA-JD-
C1548
Discovering Latent
Semantics in Web
Documents using Fuzzy
Clustering
Web documents are heterogeneous and
complex. There exist complicated
associations within one web document and
links to other documents. The high interactions
between terms in documents demonstrate
vague and ambiguous meanings. Efficient and
effective clustering methods
to discover latent and coherent meanings in
context are necessary. This paper presents
a fuzzy linguistic topological space along with
a fuzzy clustering algorithm to discover the
contextual meaning in the web documents. The
proposed algorithm extracts features from
the web documents using conditional random
field methods and builds a fuzzy linguistic
topological space based on the associations of
features. The associations of cooccurring
features organize a hierarchy of
connected semantic complexes called
“CONCEPTS,” wherein a fuzzy linguistic
measure is applied on each complex to
evaluate 1) the relevance of
a document belonging to a topic, and 2) the
difference between the other
topics. Web contents are able to
be clustered into topics in the hierarchy
depending on their fuzzy linguistic
measures; web users can further explore the
CONCEPTS of web contents accordingly.
Besides the algorithm applicability in web text
domains, it can be extended to other
applications, such as data mining,
bioinformatics, content-based, or collaborative
information filtering, etc.
IEEE 2015
TTA-JD-
C1549
Discovery of Ranking
Fraud for Mobile Apps
Ranking fraud in the mobile App market refers
to fraudulent or deceptive activities which
have a purpose of bumping up the Apps in the
popularity list. Indeed, it becomes more and
more frequent for App developers to use shady
means, such as inflating their Apps' sales or
posting phony App ratings, to
commit ranking fraud. While the importance of
preventing ranking fraud has been widely
recognized, there is limited understanding and
research in this area. To this end, in this paper,
we provide a holistic view
of ranking fraud and propose
a ranking fraud detection system
for mobile Apps. Specifically, we first propose
to accurately locate the ranking fraud by
mining the active periods, namely leading
sessions, of mobile Apps. Such leading
sessions can be leveraged for detecting the
local anomaly instead of the global anomaly of
App rankings. Furthermore, we investigate
three types of evidences, i.e., ranking based
evidences, rating based evidences and review
based evidences, by modeling Apps' ranking,
rating and review behaviors through statistical
hypotheses tests. In addition, we propose an
optimization based aggregation method to
integrate all the evidences for fraud detection.
Finally, we evaluate the proposed system with
real-world App data collected from the iOS
App Store for a long time period. In the
experiments, we validate the effectiveness of
the proposed system, and show the scalability
of the detection algorithm as well as some
regularity of ranking fraud activities.
IEEE 2015
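Mining "leading sessions" from an App's chart history reduces to scanning a rank time series for maximal top-K periods, merging them across short gaps. The thresholds and the gap rule below are illustrative assumptions, not the paper's exact definition.

    def leading_sessions(ranks, top_k=10, max_gap=2):
        # ranks: daily chart positions, None = unranked. Returns (start, end) spans.
        sessions, start, gap = [], None, 0
        for day, r in enumerate(ranks):
            if r is not None and r <= top_k:
                if start is None:
                    start = day
                gap = 0
            elif start is not None:
                gap += 1
                if gap > max_gap:                 # gap too long: close session
                    sessions.append((start, day - gap))
                    start, gap = None, 0
        if start is not None:
            sessions.append((start, len(ranks) - 1 - gap))
        return sessions

    ranks = [3, 5, None, 4, 50, 50, 50, 2, 1]
    print(leading_sessions(ranks))   # [(0, 3), (7, 8)]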
TTA-JD-
C1550
k-Nearest Neighbor
Classification over
Semantically Secure
Encrypted Relational
Data
Data Mining has wide applications in many
areas such as banking, medicine, scientific
research and among government
agencies. Classification is one of the
commonly used tasks in data mining
applications. For the past decade, due to the
rise of various privacy issues, many theoretical
and practical solutions to
the classification problem have been proposed
under different security models. However, with
the recent popularity of cloud computing, users
now have the opportunity to outsource
their data, in encrypted form, as well as
the data mining tasks to the cloud. Since
the data on the cloud is in encrypted form,
existing privacy-
preserving classification techniques are not
applicable. In this paper, we focus on solving
the classification problem over encrypted data.
In particular, we propose a secure k-NN
classifier over encrypted data in the cloud. The
proposed protocol protects the confidentiality
of data, privacy of user's input query, and hides
the data access patterns. To the best of our
knowledge, our work is the first to develop
a secure k-NN classifier
over encrypted data under the semi-honest
model. Also, we empirically analyze the
efficiency of our proposed protocol using a
real-world dataset under different parameter
settings.
IEEE 2015
TTA-JD-
C1551
Location Aware
Keyword Query
Suggestion Based on
Document Proximity
Keyword suggestion in web search helps users
to access relevant information without having
to know how to precisely express their queries.
Existing keyword suggestion techniques do not
consider the locations of the users and
the query results; i.e., the spatial proximity of a
user to the retrieved results is not taken as a
factor in the recommendation. However, the
relevance of search results in many
applications (e.g., location-based services) is
known to be correlated with their
spatial proximity to the query issuer. In this
paper, we design a location-
aware keyword query suggestion framework.
We propose a weighted keyword-
document graph, which captures both the
semantic relevance between
keyword queries and the spatial distance
between the resulting documents and the
user location. The graph is browsed in a
random-walk-with-restart fashion, to select
the keyword queries with the highest scores
as suggestions. To make our framework
scalable, we propose a partition-
based approach that outperforms the baseline
algorithm by up to an order of magnitude. The
appropriateness of our framework and the
performance of the algorithms are evaluated
using real data.
IEEE 2015
TTA-JD-
C1552
Mining Temporal
Patterns in Time
Interval-based Data
Sequential pattern mining is an important
subfield in data mining. Recently, applications
using time interval-based event data have
attracted considerable efforts in
discovering patterns from events that persist
for some duration. Since the relationship
between two intervals is intrinsically complex,
how to effectively and
efficiently mine interval-based sequences is a
challenging issue. In this paper, two novel
representations, endpoint representation and
end time representation, are proposed to
simplify the processing of complex
relationships among event intervals. Based on
the proposed representations, three types of
interval-based patterns are defined: temporal
patterns, occurrence-probabilistic temporal
patterns, and duration-probabilistic temporal
patterns. In addition, we develop two novel
algorithms, Temporal Pattern Miner (TPMiner)
and Probabilistic Temporal Pattern Miner (P-
TPMiner), to discover three types of interval-
based sequential patterns. We also propose
three pruning techniques to further reduce the
search space of the mining process.
Experimental studies show that both
algorithms are able to find three types
of patterns efficiently. Furthermore, we apply
the proposed algorithms to real datasets to
demonstrate the effectiveness and validate the
practicability of the proposed patterns.
IEEE 2015
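As a rough illustration of the endpoint idea (not the paper's TPMiner algorithm), the sketch below flattens labeled intervals into an ordered sequence of start/end events, so complex pairwise interval relations become plain sequential data:

```python
def to_endpoint_sequence(intervals):
    """Flatten (label, start, end) intervals into an ordered endpoint sequence.

    Each interval contributes a '+' (start) and '-' (end) event, so Allen-style
    pairwise relations are captured implicitly by the endpoint order.
    """
    events = []
    for label, start, end in intervals:
        events.append((start, label + "+"))
        events.append((end, label + "-"))
    events.sort()                        # order events by time
    return [symbol for _, symbol in events]

# Toy event intervals: A overlaps B, C occurs during B.
seq = to_endpoint_sequence([("A", 1, 5), ("B", 3, 9), ("C", 4, 6)])
print(seq)  # ['A+', 'B+', 'C+', 'A-', 'C-', 'B-']
```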
TTA-JD-
C1553
Multi-Objective Service
Composition in
Uncertain
Environments
Web services have the potential to offer
enterprises the ability to compose internal
and external business services in order to
accomplish complex
processes. Service composition then becomes
an increasingly challenging issue when
complex and critical applications are built
upon services with different QoS criteria.
However, most of the existing QoS-
aware service composition techniques are
simply based on the assumption that multiple
QoS criteria, no matter whether these multiple
criteria are conflicting or not, can be combined
into a single criterion to be optimized,
according to some utility functions. In practice,
this can be very difficult as these utility
functions or weights are not well-known a
priori. In addition, the existing approaches are
designed to work in certain environments,
where the QoS parameters are well-known in
advance. These approaches prove fruitless
when facing uncertain and dynamic
environments, e.g., cloud environments,
where no prior knowledge
of the QoS parameters is available. In this
paper, two novel multi-objective approaches
are proposed to handle QoS-aware
Web service composition with conflicting
objectives and various restrictions on the
quality metrics. The proposed approaches use
reinforcement learning in order to deal with the
uncertainty characteristics inherent in open and
dynamic environments. Experimental results
reveal the ability of the proposed approaches to
find a set of Pareto-optimal solutions of
equivalent quality in satisfying multiple
QoS objectives under different user preferences.
IEEE 2015
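For intuition, the Pareto-optimal set the approaches search for can be characterized by a simple dominance check; this sketch is illustrative only, with invented QoS vectors where larger values are better:

```python
def dominates(a, b):
    """True if QoS vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the service compositions no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Toy compositions scored on (availability, throughput, -latency).
plans = [(0.99, 120, -35), (0.95, 200, -25), (0.90, 100, -40), (0.98, 150, -20)]
print(pareto_front(plans))   # only (0.90, 100, -40) is dominated and dropped
```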
TTA-JD-
C1554
Pattern-based Topics
for Document Modelling
in Information Filtering
Many mature term-based or pattern-
based approaches have been used in the field
of information filtering to generate
users' information needs from a collection
of documents. A fundamental assumption for
these approaches is that the documents in the
collection are all about one topic. However, in
reality users' interests can be diverse and
the documents in the collection often involve
multiple topics. Topic modelling, such as
Latent Dirichlet Allocation (LDA), was
proposed to generate statistical models to
represent multiple topics in a collection
of documents, and this has been widely
utilized in the fields of machine learning
and information retrieval, etc. But its
effectiveness in information filtering has not
been so well explored. Patterns are always
thought to be more discriminative than single
terms for describing documents. However, the
enormous number of discovered patterns
hinders their effective and efficient use in
real applications; therefore, selecting the
most discriminative and representative
patterns from the huge number of discovered
patterns becomes crucial. To
deal with the above mentioned limitations and
problems, in this paper, a
novel information filtering model, Maximum
matched Pattern-
based Topic Model (MPBTM), is proposed.
The main distinctive features of the
proposed model include: (1)
user information needs are generated in terms
of multiple topics; (2) each topic is represented
by patterns; (3) patterns are generated
from topic models and are organized in terms
of their statistical and taxonomic features; and
(4) the most discriminative and representative
patterns, called Maximum Matched Patterns,
are proposed to estimate
the document relevance to the
user's information needs in order to filter out
irrelevant documents. Extensive experiments
are conducted to evaluate the effectiveness of
the proposed model by using the TREC data
collection Reuters Corpus Volume 1. The
results show that the
proposed model significantly outperforms both
state-of-the-art term-based models and pattern-
based models.
IEEE 2015
TTA-JD-
C1555
Polarity Consistency
Checking for Domain
Independent Sentiment
Dictionaries
Polarity classification of words is
important for applications such as
Opinion Mining and Sentiment
Analysis. A number
of sentiment word/sense dictionaries have
been manually or (semi)automatically
constructed. We notice that
these sentiment dictionaries have
numerous inaccuracies. Besides obvious
instances, where the same word appears
with different polarities in
different dictionaries, the
dictionaries exhibit complex cases
of polarity inconsistency, which cannot
be detected by mere manual inspection.
We introduce the concept
of polarity consistency of words/senses
in sentiment dictionaries in this paper.
We show that the consistency problem
is NP-complete. We reduce the polarity
consistency problem to the satisfiability
problem and utilize two fast SAT
solvers to detect inconsistencies in
a sentiment dictionary. We perform
experiments on
five sentiment dictionaries and
WordNet to show inter- and intra-
dictionary inconsistencies.
IEEE 2015
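To make the consistency notion concrete, here is a toy brute-force checker; the paper reduces the problem to SAT and runs real SAT solvers, whereas this sketch merely enumerates polarity assignments over a few invented words and synonym/antonym constraints:

```python
from itertools import product

def consistent(words, constraints):
    """Search for a positive/negative polarity assignment satisfying all constraints.

    constraints: (w1, w2, 'same') forces equal polarity (e.g., synonyms),
                 (w1, w2, 'diff') forces opposite polarity (e.g., antonyms).
    Returns one satisfying assignment, or None if the dictionary is inconsistent.
    """
    for bits in product([True, False], repeat=len(words)):
        pol = dict(zip(words, bits))
        if all((pol[a] == pol[b]) == (rel == "same") for a, b, rel in constraints):
            return pol
    return None

words = ["good", "fine", "bad"]
constraints = [("good", "fine", "same"), ("good", "bad", "diff"),
               ("fine", "bad", "same")]          # contradicts the first two
print(consistent(words, constraints))            # None: inconsistency detected
```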
TTA-JD-
C1556
Predicting User-Topic
Opinions in Twitter with
Social and Topical
Context
With popular microblogging services
like Twitter, users are able to share their
real-time feelings online in a more convenient way.
The user generated data in Twitter is thus
regarded as a resource providing individuals'
spontaneous emotional information, and has
attracted much attention from researchers. Prior
work has measured the emotional expressions
in users' tweets and then performed various
analysis and learning. However, how to utilize
the knowledge learned from observed
tweets, together with context information,
to predict users' opinions toward specific
topics they have not yet commented on is a
novel problem presenting both challenges and
opportunities. In this paper, we mainly focus
on solving this problem with
a Social context and Topical context
incorporated Matrix Factorization (ScTcMF) framework.
The experimental results on a real-
world Twitter data set show that this
framework outperforms the state-of-the-art
collaborative filtering methods, and
demonstrate that
both social context and topical context are
effective in improving the user-
topic opinion prediction performance.
IEEE 2015
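As a rough sketch of the general idea (not the ScTcMF model itself), matrix factorization with a social-context regularizer can be written in a few lines; the data, dimensions, and hyperparameters below are invented:

```python
import numpy as np

def social_mf(R, S, k=2, lam=0.1, beta=0.1, lr=0.01, epochs=200):
    """Factorize user-topic opinions R ~= U @ V.T, pulling each user's factors
    toward the average of their social neighbors (rows of S, row-normalized)."""
    rng = np.random.default_rng(0)
    n_users, n_topics = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_topics, k))
    mask = ~np.isnan(R)                     # only observed opinions contribute
    R0 = np.nan_to_num(R)
    for _ in range(epochs):
        E = mask * (R0 - U @ V.T)           # prediction error on observed entries
        U += lr * (E @ V - lam * U - beta * (U - S @ U))   # social pull term
        V += lr * (E.T @ U - lam * V)
    return U @ V.T                          # predicted opinions, incl. unobserved

R = np.array([[1.0, np.nan], [np.nan, -1.0], [1.0, 1.0]])    # +1/-1 opinions
S = np.array([[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])  # social weights
print(np.round(social_mf(R, S), 2))
```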
TTA-JD-
C1557
RankRC Large-scale
Nonlinear Rare Class
Ranking
Rare class problems are common in real-world
applications across a wide range of domains.
Standard classification algorithms are known
to perform poorly in these cases, since they
focus on overall classification accuracy. In
addition, we have seen a significant increase of
data in recent years, resulting in
many large scale rare class problems. In this
paper, we focus on nonlinear kernel-based
classification methods expressed as a
regularized loss minimization problem. We
address the challenges associated with
both rare class problems
and large-scale learning, by 1) optimizing the
area under the receiver operating characteristic
curve in the training process, instead of
classification accuracy, and 2) using
a rare class kernel representation to achieve an
efficient time and space algorithm. We call the
algorithm RankRC. We provide justifications
for the rare class representation and
experimentally illustrate the effectiveness
of RankRC in test performance, computational
complexity, and model robustness.
IEEE 2015
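For a feel of what optimizing AUC instead of accuracy means, this sketch scores a linear model with a pairwise hinge surrogate over (rare-positive, negative) pairs; it is an illustrative toy, not RankRC's kernel formulation:

```python
import numpy as np

def auc_hinge_loss(w, X_pos, X_neg):
    """Pairwise hinge surrogate for AUC: penalize every (positive, negative)
    pair the model fails to rank with margin 1."""
    s_pos = X_pos @ w                      # scores of the rare-class examples
    s_neg = X_neg @ w
    margins = s_pos[:, None] - s_neg[None, :]
    return np.maximum(0.0, 1.0 - margins).mean()

rng = np.random.default_rng(1)
X_neg = rng.normal(0.0, 1.0, size=(200, 5))   # majority class
X_pos = rng.normal(1.0, 1.0, size=(10, 5))    # rare class
w = np.ones(5)
print(auc_hinge_loss(w, X_pos, X_neg))        # loss to minimize during training
```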
TTA-JD-
C1558
Reverse Keyword
Search for Spatio-
Textual Top-kQueries
in Location-Based
Services
Spatio-textual queries retrieve the most similar
objects with respect to a given location and
a keyword set. Existing studies mainly focus on
how to efficiently find the top-k result set
given a spatio-textual query. Nevertheless, in
many application scenarios, users cannot
precisely formulate their keywords and instead
prefer to choose them from some
candidate keyword sets. Moreover, in
information browsing applications, it is useful
to highlight the objects with the tags
(keywords) under which the objects have high
rankings. Driven by these applications, we
propose a novel query paradigm, namely
reverse keyword search for spatio-textual top-k
queries (RSTQ). It returns the keywords under
which a target object will be a spatio-
textual top-k result. To efficiently process the
new query, we devise a novel hybrid index
KcR-tree to store and summarize the spatial
and textual information of objects. By
accessing the high-level nodes of KcR-tree, we
can estimate the rankings of the target object
without accessing the actual objects. To further
improve the performance, we propose three
query optimization techniques, i.e., KcR*-tree,
lazy upper-bound updating, and keyword set
filtering. We also extend RSTQ to allow the
input location to be a spatial region instead of a
point. Extensive experimental evaluation
demonstrates the efficiency of our proposed
query techniques in terms of both the
computational cost and I/O cost.
IEEE 2015
TTA-JD-
C1559
Scalable Distributed
Processing of k-
Nearest Neighbor
Queries over Moving
Objects
Central to many applications
involving moving objects is the task
of processing k-nearest neighbor (k-
NN) queries. Most of the existing approaches
to this problem are designed for the centralized
setting where query processing takes place on
a single server; it is difficult, if not impossible,
for them to scale to a distributed setting to
handle the vast volume of data and
concurrent queries that are increasingly
common in those applications. To address this
problem, we propose a suite of solutions that
can
support scalable distributed processing of k-
NN queries. We first present a new index
structure called Dynamic Strip Index (DSI),
which can better adapt to different data
distributions than existing grid indexes.
Moreover, it can be naturally distributed across
the cluster, therefore lending itself well
to distributed processing. We further propose
a distributed k-NN search (DKNN) algorithm
based on DSI. DKNN avoids having an
uncertain number of potentially expensive
iterations, and is thus more efficient and more
predictable than existing approaches. DSI and
DKNN are implemented on Apache S4, an
open-source platform
for distributed stream processing. We perform
extensive experiments to study the
characteristics of DSI and DKNN, and
compare them with three baseline methods.
Experimental results show that our proposal
scales well and significantly outperforms the
alternative methods.
IEEE 2015
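A minimal sketch of the strip-index idea, with 1-D strips whose boundaries follow the data distribution; the real DSI and the DKNN termination rule are more involved, and all names here are illustrative:

```python
import bisect

class StripIndex:
    """Toy dynamic strip index: partition points into x-axis strips whose
    boundaries are data-driven (quantiles), so skewed data stays balanced."""
    def __init__(self, points, n_strips=4):
        xs = sorted(p[0] for p in points)
        step = max(1, len(xs) // n_strips)
        self.bounds = xs[step::step][: n_strips - 1]   # quantile boundaries
        self.strips = [[] for _ in range(len(self.bounds) + 1)]
        for p in points:
            self.strips[self._strip_of(p[0])].append(p)

    def _strip_of(self, x):
        return bisect.bisect_right(self.bounds, x)

    def knn(self, q, k):
        """Search the query's strip, then widen outward; a full DKNN also
        checks that no unvisited strip can hold a closer point."""
        i = self._strip_of(q[0])
        cand, left, right = list(self.strips[i]), i - 1, i + 1
        while len(cand) < k and (left >= 0 or right < len(self.strips)):
            if left >= 0:
                cand += self.strips[left]; left -= 1
            if right < len(self.strips):
                cand += self.strips[right]; right += 1
        dist = lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return sorted(cand, key=dist)[:k]

pts = [(1, 1), (2, 5), (3, 2), (8, 3), (9, 9), (15, 4), (16, 1), (20, 7)]
print(StripIndex(pts).knn((10, 5), k=3))   # [(8, 3), (9, 9), (15, 4)]
```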
TTA-JD-
C1560
Sentiment analysis
from opinion mining to
human-agent
interaction
The opinion mining and human-
agent interaction communities are currently
addressing sentiment analysis from different
perspectives that comprise, on the one hand,
disparate sentiment-related phenomena and
computational representations, and on the
other hand, different detection and dialog
management methods. In this paper we
identify and discuss the growing opportunities
for cross-disciplinary work that may amplify
the advances of each
field. Sentiment/opinion detection
methods used in human-agent interaction are
indeed rare and, when they are employed, they
are not different from the ones used
in opinion mining and consequently not
designed for socio-
affective interactions (timing constraint of
the interaction, sentiment analysis as an input
and an output
of interaction strategies). To support our
claims, we present a comparative state of the
art which analyzes the sentiment-related
phenomena and the sentiment detection
methods used in both communities and makes
an overview of the goals of socio-
affective human-agent strategies. We propose
then different possibilities for mutual benefit,
specifying several research tracks and
discussing the open questions and
prospects. To show the feasibility of the
general guidelines proposed we also approach
them from a specific perspective by applying
them to the case of the Greta embodied
conversational agents platform and discuss the
way they can be used to make sentiment
analysis more meaningful for human-
agent interactions in two different use cases:
job interviews and dialogs with museum
visitors.
IEEE 2015
TTA-JD-
C1561
Similarity Measure
Selection for Clustering
Time Series Databases
In the past few years, clustering has become a
popular task associated with time series. The
choice of a suitable distance measure is crucial
to the clustering process and, given the vast
number of distance
measures for time series available in the
literature and their diverse characteristics,
this selection is not straightforward. With the
objective of simplifying this task, we propose a
multi-label classification framework that
provides the means to automatically select the
most suitable distance measures for
clustering a time series database. This
classifier is based on a novel collection of
characteristics that describe the main features
of the time series databases and provide the
predictive information necessary to
discriminate between a set of
distance measures. In order to test the validity
of this classifier, we conduct a complete set of
experiments using both synthetic and
real time series databases and a set of 5
common distance measures. The positive
results obtained by the designed classification
framework for various
performance measures indicate that the
proposed methodology is useful to simplify the
process of distance measure selection in
time series clustering tasks.
IEEE 2015
TTA-JD-
C1562
Splitting Large Medical
Data Sets based on
Normal Distribution in
Cloud Environment
The surge of medical and e-commerce
applications has generated a tremendous
amount of data, which brings people to a so-called
“Big Data” era. Different from
traditional large data sets, the term “Big Data”
not only means the large size of data volume
but also indicates the high velocity
of data generation. However,
current data mining and analytical techniques
are facing the challenge of dealing with large
volume data in a short period of time. This
paper explores the efficiency of utilizing
the Normal Distribution (ND) method
for splitting and
processing large volume medical data in cloud
environment, which can provide representative
information in the split data sets. The ND-
based new model consists of two stages. The
first stage adopts the ND method
for large data sets splitting and processing,
which can reduce the volume of data sets. The
second stage implements the ND-based model
in a cloud computing infrastructure for
allocating the split data sets. The experimental
results show substantial efficiency gains of the
proposed method over the conventional
methods without splitting data into small
partitions. The ND-based method can generate
representative data sets, which can offer an
efficient solution for large data processing.
The split data sets can be processed in parallel
in a cloud computing environment.
IEEE 2015
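One plausible reading of the ND-based splitting, sketched as a toy: stratify records by z-score bands so every partition preserves the overall normal shape; the band edges and sample sizes are invented:

```python
import numpy as np

def nd_split(values, n_parts=4):
    """Split records into parts that each preserve the overall normal shape:
    assign round-robin within z-score bands so every part samples all bands."""
    mu, sigma = values.mean(), values.std()
    z = (values - mu) / sigma
    bands = np.digitize(z, [-1.0, 0.0, 1.0])      # four z-score bands
    parts = [[] for _ in range(n_parts)]
    for band in range(4):
        idx = np.flatnonzero(bands == band)
        for i, record in enumerate(idx):          # round-robin inside the band
            parts[i % n_parts].append(record)
    return parts

rng = np.random.default_rng(0)
data = rng.normal(120, 15, size=1000)             # e.g., blood-pressure readings
for p in nd_split(data):
    print(len(p), round(data[p].mean(), 1))       # similar means across parts
```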
TTA-JD-
C1563
Steganography Using
Reversible Texture
Synthesis
We propose a novel approach
for steganography using reversible texture
synthesis. A texture synthesis process resamples
a smaller texture image, which synthesizes a
new texture image with a similar local
appearance and an arbitrary size. We weave
the texture synthesis process into
steganography to conceal secret messages. In
contrast to using an existing cover image to
hide messages, our algorithm conceals the
source texture image and embeds secret
messages through the process
of texture synthesis. This allows us to extract
the secret messages and source texture from a
stego synthetic texture. Our approach offers
three distinct advantages. First, our scheme
offers the embedding capacity that is
proportional to the size of the
stego texture image. Second, a steganalytic
algorithm is not likely to defeat our
steganographic approach. Third,
the reversible capability inherited from our
scheme provides functionality, which allows
recovery of the source texture. Experimental
results have verified that our proposed
algorithm can provide various embedding
capacities, produce visually plausible
texture images, and recover the source
texture.
IEEE 2015
TTA-JD-
C1564
TASC - Topic-Adaptive
Sentiment
Classification on
Dynamic Tweets Topic
Model for Graph Mining
Sentiment classification is a topic-sensitive
task, i.e., a classifier trained from
one topic will perform worse on another. This
is especially a problem for
the tweets sentiment analysis. Since
the topics in Twitter are very diverse, it is
impossible to train a universal classifier for
all topics. Moreover, compared to product
review, Twitter lacks data labeling and a rating
mechanism to acquire sentiment labels. The
extremely sparse text of tweets also brings
down the performance of
a sentiment classifier. In this paper, we
propose a semi-supervised topic-
adaptive sentiment classification (TASC)
model, which starts with a classifier built on
common features and mixed labeled data from
various topics. It minimizes the hinge loss to
adapt to unlabeled data and features
including topic-related sentiment words,
authors' sentiments and sentiment connections
derived from “@” mentions of tweets, named
as topic-adaptive features. Text and non-text
features are extracted and naturally split into
two views for co-training. The TASC learning
algorithm updates topic-adaptive features
based on the collaborative selection of
unlabeled data, which in turn helps to select
more reliable tweets to boost the performance.
We also design the adapting model along a
timeline (TASC-t) for dynamic tweets. An
experiment on 6 topics from
published tweet corpuses demonstrates
that TASC outperforms other well-known
supervised and ensemble classifiers. It also
beats those semi-supervised learning methods
without feature adaption. Meanwhile, TASC-t
can also achieve impressive accuracy and F-
score. Finally, with a timeline visualization of
the “river” graph, people can intuitively grasp
the ups and downs of sentiment evolution and
its intensity by color gradation.
IEEE 2015
TTA-JD-
C1565
Towards Effective Bug
Triage with Software
Data Reduction
Techniques
Software companies spend over 45 percent of
their cost on dealing with software bugs. An
inevitable step of fixing bugs is bug triage,
which aims to correctly assign a developer to a
new bug. To decrease the time cost in manual
work, text classification techniques are applied
to conduct automatic bug triage. In this paper,
we address the problem
of data reduction for bug triage, i.e., how to
reduce the scale and improve the quality
of bug data. We combine instance selection
with feature selection to simultaneously
reduce data scale on the bug dimension and the
word dimension. To determine the order of
applying instance selection and feature
selection, we extract attributes from
historical bug data sets and build a predictive
model for a new bug data set. We empirically
investigate the performance of data reduction
on a total of 600,000 bug reports of two large
open source projects, namely Eclipse and
Mozilla. The results show that
our data reduction can effectively reduce
the data scale and improve the accuracy of
bug triage. Our work provides an approach to
leveraging techniques on data processing to
form reduced and high-
quality bug data in software development and
maintenance.
IEEE 2015
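A hedged sketch of combining instance selection with feature selection on bug-report text, using scikit-learn's chi-squared feature selection plus a simple near-duplicate instance filter; the ordering the paper predicts per data set is hard-coded here, and all data are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
import numpy as np

# Toy bug reports and their assigned developers (labels).
reports = ["crash when opening editor", "editor crash on startup",
           "memory leak in renderer", "renderer leaks memory slowly",
           "ui freezes on resize", "window freezes while resizing"]
devs = ["alice", "alice", "bob", "bob", "carol", "carol"]

X = CountVectorizer().fit_transform(reports)          # word dimension

# Feature selection: keep the k word features most correlated with developers.
X_fs = SelectKBest(chi2, k=5).fit_transform(X, devs)

# Instance selection (toy): drop near-duplicate reports on the bug dimension,
# keeping one representative per group of very similar rows.
dense = X_fs.toarray().astype(float)
keep = []
for i, row in enumerate(dense):
    dup = any(np.dot(row, dense[j]) /
              (np.linalg.norm(row) * np.linalg.norm(dense[j]) + 1e-9) > 0.9
              for j in keep)
    if not dup:
        keep.append(i)

print("kept instances:", keep, "reduced shape:", dense[keep].shape)
```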
DOMAIN : CLOUD COMPUTING
TTA-DC-
C1501
An Access Control
Model for Online Social
Networks Using User-
to-User Relationships
Users and resources
in online social networks (OSNs) are
interconnected via various types of
relationships. In particular, user-to-
user relationships form the basis of the OSN
structure, and play a significant role in
specifying and enforcing access control.
Individual users and the OSN provider should
be enabled to specify which access can be
granted in terms of existing relationships. In
this paper, we propose a novel user-to-
user relationship-
based access control (UURAC) model for
OSN systems that utilizes regular expression
notation for such policy
specification. Access control policies on users
and resources are composed in terms of
requested action, multiple relationship types,
the starting point of the evaluation, and the
number of hops on the path. We present two
path checking algorithms to determine whether
the required relationship path between users
for a given access request exists. We validate
the feasibility of our approach by
implementing a prototype system and
evaluating the performance of these two
algorithms.
IEEE 2015
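To make the policy idea concrete, a toy sketch: relationship types along a path are concatenated into a string and matched against a policy regular expression under a hop limit; the graph, the type codes (f for friend, c for colleague), and the policies are invented:

```python
import re
from collections import deque

def path_exists(graph, start, target, policy, max_hops):
    """BFS over typed edges; grant access if some path's type string
    (e.g., 'ff' = friend-of-friend) matches the policy regex within max_hops."""
    pattern = re.compile(policy)
    queue = deque([(start, "")])
    while queue:
        node, types = queue.popleft()
        if node == target and pattern.fullmatch(types):
            return True
        if len(types) < max_hops:
            for nxt, rel in graph.get(node, []):
                queue.append((nxt, types + rel))
    return False

# u1 -f- u2 -f- u3, u1 -c- u4; policy: reachable via friends only, <= 2 hops.
graph = {"u1": [("u2", "f"), ("u4", "c")],
         "u2": [("u3", "f")],
         "u4": [("u3", "c")]}
print(path_exists(graph, "u1", "u3", policy=r"f{1,2}", max_hops=2))  # True
print(path_exists(graph, "u1", "u4", policy=r"f+", max_hops=2))      # False
```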
TTA-DC-
C1502
Cost-Effective
Authentic and
Anonymous Data
Sharing with Forward
Security
Data sharing has never been easier with the
advances of cloud computing, and an
accurate analysis of
the shared data provides an array of
benefits to both the society and
individuals. Data sharing with a large
number of participants must take into
account several issues, including
efficiency, data integrity and privacy
of data owner. Ring signature is a promising
candidate to construct an anonymous and
authentic data sharing system. It allows
a data owner to anonymously authenticate
his data which can be put into the cloud for
storage or analysis purpose. Yet the costly
certificate verification in the traditional public
key infrastructure (PKI) setting becomes a
bottleneck for this solution to be scalable.
Identity-based (ID-based) ring signature,
which eliminates the process of certificate
verification, can be used instead. In this
paper, we further enhance the security of ID-
based ring signature by providing
forward security: If a secret key of any user
has been compromised, all previously
generated signatures that include this user
still remain valid. This property is especially
important to any large scale data
sharing system, as it is impossible to ask
all data owners to reauthenticate
their data even if a secret key of one single
user has been compromised. We provide a
concrete and efficient instantiation of our
scheme, prove its security and provide an
implementation to show its practicality.
IEEE 2015
TTA-DC-
C1503
SeDaSC - Shared Data
Authority Scheme
Cloud storage is an application of clouds that
liberates organizations from establishing in-
house data storage systems. However, cloud
storage gives rise to security concerns. In case
of group-shared data, the data face both cloud-
specific and conventional insider threats.
Secure data sharing among a group that
counters insider threats of legitimate yet
malicious users is an important research issue.
In this paper, we propose the Secure Data
Sharing in Clouds (SeDaSC) methodology that
provides: 1) data confidentiality and integrity;
2) access control; 3) data sharing (forwarding)
without using compute-intensive reencryption;
4) insider threat security; and 5) forward and
backward access control. The SeDaSC
methodology encrypts a file with a single
encryption key. Two different key shares for
each of the users are generated, with the user
only getting one share. The possession of a
single share of a key allows the SeDaSC
methodology to counter the insider threats. The
other key share is stored by a trusted third
party, which is called the cryptographic server.
The SeDaSC methodology is applicable to
conventional and mobile cloud computing
environments. We implement a working
prototype of the SeDaSC methodology and
evaluate its performance based on the time
consumed during various operations. We
formally verify the working of SeDaSC by
using high-level Petri nets, the Satisfiability
Modulo Theories Library, and a Z3 solver. The
results proved to be encouraging and show that
SeDaSC has the potential to be effectively
used for secure data sharing in the cloud.
IEEE 2015
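A toy sketch of the two-share idea: the single file key is split so that neither the user's share nor the cryptographic server's share alone reveals anything. XOR splitting, shown here, is one standard way to do this and is assumed for illustration rather than taken from the paper:

```python
import os

def split_key(key: bytes):
    """Split a symmetric key into two shares; each alone is uniformly random."""
    user_share = os.urandom(len(key))
    server_share = bytes(k ^ u for k, u in zip(key, user_share))
    return user_share, server_share

def recover_key(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)                          # the single file-encryption key
user_share, server_share = split_key(key)     # user keeps one share, the
assert recover_key(user_share, server_share) == key  # trusted server the other
print("either share alone reveals nothing about the key")
```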
TTA-DC-
C1504
A Computational
Dynamic Trust Model
for User Authorization
Development of authorization mechanisms for
secure information access by a large
community of users in an open environment is
an important problem in the ever-growing
Internet world. In this paper we propose
a computational dynamic trust model for user
authorization, rooted in findings from social
science. Unlike most
existing computational trust models,
this model distinguishes trusting belief in
integrity from that in competence in different
contexts and accounts for subjectivity in the
evaluation of a particular trustee by different
trusters. Simulation studies were conducted to
compare the performance of the proposed
integrity belief model with
other trust models from the literature for
different user behavior patterns. Experiments
show that the proposed model achieves higher
performance than other models especially in
predicting the behavior of unstable users.
IEEE 2015
TTA-DC-
C1505
Shared Authority Based
Privacy-Preserving
Authentication Protocol
in Cloud Computing
Cloud computing is an emerging data-
interactive paradigm in which users' data are
remotely stored on an
online cloud server. Cloud services provide
great conveniences for the users to enjoy the
on-demand cloud applications without
considering the local infrastructure limitations.
During the data accessing, different users may
be in a collaborative relationship, and thus
data sharing becomes significant to achieve
productive benefits. The existing security
solutions mainly focus on the authentication to
realize that a user's privative data cannot be
illegally accessed, but neglect a
subtle privacy issue during a user challenging
the cloud server to request other users for
data sharing. The challenged access request
itself may reveal the user's privacy no matter
whether or not it can obtain the data access
permissions. In this paper, we propose
a shared authority based privacy-
preserving authentication protocol (SAPA) to
address the above privacy issue for cloud storage.
In the SAPA, 1) shared access authority is
achieved by anonymous access request
matching mechanism with security and privacy
considerations (e.g., authentication, data
anonymity, user privacy, and forward
security); 2) attribute based access control is
adopted to realize that the user can only access
its own data fields; 3) proxy re-encryption is
applied to provide data sharing among the
multiple users. Meanwhile, the universal
composability (UC) model is established to
prove that the SAPA design is theoretically
correct. It indicates that the
proposed protocol is attractive for multi-user
collaborative cloud applications.
IEEE 2015
TTA-DC-
C1506
Provable Multicopy
Dynamic Data
Possession in Cloud
Computing Systems
More and more organizations are opting to
outsource data to
remote cloud service providers (CSPs).
Customers can rent the CSPs' storage
infrastructure to store and retrieve an almost
unlimited amount of data by paying fees
metered in gigabyte/month. For an increased
level of scalability, availability, and durability,
some customers may want their data to be
replicated on multiple servers across
multiple data centers. The more copies the
CSP is asked to store, the more fees the
customers are charged. Therefore, customers
need to have a strong guarantee that the CSP is
storing all data copies that are agreed upon in
the service contract, and all these copies are
consistent with the most recent modifications
issued by the customers. In this paper, we
propose a map-based provable
multicopy dynamic data possession (MB-
PMDDP) scheme that has the following
features: 1) it provides evidence to the
customers that the CSP is not cheating by
storing fewer copies; 2) it supports outsourcing
of dynamic data, i.e., it supports block-level
operations, such as block modification,
insertion, deletion, and append; and 3) it
allows authorized users to seamlessly access
the file copies stored by the CSP. We give a
comparative analysis of the proposed MB-
PMDDP scheme with a reference model
obtained by extending
existing provable possession of dynamic single
-copy schemes. The theoretical analysis is
validated through experimental results on a
commercial cloud platform. In addition, we
show the security against colluding servers,
and discuss how to identify corrupted copies
by slightly modifying the proposed scheme.
IEEE 2015
TTA-DC-
C1507
My Privacy My Decision
- Control of Photo
Sharing on Online
Social Networks
Photo sharing is an attractive feature which
popularizes Online Social Networks (OSNs).
Unfortunately, it may leak users’ privacy if
they are allowed to post, comment, and tag
a photo freely. In this paper, we attempt to
address this issue and study the scenario when
a user shares a photo containing individuals
other than himself/herself (termed co-photo for
short). To prevent possible privacy leakage of
a photo, we design a mechanism to enable each
individual in a photo to be aware of the posting
activity and participate in the decision making
on the photo posting. For this purpose, we
need an efficient facial recognition (FR)
system that can recognize everyone in
the photo. However, a more demanding privacy
setting may limit the number of
the photos publicly available to train the FR
system. To deal with this dilemma, our
mechanism attempts to utilize users’
private photos to design a personalized FR
system specifically trained to differentiate
possible photo co-owners without leaking
their privacy. We also develop a distributed
consensus based method to reduce the
computational complexity and protect the
private training set. We show that our system
is superior to other possible approaches in
terms of recognition ratio and efficiency. Our
mechanism is implemented as a proof-of-
concept Android application on Facebook’s
platform.
IEEE 2015
TTA-DC-
C1508
A Profit Maximization
Scheme with
Guaranteed Quality of
Service in Cloud
Computing
As an effective and efficient way to
provide computing resources and services to
customers on demand, cloud computing has
become more and more popular.
From cloud service providers'
perspective, profit is one of the most important
considerations, and it is mainly determined by
the configuration of a cloud service platform
under given market demand. However, a single
long-term renting scheme is usually adopted to
configure a cloud platform, which
cannot guarantee the service quality but leads
to serious resource waste. In this paper, a
double resource renting scheme, in which
short-term renting and long-term renting are
combined, is first designed to address the
existing issues. This double
renting scheme can
effectively guarantee the quality of service of
all requests and reduce the resource waste
greatly. Secondly, a service system is
considered as an M/M/m+D queuing model
and the performance indicators that affect
the profit of our double renting scheme are
analyzed, e.g., the average charge, the ratio of
requests that need temporary servers, and so
forth. Thirdly, a profit maximization problem
is formulated for the double
renting scheme and the optimized
configuration of a cloud platform is obtained
by solving the profit maximization problem.
Finally, a series of calculations are conducted
to compare the profit of our
proposed scheme with that of the single renting
scheme. The results show that our scheme can
not only guarantee the service quality of all
requests, but also obtain more profit than the
latter.
IEEE 2015
TTA-DC-
C1509
Attribute-based Access
Control with Constant-
size Ciphertext in Cloud
Computing
With the popularity of cloud computing, there
have been increasing concerns about its
security and privacy. Since
the cloud computing environment is distributed
and untrusted, data owners have to encrypt
outsourced data to enforce confidentiality.
Therefore, how to achieve practicable access
control of encrypted data in an untrusted
environment is an urgent issue that needs to be
solved. Attribute-Based Encryption (ABE) is a
promising scheme suitable
for access control in cloud storage systems.
This paper proposes a hierarchical attribute-
based access control scheme with constant-size
ciphertext. The scheme is efficient because
the ciphertext length and the number of bilinear
pairing evaluations are fixed to a constant. Its
computation cost in encryption and decryption
algorithms is low. Moreover, the hierarchical
authorization structure of our scheme reduces
the burden and risk of a single authority
scenario. We prove that the scheme is CCA2-
secure under the decisional q-Bilinear Diffie-
Hellman Exponent assumption. In addition, we
implement our scheme and analyse its
performance. The analysis results show the
proposed scheme is efficient, scalable, and
fine-grained in dealing with access control for
outsourced data in cloud computing.
IEEE 2015
TTA-DC-
C1510
Bidding Strategies for
Spot Instances in Cloud
Computing Markets
In recent times, spot pricing - a dynamic
pricing scheme - is becoming increasingly
popular for cloud services. This new pricing
format, though efficient in terms of cost
and resource use, has added to the
complexity of decision making for
typical cloud computing users. To
recommend bidding strategies in
spot markets, we use a simulation study to
understand the implications that provider-
recommended strategies have
for cloud users. We use data based on
Amazon's
Elastic Compute Cloud spot market to
provide users with guidelines when
considering tradeoffs between cost, wait
time, and interruption rates.
IEEE 2015
TTA-DC-
C1511
CHARM - A Cost-
efficient Multi-cloud
Data Hosting Scheme
with High Availability
Nowadays, more and more enterprises and
organizations are hosting their data into
the cloud, in order to reduce the IT
maintenance cost and enhance
the data reliability. However, facing the
numerous cloud vendors as well as their
heterogeneous pricing policies, customers may
well be perplexed with which cloud(s) are
suitable for storing their data and
what hosting strategy is cheaper. The general
status quo is that customers usually put
their data into a single cloud (which is subject
to the vendor lock-in risk) and then simply
trust to luck. Based on comprehensive analysis
of various state-of-the-art cloud vendors, this
paper proposes a
novel data hosting scheme (named CHARM),
which integrates two key desired functions.
The first is selecting several suitable clouds
and an appropriate redundancy strategy to
store data with minimized monetary cost and
guaranteed availability. The second is
triggering a transition process to re-
distribute data according to the variations
of data access pattern and pricing of clouds.
We evaluate the performance
of CHARM using both trace-driven
simulations and prototype experiments. The
results show that, compared with the major
existing schemes, CHARM not only saves
around 20 percent of monetary cost but also
exhibits sound adaptability to data and price
adjustments.
IEEE 2015
TTA-DC-
C1512
CloudArmor -
Supporting Reputation-
based Trust
Management for Cloud
Services
Trust management is one of the most
challenging issues for the adoption and growth
of cloud computing. The highly dynamic,
distributed, and non-transparent nature
of cloud services introduces several
challenging issues such as privacy, security,
and availability. Preserving consumers’
privacy is not an easy task due to the sensitive
information involved in the interactions
between consumers and
the trust management service.
Protecting cloud services against their
malicious users (e.g., such users might give
misleading feedback to disadvantage a
particular cloud service) is a difficult problem.
Guaranteeing the availability of
the trust management service is another
significant challenge because of the dynamic
nature of cloud environments. In this article,
we describe the design and implementation
of CloudArmor, a reputation-
based trust management framework that
provides a set of functionalities to
deliver Trust as a Service (TaaS), which
includes i) a novel protocol to prove the
credibility of trust feedbacks and preserve
users’ privacy, ii) an adaptive and robust
credibility model for measuring the credibility
of trust feedbacks to
protect cloud services from malicious users
and to compare the trustworthiness
of cloud services, and iii) an availability model
to manage the availability of the decentralized
implementation of
the trust management service. The feasibility
and benefits of our approach have been
validated by a prototype and experimental
studies using a collection of real-world
trust feedbacks on cloud services.
IEEE 2015
TTA-DC-
C1513
DaSCE - Data Security
for Cloud Environment
with Semi-Trusted
Third Party
Off-site data storage is an application
of cloud that relieves the customers from
focusing on data storage system. However,
outsourcing data to a third-party administrative
control entails serious
security concerns. Data leakage may occur due
to attacks by other users and machines in
the cloud. Wholesale disclosure of data by the
cloud service provider is yet another problem
faced in the cloud environment. Consequently,
high-level security measures are required. In this
paper, we
propose Data Security for Cloud Environment
with Semi-Trusted Third Party (DaSCE),
a data security system that provides (a) key
management (b) access control, and (c) file
assured deletion. The DaSCE utilizes Shamir’s
(k, n) threshold scheme to manage the keys,
where k out of n shares are required to
generate the key. We use multiple key
managers, each hosting one share of key.
Multiple key managers avoid a single point of
failure for the cryptographic keys. We (a)
implement a working prototype of DaSCE and
evaluate its performance based on the time
consumed during various operations, (b)
formally model and analyze the working
of DaSCE using High Level Petri nets
(HLPN), and (c) verify the working of
DaSCE using Satisfiability Modulo Theories
Library (SMT-Lib) and Z3 solver. The results
reveal that DaSCE can be effectively used
for security of outsourced data by employing
key management, access control, and file
assured deletion.
IEEE 2015
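A compact sketch of Shamir's (k, n) threshold scheme over a prime field, which the abstract names for key management; the prime and the parameters are illustrative:

```python
import random

P = 2**127 - 1                       # a Mersenne prime; the field for shares

def make_shares(secret, k, n):
    """Shamir (k, n): embed the secret in a random degree-(k-1) polynomial;
    each share is a point (x, f(x)). Any k points recover f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)                    # the data-encryption key
shares = make_shares(key, k=3, n=5)          # five key managers, any 3 suffice
assert recover(shares[:3]) == key and recover(shares[1:4]) == key
```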
TTA-DC-
C1514
Data as a Currency and
Cloud-Based Data
Lockers
With large data volumes being generated
through Google search, Facebook, Twitter,
Instagram, and the increasingly instrumented
physical world (with embedded sensors), the
authors discuss whether such data can be the
basis of a new transactional relationship
between people and companies in which both
sides benefit from new products and services
and increased economic growth. However, the
key distinction from previous discussions is
whether the existence of a
global cloud computing industry (consisting of
datacenters located in different parts of the
world) can be used to facilitate such
transactional relations, with awareness
of data privacy and access management. The
authors propose the use of data as a currency,
to enable consumers to directly monetize their
own data and request services (based on the
"value" their data holds within a marketplace).
IEEE 2015
TTA-DC-
C1515
Dynamic Weight-Based
Individual Similarity
Calculation for
Information Searching
in Social Computing
In the social computing environment, the
complete information about an individual is
usually distributed in
heterogeneous social networks, which are
presented as linked data. Synthetically
recognizing and integrating these distributed
and heterogeneous data for
efficient information searching is an
important but challenging task. In this paper,
a dynamic weight (DW)-
based similarity calculation is proposed to
recognize and integrate
similar individuals from distributed data
environments. First, each link of an
individual is weighted by applying DW. Then,
a semantic similarity metric is proposed to
combine the DW into similarity calculation.
Next, a searching system framework for
a similarity-based individual is designed and
tested in real-world data sets. Finally, massive
experiments are conducted both in benchmark
and real-world social community data sets. The
results show that our approach can produce a
good result in
similar individual searching in social networks.
In addition, it performs significantly better
than the existing state-of-the-art approaches in
similar individual searching.
IEEE 2015
TTA-DC-
C1516
Efficient audit service
outsourcing for data
integrity in clouds
Cloud computing, which provides elastic
computing and storage resources on demand,
has become increasingly important due to the
emergence of “big data”. Cloud computing
resources are a natural fit for processing
big data streams as they allow
big data applications to run at the scale
required for handling their complexities
(data volume, variety, and velocity). With
the data no longer under users' direct
control, data security in cloud computing has
become one of the main concerns in the
adoption of cloud computing resources. In
order to improve data reliability and
availability, storing multiple replicas along
with original datasets is a common strategy
for cloud service providers.
Public data auditing schemes allow users to
verify their outsourced data storage without
having to retrieve the whole dataset. However,
existing data auditing techniques suffer from
efficiency and security problems. First, for
dynamic datasets with multiple replicas, the
communication overhead for update
verifications is very large, because each update
requires updating of all replicas, where
verification for each update requires O(log n)
communication complexity. Second, existing
schemes cannot provide public auditing and
authentication of block indices at the same
time. Without authentication of block indices,
the server can build a valid proof based
on data blocks other than the blocks client
requested to verify. In order to address these
problems, in this paper, we present a novel
public auditing scheme named MuR-DPA. The
new scheme incorporates a novel
authenticated data structure (ADS) based on
the Merkle hash tree (MHT), which we call
MR-MHT. To support fully
dynamic data updates and authentication of
block indices, we included rank and level
values in computation of MHT nodes. In
contrast to existing schemes, level values of
nodes in MR-MHT are assigned in a top-down
order, and all replica blocks for
each data block are organized into the same
replica sub-tree. Such a configuration
allows efficient verification of updates for
multiple replicas. Compared to
existing integrity verification and
public auditing schemes, theoretical analysis
and experimental results show that the
proposed MuR-DPA scheme can not only
incur much less communication overhead for
both update verification
and integrity verification of cloud datasets with
multiple replicas, but also provide enhanced
security against dishonest cloud
service providers.
IEEE 2015
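For background, a minimal Merkle hash tree of the kind MR-MHT extends: the root authenticates every block, and verifying one block needs only a logarithmic path of sibling hashes. This sketch omits the rank/level values and replica sub-trees MuR-DPA adds:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash leaves pairwise upward until a single root remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks, index):
    """Collect the sibling hashes along the path from leaf `index` to the root."""
    level, proof = [h(b) for b in blocks], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1                    # sibling differs only in the last bit
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(block, proof, root):
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"block0", b"block1", b"block2", b"block3"]
root = merkle_root(blocks)
assert verify(b"block2", merkle_proof(blocks, 2), root)   # integrity check
```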
TTA-DC-
C1517
Generic and Efficient
Constructions of
Attribute-Based
Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) provides a
mechanism for complex access control over
encrypted data. However, in most ABE
systems, the ciphertext size and
the decryption overhead, which grow with the
complexity of the access policy, are becoming
critical barriers in applications running on
resource-limited
devices. Outsourcing decryption of ABE
ciphertexts to a powerful third party is a
practical way to solve this problem.
Since the third party is usually believed to be
untrusted, the security requirements of ABE
with outsourced decryption should include
privacy and verifiability. Namely, any
adversary including the third party should
learn nothing about the encrypted message,
and the correctness of
the outsourced decryption is supposed to be
verified efficiently. We propose generic
constructions of CPA-secure and RCCA-
secure ABE systems
with verifiable outsourced decryption from
CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-
secure construction in the standard model and
then show an implementation of this
instantiation. The experimental results show
that, compared with the existing scheme, our
CPA-secure construction has more compact
ciphertext and less computational costs.
Moreover, the techniques involved in the
RCCA-secure construction can be applied in
generally constructing CCA-secure ABE,
which we believe to be of independent interest.
IEEE 2015
TTA-DC-
C1518
Group Key Agreement
with Local Connectivity
In this paper, we study
a group key agreement problem where a user is
only aware of his neighbors while
the connectivity graph is arbitrary. In our
problem, there is no centralized initialization
for users. A group key agreement with these
features is very suitable for social networks.
Under our setting, we construct two efficient
protocols with passive security. We obtain
lower bounds on the round complexity for this
type of protocol, which demonstrates that our
constructions are round efficient. Finally, we
construct an actively secure protocol from a
passively secure one.
IEEE 2015
TTA-DC-
C1519
Hybrid cloud approach
for secure authorized
deduplication
Data deduplication is one of the important
data compression techniques for eliminating
duplicate copies of repeating data, and has
been widely used in cloud storage to reduce
the amount of storage space and save
bandwidth. To protect the confidentiality of
sensitive data while supporting deduplication,
the convergent encryption technique has been
proposed to encrypt the data before
outsourcing. To better protect data security,
this paper makes the first attempt to formally
address the problem of authorized data
deduplication. Different from traditional
deduplication systems, the differential
privileges of users are further considered in
duplicate check besides the data itself. We also
present several new deduplication
constructions supporting authorized duplicate
check in a hybrid cloud architecture. Security
analysis demonstrates that our scheme
is secure in terms of the definitions specified in
the proposed security model. As a proof of
concept, we implement a prototype of our
proposed authorized duplicate check scheme
and conduct test bed experiments using our
prototype. We show that our
proposed authorized duplicate check scheme
incurs minimal overhead compared to normal
operations.
IEEE 2015
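A minimal sketch of the convergent encryption idea underlying deduplication: the key is derived from the content itself, so identical files produce identical ciphertexts. The fixed nonce keeps this toy deterministic; pycryptodome is an assumed dependency, and none of this is the paper's exact construction:

```python
from Crypto.Cipher import AES          # pycryptodome, assumed available
from hashlib import sha256

def convergent_encrypt(data: bytes):
    """Convergent encryption: key = H(data), so equal plaintexts give equal
    ciphertexts, which lets the cloud deduplicate without reading content."""
    key = sha256(data).digest()        # content-derived key
    tag_id = sha256(key).hexdigest()   # duplicate-check tag sent to the cloud
    cipher = AES.new(key, AES.MODE_GCM, nonce=b"\0" * 12)  # fixed nonce: toy only
    ct, mac = cipher.encrypt_and_digest(data)
    return tag_id, ct, mac, key        # user keeps `key`, uploads the rest

t1, c1, _, k1 = convergent_encrypt(b"quarterly report v7")
t2, c2, _, _ = convergent_encrypt(b"quarterly report v7")
assert t1 == t2 and c1 == c2           # identical files deduplicate
plain = AES.new(k1, AES.MODE_GCM, nonce=b"\0" * 12).decrypt(c1)
assert plain == b"quarterly report v7"
```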
TTA-DC-
C1520
Performing Initiative
Data Prefetching in
Distributed File
Systems for Cloud
Computing
This paper presents
an initiative data prefetching scheme on the
storage servers in distributed file
systems for cloud computing. In
this prefetching technique, the client machines
are not substantially involved in the process
of data prefetching, but the storage servers can
directly prefetch the data after analyzing the
history of disk I/O access events, and then send
the prefetched data to the relevant client
machines proactively. To put this technique to
work, the information about client nodes is
piggybacked onto the real client I/O requests,
and then forwarded to the relevant storage
server. Next, two prediction algorithms have
been proposed to forecast future block access
operations for directing what data should be
fetched on storage servers in advance. Finally,
the prefetched data can be pushed to the
relevant client machine from the storage
server. Through a series of evaluation
experiments with a collection of application
benchmarks, we have demonstrated that our
presented initiative prefetching technique can
benefit distributed file systems for cloud
environments to achieve better I/O performance. In
particular, configuration-limited client
machines in the cloud are not responsible for
predicting I/O access operations, which
contributes to better
system performance on them.
IEEE 2015
TTA-DC-
C1521
Privacy protection for
wireless sensor
medical data
In recent years, wireless sensor networks have
been widely used in healthcare applications,
such as hospital and home patient monitoring.
Wireless medical sensor networks are more
vulnerable to eavesdropping, modification,
impersonation and replaying attacks than the
wired networks. A lot of work has been done
to secure wireless medical sensor networks.
The existing solutions can protect the
patient data during transmission, but cannot
stop the inside attack where the administrator
of the patient database reveals the sensitive
patient data. In this paper, we propose a
practical approach to prevent the inside attack
by using multiple data servers to store
patient data. The main contribution of this
paper is securely distributing the patient data in
multiple data servers and employing the
Paillier and ElGamal cryptosystems to perform
statistical analysis on the patient data without
compromising the patients’ privacy.
IEEE 2015
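To illustrate the kind of privacy-preserving statistics the abstract mentions, here is a sketch using the phe Paillier library (an assumed dependency, not the paper's code): encrypted readings can be summed without decrypting any individual patient's value:

```python
from phe import paillier               # pip install phe; assumed dependency

public_key, private_key = paillier.generate_paillier_keypair()

# Each data server holds only encrypted readings for its shard of patients.
readings = [98.6, 101.2, 99.1, 100.4]              # e.g., temperatures
encrypted = [public_key.encrypt(r) for r in readings]

# Paillier is additively homomorphic: sums (and thus means) can be computed
# directly on ciphertexts, so no server learns any individual value.
enc_sum = sum(encrypted[1:], encrypted[0])
mean = private_key.decrypt(enc_sum) / len(readings)
print(round(mean, 2))                              # mean across all patients
```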
TTA-DC-
C1522
Quality-assured
Secured Load Sharing
in Mobile Cloud
Networking
Environment
In mobile cloud networks (MCNs),
a mobile user is connected with a cloud server
through a network gateway, which is
responsible for providing the required quality-
of-service (QoS) to the users. If a user
increases its service demand, the connecting
gateway may fail to provide the requested QoS
due to the overloaded demand, while the other
gateways remain underloaded. Due to the
increase in load in one gateway,
the sharing of load among all the gateways is
one of the prospective solutions for providing
QoS-guaranteed services to the mobile users.
Additionally, if a user misbehaves, the
situation becomes more challenging. In this
paper, we address the problem of QoS-
guaranteed secured service provisioning in
MCNs. We design a utility maximization
problem for quality-
assured secured load sharing (QuaLShare) in
MCN, and determine its optimal solution using
auction theory. In QuaLShare, the overloaded
gateway detects the misbehaving gateways,
and, then, prevents them from participating in
the auction process. Theoretically, we
characterize both the problem and the solution
approaches in an MCN environment. Finally,
we investigate the existence of a Nash
Equilibrium of the proposed scheme. We
extend the solution for the case of multiple
users, followed by theoretical analysis.
Numerical analysis establishes the correctness
of the proposed algorithms.
IEEE 2015
TTA-DC-
C1523
Secure Audit Service
by Using TPA for Data
Integrity in Cloud
System
Cloud services are used not only to store
data in the cloud but also to
share data among users over the cloud. The
integrity of data on the cloud can easily be
lost or damaged. To ensure cloud storage
correctness, a distributed
storage integrity auditing mechanism enables
secure and efficient operations on cloud data
through a third-party auditor (TPA). The third-
party auditor utilizes ring signatures and a
keyed-hash-based message authentication code
for checking integrity. Data privacy and
identity privacy on
shared data are secured using private-key
encryption during the auditing process by a
public verifier. In the existing process, data
freshness is not proven. So, we propose an HMAC
mechanism to protect the metadata
secrecy, integrity, authentication on
shared data in the cloud storage. This also
supports random checking process by the
public verifiers instead of checking the
entire data on the cloud.
Our audit system ensures
data freshness through secrecy, integrity, and
authentication of metadata, and also requires
low computation, low communication, and
little extra storage for auditing the metadata.
IEEE 2015
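A small sketch of the proposed HMAC idea applied to metadata freshness, using only Python's standard library; the metadata fields (file id, version, timestamp) are invented for illustration:

```python
import hmac, hashlib, json, time

SECRET = b"owner-auditor shared key"         # illustrative key material

def tag_metadata(file_id: str, version: int) -> dict:
    """Owner signs (file id, version, timestamp) so the TPA can later verify
    that the cloud returns fresh, untampered metadata."""
    meta = {"file_id": file_id, "version": version, "ts": int(time.time())}
    payload = json.dumps(meta, sort_keys=True).encode()
    meta["mac"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return meta

def verify_metadata(meta: dict) -> bool:
    claimed = meta["mac"]
    payload = json.dumps({k: v for k, v in meta.items() if k != "mac"},
                         sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)   # constant-time compare

meta = tag_metadata("report.pdf", version=7)
assert verify_metadata(meta)
meta["version"] = 6                          # a rolled-back (stale) version...
assert not verify_metadata(meta)             # ...fails the freshness check
```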
TTA-DC-
C1524
Secure Data
Transmission using
Stegnography and
encryption technique
Transmission of important data such as text,
images, and video over the Internet is
increasing nowadays; hence, secure methods
for multimedia data are necessary.
Image encryption differs from the encryption
of other multimedia components because of
inherent properties such as higher data
capacity and high similarity between pixels.
Older encryption techniques such as AES,
DES, and RTS are not suitable for
highly secure data transmission over wireless
media. Thus, we combine chaos theory
and cryptography to form a
valuable technique for information security. In
the first stage, a user encrypts the original
input image using chaotic map theory. After
that data-hider compresses the LSB bits of the
encrypted image using a data-hiding key to
make space to accommodate some more data.
Nowadays, image encryption is often chaos-
based, owing to unique characteristics such as
correlation between neighboring pixels,
sensitivity to initial conditions, non-
periodicity, and control parameters. A
number of image encryption algorithms based
on chaotic maps have been implemented; some
of them are time-consuming or complex, and
some have a very small key space. In this paper
we implement a nonlinear differential chaos-
based encryption technique where, for the
first time, three differential chaotic maps are
used for position permutation and value
transformation. In the data-hiding
phase, data in binary form is
embedded into the encrypted image using the
least-significant-bit algorithm. We tabulate
correlation coefficient values for both the
horizontal and vertical positions for the cipher
and original images and compare the performance
of our method with some existing methods. We
also discuss different types of attacks, the key
sensitivity, and the key space of our proposed
approach. The given approach is simple, fast,
and accurate, and it has been applied as a
double algorithm in order to give the best
results in highly insecure and complex
environments. Each of these algorithms is
discussed one by one below.
IEEE 2015
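To illustrate the two stages the abstract describes (chaos-based encryption, then LSB data hiding), here is a toy single-map sketch; the paper uses three differential chaotic maps with position permutation and value transformation, while this uses one logistic map as an XOR keystream plus a basic LSB embed:

```python
def logistic_keystream(x0, r, n):
    """Logistic map x' = r*x*(1-x); tiny changes in (x0, r) yield a completely
    different byte stream, which gives the key sensitivity used here."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return out

def chaos_xor(pixels, x0=0.3141592, r=3.99):    # (x0, r) act as the secret key
    ks = logistic_keystream(x0, r, len(pixels))
    return [p ^ k for p, k in zip(pixels, ks)]

def embed_lsb(pixels, bits):
    """Hide one message bit in the least significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

image = [52, 200, 173, 89, 14, 230, 99, 180]    # toy 8-pixel grayscale image
cipher = chaos_xor(image)                       # stage 1: chaotic encryption
assert chaos_xor(cipher) == image               # XOR with the same key decrypts
stego = embed_lsb(cipher, [1, 0, 1, 1])         # stage 2: LSB data hiding
assert [p & 1 for p in stego[:4]] == [1, 0, 1, 1]   # hidden message recovered
```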
TTA-DC-
C1525
Smartphone instant
messenger by using
Google Cloud
Messaging
Two of the most important drivers of current
telecommunication markets are the
development of Rich Communication Services
(RCS) and cloud computing. The challenges of
delivering these new services on a cloud-based
architecture are not only on the technical side,
they also concern the definition of feasible
business models for all the involved agents and
the definition and negotiation of proper service
level agreements at different levels. This work
proposes to provide telecommunication
operators with cloud-based infrastructures
capable of offering customers innovative and
reliable rich communication services based on
their phone numbers that cannot be
replicated by the Internet competitors in terms
of flexibility, scalability or security. This
Obliquity as a Service model (MaaS) allows
telecommunication providers to maintain
relevance for their clients offering not only the
common communication services
(instant messaging, group communication and
chat, file sharing or enriched calls services) but
also a new kind of mobiquitous services
related to mobile marketing, smart places,
Internet of Things or health care, exploiting all
the competitive advantages associated with the
development of a vertical cloud in a dynamic
and heterogeneous ecosystem. In addition, the
infrastructure layer needed to support the new
proposed model is defined and a first prototype
is deployed and evaluated with two
real use cases.
IEEE 2015
TTA-DC-
C1526
Social
Recommendation with
Cross-Domain
Transferable
Knowledge
Recommender systems can suffer from data
sparsity and cold start issues.
However, social networks, which enable users
to build relationships and create different types
of items, present an unprecedented opportunity
to alleviate these issues. In this paper, we
represent a social network as a star-structured
hybrid graph centered on a social domain,
which connects with other item domains. With
this innovative representation,
useful knowledge from an
auxiliary domain can be transferred through
the social domain to a target domain. Various
factors of item transferability, including
popularity and behavioral consistency, are
determined. We propose a novel Hybrid
Random Walk (HRW) method, which
incorporates such factors, to
select transferable items in auxiliary domains,
bridge cross-domain knowledge with
the social domain, and accurately predict user-
item links in a target domain. Extensive
experiments on a real social dataset
demonstrate that HRW significantly
outperforms existing approaches.
IEEE 2015
TTA-DC-
C1527
Three-server swapping
for access
confidentiality
We propose an approach to
protect confidentiality of data and accesses to
them when data are stored and managed by
external providers, and hence not under direct
control of their owner. Our approach is based
on the use of distributed data allocation
among three independent servers and on a
dynamic re-allocation of data at every access.
Dynamic re-allocation is enforced
by swapping data involved in an access across
the servers in such a way that accessing a
given node implies re-allocating it to a
different server, thus destroying the ability of
servers to build knowledge by
observing accesses. The use of three servers
makes the result of the swapping operation
uncertain in the eyes of the servers, even in
the presence of collusion among them.
IEEE 2015
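As an illustration of the access-driven swapping idea (a sketch under our own simplifying assumptions, not the authors' protocol), the following Python snippet keeps a client-side map of node locations over three servers and moves every accessed node to a different server, swapping a randomly chosen decoy back so each server sees one departure and one arrival per access.

import random

class SwappingStore:
    def __init__(self, nodes):
        # Distribute nodes across servers 0, 1, 2.
        self.location = {n: i % 3 for i, n in enumerate(nodes)}

    def access(self, node):
        src = self.location[node]
        dst = random.choice([s for s in (0, 1, 2) if s != src])
        # Swap a decoy from the destination back to the source so the
        # observed traffic does not reveal which object was accessed.
        decoys = [n for n, s in self.location.items() if s == dst]
        if decoys:
            self.location[random.choice(decoys)] = src
        self.location[node] = dst   # the accessed node always moves
        return dst

store = SwappingStore(["a", "b", "c", "d", "e", "f"])
store.access("a")   # "a" now resides on a different server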
TTA-DC-
C1528
Trust and Compactness
in Social Network
Groups
Understanding the dynamics
behind group formation and evolution
in social networks is considered an
instrumental milestone to better describe how
individuals gather and form communities, how
they enjoy and share the platform contents,
how they are driven by their preferences/tastes,
and how their behaviors are influenced by
peers. In this context, the notion
of compactness of a social group is particularly
relevant. While the literature usually refers
to compactness as a measure to merely
IEEE 2015
determine how much members of a group are
similar among each other, we argue that the
mutual trustworthiness between the members
should be considered as an important factor in
defining such a term. In fact, trust has
profound effects on the dynamics
of group formation and evolution:
individuals are more likely to join and
stay in a group if they
can trust the other group members. In this paper,
we propose a quantitative measure
of group compactness that takes into account
both the similarity and the trustworthiness
among users, and we present an algorithm to
optimize such a measure. We provide
empirical results, obtained from the
real social networks EPINIONS and CIAO,
that compare our notion of compactness versus
the traditional notion of user similarity, clearly
proving the advantages of our approach.
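A minimal sketch of a trust-aware compactness score (our own illustrative formulation, not the paper's exact measure): it averages a weighted mix of pairwise similarity and pairwise trust over all member pairs; the weight alpha and the toy values are assumptions.

from itertools import combinations

def compactness(group, sim, trust, alpha=0.5):
    # Average of alpha*similarity + (1-alpha)*trust over member pairs.
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    total = sum(alpha * sim[u][v] + (1 - alpha) * trust[u][v]
                for u, v in pairs)
    return total / len(pairs)

sim = {"u": {"v": 0.8}, "v": {"u": 0.8}}
trust = {"u": {"v": 0.6}, "v": {"u": 0.6}}
print(compactness(["u", "v"], sim, trust))   # 0.5*0.8 + 0.5*0.6 = 0.7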
TTA-JC-
C1529
Public Integrity
Auditing for Shared
Dynamic Cloud Data
with Group User
Revocation
The advent of cloud computing has made
storage outsourcing a rising trend and has
made secure remote data auditing a hot
topic in the research literature. Recently, some
research has considered the problem of secure and
efficient public data integrity auditing for
shared dynamic data. However, these schemes are
still not secure against collusion between the
cloud storage server and
revoked group users during user revocation in
practical cloud storage systems. In this paper,
we identify the collusion attack in the existing
scheme and provide an
efficient public integrity auditing scheme with
secure group user revocation based on vector
commitment and verifier-
local revocation group signatures. We design a
concrete scheme based on our scheme
definition. Our scheme supports
public checking and
efficient user revocation, as well as some nice
properties such as confidentiality, efficiency,
countability, and traceability of
secure group user revocation. Finally, the
IEEE 2015
security and experimental analysis show that,
compared with its relevant schemes, our
scheme is both secure and efficient.
TTA-JC-
C1530
Audit-Free Cloud
Storage via Deniable
Attribute-based
Encryption
Cloud storage services have become
increasingly popular. Because of the
importance of privacy,
many cloud storage encryption schemes have
been proposed to protect data from those who
do not have access. All such schemes assumed
that cloud storage providers are safe and
cannot be hacked; however, in practice, some
authorities (i.e., coercers) may
force cloud storage providers to reveal user
secrets or confidential data on the cloud, thus
altogether
circumventing storage encryption schemes. In
this paper, we present our design for a
new cloud storage encryption scheme that
enables cloud storage providers to create
convincing fake user secrets to protect user
privacy. Since coercers cannot tell if obtained
secrets are true or not,
the cloud storage providers ensure that user
privacy is still securely protected.
IEEE 2015
TTA-JC-
C1531
CHARM - A Cost-
efficient Multi-cloud
Data Hosting Scheme
with High Availability
Nowadays, more and more enterprises and
organizations are hosting their data into
the cloud, in order to reduce the IT
maintenance cost and enhance
the data reliability. However, facing the
numerous cloud vendors as well as their
heterogeneous pricing policies, customers may
well be perplexed with which cloud(s) are
suitable for storing their data and
what hosting strategy is cheaper. The general
status quo is that customers usually put
their data into a single cloud (which is subject
to the vendor lock-in risk) and then simply
trust to luck. Based on comprehensive analysis
of various state-of-the-art cloud vendors, this
paper proposes a
novel data hosting scheme (named CHARM)
which integrates two key desired functions.
The first is selecting several suitable clouds
and an appropriate redundancy strategy to
IEEE 2015
store data with minimized monetary cost and
guaranteed availability. The second is
triggering a transition process to re-
distribute data according to the variations
of data access pattern and pricing of clouds.
We evaluate the performance
of CHARM using both trace-driven
simulations and prototype experiments. The
results show that compared with the major
existing schemes, CHARM not only saves
around 20 percent of monetary cost but also
exhibits sound adaptability to data and price
adjustments.
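The first CHARM function (cloud selection under an availability target) can be illustrated with a brute-force Python sketch; the prices, availability figures, replication-only redundancy, and the target value are all assumptions made for the example.

from itertools import combinations

clouds = {                      # price ($/GB/month), availability
    "A": (0.023, 0.999),
    "B": (0.020, 0.995),
    "C": (0.026, 0.9999),
    "D": (0.018, 0.990),
}

def cheapest_placement(target=0.9995, size_gb=100):
    best = None
    for r in range(1, len(clouds) + 1):
        for subset in combinations(clouds, r):
            fail = 1.0
            for c in subset:
                fail *= 1 - clouds[c][1]        # all replicas lost
            avail = 1 - fail
            cost = size_gb * sum(clouds[c][0] for c in subset)
            if avail >= target and (best is None or cost < best[0]):
                best = (cost, subset, avail)
    return best

print(cheapest_placement())   # cheapest cloud set meeting the target

A real placement engine would also weigh erasure coding against replication and fold in bandwidth and request pricing, which is where the scheme's cost model does its work.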
TTA-JC-
C1532
Secure Auditing and
Deduplicating Data in
Cloud
As the cloud computing technology develops
during the last decade,
outsourcing data to cloud service for storage
becomes an attractive trend, as it spares the
effort of heavy data maintenance and
management. Nevertheless, since the
outsourced cloud storage is not fully
trustworthy, it raises security concerns on how
to realize data deduplication in cloud while
achieving integrity auditing. In this work, we
study the problem of
integrity auditing and secure deduplication
on cloud data. Specifically, aiming at
achieving both data integrity and deduplication
in cloud, we propose two secure systems,
namely SecCloud and SecCloud+. SecCloud
introduces an auditing entity that maintains
a MapReduce cloud, which
helps clients generate data tags before
uploading as well as audit the integrity
of data having been stored in cloud. Compared
with previous work, the computation by user in
SecCloud is greatly reduced during the file
uploading and auditing phases. SecCloud+ is
motivated by the fact that customers always
want to encrypt their data before uploading,
and it enables
integrity auditing and secure deduplication on
encrypted data.
IEEE 2015
TTA-JC-
C1533
A Profit Maximization
Scheme with
Guaranteed Quality of
As an effective and efficient way to
provide computing resources and services to
IEEE 2015
Service in Cloud
Computing
customers on demand, cloud computing has
become more and more popular.
From cloud service providers'
perspective, profit is one of the most important
considerations, and it is mainly determined by
the configuration of a cloud service platform
under given market demand. However, a single
long-term renting scheme is usually adopted to
configure a cloud platform, which
cannot guarantee the service quality but leads
to serious resource waste. In this paper, a
double resource renting scheme combining
short-term and long-term renting is first
designed to address these issues. This double
renting scheme can
effectively guarantee the quality of service of
all requests and reduce the resource waste
greatly. Secondly, a service system is
considered as an M/M/m+D queuing model
and the performance indicators that affect
the profit of our double renting scheme are
analyzed, e.g., the average charge, the ratio of
requests that need temporary servers, and so
forth. Thirdly, a profit maximization problem
is formulated for the double
renting scheme and the optimized
configuration of a cloud platform is obtained
by solving the profit maximization problem.
Finally, a series of calculations are conducted
to compare the profit of our
proposed scheme with that of the single renting
scheme. The results show that our scheme can
not only guarantee the service quality of all
requests, but also obtain more profit than the
latter.
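The queueing indicators that drive the profit analysis can be sketched in Python for the plain M/M/m case (the +D impatience bound of the paper's model is omitted here for brevity; the arrival and service rates are made-up inputs).

from math import factorial

def erlang_c(m, lam, mu):
    # Probability that an arriving request finds all m servers busy.
    a = lam / mu                     # offered load in Erlangs
    rho = a / m
    assert rho < 1, "system must be stable"
    tail = (a ** m / factorial(m)) / (1 - rho)
    return tail / (sum(a ** k / factorial(k) for k in range(m)) + tail)

def mean_wait(m, lam, mu):
    # Mean queueing delay before service starts (M/M/m).
    return erlang_c(m, lam, mu) / (m * mu - lam)

print(erlang_c(10, 8.0, 1.0))    # fraction of requests that must wait
print(mean_wait(10, 8.0, 1.0))   # average wait, in service-time units

In the double renting scheme, a waiting probability like this is what determines the ratio of requests that must be served by temporarily rented servers.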
TTA-JC-
C1534
Online Resource
Scheduling under
Concave Pricing for
Cloud Computing
With the booming cloud computing industry,
computational resources are readily and
elastically available to the customers. In order
to attract customers with various demands,
most Infrastructure-as-a-service
(IaaS) cloud service providers offer
several pricing strategies such as pay as you
go, pay less per unit when you use more (so
called volume discount), and pay even less
IEEE 2015
when you reserve. The diverse pricing schemes
among different IaaS service providers or even
in the same provider form a complex economic
landscape that nurtures the market
of cloud brokers. By strategically scheduling
multiple customers’ resource requests,
a cloud broker can fully take advantage of the
discounts offered by cloud service providers.
In this paper, we focus on how a broker can
help a group of customers to fully utilize the
volume discount pricing strategy offered
by cloud service providers through cost-
efficient online resource scheduling. We
present a randomized online stack-
centric scheduling algorithm (ROSA) and
theoretically prove the lower bound of its
competitive ratio. Three special cases of the
offline concave cost scheduling problem and
the corresponding optimal algorithms are
introduced. Our simulation shows that ROSA
achieves a competitive ratio close to the
theoretical lower bound under the special
cases. Trace-driven simulation using Google
cluster data demonstrates that ROSA is
superior to the
conventional online scheduling algorithms in
terms of cost saving.
TTA-JC-
C1535
The Value of
Cooperation -
Minimizing User Costs
in Multi-broker Mobile
Cloud Computing
Networks
We study the problem
of user cost minimization
in mobile cloud computing (MCC) networks.
We consider a MCC model where multiple
brokers assign cloud resources to mobile users.
The model is characterized by a
heterogeneous cloud architecture (which
includes a public cloud and a cloudlet) and by
the heterogeneous pricing strategies
of cloud service providers. In this setting, we
investigate two classes of cloud reservation
strategies, i.e., a competitive strategy, and a
compete-then-cooperate strategy as a
performance bound. We first study a purely
competitive scenario where brokers compete to
reserve computing resources from remote
public clouds (which are affected by long
delays) and from local cloudlets (which have
IEEE 2015
limited computational resources but short
delays). We provide theoretical results
demonstrating the existence of disagreement
points (i.e., the equilibrium reservation
strategy that no broker has incentive to deviate
unilaterally from) and convergence of the best
response strategies of the brokers to
disagreement points. We then consider the
scenario in which brokers agree to cooperate in
exchange for a lower average cost of
resources. We formulate a cooperative
problem where the objective is to minimize the
total average price of all brokers, under the
constraint that no broker should pay a price
higher than the disagreement price (i.e., the
competitive price). We design a new globally
optimal solution algorithm to solve the
resulting non-convex cooperative problem,
based on a combination of the branch and
bound framework and of advanced convex
relaxation techniques. The resulting optimal
solution provides a lower bound on the
achievable user cost without complete
collusion among brokers. Compared with pure
competition, we found that i) noticeable
cooperative gains can be achieved over pure
competition in markets with a few brokers
only, and ii) the cooperative gain is only
marginal in crowded markets, i.e., with a high
number of brokers, hence there is no clear
incentive for brokers to cooperate.
TTA-JC-
C1536
System of Systems for
Quality-of-Service
Observation and
Response in Cloud
Computing
Environments
As military, academic, and
commercial computing systems evolve from
autonomous entities that deliver
computing products into network centric
enterprise systems that deliver computing as
a service, opportunities emerge to
consolidate computing resources, software,
and information through cloud computing.
Along with these opportunities come
challenges, particularly to service providers
and operations centers that struggle to monitor
and manage quality of service (QoS) for these
services in order to meet
customer service commitments. Traditional
IEEE 2015
approaches fall short in addressing these
challenges because they examine QoS from a
limited perspective rather than from a system-
of-systems (SoS) perspective applicable to a
net-centric enterprise system in which any user
from any location can
share computing resources at any time. This
paper presents a SoS approach to enable QoS
monitoring, management, and response for
enterprise systems that deliver computing as
a service through
a cloud computing environment. A concrete
example is provided for application of this new
SoS approach to a real-world scenario (viz.,
distributed denial of service). Simulated results
confirm the efficacy of the approach.
TTA-JC-
C1537
A Computational
Dynamic Trust Model
for User Authorization
Development of authorization mechanisms for
secure information access by a large
community of users in an open environment is
an important problem in the ever-growing
Internet world. In this paper we propose
a computational dynamic trust model for user
authorization, rooted in findings from social
science. Unlike most
existing computational trust models,
this model distinguishes trusting belief in
integrity from that in competence in different
contexts and accounts for subjectivity in the
evaluation of a particular trustee by different
trusters. Simulation studies were conducted to
compare the performance of the proposed
integrity belief model with
other trust models from the literature for
different user behavior patterns. Experiments
show that the proposed model achieves higher
performance than other models especially in
predicting the behavior of unstable users.
IEEE 2015
TTA-JC-
C1538
Generic and Efficient
Constructions of
Attribute-Based
Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) provides a
mechanism for complex access control over
encrypted data. However in most ABE
systems, the ciphertext size and
the decryption overhead, which grow with the
complexity of the access policy, are becoming
critical barriers in applications running on
IEEE 2015
resource-limited
devices. Outsourcing decryption of ABE
ciphertexts to a powerful third party is a
practical way to solve this problem.
Since the third party is usually believed to be
untrusted, the security requirements of ABE
with outsourced decryption should include
privacy and verifiability. Namely, any
adversary including the third party should
learn nothing about the encrypted message,
and the correctness of
the outsourced decryption is supposed to be
verified efficiently. We propose generic
constructions of CPA-secure and RCCA-
secure ABE systems
with verifiable outsourced decryption from
CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-
secure construction in the standard model and
then show an implementation of this
instantiation. The experimental results show
that, compared with the existing scheme, our
CPA-secure construction has more compact
ciphertext and less computational costs.
Moreover, the techniques involved in the
RCCA-secure construction can be applied in
generally constructing CCA-secure ABE,
which we believe to be of independent interest.
TTA-JC-
C1539
Leveraging Data
Deduplication to
Improve the
Performance of Primary
Storage Systems in the
Cloud
With the explosive growth in data volume, the
I/O bottleneck has become an increasingly
daunting challenge for big data analytics in
the Cloud. Recent studies have shown that
moderate to high data redundancy clearly
exists in primary storage systems in the Cloud.
Our experimental studies reveal
that data redundancy exhibits a much higher
level of intensity on the I/O path than that on
disks due to relatively high temporal access
locality associated with small I/O
requests to redundant data. Moreover, directly
applying data deduplication to primary storage
systems in the Cloud will likely cause space
contention in memory and data fragmentation
on disks. Based on these observations, we
propose a performance-oriented
IEEE 2015
I/O deduplication, called POD, rather than a
capacity-oriented I/O deduplication,
exemplified by iDedup, to improve the
I/O performance of primary storage systems in
the Cloud without sacrificing capacity savings
of the latter. POD takes a two-pronged
approach to improving the performance of
primary storage systems and
minimizing performance overhead
of deduplication, namely, a request-based
selective deduplication technique, called
Select-Dedupe, to alleviate the data
fragmentation and an adaptive memory
management scheme, called iCache, to ease
the memory contention between the bursty
read traffic and the bursty write traffic. We
have implemented a prototype of POD as a
module in the Linux operating system. The
experiments conducted on our lightweight
prototype implementation of POD show that
POD significantly outperforms iDedup in the
I/O performance measure by up to 87.9% with
an average of 58.8%. Moreover, our evaluation
results also show that POD achieves
comparable or better capacity savings than
iDedup.
TTA-JC-
C1540
Enabling Fine-grained
Multi-keyword Search
Supporting Classified
Sub-dictionaries over
Encrypted Cloud Data
Using cloud computing, individuals can store
their data on remote servers and
allow data access to public users through
the cloud servers. As the outsourced data are
likely to contain sensitive privacy information,
they are typically encrypted before uploaded to
the cloud. This, however, significantly limits
the usability of outsourced data due to the
difficulty of searching over the encrypted data.
In this paper, we address this issue by
developing the fine-grained multi-
keyword search schemes over encrypted
cloud data. Our original contributions are
three-fold. First, we introduce the relevance
scores and preference factors upon keywords
which enable the precise keyword search and
personalized user experience. Second, we
develop a practical and very efficient multi-
keyword search scheme. The proposed scheme
IEEE 2015
can support complicated logic search, i.e., the
mixed “AND”, “OR” and “NO” operations of
keywords. Third, we further employ
the classified sub-dictionaries technique to
achieve better efficiency on index building,
trapdoor generating and query. Lastly, we
analyze the security of the proposed schemes
in terms of confidentiality of documents,
privacy protection of index and trapdoor, and
unlinkability of trapdoor. Through extensive
experiments using the real-world dataset, we
validate the performance of the proposed
schemes. Both the security analysis and
experimental results demonstrate that the
proposed schemes can achieve the same
security level compared to the existing ones
and better performance in terms of
functionality, query complexity and efficiency.
TTA-JC-
C1541
On the Security of Data
Access Control for
Multiauthority Cloud
Storage Systems
Data access control has become a
challenging issue in cloud storage systems.
Some techniques have been proposed to
achieve the secure data access control in a
semitrusted cloud storage system. Recently,
K. Yang et al. proposed a
basic data access control scheme
for multiauthority cloud storage system (DAC-
MACS) and an
extensive data access control scheme (EDAC-
MACS). They claimed that the DAC-MACS
could achieve efficient decryption and
immediate revocation and the EDAC-MACS
could also achieve these goals even when non-
revoked users reveal their Key Update Keys to
the revoked user. However, through our
cryptanalysis, the revocation security of both
schemes cannot be guaranteed. In this paper,
we first give two attacks on the two schemes.
By the first attack, the revoked user can
eavesdrop to obtain other users’ Key Update
Keys to update its Secret Key, and then it can
obtain proper Token to decrypt any secret
information as a non-revoked user. In addition,
by the second attack, the revoked user can
intercept Ciphertext Update Key to retrieve its
ability to decrypt any secret information as a
IEEE 2015
non-revoked user. Secondly, we propose a new
extensive DAC-MACS scheme (NEDAC-
MACS) to withstand the above two attacks so
as to guarantee more secure attribute
revocation. Then, formal cryptanalysis of
NEDAC-MACS is presented to prove
the security goals of the scheme. Finally, the
performance comparison among NEDAC-
MACS and related schemes is given to
demonstrate that the performance of NEDAC-
MACS is superior to that of DACC, and
roughly the same as that of DAC-MACS.
TTA-JC-
C1542
Verifiable Auditing for
Outsourced Database
in Cloud Computing
The notion of database outsourcing enables the
data owner to delegate
the database management to a cloud service
provider (CSP) that provides
various database services to different users.
Recently, plenty of research work has been
done on the primitive of outsourced database.
However, it seems that no existing solutions
can perfectly support the properties of both
correctness and completeness for the query
results, especially in the case when the
dishonest CSP intentionally returns an empty
set for the query request of the user. In this
paper, we propose a
new verifiable auditing scheme for outsourced
database, which can simultaneously achieve
the correctness and completeness of search
results even if the dishonest CSP purposely
returns an empty set. Furthermore, we can
prove that our construction can achieve the
desired security properties even in the
encrypted outsourced database. Besides, the
proposed scheme can be extended to support
the dynamic database setting by incorporating
the notion of verifiable database with updates.
IEEE 2015
TTA-JC-
C1543
A Cost-Effective
Deadline-Constrained
Dynamic Scheduling
Algorithm for Scientific
Workflows in a Cloud
Environment
Cloud Computing, a distributed computing
paradigm, enables delivery of IT resources
over the Internet and follows the pay-as-you-
go billing model. Workflow scheduling is one
of the most challenging problems
in Cloud computing.
Although workflow scheduling on distributed
IEEE 2015
systems like Grids and Clusters has been
extensively studied, these solutions
are not viable for a Cloud environment,
because a Cloud environment differs from
other distributed environments in two major
ways: on-demand resource provisioning and
pay-as-you-go pricing model. Thus, to achieve
the true benefits of workflow orchestration
on Cloud resources, novel approaches that
can capitalize on the advantages and address
the challenges specific to
a Cloud environment need to be developed.
This work proposes a dynamic cost-
effective deadline-
constrained heuristic algorithm for scheduling
a scientific workflow in a public Cloud. The
proposed technique aims to exploit the
advantages offered by Cloud computing while
taking into account the virtual machine
performance variability and instance
acquisition delay to identify a just-in-
time schedule of
a deadline constrained scientific workflow at
lesser costs. Performance evaluation on some
well-known scientific workflows exhibit that
the proposed algorithm delivers better
performance in comparison to the current
state-of-the-art heuristics.
TTA-JC-
C1544
A Hybrid Cloud
Approach for Secure
Authorized
Deduplication
Data deduplication is one of the important data
compression techniques for eliminating
duplicate copies of repeating data, and has
been widely used in cloud storage to reduce
the amount of storage space and save
bandwidth. To protect the confidentiality of
sensitive data while supporting deduplication,
the convergent encryption technique has been
proposed to encrypt the data before
outsourcing. To better protect data security,
this paper makes the first attempt to formally
address the problem of authorized
data deduplication. Different from
traditional deduplication systems, the
differential privileges of users are further
considered in duplicate check besides the data
itself. We also present several new
IEEE 2015
deduplication constructions
supporting authorized duplicate check in
a hybrid cloud architecture. Security analysis
demonstrates that our scheme is secure in
terms of the definitions specified in the
proposed security model. As a proof of
concept, we implement a prototype of our
proposed authorized duplicate check scheme
and conduct test bed experiments using our
prototype. We show that our
proposed authorized duplicate check scheme
incurs minimal overhead compared to normal
operations.
TTA-JC-
C1545
A Secure and Dynamic
Multi-keyword Ranked
Search Scheme over
Encrypted CloudData
Due to the increasing popularity of cloud
computing, more and more data owners are
motivated to outsource their data to cloud
servers for great convenience and reduced cost
in data management. However, sensitive data
should be encrypted before outsourcing for
privacy requirements, which obsoletes data
utilization like keyword-based document
retrieval. In this paper, we present a secure
multi-
keyword ranked search scheme over encrypted
cloud data, which simultaneously supports
dynamic update operations like deletion and
insertion of documents. Specifically, the vector
space model and the widely-used TF-IDF
model are combined in the index construction
and query generation. We construct a special
tree-based index structure and propose a
“Greedy Depth-first Search” algorithm to
provide efficient multi-keyword ranked search.
The secure kNN algorithm is utilized
to encrypt the index and query vectors, and
meanwhile ensure accurate relevance score
calculation between encrypted index and query
vectors. In order to resist statistical attacks,
phantom terms are added to the index vector
for blinding search results. Due to the use of
our special tree-based index structure, the
proposed scheme can achieve sublinear
search time and deal with the deletion
and insertion of documents flexibly. Extensive
experiments are conducted to demonstrate the
IEEE 2015
efficiency of the proposed scheme.
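The plaintext core of the ranking (vector space model with TF-IDF weights) can be sketched as follows; the documents and the scoring details are illustrative, and the secure kNN encryption of index and query vectors is deliberately left out.

from math import log
from collections import Counter

docs = {"d1": "cloud storage security audit",
        "d2": "keyword search over encrypted cloud data",
        "d3": "dynamic update of documents"}

def ranked_search(query_terms):
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.split()))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        # Sum of tf * idf over the queried keywords present in the doc.
        scores[doc_id] = sum(tf[t] * log(n / df[t])
                             for t in query_terms if t in tf)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(ranked_search(["cloud", "search"]))   # d2 ranks first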
TTA-JC-
C1546
A Universal Fairness
Evaluation Framework
for Resource Allocation
in Cloud Computing
In cloud computing, fairness is one of the most
significant indicators to
evaluate resource allocation algorithms, which
reveals whether each user is allocated as much
as that of all other users having the same
bottleneck. However, how fair
an allocation algorithm is remains an urgent
issue. In this paper, we propose
Dynamic Evaluation Framework for Fairness (
DEFF), a framework to evaluate the fairness of
a resource allocation algorithm. In
our framework, two sub-models, Dynamic
Demand Model (DDM) and Dynamic Node
Model (DNM), are proposed to describe the
dynamic characteristics of resource demand
and the computing node number
under cloud computing environment.
Combining Fairness on Dominant Shares and
the two sub-models above, we finally obtain
DEFF. In our experiment, we adopt several
typical resource allocation algorithms to prove
the effectiveness on fairness evaluation by
using the DEFF framework.
IEEE 2015
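A fairness indicator over dominant shares can be sketched in a few lines of Python (our illustration uses Jain's index over dominant shares; DEFF's dynamic demand and node models are not reproduced, and the capacities and allocations are toy values).

def dominant_share(alloc, capacity):
    # A user's dominant share is its largest per-resource fraction.
    return max(alloc[r] / capacity[r] for r in capacity)

def jain_index(shares):
    # 1.0 means perfectly equal shares; lower means less fair.
    n = len(shares)
    return sum(shares) ** 2 / (n * sum(s * s for s in shares))

capacity = {"cpu": 100, "mem": 200}
allocs = [{"cpu": 20, "mem": 10}, {"cpu": 10, "mem": 60}]
shares = [dominant_share(a, capacity) for a in allocs]
print(shares, jain_index(shares))   # [0.2, 0.3], about 0.96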
TTA-JC-
C1547
Aggressive Resource
Provisioning for
Ensuring QoS in
Virtualized Environments
Elasticity has now become the elemental
feature of cloud computing as it enables the
ability to dynamically add or remove virtual
machine instances when workload changes.
However, effective
virtualized resource management is still one of
the most challenging tasks. When the workload
of a service increases rapidly, existing
approaches cannot respond to the growing
performance requirement efficiently because
of either inaccuracy of adaptation decisions or
the slow process of adjustments, both of which
may result in
insufficient resource provisioning. As a
consequence, the Quality of Service (QoS) of
the hosted applications may degrade and the
Service Level Objective (SLO) will be thus
violated. In this paper, we introduce SPRNT, a
novel resource management framework,
to ensure high-level QoS in the cloud
IEEE 2015
computing system. SPRNT utilizes
an aggressive resource provisioning strategy
which encourages SPRNT to substantially
increase the resource allocation in each
adaptation cycle when workload increases.
This strategy first provisions resources which
are possibly more than actual demands, and
then reduces the over-provisioned resources if
needed. By applying the aggressive strategy,
SPRNT can satisfy the increasing performance
requirement in the first place so that
the QoS can be kept at a high level. The
experimental results show that SPRNT
achieves up to 7.7× speedup in adaptation
time, compared with existing efforts. By
enabling quick adaptation, SPRNT limits the
SLO violation rate to at most 1.3 percent even when
dealing with rapidly increasing workload.
TTA-JC-
C1548
An Intelligent Economic
Approach for Dynamic
Resource Allocation in
Cloud Services
With Inter-Cloud, distributed cloud and
open cloud exchange (OCX) emerging, a
comprehensive resource allocation approach is
fundamental to highly
competitive cloud market. Oriented to
infrastructure as a service (IaaS),
an intelligent economic approach for dynamic
resource allocation(IEDA) is proposed with the
improved combinatorial double auction
protocol devised to enable various kinds
of resources traded among multiple consumers
and multiple providers and, at the same time, to
enable task partitioning among multiple providers. To
make bidding and asking reasonable in each
round of the auction and determine eligible
transaction relationship among providers and
consumers, a price formation mechanism is
proposed, which consists of a back
propagation neural network (BPNN) based
price prediction algorithm and a price
matching algorithm. A reputation system is
proposed and integrated to exclude dishonest
participants from the cloud market. The winner
determination problem (WDP) is solved by the
improved paddy field algorithm (PFA).
Simulation results have shown that IEDA can
not only help maximize market surplus and
IEEE 2015
surplus strength but also encourage
participants to be honest.
TTA-JC-
C1549
ANGEL - Agent-Based
Scheduling for Real-
Time Tasks in
Virtualized Clouds
The success of cloud computing makes an
increasing number of real-time applications
such as signal processing and weather
forecasting run in the cloud.
Meanwhile, scheduling for real-time tasks is
playing an essential role for a cloud provider to
maintain its quality of service and enhance the
system’s performance. In this paper, we devise
a novel agent-based scheduling mechanism
in cloud computing environment to
allocate real-time tasks and dynamically
provision resources. In contrast to traditional
contract net protocols, we employ a
bidirectional announcement-bidding
mechanism and the collaborative process
consists of three phases, i.e., basic matching
phase, forward announcement-bidding phase
and backward announcement-bidding phase.
Moreover, the elasticity is sufficiently
considered while scheduling by dynamically
adding virtual machines to improve
schedulability. Furthermore, we design
calculation rules of the bidding values in both
forward and backward announcement-bidding
phases and two heuristics for selecting
contractors. On the basis of the bidirectional
announcement-bidding mechanism, we
propose an agent-based dynamic scheduling
algorithm named ANGEL for real-time,
independent and aperiodic tasks in clouds.
Extensive experiments are conducted on
CloudSim platform by injecting random
synthetic workloads and the workloads from
the last version of the Google cloud trace logs
to evaluate the performance of our ANGEL.
The experimental results indicate
that ANGEL can efficiently solve the real-
time task scheduling problem
in virtualized clouds.
IEEE 2015
TTA-JC-
C1550
Attribute-based Access
Control with Constant-
size Ciphertext in Cloud
Computing
With the popularity of cloud computing, there
have been increasing concerns about its
security and privacy. Since
IEEE 2015
the cloud computing environment is distributed
and untrusted, data owners have to encrypt
outsourced data to enforce confidentiality.
Therefore, how to achieve practicable access
control of encrypted data in an untrusted
environment is an urgent issue that needs to be
solved. Attribute-Based Encryption (ABE) is a
promising scheme suitable
for access control in cloud storage systems.
This paper proposes a hierarchical attribute-
based access control scheme with constant-
size ciphertext. The scheme is efficient because
the length of the ciphertext and the number of
bilinear pairing evaluations are fixed to a
constant. Its computation cost in encryption and
decryption algorithms is low. Moreover, the
hierarchical authorization structure of our
scheme reduces the burden and risk of a single
authority scenario. We prove the scheme is of
CCA2 security under the decisional q-Bilinear
Diffie-Hellman Exponent assumption. In
addition, we implement our scheme and
analyse its performance. The analysis results
show the proposed scheme is efficient,
scalable, and fine-grained in dealing
with access control for outsourced data
in cloud computing.
TTA-JC-
C1551
Automatic Memory
Control of Multiple
Virtual Machines on a
Consolidated Server
Through
virtualization, multiple virtual machines can
coexist and operate on one physical machine.
When virtual machines (VMs) compete
for memory, the performances of applications
deteriorate, especially those of memory-
intensive applications. In this study, we aim to
optimize memory control techniques using a
balloon driver for server consolidation. Our
contribution is three-fold: (1) We design and
implement an automatic control system
for memory based on a Xen balloon driver. To
avoid interference with VM monitor operation,
our system works in user mode; therefore, the
system is easily applied in practice. (2) We
design an adaptive global-scheduling
algorithm to regulate memory. This algorithm
is based on a dynamic baseline, which can
IEEE 2015
adjust memory allocation according to the
memory used by the VMs. (3) We evaluate our
optimized solution in a real environment with
10 VMs and well-known benchmarks (DaCapo
and Phoronix Test Suites). Experiments
confirm that our system can improve the
performance of memory-intensive and disk-
intensive applications by up to 500% and
300%, respectively. This toolkit has been
released for free download as GNU General
Public License v3 software.
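The baseline-driven reallocation can be illustrated with a small Python sketch (our own simplification: each VM gets its measured usage plus a usage-proportional slice of the spare budget; a real system would drive a balloon driver instead of returning numbers, and the sizes here are invented).

def rebalance(usage_mb, host_total_mb, reserve=0.1):
    # Keep a host reserve, then grant usage plus proportional headroom.
    budget = host_total_mb * (1 - reserve)
    used = sum(usage_mb.values())
    spare = max(budget - used, 0)
    return {vm: u + spare * (u / used if used else 1 / len(usage_mb))
            for vm, u in usage_mb.items()}

print(rebalance({"vm1": 800, "vm2": 1600, "vm3": 400}, 4096))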
TTA-JC-
C1552
Circuit Ciphertext-
policy Attribute-based
Hybrid Encryption with
Verifiable Delegation in
cloud computing
In the cloud, for achieving access control and
keeping data confidential, the data owners
could adopt attribute-based encryption to
encrypt the stored data. Users with
limited computing power are however more
likely to delegate most of the decryption
task to the cloud servers to reduce
the computing cost. As a result, attribute-
based encryption with delegation emerges.
Still, there are caveats and questions remaining
in the previous relevant works. For instance,
during the delegation, the cloud servers could
tamper or replace the delegated ciphertext and
return a forged computing result with
malicious intent. They may also cheat
eligible users by telling them that they are
ineligible for the purpose of cost saving.
Furthermore, during the encryption, the access
policies may not be flexible enough as well.
Since a policy over general circuits enables
the strongest form of access control, a
construction for realizing circuit ciphertext-
policy attribute-based hybrid encryption with
verifiable delegation has been considered in
our work. In such a
system, combined with verifiable computation
and the encrypt-then-MAC mechanism, the data
confidentiality, the fine-grained access control
and the correctness of the
delegated computing results are well
guaranteed at the same time. Besides, our
scheme achieves security against chosen-
plaintext attacks under the k-multilinear
Decisional Diffie-Hellman assumption.
IEEE 2015
Moreover, an extensive simulation campaign
confirms the feasibility and efficiency of the
proposed solution.
TTA-JC-
C1553
CloudArmor -
Supporting Reputation-
based Trust
Management for Cloud
Services
Trust management is one of the most
challenging issues for the adoption and growth
of cloud computing. The highly dynamic,
distributed, and non-transparent nature
of cloud services introduces several
challenging issues such as privacy, security,
and availability. Preserving consumers’
privacy is not an easy task due to the sensitive
information involved in the interactions
between consumers and
the trust management service.
Protecting cloud services against their
malicious users (e.g., such users might give
misleading feedback to disadvantage a
particular cloud service) is a difficult problem.
Guaranteeing the availability of
the trust management service is another
significant challenge because of the dynamic
nature of cloud environments. In this article,
we describe the design and implementation
of CloudArmor, a reputation-
based trust management framework that
provides a set of functionalities to
deliver Trust as a Service (TaaS), which
includes i) a novel protocol to prove the
credibility of trust feedbacks and preserve
users’ privacy, ii) an adaptive and robust
credibility model for measuring the credibility
of trust feedbacks to
protect cloud services from malicious users
and to compare the trustworthiness
of cloud services, and iii) an availability model
to manage the availability of the decentralized
implementation of
the trust management service. The feasibility
and benefits of our approach have been
validated by a prototype and experimental
studies using a collection of real-world
trust feedbacks on cloud services.
IEEE 2015
TTA-JC-
C1554
Cost-Effective
Authentic and
Anonymous Data
Data sharing has never been easier with the
advances of cloud computing, and an accurate
IEEE 2015
Sharing with Forward
Security
analysis on the shared data provides an array
of benefits to both the society and
individuals. Data sharing with a large number
of participants must take into account several
issues, including efficiency, data integrity and
privacy of data owner. Ring signature is a
promising candidate to construct
an anonymous and
authentic data sharing system. It allows
a data owner to anonymously authenticate
his data which can be put into the cloud for
storage or analysis purpose. Yet the costly
certificate verification in the traditional public
key infrastructure (PKI) setting becomes a
bottleneck for this solution to be scalable.
Identity-based (ID-based) ring signature,
which eliminates the process of certificate
verification, can be used instead. In this paper,
we further enhance the security of ID-based
ring signature by providing forward security: If
a secret key of any user has been
compromised, all previously generated
signatures that include this user still remain
valid. This property is especially important to
any large scale data sharing system, as it is
impossible to ask all data owners to
reauthenticate their data even if a secret key of
one single user has been compromised. We
provide a concrete and efficient instantiation of
our scheme, prove its security and provide an
implementation to show its practicality.
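The forward-security property can be illustrated, independently of the ring-signature math, by a hash-chain key-evolution sketch in Python (the period structure, the HMAC signing stand-in, and the names are assumptions; the one-wayness of the update is what keeps old signatures valid after a compromise).

import hashlib, hmac

def evolve(key: bytes) -> bytes:
    # One-way update: the old key cannot be recovered from the new one.
    return hashlib.sha256(b"evolve" + key).digest()

def sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

k0 = b"period-0 secret"
sig0 = sign(k0, b"data signed in period 0")
k1 = evolve(k0)          # move to period 1, then erase k0
# An attacker who steals k1 cannot compute k0, so sig0 stays trustworthy.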
TTA-JC-
C1555
DaSCE - Data Security
for Cloud Environment
with Semi-Trusted
Third Party
Off-site data storage is an application
of cloud that relieves the customers from
focusing on data storage system. However,
outsourcing data to a third-party administrative
control entails serious
security concerns. Data leakage may occur due
to attacks by other users and machines in
the cloud. Wholesale of data by the cloud service
provider is yet another problem faced in
the cloud environment. Consequently, a high
level of security measures is required. In this
paper, we
propose Data Security for Cloud Environment
with Semi-Trusted Third Party (DaSCE),
IEEE 2015
a data security system that provides (a) key
management (b) access control, and (c) file
assured deletion. The DaSCE utilizes Shamir’s
(k, n) threshold scheme to manage the keys,
where k out of n shares are required to
generate the key. We use multiple key
managers, each hosting one share of the key.
Multiple key managers avoid a single point of
failure for the cryptographic keys. We (a)
implement a working prototype of DaSCE and
evaluate its performance based on the time
consumed during various operations, (b)
formally model and analyze the working
of DaSCE using High Level Petri nets
(HLPN), and (c) verify the working of
DaSCE using Satisfiability Modulo Theories
Library (SMT-Lib) and Z3 solver. The results
reveal that DaSCE can be effectively used
for security of outsourced data by employing
key management, access control, and file
assured deletion.
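Shamir's (k, n) threshold sharing, the core of DaSCE's key management, works by evaluating a random degree-(k-1) polynomial whose constant term is the secret; any k points recover it by Lagrange interpolation. A compact Python sketch follows (the small prime and integer secret are simplifications; production code needs a cryptographically sized field and secure randomness).

import random

P = 2**61 - 1                     # prime field modulus (illustrative)

def split(secret, k, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert combine(shares[:3]) == 123456789   # any 3 of the 5 shares suffice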
TTA-JC-
C1556
Discover the Expert -
Context-Adaptive
Expert Selection for
Medical Diagnosis
In this paper, we propose
an expert selection system that learns online
the best expert to assign to each patient
depending on the context of the patient. In
general, the context can include an enormous
number and variety of information related to
the patient's health condition, age, gender,
previous drug doses, and so forth, but the most
relevant information is embedded in only a few
contexts. If these most relevant contexts were
known in advance, learning would be
relatively simple but they are not. Moreover,
the relevant contexts may be different for
different health conditions. To address these
challenges, we develop a new class of
algorithms aimed at discovering the most
relevant contexts and the best clinic
and expert to use to make a diagnosis given a
patient's contexts. We prove that as the number
of patients grows, the proposed context-
adaptive algorithm will discover the
optimal expert to select for patients with a
specific context. Moreover, the algorithm also
provides confidence bounds on the diagnostic
IEEE 2015
accuracy of the expert it selects, which can be
considered by the primary care physician
before making the final decision. While our
algorithm is general and can be applied in
numerous medical scenarios, we illustrate its
functionality and performance by applying it to
a real-world breast cancer diagnosis data set.
Finally, while the application we consider in
this paper is medical diagnosis, our proposed
algorithm can be applied in other environments
where expertise needs to be discovered.
TTA-JC-
C1557
Distributed denial of
service attacks in
software-defined
networking with cloud
computing
Although software-defined networking (SDN)
brings numerous benefits by decoupling the
control plane from the data plane, there is a
contradictory relationship between SDN
and distributed denial-of-
service (DDoS) attacks. On one hand, the
capabilities of SDN make it easy to detect and
to react to DDoS attacks. On the other hand,
the separation of the control plane from the
data plane of SDN introduces new attacks.
Consequently, SDN itself may be a target of
DDoS attacks. In this paper, we first discuss
the new trends and characteristics of
DDoS attacks in cloud computing environments.
We show that SDN brings us a new chance
to defeat
DDoS attacks in cloud computing environments,
and we summarize good features of SDN in
defeating DDoS attacks. Then we review the
studies about launching DDoS attacks on SDN
and the methods against DDoS attacks in SDN.
In addition, we discuss a number of challenges
that need to be addressed to mitigate DDoS
attacks in SDN with cloud computing. This
work can help understand how to make full use
of SDN's advantages to defeat
DDoS attacks in cloud computing environments
and how to prevent SDN itself from
becoming a victim of DDoS attacks.
IEEE 2015
TTA-JC-
C1558
Mathematical
Programming Approach
for Revenue
Maximization in Cloud
Federations
This paper assesses the benefits
of cloud federation for cloud providers.
Outsourcing and insourcing are explored as
means to maximize the revenues of the
IEEE 2015
providers involved in the federation. An exact
method using a linear integer program is
proposed to optimize the partitioning of the
incoming workload across
the federation members. A pricing model is
suggested to enable providers to set their offers
dynamically and achieve highest revenues. The
conditions leading to highest gains are
identified and the benefits
of cloud federation are quantified.
TTA-JC-
C1559
My Privacy My Decision
- Control of Photo
Sharing on Online
Social Networks
Photo sharing is an attractive feature which
popularizes Online Social Networks (OSNs).
Unfortunately, it may leak users’ privacy if
they are allowed to post, comment, and tag
a photo freely. In this paper, we attempt to
address this issue and study the scenario when
a user shares a photo containing individuals
other than himself/herself (termed co-photo for
short). To prevent possible privacy leakage of
a photo, we design a mechanism to enable each
individual in a photo to be aware of the posting
activity and participate in the decision making
on the photo posting. For this purpose, we
need an efficient facial recognition (FR)
system that can recognize everyone in
the photo. However, a more demanding privacy
setting may limit the number of
the photos publicly available to train the FR
system. To deal with this dilemma, our
mechanism attempts to utilize users’
private photos to design a personalized FR
system specifically trained to differentiate
possible photo co-owners without leaking
their privacy. We also develop a distributed
consensus based method to reduce the
computational complexity and protect the
private training set. We show that our system
is superior to other possible approaches in
terms of recognition ratio and efficiency. Our
mechanism is implemented as a proof of
concept Android application on Facebook’s
platform.
IEEE 2015
TTA-JC-
C1560
OPoR - Enabling Proof
of Retrievability in
Cloud Computing with
Cloud computing moves the application
software and databases to the centralized large
IEEE 2015
Resource-Constrained
Devices
data centers, where the management of the
data and services may not be fully trustworthy.
In this work, we study the problem of ensuring
the integrity of data storage
in cloud computing. To reduce the
computational cost at user side during the
integrity verification of their data, the notion of
public verifiability has been proposed.
However, the challenge is that the
computational burden is too huge for the users
with resource-
constrained devices to compute the public
authentication tags of file blocks. To tackle the
challenge, we propose OPoR, a
new cloud storage scheme involving
a cloud storage server and a cloud audit server,
where the latter is assumed to be semi-honest.
In particular, we consider the task of allowing
the cloud audit server, on behalf of
the cloud users, to pre-process the data before
uploading to the cloud storage server and later
verifying the data integrity. OPoR outsources
and offloads the heavy computation of the tag
generation to the cloud audit server and
eliminates the involvement of the user in the
auditing and in the pre-processing phases.
Furthermore, we strengthen
the proof of retrievability (PoR) model to
support dynamic data operations, as well as
ensure security against reset attacks launched
by the cloud storage server in the upload
phase.
TTA-JC-
C1561
Performing Initiative
Data Prefetching in
Distributed File
Systems for Cloud
Computing
This paper presents
an initiative data prefetching scheme on the
storage servers in distributed file
systems for cloud computing. In
this prefetching technique, the client machines
are not substantially involved in the process
of data prefetching, but the storage servers can
directly prefetch the data after analyzing the
history of disk I/O access events, and then send
the prefetched data to the relevant client
machines proactively. To put this technique to
work, the information about client nodes is
piggybacked onto the real client I/O requests,
IEEE 2015
and then forwarded to the relevant storage
server. Next, two prediction algorithms have
been proposed to forecast future block access
operations for directing what data should be
fetched on storage servers in advance. Finally,
the prefetched data can be pushed to the
relevant client machine from the storage
server. Through a series of evaluation
experiments with a collection of application
benchmarks, we have demonstrated that our
presented initiative prefetching technique can
benefit distributed file systems for cloud
environments to achieve better I/O performance. In
particular, configuration-limited client
machines in the cloud are not responsible for
predicting I/O access operations, which can
definitely contribute to
preferable system performance on them.
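A server-side predictor of the kind the scheme relies on can be sketched as a first-order model over observed block IDs (an illustration only; the paper's own prediction algorithms differ, and the class and method names are ours).

from collections import defaultdict, Counter

class NextBlockPredictor:
    def __init__(self):
        self.follows = defaultdict(Counter)   # block -> successor counts
        self.last = None

    def observe(self, block):
        if self.last is not None:
            self.follows[self.last][block] += 1
        self.last = block

    def predict(self):
        # Most frequent successor of the last accessed block, if any.
        succ = self.follows.get(self.last)
        return succ.most_common(1)[0][0] if succ else None

p = NextBlockPredictor()
for b in [1, 2, 3, 1, 2, 3, 1, 2]:
    p.observe(b)
print(p.predict())   # 3: push this block to the client proactively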
TTA-JC-
C1562
Privacy-Preserving
Multikeyword Similarity
Search Over
Outsourced Cloud Data
The amount of data generated by individuals
and enterprises is rapidly increasing. With the
emerging cloud computing paradigm,
the data and corresponding complex
management tasks can be outsourced to
the cloud for the management flexibility and
cost savings. Unfortunately, as the data could
be sensitive, the direct data outsourcing would
have the problem of privacy leakage. The
encryption can be used, before
the data outsourcing, with the concern that the
operations can still be accomplished by
the cloud. We consider the multikeyword
similarity search over outsourced cloud
data. In particular, with the consideration of the
text data only, multiple keywords are specified
by the user. The cloud returns the files
containing more than a threshold number of
input keywords or similar keywords, where
the similarity here is defined according to the
edit distance metric. We propose three
solutions, where blind signature provides the
user access privacy, and a novel use of Bloom
filter's bit pattern provides the speedup
of search task at the cloud side. Our final
design to achieve the search is secure against
insider threats and efficient in terms of
IEEE 2015
the search time at the cloud side. Performance
evaluation and analysis are used to
demonstrate the practicality of our proposed
solutions.
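The Bloom-filter bit-pattern speedup can be illustrated as follows (a plaintext sketch under our own parameter choices; the blind-signature access privacy and the edit-distance matching are omitted): each file is summarized by a bit vector over its keywords, and the server can discard files whose vectors cannot contain the queried terms.

import hashlib

M, K = 256, 3                     # filter size in bits, hash count

def positions(word):
    return [int.from_bytes(hashlib.sha256(f"{i}|{word}".encode())
                           .digest()[:4], "big") % M for i in range(K)]

def make_filter(keywords):
    bits = 0
    for w in keywords:
        for pos in positions(w):
            bits |= 1 << pos
    return bits

def may_contain(bits, word):
    # No false negatives; rare false positives depending on M and K.
    return all(bits >> pos & 1 for pos in positions(word))

f = make_filter(["cloud", "privacy", "search"])
print(may_contain(f, "privacy"), may_contain(f, "blockchain"))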
TTA-JC-
C1563
Provable Multicopy
Dynamic Data
Possession in Cloud
Computing Systems
More and more organizations are
opting for outsourcing data to
remote cloud service providers (CSPs).
Customers can rent the CSPs' storage
infrastructure to store and retrieve almost
unlimited amount of data by paying fees
metered in gigabyte/month. For an increased
level of scalability, availability, and durability,
some customers may want their data to be
replicated on multiple servers across
multiple data centers. The more copies the
CSP is asked to store, the more fees the
customers are charged. Therefore, customers
need to have a strong guarantee that the CSP is
storing all data copies that are agreed upon in
the service contract, and all these copies are
consistent with the most recent modifications
issued by the customers. In this paper, we
propose a map-based provable
multicopy dynamic data possession (MB-
PMDDP) scheme that has the following
features: 1) it provides evidence to the
customers that the CSP is not cheating by
storing fewer copies; 2) it supports outsourcing
of dynamic data, i.e., it supports block-level
operations, such as block modification,
insertion, deletion, and append; and 3) it
allows authorized users to seamlessly access
the file copies stored by the CSP. We give a
comparative analysis of the proposed MB-
PMDDP scheme with a reference model
obtained by extending
existing provable possession of dynamic single
-copy schemes. The theoretical analysis is
validated through experimental results on a
commercial cloud platform. In addition, we
show the security against colluding servers,
and discuss how to identify corrupted copies
by slightly modifying the proposed scheme.
IEEE 2015
TTA-JC-
C1564
SAE - Toward Efficient
Cloud Data Analysis
Service for Large-Scale
Social Networks
Social network analysis is used to extract
features of human communities and proves to
be very instrumental in a variety of scientific
domains. The dataset of a social network is
often so large that a
cloud data analysis service, in which the
computation is performed on a parallel
platform in the cloud, becomes a good choice
for researchers not experienced in parallel
programming. In the cloud, a primary
challenge to efficient data analysis is the
computation and communication skew (i.e.,
load imbalance) among computers caused by
humanity’s group behavior (e.g., bandwagon
effect). Traditional load balancing techniques
either require significant effort to re-balance
loads on the nodes, or cannot well cope with
stragglers. In this paper, we propose a general
straggler-aware execution approach, SAE, to
support the analysis service in the cloud. It
offers a novel computational decomposition
method that factors straggling feature
extraction processes into more fine-grained
sub-processes, which are then distributed over
clusters of computers for parallel execution.
Experimental results show that SAE can speed
up the analysis by up to 1.77 times compared
with state-of-the-art solutions.
IEEE 2015
TTA-JC-
C1565
Secure Cloud Storage
Meets with Secure
Network Coding
This paper reveals an intrinsic relationship
between secure cloud storage and secure
network coding for the first
time. Secure cloud storage was proposed only
recently while secure network coding has been
studied for more than ten years. Although the
two areas are quite different in their nature and
are studied independently, we show how to
construct a secure cloud storage protocol given
any secure network coding protocol. This gives
rise to a systematic way to
construct secure cloud storage protocols. Our
construction is secure under a definition which
captures the real world usage of the
cloud storage. Furthermore, we propose two
specific secure cloud storage protocols based
on two recent secure network coding protocols.
IEEE 2015
In particular, we obtain the first publicly
verifiable secure cloud storage protocol in the
standard model. We also enhance the proposed
generic construction to support user anonymity
and third-party public auditing, which both
have received considerable attention recently.
Finally, we prototype the newly proposed
protocol and evaluate its performance.
Experimental results validate the effectiveness
of the protocol.
TTA-JC-
C1566
SeDaSC - Secure Data
Sharing in Clouds
Cloud storage is an application of clouds that
liberates organizations from establishing in-
house data storage systems.
However, cloud storage gives rise to security
concerns. In case of group-shared data,
the data face both cloud-specific and
conventional insider
threats. Secure data sharing among a group
that counters insider threats of legitimate yet
malicious users is an important research issue.
In this paper, we propose
the Secure Data Sharing in Clouds (SeDaSC)
methodology that provides:
1) data confidentiality and integrity; 2) access
control; 3) data sharing (forwarding) without
using compute-intensive reencryption; 4)
insider threat security; and 5) forward and
backward access control. The
SeDaSC methodology encrypts a file with a
single encryption key. Two different
key shares for each of the users are generated,
with the user only getting one share. The
possession of a single share of a key allows
the SeDaSC methodology to counter the
insider threats. The other key share is stored by
a trusted third party, which is called the
cryptographic server.
The SeDaSC methodology is applicable to
conventional and mobile cloud computing
environments. We implement a working
prototype of the SeDaSC methodology and
evaluate its performance based on the time
consumed during various operations. We
formally verify the working of SeDaSC by
using high-level Petri nets, the Satisfiability
IEEE 2015
Modulo Theories Library, and a Z3 solver. The
results proved to be encouraging and show that
SeDaSC has the potential to be effectively
used for secure data sharing in the cloud.
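The two-share defence against insiders can be illustrated with an XOR key split in Python (a stand-in for the methodology's actual share generation: the user's share and the cryptographic server's share are individually useless, and both are needed to rebuild the file key).

import secrets

def split_key(key: bytes):
    user_share = secrets.token_bytes(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recover_key(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

k = secrets.token_bytes(32)        # the single file-encryption key
u, s = split_key(k)
assert recover_key(u, s) == k      # neither share alone reveals k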
TTA-JC-
C1567
Shared Authority Based
Privacy-Preserving
Authentication Protocol
in Cloud Computing
Cloud computing is an emerging data-
interactive paradigm that realizes users' data
remotely stored on an
online cloud server. Cloud services provide
great conveniences for the users to enjoy the
on-demand cloud applications without
considering the local infrastructure limitations.
During the data accessing, different users may
be in a collaborative relationship, and thus
data sharing becomes significant to achieve
productive benefits. The existing security
solutions mainly focus on authentication to
ensure that a user's private data cannot be
illegally accessed, but they neglect a
subtle privacy issue that arises when a user
challenges the cloud server to request other
users' data for sharing. The access request
itself may reveal the user's privacy, no matter
whether or not it obtains the data access
permissions. In this paper, we propose
a shared authority based privacy-
preserving authentication protocol (SAPA) to
address the above privacy issue for cloud storage.
In the SAPA, 1) shared access authority is
achieved by anonymous access request
matching mechanism with security and privacy
considerations (e.g., authentication, data
anonymity, user privacy, and forward
security); 2) attribute based access control is
adopted to realize that the user can only access
its own data fields; 3) proxy re-encryption is
applied to provide data sharing among the
multiple users. Meanwhile, a universal
composability (UC) model is established to
prove that the SAPA theoretically achieves
design correctness. The analysis indicates that the
proposed protocol is attractive for multi-user
collaborative cloud applications.
IEEE 2015
TTA-JC-
C1568
Social
Recommendation with
Cross-Domain
Transferable
Knowledge
Recommender systems can suffer from data
sparsity and cold start issues.
However, social networks, which enable users
to build relationships and create different
types of items, present an unprecedented
opportunity to alleviate these issues. In this
paper, we represent a social network as a
star-structured hybrid graph centered on
a social domain, which connects with other
item domains. With this innovative
representation, useful knowledge from an
auxiliary domain can be transferred through
the social domain to a target domain. Various
factors of item transferability, including
popularity and behavioral consistency, are
determined. We propose a novel Hybrid
Random Walk (HRW) method, which
incorporates such factors, to
select transferable items in auxiliary domains,
bridge cross-domain knowledge with
the social domain, and accurately predict
user-item links in a target domain. Extensive
experiments on a real social dataset
demonstrate that HRW significantly
outperforms existing approaches.
IEEE 2015
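
A hedged sketch of the underlying machinery: a plain random walk with restart over a toy hybrid graph, scoring target-domain items by visit frequency. The real HRW additionally weights walks by the transferability factors (popularity, behavioral consistency) described above; the node names here are invented.

import random

def random_walk_scores(graph, start, restart=0.15, steps=100_000):
    # visit frequency of target-domain nodes serves as the link score
    visits = {}
    node = start
    for _ in range(steps):
        if random.random() < restart or not graph.get(node):
            node = start                      # teleport back to the user
        else:
            node = random.choice(graph[node])
        visits[node] = visits.get(node, 0) + 1
    return visits

# toy hybrid graph: user u1 reaches a target domain (t*) through the
# social domain (u*) and an auxiliary item domain (a*)
graph = {
    "u1": ["u2", "a1"], "u2": ["u1", "t1", "a1"],
    "a1": ["u1", "u2"], "t1": ["u2", "t2"], "t2": ["t1"],
}
scores = random_walk_scores(graph, "u1")
print(sorted((n, c) for n, c in scores.items() if n.startswith("t")))
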
TTA-JC-
C1569
TMACS - A Robust and
Verifiable Threshold
Multi-Authority Access
Control System in
Public Cloud Storage
Attribute-based Encryption (ABE) is regarded
as a promising cryptographic tool to
guarantee data owners' direct control over
their data in public cloud storage. The earlier
ABE schemes involve only one authority to
maintain the whole attribute set, which can
bring a single-point bottleneck on both security
and performance. Subsequently, some multi-
authority schemes are proposed, in which
multiple authorities separately maintain
disjoint attribute subsets. However, the single-
point bottleneck problem remains unsolved. In
this paper, from another perspective, we
construct a threshold multi-authority CP-
ABE access control scheme
for public cloud storage, named TMACS, in
which multiple authorities jointly manage a
uniform attribute set. In TMACS, taking
advantage of (t, n) threshold secret sharing, the
master key can be shared among multiple
authorities, and a legal user can generate
his/her secret key by interacting with any t
authorities. Security and performance analysis
results show that TMACS is not
only verifiably secure when fewer than t
authorities are compromised, but also robust
when at least t authorities are alive in
the system. Furthermore, by efficiently
combining the traditional multi-
authority scheme with TMACS, we construct a
hybrid scheme, which satisfies the scenario of
attributes coming from different authorities as
well as achieving security and system-level
robustness.
IEEE 2015
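
The (t, n) sharing at the heart of TMACS can be illustrated with textbook Shamir secret sharing, sketched below; the prime field and the integer encoding of the master key are illustrative assumptions, and TMACS itself applies the sharing inside an interactive key-issuing protocol.

import secrets

P = 2**127 - 1                  # a Mersenne prime; the field for shares

def split(secret, t, n):
    # Shamir (t, n) sharing: any t shares rebuild the master secret
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over GF(P)
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(secret=123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[1:4]) == 123456789
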
TTA-JC-
C1570
Towards Privacy
Preserving Publishing
of set-valued Data on
Hybrid Cloud
Storage as a service has become an
important paradigm in cloud computing for
its great flexibility and economic savings.
However, the development is hampered
by data privacy concerns: data owners no
longer physically possess the storage of
their data. In this work, we study the issue
of privacy-preserving set-
valued data publishing.
Existing data privacy-
preserving techniques (such as encryption,
suppression, and generalization) are not
applicable in many real-world scenarios, since they
would incur large overhead for data queries
or high information loss. Motivated by this
observation, we present a suite of new
techniques that make privacy-aware set-
valued data publishing feasible
on a hybrid cloud. In the data publishing phase,
we propose a data partition technique,
named extended quasi-identifier-
partitioning (EQI-partitioning), which
disassociates record terms that participate
in identifying combinations. This way
the cloud server cannot, with high
probability, associate a record with rare term
combinations. We prove
the privacy guarantee of our mechanism.
In the data querying phase, we adopt an
interactive differential privacy strategy to
resist privacy breaches from statistical
queries. We finally evaluate the scheme's
performance using real-life data sets on
our cloud test-bed. Our extensive
experiments demonstrate the validity and
practicality of the proposed scheme.
IEEE 2015
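
As a sketch of the querying-phase idea, the Laplace mechanism below answers a COUNT query with epsilon-differential privacy (counts have sensitivity 1). The epsilon value is an arbitrary assumption, and the paper's interactive strategy manages a privacy budget across many such queries.

import random

def dp_count(true_count, epsilon=0.5):
    # Laplace mechanism: noise ~ Laplace(0, 1/epsilon), sampled here as
    # the difference of two exponentials with rate epsilon
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# an analyst in the querying phase sees only noisy answers
print(round(dp_count(1234), 1))
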
TTA-JC-
C1571
Towards Privacy-
Preserving Storage and
Retrieval in Multiple
Clouds
Cloud computing is growing exponentially,
and there are now hundreds
of cloud service providers (CSPs) of various
sizes. While cloud consumers may enjoy
cheaper data storage and computation in
this multi-cloud environment, they also
face more complicated reliability issues
and privacy preservation problems for their
outsourced data. Though searchable encryption
allows users to encrypt their stored data
while preserving some search capabilities, few
efforts have sought to consider the reliability
of the searchable encrypted data outsourced to
the clouds. In this paper, we propose a privacy-
preserving Storage and Retrieval (STRE)
mechanism that not only ensures security and
privacy but also provides reliability guarantees
for the outsourced searchable encrypted data.
The STRE mechanism enables the cloud users
to distribute and search their encrypted data
across multiple independent clouds managed
by different CSPs, and is robust even when a
certain number of CSPs crash. Besides the
reliability, STRE also offers the benefit of
partially hidden search pattern. We evaluate
the STRE mechanism on Amazon EC2 using a
real world dataset and the results demonstrate
both effectiveness and efficiency of our
approach.
IEEE 2015
TTA-JC-
C1572
Trust Enhanced
Cryptographic Role-
based Access Control
for Secure Cloud Data
Storage
Cloud data storage has provided significant
benefits by allowing users to store massive
amounts of data on demand in a cost-effective
manner. To protect the privacy of data stored
in the cloud, cryptographic role-
based access control (RBAC) schemes have
been developed to ensure that the data can only
be accessed by those who are allowed
by access policies. However,
these cryptographic approaches do not address
the issues of trust. In this paper, we
propose trust models to reason about and to
improve the security for
stored data in cloud storage systems that
use cryptographic RBAC schemes. The trust
models provide an approach for the owners
and roles to determine the trustworthiness of
individual roles and users, respectively, in the
RBAC system. The proposed trust models
consider role inheritance and hierarchy in the
evaluation of trustworthiness of roles. We
present a design of a trust-
based cloud storage system, which shows how
the trust models can be integrated into a
system that uses cryptographic RBAC
schemes. We have also considered practical
application scenarios and illustrated how
the trust evaluations can be used to reduce
risks and to enhance the quality of decision
making by data owners and roles
in a cloud storage service.
IEEE 2015
TTA-JC-
C1573
Using ant colony
system to consolidate
VMs for green cloud
computing
High energy consumption of cloud data centers
is a matter of great concern. Dynamic
consolidation of Virtual Machines (VMs)
presents a significant opportunity to save
energy in data centers. A VM consolidation
approach uses live migration of VMs so that
some of the under-loaded Physical Machines
(PMs) can be switched-off or put into a low-
power mode. On the other hand, achieving the
desired level of Quality of Service (QoS)
between cloud providers and their users is
critical. Therefore, the main challenge
is to reduce energy consumption of data
centers while satisfying QoS requirements. In
this paper, we present a
distributed system architecture to perform
dynamic VM consolidation to reduce energy
consumption of cloud data centers while
maintaining the desired QoS. Since the VM
consolidation problem is strictly NP-hard,
we use an online optimization metaheuristic
algorithm called Ant Colony System (ACS).
The proposed ACS-based VM Consolidation
(ACS-VMC) approach finds a near-optimal
solution based on a specified objective
function. Experimental results on real
workload traces show that ACS-VMC reduces
energy consumption while maintaining the
required performance levels in a cloud data
center. It outperforms existing VM
consolidation approaches in terms of energy
consumption, number of VM migrations, and
QoS requirements concerning performance.
IEEE 2015
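
A compact sketch of how an Ant Colony System can drive consolidation, assuming the count of active PMs as the energy proxy. It follows the standard ACS choice rule (exploitation vs. biased exploration plus a global pheromone update on the best solution) but omits details such as local pheromone updates and the paper's full objective function.

import random

def acs_consolidate(vm_cpu, pm_cap, n_ants=20, n_iter=50,
                    alpha=1.0, beta=2.0, rho=0.1, q0=0.9):
    n_vms, n_pms = len(vm_cpu), len(pm_cap)
    tau = [[1.0] * n_pms for _ in range(n_vms)]   # pheromone trails
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            load, assign = [0.0] * n_pms, []
            for v in range(n_vms):
                feasible = [p for p in range(n_pms)
                            if load[p] + vm_cpu[v] <= pm_cap[p]]
                if not feasible:
                    assign = None
                    break
                eta = [1.0 + load[p] for p in feasible]   # prefer tight packing
                scores = [(tau[v][p] ** alpha) * (eta[i] ** beta)
                          for i, p in enumerate(feasible)]
                if random.random() < q0:          # exploitation
                    p = feasible[max(range(len(feasible)), key=scores.__getitem__)]
                else:                             # biased exploration
                    r, acc = random.uniform(0, sum(scores)), 0.0
                    for i, s in enumerate(scores):
                        acc += s
                        if acc >= r:
                            p = feasible[i]
                            break
                load[p] += vm_cpu[v]
                assign.append(p)
            if assign is None:
                continue
            cost = sum(1 for p in range(n_pms) if load[p] > 0)  # active PMs
            if cost < best_cost:
                best, best_cost = assign, cost
        if best:
            for v, p in enumerate(best):          # global update on best tour
                tau[v][p] = (1 - rho) * tau[v][p] + rho / best_cost
    return best, best_cost

print(acs_consolidate([0.2, 0.5, 0.3, 0.4, 0.1, 0.6], pm_cap=[1.0] * 4))
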
TTA-JC-
C1574
Using Virtual Machine
Allocation Policies to
Defend against Co-
resident Attacks in
Cloud Computing
Cloud computing enables users to consume
various IT resources in an on-demand manner,
and with low management overhead. However,
customers can face new security risks when
they use cloud computing platforms. In this
paper, we focus on one such threat: the co-
resident attack, where malicious users build
side channels and extract private information
from virtual machines co-located on the same
server. Previous works mainly
attempt to address the problem by eliminating
side channels. However, most of these
methods are not suitable for immediate
deployment due to the required
modifications to current cloud platforms. We
choose to solve the problem from a different
perspective, by studying how to improve
the virtual machine allocation policy, so that it
is difficult for attackers to co-locate with their
targets. Specifically, we (1) define security
metrics for assessing the attack; (2) model
these metrics, and compare the difficulty of
achieving co-residence under three
commonly used policies; (3) design a
new policy that not only mitigates the threat
of attack, but also satisfies the requirements for
workload balance and low power consumption;
and (4) implement, test, and prove the
effectiveness of the policy on the popular
open-source platform OpenStack.
IEEE 2015
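
To illustrate what such a security metric can look like, the simulation below estimates the probability that an attacker co-locates with a victim VM under two toy allocation policies. All sizes are invented, and the policies are simpler than the three commonly used policies the paper analyzes.

import random

def coresidence_rate(policy, n_servers=50, tenant_vms=200,
                     attacker_vms=10, trials=1000):
    # toy metric: chance that at least one attacker VM lands on the
    # same server as a randomly chosen victim VM
    hits = 0
    for _ in range(trials):
        load = [0] * n_servers            # VMs per server
        placement = []                    # server of each tenant VM
        for _ in range(tenant_vms):
            s = policy(load)
            load[s] += 1
            placement.append(s)
        victim_server = random.choice(placement)
        for _ in range(attacker_vms):
            s = policy(load)
            load[s] += 1
            if s == victim_server:
                hits += 1
                break
    return hits / trials

def random_policy(load):
    return random.randrange(len(load))

def least_loaded_policy(load):
    return min(range(len(load)), key=load.__getitem__)

print("random:      ", coresidence_rate(random_policy))
print("least-loaded:", coresidence_rate(least_loaded_policy))
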
DOMAIN : BIG DATA
TTA-JB-
C1501
FastRAQ A Fast
Approach to Range-
Aggregate Queries in
Big Data Environments
Range-aggregate queries apply a
certain aggregate function to all tuples within
given query ranges.
Existing approaches to range-
aggregate queries are insufficient to quickly
provide accurate results
in big data environments. In this paper, we
propose FastRAQ, a fast approach to range-
aggregate queries in big data environments.
FastRAQ first divides big data into independent
partitions with a balanced partitioning
algorithm, and then generates a local
estimation sketch for each partition. When
a range-aggregate query request arrives,
FastRAQ obtains the result directly by
summarizing local estimates from all
partitions. FastRAQ has O(1) time complexity
for data updates and O(N/(P×B)) time
complexity for range-aggregate queries, where
N is the number of distinct tuples for all
dimensions, P is the partition number, and B is
the bucket number in the histogram. We
implement the FastRAQ approach on the
Linux platform, and evaluate its performance
with about 10 billion data records.
Experimental results demonstrate that FastRAQ
provides range-aggregate query results
within a time period two orders of magnitude
lower than that of Hive, while the relative error
is less than 3 percent within the given
confidence interval.
IEEE 2015
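
A minimal sketch of the partition-plus-local-sketch idea, assuming an equi-width histogram as the local estimation sketch (the paper's sketch is more elaborate): updates are O(1), and a query just sums bucket estimates across partitions, trading exactness for speed.

class PartitionSketch:
    # equi-width histogram kept per partition
    def __init__(self, lo, hi, buckets=64):
        self.lo, self.width = lo, (hi - lo) / buckets
        self.sums = [0.0] * buckets
    def insert(self, key, value):          # O(1) update
        b = min(int((key - self.lo) / self.width), len(self.sums) - 1)
        self.sums[b] += value
    def estimate_sum(self, lo, hi):        # approximate SUM over [lo, hi)
        b0 = max(int((lo - self.lo) / self.width), 0)
        b1 = min(int((hi - self.lo) / self.width), len(self.sums) - 1)
        return sum(self.sums[b0:b1 + 1])

def range_sum(partitions, lo, hi):
    # answer a range-aggregate query by summarizing local estimates
    return sum(p.estimate_sum(lo, hi) for p in partitions)

parts = [PartitionSketch(0, 1000) for _ in range(4)]
for i in range(100_000):
    parts[i % 4].insert(key=i % 1000, value=1.0)
print(range_sum(parts, 100, 200))   # roughly 10,000, up to bucket granularity
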
TTA-JB-
C1502
Collaboration- and
Fairness-Aware Big
Data Management in
Distributed Clouds
With the advancement of information and
communication technology, data are being
generated at an exponential rate via various
instruments and collected at an unprecedented
scale. Such large volume of data generated is
referred to as big data, which now are
revolutionizing all aspects of our life ranging
from enterprises to individuals, from science
communities to governments, as they exhibit
great potentials to improve efficiency of
enterprises and the quality of life. To obtain
nontrivial patterns and derive valuable
information from big data, a fundamental
problem is how to properly place the collected
data by different users
to distributed clouds and to efficiently analyze
the collected data to save user costs
in data storage and processing, particularly the
cost savings of users who share data. Achieving
this requires close collaboration among the
users, who share and utilize
big data in distributed clouds, due to the
complexity and volume of big data. Since
computing, storage, and bandwidth resources in
a distributed cloud usually are limited, and
such resource provisioning typically is
expensive, collaborative users are required to
make use of the resources fairly. In this paper,
we study a novel collaboration- and fairness-
aware big data management problem
in distributed cloud environments that aims to
maximize the system throughput, while
minimizing the operational cost of service
providers to achieve that throughput,
subject to resource capacity and user fairness
constraints. We first propose a novel
optimization framework for the problem. We
then devise a fast yet scalable approximation
algorithm based on the built optimization
framework. We also analyze the time
complexity and approximation ratio of the
proposed algorithm. We finally conduct
experiments by simulations to evaluate the
performance of the proposed algorithm.
Experimental results demonstrate that the
proposed algorithm is promising, and
outperforms other heuristics.
IEEE 2015
TTA-JB-
C1503
On Traffic-Aware
Partition and
Aggregation in
MapReduce for Big
Data Applications
The MapReduce programming model
simplifies large-scale data processing on
commodity clusters by exploiting parallel map
tasks and reduce tasks. Although many efforts
have been made to improve the performance
of MapReduce jobs, they ignore the
network traffic generated in the shuffle phase,
which plays a critical role in performance
enhancement. Traditionally, a hash function is
used to partition intermediate data among
reduce tasks, which, however, is not traffic-
efficient because network topology
and data size associated with each key are not
taken into consideration. In this paper, we
study to reduce network traffic cost for
a MapReduce job by designing a novel
intermediate data partition scheme.
Furthermore, we jointly consider the
aggregator placement problem, where each
aggregator can reduce merged traffic from
multiple map tasks. A decomposition-based
distributed algorithm is proposed to deal with
the large-scale optimization problem
for big data application and an online
algorithm is also designed to
adjust data partition and aggregation in a
dynamic manner. Finally, extensive simulation
results demonstrate that our proposals can
significantly reduce network traffic cost under
both offline and online cases.
IEEE 2015
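
The core idea of the partition scheme can be sketched as a greedy assignment: place each intermediate key on the reducer node that minimizes weighted shuffle traffic, given per-mapper data sizes and network costs. Everything below is a toy instance, not the paper's decomposition-based distributed algorithm.

def traffic_aware_partition(key_sizes, dist):
    # key_sizes[k][m] -- bytes produced for key k on mapper node m
    # dist[m][r]      -- network cost of moving one byte from m to r
    assignment = {}
    n_reducers = len(dist[0])
    for k, sizes in key_sizes.items():
        cost = lambda r: sum(sz * dist[m][r] for m, sz in enumerate(sizes))
        assignment[k] = min(range(n_reducers), key=cost)
    return assignment

# toy example: 2 mapper nodes, 2 reducer nodes
key_sizes = {"apple": [100, 10], "banana": [5, 80]}
dist = [[0, 1], [1, 0]]          # co-located transfer is free
print(traffic_aware_partition(key_sizes, dist))   # apple -> 0, banana -> 1
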
TTA-JB-
C1504
Privacy-Preserving
Ciphertext Multi-
Sharing Control for Big
Data Storage
The need for a secure big data storage service
is greater than ever. The basic
requirement of the service is to guarantee the
confidentiality of the data. However, the
anonymity of the service clients, one of the
most essential aspects of privacy, should be
considered simultaneously. Moreover, the
service also should provide practical and fine-
grained encrypted data sharing such that a data
owner is allowed to share
a ciphertext of data among others under some
specified conditions. This paper, for the first
time, proposes a privacy-
preserving ciphertext multi-sharing mechanism
to achieve the above properties. It combines
the merits of proxy re-encryption with
anonymous technique in which
a ciphertext can be securely and conditionally
shared multiple times without leaking either the
underlying message or the identity information
of ciphertext senders/recipients. Furthermore,
of ciphertext senders/recipients. Furthermore,
this paper shows that the new primitive is
secure against chosen-ciphertext attacks in the
standard model.
IEEE 2015
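
For flavor, here is a toy proxy re-encryption in the style of the classic ElGamal-based BBS scheme: the proxy transforms a ciphertext for Alice into one for Bob without learning the message. The paper's primitive additionally provides anonymity and conditional, multi-time sharing, and the group parameters below are far too small for real security.

import secrets

p, q, g = 1019, 509, 4          # p = 2q + 1; g generates the order-q subgroup

def keygen():
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)

def encrypt(pk, m):             # m must be a subgroup element
    r = secrets.randbelow(q - 1) + 1
    return (m * pow(g, r, p) % p, pow(pk, r, p))    # (m*g^r, g^{a r})

def rekey(sk_a, sk_b):          # rk = b / a  (mod q)
    return sk_b * pow(sk_a, -1, q) % q

def reencrypt(rk, ct):          # g^{a r} -> g^{b r}; proxy learns nothing of m
    c1, c2 = ct
    return (c1, pow(c2, rk, p))

def decrypt(sk, ct):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, q), p)                # (g^{sk*r})^{1/sk} = g^r
    return c1 * pow(g_r, -1, p) % p

a_sk, a_pk = keygen()
b_sk, _ = keygen()
msg = pow(g, 42, p)             # encode messages as subgroup elements
ct_a = encrypt(a_pk, msg)
ct_b = reencrypt(rekey(a_sk, b_sk), ct_a)
assert decrypt(b_sk, ct_b) == msg == decrypt(a_sk, ct_a)
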
TTA-JB-
C1505
Self-Adjusting Slot
Configurations for
Homogeneous and
Heterogeneous Hadoop
The MapReduce framework and its open
source implementation Hadoop have become
the de facto platform for scalable analysis of
large data sets in recent years. One of the
primary concerns in Hadoop is how to
minimize the completion length (i.e.,
makespan) of a set of MapReduce jobs. The
current Hadoop only allows
static slot configuration, i.e., fixed numbers of
map slots and reduce slots throughout the
lifetime of a cluster. However, we found that
such a static configuration may lead to low
system resource utilizations as well as long
completion length. Motivated by this, we
propose simple yet effective schemes which
use the slot ratio between map and reduce tasks as
a tunable knob for reducing the makespan of a
given job set. By leveraging the workload
information of recently completed jobs, our
schemes dynamically allocate resources
(or slots) to map and reduce tasks. We
implemented the presented schemes
in Hadoop V0.20.2 and evaluated them with
representative MapReduce benchmarks at
Amazon EC2. The experimental results
demonstrate the effectiveness and robustness
of our schemes under both simple workloads
and more complex mixed workloads.
IEEE 2015
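
A hedged sketch of the tunable-knob idea: given aggregate map and reduce work estimated from recently completed jobs, pick the slot split whose slower phase finishes soonest. Real Hadoop pipelines the phases, so the max() below is only a coarse model, not the paper's scheme.

def tune_slot_ratio(map_work, reduce_work, total_slots):
    # map_work / reduce_work: aggregate task-seconds estimated from
    # recently completed jobs of the same workload
    best, best_makespan = None, float("inf")
    for map_slots in range(1, total_slots):
        reduce_slots = total_slots - map_slots
        # phases overlap in practice; max() is only a coarse bound
        makespan = max(map_work / map_slots, reduce_work / reduce_slots)
        if makespan < best_makespan:
            best, best_makespan = (map_slots, reduce_slots), makespan
    return best

print(tune_slot_ratio(map_work=600.0, reduce_work=200.0, total_slots=8))  # (6, 2)
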
TTA-JB-
C1506
A General
Communication Cost
Optimization
Framework for Big
Data Stream
Processing in Geo-
distributed Data
Centers
With the explosion
of big data, processing large numbers of
continuous data streams, i.e., big data
stream processing (BDSP), has become a
crucial requirement for many scientific and
industrial applications in recent years. By
offering a pool of
computation, communication and storage
resources, public clouds, like Amazon’s EC2,
are undoubtedly the most efficient platforms to
meet the ever-growing needs of BDSP. Public
cloud service providers usually operate a
number of geo-distributed datacenters across
the globe. Different datacenter pairs have
different inter-datacenter network
costs charged by Internet Service Providers
(ISPs). Meanwhile, inter-datacenter traffic in BDSP
constitutes a large portion of a cloud provider’s
traffic demand over the Internet and incurs
substantial communication cost, which may
even become the dominant operational
expenditure factor. As the datacenter resources
are provided in a virtualized way, the virtual
machines (VMs) for stream processing tasks
can be freely deployed onto any datacenters,
provided that the Service Level Agreement
(SLA, e.g., quality-of-information) is obeyed.
This raises the opportunity, but also a
challenge, to explore the inter-datacenter
network cost diversities to optimize both VM
placement and load balancing towards
network cost minimization with guaranteed
SLA. In this paper, we first propose
a general modeling framework that describes
all representative intertask relationship
semantics in BDSP. Based on our
novel framework, we then formulate
the communication cost minimization problem
for BDSP into a mixed-integer linear
programming (MILP) problem and prove it to
be NP-hard. We then propose a computation-
efficient solution based on MILP. The high
efficiency of our proposal is validated by
extensive simulation-based studies.
IEEE 2015
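
A minimal MILP sketch of the placement side of the problem using the open-source PuLP modeller (an assumed tool choice, not the paper's): binary variables place tasks in datacenters, and the objective charges each task-graph edge by its rate times the inter-datacenter cost. Task names, rates, costs, and the absence of SLA and capacity constraints are all simplifications.

import pulp

tasks = ["src", "filter", "sink"]
edges = [("src", "filter", 10.0), ("filter", "sink", 4.0)]   # (u, v, rate)
dcs = ["dc1", "dc2"]
cost = {("dc1", "dc2"): 3.0, ("dc2", "dc1"): 3.0,
        ("dc1", "dc1"): 0.0, ("dc2", "dc2"): 0.0}            # per rate unit

prob = pulp.LpProblem("bdsp_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, dcs), cat="Binary")   # task -> DC
y = pulp.LpVariable.dicts("y", (range(len(edges)), dcs, dcs), cat="Binary")

# objective: total inter-datacenter communication cost
prob += pulp.lpSum(rate * cost[d1, d2] * y[e][d1][d2]
                   for e, (_, _, rate) in enumerate(edges)
                   for d1 in dcs for d2 in dcs)

for t in tasks:                         # each task sits in exactly one DC
    prob += pulp.lpSum(x[t][d] for d in dcs) == 1
for e, (u, v, _) in enumerate(edges):   # y is forced to 1 when both ends match
    for d1 in dcs:
        for d2 in dcs:
            prob += y[e][d1][d2] >= x[u][d1] + x[v][d2] - 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({t: d for t in tasks for d in dcs if x[t][d].value() == 1})

With nonnegative costs, minimization keeps each y at zero unless both endpoint placements force it to one, so no explicit upper-bound constraints are needed in this stripped-down model.
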
TTA-JB-
C1507
Data Transfer
Scheduling for
Maximizing Throughput
of Big-Data Computing
in Cloud Systems
Many big-data computing applications have
been deployed in cloud platforms. These
applications normally demand
concurrent data transfers among computing
nodes for parallel processing. It is important to
find the best transfer scheduling leading to the
least data retrieval time, in other words the
maximum throughput. However, the
existing methods cannot achieve this, because
they ignore link bandwidths and the diversity
of data replicas and paths. In this paper, we
aim to develop a max-
throughput data transfer scheduling to
minimize the data retrieval time of
applications. Specifically, the problem is
formulated into mixed integer programming,
and an approximation algorithm is proposed,
with its approximation ratio analyzed. The
extensive simulations demonstrate that our
algorithm can obtain near optimal solutions.
IEEE 2015
TTA-JB-
C1508
Accelerated PSO
Swarm Search Feature
Selection for Data
Stream Mining Big Data
Although Big Data is widely hyped, it brings
many technical challenges that confront both
academic research communities and
commercial IT deployments, and its root sources
are data streams and
the curse of dimensionality. It is generally
known that data which are sourced from data
streams accumulate continuously making
traditional batch-based model induction
algorithms infeasible for real-
time data mining. Feature selection has been
popularly used to lighten the processing load in
inducing a data mining model. However, when
it comes to mining over high-
dimensional data, the search space from which
an optimal feature subset is derived grows
exponentially in size, leading to an intractable
demand in computation. In order to tackle this
problem which is mainly based on the high-
dimensionality and streaming format
of data feeds in Big Data, a novel
lightweight feature selection is proposed.
The feature selection is designed particularly
for mining streaming data on the fly, by using
accelerated particle swarm optimization
(APSO) type of swarm search that achieves
enhanced analytical accuracy within
reasonable processing time. In this paper,
collections of Big Data with an exceptionally
high degree of dimensionality are used to
evaluate the performance of our new feature
selection algorithm.
IEEE 2015
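
As an illustration, the sketch below runs Yang-style accelerated PSO (velocity-free: pull toward the global best plus shrinking random jitter) over binary feature masks. The fitness function is a toy stand-in for the analytical accuracy the paper measures, and all constants are assumptions.

import random

def apso_feature_select(fitness, n_features, n_particles=20, n_iter=50,
                        alpha=0.3, beta=0.5):
    pos = [[random.random() for _ in range(n_features)]
           for _ in range(n_particles)]
    to_mask = lambda x: [v > 0.5 for v in x]      # threshold to a feature mask
    best = max(pos, key=lambda x: fitness(to_mask(x)))[:]
    for t in range(n_iter):
        a = alpha * 0.9 ** t                      # shrink randomness over time
        for x in pos:
            for j in range(n_features):
                # APSO update: global best attraction plus random jitter
                x[j] = (1 - beta) * x[j] + beta * best[j] \
                       + a * (random.random() - 0.5)
            if fitness(to_mask(x)) > fitness(to_mask(best)):
                best = x[:]
    return to_mask(best)

# toy fitness: reward using features 0 and 2, penalize mask size
fit = lambda m: 2 * m[0] + 2 * m[2] - 0.5 * sum(m)
print(apso_feature_select(fit, n_features=5))
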
TTA-JB-
C1509
An Efficient Privacy-
Preserving Ranked
Keyword Search
Method
Cloud data owners prefer to outsource
documents in an encrypted form for the
purpose of privacy preservation. It is therefore
essential to develop efficient and reliable
ciphertext search techniques. One challenge is
that the relationship between documents will
be normally concealed in the process of
encryption, which will lead to
significant search accuracy performance
degradation. Also the volume of data in data
centers has experienced a dramatic growth.
This will make it even more challenging to
design ciphertext search schemes that can
provide efficient and reliable online
information retrieval on large volume of
encrypted data. In this paper, a hierarchical
clustering method is proposed to support
more search semantics and also to meet the
demand for fast ciphertext search within a big
data environment. The proposed hierarchical
approach clusters the documents based on the
minimum relevance threshold, and then
partitions the resulting clusters into sub-
clusters until the constraint on the maximum
size of cluster is reached. In the search phase,
this approach can reach a linear computational
complexity against an exponential size
increase of document collection. In order to
verify the authenticity of search results, a
structure called minimum hash sub-tree is
designed in this paper. Experiments have been
conducted using a collection set built from
IEEE Xplore. The results show that with a
sharp increase of documents in the dataset
the search time of the proposed method
increases linearly whereas the search time of
the traditional method increases exponentially.
Furthermore, the proposed method has an
advantage over the traditional method in
the rank privacy and relevance of retrieved
documents.
IEEE 2015
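
A hedged sketch of the splitting rule: recursively bisect document clusters until each respects the maximum-size constraint. The 2-means-style seeding and Jaccard similarity are assumptions, and the relevance-threshold clustering and the minimum hash sub-tree verification are omitted.

import random

def split_cluster(docs, similarity, max_size=50):
    if len(docs) <= max_size:
        return [docs]
    seed_a, seed_b = random.sample(docs, 2)       # 2-means style seeding
    left, right = [], []
    for d in docs:
        (left if similarity(d, seed_a) >= similarity(d, seed_b)
         else right).append(d)
    if not left or not right:                     # degenerate split: halve
        mid = len(docs) // 2
        left, right = docs[:mid], docs[mid:]
    return (split_cluster(left, similarity, max_size)
            + split_cluster(right, similarity, max_size))

# toy run: documents are keyword sets, similarity = Jaccard
docs = [{random.choice("abcdef") for _ in range(3)} for _ in range(500)]
jaccard = lambda a, b: len(a & b) / len(a | b)
clusters = split_cluster(docs, jaccard, max_size=60)
print(len(clusters), max(len(c) for c in clusters))
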
TTA-JB-
C1510
Splitting Large Medical
Data Sets based on
Normal Distribution in
Cloud Environment
The surge of medical and e-commerce
applications has generated tremendous amounts
of data, which brings people to a so-called
“Big Data” era. Different from
traditional large data sets, the term “Big Data”
not only means the large size of data volume
but also indicates the high velocity
of data generation. However,
current data mining and analytical techniques
are facing the challenge of dealing with large
volume data in a short period of time. This
paper explores the efficiency of utilizing
the Normal Distribution (ND) method
for splitting and
processing large volume medical data in cloud
environment, which can provide representative
information in the split data sets. The ND-
based new model consists of two stages. The
first stage adopts the ND method
for large data sets splitting and processing,
which can reduce the volume of data sets. The
second stage implements the ND-based model
in a cloud computing infrastructure for
allocating the split data sets. The experimental
results show substantial efficiency gains of the
proposed method over the conventional
methods without splitting data into small
partitions. The ND-based method can generate
representative data sets, which can offer
efficient solutions for large data processing.
The split data sets can be processed in parallel
in a Cloud computing environment.
IEEE 2015
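
One plausible reading of the ND-based splitting, sketched below with invented parameters: deal z-score-sorted records out round-robin so that every partition roughly preserves the global mean and spread, and can therefore serve as a representative subset for parallel processing.

import random, statistics

def nd_split(values, n_parts=4):
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0
    ordered = sorted(values, key=lambda v: (v - mu) / sd)   # z-score order
    parts = [ordered[i::n_parts] for i in range(n_parts)]   # round-robin deal
    for p in parts:
        random.shuffle(p)    # remove ordering before downstream processing
    return parts

data = [random.gauss(50, 10) for _ in range(10_000)]
for p in nd_split(data):     # each partition echoes the global mean/std
    print(round(statistics.fmean(p), 2), round(statistics.pstdev(p), 2))
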
DOMAIN : ANDROID
TTA-AA-
C1501
MARS Mobile
Application
Relaunching Speed-up
through Flash-Aware
Page Swapping
The approach for
fast application relaunching on the current
Android system is to cache background
applications in memory. This mechanism is
limited by the available memory size. In
addition, the application state may not be
easily recovered. We propose a prototype
system, MARS, to enable page swapping and
cache
more applications. MARS can speed up
application relaunching and restore
the application state. As a
new page swapping design for
optimizing application relaunching, MARS
isolates Android runtime Garbage Collection
(GC) from page swapping for compatibility
and employs several flash-aware techniques
for swap-in speedup. Two main components
of MARS are page slot allocation and
read/write control. Page slot allocation
reorganizes page slots in swap area to produce
sequential reads and improve the performance
of swap-in. Read/Write control addresses the
read/write interference issue by reducing
concurrent and extra internal writes. Compared
to the conventional Linux page swapping,
these two components can scale up the read
bandwidth up to about 3.8 times.
Application tests on a Google Nexus 4 phone
show that MARS reduces the launching time
of applications by 50% to 80%. The
modified page swapping mechanism can
outperform the conventional
Linux page swapping by up to 4 times.
IEEE 2015
TTA-AA-
C1503
ECG MONITORING
SYSTEM USING
ANDROID
This paper describes the development and
testing of circuitry and software to enable
the use of Android mobile phones equipped
with Bluetooth to receive the incoming
electrocardiogram (ECG) signal from a user
and show it in real-time on the cell phone
screen. The system comprises three distinct
subsystems. The first one is dedicated to
condition the analog ECG signal, preparing it
for conversion to the digital world. The second
one consists of a microcontroller and a
Bluetooth module. This unit samples the ECG,
serializes the samples and transmits them via
the Bluetooth module to the Android cell
phone. The third subsystem is the cell phone
itself. An application program written for the
cell phone receives the ECG samples and
suitably charts the ECG signal on the screen
for analysis. The good quality of the ECG
signal allows for identification of arrhythmias.
IEEE 2015
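
On the phone side, receiving the serialized samples reduces to unpacking frames from the Bluetooth payload. The sketch below assumes a toy little-endian framing of (sequence number, ADC value) pairs and a 12-bit ADC with a 3.3 V reference, none of which is specified by the paper.

import struct

def parse_ecg_frames(payload: bytes, frame_fmt: str = "<HH"):
    # assumed framing: little-endian (sequence_number, adc_value) pairs
    size = struct.calcsize(frame_fmt)
    for off in range(0, len(payload) - size + 1, size):
        seq, adc = struct.unpack_from(frame_fmt, payload, off)
        # map a hypothetical 12-bit reading (0..4095) to millivolts,
        # centred on half-scale with a 3.3 V reference
        mv = (adc - 2048) * 3300.0 / 4096
        yield seq, mv

packet = struct.pack("<HHHH", 0, 2048, 1, 2100)
print(list(parse_ecg_frames(packet)))
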
TTA-AA-
C1504
Auto emergency alert
using android
In this paper, we describe the Well Phone, a
smart phone with additional software that
is used as a personal health monitoring device.
The Well Phone interfaces various health
monitoring devices to the smart phone, and
collects physiological data from those devices.
It employs novel algorithms that perform
statistical analyses, relate sequences of
disparate measurements from different devices,
and correlate physical activity with
physiological measurements. The Well Phone
provides feedback to the user by means of
visualization and speech interaction,
and alerts a caregiver, medical professional, or
emergency responder, as needed.
IEEE 2015
TTA-AA-
C1505
Disaster Alert system
using android
A robot can perform with ease work that seems
impossible for a human, and it becomes even
more helpful if it can be controlled wirelessly.
Nowadays robots are becoming versatile, with
many features: a robot can be controlled by a
smartphone, avoid obstacles
automatically, sense the environment and
send alerts, and even defuse
bombs and perform other critical
tasks. The feature discussed in this
paper is the robot's use in search and rescue
missions. The robot can be controlled
wirelessly using RF technology, has an
ultrasonic sensor for obstacle detection, and is
also equipped with a smart phone camera to
provide an omnidirectional view; it can send
the video stream wirelessly to a remote device,
which makes it easier to control the bot.
The robot can explore places that
humans cannot easily reach, such as places
struck by natural disasters like earthquakes,
tsunamis and hurricanes.
IEEE 2015
TTA-AA-
C1506
Farm corps
management system
using android
This study aimed to investigate an
establishment using an
Intelligent System which employed an
Embedded System and Smart Phone for
chicken farming management and problem
solving using Raspberry Pi and Arduino Uno.
An experiment and comparative analysis of the
intelligent system was applied in a sample
chicken farm in this study. The findings show
that the system could monitor
surrounding weather conditions, including
humidity, temperature, and air quality, and
could also control the filter fan switch in the
chicken farm. The system was found to be
comfortable for farmers to use as they could
effectively control the farm anywhere at
any time, resulting in cost reduction, asset
saving, and productive management in
chicken farming.
IEEE 2015
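
A hedged sketch of what the Raspberry Pi side of such a control loop can look like, using the common Adafruit_DHT and RPi.GPIO libraries. The pin numbers, sensor model, and thresholds are invented, and the paper's own stack (which also involves an Arduino Uno) may differ.

import time
import Adafruit_DHT              # common DHT temperature/humidity library
import RPi.GPIO as GPIO

FAN_PIN, SENSOR_PIN = 17, 4      # hypothetical wiring
GPIO.setmode(GPIO.BCM)
GPIO.setup(FAN_PIN, GPIO.OUT)

while True:
    humidity, temp_c = Adafruit_DHT.read_retry(Adafruit_DHT.DHT22, SENSOR_PIN)
    if temp_c is not None:
        # run the filter fan when the coop gets too hot or too humid
        GPIO.output(FAN_PIN, temp_c > 32.0 or (humidity or 0) > 80.0)
    time.sleep(30)
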
TTA-AA-
C1507
ACCIDENT TRACKING
APP FOR ANDROID
MOBILE
The usage of mobile devices has increased
dramatically in recent years. These devices
serve us in many practical ways and provide us
with many services -- many of them in real-
time. The delivery of streaming audio,
streaming video and internet content to these
devices has become commonplace. One
emerging application in recent years is the use
of mobile devices for tracking local traffic
incidents, and there are several providers
of this content on the Internet, such as Google Maps,
here.com, Twitter, various Departments of
Transportation's web sites, various radio
stations' websites, and many others. Some sites,
such as Twitter, only provide text information but
are updated often with recent data. Map-
enhanced websites provide visual information
but are updated less often. The goal of
this project is to integrate all the sources of
traffic information together in one place and
filter intelligently all the recent incident data so
the results are as accurate and up to date as
possible thus minimizing the number of false
reports and incidents. This process,
implemented for iOS 7 using XCode and
Objective-C, allows the user to view traffic
reports for 15 large US cities with the
capabilities for the addition of many more
locations. Results for the app are compared
with the major individual sources and the
percentage of additional incidents detected and
false incidents incorrectly identified for several
large cities are provided.
IEEE 2015
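
The integrate-and-filter step can be sketched as spatio-temporal deduplication: merge reports that are close in both space and time, then keep only incidents corroborated by at least two independent feeds, which suppresses false reports. The thresholds and report schema below are assumptions, and the project itself is written in Objective-C for iOS 7; Python is used here purely for illustration.

import math

def dedupe_incidents(reports, km=0.5, minutes=30):
    def haversine_km(lat1, lon1, lat2, lon2):
        lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
        h = (math.sin((lat2 - lat1) / 2) ** 2 +
             math.cos(lat1) * math.cos(lat2) *
             math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))
    merged = []
    for r in sorted(reports, key=lambda r: r["ts"]):
        for m in merged:
            if (haversine_km(r["lat"], r["lon"], m["lat"], m["lon"]) < km
                    and abs(r["ts"] - m["ts"]) < minutes * 60):
                m["sources"].add(r["source"])   # corroboration, not a new event
                break
        else:
            merged.append({**r, "sources": {r["source"]}})
    return [m for m in merged if len(m["sources"]) >= 2]

reports = [
    {"lat": 40.71, "lon": -74.00, "ts": 1000, "source": "twitter"},
    {"lat": 40.712, "lon": -74.001, "ts": 1300, "source": "dot"},
    {"lat": 41.00, "lon": -73.90, "ts": 1100, "source": "radio"},
]
print(dedupe_incidents(reports))    # only the corroborated incident survives
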
TTA-AA-
C1508
Friendbook - A
Semantic-based Friend
Recommendation
System for Social
Networks
Existing social networking services
recommend friends to users based on
their social graphs, which may not be the most
appropriate to reflect a user's preferences
on friend selection in real life. In this paper, we
present Friendbook, a novel semantic-
based friend recommendation system for social
networks, which recommends friends to
users based on their life styles instead
of social graphs. By taking advantage of
sensor-rich smart phones, Friendbook
discovers life styles of users from user-centric
sensor data, measures the similarity of life
styles between users, and
recommends friends to users if their life styles
have high similarity. Inspired by text mining,
we model a user's daily life as life documents,
from which his/her life styles are extracted by
using the Latent Dirichlet Allocation
algorithm. We further propose a similarity
metric to measure the similarity of life styles
between users, and calculate users' impact in
terms of life styles with a friend-matching
graph. Upon receiving a request, Friendbook
returns a list of people with the
highest recommendation scores to the query
user. Finally, Friendbook integrates a
feedback mechanism to further improve
the recommendation accuracy. We have
implemented Friendbook on Android-
based smart phones, and evaluated its
performance in both small-scale experiments
and large-scale simulations. The results show
that the recommendations accurately reflect the
preferences of users in choosing friends.
IEEE 2015
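
A sketch of the pipeline with scikit-learn (an assumed tool choice, not the paper's implementation): LDA turns each user's "life document" into a topic mixture, and cosine similarity between mixtures ranks candidate friends. The activity vocabulary is invented, and the friend-matching graph and feedback loop are omitted.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

life_documents = [
    "run gym run protein sleep run gym",          # user 0
    "code coffee code code meeting coffee",       # user 1
    "gym run sleep gym protein run",              # user 2
]
counts = CountVectorizer().fit_transform(life_documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
life_styles = lda.fit_transform(counts)           # per-user topic mixtures

sim = cosine_similarity(life_styles)              # life-style similarity
user = 0
ranked = sorted((s, u) for u, s in enumerate(sim[user]) if u != user)
print("friends for user 0, best first:", [u for _, u in reversed(ranked)])
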
TTA-AA-
C1509
Blood Banking System
Using Android
Automated Blood Bank is a project that brings
voluntary blood donors and those in
need of blood onto a common platform. The
mission is to fulfill every blood request in the
country with a promising Android application
and motivated individuals who are willing to
donate blood. The proposed work aims to
overcome the communication barrier by
providing a direct link between the donor and
the recipient, using a low-cost and low-power
Raspberry Pi B+ kit that requires only a micro
USB 5 V, 2 A power supply. The entire
communication takes place via SMS (Short
Message Service), which is compatible
with all mobile phone types. The project aims
at serving persons who seek donors willing to
donate blood within the required time frame;
it is an endeavor to reach out to those in
want of blood and connect them with those
willing to donate. The proposed work explores
finding blood donors by using a GSM-based
smart card CPU (Raspberry Pi B+ kit). The
vision is to be “The hope of every Indian in
search of a voluntary blood donor”.
IEEE 2015
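
The SMS link can be sketched with standard GSM AT commands over a serial line (via pyserial); the device path, phone number, and timing delays below are placeholders, not the project's actual code.

import serial, time

def send_sms(port, number, text):
    with serial.Serial(port, 9600, timeout=5) as modem:
        modem.write(b"AT+CMGF=1\r")           # switch modem to text mode
        time.sleep(0.5)
        modem.write(f'AT+CMGS="{number}"\r'.encode())
        time.sleep(0.5)
        modem.write(text.encode() + b"\x1a")  # Ctrl-Z terminates the message
        time.sleep(2)
        return modem.read_all()

send_sms("/dev/ttyUSB0", "+911234567890", "Donor match found: O+ near you")
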
DOMAIN : IMAGE PROCESSING
TTA-AI-
C1501
Smartphone-Based
Wound Assessment
System for Patients
With Diabetes
Diabetic foot ulcers represent a significant
health issue. Currently, clinicians and nurses
mainly base their wound assessment on visual
examination of wound size and healing status,
while the patients themselves seldom have an
opportunity to play an active role. Hence, a
more quantitative and cost-effective
examination method that enables
the patients and their caregivers to take a more
active role in daily wound care potentially can
accelerate wound healing, save travel cost and
reduce healthcare expenses. Considering the
prevalence of smart phones with a high-
resolution digital camera, assessing wounds by
analyzing images of chronic foot ulcers is an
attractive option. In this paper, we propose a
novel wound image
analysis system implemented solely on the
Android smart phone. The wound image is
captured by the camera on the smart
phone with the assistance of an image capture
box. After that, the smart
phone performs wound segmentation by
applying the accelerated mean-shift algorithm.
Specifically, the outline of the foot is
determined based on skin color, and
the wound boundary is found using a simple
connected region detection method. Within
the wound boundary, the healing status is next
assessed based on the red-yellow-black color
evaluation model. Moreover, the healing status
is quantitatively assessed, based on trend
analysis of time records for a given patient.
Experimental results on wound images
collected in UMASS-Memorial Health
Center Wound Clinic (Worcester, MA)
following an Institutional Review Board
approved protocol show that our system can be
efficiently used to analyze the wound healing
status with promising accuracy.
IEEE 2015
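
A loose OpenCV sketch of the segmentation and color-evaluation stages, assuming cv2.pyrMeanShiftFiltering as the mean-shift step and ad hoc skin and color thresholds; the paper's accelerated mean-shift, capture-box calibration, and clinical thresholds all differ, and the file name is a placeholder.

import cv2
import numpy as np

img = cv2.imread("foot_ulcer.jpg")                       # placeholder image
smooth = cv2.pyrMeanShiftFiltering(img, sp=15, sr=30)    # mean-shift step

# crude skin mask in YCrCb to outline the foot
ycrcb = cv2.cvtColor(smooth, cv2.COLOR_BGR2YCrCb)
skin = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))

# non-skin pixels inside the closed foot silhouette are wound candidates
foot = cv2.morphologyEx(skin, cv2.MORPH_CLOSE, np.ones((25, 25), np.uint8))
wound = cv2.bitwise_and(foot, cv2.bitwise_not(skin))

# red-yellow-black healing evaluation over the wound pixels only
ys, xs = np.where(wound > 0)
b, g, r = [img[ys, xs][:, c].astype(int) for c in range(3)]
red = int(((r > 120) & (r > g + 30)).sum())              # granulation
yellow = int(((r > 120) & (g > 120) & (b < 100)).sum())  # slough
black = int(((r < 60) & (g < 60) & (b < 60)).sum())      # necrosis
total = max(len(ys), 1)
print({"red": red / total, "yellow": yellow / total, "black": black / total})
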
TTA-AI-
C1502
Hand Gesture
Recognition Using
Kinect Sensor
Hand gestures are becoming one of the most
common ways for people to interact with
information technology products,
bringing users an interesting experience.
Recently developed 3D cameras, e.g., the
Kinect, provide not only color images but also
depth maps. This opens new opportunities in
the development of
human-computer interaction (HCI) applications.
This paper presents a
novel hand gesture recognition method based
on the depth image obtained from
the Kinect sensor. Firstly, the hand region is
extracted by thresholding around
the hand point detected using the NITE 2 library
provided by PrimeSense. Secondly, we extract
the feature vector including the number of
open fingers, the angles between the fingertips
and horizontal of the hand, the angles between
two consecutive fingers, and the difference
between the distance from the hand center to
the fingertips and the radius of the biggest
inscribed circle. Finally, a support vector
machine (SVM) is applied to identify
different gestures. The experimental result
shows that the proposed method
performs hand gesture recognition at accuracy
of 95% in real time.
IEEE 2015
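
The classification stage reduces to a standard SVM over the geometric features listed above; the sketch below uses scikit-learn with fabricated feature values purely to show the shape of the training and prediction calls.

from sklearn.svm import SVC

# each row: [open_fingers, mean_fingertip_angle, mean_inter_finger_angle,
#            mean(fingertip_dist - inscribed_circle_radius)]
X_train = [
    [5, 90.0, 22.0, 1.8],   # open palm
    [0, 0.0, 0.0, 0.1],     # fist
    [2, 75.0, 30.0, 1.5],   # "V" sign
    [5, 88.0, 20.0, 1.7],
    [0, 0.0, 0.0, 0.2],
    [2, 78.0, 28.0, 1.4],
]
y_train = ["palm", "fist", "v", "palm", "fist", "v"]

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print(clf.predict([[5, 91.0, 21.0, 1.75]]))     # -> ['palm']
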
DOMAIN : MOBILE COMPUTING
TTA-AM-
C1501
Timer-based Bloom
Filter Aggregation for
Reducing Signaling
Overhead in
Distributed Mobility
Management
Distributed mobility management (DMM) is a
promising technology to address the mobile
data traffic explosion problem. Since the
location information of mobile nodes (MNs)
is distributed across several mobility agents
(MAs), DMM requires an additional
mechanism to share the location information of
MNs between MAs. In the literature, multicast
or distributed hash table (DHT)-based sharing
methods have been suggested; however, they
incur significant signaling overhead owing to
unnecessary location information updates
under frequent handovers.
To reduce the signaling overhead, we propose
a timer-
based Bloom filter aggregation (TBFA)
scheme for distributing the location
information. In the TBFA scheme, the location
information of MNs is maintained
by Bloom filters at each MA. Also, since the
propagation of the whole Bloom filter for
every MN movement leads to
high signaling overhead, each MA only
propagates changed indexes in
the Bloom filter when a pre-
defined timer expires. To verify the
performance of the TBFA scheme, we develop
analytical models on
the signaling overhead and the latency and
devise an algorithm to select an
appropriate timer value. Extensive simulation
results are given to show the accuracy of
analytical models and effectiveness of the
TBFA scheme over the existing DMM scheme.
IEEE 2015
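
A minimal sketch of the TBFA idea: each MA keeps a Bloom filter of its attached MNs, queues the bit indexes that change, and ships only that delta to peer MAs when the timer fires. The filter size, hash count, and the external timer trigger are illustrative assumptions, not the paper's tuned values.

import hashlib

class TimerBloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)
        self.pending = set()                 # indexes changed since last flush

    def _indexes(self, mn_id):
        digest = hashlib.sha256(mn_id.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def register(self, mn_id):               # MN attaches to this MA
        for i in self._indexes(mn_id):
            if not self.bits[i]:
                self.bits[i] = 1
                self.pending.add(i)          # defer propagation

    def on_timer(self):                      # pre-defined timer expires
        delta, self.pending = self.pending, set()
        return sorted(delta)                 # propagate only changed indexes

    def might_host(self, mn_id):             # peer-side membership lookup
        return all(self.bits[i] for i in self._indexes(mn_id))

ma = TimerBloomFilter()
for mn in ("mn-1", "mn-2", "mn-3"):
    ma.register(mn)
print("delta to propagate:", ma.on_timer())
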

Final Year Project IEEE 2015

  • 1.
    TTA FINAL YEAR PROJECTSTITLES WITH ABSTRACT www.ttafinalyearprojects.com IEEE 2015, 2014, 2013, 2012, etc.., Projects for B.E/B.Tech/M.E/MCA/Bsc/Msc For complete base paper, call now and talk to our expert 90942066260 | 9042066280| 044 4353 3393
  • 2.
    DOMAIN : NETWORKING CODEPROJECT TITLE DESCRIPTION REFERENCE TTA-DN- C1501 Delay Analysis of Multichannel Opportunistic Spectrum Access MAC Protocols We provide a comprehensive delay and queuing analysis for two baseline medium access control protocols for multi-user cognitive radio networks with homogeneous users and channels and investigate the impact of different network parameters on the system performance. In addition to an accurate Markov chain, which follows the queue status of all users, several lower complexity queuing theory approximations are provided. Accuracy and performance of the proposed analytical approximations are verified with extensive simulations. It is observed that using an Aloha- type access to the control channel, a buffering MAC protocol, where in case of interruption the CR user waits for the primary user to vacate the channel before resuming the transmission, outperforms a switching MAC protocol, where the CR user vacates the channel in case of appearance of primary users and then compete again to gain access to a new channel. The reason is that the delay bottleneck for both protocols is the time required to successfully access the control channel, which occurs more frequently for the switching MAC protocol. It is thus shown that a clustering approach, where users are divided into clusters with a separate control channel per cluster, can significantly improve the performance by reducing the competitions over control channel. IEEE 2015 TTA-DN- C1502 LEISURE A Framework for Load-Balanced Network - Wide Traffic Measurement Network-wide traffic measurement is of interest to network operators to uncover global network behavior for the management tasks of traffic accounting, debugging or troubleshooting, security, and traffic engineering. Increasingly, sophisticated network measurement tasks such as anomaly detection and security forensic analysis are requiring in-depth fine-grained IEEE 2015
  • 3.
    flow-level measurements. However, performingin-depth per- flow measurements (e.g., detailed payload analysis) is often an expensive process. Given the fast-changing Internet traffic landscape and large traffic volume, a single monitor is not capable of accomplishing the measurement tasks for all applications of interest due to its resource constraint. Moreover, uncovering global network behavior requires network-wide traffic measurements at multiple monitors across the network since traffic measured at any single monitor only provides a partial view and may not be sufficient or accurate. These factors call for coordinated measurements among multiple distributed monitors. In this paper, we present a centralized optimization framework, LEISURE (Load- Equalized measurement), for load- balancing network measurement workloads across distributed monitors. Specifically, we consider various load-balancing problems under different objectives and study their extensions to support different deployment scenarios. We evaluate LEISURE via detailed simulations on Abilene and GEANT network traces to show that LEISURE can achieve much better load- balanced performance (e.g., 4.75X smaller peak workload and 70X smaller variance in workloads) across all coordinated monitors in comparison to naive solution (uniform assignment) to accomplish network- wide traffic measurement tasks. TTA-DN- C1503 Authenticated Key Exchange Protocols for Parallel Network File Systems We study the problem of key establishment for secure many-to-many communications. The problem is inspired by the proliferation of large-scale distributed file systems supporting parallel acc ess to multiple storage devices. Our work focuses on the current Internet standard for such file systems, i.e., parallel Network File System (pNFS), which makes use of Kerberos to IEEE 2015
  • 4.
    establish parallel sessionkeys between clients and storage devices. Our review of the existing Kerberos-based protocol shows that it has a number of limitations: (i) a metadata server facilitating key exchange between the clients and the storage devices has heavy workload that restricts the scalability of the protocol; (ii) the protocol does not provide forward secrecy; (iii) the metadata server generates itself all the session keys that are used between the clients and storage devices, and this inherently leads to key escrow. In this paper, we propose a variety of authenticated key exchange protocols that are designed to address the above issues. We show that our protocols are capable of reducing up to approximately 54% of the workload of the metadata server and concurrently supporting forward secrecy and escrow-freeness. All this requires only a small fraction of increased computation overhead at the client. TTA-DN- C1504 Diversifying Web Service Recommendation Results via Exploring Service Usage History The last decade has witnessed a tremendous growth of Web services as a major technology for sharing data, computing resources, and programs on the Web. With the increasing adoption and presence of Web services, design of novel approaches for effective Web service recommendation to satisfy users’ potential requirements has become of paramount importance. Existing Web service commendation approaches mainly focus on predicting missing QoS values of Web service candidates which are interesting to a user using collaborative filtering approach, content-based approach, or their hybrid. These recommendation approaches assume that recommended Web services are independent to each other, which sometimes may not be true. As a result, many similar or redundant Web services may exist in a recommendation list. In this paper, we propose a novel Web service recommendation approach IEEE 2015
  • 5.
    incorporating a user’spotential QoS preferences and diversity feature of user interests on Web services. User’s interests and QoS preferences on Web services are first mined by exploring the Web service usage history. Then we compute scores of Web service candidates by measuring their relevance with historical and potential user interests, and their QoS utility. We also construct a Web service graph based on the functional similarity between Web services. Finally, we present an innovative diversity- aware Web service ranking algorithm to rank the Web service candidates based on their scores, and diversity degrees derived from the Web service graph. Extensive experiments are conducted based on a real world Web service dataset, indicating that our proposed Web service recommendation approa ch significantly improves the quality of their commendation results compared with existing methods. TTA-DN- C1505 Virtual Servers Co- Migration for Mobile Accesses Online vs. Offline In this paper, we study the problem of co- migrating a set of service replicas residing on one or more redundant virtual servers in clouds in order to satisfy a sequence of mobile batch- request demands in a cost effective way. With such a migration, we can not only reduce the service access latency for end users but also minimize the network costs for service providers. The co-migration can be achieved at the cost of bulk-data transfer and increases the overall monetary costs for the service providers. To gain the benefits of service migration while minimizing the overall costs, we propose a co-migration algorithm Migk for multiple servers, each hosting a service replica. Migk is a randomized algorithm with a competitive cost of O(γ log n/min{1/k, μ/λ+μ}) to migrate κ services in a static n-node network where γ is the maximal ratio of the migration costs between any pair of neighbor nodes in the network, and where λ and μ represent the maximum wired IEEE 2015
  • 6.
    transmission cost andthe wireless link cost respectively. For comparison, we also study this problem in its static off-line form by proposing a parallel dynamic programming (hereafter DP) based algorithm that integrates the branch & bound strategy with sampling techniques in order to approximate the optimal DP results. We validate the advantage of the proposed algorithms via extensive simulation studies using various requests patterns and cloud network topologies. Our simulation results show that the proposed algorithms can effectively adapt to mobile access patterns to satisfy the service request sequences in a cost- effective way. TTA-DN- C1506 Anomaly-Based Network Intrusion Detection System We present POSEIDON, a new anomaly- based network intrusion detection system. POSEIDON is payload-based, and has a two- tier architecture: the first stage consists of a self-organizing map, while the second one is a modified PAYL system. Our benchmarks on the 1999 DARPA data set show a higher detection rate and lower number of false positives than PAYL and PHAD IEEE 2015 TTA-DN- C1507 CEDAR A Low-Latency and Distributed Strategy for Packet Recovery in Wireless Networks Underlying link-layer protocols of well- established wireless networks that use the conventional “store-and-forward” design paradigm cannot provide highly sustainable reliability and stability in wireless communication, which introduce significant barriers and setbacks in scalability and deployments of wireless networks. In this paper, we propose a Code Embedded Distributed Adaptive and Reliable (CEDAR) link-layer framework that targets low latency and balancing en/decoding load among nodes. CEDAR is the first comprehensive theoretical framework for analyzing and designing distributed and adaptive error recovery for wireless networks. It employs a theoretically sound framework for embedding channel codes in each packet and performs the error correcting process in selected intermediate nodes in a packet's route. To identify the intermediate nodes for the IEEE 2015
  • 7.
    decoding, we mathematicallycalculate the average packet delay and formalize the problem as a nonlinear integer programming problem. By minimizing the delays, we derive three propositions that: 1) can identify the intermediate nodes that minimize the propagation and transmission delay of a packet; and 2) and 3) can identify the intermediate nodes that simultaneously minimize the queuing delay and maximize the fairness of en/decoding load of all the nodes. Guided by the propositions, we then propose a scalable and distributed scheme in CEDAR to choose the intermediate en/decoding nodes in a route to achieve its objective. The results from real-world test bed “NESTbed” and simulation with MATLAB prove that CEDAR is superior to schemes using hop-by-hop decoding and destination decoding not only in packet delay and throughput but also in energy-consumption and load distribution balance. TTA-DN- C1508 CoCoWa A Collaborative Contact- Based Watchdog for Detecting Selfish Nodes Mobile ad-hoc networks (MANETs) assume that mobile nodes voluntary cooperate in order to work properly. This cooperation is a cost- intensive activity and some nodes can refuse to cooperate, leading to a selfish node behavior. Thus, the overall network performance could be seriously affected. The use of watchdogs is a well-known mechanism to detect selfish nodes. However, the detection process performed by watchdogs can fail, generating false positives and false negatives that can induce to wrong operations. Moreover, relying on local watchdogs alone can lead to poor performance when detecting selfish nodes, in term of precision and speed. This is specially important on networks with sporadic contacts, such as delay tolerant networks (DTNs), where sometimes watchdogs lack of enough time or information to detect the selfish nodes. Thus, we propose collaborative contact-based watchdog (CoCoWa) as a collaborative approach based on the diffusion of local selfish nodes awareness when IEEE 2015
  • 8.
    a contact occurs,so that information about selfish nodes is quickly propagated. As shown in the paper, this collaborative approach reduces the time and increases the precision when detecting selfish nodes. TTA-DN- C1509 Distributed Opportunistic Scheduling for EnergyHarvesting Based Wireless Networks A Two- StageProbing Approach This paper considers a heterogeneous ad hoc network with multiple transmitter-receiver pairs, in which all transmitters are capable of harvesting renewable energy from the environment and compete for one shared channel by random access. In particular, we focus on two different scenarios: the constant energy harvesting (EH) rate model where the EH rate remains constant within the time of interest and the i.i.d. EH rate model where the EH rates are independent and identically distributed across different contention slots. To quantify the roles of both the energy state information (ESI) and the channel state information (CSI), a distributed opportunistic scheduling (DOS) framework with two-stage probing and save- then-transmit energy utilization is proposed. Then, the optimal throughput and the optimal scheduling strategy are obtained via one- dimension search, i.e., an iterative algorithm consisting of the following two steps in each iteration: First, assuming that the stored energy level at each transmitter is stationary with a given distribution, the expected throughput maximization problem is formulated as an optimal stopping problem, whose solution is proven to exist and then derived for both models; second, for a fixed stopping rule, the energy level at each transmitter is shown to be stationary and an efficient iterative algorithm is proposed to compute its steady-state distribution. Finally, we validate our analysis by numerical results and quantify the throughput gain compared with the best-effort delivery scheme. IEEE 2015 TTA-DN- C1510 Enabling Efficient Multi- Keyword Ranked Search Over Encrypted Mobile Cloud Data In mobile cloud computing, a fundamental application is to outsource the mobile data to external cloud servers for scalable data storage. The outsourced data, however, need to IEEE 2015
  • 9.
    Through Blind Storagebe encrypted due to the privacy and confidentiality concerns of their owner. This results in the distinguished difficulties on the accurate search over the encrypted mobile cloud data. To tackle this issue, in this paper, we develop the searchable encryption for multi- keyword ranked search over the storage data. Specifically, by considering the large number of outsourced documents (data) in the cloud, we utilize the relevance score and k-nearest neighbor techniques to develop an efficient multi-keyword search scheme that can return the ranked search results based on the accuracy. Within this framework, we leverage an efficient index to further improve the search efficiency, and adopt the blind storage system to conceal access pattern of the search user. Security analysis demonstrates that our scheme can achieve confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealing access pattern of the search user. Finally, using extensive simulations, we show that our proposal can achieve much improved efficiency in terms of search functionality and search time compared with the existing proposals. TTA-DN- C1511 Energy-Efficient Group Key Agreement for Wireless Networks Advances in lattice-based cryptography are enabling the use of public key algorithms (PKAs) in power-constrained ad hoc and sensor network devices. Unfortunately, while many wireless networks are dominated by group communications, PKAs are inherently unicast i.e., public/private key pairs are generated by data destinations. To fully realize public key cryptography in these networks, lightweight PKAs should be augmented with energy-efficient mechanisms for group key agreement. We consider a setting where master keys are loaded on clients according to an arbitrary distribution. We present a protocol that uses session keys derived from those master keys to establish a group key that is information- IEEE 2015
TTA-DN-C1511  Energy-Efficient Group Key Agreement for Wireless Networks

Advances in lattice-based cryptography are enabling the use of public key algorithms (PKAs) in power-constrained ad hoc and sensor network devices. Unfortunately, while many wireless networks are dominated by group communications, PKAs are inherently unicast, i.e., public/private key pairs are generated by data destinations. To fully realize public key cryptography in these networks, lightweight PKAs should be augmented with energy-efficient mechanisms for group key agreement. We consider a setting where master keys are loaded on clients according to an arbitrary distribution. We present a protocol that uses session keys derived from those master keys to establish a group key that is information-theoretically secure. When master keys are distributed randomly, our protocol requires O(log_b t) multicasts, where 1 - b^-1 is the probability that a given client possesses a given master key. The minimum number of public multicast transmissions required for a set of clients to agree on a secret key in our setting was recently characterized. The proposed protocol achieves the best possible approximation to that optimum that is computable in polynomial time. Moreover, the computational requirements of our protocol compare favorably to multi-party extensions of Diffie-Hellman key exchange.

IEEE 2015

TTA-DN-C1512  iPath: Path Inference in Wireless Sensor Networks

Recent wireless sensor networks (WSNs) are becoming increasingly complex with the growing network scale and the dynamic nature of wireless communications. Many measurement and diagnostic approaches depend on per-packet routing paths for accurate and fine-grained analysis of complex network behaviors. In this paper, we propose iPath, a novel path inference approach for reconstructing per-packet routing paths in dynamic and large-scale networks. The basic idea of iPath is to exploit high path similarity to iteratively infer long paths from short ones. iPath starts with an initial known set of paths and performs path inference iteratively. iPath includes a novel design of a lightweight hash function for verification of the inferred paths. In order to further improve the inference capability as well as the execution efficiency, iPath includes a fast bootstrapping algorithm to reconstruct the initial set of paths. We also implement iPath and evaluate its performance using traces from large-scale WSN deployments as well as extensive simulations. Results show that iPath achieves much higher reconstruction ratios under different network settings compared with other state-of-the-art approaches.

IEEE 2015
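The iterate-and-verify loop behind iPath is easy to picture: the sink keeps a set of known paths, and a longer path is accepted only when a candidate one-hop extension reproduces the path digest reported in the packet. A minimal sketch under those assumptions, with a hypothetical four-node topology and a truncated SHA-1 standing in for the paper's lightweight hash:

import hashlib

def path_hash(path):
    """Path digest carried in each packet (stand-in for iPath's lightweight hash)."""
    return hashlib.sha1("->".join(path).encode()).hexdigest()[:8]

# hypothetical topology: node -> possible next hops toward the sink S
neighbors = {"A": ["B", "C"], "B": ["C", "S"], "C": ["S"], "S": []}

# packets observed at the sink: (source node, reported path hash)
packets = [("A", path_hash(["A", "B", "C", "S"])),
           ("B", path_hash(["B", "C", "S"])),
           ("C", path_hash(["C", "S"]))]

known = {"S": ["S"]}             # bootstrap: the sink's own trivial path
changed = True
while changed:                   # iteratively infer long paths from short ones
    changed = False
    for src, h in packets:
        if src in known:
            continue
        for nxt in neighbors[src]:
            if nxt in known and path_hash([src] + known[nxt]) == h:
                known[src] = [src] + known[nxt]   # extension verified by the hash
                changed = True
print(known)

Note how A's path cannot be verified until B's has been inferred, which is exactly why the inference must iterate.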
TTA-DN-C1513  Joint Static and Dynamic Traffic Scheduling in Data Center Networks

The advent and continued growth of large data centers has led to much interest in switch architectures that can economically meet the high capacities needed for interconnecting the thousands of servers in these data centers. Various multilayer architectures employing thousands of switches have been proposed in the literature. We make use of the observation that the traffic in a data center is a mixture of relatively static and rapidly fluctuating components, and develop a combined scheduler for both of these components using a generalization of the load-balanced scheduler. The presence of the known static component introduces asymmetries in the ingress-egress capacities, which preclude the use of a load-balanced scheduler as is. We generalize the load-balanced scheduler and also incorporate an opportunistic scheduler that sends traffic on a direct path when feasible, to enhance the overall switch throughput. Our evaluations show that this scheduler works very well despite avoiding the use of a central scheduler for making packet-by-packet scheduling decisions.

IEEE 2015
TTA-DN-C1514  On Downlink Beamforming with Small Cells in Wireless Heterogeneous Systems

In this letter, we study downlink beamforming for wireless heterogeneous networks with two groups of users. The users in one group (group 1) are supported by the small cell base station (SBS) as well as the macro cell base station (MBS), while the users in the other group (group 2) are supported by the MBS only. The MBS is equipped with an antenna array for downlink beamforming. We formulate a convex optimization problem, which can be solved by semidefinite programming (SDP) relaxation, for downlink beamforming that takes advantage of the presence of the SBS for group 1, but also takes into account the interfering signal from the SBS for group 2.

IEEE 2015

TTA-DN-C1515  On-Demand Discovery of Software Service Dependencies in MANETs

The dependencies among the components of service-oriented software applications hosted in a mobile ad hoc network (MANET) are difficult to determine due to the inherent loose coupling of the services and the transient communication topologies of the network. Yet understanding these dependencies is critical to making good management decisions, since dependence data underlie important analyses such as fault localization and impact analysis. Current methods for discovering dependencies, developed primarily for fixed networks, assume that dependencies change only slowly, and require relatively long monitoring periods as well as substantial memory and communication resources, all of which are impractical in the MANET environment. We describe a new dynamic dependence discovery method designed specifically for this environment, yielding dynamic snapshots of dependence relationships discovered through observations of service interactions. We evaluate the performance of our method in terms of the accuracy of the discovered dependencies, and draw insights on the selection of critical parameters under various operational conditions. Although operated under more stringent conditions, our method is shown to provide results comparable to or better than those of existing methods.

IEEE 2015
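One way to picture observation-based dependence discovery: if a service emits an outgoing request shortly after receiving an incoming one, the two are likely causally related, and aggregating such events yields a dependence snapshot. A toy sketch under that assumption, with a hypothetical event log and window length (the paper's actual inference, aging, and snapshotting are more involved):

WINDOW = 0.05   # seconds: an outgoing call soon after an incoming one implies dependence

# observed message events: (time, service, direction, peer)
log = [
    (0.010, "Order", "in",  "Client"),
    (0.020, "Order", "out", "Stock"),
    (0.025, "Order", "out", "Billing"),
    (0.900, "Stock", "in",  "Order"),
    (0.910, "Stock", "out", "DB"),
]

deps = set()
for i, (t, svc, ev, peer) in enumerate(log):
    if ev != "in":
        continue
    for t2, svc2, ev2, peer2 in log[i + 1:]:
        if svc2 == svc and ev2 == "out" and t2 - t <= WINDOW:
            deps.add((svc, peer2))     # svc depends on peer2 to serve its requests
print(sorted(deps))                    # snapshot of the current dependence graph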
TTA-DN-C1516  PWDGR: Pair-Wise Directional Geographical Routing Based on Wireless Sensor Network

Multipath routing in wireless multimedia sensor networks makes it possible to transfer data simultaneously so as to reduce delay and congestion, and it is worth researching. However, current multipath routing strategies may cause the energy consumption of nodes near the sink to become markedly higher than that of other nodes, which prematurely exhausts those nodes and invalidates the network. This also has a serious impact on the performance of the wireless multimedia sensor network (WMSN). In this paper, we propose a pair-wise directional geographical routing (PWDGR) strategy to solve this energy bottleneck problem. First, the source node sends the data to a pair-wise node around the sink node in accordance with a certain algorithm, and that node then forwards the data to the sink node. These pair-wise nodes are selected equally within the 360-degree scope around the sink according to a certain algorithm. Therefore, PWDGR can effectively relieve the serious energy burden around the sink and also strike a balance between energy consumption and end-to-end delay. Theoretical analysis and extensive simulation experiments indicate that PWDGR is superior to similar strategies, both in theory and in the simulation results. Compared with strategies of the same kind, PWDGR is able to prolong network lifetime by 70 percent, while the measured delay increases by only 8.1 percent relative to those strategies.

IEEE 2015
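The selection step is essentially geometric: spread the candidate relay ("pair-wise") nodes evenly over a ring around the sink so that relaying load is not concentrated on one side. A small sketch of one plausible reading of that step, with random node positions and a hypothetical ring radius and direction count (the paper's actual selection algorithm may differ in detail):

import math, random

random.seed(1)
nodes = [(random.uniform(-100, 100), random.uniform(-100, 100)) for _ in range(200)]
SINK = (0.0, 0.0)
RING_R = 60.0      # assumed distance of pair-wise nodes from the sink
K = 8              # spread pair-wise nodes over K evenly spaced directions

def pairwise_nodes():
    """Pick, for each direction on the 360-degree ring, the node closest to it."""
    chosen = []
    for i in range(K):
        theta = 2 * math.pi * i / K
        target = (SINK[0] + RING_R * math.cos(theta),
                  SINK[1] + RING_R * math.sin(theta))
        chosen.append(min(nodes, key=lambda n: math.dist(n, target)))
    return chosen

for i, n in enumerate(pairwise_nodes()):
    print(f"direction {i * 360 // K:3d} deg -> relay at ({n[0]:6.1f}, {n[1]:6.1f})")

Rotating which relay a source uses then spreads the forwarding burden over the whole ring instead of the sink's immediate neighborhood.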
TTA-DN-C1517  REAL: A Reciprocal Protocol for Location Privacy in Wireless Sensor Networks

K-anonymity has been used to protect location privacy for location monitoring services in wireless sensor networks (WSNs), where sensor nodes work together to report k-anonymized aggregate locations to a server. Each k-anonymized aggregate location is a cloaked area that contains at least k persons. However, we identify an attack model showing that overlapping aggregate locations still pose privacy risks, because an adversary can infer some overlapping areas with fewer than k persons, which violates the k-anonymity privacy requirement. In this paper, we propose a reciprocal protocol for location privacy (REAL) in WSNs. In REAL, sensor nodes are required to autonomously organize their sensing areas into a set of non-overlapping and highly accurate k-anonymized aggregate locations. To confront the three key challenges in REAL, namely self-organization, the reciprocity property, and high accuracy, we design a state transition process, a locking mechanism, and a time delay mechanism, respectively. We compare the performance of REAL with that of current protocols through simulated experiments. The results show that REAL protects location privacy, provides more accurate query answers, and reduces communication and computational costs.

IEEE 2015

TTA-DN-C1518  SanGA: A Self-Adaptive Network-Aware Approach to Service Composition

Service-Oriented Computing enables the composition of loosely coupled services provided with varying Quality of Service (QoS) levels. Selecting a near-optimal set of services for a composition in terms of QoS is crucial when many functionally equivalent services are available. As the number of distributed services, particularly in the cloud, is rising rapidly, the impact of the network on the QoS keeps increasing. Despite this, current approaches do not differentiate between the QoS of the services themselves and that of the network. Therefore, the computed latency differs from the actual latency, resulting in suboptimal QoS. Thus, we propose a network-aware approach that handles the QoS of services and the QoS of the network independently. First, we build a network model in order to estimate the network latency between arbitrary services and potential users. Our selection algorithm then leverages this model to find compositions with a low latency for a given execution policy. We employ a self-adaptive genetic algorithm which balances the optimization of latency and other QoS as needed and improves the convergence speed. In our evaluation, we show that our approach works under realistic network conditions, efficiently computing compositions with much lower latency and otherwise equivalent QoS compared with current approaches.

IEEE 2015
TTA-DN-C1519  Secure Binary Image Steganography Based on Minimizing the Distortion on the Texture

Most state-of-the-art binary image steganographic techniques only consider the flipping distortion according to the human visual system, which leaves them insecure when attacked by steganalyzers. In this paper, a binary image steganographic scheme that aims to minimize the embedding distortion on the texture is presented. We first extract the complement, rotation, and mirroring-invariant local texture patterns (crmiLTPs) from the binary image. The weighted sum of crmiLTP changes when flipping one pixel is then employed to measure the flipping distortion corresponding to that pixel. By testing on both simple binary images and the constructed image data set, we show that the proposed measurement can well describe the distortions on both visual quality and statistics. Based on the proposed measurement, a practical steganographic scheme is developed. The steganographic scheme generates the cover vector by dividing the scrambled image into superpixels. Thereafter, the syndrome-trellis code is employed to minimize the designed embedding distortion. Experimental results demonstrate that the proposed steganographic scheme can achieve statistical security without degrading image quality or embedding capacity.

IEEE 2015
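The key ingredient is a per-pixel flipping cost derived from local texture. As a crude stand-in for the weighted crmiLTP measure, the sketch below scores a flip by how uniform the pixel's 3x3 neighborhood is: flips inside smooth regions are visible and costly, flips on busy texture boundaries are cheap. A syndrome-trellis coder would then embed the payload while minimizing the summed cost:

import random

random.seed(0)
H, W = 8, 8
img = [[random.randint(0, 1) for _ in range(W)] for _ in range(H)]

def flip_cost(r, c):
    """Texture-based flipping cost: flipping a pixel deep inside a uniform
    region is visually obvious (high cost), while flipping on a busy texture
    boundary is not. A simplified stand-in for the weighted crmiLTP score."""
    same = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < H and 0 <= c + dc < W:
                same += img[r + dr][c + dc] == img[r][c]
    return same

cost = [[flip_cost(r, c) for c in range(W)] for r in range(H)]
for row in cost:
    print(row)    # the embedder prefers pixels where the cost is lowest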
TTA-DN-C1520  Software Puzzle: A Countermeasure to Resource-Inflated Denial-of-Service Attacks

Denial-of-service (DoS) and distributed DoS (DDoS) attacks are among the major threats to cyber-security, and client puzzles, which demand that a client perform computationally expensive operations before being granted services by a server, are a well-known countermeasure to them. However, an attacker can inflate its DoS/DDoS capability with fast puzzle-solving software and/or built-in graphics processing unit (GPU) hardware, significantly weakening the effectiveness of client puzzles. In this paper, we study how to prevent DoS/DDoS attackers from inflating their puzzle-solving capabilities. To this end, we introduce a new client puzzle referred to as a software puzzle. Unlike existing client puzzle schemes, which publish their puzzle algorithms in advance, a puzzle algorithm in the present software puzzle scheme is randomly generated only after a client request is received at the server side, and the algorithm is generated such that: 1) an attacker is unable to prepare an implementation to solve the puzzle in advance, and 2) the attacker needs considerable effort to translate a central processing unit puzzle software to its functionally equivalent GPU version, such that the translation cannot be done in real time. Moreover, we show how to implement the software puzzle in the generic server-browser model.

IEEE 2015
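For contrast with the paper's dynamically generated puzzles, the classic client puzzle it hardens is a fixed, published work function such as hash-preimage search: the client must find a nonce giving a digest with a prescribed number of leading zero bits, which is expensive to solve but takes one hash call to verify. A minimal sketch (the difficulty value is arbitrary):

import hashlib, os, time

DIFFICULTY = 18   # leading zero bits required; tuned to target a solving time

def new_puzzle():
    return os.urandom(8)                      # fresh server challenge

def solve(challenge):
    """Brute-force a nonce so SHA-256(challenge || nonce) starts with
    DIFFICULTY zero bits: the classic client-puzzle work function."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce
        nonce += 1

def verify(challenge, nonce):
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

c = new_puzzle()
t0 = time.time()
n = solve(c)                                  # client pays the cost ...
print(f"solved in {time.time() - t0:.2f}s, verified: {verify(c, n)}")  # ... server checks cheaply

Because this work function is public and data-parallel, a GPU can solve it orders of magnitude faster than a CPU, which is precisely the inflation the software puzzle is designed to block.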
TTA-DN-C1521  Task Allocation for Wireless Sensor Networks Using Modified Binary Particle Swarm Optimization

Many applications of wireless sensor networks (WSNs) require the execution of several computationally intense in-network processing tasks. Collaborative in-network processing among multiple nodes is essential when executing such tasks, due to the strictly constrained energy and resources of a single node. Task allocation is essential to assign the workload of each task to suitable nodes in an efficient manner. In this paper, a modified version of binary particle swarm optimization (MBPSO), which adopts a different transfer function and a new position updating procedure with mutation, is proposed for the task allocation problem. Each particle in MBPSO is encoded to represent a complete potential solution for task allocation. The task workload and connectivity requirements are ensured by taking them as constraints of the problem. Multiple metrics, including task execution time, energy consumption, and network lifetime, are considered as a whole by designing a hybrid fitness function to achieve the best overall performance. Simulation results show the feasibility of the proposed MBPSO-based approach for the task allocation problem in WSNs. The proposed approach also outperforms approaches based on the genetic algorithm and standard BPSO in the comparative analysis.

IEEE 2015
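In binary PSO the velocity update is the usual PSO rule, but positions are bits: a transfer function (classically a sigmoid) maps each velocity component to the probability of the bit being 1, and MBPSO additionally mutates bits to keep the swarm exploring. A compact sketch on a deliberately tiny problem: two nodes, balance-the-makespan fitness, and the standard sigmoid (the paper's modified transfer function, connectivity constraints, and hybrid fitness are simplified away):

import math, random

random.seed(7)
COST = [4, 2, 7, 1, 5, 3, 6, 2]      # hypothetical per-task execution cost
T, SWARM, ITER = len(COST), 12, 60
W_INERTIA, C1, C2, PM = 0.7, 1.5, 1.5, 0.02

def fitness(bits):
    """Makespan of a 2-node allocation: bit t sends task t to node 0 or 1."""
    load = [0, 0]
    for t, b in enumerate(bits):
        load[b] += COST[t]
    return max(load)

def sigmoid(v):                       # transfer function: velocity -> P(bit = 1)
    return 1 / (1 + math.exp(-v))

pos = [[random.randint(0, 1) for _ in range(T)] for _ in range(SWARM)]
vel = [[0.0] * T for _ in range(SWARM)]
pbest = [p[:] for p in pos]
gbest = min(pos, key=fitness)[:]

for _ in range(ITER):
    for i in range(SWARM):
        for d in range(T):
            vel[i][d] = (W_INERTIA * vel[i][d]
                         + C1 * random.random() * (pbest[i][d] - pos[i][d])
                         + C2 * random.random() * (gbest[d] - pos[i][d]))
            pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            if random.random() < PM:  # mutation keeps the swarm exploring
                pos[i][d] ^= 1
        if fitness(pos[i]) < fitness(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=fitness)[:]

print("allocation:", gbest, "makespan:", fitness(gbest))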
TTA-DN-C1522  Towards Distributed Optimal Movement Strategy for Data Gathering in Wireless Sensor Networks

In this paper, we address how to design a distributed movement strategy for mobile collectors, which can be either physical mobile agents or query/collector packets periodically launched by the sink, to achieve successful data gathering in wireless sensor networks. Formulating the problem as general random walks on a graph composed of sensor nodes, we analyze how much data can be successfully gathered in time under any Markovian random-walk movement strategy for mobile collectors moving over a graph (or network), while each sensor node is equipped with limited buffer space and data arrival rates are heterogeneous over different sensor nodes. In particular, from the analysis, we obtain the optimal movement strategy among a class of Markovian strategies so as to minimize the data loss rate over all sensor nodes, and explain how such an optimal movement strategy can be made to work in a distributed fashion. We demonstrate that our distributed optimal movement strategy can achieve a loss rate about two times smaller than that of a standard random walk strategy under diverse scenarios. In particular, our strategy yields up to 70 percent cost savings in the deployment of multiple collectors needed to achieve a target data loss rate, compared with the standard random walk strategy.

IEEE 2015
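The mechanism behind such biased-walk designs can be illustrated with a Metropolis-Hastings walk: each step uses only local information (the two endpoints' degrees and arrival rates), yet the long-run visit frequencies converge to any desired target, e.g. proportional to how fast each node's buffer fills. A sketch on a hypothetical 8-node topology (the paper derives the actual loss-minimizing Markovian strategy; the proportional-to-arrival-rate target here is only illustrative):

import random

random.seed(3)
# ring of 8 sensor nodes with one shortcut; heterogeneous data arrival rates
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
adj[0].append(4); adj[4].append(0)
rate = [5, 1, 1, 1, 5, 1, 1, 1]               # nodes 0 and 4 fill buffers fastest
target = [r / sum(rate) for r in rate]        # desired visit frequencies

def walk(steps, biased):
    """Plain random walk vs. a Metropolis-Hastings walk whose stationary
    distribution matches the target visit frequencies, using only local data."""
    visits, cur = [0] * 8, 0
    for _ in range(steps):
        nxt = random.choice(adj[cur])
        if biased:
            accept = min(1.0, (target[nxt] * len(adj[cur])) /
                              (target[cur] * len(adj[nxt])))
            if random.random() > accept:
                nxt = cur                     # proposal rejected: stay put
        visits[nxt] += 1
        cur = nxt
    return [v / steps for v in visits]

print("plain :", [f"{v:.2f}" for v in walk(200_000, False)])
print("biased:", [f"{v:.2f}" for v in walk(200_000, True)])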
TTA-DN-C1523  Universal Network Coding-Based Opportunistic Routing for Unicast

Network coding-based opportunistic routing has emerged as an elegant way to optimize the capacity of lossy wireless multihop networks by reducing the amount of required feedback messages. Most works on network coding-based opportunistic routing in the literature assume that the links are independent. This assumption has been invalidated by recent empirical studies, which showed that the correlation among the links can be arbitrary. In this work, we show that the performance of network coding-based opportunistic routing is greatly impacted by the correlation among the links. We formulate the problem of maximizing the throughput while achieving fairness under arbitrary channel conditions, and we identify the structure of its optimal solution. As is typical in the literature, the optimal solution requires a large amount of immediate feedback messages, which is unrealistic. We propose the idea of performing network coding on the feedback messages and show that if the intermediate node waits until receiving only one feedback message from each next-hop node, the optimal level of network coding redundancy can be computed in a distributed manner. The coded feedback messages require only a small amount of overhead, as they can be integrated with the packets. Our approach is also oblivious to losses and correlations among the links, as it optimizes the performance without explicit knowledge of these two factors.

IEEE 2015

TTA-JN-C1524  VEGAS: Visual influEnce GrAph Summarization on Citation Networks

Visually analyzing citation networks poses challenges to many fields of data mining research. How can we summarize a large citation graph according to the user's interest? In particular, how can we illustrate the impact of a highly influential paper through the summarization? Can we maintain the sensory node-link graph structure while revealing the flow-based influence patterns and preserving fine readability? State-of-the-art influence maximization algorithms can detect the most influential node in a citation network, but fail to summarize a graph structure to account for its influence. On the other hand, existing graph summarization methods fold large graphs into clustered views, but cannot reveal the hidden influence patterns underneath the citation network. In this paper, we first formally define the Influence Graph Summarization (IGS) problem on citation networks. Second, we propose a matrix-decomposition-based algorithm pipeline to solve the IGS problem. Our method can not only highlight the flow-based influence patterns, but also easily extend to support rich attribute information. A prototype system called VEGAS implementing this pipeline is also developed. Third, we present a theoretical analysis of our main algorithm, which is equivalent to kernel k-means clustering; it can be proved that the matrix-decomposition-based algorithm approximates the objective of the proposed IGS problem. Last, we conduct comprehensive experiments with real-world citation networks to compare the proposed algorithm with classical graph summarization methods. Evaluation results demonstrate that our method significantly outperforms the previous ones in optimizing both the quantitative IGS objective and the quality of the visual summarizations.

IEEE 2015
TTA-JN-C1525  Privacy Protection for Wireless Medical Sensor Data

In recent years, wireless sensor networks have been widely used in healthcare applications, such as hospital and home patient monitoring. Wireless medical sensor networks are more vulnerable to eavesdropping, modification, impersonation, and replaying attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the patient data during transmission, but cannot stop the inside attack in which the administrator of the patient database reveals the sensitive patient data. In this paper, we propose a practical approach to prevent the inside attack by using multiple data servers to store patient data. The main contribution of this paper is securely distributing the patient data among multiple data servers and employing the Paillier and ElGamal cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy.

IEEE 2015
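The Paillier half of that toolbox is additively homomorphic: multiplying ciphertexts adds the underlying plaintexts, so a server can aggregate encrypted readings without decrypting any individual one. A self-contained toy sketch with deliberately tiny primes and made-up readings (real deployments use moduli of 2048 bits or more):

import math, random

p, q = 293, 433                      # toy Paillier primes, illustration only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)                 # valid because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# each server holds encrypted readings; the sum is computed without decryption
readings = [72, 85, 78, 91]          # e.g. heart-rate samples
cipher_sum = 1
for m in readings:
    cipher_sum = (cipher_sum * encrypt(m)) % n2   # additive homomorphism
print("decrypted sum:", decrypt(cipher_sum), "==", sum(readings))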
TTA-JN-C1526  A Decentralized Cloud Firewall Framework with Resource Provisioning Cost Optimization

Cloud computing is becoming popular as the next infrastructure of computing platforms. Despite the promising model and the surrounding hype, security has become the major concern that makes people hesitate to move their applications to clouds. Concretely, cloud platforms are under numerous attacks, so it is essential to establish a firewall to protect the cloud from them. However, setting up a centralized firewall for a whole cloud data center is infeasible from both performance and financial perspectives. In this paper, we propose a decentralized cloud firewall framework for individual cloud customers. We investigate how to dynamically allocate resources to optimize the resource provisioning cost, while simultaneously satisfying the QoS requirements specified by individual customers. Moreover, we establish novel queuing-theory-based M/Geo/1 and M/Geo/m models for quantitative system analysis, where the service times follow a geometric distribution. By employing Z-transform and embedded Markov chain techniques, we obtain a closed-form expression for the mean packet response time. Through extensive simulations and experiments, we conclude that an M/Geo/1 model reflects the real cloud firewall system much better than a traditional M/M/1 model. Our numerical results also indicate that cloud firewalls can be set up at a cost affordable to cloud customers.

IEEE 2015
TTA-JN-C1527  A Privacy-Aware Authentication Scheme for Distributed Mobile Cloud Computing Services

In modern societies, the number of mobile users has risen dramatically in recent years. In this paper, an efficient authentication scheme for distributed mobile cloud computing services is proposed. The proposed scheme provides security and convenience for mobile users to access multiple mobile cloud computing services from multiple service providers using only a single private key. The security strength of the proposed scheme is based on a bilinear pairing cryptosystem and dynamic nonce generation. In addition, the scheme supports mutual authentication, key exchange, user anonymity, and user untraceability. From a system implementation point of view, verification tables are not required for the trusted smart card generator (SCG) service or for the cloud computing service providers when adopting the proposed scheme; consequently, the scheme reduces the memory usage on these service providers. In one mobile user authentication session, only the targeted cloud service provider needs to interact with the service requestor (user). The trusted SCG serves as the secure key distributor for distributed cloud service providers and mobile clients, but is not involved in individual user authentication. With this design, our scheme reduces the authentication processing time required by communication and computation between cloud service providers and a traditional trusted third-party service. Formal security proofs and performance analyses are conducted to show that the scheme is both secure and efficient.

IEEE 2015

TTA-JN-C1528  CPCDN: Content Delivery Powered by Context and User Intelligence

There is an unprecedented trend of content providers (CPs) building their own content delivery networks (CDNs) to provide a variety of content services to their users. By exploiting powerful CP-level information in content distribution, these CP-built CDNs open up a whole new design space and are changing the content delivery landscape. In this paper, we adopt a measurement-based approach to understanding why, how, and how much CP-level intelligence can help content delivery. We first present a measurement study of the CDN built by Tencent, one of the largest content providers in China. We observe new characteristics and trends in content delivery that pose great challenges to the conventional content delivery paradigm and motivate the proposal of CPCDN, a CDN powered by CP-aware information. We then reveal the benefits obtained by exploiting two indispensable CP-level intelligences, namely context intelligence and user intelligence, in content delivery. Inspired by the insights learnt from the measurement studies, we systematically explore the design space of CPCDN and present a novel architecture and algorithms to address the new content delivery challenges that have arisen. Our results not only demonstrate the potential of CPCDN in pushing content delivery performance to the next level, but also identify new research problems calling for further investigation.

IEEE 2015
TTA-JN-C1529  QoS Evaluation for Web Service Recommendation

Web service recommendation is one of the most important fields of research in the area of service computing. The two core problems of Web service recommendation are the prediction of unknown QoS property values and the evaluation of overall QoS according to user preferences. Aiming to address these two problems and their current challenges, we propose two efficient approaches. First, unknown QoS property values are predicted by modeling the high-dimensional QoS data as tensors and utilizing an important tensor operation, i.e., tensor composition, to predict them. Our method, which considers all QoS dimensions integrally and uniformly, allows us to predict multi-dimensional QoS values accurately and easily. Second, the overall QoS is evaluated by an efficient user preference learning method, which learns user preferences from users' rating history, allowing us to obtain user preferences quantifiably and accurately. By solving these two core problems, it becomes possible to compute a realistic value for the overall QoS. The experimental results show our proposed methods to be more efficient than existing methods.

IEEE 2015
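The preference-learning half can be pictured as a small regression: given the normalized QoS vectors of services a user has rated and the user's overall ratings, a least-squares fit recovers how heavily the user weighs each QoS dimension, and the learned weights then score new candidates. A toy sketch with made-up numbers (one simple instantiation of the idea; the paper's learning method and the tensor-based prediction step are not reproduced here):

import numpy as np

# QoS of services the user rated, normalized to [0, 1]
# columns: response time, throughput, availability
Q = np.array([[0.9, 0.2, 0.8],
              [0.4, 0.9, 0.7],
              [0.8, 0.5, 0.9],
              [0.2, 0.8, 0.6],
              [0.7, 0.7, 0.8]])
ratings = np.array([4.5, 3.1, 4.4, 2.6, 4.0])    # the user's historical ratings

# least-squares fit: ratings ~= Q @ w reveals the per-dimension preference weights
w, *_ = np.linalg.lstsq(Q, ratings, rcond=None)
prefs = np.clip(w, 0, None)
prefs /= prefs.sum()                             # normalized preference vector
print("learned preferences:", prefs.round(2))

candidates = np.array([[0.6, 0.6, 0.9], [0.9, 0.3, 0.7]])
print("overall QoS of candidates:", (candidates @ prefs).round(2))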
TTA-JN-C1530  Towards Information Diffusion in Mobile Social Networks

The emergence of mobile social networks opens up opportunities for viral marketing. However, before mobile social networks can be fully utilized as a platform for viral marketing, many challenges have to be addressed. In this paper, we address the problem of identifying a small number of individuals through whom information can be diffused to the network as soon as possible, referred to as the diffusion minimization problem. Diffusion minimization under the probabilistic diffusion model can be formulated as an asymmetric k-center problem, which is NP-hard; the best known approximation algorithm for the asymmetric k-center problem has an approximation ratio of log n and time complexity O(n^5). Clearly, the performance and the time complexity of that approximation algorithm are not satisfactory for large-scale mobile social networks. To deal with this problem, we propose a community-based algorithm and a distributed set-cover algorithm. The performance of the proposed algorithms is evaluated by extensive experiments on both synthetic networks and a real trace. The results show that the community-based algorithm has the best performance in both the synthetic networks and the real trace compared with existing algorithms, and the distributed set-cover algorithm outperforms the approximation algorithm in the real trace in terms of diffusion time.

IEEE 2015
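A simple way to see the k-center flavor of seed selection: repeatedly add the seed that most reduces the number of hops needed for information to reach the farthest node. A greedy sketch on a hypothetical small graph (a heuristic in the same spirit; not the paper's community-based or distributed set-cover algorithms):

from collections import deque

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4],
         4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

def dists_from(sources):
    """Multi-source BFS: hop count until information reaches each node."""
    d = {s: 0 for s in sources}
    dq = deque(sources)
    while dq:
        u = dq.popleft()
        for v in graph[u]:
            if v not in d:
                d[v] = d[u] + 1
                dq.append(v)
    return d

def greedy_seeds(k):
    """Greedily add the seed that most reduces worst-case diffusion time."""
    seeds = []
    for _ in range(k):
        best = min((n for n in graph if n not in seeds),
                   key=lambda n: max(dists_from(seeds + [n]).values()))
        seeds.append(best)
    return seeds, max(dists_from(seeds).values())

print(greedy_seeds(2))   # two individuals and the resulting diffusion time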
TTA-JN-C1531  Location-Sharing Systems With Enhanced Privacy in Mobile Online Social Networks

Location sharing is one of the critical components of mobile online social networks (mOSNs) and has attracted much attention recently. With the advent of mOSNs, more and more of the users' location information is collected by the service providers in an mOSN. However, the users' privacy, including location privacy and social network privacy, cannot be guaranteed in previous work without a trust assumption on the service providers. In this paper, aiming at achieving enhanced privacy against insider attacks launched by the service providers in mOSNs, we introduce for the first time a new architecture with multiple location servers, and propose a secure solution supporting location sharing among friends and strangers in location-based applications. In our construction, the user's friend set in each friend query submitted to the location servers is divided into multiple subsets by the social network server at random. Each location server can only get a subset of friends, instead of the user's whole friend set as in previous work. In addition, for the first time, we propose a location-sharing construction which efficiently provides checkability of the search results returned from the location servers. We also prove that the new construction is secure under a stronger security model with enhanced privacy. Finally, we provide extensive experimental results to demonstrate the efficiency of our proposed construction.

IEEE 2015

TTA-JN-C1532  Mobile Based Healthcare Management Using Artificial Intelligence

In this growing age of technology, a proper healthcare management system is needed that is not only highly accurate but also portable, so that every person can carry it as a personalized healthcare system. The proposed system consists of mobile-based heart rate measurement, so that the data can be transferred and a diagnosis based on heart rate can be provided quickly at the click of a button. The system includes video conferencing to connect remotely with a doctor. The Doc-Bot, which was developed earlier, is now being ported to the mobile platform and will be further advanced for the diagnosis of common diseases. The system also includes an online blood bank, which provides up-to-date details about the availability of blood in different hospitals.

IEEE 2015
TTA-JN-C1533  PSMPA: Patient Self-Controllable and Multi-Level Privacy-Preserving Cooperative Authentication in Distributed m-Healthcare Cloud Computing Systems

A distributed m-healthcare cloud computing system significantly facilitates efficient patient treatment for medical consultation by sharing personal health information among healthcare providers. However, it brings about the challenge of keeping both the data confidentiality and the patients' identity privacy simultaneously. Many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve the problem, in this paper, a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates. Then, based on it, by devising a new technique of attribute-based designated verifier signatures, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system is proposed. The directly authorized physicians, the indirectly authorized physicians, and the unauthorized persons in medical consultation can, respectively, decipher the personal health information and/or verify patients' identities by satisfying the access tree with their own attribute sets. Finally, the formal security proof and simulation results illustrate that our scheme can resist various kinds of attacks and far outperforms previous ones in terms of computational, communication, and storage overhead.

IEEE 2015

TTA-JN-C1534  Secure and Distributed Data Discovery and Dissemination in Wireless Sensor Networks

A data discovery and dissemination protocol for wireless sensor networks (WSNs) is responsible for updating configuration parameters of, and distributing management commands to, the sensor nodes. All existing data discovery and dissemination protocols suffer from two drawbacks. First, they are based on the centralized approach: only the base station can distribute data items. Such an approach is not suitable for emergent multi-owner multi-user WSNs. Second, those protocols were not designed with security in mind, and hence adversaries can easily launch attacks to harm the network. This paper proposes the first secure and distributed data discovery and dissemination protocol, named DiDrip. It allows the network owners to authorize multiple network users with different privileges to simultaneously and directly disseminate data items to the sensor nodes. Moreover, as demonstrated by our theoretical analysis, it addresses a number of possible security vulnerabilities that we have identified. Extensive security analysis shows that DiDrip is provably secure. We also implement DiDrip in an experimental network of resource-limited sensor nodes to show its high efficiency in practice.

IEEE 2015
TTA-JN-C1535  DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks

A masquerade attacker impersonates a legitimate user to utilize the user's services and privileges. The semi-global alignment algorithm (SGA) is one of the most effective and efficient techniques for detecting these attacks, but it has not yet reached the accuracy and performance required by large-scale, multiuser systems. To improve both the effectiveness and the performance of this algorithm, we propose the Data-Driven Semi-Global Alignment (DDSGA) approach. From the security effectiveness viewpoint, DDSGA improves the scoring systems by adopting distinct alignment parameters for each user. Furthermore, it tolerates small mutations in user command sequences by allowing small changes in the low-level representation of the commands' functionality. It also adapts to changes in user behavior by updating the signature of a user according to its current behavior. To optimize the runtime overhead, DDSGA minimizes the alignment overhead and parallelizes the detection and the update. After describing the DDSGA phases, we present experimental results showing that DDSGA achieves a high hit ratio of 88.4 percent with a low false positive rate of 1.7 percent. It improves the hit ratio of the enhanced SGA by about 21.9 percent and reduces the Maxion-Townsend cost by 22.5 percent. Hence, DDSGA improves both the hit ratio and the false positive rate with an acceptable computational overhead.

IEEE 2015
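The core primitive here is sequence alignment of a session against a stored user signature. In a semi-global (fitting) alignment, gaps at the ends of the signature are free, so a short session is matched against the best-fitting region of the signature; sessions that align poorly are flagged. A minimal sketch with hypothetical command sequences and arbitrary match/gap scores (DDSGA's per-user parameterization and signature updates are omitted):

MATCH, MISMATCH, GAP = 2, -1, -1

def align(sig, session):
    """Semi-global alignment score: skipping a prefix or suffix of the user
    signature is free, so the session fits the best region of the signature."""
    n, m = len(sig), len(session)
    prev = [j * GAP for j in range(m + 1)]   # row 0: session tokens unmatched
    best = prev[m]
    for i in range(1, n + 1):
        cur = [0] * (m + 1)                  # column 0: free signature prefix gap
        for j in range(1, m + 1):
            sub = MATCH if sig[i - 1] == session[j - 1] else MISMATCH
            cur[j] = max(prev[j - 1] + sub, prev[j] + GAP, cur[j - 1] + GAP)
        best = max(best, cur[m])             # free signature suffix gap
        prev = cur
    return best

user_sig = "ls cd vim make gcc gdb make ls cd".split()
session  = "ls cd vim make gdb".split()
attack   = "wget chmod nc nmap ssh".split()
print("own session :", align(user_sig, session))   # high score: consistent behavior
print("masquerader :", align(user_sig, attack))    # low score: raise an alert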
TTA-JN-C1536  Revisiting Attribute-Based Encryption with Verifiable Outsourced Decryption

Attribute-based encryption (ABE) is a promising technique for fine-grained access control of encrypted data in cloud storage; however, the decryption involved in ABE is usually too expensive for resource-constrained front-end users, which greatly hinders its practical popularity. In order to reduce the decryption overhead for a user recovering the plaintext, Green et al. suggested outsourcing the majority of the decryption work without revealing actual data or private keys. To ensure that the third-party service honestly computes the outsourced work, Lai et al. added a verifiability requirement to the decryption of ABE, but their scheme doubled the size of the underlying ABE ciphertext and the computation costs. Roughly speaking, their main idea is a parallel encryption technique in which one of the encryption components is used for verification purposes; hence, the bandwidth and computation costs are doubled. In this paper, we investigate the same problem. In particular, we propose a more efficient and generic construction of ABE with verifiable outsourced decryption based on an attribute-based key encapsulation mechanism, a symmetric-key encryption scheme, and a commitment scheme. We then prove the security and the verification soundness of our constructed ABE scheme in the standard model. Finally, we instantiate our scheme with concrete building blocks. Compared with Lai et al.'s scheme, our scheme reduces the bandwidth and computation costs almost by half.

IEEE 2015
TTA-JN-C1537  A Strategy of Clustering Modification Directions in Spatial Image Steganography

Most recently proposed steganographic schemes are based on minimizing an additive distortion function defined as the sum of embedding costs for individual pixels. In such an approach, mutual embedding impacts are often ignored. In this paper, we present an approach that can exploit the interactions among embedding changes in order to reduce the risk of detection by steganalysis. It employs a novel strategy, called clustering modification directions (CMDs), based on the assumption that when embedding modifications in heavily textured regions locally head in the same direction, the steganographic security might be improved. To implement the strategy, a cover image is decomposed into several subimages, in which message segments are embedded with well-known schemes using additive distortion functions. The costs of pixels are updated dynamically to take mutual embedding impacts into account. Specifically, when neighboring pixels are changed toward a positive/negative direction, the cost of the considered pixel is biased toward the same direction. Experimental results show that our proposed CMD strategy, incorporated into existing steganographic schemes, can effectively overcome the challenges posed by modern steganalyzers with high-dimensional features.

IEEE 2015
TTA-JN-C1538  An Access Control Model for Online Social Networks Using User-to-User Relationships

Users and resources in online social networks (OSNs) are interconnected via various types of relationships. In particular, user-to-user relationships form the basis of the OSN structure and play a significant role in specifying and enforcing access control. Individual users and the OSN provider should be enabled to specify which access can be granted in terms of existing relationships. In this paper, we propose a novel user-to-user relationship-based access control (UURAC) model for OSN systems that utilizes regular expression notation for such policy specification. Access control policies on users and resources are composed in terms of the requested action, multiple relationship types, the starting point of the evaluation, and the number of hops on the path. We present two path-checking algorithms to determine whether the required relationship path between users exists for a given access request. We validate the feasibility of our approach by implementing a prototype system and evaluating the performance of these two algorithms.

IEEE 2015

TTA-JN-C1539  An Authenticated Trust and Reputation Calculation and Management System for Cloud and Sensor Networks Integration

Induced by incorporating the powerful data storage and data processing abilities of cloud computing (CC) as well as the ubiquitous data gathering capability of wireless sensor networks (WSNs), CC-WSN integration has received a lot of attention from both academia and industry. However, authentication as well as trust and reputation calculation and management of cloud service providers (CSPs) and sensor network providers (SNPs) are two very critical and barely explored issues for this new paradigm. To fill the gap, this paper proposes a novel authenticated trust and reputation calculation and management (ATRCM) system for CC-WSN integration. Considering the authenticity of CSPs and SNPs, the attribute requirements of cloud service users (CSUs) and CSPs, and the cost, trust, and reputation of the services of CSPs and SNPs, the proposed ATRCM system achieves the following three functions: 1) authenticating CSPs and SNPs to avoid malicious impersonation attacks; 2) calculating and managing the trust and reputation of the services of CSPs and SNPs; and 3) helping CSUs choose desirable CSPs and assisting CSPs in selecting appropriate SNPs. Detailed analysis and design as well as further functionality evaluation results are presented to demonstrate the effectiveness of ATRCM, followed by a system security analysis.

IEEE 2015
TTA-JN-C1540  An Efficient Privacy-Preserving Ranked Keyword Search Method

Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preservation. It is therefore essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. In addition, the volume of data in data centers has experienced dramatic growth, which makes it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support more search semantics and to meet the demand for fast ciphertext search within a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is reached. In the search phase, this approach achieves linear computational complexity against an exponential increase in the size of the document collection. In order to verify the authenticity of search results, a structure called the minimum hash sub-tree is also designed. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase in the number of documents in the dataset, the search time of the proposed method increases linearly, whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in terms of rank privacy and the relevance of retrieved documents.

IEEE 2015
TTA-JN-C1541  An Internal Intrusion Detection and Protection System Using Data Mining and Forensic Techniques

Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and request these coworkers to assist with co-tasks, thereby making shared login patterns one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect, since most intrusion detection systems and firewalls identify and isolate only malicious behaviors launched from the outside world. In addition, some studies have claimed that analyzing the system calls (SCs) generated by commands can identify these commands, and thereby accurately detect attacks, with attack patterns serving as the features of an attack. Therefore, in this paper, a security system named the Internal Intrusion Detection and Protection System (IIDPS) is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of their usage habits as forensic features, and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29 percent, whereas its response time is less than 0.45 s, implying that it can protect a system from insider attacks effectively and efficiently.

IEEE 2015

TTA-JN-C1542  Cloud-Assisted Safety Message Dissemination in VANET-Cellular Heterogeneous Wireless Networks

In vehicular ad hoc networks (VANETs), efficient message dissemination is critical to road safety and traffic efficiency. Since many VANET-based schemes suffer from high transmission delay and data redundancy, the integrated VANET-cellular heterogeneous network has been proposed recently and has attracted significant attention. However, most existing studies focus on selecting suitable gateways to deliver safety messages from a source vehicle to a remote server, whereas rapid safety message dissemination from the remote server to a targeted area has not been well studied. In this paper, we propose a framework for rapid message dissemination that combines the advantages of diverse communication and cloud computing technologies. Specifically, we propose a novel Cloud-assisted Message Downlink dissemination Scheme (CMDS), with which the safety messages in the cloud server are first delivered to suitable mobile gateways on relevant roads with the help of cloud computing (where the gateways are buses with both cellular and VANET interfaces), and are then disseminated among neighboring vehicles via vehicle-to-vehicle (V2V) communication. To evaluate the proposed scheme, we mathematically analyze its performance and conduct extensive simulation experiments. Numerical results confirm the efficiency of CMDS in various urban scenarios.

IEEE 2015
TTA-JN-C1543  Collaborative Task Execution in Mobile Cloud Computing Under a Stochastic Wireless Channel

This paper investigates collaborative task execution between a mobile device and a cloud clone for mobile applications under a stochastic wireless channel. A mobile application is modeled as a sequence of tasks that can be executed either on the mobile device or on the cloud clone. We aim to minimize the energy consumption on the mobile device while meeting a time deadline, by strategically offloading tasks to the cloud. We formulate collaborative task execution as a constrained shortest path problem. We derive a one-climb policy by characterizing the optimal solution, and then propose an enumeration algorithm for collaborative task execution that runs in polynomial time. Furthermore, we apply the LARAC algorithm to solve the optimization problem approximately, with lower complexity than the enumeration algorithm. Simulation results show that the approximate solution of the LARAC algorithm is close to the optimal solution of the enumeration algorithm. In addition, we consider a probabilistic time deadline, which is transformed into a hard deadline via the Markov inequality. Moreover, compared with purely local and purely remote execution, collaborative task execution can significantly reduce the energy consumption on the mobile device, prolonging its battery life.

IEEE 2015
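The constrained-shortest-path view is concrete: each state is (task index, execution location, elapsed time), edges carry the energy and time of executing the next task locally or remotely plus any migration cost, and the goal is the minimum-energy path that finishes before the deadline. A brute-force dynamic programming sketch with invented per-task costs (the paper solves this via the one-climb policy and LARAC rather than exhaustive state enumeration):

import math

local = [(4, 2), (6, 3), (5, 2), (7, 3)]   # hypothetical (energy, time) on the device
cloud = [(1, 1), (1, 2), (1, 1), (1, 2)]   # on the clone, device energy is mostly idle
TX, RX = (3, 1), (2, 1)                    # (energy, time) to offload / bring back
DEADLINE = 9

def solve():
    """DP over (location, elapsed time) states: cheapest device energy for the
    task chain subject to the completion deadline."""
    frontier = {(0, 0): 0}                 # start on the device at time 0
    for i in range(len(local)):
        nxt = {}
        for (loc, t), e in frontier.items():
            for new_loc in (0, 1):
                de, dt = (0, 0)
                if new_loc != loc:
                    de, dt = (TX if new_loc else RX)   # migrate execution point
                ce, ct = local[i] if new_loc == 0 else cloud[i]
                t2, e2 = t + dt + ct, e + de + ce
                if t2 <= DEADLINE and e2 < nxt.get((new_loc, t2), math.inf):
                    nxt[(new_loc, t2)] = e2
        frontier = nxt
    answers = []
    for (loc, t), e in frontier.items():   # results must return to the device
        if loc == 1:
            e, t = e + RX[0], t + RX[1]
        if t <= DEADLINE:
            answers.append(e)
    return min(answers)

print("minimum device energy within deadline:", solve())

With these numbers the best plan offloads the whole chain once and brings the result back, which is exactly the single up-down shape the one-climb policy formalizes.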
TTA-JN-C1544  Contact-Aware Data Replication in Roadside Unit Aided Vehicular Delay-Tolerant Networks

Roadside units (RSUs), which enable vehicle-to-infrastructure communications, are deployed along roadsides to handle the ever-growing communication demands caused by the explosive increase of vehicular traffic. How to efficiently utilize them to enhance the performance of vehicular delay-tolerant networks (VDTNs) is an important problem in designing RSU-aided VDTNs. In this work, we implement an extensive experiment involving tens of thousands of operational vehicles in the city of Beijing. Based on this newly collected Beijing trace and the existing Shanghai trace, we obtain some invariant properties of the communication contacts of large-scale RSU-aided VDTNs. Specifically, we find that the contact time between RSUs and vehicles obeys an exponential distribution, while the contact rate between them follows a Poisson distribution. Based on these observations, we investigate the problem of communication contact-aware mobile data replication for RSU-aided VDTNs, considering a mobile data dissemination system that transmits data from the Internet to vehicles via RSUs through opportunistic communications. In particular, we formulate the communication contact-aware RSU-aided vehicular mobile data dissemination problem as an optimization problem with realistic VDTN settings, and we provide an efficient heuristic solution for this NP-hard problem. By carrying out extensive simulations using realistic vehicular traces, we demonstrate the effectiveness of our proposed heuristic contact-aware data replication scheme in comparison with the optimal solution and other existing schemes.

IEEE 2015

TTA-JN-C1545  Cost-Aware SEcure Routing (CASER) Protocol Design for Wireless Sensor Networks

Lifetime optimization and security are two conflicting design issues for multi-hop wireless sensor networks (WSNs) with non-replenishable energy resources. In this paper, we first propose a novel secure and efficient Cost-Aware SEcure Routing (CASER) protocol to address these two conflicting issues through two adjustable parameters: energy balance control (EBC) and probabilistic-based random walking. We then discover that energy consumption is severely disproportionate under a uniform energy deployment for a given network topology, which greatly reduces the lifetime of the sensor networks. To solve this problem, we propose an efficient non-uniform energy deployment strategy to optimize the lifetime and message delivery ratio under the same energy resource and security requirements. We also provide a quantitative security analysis of the proposed routing protocol. Our theoretical analysis and OPNET simulation results demonstrate that the proposed CASER protocol can provide an excellent tradeoff between routing efficiency and energy balance, and can significantly extend the lifetime of the sensor networks in all scenarios. For the non-uniform energy deployment, our analysis shows that the lifetime and the total number of messages that can be delivered increase by more than four times under the same assumptions. We also demonstrate that the proposed CASER protocol can achieve a high message delivery ratio while preventing routing traceback attacks.

IEEE 2015
TTA-JN-C1546  Deleting Secret Data with Public Verifiability

Existing software-based data erasure programs can be summarized as following the same one-bit-return protocol: the deletion program performs data erasure and returns either success or failure. However, such a one-bit-return protocol turns the data deletion system into a black box: the user has to trust the outcome but cannot easily verify it. This is especially problematic when the deletion program is encapsulated within a Trusted Platform Module (TPM) and the user has no access to the code inside. In this paper, we present a cryptographic solution that aims to make the data deletion process more transparent and verifiable. In contrast to the conventional black/white assumptions about TPMs (i.e., either complete trust or complete distrust), we introduce a third assumption that sits in between, namely "trust-but-verify". Our solution enables a user to verify the correct implementation of two important operations inside a TPM without accessing its source code: the correct encryption of data and the faithful deletion of the key. Finally, we present a proof-of-concept implementation of the SSE system on a resource-constrained Java card to demonstrate its practical feasibility. To our knowledge, this is the first systematic solution to the secure data deletion problem based on a "trust-but-verify" paradigm, together with a concrete prototype implementation.

IEEE 2015
TTA-JN-C1547  Design and Evaluation of the Optimal Cache Allocation for Content-Centric Networking

Content-Centric Networking (CCN) is a promising framework for rebuilding the Internet's forwarding substrate around the concept of content. CCN advocates ubiquitous in-network caching to enhance content delivery, and thus each router has storage space to cache frequently requested content. In this work, we focus on the cache allocation problem, namely, how to distribute the cache capacity across routers under a constrained total storage budget for the network. We first formulate this problem as a content placement problem and obtain the optimal solution by a two-step method. We then propose a suboptimal heuristic method based on node centrality, which is more practical in dynamic networks with frequent content publishing. We investigate through simulations the factors that affect the optimal cache allocation and, perhaps more importantly, we use a real-life Internet topology and video access logs from a large-scale Internet video provider to evaluate the performance of various cache allocation methods. We observe that network topology and content popularity are two important factors that determine where cache capacity should be placed. Further, the heuristic method comes with only a very limited performance penalty compared with the optimal allocation. Finally, using our findings, we provide recommendations to network operators on the best deployment of CCN cache capacity over routers.

IEEE 2015
TTA-JN-C1548  Designing High-Performance Web-Based Computing Services to Promote Telemedicine Database Management Systems

Many web computing systems run real-time database services in which their information changes continuously and expands incrementally. In this context, web data services play a major role and bring significant improvements in monitoring and controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance to distributed systems. However, the increasing complexity and rapid growth of real-world healthcare applications make the database administration task increasingly challenging. In this paper, we build integrated web data services that achieve fast response times for large-scale telehealth database management systems. Our focus is on database management, with application scenarios in dynamic telemedicine systems, to increase care admissions and decrease care difficulties such as distance, travel, and time limitations. We propose a three-fold approach based on data fragmentation, database website clustering, and intelligent data distribution. This approach reduces the amount of data migrated between websites during application execution, achieves cost-effective communications during application processing, and improves application response time and throughput. The proposed approach is validated internally by measuring the impact of our computing services' techniques on various performance features such as communication cost, response time, and throughput. External validation is achieved by comparing the performance of our approach with that of other techniques in the literature. The results show that our integrated approach significantly improves the performance of web database systems and outperforms its counterparts.

IEEE 2015
TTA-JN-C1549  Distributed Database Management Techniques for Wireless Sensor Networks

In sensor networks, the large amount of data generated by sensors greatly influences the lifetime of the network. To manage this amount of sensed data in an energy-efficient way, new methods of storage and data query are needed. The distributed database approach for sensor networks has proved to be one of the most energy-efficient data storage and query techniques. This paper surveys the state of the art of the techniques used to manage data and queries in wireless sensor networks based on the distributed paradigm, and proposes a classification of these techniques. The goal of this work is not only to present how data and query management techniques have advanced to date, but also to show their benefits and drawbacks, and to identify open issues, providing guidelines for further contributions in this type of distributed architecture.

IEEE 2015

TTA-JN-C1550  Diversifying Web Service Recommendation Results via Exploring Service Usage History

The last decade has witnessed tremendous growth of Web services as a major technology for sharing data, computing resources, and programs on the Web. With the increasing adoption and presence of Web services, the design of novel approaches for effective Web service recommendation that satisfy users' potential requirements has become of paramount importance. Existing Web service recommendation approaches mainly focus on predicting the missing QoS values of Web service candidates that are of interest to a user, using a collaborative filtering approach, a content-based approach, or a hybrid of the two. These recommendation approaches assume that the recommended Web services are independent of each other, which is sometimes not true. As a result, many similar or redundant Web services may appear in a recommendation list. In this paper, we propose a novel Web service recommendation approach incorporating a user's potential QoS preferences and the diversity feature of user interests in Web services. The user's interests and QoS preferences are first mined by exploring the Web service usage history. We then compute scores of Web service candidates by measuring their relevance to historical and potential user interests, and their QoS utility. We also construct a Web service graph based on the functional similarity between Web services. Finally, we present an innovative diversity-aware Web service ranking algorithm to rank the candidates based on their scores and on the diversity degrees derived from the Web service graph. Extensive experiments conducted on a real-world Web service dataset indicate that our proposed approach significantly improves the quality of the recommendation results compared with existing methods.

IEEE 2015
TTA-JN-C1551  DROPS: Division and Replication of Data in Cloud for Optimal Performance and Security

Outsourcing data to a third-party administrative control, as is done in cloud computing, gives rise to security concerns. The data may be compromised due to attacks by other users and nodes within the cloud. Therefore, strong security measures are required to protect data within the cloud. However, the employed security strategy must also take into account the optimization of the data retrieval time. In this paper, we propose Division and Replication of Data in the Cloud for Optimal Performance and Security (DROPS), which collectively addresses the security and performance issues. In the DROPS methodology, we divide a file into fragments and replicate the fragmented data over the cloud nodes. Each of the nodes stores only a single fragment of a particular data file, which ensures that even in the case of a successful attack, no meaningful information is revealed to the attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring, to prevent an attacker from guessing the locations of the fragments. Furthermore, the DROPS methodology does not rely on traditional cryptographic techniques for data security, thereby relieving the system of computationally expensive methodologies. We show that the probability of locating and compromising all of the nodes storing the fragments of a single file is extremely low. We also compare the performance of the DROPS methodology with that of ten other schemes. A higher level of security with only a slight performance overhead was observed.

IEEE 2015
  • 40.
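A minimal sketch of the placement idea, assuming the node topology is given as a hop-distance function: the paper uses graph T-coloring, while this simplification just greedily enforces a minimum hop distance between any two nodes holding fragments of the same file. All names and the min_sep value are hypothetical.

def place_fragments(fragments, nodes, hop_distance, min_sep=2):
    """Assign each fragment to a node at least min_sep hops from the others."""
    placement = {}
    used = []
    for frag in fragments:
        for node in nodes:
            if node not in used and all(hop_distance(node, u) >= min_sep for u in used):
                placement[frag] = node
                used.append(node)
                break
        else:
            raise RuntimeError("no node satisfies the separation constraint")
    return placement

Keeping fragment holders far apart is what makes compromising one node (or a neighborhood of nodes) yield at most one fragment of the file.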
TTA-JN-C1552 Dynamic Bin Packing for On-Demand Cloud Resource Allocation
Dynamic Bin Packing (DBP) is a variant of classical bin packing in which items may arrive and depart at arbitrary times. Existing work on DBP generally aims to minimize the maximum number of bins ever used in the packing. In this paper, we consider a new version of the DBP problem, the MinTotal DBP problem, which targets minimizing the total cost of the bins used over time. It is motivated by the request dispatching problem arising in cloud gaming systems. We analyze the competitive ratios of modified versions of the commonly used First Fit, Best Fit, and Any Fit packing algorithms (the family of packing algorithms that open a new bin only when no currently open bin can accommodate the item to be packed) for the MinTotal DBP problem. We show that the competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (where β > 1 is a constant), the competitive ratio has an upper bound of (β/(β-1))μ + 3β/(β-1) + 1. For the general case, the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First Fit packing algorithm that achieves a competitive ratio no larger than (5/4)μ + 19/4 when μ is not known, and no larger than μ + 5 when μ is known.
IEEE 2015
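To make the MinTotal objective concrete, here is a small First Fit simulation sketch, assuming items are (size, arrival, departure) tuples and, as a simplification, that a bin's cost is the span from the first arrival it serves to the last departure. This is illustrative only, not the paper's analyzed algorithm variants.

def first_fit_total_cost(items, capacity=1.0):
    """Pack items at arrival into the first open bin that fits; return total bin-time."""
    events = sorted(items, key=lambda it: it[1])        # process by arrival time
    bins = []                                           # {"live": items, "open": t, "close": t}
    for size, arr, dep in events:
        placed = False
        for b in bins:
            # Drop items that departed before this arrival.
            b["live"] = [it for it in b["live"] if it[2] > arr]
            if sum(it[0] for it in b["live"]) + size <= capacity:
                b["live"].append((size, arr, dep))
                b["close"] = max(b["close"], dep)
                placed = True
                break
        if not placed:
            bins.append({"live": [(size, arr, dep)], "open": arr, "close": dep})
    # Total cost: sum over bins of the time span each bin stayed open.
    return sum(b["close"] - b["open"] for b in bins)

Running this on workloads with very long-lived items next to short-lived ones shows why the duration ratio μ dominates the competitive ratios quoted above: one long item can keep a nearly empty bin open, and thus paying, for a long time.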
TTA-JN-C1553 Location-Aware and Personalized Collaborative Filtering for Web Service Recommendation
Collaborative Filtering (CF) is widely employed for making Web service recommendations. CF-based Web service recommendation aims to predict missing QoS (Quality-of-Service) values of Web services. Although several CF-based Web service QoS prediction methods have been proposed in recent years, their performance still needs significant improvement. First, existing QoS prediction methods seldom consider the personalized influence of users and services when measuring the similarity between users and between services. Second, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users, yet existing Web service QoS prediction methods seldom take this observation into consideration. In this paper, we propose a location-aware personalized CF method for Web service recommendation. The proposed method leverages the locations of both users and Web services when selecting similar neighbors for the target user or service. The method also includes an enhanced similarity measurement for users and Web services that takes their personalized influence into account. To evaluate the performance of our proposed method, we conduct a set of comprehensive experiments using a real-world Web service dataset. The experimental results indicate that our approach significantly improves QoS prediction accuracy and computational efficiency compared to previous CF-based methods.
IEEE 2015
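A minimal sketch of location-aware neighbor selection, assuming each user has a coordinate and a dict of observed QoS values. The paper's enhanced similarity measure is not reproduced; plain Pearson similarity restricted to geographically close users stands in for it, and all names are hypothetical.

import math

def geo_distance(a, b):
    # Rough planar distance; adequate for filtering nearby users in a sketch.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def similar_neighbors(target, users, locations, radius, top_n=10):
    """Rank users near the target by QoS similarity on co-invoked services."""
    def pearson(u, v):
        common = set(u) & set(v)
        if len(common) < 2:
            return 0.0
        mu_u = sum(u[s] for s in common) / len(common)
        mu_v = sum(v[s] for s in common) / len(common)
        num = sum((u[s] - mu_u) * (v[s] - mu_v) for s in common)
        den = math.sqrt(sum((u[s] - mu_u) ** 2 for s in common) *
                        sum((v[s] - mu_v) ** 2 for s in common))
        return num / den if den else 0.0
    nearby = [u for u in users
              if u != target and geo_distance(locations[u], locations[target]) <= radius]
    return sorted(nearby, key=lambda u: pearson(users[u], users[target]), reverse=True)[:top_n]

The location filter reflects the observation in the abstract: response time and throughput correlate with network proximity, so far-away users are poor predictors even when their rating patterns look similar.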
TTA-JN-C1554 Location-Based Key Management Strong Against Insider Threats in Wireless Sensor Networks
To achieve secure communications in wireless sensor networks (WSNs), sensor nodes (SNs) must establish secret shared keys with neighboring nodes. Moreover, those keys must be updated in a way that defeats the insider threats posed by corrupted nodes. In this paper, we propose a location-based key management scheme for WSNs with special consideration of insider threats. After reviewing existing location-based key management schemes and studying their advantages and disadvantages, we selected location-dependent key management (LDK) as a suitable scheme for our study. To solve a communication interference problem in LDK and similar methods, we devised a new key revision process that incorporates grid-based location information. We also propose a key establishment process using grid information, and we construct key update and revocation processes to effectively resist inside attackers. For analysis, we conducted a rigorous simulation and confirmed that our method can increase connectivity while decreasing the compromise ratio when the minimum number of common keys required for key establishment is high. When a corrupted node mounted insider threats, our method was also able to effectively rekey every SN except the corrupted node. Finally, a hexagonal deployment of anchor nodes can reduce network costs.
IEEE 2015

TTA-JN-C1555 Malware Propagation in Large-Scale Networks
Malware is pervasive in networks and poses a critical threat to network security. However, we have very limited understanding of malware behavior in networks to date. In this paper, we investigate how malware propagates in networks from a global perspective. We formulate the problem and establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution at its early stage, a power-law distribution with a short exponential tail at its late stage, and a power-law distribution at its final stage. Extensive experiments have been performed on two real-world global-scale malware data sets, and the results confirm our theoretical findings.
IEEE 2015
TTA-JN-C1556 Optimal Cloudlet Placement and User to Cloudlet Allocation in Wireless Metropolitan Area Networks
Mobile applications are becoming increasingly computation-intensive, while the computing capability of portable mobile devices remains limited. A powerful way to reduce the completion time of an application on a mobile device is to offload its tasks to nearby cloudlets, which consist of clusters of computers. Although there is a significant body of research on mobile cloudlet offloading technology, very little attention has been paid to how cloudlets should be placed in a given network to optimize mobile application performance. In this paper, we study cloudlet placement and mobile user allocation to the cloudlets in a wireless metropolitan area network (WMAN). We devise an algorithm for the problem that places the cloudlets at user-dense regions of the WMAN and assigns mobile users to the placed cloudlets while balancing their workload. We also conduct experiments through simulation. The simulation results indicate that the performance of the proposed algorithm is very promising.
IEEE 2015

TTA-JN-C1557 Predistribution Scheme for Establishing Group Keys in Wireless Sensor Networks
Special care is needed when designing key establishment schemes for wireless sensor networks (WSNs). This is because sensor nodes are limited in memory storage and computational power. In 1992, Blundo et al. proposed a non-interactive group key establishment scheme using a multivariate polynomial. Their scheme can establish a group key for m sensors. Since each share is a polynomial involving m - 1 variables and having degree k, each sensor needs to store (k + 1)^(m-1) coefficients from GF(p), which is exponentially proportional to the size of the group. This makes their scheme practical only when m = 2, i.e., for peer-to-peer communication. So far, most existing predistribution schemes in WSNs establish pairwise keys for sensor nodes. In this paper, we propose a novel predistribution scheme for establishing group keys in WSNs. Our design uses a special type of multivariate polynomial over Z_N, where N is an RSA modulus. The advantage of using this type of multivariate polynomial is that it limits the storage space of each sensor to m(k + 1), which is linearly proportional to the size of the group. In addition, we prove the security of the proposed scheme and show that its computational complexity is efficient.
IEEE 2015
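A minimal sketch of Blundo-style key predistribution for the pairwise case (m = 2), which the abstract above identifies as the practical setting. A trusted server picks a symmetric bivariate polynomial f(x, y) mod p; node i receives the share f(i, y), and nodes i and j independently compute f(i, j) = f(j, i) as their pairwise key. The parameter values are toy values, not production ones.

import random

P = 2_147_483_647                      # a public prime (toy size)
K = 3                                  # polynomial degree (collusion threshold)

def symmetric_poly(k):
    """Random coefficients a[u][v] with a[u][v] == a[v][u], so f(x,y) = f(y,x)."""
    a = [[0] * (k + 1) for _ in range(k + 1)]
    for u in range(k + 1):
        for v in range(u, k + 1):
            a[u][v] = a[v][u] = random.randrange(P)
    return a

def share_for(a, node_id):
    """Node's share: the coefficients of the univariate polynomial f(node_id, y)."""
    return [sum(a[u][v] * pow(node_id, u, P) for u in range(len(a))) % P
            for v in range(len(a))]

def pairwise_key(share, other_id):
    return sum(c * pow(other_id, v, P) for v, c in enumerate(share)) % P

a = symmetric_poly(K)
s1, s2 = share_for(a, 17), share_for(a, 42)
assert pairwise_key(s1, 42) == pairwise_key(s2, 17)   # both sides derive the same key

Note the storage: each node keeps k + 1 coefficients, matching the (k + 1)^(m-1) figure for m = 2; the paper's contribution is keeping storage linear, m(k + 1), for group sizes m > 2.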
TTA-JN-C1558 Privacy-Preserving Detection of Sensitive Data Exposure
Statistics from security firms, research institutions, and government organizations show that the number of data-leak instances has grown rapidly in recent years. Among the various causes of data leaks, human mistakes are one of the main sources of data loss. Solutions exist that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced. In this paper, we present a privacy-preserving data-leak detection (DLD) solution that addresses this issue, in which a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method supports accurate detection with a very small number of false alarms under various data-leak scenarios.
IEEE 2015
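A minimal sketch of digest-based leak screening, assuming the owner releases only hashed n-gram fingerprints of the sensitive data, so the provider can match against observed traffic without seeing the plaintext secrets. This illustrates the digest idea only; the paper's construction additionally bounds what the digests themselves reveal. The threshold in the comment is hypothetical.

import hashlib

def fingerprints(data: bytes, n: int = 8) -> set:
    """Hashes of all n-byte shingles of the sensitive data."""
    return {hashlib.sha256(data[i:i + n]).hexdigest()
            for i in range(len(data) - n + 1)}

def leak_score(traffic: bytes, digests: set, n: int = 8) -> float:
    """Fraction of traffic shingles that hit the sensitive-digest set."""
    shingles = [hashlib.sha256(traffic[i:i + n]).hexdigest()
                for i in range(len(traffic) - n + 1)]
    if not shingles:
        return 0.0
    return sum(s in digests for s in shingles) / len(shingles)

# Owner side:    digests = fingerprints(b"card=4111111111111111")
# Provider side: alert if leak_score(packet_payload, digests) > 0.05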
TTA-JN-C1559 Providing Privacy-Aware Incentives in Mobile Sensing Systems
Mobile sensing relies on data contributed by users through their mobile devices (e.g., smartphones) to obtain useful information about people and their surroundings. However, users may not want to contribute due to a lack of incentives and concerns about possible privacy leakage. To effectively promote user participation, both the incentive and the privacy issues should be addressed. Although incentives and privacy have been addressed separately in mobile sensing, addressing them simultaneously remains an open problem. In this paper, we propose two credit-based privacy-aware incentive schemes for mobile sensing systems, where the focus is on privacy protection rather than on the design of incentive mechanisms. Our schemes enable mobile users to earn credits by contributing data without leaking which data they have contributed, and ensure that malicious users cannot abuse the system to earn unlimited credits. Specifically, the first scheme considers scenarios where an online trusted third party (TTP) is available, and relies on the TTP to protect user privacy and prevent abuse attacks. The second scheme considers scenarios where no online TTP is available; it applies blind signatures, partially blind signatures, and a novel extended Merkle tree technique to protect user privacy and prevent abuse attacks. Security analysis and cost evaluations show that our schemes are secure and efficient.
IEEE 2015

TTA-JN-C1560 Response Time Based Optimal Web Service Selection
Selecting an optimal web service from a list of functionally equivalent web services remains a challenging issue. For Internet services, the presence of low-performance servers, high latency, or overall poor service quality can translate into lost sales, user frustration, and lost customers. In this paper, we propose a novel method for QoS metrification based on Hidden Markov Models (HMMs), which further suggests an optimal path for the execution of user requests. The technique can be used to measure and predict the behavior of Web services in terms of response time, and can thus be used to rank services quantitatively rather than just qualitatively. We demonstrate the feasibility and usefulness of our methodology through experiments on real-world data. The results show how the proposed method can help the user automatically select the most reliable Web service, taking into account several metrics, among them system predictability and response time variability. An ROC curve analysis shows a 12 percent improvement in prediction accuracy using HMMs.
IEEE 2015
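In the spirit of the response-time entry above, here is a deliberately simplified sketch that replaces the HMM with a fully observable Markov chain: response times are discretized into states, a transition matrix is estimated per service, and services are compared by their long-run probability of responding fast. State names and thresholds are hypothetical, and this is not the paper's model.

import numpy as np

STATES = ["fast", "medium", "slow"]

def to_states(response_times, fast=0.2, slow=1.0):
    return [0 if t < fast else (1 if t < slow else 2) for t in response_times]

def transition_matrix(states, n=3):
    counts = np.ones((n, n))                # Laplace smoothing
    for a, b in zip(states, states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def steady_state(T, iters=200):
    """Long-run share of time spent in each state (power iteration)."""
    p = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(iters):
        p = p @ T
    return p

# Rank services by the long-run probability of the "fast" state:
# score = steady_state(transition_matrix(to_states(observed_times)))[0]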
TTA-JN-C1561 Robust Cloud Management of MANET Checkpoint Sessions
In a traditional mobile ad hoc network (MANET), if two nodes are engaged in a session and one of them departs suddenly, their communication is aborted: the session is no longer active, work is lost, and, consequently, battery energy has been wasted. This paper proposes a model that uses a cloud service to register, save, pause, and resume sessions between MANET member nodes so that both work in progress and energy are saved. A checkpoint technique is introduced to capture the progress of a session and allow it to be resumed. This is an additional service to our cloud management of the MANET. The model proposed in this paper was tested on Android-based devices and an Amazon cloud instance. Experimental results show that the model is feasible and robust, and that it saves time and, more importantly, energy when session breaks occur frequently.
IEEE 2015
TTA-JN-C1562 Secure Anonymous Key Distribution Scheme for Smart Grid
To fully support information management among the various stakeholders in smart grid domains, establishing secure communication sessions has become an important issue for smart grid environments. To support secure communications between smart meters and service providers, key management for authentication becomes a crucial security topic. Recently, several key distribution schemes have been proposed to provide secure communications for the smart grid. However, these schemes do not support smart meter anonymity and possess security weaknesses. This paper utilizes an identity-based signature scheme and an identity-based encryption scheme to propose a new anonymous key distribution scheme for smart grid environments. In the proposed scheme, a smart meter can anonymously access services provided by service providers using a single private key, without the help of the trusted anchor during authentication. In addition, the proposed scheme requires only a few computation operations on the smart meter side. Security analysis proves that the proposed scheme is secure in the random oracle model.
IEEE 2015

TTA-JN-C1563 Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks
Due to limited computational power and energy resources, aggregation of data from multiple sensor nodes at an aggregating node is usually accomplished by simple methods such as averaging. However, such aggregation is known to be highly vulnerable to node-compromising attacks. Since WSNs are usually unattended and lack tamper-resistant hardware, they are highly susceptible to such attacks. Thus, ascertaining the trustworthiness of data and the reputation of sensor nodes is crucial for WSNs. As the performance of very low power processors dramatically improves, future aggregator nodes will be capable of performing more sophisticated data aggregation algorithms, making WSNs less vulnerable. Iterative filtering algorithms hold great promise for such a purpose: they simultaneously aggregate data from multiple sources and provide a trust assessment of these sources, usually in the form of corresponding weight factors assigned to the data provided by each source. In this paper, we demonstrate that several existing iterative filtering algorithms, while significantly more robust against collusion attacks than simple averaging methods, are nevertheless susceptible to a novel sophisticated collusion attack that we introduce. To address this security issue, we propose an improvement for iterative filtering techniques that provides an initial approximation for such algorithms, making them not only collusion-robust, but also more accurate and faster converging.
IEEE 2015
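A minimal iterative filtering sketch, assuming readings[i][j] is sensor i's report of quantity j: estimates and sensor weights are refined alternately, with sensors that disagree with the current estimate receiving lower weight. This is the generic scheme discussed above, not the paper's improved initial approximation.

import numpy as np

def iterative_filter(readings, rounds=20, eps=1e-9):
    R = np.asarray(readings, dtype=float)      # shape: (sensors, quantities)
    w = np.ones(R.shape[0])
    for _ in range(rounds):
        est = w @ R / w.sum()                  # weighted aggregate per quantity
        dist = ((R - est) ** 2).sum(axis=1)    # each sensor's disagreement
        w = 1.0 / (dist + eps)                 # down-weight outliers
    return est, w / w.sum()

# estimate, trust = iterative_filter([[10.1, 20.2], [9.9, 19.8], [30.0, 5.0]])
# The third (faulty) sensor ends up with near-zero trust.

The attack the paper introduces exploits exactly this feedback loop: colluders who coordinate their false reports can drag the estimate toward themselves, which is why a robust initial approximation matters.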
TTA-JN-C1564 Secure Distributed Deduplication Systems with Improved Reliability
Data deduplication is a technique for eliminating duplicate copies of data, and has been widely used in cloud storage to reduce storage space and upload bandwidth. However, there is only one copy of each file stored in the cloud, even if such a file is owned by a huge number of users. As a result, a deduplication system improves storage utilization while reducing reliability. Furthermore, the challenge of privacy for sensitive data also arises when it is outsourced by users to the cloud. Aiming to address the above security challenges, this paper makes the first attempt to formalize the notion of a distributed reliable deduplication system. We propose new distributed deduplication systems with higher reliability, in which the data chunks are distributed across multiple cloud servers. The security requirements of data confidentiality and tag consistency are also achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous deduplication systems. Security analysis demonstrates that our deduplication systems are secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement the proposed systems and demonstrate that the incurred overhead is very limited in realistic environments.
IEEE 2015
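A minimal sketch of splitting a chunk across servers by XOR secret sharing. Note the caveats: this is an n-of-n scheme (all shares are needed to reconstruct, so it improves confidentiality but not loss tolerance), and it is randomized, whereas the paper uses a deterministic ramp scheme that also survives server failures. It is illustrative only.

import os
from functools import reduce

def split(chunk: bytes, n: int) -> list:
    """n shares; any n - 1 of them reveal nothing about the chunk."""
    shares = [os.urandom(len(chunk)) for _ in range(n - 1)]
    last = bytes(reduce(lambda x, y: x ^ y, col) for col in zip(chunk, *shares))
    return shares + [last]

def combine(shares: list) -> bytes:
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*shares))

shares = split(b"deduplicated data chunk", n=4)   # one share per cloud server
assert combine(shares) == b"deduplicated data chunk"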
TTA-JN-C1565 TEES: An Efficient Search Scheme over Encrypted Data on Mobile Cloud
Cloud storage provides convenient, massive, and scalable storage at low cost, but data privacy is a major concern that prevents users from storing files on the cloud trustingly. One way of enhancing privacy from the data owner's point of view is to encrypt the files before outsourcing them to the cloud and decrypt them after downloading. However, data encryption is a heavy overhead for mobile devices, and data retrieval involves complicated communication between the data user and the cloud. Given the typically limited bandwidth and battery life of mobile devices, these issues impose heavy computing and communication overhead as well as higher power consumption on mobile device users, which makes encrypted search over the mobile cloud very challenging. In this paper, we propose TEES (Traffic and Energy saving Encrypted Search), a bandwidth- and energy-efficient encrypted search architecture over the mobile cloud. The proposed architecture offloads the computation from mobile devices to the cloud, and we further optimize the communication between the mobile clients and the cloud. We demonstrate that data privacy does not degrade when the performance enhancement methods are applied. Our experiments show that TEES reduces the computation time by 23% to 46% and saves energy consumption by 35% to 55% per file retrieval, while network traffic during file retrieval is also significantly reduced.
IEEE 2015

TTA-JN-C1566 Transparent Real-Time Task Scheduling on Temporal Resource Partitions
The Hierarchical Real-Time Scheduling (HiRTS) technique helps improve overall resource utilization in real-time embedded systems. With HiRTS, a computation resource is divided into a group of temporal resource partitions, each of which accommodates multiple real-time tasks. Besides the computation-resource partitioning problem, real-time task scheduling on resource partitions is also a major problem of HiRTS. The existing scheduling techniques for dedicated resources, such as schedulability tests and utilization bounds, are in most cases unable to work unchanged on temporal resource partitions. In this paper, we show how to achieve maximal transparency for task scheduling on Regular Partitions, a type of resource partition introduced by the Regularity-based Resource Partition (RRP) model. We show that several classes of real-time scheduling problems on a regular partition can be transformed into equivalent problems on a dedicated single resource, such that comprehensive single-resource scheduling techniques provide optimal solutions. Furthermore, this transformation method can be applied to different types of real-time tasks, such as periodic, sporadic, and aperiodic tasks.
IEEE 2015
TTA-JN-C1567 User-Defined Privacy Grid System for Continuous Location-Based Services
Location-based services (LBS) require users to continuously report their location to a potentially untrusted server to obtain services based on their location, which can expose them to privacy risks. Unfortunately, existing privacy-preserving techniques for LBS have several limitations, such as requiring a fully trusted third party, offering limited privacy guarantees, and incurring high communication overhead. In this paper, we propose a user-defined privacy grid system called the dynamic grid system (DGS), the first holistic system that fulfills four essential requirements for privacy-preserving snapshot and continuous LBS. (1) The system only requires a semi-trusted third party, responsible for carrying out simple matching operations correctly; this semi-trusted third party does not have any information about a user's location. (2) Secure snapshot and continuous location privacy is guaranteed under our defined adversary models. (3) The communication cost for the user does not depend on the user's desired privacy level; it only depends on the number of relevant points of interest in the vicinity of the user. (4) Although we only focus on range and k-nearest-neighbor queries in this work, our system can be easily extended to support other spatial queries without changing the algorithms run by the semi-trusted third party and the database server, provided the required search area of a spatial query can be abstracted into spatial regions. Experimental results show that our DGS is more efficient than the state-of-the-art privacy-preserving technique for continuous LBS.
IEEE 2015
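A minimal sketch of the grid idea behind a system like DGS: the user maps a location to a cell of a self-chosen grid and reveals only cell identifiers, never exact coordinates. The grid origin and cell size are user-defined parameters here; the full protocol layers encrypted matching on top of this discretization.

def cell_id(lat, lon, origin=(0.0, 0.0), cell_deg=0.01):
    """Discretize a coordinate into an integer grid cell."""
    row = int((lat - origin[0]) // cell_deg)
    col = int((lon - origin[1]) // cell_deg)
    return (row, col)

def cells_covering(lat, lon, radius_deg, origin=(0.0, 0.0), cell_deg=0.01):
    """All cells intersecting a query range; these are sent instead of the location."""
    r0, c0 = cell_id(lat - radius_deg, lon - radius_deg, origin, cell_deg)
    r1, c1 = cell_id(lat + radius_deg, lon + radius_deg, origin, cell_deg)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

Because the user chooses the grid, the server and the semi-trusted matcher only ever see cell identifiers, and the number of cells sent depends on the query range and nearby points of interest rather than on the privacy level, mirroring requirement (3) above.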
TTA-JN-C1568 VoteTrust: Leveraging Friend Invitation Graph to Defend against Social Network Sybils
Online social networks (OSNs) suffer from the creation of fake accounts that introduce fake product reviews, malware, and spam. Existing defenses focus on using the social graph structure to isolate fakes. However, our work shows that Sybils can befriend a large number of real users, invalidating the assumption behind social-graph-based detection. In this paper, we present VoteTrust, a scalable defense system that further leverages user-level activities. VoteTrust models the friend invitation interactions among users as a directed, signed graph, and uses two key mechanisms to detect Sybils over the graph: voting-based Sybil detection, to find Sybils that users vote to reject, and Sybil community detection, to find other colluding Sybils around the identified Sybils. Through an evaluation on the Renren social network, we show that VoteTrust is able to prevent Sybils from generating many unsolicited friend requests. We have also deployed VoteTrust in Renren, and our real-world experience demonstrates that VoteTrust can detect large-scale collusion among Sybils.
IEEE 2015
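A minimal sketch of the voting intuition described above: each friend request is an edge labelled +1 (accepted) or -1 (rejected), and accounts whose requests are mostly rejected, especially by trusted voters, accumulate a low score. The smoothing prior is hypothetical, and the real system propagates voting power over the graph rather than using static trust values.

def sybil_scores(requests, voter_trust, prior=2.0):
    """requests: iterable of (sender, receiver, vote) with vote in {+1, -1}."""
    accept, total = {}, {}
    for sender, receiver, vote in requests:
        w = voter_trust.get(receiver, 1.0)      # rejection by trusted users counts more
        total[sender] = total.get(sender, 0.0) + w
        if vote > 0:
            accept[sender] = accept.get(sender, 0.0) + w
    # Smoothed acceptance rate; low values suggest Sybil behaviour.
    return {s: (accept.get(s, 0.0) + prior * 0.5) / (total[s] + prior)
            for s in total}

scores = sybil_scores([("bot", "alice", -1), ("bot", "bob", -1),
                       ("mia", "bob", +1)], voter_trust={"alice": 2.0, "bob": 1.5})
# scores["bot"] comes out far below scores["mia"].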
DOMAIN : DATA MINING

TTA-DD-C1501 CrowdOp: Query Optimization for Declarative Crowdsourcing Systems
We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities and relieve the user of the burden of dealing with the crowd. The user is only required to submit an SQL-like query, and the system takes responsibility for compiling the query, generating an execution plan, and evaluating it in the crowdsourcing marketplace. A given query can have many alternative execution plans, and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important for crowdsourcing systems that provide declarative query interfaces. In this paper, we propose CROWDOP, a cost-based query optimization approach for declarative crowdsourcing systems. CROWDOP considers both cost and latency in its query optimization objectives and generates query plans that provide a good balance between the two. We develop efficient algorithms in CROWDOP for optimizing three types of queries: selection queries, join queries, and complex selection-join queries. We validate our approach via extensive experiments by simulation as well as with the real crowd on Amazon Mechanical Turk.
IEEE 2015

TTA-DD-C1502 Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles
Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures; for the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.
IEEE 2015
TTA-DD-C1503 PruDent: A Pruned and Confident Stacking Approach for Multi-label Classification
Over the past decade or so, several research groups have addressed the problem of multi-label classification, where each example can belong to more than one class at the same time. A common approach, called Binary Relevance (BR), addresses this problem by inducing a separate classifier for each class. Research has shown that this framework can be improved if mutual class dependence is exploited: an example that belongs to class X is likely to belong also to class Y; conversely, belonging to X can make an example less likely to belong to Z. Several works sought to model this information by using the vector of class labels as additional example attributes. To fill in the unknown values of these attributes during prediction, existing methods resort to using the outputs of other classifiers, which makes them prone to errors. This is where our paper contributes. We identified two potential ways to prune unnecessary dependencies and to reduce error propagation in our new classifier-stacking technique, named PruDent. Experimental results indicate that the classification performance of PruDent compares favorably with that of other state-of-the-art approaches over a broad range of testbeds. Moreover, its computational costs grow only linearly with the number of classes.
IEEE 2015
TTA-DD-C1504 Raw Wind Data Preprocessing: A Data-Mining Approach
Wind energy integration research generally relies on complex sensors located at remote sites. The procedure for generating high-level synthetic information from databases containing large amounts of low-level data must therefore account for possible sensor failures and imperfect input data, since the results are highly sensitive to data quality. To address this problem, this paper presents an empirical methodology that can efficiently preprocess and filter raw wind data using only the aggregated active power output and the corresponding wind speed values at the wind farm. First, the properties of the raw wind data are analyzed, and all the data are divided into six categories according to their attribute magnitudes from a statistical perspective. Next, the weighted distance, a novel concept for the degree of similarity between individual objects in the wind database, and the local outlier factor (LOF) algorithm are incorporated to compute the outlier factor of every individual object, and this outlier factor is then used to assess which category an object belongs to. Finally, the methodology was tested successfully on data collected from a large wind farm in northwest China.
IEEE 2015

TTA-DD-C1505 Removing DUST Using Multiple Alignment of Sequences
A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate content. Crawling, storing, and using such duplicated data implies a waste of resources, the building of low-quality rankings, and poor user experiences. To deal with this problem, several studies have proposed to detect and remove duplicate documents without fetching their contents. To accomplish this, the proposed methods learn normalization rules that transform all duplicate URLs into the same canonical form. A challenging aspect of this strategy is deriving a set of general and precise rules. In this work, we present DUSTER, a new approach for deriving quality rules that takes advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before the generation of the rules, can lead to the deployment of very effective rules. In our evaluation, the method achieved larger reductions in the number of duplicate URLs than the best baseline, with gains of 82 and 140.74 percent on two different web collections.
IEEE 2015
TTA-DD-C1506 Keyword Extraction and Clustering for Document Recommendation in Conversations
This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents that can be recommended to participants. However, even a short fragment contains a variety of words that are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. It is therefore difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function that favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. We then propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
IEEE 2015
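A minimal sketch of diversity-rewarding keyword selection, assuming each candidate keyword comes with a relevance weight and a topic distribution (e.g., from a topic model). The square-root coverage function below is a standard submodular choice with diminishing returns that favors covering many topics; it stands in for the paper's reward function, and all names are hypothetical.

import math

def pick_keywords(weights, topics, k):
    """weights: {kw: relevance}; topics: {kw: {topic: prob}}; greedy top-k selection."""
    def coverage(selection):
        per_topic = {}
        for kw in selection:
            for t, p in topics[kw].items():
                per_topic[t] = per_topic.get(t, 0.0) + weights[kw] * p
        # sqrt gives diminishing returns per topic, so new topics beat repeats.
        return sum(math.sqrt(v) for v in per_topic.values())
    chosen = []
    while len(chosen) < k:
        best = max((kw for kw in weights if kw not in chosen),
                   key=lambda kw: coverage(chosen + [kw]))
        chosen.append(best)
    return chosen

Greedy maximization of a monotone submodular function of this kind carries a (1 - 1/e) approximation guarantee, which is why this selection style is a common choice for diversity objectives.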
TTA-DD-C1507 An Internal Intrusion Detection and Protection System by Using Data Mining and Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and request these coworkers to assist with co-tasks, thereby making the login pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack it internally, are hard to detect, since most intrusion detection systems and firewalls identify and isolate only malicious behaviors launched from the outside world. In addition, some studies have claimed that analyzing the system calls (SCs) generated by commands can identify these commands and thereby accurately detect attacks, with attack patterns serving as the features of an attack. Therefore, in this paper, a security system named the Internal Intrusion Detection and Protection System (IIDPS) is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of their usage habits as forensic features, and determines whether a valid login user is the account holder or not by comparing the user's current computer usage behavior with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29% with a response time of less than 0.45 s, implying that it can protect a system from insider attacks effectively and efficiently.
IEEE 2015
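A minimal sketch of the profiling idea behind the entry above, using system-call n-grams: an account holder's habitual SC patterns form a profile, and a live session is scored by how many of its patterns the profile contains. The 0.6 threshold is hypothetical, and the real IIDPS mines weighted patterns rather than a plain set.

def ngrams(calls, n=3):
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

def build_profile(sessions, n=3):
    """Union of SC n-grams over a user's historical sessions."""
    profile = set()
    for s in sessions:
        profile |= ngrams(s, n)
    return profile

def looks_like_owner(profile, live_calls, n=3, threshold=0.6):
    live = ngrams(live_calls, n)
    overlap = len(live & profile) / len(live) if live else 1.0
    return overlap >= threshold

history = [["open", "read", "write", "close"], ["open", "read", "read", "close"]]
profile = build_profile(history)
print(looks_like_owner(profile, ["open", "read", "write", "close"]))  # True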
TTA-DD-C1508 A Critical-Time-Point Approach to All-Departure-Time Lagrangian Shortest Paths
Given a spatio-temporal network, a source, a destination, and a desired departure time interval, the All-departure-time Lagrangian Shortest Paths (ALSP) problem determines a set which includes the shortest path for every departure time in the given interval. ALSP is important for critical societal applications such as eco-routing. However, ALSP is computationally challenging due to the non-stationary ranking of the candidate paths across distinct departure times. Current related work for reducing the redundant computation across consecutive departure times sharing a common solution exploits only partial information, e.g., the earliest feasible arrival time of a path. In contrast, our approach uses all available information, e.g., the entire time series of arrival times for all departure times. This allows the elimination of all knowable redundant computation based on the complete information available at hand. We operationalize this idea through the concept of critical time points (CTPs), i.e., departure times before which the ranking among candidate paths cannot change. In our preliminary work, we proposed a CTP-based forward search strategy. In this paper, we propose a CTP-based temporal bi-directional search for the ALSP problem via a novel impromptu rendezvous termination condition. Theoretical and experimental analysis shows that the proposed approach outperforms related approaches, particularly when there are few critical time points.
IEEE 2015

TTA-DD-C1509 Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed environment, and both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information-theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD can achieve much faster convergence and often obtain better results than their traditional concurrent counterparts.
IEEE 2015
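To make the sequential-update idea concrete, here is an online k-means sketch in MacQueen's style, the one-dimensional analogue of what the Co-ClusterD entry above builds on: centroids are refreshed after every single point assignment, so later assignments in the same sweep already see the updated centroids, instead of waiting for a batch (concurrent) update at sweep end. This is an illustration, not the paper's AMCC algorithms.

import numpy as np

def kmeans_sequential(X, centroids, sweeps=10):
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float).copy()
    counts = np.ones(len(C))                 # running counts per cluster
    assign = np.zeros(len(X), dtype=int)
    for _ in range(sweeps):
        for i, x in enumerate(X):
            j = int(np.argmin(((C - x) ** 2).sum(axis=1)))
            assign[i] = j
            counts[j] += 1
            C[j] += (x - C[j]) / counts[j]   # update immediately, not at sweep end
    return C, assign

The immediate centroid refresh is exactly what tends to speed up convergence, and it is also what makes naive parallelization hard, hence the paper's two dedicated parallelization approaches.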
TTA-DD-C1511 Differentially Private Frequent Itemset Mining via Transaction Splitting
Recently, there has been growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm that not only achieves high data utility and a high degree of privacy, but also offers high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility-privacy tradeoff, a novel smart splitting method is proposed to transform the database; for a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
IEEE 2015
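A minimal sketch of the Laplace mechanism, the standard building block behind ε-differentially private counts such as itemset supports. Transaction splitting matters here because capping each transaction's length keeps the sensitivity of a support query small; in this sketch the sensitivity is simply passed in as a parameter.

import numpy as np

def noisy_support(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release count + Laplace(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# e.g., releasing support({beer, nappies}) = 1042 under epsilon = 0.5:
# print(noisy_support(1042, epsilon=0.5))

Smaller epsilon means stronger privacy but larger noise; the paper's dynamic reduction method is about adding no more of this noise than the privacy analysis actually requires.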
TTA-DD-C1512 Efficient Algorithms for Mining Top-K High Utility Itemsets
High utility itemset (HUI) mining is an emerging topic in data mining, which refers to discovering all itemsets whose utility meets a user-specified minimum utility threshold min_util. However, setting min_util appropriately is a difficult problem for users. Generally speaking, finding an appropriate minimum utility threshold by trial and error is a tedious process. If min_util is set too low, too many HUIs will be generated, which may make the mining process very inefficient; on the other hand, if min_util is set too high, it is likely that no HUIs will be found. In this paper, we address these issues by proposing a new framework for top-k high utility itemset mining, where k is the desired number of HUIs to be mined. Two types of efficient algorithms, named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), are proposed for mining such itemsets without the need to set min_util. We provide a structural comparison of the two algorithms with discussions on their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.
IEEE 2015
TTA-DD-C1513 k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data mining has wide applications in many areas such as banking, medicine, scientific research, and government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks, to the cloud. Since the data on the cloud is in encrypted form, existing privacy-preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed protocol protects the confidentiality of the data, preserves the privacy of the user's input query, and hides the data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. We also empirically analyze the efficiency of our proposed protocol using a real-world dataset under different parameter settings.
IEEE 2015

TTA-DD-C1514 Location Aware Keyword Query Suggestion Based on Document Proximity
Keyword suggestion in web search helps users access relevant information without having to know how to precisely express their queries. Existing keyword suggestion techniques do not consider the locations of the users and the query results; i.e., the spatial proximity of a user to the retrieved results is not taken as a factor in the recommendation. However, the relevance of search results in many applications (e.g., location-based services) is known to be correlated with their spatial proximity to the query issuer. In this paper, we design a location-aware keyword query suggestion framework. We propose a weighted keyword-document graph that captures both the semantic relevance between keyword queries and the spatial distance between the resulting documents and the user location. The graph is browsed in a random-walk-with-restart fashion to select the keyword queries with the highest scores as suggestions. To make our framework scalable, we propose a partition-based approach that outperforms the baseline algorithm by up to an order of magnitude. The appropriateness of our framework and the performance of the algorithms are evaluated using real data.
IEEE 2015
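A minimal random-walk-with-restart sketch in the scoring style the entry above describes, assuming the keyword-document graph is given as a column-stochastic transition matrix. The restart mass is pinned on the query node, and the resulting stationary vector ranks candidate suggestions; the matrix construction and parameter values are assumptions, not the paper's.

import numpy as np

def rwr_scores(T, query_idx, restart=0.15, iters=100):
    """T: column-stochastic transition matrix (n x n); returns RWR scores."""
    n = T.shape[0]
    e = np.zeros(n)
    e[query_idx] = 1.0                      # restart at the user's query node
    p = e.copy()
    for _ in range(iters):
        p = (1 - restart) * (T @ p) + restart * e
    return p

# In the framework above, edge weights would mix keyword-keyword semantic
# similarity with keyword-document relevance discounted by the documents'
# spatial distance to the user.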
TTA-DD-C1515 Rank-Based Similarity Search: Reducing the Dimensional Dependence
This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.
IEEE 2015
TTA-DD-C1516 RANWAR: Rank-Based Weighted Association Rule Mining from Gene Expression and Methylation Data
Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), which ranks the rules using two novel rule-interestingness measures, viz., the rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc) measures, to bypass this problem. These measures depend on the rank of the items (genes); using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms, and thus saves execution time. We ran RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules evolved from RANWAR that are not found by Apriori are reported.
IEEE 2015

TTA-DD-C1517 Towards Effective Bug Triage with Software Data Reduction Techniques
Software companies spend over 45 percent of their cost on dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce the data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports from two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance.
IEEE 2015
TTA-DD-C1518 Towards Open-World Person Re-Identification by One-Shot Group-Based Verification
Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interest in computer vision. In a real-world application scenario, a watch list (gallery set) of a handful of known target people is provided with very few (in many cases only a single) image(s) (shots) per target. Existing re-id methods are largely unsuitable for this open-world re-id challenge because they are designed for (1) a closed-world scenario, where the gallery and probe sets are assumed to contain exactly the same people; (2) person-wise identification, whereby the model attempts to verify exhaustively against each individual in the gallery set; and (3) learning a matching model using multiple shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer learning based re-id methods.
IEEE 2015
TTA-DD-C1519 Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering
In large databases, the amount and the complexity of the data call for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption behind such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is, starting with an empty bucket configuration. We show that this is not the case: self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both the accuracy and the robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
IEEE 2015

TTA-JD-C1520 TRIP: An Interactive Retrieving-Inferring Data Imputation Approach
Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web; they tend to reach a much higher imputation recall, but inevitably bring a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an interactive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in the missing attribute values of a dataset. To ensure high recall at minimum cost, TRIP faces the challenge of selecting the smallest number of missing values for retrieval that maximizes the number of inferable values. Our proposed solution identifies an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also show by example that the optimal scheme is not feasible in τ-constrained stochastic data imputation (τ-SDI), but our proposed solution still identifies an expected-optimal scheme in τ-SDI. Extensive experiments on four data collections show that TRIP retrieves on average only 20 percent of the missing values while achieving the same high recall reached by the retrieving-based approach.
IEEE 2015
TTA-JD-C1521 Pattern-Aided Regression Modeling and Prediction Model Analysis
This paper first introduces pattern-aided regression (PXR) models, a new type of regression model designed to represent accurate and interpretable prediction models. This work was motivated by two observations: (1) regression modeling applications often involve complex diverse predictor-response relationships, which occur when the optimal regression models (of a given regression model type) fitting two or more distinct logical groups of data are highly different; and (2) state-of-the-art regression methods are often unable to adequately model such relationships. This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships. The paper also introduces a contrast pattern aided regression (CPXR) method to build accurate PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by large margins. Usually using (a) around seven simple patterns and (b) linear local regression models, these PXR models are easy to interpret; in fact, their complexity is just a bit higher than that of (piecewise) linear regression models and significantly lower than that of traditional ensemble-based regression models. CPXR is especially effective for high-dimensional data. The paper also discusses how to use the CPXR methodology for analyzing prediction models and correcting their prediction errors.
IEEE 2015
TTA-JD-C1522 A Set of Complexity Measures Designed for Applying Meta-Learning to Instance Selection
In recent years, some authors have approached the instance selection problem from a meta-learning perspective. In their work, they try to find relationships between the performance of methods from this field and the values of some data-complexity measures, with the aim of determining the best-performing method for a given data set using only the values of the measures computed on that data. Nevertheless, most of the data-complexity measures existing in the literature were not conceived for this purpose, and the feasibility of their use in this field is yet to be determined. In this paper, we revise the definition of some measures presented in a previous work that were designed for meta-learning-based instance selection. We also assess them in an experimental study involving three sets of measures, 59 databases, 16 instance selection methods, two classifiers, and eight regression learners used as meta-learners. The results suggest that our measures are more efficient and effective than those traditionally used by researchers who have addressed instance selection from a meta-learning perspective.
IEEE 2015
TTA-JD-C1522 Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets
Mining high utility itemsets (HUIs) from databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g., high profits). However, it may present too many HUIs to users, which also degrades the efficiency of the mining process. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework in this paper for mining closed+ high utility itemsets (CHUIs), which serve as a compact and lossless representation of HUIs. We propose three efficient algorithms, named AprioriHC (Apriori-based algorithm for mining High utility Closed+ itemsets), AprioriHC-D (AprioriHC algorithm with Discarding unpromising and isolated items), and CHUD (Closed+ High Utility itemset Discovery), to find this representation. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all HUIs from the set of CHUIs without accessing the original database. Results on real and synthetic datasets show that the proposed algorithms are very efficient and that our approaches achieve a massive reduction in the number of HUIs. In addition, when all HUIs can be recovered by DAHU, the combination of CHUD and DAHU outperforms the state-of-the-art algorithms for mining HUIs.
IEEE 2015

TTA-JD-C1523 Keyword Extraction and Clustering for Document Recommendation in Conversations
This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents that can be recommended to participants. However, even a short fragment contains a variety of words that are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. It is therefore difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function that favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. We then propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
IEEE 2015
    documents, which canbe recommended to participants. However, even a short fragment contains a variety of words, which are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. Therefore, it is difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. Then, we propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations. TTA-JD- C1524 Top-k Similarity Join in Heterogeneous Information Networks As a newly emerging network model, heterogeneous infor mation networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has IEEE 2015
TTA-JD-C1524
Top-k Similarity Join in Heterogeneous Information Networks
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
IEEE 2015
TTA-JD-C1525
Active Learning for Ranking through Expected Loss Optimization
Learning to rank arises in many data mining applications, ranging from web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This presents a great need for active learning approaches that select the most informative examples for ranking; however, in the literature there is still very limited work addressing active learning for ranking. In this paper, we propose a general active learning framework, expected loss optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, expected discounted cumulative gain (DCG) loss optimization (ELO-DCG), to select the most informative examples. Then, we investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm which incorporates both query and document selection into active learning. Furthermore, we show that the algorithm can flexibly deal with the skewed grade distribution problem by modifying the loss function. Extensive experiments on real-world web search data sets have demonstrated the great potential and effectiveness of the proposed framework and algorithms.
IEEE 2015
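An illustrative sketch of the DCG-based expected loss that gives ELO-DCG its name: the expected DCG shortfall of the model's current ordering, estimated here by naively sampling plausible relevance grades. The per-document grade distributions and the Monte-Carlo estimation are invented placeholders, not the paper's derivation:

```python
import math, random

def dcg(grades):
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def expected_dcg_loss(grade_dists, trials=1000, seed=0):
    """grade_dists: per-document (grade, probability) pairs, listed in the
    order the current model would rank the documents."""
    rng, loss = random.Random(seed), 0.0
    for _ in range(trials):
        sample = [rng.choices([g for g, _ in d], [p for _, p in d])[0]
                  for d in grade_dists]
        loss += dcg(sorted(sample, reverse=True)) - dcg(sample)  # vs. ideal order
    return loss / trials

# A query whose documents have uncertain grades is the more informative one to label.
uncertain = [[(0, 0.5), (2, 0.5)], [(1, 0.5), (2, 0.5)]]
confident = [[(2, 0.9), (0, 0.1)], [(0, 0.9), (2, 0.1)]]
print(expected_dcg_loss(uncertain), expected_dcg_loss(confident))
```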
TTA-JD-C1526
Relational Collaborative Topic Regression for Recommender Systems
Due to its successful application in recommender systems, collaborative filtering (CF) has become a hot research topic in data mining and information retrieval. In traditional CF methods, only the feedback matrix, which contains either explicit feedback (also called ratings) or implicit feedback on the items given by users, is used for training and prediction. Typically, the feedback matrix is sparse, which means that most users interact with few items. Due to this sparsity problem, traditional CF with only feedback information will suffer from unsatisfactory performance. Recently, many researchers have proposed to utilize auxiliary information, such as item content (attributes), to alleviate the data sparsity problem in CF. Collaborative topic regression (CTR) is one of these methods, and it has achieved promising performance by successfully integrating both feedback information and item content information. In many real applications, besides the feedback and item content information, there may exist relations (also known as networks) among the items which can be helpful for recommendation. In this paper, we develop a novel hierarchical Bayesian model called Relational Collaborative Topic Regression (RCTR), which extends CTR by seamlessly integrating the user-item feedback information, item content information, and network structure among items into the same model. Experiments on real-world datasets show that our model can achieve better prediction accuracy than the state-of-the-art methods with lower empirical training time. Moreover, RCTR can learn good interpretable latent structures which are useful for recommendation.
IEEE 2015
TTA-JD-C1527
Relevance Feature Discovery for Text Mining
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of large-scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, there has often been the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences; yet, how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.
IEEE 2015
TTA-JD-C1528
Differentially Private Frequent Itemset Mining via Transaction Splitting
Recently, there has been a growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm which can not only achieve high data utility and a high degree of privacy, but also offer high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, which is referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility and privacy tradeoff, a novel smart splitting method is proposed to transform the database. For a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
IEEE 2015
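A minimal sketch of the differential-privacy ingredient such algorithms rely on: perturbing itemset supports with Laplace noise scaled to sensitivity/ε. The sensitivity value and toy supports are assumptions, and the real PFP-growth additionally splits long transactions and corrects the resulting support loss:

```python
import math, random

def laplace_noise(scale, rng):
    # Laplace sample via inverse CDF
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_support(true_support, epsilon, sensitivity=1.0, rng=random.Random(42)):
    # Adding Laplace(sensitivity / epsilon) noise makes the released count
    # epsilon-differentially private for this single query.
    return true_support + laplace_noise(sensitivity / epsilon, rng)

supports = {("bread",): 120, ("bread", "milk"): 45}   # toy true supports
eps = 0.5
private = {k: noisy_support(v, eps) for k, v in supports.items()}
print(private)
```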
TTA-JD-C1529
Backward Path Growth for Efficient Mobile Sequential Recommendation
The problem of mobile sequential recommendation is to suggest a route connecting a set of pick-up points for a taxi driver so that he/she is more likely to get passengers with less travel cost. Essentially, a key challenge of this problem is its high computational complexity. In this paper, we propose a novel dynamic programming based method to solve the mobile sequential recommendation problem, consisting of two separate stages: an offline pre-processing stage and an online search stage. The offline stage pre-computes potential candidate sequences from a set of pick-up points. A backward incremental sequence generation algorithm is proposed based on the identified iterative property of the cost function. Simultaneously, an incremental pruning policy is adopted in the process of sequence generation to reduce the search space of the potential sequences effectively. In addition, a batch pruning algorithm is further applied to the generated potential sequences to remove some non-optimal sequences of a given length. Since the pruning effectiveness keeps growing with the increase of the sequence length, at the online stage, our method can efficiently find the optimal driving route for an unloaded taxi in the remaining candidate sequences. Moreover, our method can handle the problem of optimal route search with a maximum cruising distance or a destination constraint. Experimental results on real and synthetic data sets show that both the pruning ability and the efficiency of our method surpass the state-of-the-art methods. Our techniques can therefore be effectively employed to address the problem of mobile sequential recommendation with many pick-up points in real-world applications.
IEEE 2015
TTA-JD-C1530
Mining Partially-Ordered Sequential Rules Common to Multiple Sequences
Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific, and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining “partially-ordered sequential rules” (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms, and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide much higher prediction accuracy than regular sequential rules for sequence prediction.
IEEE 2015
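A brute-force Python sketch of what a partially-ordered rule X ⇒ Y means under one simple reading of the definition: all items of X appear (first occurrences) before all items of Y, with no order imposed inside X or Y. RuleGrowth discovers such rules far more cleverly; the toy sequences below only illustrate support and confidence:

```python
def rule_holds(seq, antecedent, consequent):
    if not antecedent <= set(seq):
        return None                                  # antecedent absent: irrelevant
    last_x = max(seq.index(i) for i in antecedent)   # latest first-occurrence of X
    return consequent <= set(seq[last_x + 1:])       # all of Y strictly after X

def support_confidence(sequences, X, Y):
    holds = sum(1 for s in sequences if rule_holds(s, X, Y))
    has_x = sum(1 for s in sequences if rule_holds(s, X, Y) is not None)
    return holds / len(sequences), (holds / has_x if has_x else 0.0)

seqs = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "c", "b"]]
# {a, b} => {c, d} holds in the first two sequences regardless of internal order.
print(support_confidence(seqs, {"a", "b"}, {"c", "d"}))
```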
TTA-JD-C1531
CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction
High utility sequential pattern mining has been considered an important research problem, and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility sequential pattern mining is that the search space is large and the efficiency of the solutions is directly affected by the degree at which they can eliminate candidate patterns. Therefore, the efficiency of any high utility sequential pattern mining solution depends on its ability to reduce this big search space and, as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and a pruning technique based on a Cumulated Rest of Match (CRoM) based upper bound. CRoM, by defining a tighter upper bound on the utility of the candidates, allows more conservative pruning before candidate pattern generation in comparison to the existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of the child patterns based on those of the parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility sequential patterns from large-scale datasets with different data characteristics, under low utility thresholds.
IEEE 2015
TTA-JD-C1532
Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model
Mining opinion targets and opinion words from online reviews are important tasks for fine-grained opinion mining, the key component of which involves detecting opinion relations among words. To this end, this paper proposes a novel approach based on the partially-supervised alignment model, which regards identifying opinion relations as an alignment process. Then, a graph-based co-ranking algorithm is exploited to estimate the confidence of each candidate. Finally, candidates with higher confidence are extracted as opinion targets or opinion words. Compared to previous methods based on nearest-neighbor rules, our model captures opinion relations more precisely, especially for long-span relations. Compared to syntax-based methods, our word alignment model effectively alleviates the negative effects of parsing errors when dealing with informal online texts. In particular, compared to the traditional unsupervised alignment model, the proposed model obtains better precision because of the usage of partial supervision. In addition, when estimating candidate confidence, we penalize higher-degree vertices in our graph-based co-ranking algorithm to decrease the probability of error generation. Our experimental results on three corpora with different sizes and languages show that our approach effectively outperforms state-of-the-art methods.
IEEE 2015
TTA-JD-C1533
Global Redundancy Minimization for Feature Ranking
Feature selection has been an important research topic in data mining, because real data sets often have high-dimensional features, as in bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criteria, such that correlated features often have similar rankings. These correlated features are redundant and don't provide large mutual information to help data mining. Thus, when we select a limited number of features, we hope to select the top non-redundant features such that the useful mutual information can be maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance Feature Selection (mRMR) model to minimize the redundancy between sequentially selected features. However, this method used greedy search, so the global feature redundancy wasn't considered and the results are not optimal. In this paper, we propose a new feature selection framework to globally minimize the feature redundancy while maximizing the given feature ranking scores, which can come from any supervised or unsupervised method. Our new model has no parameter, so it is especially suitable for practical data mining applications. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. Meanwhile, we introduce a new unsupervised global and local discriminative feature selection method which can be unified with the global feature redundancy minimization framework and shows superior performance.
IEEE 2015
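To make the relevance-vs-redundancy tradeoff concrete, here is a sketch of the classic greedy mRMR-style rule that this paper improves upon with a global formulation: repeatedly add the feature with the best (ranking score minus mean correlation to already-selected features). The scores, data, and alpha weight are invented:

```python
import numpy as np

def greedy_low_redundancy(X, scores, k, alpha=1.0):
    corr = np.abs(np.corrcoef(X, rowvar=False))    # feature-feature |correlation|
    selected = [int(np.argmax(scores))]            # start from the top-ranked feature
    while len(selected) < k:
        best, best_val = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = corr[j, selected].mean()  # similarity to what we already have
            val = scores[j] - alpha * redundancy
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=100)    # feature 5 duplicates feature 0
scores = np.array([0.9, 0.4, 0.5, 0.3, 0.2, 0.88])
print(greedy_low_redundancy(X, scores, k=3))       # skips the near-duplicate feature
```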
TTA-JD-C1534
Review Selection Using Micro-Reviews
Given the proliferation of review content, and the fact that reviews are highly diverse and often unnecessarily verbose, users frequently face the problem of selecting the appropriate reviews to consume. Micro-reviews are emerging as a new type of online review content in social media. Micro-reviews are posted by users of check-in services such as Foursquare. They are concise (up to 200 characters long) and highly focused, in contrast to comprehensive and verbose reviews. In this paper, we propose a novel mining problem, which brings together these two disparate sources of review content. Specifically, we use coverage of micro-reviews as an objective for selecting a set of reviews that efficiently cover the salient aspects of an entity. Our approach consists of a two-step process: matching review sentences to micro-reviews, and selecting a small set of reviews that cover as many micro-reviews as possible, with few sentences. We formulate this objective as a combinatorial optimization problem, and show how to derive an optimal solution using Integer Linear Programming. We also propose an efficient heuristic algorithm that approximates the optimal solution. Finally, we perform a detailed evaluation of all the steps of our methodology using data collected from Foursquare and Yelp.
IEEE 2015
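The selection step described above is a maximum-coverage problem; the paper solves it exactly with ILP, and the standard greedy approximation below conveys the idea. The sentence-to-micro-review matching is assumed precomputed, and the ids are invented:

```python
def greedy_review_selection(review_coverage, budget):
    """review_coverage: review id -> set of micro-review ids it matches."""
    covered, chosen = set(), []
    for _ in range(budget):
        # pick the review adding the most not-yet-covered micro-reviews
        rid = max(review_coverage, key=lambda r: len(review_coverage[r] - covered))
        if not review_coverage[rid] - covered:
            break                                    # nothing new can be covered
        chosen.append(rid)
        covered |= review_coverage[rid]
    return chosen, covered

coverage = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {4, 5, 6}, "r4": {1, 6}}
print(greedy_review_selection(coverage, budget=2))   # r1 then r3 covers all six
```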
TTA-JD-C1535
A Trust Management Scheme Based on Behavior Feedback for Opportunistic Networks
In harsh environments where node density is sparse, slow-moving nodes cannot effectively utilize encountering opportunities to realize self-organized identity authentications, and do not have the chance to join the network routing. However, considering that most of the communications in opportunistic networks are caused by forwarding operations, there is no need to establish complete mutual authentications for each conversation. Accordingly, a novel trust management scheme is presented based on the information of behavior feedback, in order to complement the insufficiency of identity authentications. By utilizing certificate chains based on social attributes, the mobile nodes gradually build local certificate graphs to realize the web of “Identity Trust” relationships. Meanwhile, the successors generate Verified Feedback Packets for each positive behavior, and consequently the “Behavior Trust” relationship is formed for slow-moving nodes. Simulation results show that, by implementing our trust scheme, the delivery probability and trust reconstruction ratio can be effectively improved when there are large numbers of compromised nodes, which means that our trust management scheme can efficiently explore and filter the trust nodes for secure forwarding in opportunistic networks.
IEEE 2015
TTA-JD-C1536
Extending Association Rule Summarization Techniques to Assess Risk of Diabetes Mellitus
Early detection of patients with elevated risk of developing diabetes mellitus is critical to the improved prevention and overall clinical management of these patients. We aim to apply association rule mining to electronic medical records (EMR) to discover sets of risk factors and their corresponding subpopulations that represent patients at particularly high risk of developing diabetes. Given the high dimensionality of EMRs, association rule mining generates a very large set of rules, which we need to summarize for easy clinical use. We reviewed four association rule set summarization techniques and conducted a comparative evaluation to provide guidance regarding their applicability, strengths and weaknesses. We proposed extensions to incorporate risk of diabetes into the process of finding an optimal summary. We evaluated these modified techniques on a real-world prediabetic patient cohort. We found that all four methods produced summaries that described subpopulations at high risk of diabetes, with each method having its clear strength. For our purpose, our extension to the Bottom-Up Summarization (BUS) algorithm produced the most suitable summary. The subpopulations identified by this summary covered most high-risk patients, had low overlap, and were at very high risk of diabetes.
IEEE 2015
TTA-JD-C1537
A Decision-Theoretic Rough Set Approach for Dynamic Data Mining
Uncertainty and fuzziness generally exist in real-life data. In rough set theory, approximations are employed to describe uncertain information approximately. Certain and uncertain rules are induced directly from the different regions partitioned by approximations. Approximations can further be applied to data-mining-related tasks, e.g., attribute reduction. Nowadays, different types of data collected from different applications evolve with time; in particular, new attributes may appear while new objects are added. This paper presents an approach for the dynamic maintenance of approximations when objects and attributes are added simultaneously, under the framework of decision-theoretic rough sets (DTRS). An equivalence feature vector and matrix are defined first to update the approximations of DTRS at different levels of granularity. Then, the information system is decomposed into subspaces, and the equivalence feature matrix is updated in different subspaces incrementally. Finally, the approximations of DTRS are renewed during the process of updating the equivalence feature matrix. Extensive experimental results verify the effectiveness of the proposed methods.
IEEE 2015
TTA-JD-C1538
A Joint Segmentation and Classification Framework for Sentence Level Sentiment Classification
In this paper, we propose a joint segmentation and classification framework for sentence-level sentiment classification. It is widely recognized that phrasal information is crucial for sentiment classification. However, existing sentiment classification algorithms typically split a sentence as a word sequence, which does not effectively handle the inconsistent sentiment polarity between a phrase and the words it contains, such as {“not bad,” “bad”} and {“a great deal of,” “great”}. We address this issue by developing a joint framework for sentence-level sentiment classification. It simultaneously generates useful segmentations and predicts sentence-level polarity based on the segmentation results. Specifically, we develop a candidate generation model to produce segmentation candidates of a sentence; a segmentation ranking model to score the usefulness of a segmentation candidate for sentiment classification; and a classification model for predicting the sentiment polarity of a segmentation. We train the joint framework directly from sentences annotated with only sentiment polarity, without using any syntactic or sentiment annotations at the segmentation level. We conduct experiments for sentiment classification on two benchmark datasets: a tweet dataset and a review dataset. Experimental results show that: 1) our method performs comparably with state-of-the-art methods on both datasets; 2) joint modeling of segmentation and classification outperforms pipelined baseline methods in various experimental settings.
IEEE 2015
TTA-JD-C1539
A Similarity-Based Learning Algorithm Using Distance Transformation
Numerous theories and algorithms have been developed to solve vectorial data learning problems by searching for the hypothesis that best fits the observed training sample. However, many real-world applications involve samples that are not described as feature vectors, but as (dis)similarity data. Converting vectorial data into (dis)similarity data is more easily performed than converting (dis)similarity data into vectorial data. This study proposes a stochastic iterative distance transformation model for similarity-based learning. The proposed model can be used to identify a clear class boundary in data by modifying the (dis)similarities between examples. The experimental results indicate that the performance of the proposed method is comparable with those of various vector-based and proximity-based learning algorithms.
IEEE 2015
TTA-JD-C1540
Active Learning from Relative Comparisons
This work focuses on active learning from relative comparison information. A relative comparison specifies, for a data triplet (xi, xj, xk), that instance xi is more similar to xj than to xk. Such constraints, when available, have been shown to be useful toward learning tasks such as defining appropriate distance metrics or finding good clustering solutions. In real-world applications, acquiring constraints often involves considerable human effort, as it requires the user to manually inspect the instances. This motivates us to study how to select and query the most useful relative comparisons to achieve effective learning with minimum user effort. Given an underlying class concept that is employed by the user to provide such constraints, we present an information-theoretic criterion that selects the triplet whose answer leads to the highest expected information gain about the classes of a set of examples. Directly applying the proposed criterion requires examining O(n³) triplets with n instances, which is prohibitive even for datasets of moderate size. We show that a randomized selection strategy can be used to reduce the selection pool from O(n³) to O(n) with minimal loss in efficiency, allowing us to scale up to considerably larger problems. Experiments show that the proposed method consistently outperforms baseline policies.
IEEE 2015
TTA-JD-C1541
Adaptive Processing for Distributed Skyline Queries over Uncertain Data
Query processing over uncertain data has gained growing attention, because it is necessary to deal with uncertain data in many real-life applications. In this paper, we investigate skyline queries over uncertain data in distributed environments (DSUD queries), whose research is only in an early stage. The state-of-the-art algorithm, called the e-DSUD algorithm, is designed for processing this query. It has the desirable characteristics of progressiveness and minimum bandwidth consumption. However, it still needs to be perfected in three aspects. (1) Progressiveness: each time it returns at most one query result. (2) Efficiency: there is a significant amount of redundant I/O cost and numerous iterations, which causes a long total query time. (3) Universality: it is restricted to the case where local skyline tuples are incomparable. To address these concerns, we first present a detailed analysis of the e-DSUD algorithm and then develop an improved framework for the DSUD query, namely IDSUD. Based on the new framework, we propose an adaptive algorithm, called ADSUD, for the DSUD query. In the algorithm, we redefine the approximate global skyline probability and choose local representative tuples adaptively according to the minimum probabilistic bounding rectangle. Furthermore, we design a progressive pruning method and apply a reuse mechanism to improve its efficiency. The results of extensive experiments verify the better overall performance of our algorithm compared with the e-DSUD algorithm.
IEEE 2015
TTA-JD-C1542
Adding Geospatial Data Provenance into SDI—A Service-Oriented Approach
Geospatial data provenance records the derivation history of a geospatial data product. It is important in evaluating the quality of data products. In a Geospatial Web Service environment, where data are often disseminated and processed widely and frequently in an unpredictable way, it is even more important in identifying original data sources, tracing workflows, updating or reproducing scientific results, and evaluating the reliability and quality of geospatial data products. Geospatial data provenance has become a fundamental issue in establishing the spatial data infrastructure (SDI). This paper investigates how to support provenance awareness in SDI. It addresses key issues including provenance modeling, capturing, and sharing in an SDI enabled by interoperable geospatial services. A reference architecture for provenance tracking is proposed, which can accommodate geospatial feature provenance at different levels of granularity. Open standards from ISO, the World Wide Web Consortium (W3C), and OGC are leveraged to facilitate interoperability. At the feature type level, this paper proposes extensions of W3C PROV-XML for ISO 19115 lineage and “Parent Level” provenance registration in the geospatial catalog service. At the feature instance level, lightweight lineage information entities for feature provenance are proposed and managed by Web Feature Services. Experiments demonstrate the applicability of the approach for creating provenance awareness in an interoperable geospatial service-oriented environment.
IEEE 2015
TTA-JD-C1543
Answering Pattern Queries Using Views
Answering queries using views has proven effective for querying relational and semistructured data. This paper investigates this issue for graph pattern queries based on graph simulation. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a pattern query can be answered using a set of views if and only if it is contained in the views. Based on this characterization, we develop efficient algorithms to answer graph pattern queries. We also study problems for determining (minimal, minimum) containment of pattern queries. We establish their complexity (from cubic-time to NP-complete) and provide efficient checking algorithms (approximation when the problem is intractable). In addition, when a pattern query is not contained in the views, we study maximally contained rewriting to find approximate answers; we show that such a rewriting can be computed in cubic time, and present a rewriting algorithm. We experimentally verify that these methods are able to efficiently answer pattern queries on large real-world graphs.
IEEE 2015
TTA-JD-C1544
CloudKeyBank: Privacy and Owner Authorization Enforced Key Management Framework
Explosive growth in the number of passwords for web-based applications and encryption keys for outsourced data storage well exceeds the management limit of users. Therefore, outsourcing keys (including passwords and data encryption keys) to professional password managers (honest-but-curious service providers) is attracting the attention of many users. However, existing solutions in the traditional data outsourcing scenario are unable to simultaneously meet the following three security requirements for key outsourcing: (1) confidentiality and privacy of keys; (2) search privacy on identity attributes tied to keys; (3) owner-controllable authorization over his/her shared keys. In this paper, we propose CloudKeyBank, the first unified key management framework that addresses all three goals above. Under our framework, the key owner can perform privacy- and authorization-enforced encryption with minimum information leakage. To implement CloudKeyBank efficiently, we propose a new cryptographic primitive named Searchable Conditional Proxy Re-Encryption (SC-PRE), which seamlessly combines the techniques of Hidden Vector Encryption (HVE) and Proxy Re-Encryption (PRE), and we propose a concrete SC-PRE scheme based on existing HVE and PRE schemes. Our experimental results and security analysis show that the efficiency and security goals are well achieved.
IEEE 2015
TTA-JD-C1545
Clustering Deviations for Black-Box Regression Testing of Database Applications
Regression tests often result in many deviations (differences between two system versions), caused either by changes or by regression faults. For the tester to analyze such deviations efficiently, it would be helpful to accurately group them, such that each group contains deviations representing one unique change or regression fault. Because it is unlikely that a general solution to the above problem can be found, we focus our work on a common type of software system: database applications. We investigate the use of clustering, based on database manipulations and test specifications (from test models), to group regression test deviations according to the faults or changes causing them. We also propose assessment criteria based on the concept of entropy to compare alternative clustering strategies. To validate our approach, we ran a large-scale industrial case study, and our results show that our clustering approach can indeed serve as an accurate strategy for grouping regression test deviations. Among the four test campaigns assessed, deviations were clustered perfectly for two of them, while for the other two, the clusters were all homogeneous. Our analysis suggests that this approach can significantly reduce the effort spent by testers in analyzing regression test deviations, increase their level of confidence, and therefore make regression testing more scalable.
IEEE 2015
TTA-JD-C1546
Context-Based Collaborative Filtering for Citation Recommendation
Citation recommendation is an interesting and significant research area, as it addresses information overload in academia by automatically suggesting relevant references for a research paper. Recently, with the rapid proliferation of information technology, research papers are being published in rapidly growing numbers across various conferences and journals. This makes citation recommendation a highly important and challenging discipline. In this paper, we propose a novel citation recommendation method that uses only easily obtained citation relations as source data. The rationale underlying this method is that, if two citing papers significantly co-occur with the same cited paper(s), they should be similar to some extent. Based on the above rationale, an association mining technique is employed to obtain the paper representation of each citing paper from the citation context. Then, these paper representations are compared pairwise to compute similarities between the citing papers for collaborative filtering. We evaluate our proposed method through two relevant real-world data sets. Our experimental results demonstrate that the proposed method significantly outperforms the baseline method in terms of precision, recall, and F1, as well as mean average precision and mean reciprocal rank, which are metrics related to the rank information in the recommendation list.
IEEE 2015
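A tiny sketch of the co-citation rationale stated above: papers that share many cited neighbors score as similar. The citation lists are invented, Jaccard similarity stands in for the paper's association-mining step, and this is not the authors' code:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

citations = {                     # citing paper -> set of references it cites (toy)
    "p1": {"x", "y", "z"},
    "p2": {"x", "y", "w"},
    "p3": {"q", "r"},
}
pairs = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
sims = {(a, b): jaccard(citations[a], citations[b]) for a, b in pairs}
print(sims)   # p1/p2 share the most references, so they score highest
```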
TTA-JD-C1547
Crowdsourcing for Top-K Query Processing over Uncertain Data
Querying uncertain data has become a prominent application due to the proliferation of user-generated content from social media and of data streams from sensors. When data ambiguity cannot be reduced algorithmically, crowdsourcing proves a viable approach, which consists in posting tasks to humans and harnessing their judgment for improving the confidence about data values or relationships. This paper tackles the problem of processing top-K queries over uncertain data with the help of crowdsourcing for quickly converging to the real ordering of relevant results. Several offline and online approaches for addressing questions to a crowd are defined and contrasted on both synthetic and real data sets, with the aim of minimizing the crowd interactions necessary to find the real ordering of the result set.
IEEE 2015
TTA-JD-C1548
Discovering Latent Semantics in Web Documents Using Fuzzy Clustering
Web documents are heterogeneous and complex. Complicated associations exist within one web document and in its links to others. The high interactions between terms in documents produce vague and ambiguous meanings. Efficient and effective clustering methods to discover latent and coherent meanings in context are therefore necessary. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called “CONCEPTS,” wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) its difference from the other topics. Web contents can be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides its applicability in web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
IEEE 2015
TTA-JD-C1549
Discovery of Ranking Fraud for Mobile Apps
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities which have the purpose of bumping up Apps in the popularity list. Indeed, it has become more and more frequent for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we first propose to accurately locate the ranking fraud by mining the active periods, namely leading sessions, of mobile Apps. Such leading sessions can be leveraged for detecting the local anomaly instead of the global anomaly of App rankings. Furthermore, we investigate three types of evidences, i.e., ranking based evidences, rating based evidences and review based evidences, by modeling Apps' ranking, rating and review behaviors through statistical hypothesis tests. In addition, we propose an optimization based aggregation method to integrate all the evidences for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the iOS App Store over a long time period. In the experiments, we validate the effectiveness of the proposed system, and show the scalability of the detection algorithm as well as some regularity of ranking fraud activities.
IEEE 2015
TTA-JD-C1550
k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data mining has wide applications in many areas such as banking, medicine, scientific research and among government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks, to the cloud. Since the data on the cloud is in encrypted form, existing privacy-preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed protocol protects the confidentiality of the data and the privacy of the user's input query, and hides the data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. Also, we empirically analyze the efficiency of our proposed protocol using a real-world dataset under different parameter settings.
IEEE 2015
TTA-JD-C1551
Location Aware Keyword Query Suggestion Based on Document Proximity
Keyword suggestion in web search helps users to access relevant information without having to know how to precisely express their queries. Existing keyword suggestion techniques do not consider the locations of the users and the query results; i.e., the spatial proximity of a user to the retrieved results is not taken as a factor in the recommendation. However, the relevance of search results in many applications (e.g., location-based services) is known to be correlated with their spatial proximity to the query issuer. In this paper, we design a location-aware keyword query suggestion framework. We propose a weighted keyword-document graph, which captures both the semantic relevance between keyword queries and the spatial distance between the resulting documents and the user location. The graph is browsed in a random-walk-with-restart fashion, to select the keyword queries with the highest scores as suggestions. To make our framework scalable, we propose a partition-based approach that outperforms the baseline algorithm by up to an order of magnitude. The appropriateness of our framework and the performance of the algorithms are evaluated using real data.
IEEE 2015
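A compact sketch of random walk with restart (RWR), the traversal this framework uses on its keyword-document graph, via power iteration on a toy adjacency matrix. The node layout, edge weights, and restart probability are illustrative assumptions:

```python
import numpy as np

def rwr_scores(adj, start, restart=0.15, iters=100):
    P = adj / adj.sum(axis=0, keepdims=True)     # column-stochastic transitions
    e = np.zeros(adj.shape[0]); e[start] = 1.0   # restart distribution
    r = e.copy()
    for _ in range(iters):
        r = (1 - restart) * P @ r + restart * e  # walk, occasionally jump home
    return r

# Nodes 0-2 = keyword queries; 3-4 = documents. In the paper, edge weights mix
# semantic relevance with spatial proximity to the user; here they are 0/1 toys.
adj = np.array([[0, 1, 0, 1, 0],
                [1, 0, 1, 1, 1],
                [0, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 1, 1, 0, 0]], dtype=float)
print(rwr_scores(adj, start=0).round(3))   # proximity scores relative to query 0
```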
TTA-JD-C1552
Mining Temporal Patterns in Time Interval-Based Data
Sequential pattern mining is an important subfield of data mining. Recently, applications using time interval-based event data have attracted considerable effort in discovering patterns from events that persist for some duration. Since the relationship between two intervals is intrinsically complex, how to effectively and efficiently mine interval-based sequences is a challenging issue. In this paper, two novel representations, the endpoint representation and the end time representation, are proposed to simplify the processing of complex relationships among event intervals. Based on the proposed representations, three types of interval-based patterns are defined: the temporal pattern, the occurrence-probabilistic temporal pattern, and the duration-probabilistic temporal pattern. In addition, we develop two novel algorithms, Temporal Pattern Miner (TPMiner) and Probabilistic Temporal Pattern Miner (P-TPMiner), to discover these three types of interval-based sequential patterns. We also propose three pruning techniques to further reduce the search space of the mining process. Experimental studies show that both algorithms are able to find all three types of patterns efficiently. Furthermore, we apply the proposed algorithms to real datasets to demonstrate the effectiveness and validate the practicability of the proposed patterns.
IEEE 2015
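A minimal sketch of the endpoint-representation idea: each event interval (label, start, finish) becomes two timestamped endpoints, so complex interval relations reduce to an ordinary sequence. The tie-breaking and "+/-" encoding below are simplified assumptions, not the paper's exact scheme:

```python
def to_endpoint_sequence(intervals):
    points = []
    for label, start, finish in intervals:
        points.append((start, label + "+"))    # interval opens
        points.append((finish, label + "-"))   # interval closes
    return [p for _, p in sorted(points)]      # order by time

intervals = [("A", 1, 5), ("B", 3, 9), ("C", 6, 8)]
print(to_endpoint_sequence(intervals))
# ['A+', 'B+', 'A-', 'C+', 'C-', 'B-']  -> A overlaps B, and B contains C
```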
TTA-JD-C1553
Multi-Objective Service Composition in Uncertain Environments
Web services have the potential to offer enterprises the ability to compose internal and external business services in order to accomplish complex processes. Service composition then becomes an increasingly challenging issue when complex and critical applications are built upon services with different QoS criteria. However, most of the existing QoS-aware service composition techniques are simply based on the assumption that multiple QoS criteria, whether conflicting or not, can be combined into a single criterion to be optimized according to some utility functions. In practice, this can be very difficult, as these utility functions or weights are not well known a priori. In addition, the existing approaches are designed to work in certain environments, where the QoS parameters are well known in advance. These approaches prove fruitless when facing uncertain and dynamic environments, e.g., cloud environments, where no prior knowledge of the QoS parameters is available. In this paper, two novel multi-objective approaches are proposed to handle QoS-aware Web service composition with conflicting objectives and various restrictions on the quality matrices. The proposed approaches use reinforcement learning to deal with the uncertainty characteristics inherent in open and dynamic environments. Experimental results reveal the ability of the proposed approaches to find a set of Pareto optimal solutions, which have equivalent quality to satisfy multiple QoS objectives with different user preferences.
IEEE 2015
TTA-JD-C1554
Pattern-Based Topics for Document Modelling in Information Filtering
Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users' information needs from a collection of documents. A fundamental assumption of these approaches is that the documents in the collection are all about one topic. However, in reality users' interests can be diverse, and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, and this has been widely utilized in the fields of machine learning, information retrieval, etc. But its effectiveness in information filtering has not been so well explored. Patterns are always thought to be more discriminative than single terms for describing documents. However, the enormous number of discovered patterns hinders them from being effectively and efficiently used in real applications; therefore, selection of the most discriminative and representative patterns from the huge number of discovered patterns becomes crucial. To deal with the above-mentioned limitations and problems, in this paper, a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), is proposed. The main distinctive features of the proposed model include: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features; and (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are proposed to estimate the document relevance to the user's information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
IEEE 2015
TTA-JD-C1555
Polarity Consistency Checking for Domain Independent Sentiment Dictionaries
Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries contain numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency which cannot be detected by mere manual inspection. In this paper, we introduce the concept of polarity consistency of words/senses in sentiment dictionaries. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionary inconsistencies.
IEEE 2015
TTA-JD-C1556
Predicting User-Topic Opinions in Twitter with Social and Topical Context
With popular microblogging services like Twitter, users are able to share their real-time feelings online in a more convenient way. The user-generated data in Twitter is thus regarded as a resource providing individuals' spontaneous emotional information, and it has attracted much attention from researchers. Prior work has measured the emotional expressions in users' tweets and then performed various analysis and learning. However, how to utilize the knowledge learned from the observed tweets and the context information to predict users' opinions toward specific topics on which they have not yet directly commented is a novel problem presenting both challenges and opportunities. In this paper, we mainly focus on solving this problem with a Social context and Topical context incorporated Matrix Factorization (ScTcMF) framework. The experimental results on a real-world Twitter data set show that this framework outperforms the state-of-the-art collaborative filtering methods, and demonstrate that both social context and topical context are effective in improving the user-topic opinion prediction performance.
IEEE 2015
TTA-JD-C1557
RankRC: Large-Scale Nonlinear Rare Class Ranking
Rare class problems are common in real-world applications across a wide range of domains. Standard classification algorithms are known to perform poorly in these cases, since they focus on overall classification accuracy. In addition, we have seen a significant increase of data in recent years, resulting in many large-scale rare class problems. In this paper, we focus on nonlinear kernel-based classification methods expressed as a regularized loss minimization problem. We address the challenges associated with both rare class problems and large-scale learning by 1) optimizing the area under the receiver operating characteristic curve in the training process, instead of classification accuracy, and 2) using a rare class kernel representation to achieve an efficient time and space algorithm. We call the algorithm RankRC. We provide justifications for the rare class representation and experimentally illustrate the effectiveness of RankRC in test performance, computational complexity, and model robustness.
IEEE 2015
TTA-JD-C1558
Reverse Keyword Search for Spatio-Textual Top-k Queries in Location-Based Services
Spatio-textual queries retrieve the most similar objects with respect to a given location and a keyword set. Existing studies mainly focus on how to efficiently find the top-k result set for a given spatio-textual query. Nevertheless, in many application scenarios, users cannot precisely formulate their keywords and instead prefer to choose them from some candidate keyword sets. Moreover, in information browsing applications, it is useful to highlight the objects with the tags (keywords) under which the objects have high rankings. Driven by these applications, we propose a novel query paradigm, namely reverse keyword search for spatio-textual top-k queries (RSTQ). It returns the keywords under which a target object will be a spatio-textual top-k result. To efficiently process the new query, we devise a novel hybrid index, the KcR-tree, to store and summarize the spatial and textual information of objects. By accessing the high-level nodes of the KcR-tree, we can estimate the rankings of the target object without accessing the actual objects. To further improve the performance, we propose three query optimization techniques, i.e., the KcR*-tree, lazy upper-bound updating, and keyword set filtering. We also extend RSTQ to allow the input location to be a spatial region instead of a point. Extensive experimental evaluation demonstrates the efficiency of our proposed query techniques in terms of both computational cost and I/O cost.
IEEE 2015
TTA-JD-C1559
Scalable Distributed Processing of k-Nearest Neighbor Queries over Moving Objects
Central to many applications involving moving objects is the task of processing k-nearest neighbor (k-NN) queries. Most of the existing approaches to this problem are designed for the centralized setting where query processing takes place on a single server; it is difficult, if not impossible, for them to scale to a distributed setting to handle the vast volume of data and concurrent queries that are increasingly common in those applications. To address this problem, we propose a suite of solutions that can support scalable distributed processing of k-NN queries. We first present a new index structure called the Dynamic Strip Index (DSI), which can better adapt to different data distributions than existing grid indexes. Moreover, it can be naturally distributed across a cluster, therefore lending itself well to distributed processing. We further propose a distributed k-NN search (DKNN) algorithm based on DSI. DKNN avoids having an uncertain number of potentially expensive iterations, and is thus more efficient and more predictable than existing approaches. DSI and DKNN are implemented on Apache S4, an open-source platform for distributed stream processing. We perform extensive experiments to study the characteristics of DSI and DKNN, and compare them with three baseline methods. Experimental results show that our proposal scales well and significantly outperforms the alternative methods.
IEEE 2015
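A rough, centralized sketch of the strip-index idea: partition space into strips along one axis, route each object to its strip, and answer a k-NN query by scanning strips outward from the query until no closer strip can improve the answer. The strip boundaries, class layout, and stopping rule below are our own simplifications, not the paper's DSI or DKNN:

```python
import bisect, math

class StripIndex:
    def __init__(self, boundaries):
        self.bounds = boundaries                      # sorted strip edges on x
        self.strips = [[] for _ in range(len(boundaries) + 1)]

    def insert(self, pt):
        self.strips[bisect.bisect(self.bounds, pt[0])].append(pt)

    def _x_gap(self, x, s):
        # horizontal distance from x to strip s: a lower bound on any member's distance
        lo = self.bounds[s - 1] if s > 0 else -math.inf
        hi = self.bounds[s] if s < len(self.bounds) else math.inf
        return 0.0 if lo <= x <= hi else min(abs(x - lo), abs(x - hi))

    def knn(self, q, k):
        order = sorted(range(len(self.strips)), key=lambda s: self._x_gap(q[0], s))
        best = []                                     # (distance, point), kept sorted
        for s in order:
            if len(best) == k and self._x_gap(q[0], s) >= best[-1][0]:
                break                                 # remaining strips can't help
            best = sorted(best + [(math.dist(p, q), p) for p in self.strips[s]])[:k]
        return [p for _, p in best]

idx = StripIndex([1.0, 2.0, 3.0])
for pt in [(0.5, 0.5), (1.5, 1.0), (2.5, 2.0), (3.5, 0.0)]:
    idx.insert(pt)
print(idx.knn((1.6, 1.1), k=2))
```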
TTA-JD-C1560
Sentiment Analysis: From Opinion Mining to Human-Agent Interaction
The opinion mining and human-agent interaction communities are currently addressing sentiment analysis from different perspectives that comprise, on the one hand, disparate sentiment-related phenomena and computational representations, and on the other hand, different detection and dialog management methods. In this paper we identify and discuss the growing opportunities for cross-disciplinary work that may increase individual advances. Sentiment/opinion detection methods used in human-agent interaction are indeed rare and, when they are employed, they are no different from the ones used in opinion mining and consequently not designed for socio-affective interactions (timing constraints of the interaction, sentiment analysis as an input and an output of interaction strategies). To support our claims, we present a comparative state of the art which analyzes the sentiment-related phenomena and the sentiment detection methods used in both communities, and provides an overview of the goals of socio-affective human-agent strategies. We then propose different possibilities for mutual benefit, specifying several research tracks and discussing the open questions and prospects. To show the feasibility of the general guidelines proposed, we also approach them from a specific perspective by applying them to the case of the Greta embodied conversational agent platform, and discuss the way they can be used to make sentiment analysis more significant for human-agent interaction in two different use cases: job interviews and dialogs with museum visitors.
IEEE 2015
TTA-JD-C1561
Similarity Measure Selection for Clustering Time Series Databases
In the past few years, clustering has become a popular task associated with time series. The choice of a suitable distance measure is crucial to the clustering process and, given the vast number of distance measures for time series available in the literature and their diverse characteristics, this selection is not straightforward. With the objective of simplifying this task, we propose a multi-label classification framework that provides the means to automatically select the most suitable distance measures for clustering a time series database. This classifier is based on a novel collection of characteristics that describe the main features of the time series databases and provide the predictive information necessary to discriminate between a set of distance measures. In order to test the validity of this classifier, we conduct a complete set of experiments using both synthetic and real time series databases and a set of five common distance measures. The positive results obtained by the designed classification framework for various performance measures indicate that the proposed methodology is useful for simplifying the process of distance selection in time series clustering tasks.
IEEE 2015
TTA-JD-C1562
Splitting Large Medical Data Sets Based on Normal Distribution in Cloud Environment
The surge of medical and e-commerce applications has generated a tremendous amount of data, which brings people to a so-called “Big Data” era. Different from traditional large data sets, the term “Big Data” not only refers to the large size of the data volume but also indicates the high velocity of data generation. However, current data mining and analytical techniques face the challenge of dealing with large-volume data in a short period of time. This paper explores the efficiency of utilizing the Normal Distribution (ND) method for splitting and processing large-volume medical data in a cloud environment, such that the split data sets retain representative information. The new ND-based model consists of two stages. The first stage adopts the ND method for splitting and processing large data sets, which can reduce the volume of the data sets. The second stage implements the ND-based model in a cloud computing infrastructure for allocating the split data sets. The experimental results show substantial efficiency gains of the proposed method over conventional methods that do not split data into small partitions. The ND-based method can generate representative data sets, offering an efficient solution for large data processing. The split data sets can be processed in parallel in a cloud computing environment.
IEEE 2015
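An illustrative sketch of distribution-aware splitting: order records by their z-score under a fitted normal and deal them round-robin, so each partition keeps a representative spread of the attribute and can be processed in parallel. The round-robin scheme and toy vitals data are assumptions, not the paper's exact procedure:

```python
import statistics, random

def nd_split(values, n_parts):
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    parts = [[] for _ in range(n_parts)]
    # deal z-ordered records round-robin so every partition spans the distribution
    for i, v in enumerate(sorted(values, key=lambda v: (v - mu) / sigma)):
        parts[i % n_parts].append(v)
    return parts

random.seed(1)
systolic = [random.gauss(120, 15) for _ in range(12)]   # toy blood-pressure readings
for p in nd_split(systolic, 3):
    print([round(v) for v in p], "mean ≈", round(statistics.mean(p)))
```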
mechanism to acquire sentiment labels. The extremely sparse text of tweets also brings down the performance of a sentiment classifier. In this paper, we propose a semi-supervised topic-adaptive sentiment classification (TASC) model, which starts with a classifier built on common features and mixed labeled data from various topics. It minimizes the hinge loss to adapt to unlabeled data and features, including topic-related sentiment words, authors' sentiments, and sentiment connections derived from "@" mentions of tweets, named topic-adaptive features. Text and non-text features are extracted and naturally split into two views for co-training. The TASC learning algorithm updates topic-adaptive features based on the collaborative selection of unlabeled data, which in turn helps to select more reliable tweets to boost the performance. We also design an adapting model along a timeline (TASC-t) for dynamic tweets. An experiment on 6 topics from published tweet corpora demonstrates that TASC outperforms other well-known supervised and ensemble classifiers. It also beats those semi-supervised learning methods without feature adaptation. Meanwhile, TASC-t can also achieve impressive accuracy and F-score. Finally, with a timeline visualization of a "river" graph, people can intuitively grasp the ups and downs of sentiment evolution, and its intensity by color gradation. IEEE 2015

TTA-JD-C1565 Towards Effective Bug Triage with Software Data Reduction Techniques
Software companies spend over 45 percent of cost in dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports of two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance. IEEE 2015
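As a loose illustration of combining feature selection with instance selection (the paper's own FS and IS algorithms are not named in the abstract), the sketch below uses chi-squared feature selection and an edited-nearest-neighbor-style instance filter as stand-ins, on a dense bag-of-words matrix X with developer labels y:

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import NearestNeighbors

def reduce_bug_data(X, y, n_features=500):
    # 1) Feature selection on the word dimension: keep the n_features
    #    words most correlated with the developer labels.
    fs = SelectKBest(chi2, k=min(n_features, X.shape[1]))
    X_fs = fs.fit_transform(X, y)
    # 2) Instance selection on the bug dimension: drop reports whose
    #    nearest neighbor carries a different label (noisy/ambiguous data).
    nn = NearestNeighbors(n_neighbors=2).fit(X_fs)
    _, idx = nn.kneighbors(X_fs)          # idx[i, 1] = nearest other report
    keep = np.array([y[i] == y[idx[i, 1]] for i in range(X_fs.shape[0])])
    return X_fs[keep], y[keep]

The "FS then IS" versus "IS then FS" ordering is exactly what the paper's predictive model chooses per data set; the sketch hard-codes one order for brevity.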
DOMAIN : CLOUD COMPUTING

TTA-DC-C1501 An Access Control Model for Online Social Networks Using User-to-User Relationships
Users and resources in online social networks (OSNs) are interconnected via various types of relationships. In particular, user-to-user relationships form the basis of the OSN structure and play a significant role in specifying and enforcing access control. Individual users and the OSN provider should be enabled to specify which access can be granted in terms of existing relationships. In this paper, we propose a novel user-to-user relationship-based access control (UURAC) model for OSN systems that utilizes regular expression notation for such policy specification. Access control policies on users and resources are composed in terms of the requested action, multiple relationship types, the starting point of the evaluation, and the number of hops on the path. We present two path-checking algorithms to determine whether the required relationship path between users exists for a given access request. We validate the feasibility of our approach by implementing a prototype system and evaluating the performance of these two algorithms. IEEE 2015
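The abstract does not give the two path-checking algorithms themselves; a minimal sketch of the underlying check, a hop-bounded breadth-first search over typed relationship edges, might look like this (all names hypothetical):

from collections import deque

def path_exists(graph, start, target, rel_type, max_hops):
    """Return True if `target` is reachable from `start` via edges labeled
    `rel_type` in at most `max_hops` hops. `graph` maps a user to a list
    of (neighbor, relationship_type) pairs."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        user, hops = queue.popleft()
        if user == target:
            return True
        if hops == max_hops:
            continue          # hop budget exhausted along this branch
        for neighbor, rel in graph.get(user, []):
            if rel == rel_type and neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return False

# e.g. a "friends-of-friends may view" policy:
#   path_exists(g, owner, requester, "friend", max_hops=2)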
TTA-DC-C1502 Cost-Effective Authentic and Anonymous Data Sharing with Forward Security
Data sharing has never been easier with the advances of cloud computing, and an accurate analysis of the shared data provides an array of benefits to both society and individuals. Data sharing with a large number of participants must take into account several issues, including efficiency, data integrity, and privacy of the data owner. Ring signatures are a promising candidate for constructing an anonymous and authentic data sharing system. They allow a data owner to anonymously authenticate his data, which can be put into the cloud for storage or analysis purposes. Yet the costly certificate verification in the traditional public key infrastructure (PKI) setting becomes a bottleneck for this solution to be scalable. Identity-based (ID-based) ring signatures, which eliminate the process of certificate verification, can be used instead. In this paper, we further enhance the security of ID-based ring signatures by providing forward security: if a secret key of any user has been compromised, all previously generated signatures that include this user still remain valid. This property is especially important to any large-scale data sharing system, as it is impossible to ask all data owners to re-authenticate their data even if a secret key of one single user has been compromised. We provide a concrete and efficient instantiation of our scheme, prove its security, and provide an implementation to show its practicality. IEEE 2015
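The paper's forward security comes from a specific ID-based ring signature construction; the generic key-evolution idea behind forward security can be sketched independently of it: evolve the signing key one-way each time period and delete the old key, so stealing today's key does not let an attacker forge past-period signatures (a toy MAC stands in for the real signature here):

import hashlib, hmac

def evolve_key(sk: bytes) -> bytes:
    # One-way update: from sk_t it is easy to get sk_{t+1}, but not the reverse.
    return hashlib.sha256(b"key-evolution" + sk).digest()

def sign(sk: bytes, period: int, message: bytes) -> bytes:
    # Toy MAC-based stand-in for a real signature, bound to the period.
    return hmac.new(sk, period.to_bytes(4, "big") + message, hashlib.sha256).digest()

sk = b"\x01" * 32              # period-0 secret key
sig0 = sign(sk, 0, b"report")  # signature made in period 0
sk = evolve_key(sk)            # move to period 1 and erase the old key
# An attacker who steals the period-1 key cannot recompute the period-0 key,
# so period-0 signatures remain valid and trustworthy.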
TTA-DC-C1503 SeDaSC - Shared Data Authority Scheme
Cloud storage is an application of clouds that liberates organizations from establishing in-house data storage systems. However, cloud storage gives rise to security concerns. In the case of group-shared data, the data face both cloud-specific and conventional insider threats. Secure data sharing among a group that counters insider threats from legitimate yet malicious users is an important research issue. In this paper, we propose the Secure Data Sharing in Clouds (SeDaSC) methodology that provides: 1) data confidentiality and integrity; 2) access control; 3) data sharing (forwarding) without using compute-intensive re-encryption; 4) insider threat security; and 5) forward and backward access control. The SeDaSC methodology encrypts a file with a single encryption key. Two different key shares are generated for each user, with the user getting only one share. The possession of a single share of a key allows the SeDaSC methodology to counter the insider threats. The other key share is stored by a trusted third party, called the cryptographic server. The SeDaSC methodology is applicable to conventional and mobile cloud computing environments. We implement a working prototype of SeDaSC and evaluate its performance based on the time consumed during various operations. We formally verify the working of SeDaSC by using high-level Petri nets, the Satisfiability Modulo Theories Library, and the Z3 solver. The results proved to be encouraging and show that SeDaSC has the potential to be effectively used for secure data sharing in the cloud. IEEE 2015
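The abstract leaves open how the two key shares are produced; one standard way to realize a 2-of-2 split of the file key between the user and the cryptographic server is a simple XOR split, sketched here (illustrative only, not SeDaSC's actual construction):

import os

def split_key(key: bytes):
    """2-of-2 split: neither share alone reveals anything about the key."""
    user_share = os.urandom(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recombine(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)           # the single file-encryption key
u, s = split_key(key)          # user keeps u, cryptographic server keeps s
assert recombine(u, s) == key  # both parties must cooperate to decrypt

Because a lone share is indistinguishable from random bytes, a malicious insider holding only the user share (or only the server share) learns nothing about the file key.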
TTA-DC-C1504 A Computational Dynamic Trust Model for User Authorization
Development of authorization mechanisms for secure information access by a large community of users in an open environment is an important problem in the ever-growing Internet world. In this paper, we propose a computational dynamic trust model for user authorization, rooted in findings from social science. Unlike most existing computational trust models, this model distinguishes trusting belief in integrity from that in competence in different contexts, and it accounts for subjectivity in the evaluation of a particular trustee by different trusters. Simulation studies were conducted to compare the performance of the proposed integrity belief model with other trust models from the literature for different user behavior patterns. Experiments show that the proposed model achieves higher performance than other models, especially in predicting the behavior of unstable users. IEEE 2015

TTA-DC-C1505 Shared Authority Based Privacy-Preserving Authentication Protocol in Cloud Computing
Cloud computing is an emerging data-interactive paradigm in which users' data are stored remotely on an online cloud server. Cloud services provide great conveniences for users to enjoy on-demand cloud applications without considering the local infrastructure limitations. During data access, different users may be in a collaborative relationship, and thus data sharing becomes significant to achieve productive benefits. Existing security solutions mainly focus on authentication to ensure that a user's private data cannot be illegally accessed, but they neglect a subtle privacy issue that arises when a user challenges the cloud server to request data sharing from other users: the access request itself may reveal the user's privacy, no matter whether or not it obtains the data access permissions. In this paper, we propose a shared authority based privacy-preserving authentication protocol (SAPA) to address the above privacy issue for cloud storage. In the SAPA, 1) shared access authority is achieved by an anonymous access request matching mechanism with security and privacy considerations (e.g., authentication, data anonymity, user privacy, and forward security); 2) attribute-based access control is adopted so that a user can only access its own data fields; 3) proxy re-encryption is applied to provide data sharing among multiple users. Meanwhile, a universal
composability (UC) model is established to prove that the SAPA theoretically has design correctness. It indicates that the proposed protocol is attractive for multi-user collaborative cloud applications. IEEE 2015

TTA-DC-C1506 Provable Multicopy Dynamic Data Possession in Cloud Computing Systems
Increasingly, more and more organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSPs' storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers. The more copies the CSP is asked to store, the more fees the customers are charged. Therefore, customers need a strong guarantee that the CSP is storing all data copies that are agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. In this paper, we propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., block-level operations such as block modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. We give a comparative analysis of the proposed MB-PMDDP scheme against a reference model obtained by extending existing provable possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, we show security against colluding servers and discuss how to identify corrupted copies by slightly modifying the proposed scheme. IEEE 2015
TTA-DC-C1507 My Privacy My Decision - Control of Photo Sharing on Online Social Networks
Photo sharing is an attractive feature that popularizes online social networks (OSNs). Unfortunately, it may leak users' privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study the scenario in which a user shares a photo containing individuals other than himself/herself (termed a co-photo for short). To prevent possible privacy leakage of a photo, we design a mechanism to enable each individual in a photo to be aware of the posting activity and to participate in the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognize everyone in the photo. However, a more demanding privacy setting may limit the number of photos publicly available to train the FR system. To deal with this dilemma, our mechanism attempts to utilize users' private photos to design a personalized FR system specifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus-based method to reduce the computational complexity and protect the private training set. We show that our system is superior to other possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof-of-concept Android application on Facebook's platform. IEEE 2015

TTA-DC-C1508 A Profit Maximization Scheme with Guaranteed Quality of Service in Cloud Computing
As an effective and efficient way to provide computing resources and services to customers on demand, cloud computing has become more and more popular. From cloud service providers' perspective, profit is one of the most important considerations, and it is mainly determined by the configuration of a cloud service platform under given market demand. However, a single long-term renting scheme is usually adopted to configure a cloud platform, which cannot guarantee the service quality and leads to serious resource waste. In this paper, a double resource renting scheme is first designed, in which short-term renting and long-term renting are combined to address the existing issues. This double renting scheme can effectively guarantee the quality of service of all requests and greatly reduce resource waste. Secondly, a service system is modeled as an M/M/m+D queuing model, and the performance indicators that affect the profit of our double renting scheme are analyzed, e.g., the average charge and the ratio of requests that need temporary servers. Thirdly, a profit maximization problem is formulated for the double renting scheme, and the optimized configuration of a cloud platform is obtained by solving it. Finally, a series of calculations are conducted to compare the profit of our proposed scheme with that of the single renting scheme. The results show that our scheme can not only guarantee the service quality of all requests but also obtain more profit than the latter. IEEE 2015
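The paper's full M/M/m+D analysis (with deadline D) is involved; to give a flavor of the kind of queueing computation behind the "ratio of requests that need temporary servers" indicator, the plain M/M/m waiting probability (Erlang C) can be computed as below, with made-up arrival and service rates:

from math import factorial

def erlang_c(m: int, lam: float, mu: float) -> float:
    """P(wait) in an M/M/m queue with arrival rate lam and service rate mu
    per server (the +D deadline of the paper's M/M/m+D model is omitted)."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / m                       # per-server utilization
    assert rho < 1, "queue is unstable"
    top = a**m / (factorial(m) * (1 - rho))
    bottom = sum(a**k / factorial(k) for k in range(m)) + top
    return top / bottom

# e.g. 20 long-term servers, 18 requests/s, each served at 1 request/s:
p_wait = erlang_c(20, 18.0, 1.0)
print(f"fraction of requests that would wait "
      f"(candidates for short-term temporary servers): {p_wait:.3f}")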
TTA-DC-C1509 Attribute-based Access Control with Constant-size Ciphertext in Cloud Computing
With the popularity of cloud computing, there have been increasing concerns about its security and privacy. Since the cloud computing environment is distributed and untrusted, data owners have to encrypt outsourced data to enforce confidentiality. Therefore, how to achieve practicable access control of encrypted data in an untrusted environment is an urgent issue that needs to be solved. Attribute-Based Encryption (ABE) is a promising scheme suitable for access control in cloud storage systems. This paper proposes a hierarchical attribute-based access control scheme with constant-size ciphertext. The scheme is efficient because the length of the ciphertext and the number of bilinear pairing evaluations are fixed to a constant. Its computation cost in the encryption and decryption algorithms is low. Moreover, the hierarchical authorization structure of our scheme reduces
the burden and risk of a single-authority scenario. We prove the scheme is CCA2-secure under the decisional q-Bilinear Diffie-Hellman Exponent assumption. In addition, we implement our scheme and analyse its performance. The analysis results show the proposed scheme is efficient, scalable, and fine-grained in dealing with access control for outsourced data in cloud computing. IEEE 2015

TTA-DC-C1510 Bidding Strategies for Spot Instances in Cloud Computing Markets
In recent times, spot pricing, a dynamic pricing scheme, has become increasingly popular for cloud services. This new pricing format, though efficient in terms of cost and resource use, has added to the complexity of decision making for typical cloud computing users. To recommend bidding strategies in spot markets, we use a simulation study to understand the implications that provider-recommended strategies have for cloud users. We use data based on Amazon's Elastic Compute Cloud spot market to provide users with guidelines when considering trade-offs between cost, wait time, and interruption rates. IEEE 2015
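A minimal simulation of the cost/interruption trade-off the entry describes, run over a synthetic price trace rather than real EC2 spot history (all parameters hypothetical):

import random

def simulate_bid(prices, bid):
    """Walk an hourly spot-price trace: the instance runs while
    bid >= price (paying the spot price, as on EC2), and is interrupted
    whenever the price rises above the bid."""
    cost, hours, interruptions, running = 0.0, 0, 0, False
    for price in prices:
        if bid >= price:
            cost += price
            hours += 1
            running = True
        else:
            if running:
                interruptions += 1
            running = False
    return cost, hours, interruptions

# Synthetic 30-day trace standing in for an EC2 spot-price history.
trace = [round(random.uniform(0.05, 0.30), 3) for _ in range(24 * 30)]
for bid in (0.10, 0.15, 0.25):
    cost, hrs, ints = simulate_bid(trace, bid)
    print(f"bid ${bid:.2f}: {hrs} compute-hours, {ints} interruptions, total ${cost:.2f}")

Sweeping the bid level makes the trade-off concrete: higher bids buy more uninterrupted hours at a higher total cost, which is exactly the kind of guideline the study derives.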
TTA-DC-C1511 CHARM - A Cost-efficient Multi-cloud Data Hosting Scheme with High Availability
Nowadays, more and more enterprises and organizations are hosting their data in the cloud in order to reduce IT maintenance costs and enhance data reliability. However, facing numerous cloud vendors and their heterogeneous pricing policies, customers may well be perplexed about which cloud(s) are suitable for storing their data and what hosting strategy is cheaper. The general status quo is that customers usually put their data into a single cloud (which is subject to the vendor lock-in risk) and then simply trust to luck. Based on a comprehensive analysis of various state-of-the-art cloud vendors, this paper proposes a novel data hosting scheme (named CHARM) which integrates two desired key functions. The first is selecting several suitable clouds and an appropriate redundancy strategy to store data with minimized monetary cost and guaranteed availability. The second is triggering a transition process to re-distribute data according to variations in data access patterns and cloud pricing. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The results show that, compared with the major existing schemes, CHARM not only saves around 20 percent of monetary cost but also exhibits sound adaptability to data and price adjustments. IEEE 2015

TTA-DC-C1512 CloudArmor - Supporting Reputation-based Trust Management for Cloud Services
Trust management is one of the most challenging issues for the adoption and growth of cloud computing. The highly dynamic, distributed, and non-transparent nature of cloud services introduces several challenging issues such as privacy, security, and availability. Preserving consumers' privacy is not an easy task due to the sensitive information involved in the interactions between consumers and the trust management service. Protecting cloud services against their malicious users (e.g., such users might give misleading feedback to disadvantage a particular cloud service) is a difficult problem. Guaranteeing the availability of the trust management service is another significant challenge because of the dynamic nature of cloud environments. In this article, we describe the design and implementation of CloudArmor, a reputation-based trust management framework that provides a set of functionalities to deliver Trust as a Service (TaaS), which includes i) a novel protocol to prove the credibility of trust feedbacks and preserve users' privacy, ii) an adaptive and robust credibility model for measuring the credibility of trust feedbacks to
protect cloud services from malicious users and to compare the trustworthiness of cloud services, and iii) an availability model to manage the availability of the decentralized implementation of the trust management service. The feasibility and benefits of our approach have been validated by a prototype and experimental studies using a collection of real-world trust feedbacks on cloud services. IEEE 2015

TTA-DC-C1513 DaSCE - Data Security for Cloud Environment with Semi-Trusted Third Party
Off-site data storage is an application of the cloud that relieves customers from focusing on data storage systems. However, outsourcing data to a third-party administrative control entails serious security concerns. Data leakage may occur due to attacks by other users and machines in the cloud. Wholesale disclosure of data by the cloud service provider is yet another problem faced in the cloud environment. Consequently, a high level of security measures is required. In this paper, we propose Data Security for Cloud Environment with Semi-Trusted Third Party (DaSCE), a data security system that provides (a) key management, (b) access control, and (c) assured file deletion. DaSCE utilizes Shamir's (k, n) threshold scheme to manage the keys, where k out of n shares are required to generate the key. We use multiple key managers, each hosting one share of the key. Multiple key managers avoid a single point of failure for the cryptographic keys. We (a) implement a working prototype of DaSCE and evaluate its performance based on the time consumed during various operations, (b) formally model and analyze the working of DaSCE using High-Level Petri Nets (HLPN), and (c) verify the working of DaSCE using the Satisfiability Modulo Theories Library (SMT-Lib) and the Z3 solver. The results reveal that DaSCE can be effectively used to secure outsourced data by employing key management, access control, and assured file deletion. IEEE 2015
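Shamir's (k, n) threshold scheme, which the abstract names explicitly, can be sketched in a few lines over a prime field; this is the generic textbook version, not DaSCE's implementation:

import random

P = 2**127 - 1  # a Mersenne prime, large enough for 16-byte keys

def make_shares(secret: int, k: int, n: int):
    """Shamir (k, n): evaluate a random degree-(k-1) polynomial with
    constant term `secret` at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 from any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)            # the file-encryption key
shares = make_shares(key, k=3, n=5)  # one share per key manager
assert recover(shares[:3]) == key    # any 3 of the 5 managers suffice
assert recover(shares[2:]) == key

With k = 3 and n = 5, any two colluding (or failed) key managers can neither reconstruct the key nor block its recovery, which is precisely why the scheme avoids a single point of failure.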
TTA-DC-C1514 Data as a Currency and Cloud-Based Data Lockers
With large data volumes being generated through Google search, Facebook, Twitter, Instagram, and the increasingly instrumented physical world (with embedded sensors), the authors discuss whether such data can be the basis of a new transactional relationship between people and companies, in which both sides benefit from new products and services and increased economic growth. However, the key distinction from previous discussions is whether the existence of a global cloud computing industry (consisting of data centers located in different parts of the world) can be used to facilitate such transactional relations, with awareness of data privacy and access management. The authors propose the use of data as a currency, to enable consumers to directly monetize their own data and request services (based on the "value" their data holds within a marketplace). IEEE 2015

TTA-DC-C1515 Dynamic Weight-Based Individual Similarity Calculation for Information Searching in Social Computing
In the social computing environment, the complete information about an individual is usually distributed across heterogeneous social networks, which are presented as linked data. Synthetically recognizing and integrating these distributed and heterogeneous data for efficient information searching is an important but challenging task. In this paper, a dynamic weight (DW)-based similarity calculation is proposed to recognize and integrate similar individuals from distributed data environments. First, each link of an individual is weighted by applying DW. Then, a semantic similarity metric is proposed to combine the DW into the similarity calculation. Next, a searching system framework for similarity-based individual search is designed and tested on real-world data sets. Finally, extensive experiments are conducted on both benchmark and real-world social community data sets. The results show that our approach can produce
good results for similar-individual searching in social networks. In addition, it performs significantly better than the existing state-of-the-art approaches in similar-individual searching. IEEE 2015

TTA-DC-C1516 Efficient Audit Service Outsourcing for Data Integrity in Clouds
Cloud computing, which provides elastic computing and storage resources on demand, has become increasingly important due to the emergence of "big data". Cloud computing resources are a natural fit for processing big data streams, as they allow big data applications to run at the scale required for handling their complexities (data volume, variety, and velocity). With data no longer under users' direct control, data security in cloud computing has become one of the main concerns in the adoption of cloud computing resources. In order to improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verification is very large, because each update requires updating all replicas, where verification of each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time. Without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. In order to address these problems, in this paper we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support full dynamic data updates and authentication of block indices, we include rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in the MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into the same replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme not only incurs much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provides enhanced security against dishonest cloud service providers. IEEE 2015
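The paper's MR-MHT additionally stores rank and level values; the plain Merkle hash tree underneath it, with its O(log n) membership proofs, looks like this:

import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(blocks):
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate the last node on odd levels
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks, index):
    """Sibling hashes from leaf to root: O(log n) data for the verifier."""
    level, proof = [h(b) for b in blocks], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # (sibling, am-I-the-right-child?)
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(block, proof, root):
    node = h(block)
    for sibling, node_is_right in proof:
        node = h(sibling, node) if node_is_right else h(node, sibling)
    return node == root

blocks = [f"block-{i}".encode() for i in range(8)]
root = merkle_root(blocks)
assert verify(blocks[5], merkle_proof(blocks, 5), root)

Because the proof path encodes the leaf's position, a tree that also authenticates indices (as MR-MHT does with ranks) prevents a dishonest server from answering a challenge on block 5 with some other block it still holds.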
TTA-DC-C1517 Generic and Efficient Constructions of Attribute-Based Encryption with Verifiable Outsourced Decryption
Attribute-based encryption (ABE) provides a mechanism for complex access control over encrypted data. However, in most ABE systems, the ciphertext size and the decryption overhead, which grow with the complexity of the access policy, are becoming critical barriers in applications running on resource-limited devices. Outsourcing the decryption of ABE ciphertexts to a powerful third party is a reasonable way to solve this problem. Since the third party is usually assumed to be untrusted, the security requirements of ABE with outsourced decryption should include privacy and verifiability: any adversary, including the third party, should learn nothing about the encrypted message, and the correctness of the outsourced decryption should be verifiable efficiently. We propose generic constructions of CPA-secure and RCCA-secure ABE systems with verifiable outsourced decryption from CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-secure construction in the standard model and then show an implementation of this instantiation. The experimental results show that, compared with the existing scheme, our CPA-secure construction has more compact ciphertexts and lower computational costs. Moreover, the techniques involved in the RCCA-secure construction can be applied to generically constructing CCA-secure ABE, which we believe to be of independent interest. IEEE 2015

TTA-DC-C1518 Group Key Agreement with Local Connectivity
In this paper, we study a group key agreement problem where a user is only aware of his neighbors while the connectivity graph is arbitrary. In our problem, there is no centralized initialization for users. A group key agreement with these features is very suitable for social networks. Under our setting, we construct two efficient protocols with passive security. We obtain lower bounds on the round complexity for this type of protocol, which demonstrates that our constructions are round-efficient. Finally, we construct an actively secure protocol from a passively secure one. IEEE 2015

TTA-DC-C1519 Hybrid Cloud Approach for Secure Authorized Deduplication
Data deduplication is one of the important data compression techniques for eliminating duplicate copies of repeated data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in the duplicate check, besides the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct test-bed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations. IEEE 2015
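Convergent encryption, which the abstract builds on, derives the encryption key from the content itself, so identical plaintexts produce identical ciphertexts and can be deduplicated even though the cloud never sees the data in the clear. A toy sketch (the SHA-256-counter keystream is illustrative only and not secure; a real system would use AES):

import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256-in-counter-mode keystream, for illustration only.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(data: bytes):
    key = hashlib.sha256(data).digest()        # key derived from the content
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    tag = hashlib.sha256(key).hexdigest()      # tag used for the duplicate check
    return tag, ct, key

t1, c1, _ = convergent_encrypt(b"same attachment")
t2, c2, _ = convergent_encrypt(b"same attachment")
assert t1 == t2 and c1 == c2  # identical plaintexts dedupe even when encrypted

The paper's contribution layers differential user privileges on top of this duplicate check, so matching tags alone are not enough: the uploader must also hold the right privilege before the duplicate is accepted.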
TTA-DC-C1520 Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing
This paper presents an initiative data prefetching scheme on the storage servers of distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers directly prefetch data after analyzing the history of disk I/O access events and then proactively push the prefetched data to the relevant client machines. To put this technique to work, information about client nodes is piggybacked onto real client I/O requests and forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations, directing what data should be fetched on storage servers in advance. Finally, the prefetched data can be pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we demonstrate that our initiative prefetching technique can help distributed file systems for cloud environments achieve better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which definitely contributes to preferable system performance on them. IEEE 2015

TTA-DC-C1521 Privacy Protection for Wireless Sensor Medical Data
In recent years, wireless sensor networks have been widely used in healthcare applications, such as hospital and home patient monitoring. Wireless medical sensor networks are more vulnerable to eavesdropping, modification, impersonation, and replay attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the patient data during transmission, but cannot stop the inside attack in which the administrator of the patient database reveals the sensitive patient data. In this paper, we propose a practical approach to prevent the inside attack by using multiple data servers to store patient data. The main contribution of this paper is securely distributing the patient data among multiple data servers and employing the Paillier and ElGamal cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy. IEEE 2015
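The Paillier cryptosystem named in the abstract is additively homomorphic, which is what lets a data server aggregate encrypted readings without ever seeing them; a toy sketch with tiny primes (real deployments use 2048-bit moduli):

import math, random

# Toy Paillier keypair (tiny primes, for illustration only).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse of L(g^lam mod n^2)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: a server can sum encrypted readings blindly.
readings = [98, 101, 97]              # e.g. heart rates from bedside sensors
agg = 1
for r in readings:
    agg = (agg * encrypt(r)) % n2     # product of ciphertexts = sum of plaintexts
assert decrypt(agg) == sum(readings)

A database administrator holding only ciphertexts can compute aggregate statistics such as sums and means, while individual patient readings stay hidden, which is the inside-attack protection the entry targets.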
TTA-DC-C1522 Quality-assured Secured Load Sharing in Mobile Cloud Networking Environment
In mobile cloud networks (MCNs), a mobile user is connected with a cloud server through a network gateway, which is responsible for providing the required quality of service (QoS) to the users. If a user increases its service demand, the connecting gateway may fail to provide the requested QoS due to the overloaded demand, while the other gateways remain under-loaded. Given the increased load on one gateway, sharing the load among all gateways is a prospective solution for providing QoS-guaranteed services to mobile users. Additionally, if a user misbehaves, the situation becomes more challenging. In this paper, we address the problem of QoS-guaranteed secured service provisioning in MCNs. We design a utility maximization problem for quality-assured secured load sharing (QuaLShare) in an MCN and determine its optimal solution using auction theory. In QuaLShare, the overloaded gateway detects the misbehaving gateways and then prevents them from participating in the auction process. Theoretically, we characterize both the problem and the solution approaches in an MCN environment. Finally,
we investigate the existence of a Nash equilibrium of the proposed scheme. We extend the solution to the case of multiple users, followed by theoretical analysis. Numerical analysis establishes the correctness of the proposed algorithms. IEEE 2015

TTA-DC-C1523 Secure Audit Service by Using TPA for Data Integrity in Cloud System
Cloud services are used not only to store data in the cloud but also to share data with other users. The integrity of data in the cloud can easily be lost or damaged. To ensure cloud storage correctness, a distributed storage integrity auditing mechanism supports secure and efficient operations on cloud data, performed by a third-party auditor (TPA). The third-party auditor utilizes a ring signature and a keyed-hash message authentication code for checking integrity. Data privacy and identity privacy on shared data are secured using private-key encryption during the auditing process by the public verifier. In the existing process, data freshness is not proven, so we propose an HMAC mechanism to protect the secrecy, integrity, and authentication of metadata on shared data in cloud storage. This also supports a random checking process by the public verifiers instead of checking the entire data in the cloud. Our audit system establishes data freshness through the secrecy, integrity, and authentication of metadata, and it requires low computation and communication and little extra storage for auditing the metadata. IEEE 2015

TTA-DC-C1524 Secure Data Transmission Using Steganography and Encryption Technique
The transmission of important data such as text, images, and video over the Internet is increasing nowadays, hence the need for secure methods for multimedia data. Image encryption is more demanding than the encryption of other multimedia components because of inherent properties such as higher data capacity and high similarity between pixels. Older encryption techniques such as AES, DES, and RTS are not well suited for highly secure data transmission over wireless media. Thus, we combine chaos theory and cryptography to form a valuable technique for information security. In the first stage, a user encrypts the original input image using chaotic map theory. After that, a data hider compresses the LSB bits of the encrypted image using a data-hiding key to make space to accommodate some more data. Image encryption today is often chaos-based because of unique characteristics such as correlation between neighboring pixels, sensitivity to initial conditions, non-periodicity, and control parameters. A number of image encryption algorithms based on chaotic maps have been implemented; some of them are time-consuming or complex, and some have very little key space. In this paper, we implement a nonlinear differential chaos-based encryption technique in which, for the first time, three differential chaotic maps are used for position permutation and value transformation. In the data-hiding phase, data in binary form is embedded into the encrypted image using the least-significant-bit algorithm. We tabulate correlation coefficient values in both the horizontal and vertical directions for the cipher and original images, and we compare the performance of our method with some existing methods. We also discuss different types of attack, key sensitivity, and the key space of our proposed approach. The given approach is simple, fast, and accurate, and the two algorithms are applied together to give the best results in a highly insecure and complex environment. IEEE 2015
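A minimal single-map sketch of the two stages the entry describes, chaotic encryption followed by LSB embedding; the paper itself uses three differential chaotic maps for permutation and value transformation, so this logistic-map version is illustrative only:

def logistic_keystream(x0: float, r: float, n: int) -> bytes:
    """Chaotic keystream from the logistic map x <- r*x*(1-x);
    the pair (x0, r) acts as the secret key."""
    x, out = x0, bytearray()
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)

def xor_encrypt(pixels: bytes, x0=0.613, r=3.9999) -> bytes:
    ks = logistic_keystream(x0, r, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, ks))

def lsb_embed(pixels: bytes, bits: str) -> bytes:
    """Hide one message bit in the least significant bit of each pixel."""
    out = bytearray(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | int(b)
    return bytes(out)

image = bytes(range(64))               # stand-in for grayscale pixel data
cipher = xor_encrypt(image)            # stage 1: chaotic encryption
stego = lsb_embed(cipher, "10110011")  # stage 2: LSB data hiding
extracted = "".join(str(p & 1) for p in stego[:8])
assert extracted == "10110011"
assert xor_encrypt(cipher) == image    # XOR with the same keystream decrypts

The key sensitivity the entry mentions comes from the map's chaotic behavior: changing x0 in the tenth decimal place yields an entirely different keystream after a few iterations.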
TTA-DC-C1525 Smartphone Instant Messenger Using Google Cloud Messaging
Two of the most important drivers of current telecommunication markets are the development of Rich Communication Services (RCS) and cloud computing. The challenges of delivering these new services on a cloud-based architecture are not only on the technical side; they also concern the definition of feasible
business models for all the involved agents and the definition and negotiation of proper service-level agreements at different levels. This work proposes to provide telecommunication operators with cloud-based infrastructures capable of offering customers innovative and reliable rich communication services, based on their phone numbers, that cannot be replicated by Internet competitors in terms of flexibility, scalability, or security. This Mobiquity as a Service (MaaS) model allows telecommunication providers to maintain relevance for their clients by offering not only the common communication services (instant messaging, group communication and chat, file sharing, or enriched call services) but also a new kind of mobiquitous services related to mobile marketing, smart places, the Internet of Things, or health care, exploiting all the competitive advantages associated with the development of a vertical cloud in a dynamic and heterogeneous ecosystem. In addition, the infrastructure layer needed to support the newly proposed model is defined, and a first prototype is deployed and evaluated with two real use cases. IEEE 2015

TTA-DC-C1526 Social Recommendation with Cross-Domain Transferable Knowledge
Recommender systems can suffer from data sparsity and cold-start issues. However, social networks, which enable users to build relationships and create different types of items, present an unprecedented opportunity to alleviate these issues. In this paper, we represent a social network as a star-structured hybrid graph centered on a social domain, which connects with other item domains. With this innovative representation, useful knowledge from an auxiliary domain can be transferred through the social domain to a target domain. Various factors of item transferability, including popularity and behavioral consistency, are
determined. We propose a novel Hybrid Random Walk (HRW) method, which incorporates such factors, to select transferable items in auxiliary domains, bridge cross-domain knowledge with the social domain, and accurately predict user-item links in a target domain. Extensive experiments on a real social dataset demonstrate that HRW significantly outperforms existing approaches. IEEE 2015

TTA-DC-C1527 Three-Server Swapping for Access Confidentiality
We propose an approach to protect the confidentiality of data, and of accesses to them, when data are stored and managed by external providers and hence are not under the direct control of their owner. Our approach is based on distributing data allocation among three independent servers and on dynamically re-allocating data at every access. Dynamic re-allocation is enforced by swapping the data involved in an access across the servers, in such a way that accessing a given node implies re-allocating it to a different server, thus destroying the ability of servers to build knowledge by observing accesses. The use of three servers makes the result of the swapping operation uncertain in the eyes of the servers, even in the presence of collusion among them. IEEE 2015

TTA-DC-C1528 Trust and Compactness in Social Network Groups
Understanding the dynamics behind group formation and evolution in social networks is considered an instrumental milestone to better describe how individuals gather and form communities, how they enjoy and share the platform contents, how they are driven by their preferences/tastes, and how their behaviors are influenced by peers. In this context, the notion of compactness of a social group is particularly relevant. While the literature usually refers to compactness as a measure that merely
determines how similar the members of a group are to each other, we argue that the mutual trustworthiness between the members should be considered an important factor in defining such a term. In fact, trust has profound effects on the dynamics of group formation and evolution: individuals are more likely to join and stay in a group if they can trust the other group members. In this paper, we propose a quantitative measure of group compactness that takes into account both the similarity and the trustworthiness among users, and we present an algorithm to optimize such a measure. We provide empirical results, obtained from the real social networks EPINIONS and CIAO, that compare our notion of compactness with the traditional notion of user similarity, clearly demonstrating the advantages of our approach. IEEE 2015

TTA-JC-C1529 Public Integrity Auditing for Shared Dynamic Cloud Data with Group User Revocation
The advent of cloud computing has made storage outsourcing a rising trend, which makes secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against collusion between the cloud storage server and revoked group users during user revocation in a practical cloud storage system. In this paper, we identify the collusion attack on the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation, based on vector commitments and verifier-local revocation group signatures. We design a concrete scheme following our scheme definition. Our scheme supports public checking and efficient user revocation, as well as some nice properties such as confidentiality, efficiency, countability, and traceability of secure group user revocation. Finally, the
security and experimental analysis show that, compared with its relevant schemes, our scheme is also secure and efficient. IEEE 2015

TTA-JC-C1530 Audit-Free Cloud Storage via Deniable Attribute-based Encryption
Cloud storage services have become increasingly popular. Because of the importance of privacy, many cloud storage encryption schemes have been proposed to protect data from those who do not have access. All such schemes assumed that cloud storage providers are safe and cannot be hacked; however, in practice, some authorities (i.e., coercers) may force cloud storage providers to reveal user secrets or confidential data on the cloud, thus altogether circumventing storage encryption schemes. In this paper, we present our design for a new cloud storage encryption scheme that enables cloud storage providers to create convincing fake user secrets to protect user privacy. Since coercers cannot tell whether obtained secrets are true or not, the cloud storage providers ensure that user privacy is still securely protected. IEEE 2015

TTA-JC-C1531 CHARM - A Cost-efficient Multi-cloud Data Hosting Scheme with High Availability
Nowadays, more and more enterprises and organizations are hosting their data in the cloud in order to reduce IT maintenance costs and enhance data reliability. However, facing numerous cloud vendors and their heterogeneous pricing policies, customers may well be perplexed about which cloud(s) are suitable for storing their data and what hosting strategy is cheaper. The general status quo is that customers usually put their data into a single cloud (which is subject to the vendor lock-in risk) and then simply trust to luck. Based on a comprehensive analysis of various state-of-the-art cloud vendors, this paper proposes a novel data hosting scheme (named CHARM) which integrates two desired key functions. The first is selecting several suitable clouds and an appropriate redundancy strategy to
store data with minimized monetary cost and guaranteed availability. The second is triggering a transition process to re-distribute data according to variations in data access patterns and cloud pricing. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The results show that, compared with the major existing schemes, CHARM not only saves around 20 percent of monetary cost but also exhibits sound adaptability to data and price adjustments. IEEE 2015

TTA-JC-C1532 Secure Auditing and Deduplicating Data in Cloud
As cloud computing technology has developed over the last decade, outsourcing data to cloud services for storage has become an attractive trend, which spares the effort of heavy data maintenance and management. Nevertheless, since outsourced cloud storage is not fully trustworthy, it raises security concerns about how to realize data deduplication in the cloud while achieving integrity auditing. In this work, we study the problem of integrity auditing and secure deduplication of cloud data. Specifically, aiming at achieving both data integrity and deduplication in the cloud, we propose two secure systems, namely SecCloud and SecCloud+. SecCloud introduces an auditing entity with a maintained MapReduce cloud, which helps clients generate data tags before uploading as well as audit the integrity of data stored in the cloud. Compared with previous work, the computation by the user in SecCloud is greatly reduced during the file uploading and auditing phases. SecCloud+ is motivated by the fact that customers always want to encrypt their data before uploading, and it enables integrity auditing and secure deduplication of encrypted data. IEEE 2015

TTA-JC-C1533 A Profit Maximization Scheme with Guaranteed Quality of Service in Cloud Computing
As an effective and efficient way to provide computing resources and services to
customers on demand, cloud computing has become more and more popular. From cloud service providers' perspective, profit is one of the most important considerations, and it is mainly determined by the configuration of a cloud service platform under given market demand. However, a single long-term renting scheme is usually adopted to configure a cloud platform, which cannot guarantee the service quality and leads to serious resource waste. In this paper, a double resource renting scheme is first designed, in which short-term renting and long-term renting are combined to address the existing issues. This double renting scheme can effectively guarantee the quality of service of all requests and greatly reduce resource waste. Secondly, a service system is modeled as an M/M/m+D queuing model, and the performance indicators that affect the profit of our double renting scheme are analyzed, e.g., the average charge and the ratio of requests that need temporary servers. Thirdly, a profit maximization problem is formulated for the double renting scheme, and the optimized configuration of a cloud platform is obtained by solving it. Finally, a series of calculations are conducted to compare the profit of our proposed scheme with that of the single renting scheme. The results show that our scheme can not only guarantee the service quality of all requests but also obtain more profit than the latter. IEEE 2015

TTA-JC-C1534 Online Resource Scheduling under Concave Pricing for Cloud Computing
With the booming cloud computing industry, computational resources are readily and elastically available to customers. In order to attract customers with various demands, most Infrastructure-as-a-Service (IaaS) cloud service providers offer several pricing strategies, such as pay-as-you-go, pay less per unit when you use more (so-called volume discounts), and pay even less
when you reserve. The diverse pricing schemes among different IaaS service providers, or even within the same provider, form a complex economic landscape that nurtures the market of cloud brokers. By strategically scheduling multiple customers' resource requests, a cloud broker can fully take advantage of the discounts offered by cloud service providers. In this paper, we focus on how a broker can help a group of customers fully utilize the volume discount pricing strategy offered by cloud service providers through cost-efficient online resource scheduling. We present a randomized online stack-centric scheduling algorithm (ROSA) and theoretically prove the lower bound of its competitive ratio. Three special cases of the offline concave cost scheduling problem and the corresponding optimal algorithms are introduced. Our simulation shows that ROSA achieves a competitive ratio close to the theoretical lower bound under the special cases. Trace-driven simulation using Google cluster data demonstrates that ROSA is superior to the conventional online scheduling algorithms in terms of cost saving. IEEE 2015

TTA-JC-C1535 The Value of Cooperation - Minimizing User Costs in Multi-broker Mobile Cloud Computing Networks
We study the problem of user cost minimization in mobile cloud computing (MCC) networks. We consider an MCC model in which multiple brokers assign cloud resources to mobile users. The model is characterized by a heterogeneous cloud architecture (which includes a public cloud and a cloudlet) and by the heterogeneous pricing strategies of cloud service providers. In this setting, we investigate two classes of cloud reservation strategies, i.e., a competitive strategy, and a compete-then-cooperate strategy as a performance bound. We first study a purely competitive scenario where brokers compete to reserve computing resources from remote public clouds (which are affected by long delays) and from local cloudlets (which have
limited computational resources but short delays). We provide theoretical results demonstrating the existence of disagreement points (i.e., equilibrium reservation strategies from which no broker has an incentive to deviate unilaterally) and the convergence of the brokers' best-response strategies to disagreement points. We then consider the scenario in which brokers agree to cooperate in exchange for a lower average cost of resources. We formulate a cooperative problem whose objective is to minimize the total average price of all brokers, under the constraint that no broker should pay a price higher than the disagreement price (i.e., the competitive price). We design a new globally optimal solution algorithm to solve the resulting non-convex cooperative problem, based on a combination of the branch-and-bound framework and advanced convex relaxation techniques. The resulting optimal solution provides a lower bound on the achievable user cost without complete collusion among brokers. Compared with pure competition, we find that i) noticeable cooperative gains can be achieved over pure competition in markets with only a few brokers, and ii) the cooperative gain is only marginal in crowded markets, i.e., with a high number of brokers, hence there is no clear incentive for brokers to cooperate. IEEE 2015

TTA-JC-C1536 System of Systems for Quality-of-Service Observation and Response in Cloud Computing Environments
As military, academic, and commercial computing systems evolve from autonomous entities that deliver computing products into network-centric enterprise systems that deliver computing as a service, opportunities emerge to consolidate computing resources, software, and information through cloud computing. Along with these opportunities come challenges, particularly for service providers and operations centers that struggle to monitor and manage quality of service (QoS) for these services in order to meet customer service commitments. Traditional
approaches fall short in addressing these challenges because they examine QoS from a limited perspective rather than from the system-of-systems (SoS) perspective applicable to a net-centric enterprise system, in which any user from any location can share computing resources at any time. This paper presents an SoS approach to enable QoS monitoring, management, and response for enterprise systems that deliver computing as a service through a cloud computing environment. A concrete example is provided for the application of this new SoS approach to a real-world scenario (viz., distributed denial of service). Simulated results confirm the efficacy of the approach. IEEE 2015

TTA-JC-C1537 A Computational Dynamic Trust Model for User Authorization
Development of authorization mechanisms for secure information access by a large community of users in an open environment is an important problem in the ever-growing Internet world. In this paper, we propose a computational dynamic trust model for user authorization, rooted in findings from social science. Unlike most existing computational trust models, this model distinguishes trusting belief in integrity from that in competence in different contexts, and it accounts for subjectivity in the evaluation of a particular trustee by different trusters. Simulation studies were conducted to compare the performance of the proposed integrity belief model with other trust models from the literature for different user behavior patterns. Experiments show that the proposed model achieves higher performance than other models, especially in predicting the behavior of unstable users. IEEE 2015

TTA-JC-C1538 Generic and Efficient Constructions of Attribute-Based Encryption with Verifiable Outsourced Decryption
Attribute-based encryption (ABE) provides a mechanism for complex access control over encrypted data. However, in most ABE systems, the ciphertext size and the decryption overhead, which grow with the complexity of the access policy, are becoming critical barriers in applications running on
resource-limited devices. Outsourcing the decryption of ABE ciphertexts to a powerful third party is a reasonable way to solve this problem. Since the third party is usually assumed to be untrusted, the security requirements of ABE with outsourced decryption should include privacy and verifiability: any adversary, including the third party, should learn nothing about the encrypted message, and the correctness of the outsourced decryption should be verifiable efficiently. We propose generic constructions of CPA-secure and RCCA-secure ABE systems with verifiable outsourced decryption from CPA-secure ABE with outsourced decryption, respectively. We also instantiate our CPA-secure construction in the standard model and then show an implementation of this instantiation. The experimental results show that, compared with the existing scheme, our CPA-secure construction has more compact ciphertexts and lower computational costs. Moreover, the techniques involved in the RCCA-secure construction can be applied to generically constructing CCA-secure ABE, which we believe to be of independent interest. IEEE 2015

TTA-JC-C1539 Leveraging Data Deduplication to Improve the Performance of Primary Storage Systems in the Cloud
With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the cloud. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than on disks, due to the relatively high temporal access locality associated with small I/O requests to redundant data. Moreover, directly applying data deduplication to primary storage systems in the cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a performance-oriented
I/O deduplication scheme, called POD, rather than a capacity-oriented I/O deduplication scheme, exemplified by iDedup, to improve the I/O performance of primary storage systems in the cloud without sacrificing the capacity savings of the latter. POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing the performance overhead of deduplication, namely, a request-based selective deduplication technique, called Select-Dedupe, to alleviate data fragmentation, and an adaptive memory management scheme, called iCache, to ease the memory contention between bursty read traffic and bursty write traffic. We have implemented a prototype of POD as a module in the Linux operating system. The experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in the I/O performance measure by up to 87.9 percent, with an average of 58.8 percent. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup. IEEE 2015

TTA-JC-C1540 Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictionaries over Encrypted Cloud Data
Using cloud computing, individuals can store their data on remote servers and allow data access to public users through the cloud servers. As the outsourced data are likely to contain sensitive private information, they are typically encrypted before being uploaded to the cloud. This, however, significantly limits the usability of outsourced data, due to the difficulty of searching over encrypted data. In this paper, we address this issue by developing fine-grained multi-keyword search schemes over encrypted cloud data. Our original contributions are three-fold. First, we introduce relevance scores and preference factors for keywords, which enable precise keyword search and a personalized user experience. Second, we develop a practical and very efficient multi-keyword search scheme. The proposed scheme
  • 129.
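As an aside for readers implementing the idea, the selective-deduplication half of the design can be illustrated in a few lines. The following Python sketch, under assumed parameters (4 KB blocks, a two-block request threshold, an in-memory fingerprint index, and a list standing in for the disk), deduplicates only sufficiently large write requests, which is the spirit of Select-Dedupe; it is not POD's actual implementation.

import hashlib

BLOCK = 4096          # assumed fixed block size
SMALL_REQUEST = 2     # requests under 2 blocks bypass dedupe (illustrative)

fingerprints = {}     # SHA-1 digest -> physical block address
storage = []          # stand-in for the disk: a list of blocks

def write(data):
    """Write one request; dedupe only large requests to limit fragmentation."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    dedupe = len(blocks) >= SMALL_REQUEST
    addresses = []
    for b in blocks:
        digest = hashlib.sha1(b).digest()
        if dedupe and digest in fingerprints:
            addresses.append(fingerprints[digest])   # duplicate: reuse address
            continue
        storage.append(b)
        addr = len(storage) - 1
        if dedupe:
            fingerprints[digest] = addr
        addresses.append(addr)
    return addresses

first = write(b"x" * 8192)
second = write(b"x" * 8192)          # fully redundant request
assert first == second               # no new blocks were stored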
TTA-JC-C1540 Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictionaries over Encrypted Cloud Data Using cloud computing, individuals can store their data on remote servers and allow data access to public users through the cloud servers. As the outsourced data are likely to contain sensitive privacy information, they are typically encrypted before being uploaded to the cloud. This, however, significantly limits the usability of outsourced data due to the difficulty of searching over the encrypted data. In this paper, we address this issue by developing fine-grained multi-keyword search schemes over encrypted cloud data. Our original contributions are three-fold. First, we introduce relevance scores and preference factors upon keywords, which enable precise keyword search and a personalized user experience. Second, we develop a practical and very efficient multi-keyword search scheme that can support complicated logic search, i.e., mixed “AND”, “OR”, and “NO” operations on keywords. Third, we further employ the classified sub-dictionaries technique to achieve better efficiency in index building, trapdoor generation, and querying. Lastly, we analyze the security of the proposed schemes in terms of confidentiality of documents, privacy protection of index and trapdoor, and unlinkability of trapdoor. Through extensive experiments using a real-world dataset, we validate the performance of the proposed schemes. Both the security analysis and the experimental results demonstrate that the proposed schemes achieve the same security level as the existing ones and better performance in terms of functionality, query complexity, and efficiency. IEEE 2015

TTA-JC-C1541 On the Security of Data Access Control for Multiauthority Cloud Storage Systems Data access control has become a challenging issue in cloud storage systems. Some techniques have been proposed to achieve secure data access control in a semi-trusted cloud storage system. Recently, K. Yang et al. proposed a basic data access control scheme for multiauthority cloud storage systems (DAC-MACS) and an extensive data access control scheme (EDAC-MACS). They claimed that DAC-MACS could achieve efficient decryption and immediate revocation, and that EDAC-MACS could also achieve these goals even when non-revoked users reveal their Key Update Keys to the revoked user. However, our cryptanalysis shows that the revocation security of both schemes cannot be guaranteed. In this paper, we first give two attacks on the two schemes. By the first attack, the revoked user can eavesdrop to obtain other users' Key Update Keys to update its Secret Key, and then it can obtain a proper Token to decrypt any secret information as a non-revoked user. In addition, by the second attack, the revoked user can intercept the Ciphertext Update Key to retrieve its ability to decrypt any secret information as a
non-revoked user. Secondly, we propose a new extensive DAC-MACS scheme (NEDAC-MACS) to withstand the above two attacks and thus guarantee more secure attribute revocation. Formal cryptanalysis of NEDAC-MACS is then presented to prove the security goals of the scheme. Finally, a performance comparison among NEDAC-MACS and related schemes is given to demonstrate that the performance of NEDAC-MACS is superior to that of DACC and roughly the same as that of DAC-MACS. IEEE 2015

TTA-JC-C1542 Verifiable Auditing for Outsourced Database in Cloud Computing The notion of database outsourcing enables the data owner to delegate database management to a cloud service provider (CSP) that provides various database services to different users. Recently, plenty of research work has been done on the primitive of outsourced databases. However, it seems that no existing solution can perfectly support both correctness and completeness of query results, especially in the case where the dishonest CSP intentionally returns an empty set for the user's query request. In this paper, we propose a new verifiable auditing scheme for outsourced databases, which can simultaneously achieve the correctness and completeness of search results even if the dishonest CSP purposely returns an empty set. Furthermore, we can prove that our construction achieves the desired security properties even for an encrypted outsourced database. Besides, the proposed scheme can be extended to support the dynamic database setting by incorporating the notion of verifiable database with updates. IEEE 2015

TTA-JC-C1543 A Cost-Effective Deadline-Constrained Dynamic Scheduling Algorithm for Scientific Workflows in a Cloud Environment Cloud Computing, a distributed computing paradigm, enables delivery of IT resources over the Internet and follows the pay-as-you-go billing model. Workflow scheduling is one of the most challenging problems in Cloud computing. Although workflow scheduling on distributed
systems such as Grids and Clusters has been extensively studied, these solutions are not viable for a Cloud environment. This is because a Cloud environment differs from other distributed environments in two major ways: on-demand resource provisioning and the pay-as-you-go pricing model. Thus, to realize the true benefits of workflow orchestration on Cloud resources, novel approaches that can capitalize on the advantages and address the challenges specific to a Cloud environment need to be developed. This work proposes a dynamic, cost-effective, deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public Cloud. The proposed technique aims to exploit the advantages offered by Cloud computing while taking into account virtual machine performance variability and instance acquisition delay, in order to identify a just-in-time schedule of a deadline-constrained scientific workflow at lower cost. Performance evaluation on some well-known scientific workflows shows that the proposed algorithm delivers better performance than current state-of-the-art heuristics. IEEE 2015

TTA-JC-C1544 A Hybrid Cloud Approach for Secure Authorized Deduplication Data deduplication is an important data compression technique for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in the duplicate check, in addition to the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct testbed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations. IEEE 2015
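The convergent encryption technique mentioned above is simple enough to sketch. In the illustrative Python below, the encryption key is derived from the content itself, so identical plaintexts always produce identical ciphertexts that the cloud can match for duplicate checks; the counter-mode keystream built from SHA-256 is an assumption for self-containment, not production cryptography, and the scheme's differential-privilege checks are omitted.

import hashlib

def _keystream(key, n):
    # counter-mode keystream from SHA-256; illustrative, not production crypto
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(data):
    key = hashlib.sha256(data).digest()          # key derived from the content
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    tag = hashlib.sha256(cipher).hexdigest()     # duplicate-check tag
    return cipher, key, tag

c1, k1, t1 = convergent_encrypt(b"same file")
c2, k2, t2 = convergent_encrypt(b"same file")
assert c1 == c2 and t1 == t2   # identical plaintexts yield identical ciphertexts,
                               # so duplicates are detectable without seeing the data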
TTA-JC-C1545 A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud Data Due to the increasing popularity of cloud computing, more and more data owners are motivated to outsource their data to cloud servers for great convenience and reduced cost in data management. However, sensitive data should be encrypted before outsourcing for privacy reasons, which renders traditional data utilization, such as keyword-based document retrieval, obsolete. In this paper, we present a secure multi-keyword ranked search scheme over encrypted cloud data, which simultaneously supports dynamic update operations such as deletion and insertion of documents. Specifically, the vector space model and the widely used TF-IDF model are combined in index construction and query generation. We construct a special tree-based index structure and propose a “Greedy Depth-first Search” algorithm to provide efficient multi-keyword ranked search. The secure kNN algorithm is utilized to encrypt the index and query vectors while ensuring accurate relevance score calculation between encrypted index and query vectors. In order to resist statistical attacks, phantom terms are added to the index vector to blind search results. Owing to our special tree-based index structure, the proposed scheme achieves sub-linear search time and handles the deletion and insertion of documents flexibly. Extensive experiments are conducted to demonstrate the efficiency of the proposed scheme. IEEE 2015
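For intuition, the vector space model with TF-IDF reduces ranked keyword search to scoring documents by weighted term matches. The plaintext-side Python sketch below shows only the ranking logic; the actual scheme encrypts these index and query vectors with the secure kNN algorithm, which is not shown, and the tiny corpus is invented for illustration.

import math
from collections import Counter

docs = {
    "d1": "cloud data encryption search",
    "d2": "secure cloud storage audit",
    "d3": "keyword search over encrypted cloud data",
}

def tfidf_rank(query_terms, docs):
    n = len(docs)
    # document frequency of each term across the corpus
    df = Counter(t for text in docs.values() for t in set(text.split()))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        scores[doc_id] = sum(
            tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf
        )
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(tfidf_rank(["encrypted", "search"], docs))   # d3 ranks first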
TTA-JC-C1546 A Universal Fairness Evaluation Framework for Resource Allocation in Cloud Computing In cloud computing, fairness is one of the most significant indicators for evaluating resource allocation algorithms; it reflects whether each user receives as much as every other user with the same bottleneck. However, evaluating how fair an allocation algorithm is remains an open issue. In this paper, we propose the Dynamic Evaluation Framework for Fairness (DEFF), a framework to evaluate the fairness of a resource allocation algorithm. In our framework, two sub-models, the Dynamic Demand Model (DDM) and the Dynamic Node Model (DNM), are proposed to describe the dynamic characteristics of resource demand and the number of computing nodes in a cloud computing environment. Combining Fairness on Dominant Shares with the two sub-models above, we finally obtain DEFF. In our experiments, we adopt several typical resource allocation algorithms to demonstrate the effectiveness of fairness evaluation using the DEFF framework. IEEE 2015
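Although the abstract does not spell out DEFF's exact metric, fairness on dominant shares can be made concrete. The Python sketch below computes each user's dominant share, as in dominant resource fairness, and then applies Jain's index as one illustrative fairness measure; the capacities and allocations are made-up numbers, not anything from the paper.

capacity = {"cpu": 100.0, "mem": 200.0}          # cluster totals (assumed units)
alloc = {                                         # per-user allocations
    "u1": {"cpu": 20.0, "mem": 10.0},
    "u2": {"cpu": 10.0, "mem": 80.0},
}

def dominant_share(user_alloc):
    # a user's dominant share is the largest fraction it holds of any resource
    return max(user_alloc[r] / capacity[r] for r in capacity)

shares = [dominant_share(a) for a in alloc.values()]

# Jain's index: 1.0 means perfectly equal dominant shares
jain = sum(shares) ** 2 / (len(shares) * sum(s * s for s in shares))
print(shares, round(jain, 3))                     # [0.2, 0.4] -> 0.9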
TTA-JC-C1547 Aggressive Resource Provisioning for Ensuring QoS in Virtualized Environments Elasticity has become the elemental feature of cloud computing, as it enables virtual machine instances to be added or removed dynamically as the workload changes. However, effective virtualized resource management is still one of the most challenging tasks. When the workload of a service increases rapidly, existing approaches cannot respond to the growing performance requirement efficiently, because of either inaccurate adaptation decisions or a slow adjustment process, both of which may result in insufficient resource provisioning. As a consequence, the Quality of Service (QoS) of the hosted applications may degrade and the Service Level Objective (SLO) will thus be violated. In this paper, we introduce SPRNT, a novel resource management framework, to ensure high-level QoS in the cloud computing system. SPRNT utilizes an aggressive resource provisioning strategy, which drives SPRNT to substantially increase the resource allocation in each adaptation cycle when the workload increases. This strategy first provisions resources that are possibly more than the actual demand, and then reduces the over-provisioned resources if needed. By applying the aggressive strategy, SPRNT satisfies the increasing performance requirement in the first place, so that the QoS can be kept at a high level. The experimental results show that SPRNT achieves up to 7.7× speedup in adaptation time compared with existing efforts. By enabling quick adaptation, SPRNT keeps the SLO violation rate as low as 1.3 percent even when dealing with rapidly increasing workloads. IEEE 2015

TTA-JC-C1548 An Intelligent Economic Approach for Dynamic Resource Allocation in Cloud Services With Inter-Cloud, distributed clouds, and open cloud exchanges (OCX) emerging, a comprehensive resource allocation approach is fundamental to a highly competitive cloud market. Oriented towards infrastructure as a service (IaaS), an intelligent economic approach for dynamic resource allocation (IEDA) is proposed, with an improved combinatorial double auction protocol devised to enable various kinds of resources to be traded among multiple consumers and multiple providers while also enabling task partitioning among multiple providers. To make bidding and asking reasonable in each round of the auction and to determine eligible transaction relationships among providers and consumers, a price formation mechanism is proposed, which consists of a back propagation neural network (BPNN) based price prediction algorithm and a price matching algorithm. A reputation system is proposed and integrated to exclude dishonest participants from the cloud market. The winner determination problem (WDP) is solved by an improved paddy field algorithm (PFA). Simulation results have shown that IEDA can not only help maximize market surplus and
surplus strength but also encourage participants to be honest. IEEE 2015

TTA-JC-C1549 ANGEL - Agent-Based Scheduling for Real-Time Tasks in Virtualized Clouds The success of cloud computing makes an increasing number of real-time applications, such as signal processing and weather forecasting, run in the cloud. Meanwhile, scheduling of real-time tasks plays an essential role in helping a cloud provider maintain its quality of service and enhance the system's performance. In this paper, we devise a novel agent-based scheduling mechanism in the cloud computing environment to allocate real-time tasks and dynamically provision resources. In contrast to traditional contract net protocols, we employ a bidirectional announcement-bidding mechanism whose collaborative process consists of three phases, i.e., a basic matching phase, a forward announcement-bidding phase, and a backward announcement-bidding phase. Moreover, elasticity is explicitly considered during scheduling by dynamically adding virtual machines to improve schedulability. Furthermore, we design calculation rules for the bidding values in both the forward and backward announcement-bidding phases, and two heuristics for selecting contractors. On the basis of the bidirectional announcement-bidding mechanism, we propose an agent-based dynamic scheduling algorithm named ANGEL for real-time, independent, and aperiodic tasks in clouds. Extensive experiments are conducted on the CloudSim platform by injecting random synthetic workloads and workloads from the last version of the Google cloud trace logs to evaluate the performance of ANGEL. The experimental results indicate that ANGEL can efficiently solve the real-time task scheduling problem in virtualized clouds. IEEE 2015

TTA-JC-C1550 Attribute-based Access Control with Constant-size Ciphertext in Cloud Computing With the popularity of cloud computing, there have been increasing concerns about its security and privacy. Since
the cloud computing environment is distributed and untrusted, data owners have to encrypt outsourced data to enforce confidentiality. Therefore, how to achieve practicable access control of encrypted data in an untrusted environment is an urgent issue that needs to be solved. Attribute-Based Encryption (ABE) is a promising scheme suitable for access control in cloud storage systems. This paper proposes a hierarchical attribute-based access control scheme with constant-size ciphertext. The scheme is efficient because the ciphertext length and the number of bilinear pairing evaluations are fixed constants. Its computation cost in the encryption and decryption algorithms is low. Moreover, the hierarchical authorization structure of our scheme reduces the burden and risk of a single-authority scenario. We prove that the scheme is CCA2-secure under the decisional q-Bilinear Diffie-Hellman Exponent assumption. In addition, we implement our scheme and analyze its performance. The analysis results show that the proposed scheme is efficient, scalable, and fine-grained in dealing with access control for outsourced data in cloud computing. IEEE 2015

TTA-JC-C1551 Automatic Memory Control of Multiple Virtual Machines on a Consolidated Server Through virtualization, multiple virtual machines can coexist and operate on one physical machine. When virtual machines (VMs) compete for memory, the performance of applications deteriorates, especially that of memory-intensive applications. In this study, we aim to optimize memory control techniques using a balloon driver for server consolidation. Our contribution is three-fold: (1) We design and implement an automatic memory control system based on a Xen balloon driver. To avoid interference with VM monitor operation, our system works in user mode; therefore, the system is easily applied in practice. (2) We design an adaptive global-scheduling algorithm to regulate memory. This algorithm is based on a dynamic baseline, which can
adjust memory allocation according to the memory used by the VMs. (3) We evaluate our optimized solution in a real environment with 10 VMs and well-known benchmarks (the DaCapo and Phoronix Test Suites). Experiments confirm that our system can improve the performance of memory-intensive and disk-intensive applications by up to 500 percent and 300 percent, respectively. The toolkit has been released for free download as GNU General Public License v3 software. IEEE 2015

TTA-JC-C1552 Circuit Ciphertext-policy Attribute-based Hybrid Encryption with Verifiable Delegation in Cloud Computing In the cloud, to achieve access control and keep data confidential, data owners can adopt attribute-based encryption to encrypt the stored data. Users with limited computing power are, however, more likely to delegate most of the decryption task to the cloud servers to reduce the computing cost. As a result, attribute-based encryption with delegation emerges. Still, there are caveats and questions remaining in the previous relevant works. For instance, during the delegation, the cloud servers could tamper with or replace the delegated ciphertext and return a forged computing result with malicious intent. They may also cheat eligible users by telling them that they are ineligible, for the purpose of cost saving. Furthermore, during encryption, the access policies may not be flexible enough. Since policies for general circuits achieve the strongest form of access control, our work considers a construction for circuit ciphertext-policy attribute-based hybrid encryption with verifiable delegation. In such a system, combined with verifiable computation and an encrypt-then-MAC mechanism, the data confidentiality, fine-grained access control, and correctness of the delegated computing results are well guaranteed at the same time. Besides, our scheme achieves security against chosen-plaintext attacks under the k-multilinear Decisional Diffie-Hellman assumption. Moreover, an extensive simulation campaign confirms the feasibility and efficiency of the proposed solution. IEEE 2015
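The encrypt-then-MAC mechanism referred to above is a standard composition and easy to demonstrate. In the Python sketch below, the ciphertext is authenticated with HMAC-SHA256 so any tampering by the delegated server is detected before decryption; the XOR keystream cipher is an illustrative stand-in, and the ABE and verifiable-computation layers of the actual scheme are omitted.

import hashlib
import hmac
import os

def _stream(key, n):
    # SHA-256 counter keystream; illustrative only
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt_then_mac(enc_key, mac_key, plaintext):
    cipher = bytes(a ^ b for a, b in zip(plaintext, _stream(enc_key, len(plaintext))))
    tag = hmac.new(mac_key, cipher, hashlib.sha256).digest()   # MAC the ciphertext
    return cipher, tag

def verify_and_decrypt(enc_key, mac_key, cipher, tag):
    expected = hmac.new(mac_key, cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was tampered with by the server or in transit")
    return bytes(a ^ b for a, b in zip(cipher, _stream(enc_key, len(cipher))))

ek, mk = os.urandom(32), os.urandom(32)
c, t = encrypt_then_mac(ek, mk, b"delegated message")
assert verify_and_decrypt(ek, mk, c, t) == b"delegated message"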
TTA-JC-C1553 CloudArmor - Supporting Reputation-based Trust Management for Cloud Services Trust management is one of the most challenging issues for the adoption and growth of cloud computing. The highly dynamic, distributed, and non-transparent nature of cloud services introduces several challenging issues such as privacy, security, and availability. Preserving consumers' privacy is not an easy task due to the sensitive information involved in the interactions between consumers and the trust management service. Protecting cloud services against their malicious users (e.g., users who give misleading feedback to disadvantage a particular cloud service) is a difficult problem. Guaranteeing the availability of the trust management service is another significant challenge because of the dynamic nature of cloud environments. In this article, we describe the design and implementation of CloudArmor, a reputation-based trust management framework that provides a set of functionalities to deliver Trust as a Service (TaaS), which includes i) a novel protocol to prove the credibility of trust feedbacks and preserve users' privacy, ii) an adaptive and robust credibility model for measuring the credibility of trust feedbacks to protect cloud services from malicious users and to compare the trustworthiness of cloud services, and iii) an availability model to manage the availability of the decentralized implementation of the trust management service. The feasibility and benefits of our approach have been validated by a prototype and experimental studies using a collection of real-world trust feedbacks on cloud services. IEEE 2015

TTA-JC-C1554 Cost-Effective Authentic and Anonymous Data Sharing with Forward Security Data sharing has never been easier with the advances of cloud computing, and an accurate
analysis of the shared data provides an array of benefits to both society and individuals. Data sharing with a large number of participants must take into account several issues, including efficiency, data integrity, and the privacy of the data owner. Ring signature is a promising candidate for constructing an anonymous and authentic data sharing system, as it allows a data owner to anonymously authenticate data that can be put into the cloud for storage or analysis. Yet the costly certificate verification in the traditional public key infrastructure (PKI) setting becomes a bottleneck for this solution to be scalable. Identity-based (ID-based) ring signature, which eliminates the process of certificate verification, can be used instead. In this paper, we further enhance the security of ID-based ring signature by providing forward security: if a secret key of any user has been compromised, all previously generated signatures that include this user still remain valid. This property is especially important to any large-scale data sharing system, as it is impossible to ask all data owners to re-authenticate their data even if the secret key of one single user has been compromised. We provide a concrete and efficient instantiation of our scheme, prove its security, and provide an implementation to show its practicality. IEEE 2015

TTA-JC-C1555 DaSCE - Data Security for Cloud Environment with Semi-Trusted Third Party Off-site data storage is an application of the cloud that relieves customers from focusing on data storage systems. However, outsourcing data to a third-party administrative control entails serious security concerns. Data leakage may occur due to attacks by other users and machines in the cloud. The wholesale of data by the cloud service provider is yet another problem faced in the cloud environment. Consequently, a high level of security is required. In this paper, we propose Data Security for Cloud Environment with Semi-Trusted Third Party (DaSCE), a data security system that provides (a) key management, (b) access control, and (c) file assured deletion. DaSCE utilizes Shamir's (k, n) threshold scheme to manage the keys, where k out of n shares are required to generate the key. We use multiple key managers, each hosting one share of the key. Multiple key managers avoid a single point of failure for the cryptographic keys. We (a) implement a working prototype of DaSCE and evaluate its performance based on the time consumed during various operations, (b) formally model and analyze the working of DaSCE using High Level Petri nets (HLPN), and (c) verify the working of DaSCE using the Satisfiability Modulo Theories Library (SMT-Lib) and the Z3 solver. The results reveal that DaSCE can be effectively used to secure outsourced data by employing key management, access control, and file assured deletion. IEEE 2015
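Shamir's (k, n) threshold scheme, on which DaSCE's key management rests, can be sketched directly. The Python below splits a secret into n shares such that any k of them reconstruct it by Lagrange interpolation over a prime field; the field size and the example secret are illustrative choices, not DaSCE parameters.

import random

P = 2 ** 127 - 1   # a Mersenne prime field (illustrative size)

def split(secret, k, n):
    # random polynomial of degree k-1 with the secret as constant term
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the secret
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert combine(shares[:3]) == 123456789   # any 3 of the 5 shares suffice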
TTA-JC-C1556 Discover the Expert - Context-Adaptive Expert Selection for Medical Diagnosis In this paper, we propose an expert selection system that learns online the best expert to assign to each patient depending on the patient's context. In general, the context can include an enormous number and variety of information items related to the patient's health condition, age, gender, previous drug doses, and so forth, but the most relevant information is embedded in only a few contexts. If these most relevant contexts were known in advance, learning would be relatively simple, but they are not. Moreover, the relevant contexts may differ for different health conditions. To address these challenges, we develop a new class of algorithms aimed at discovering the most relevant contexts and the best clinic and expert to use for making a diagnosis given a patient's contexts. We prove that as the number of patients grows, the proposed context-adaptive algorithm will discover the optimal expert to select for patients with a specific context. Moreover, the algorithm also provides confidence bounds on the diagnostic accuracy of the expert it selects, which can be considered by the primary care physician before making the final decision. While our algorithm is general and can be applied in numerous medical scenarios, we illustrate its functionality and performance by applying it to a real-world breast cancer diagnosis data set. Finally, while the application considered in this paper is medical diagnosis, our proposed algorithm can be applied in other environments where expertise needs to be discovered. IEEE 2015
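The online learning loop behind such context-adaptive selection can be approximated with a simple bandit-style sketch. The Python below uses epsilon-greedy selection over a fixed, discretized context, which is only a crude stand-in for the paper's algorithm (the paper adaptively partitions the context space and maintains confidence bounds); the expert names and context values are hypothetical.

import random
from collections import defaultdict

EXPERTS = ["clinic_a", "clinic_b", "clinic_c"]
stats = defaultdict(lambda: [0, 0])   # (context, expert) -> [successes, trials]

def select_expert(context, eps=0.1):
    # explore with probability eps, otherwise exploit best empirical accuracy
    if random.random() < eps:
        return random.choice(EXPERTS)
    def mean(e):
        s, t = stats[(context, e)]
        return s / t if t else 0.5    # optimistic prior for unseen pairs
    return max(EXPERTS, key=mean)

def update(context, expert, correct):
    s, t = stats[(context, expert)]
    stats[(context, expert)] = [s + int(correct), t + 1]

ctx = ("age_40_60", "female")          # a coarse, discretized patient context
e = select_expert(ctx)
update(ctx, e, correct=True)           # feedback after the diagnosis is confirmed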
TTA-JC-C1557 Distributed Denial of Service Attacks in Software-Defined Networking with Cloud Computing Although software-defined networking (SDN) brings numerous benefits by decoupling the control plane from the data plane, there is a contradictory relationship between SDN and distributed denial-of-service (DDoS) attacks. On one hand, the capabilities of SDN make it easy to detect and react to DDoS attacks. On the other hand, the separation of the control plane from the data plane in SDN introduces new attacks; consequently, SDN itself may become a target of DDoS attacks. In this paper, we first discuss the new trends and characteristics of DDoS attacks in cloud computing environments. We show that SDN brings a new chance to defeat DDoS attacks in cloud computing environments, and we summarize the features of SDN that help defeat DDoS attacks. Then we review the studies on launching DDoS attacks on SDN and on methods against DDoS attacks in SDN. In addition, we discuss a number of challenges that need to be addressed to mitigate DDoS attacks in SDN with cloud computing. This work can help in understanding how to make full use of SDN's advantages to defeat DDoS attacks in cloud computing environments and how to prevent SDN itself from becoming a victim of DDoS attacks. IEEE 2015

TTA-JC-C1558 Mathematical Programming Approach for Revenue Maximization in Cloud Federations This paper assesses the benefits of cloud federation for cloud providers. Outsourcing and insourcing are explored as means to maximize the revenues of the
providers involved in the federation. An exact method using a linear integer program is proposed to optimize the partitioning of the incoming workload across the federation members. A pricing model is suggested to enable providers to set their offers dynamically and achieve the highest revenues. The conditions leading to the highest gains are identified, and the benefits of cloud federation are quantified. IEEE 2015

TTA-JC-C1559 My Privacy My Decision - Control of Photo Sharing on Online Social Networks Photo sharing is an attractive feature that popularizes Online Social Networks (OSNs). Unfortunately, it may leak users' privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study the scenario in which a user shares a photo containing individuals other than himself/herself (termed a co-photo for short). To prevent possible privacy leakage from a photo, we design a mechanism to enable each individual in a photo to be aware of the posting activity and to participate in the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognize everyone in the photo. However, more demanding privacy settings may limit the number of photos publicly available to train the FR system. To deal with this dilemma, our mechanism attempts to utilize users' private photos to design a personalized FR system specifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus-based method to reduce the computational complexity and protect the private training set. We show that our system is superior to other possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof-of-concept Android application on Facebook's platform. IEEE 2015

TTA-JC-C1560 OPoR - Enabling Proof of Retrievability in Cloud Computing with Resource-Constrained Devices Cloud computing moves the application software and databases to large centralized
data centers, where the management of the data and services may not be fully trustworthy. In this work, we study the problem of ensuring the integrity of data storage in cloud computing. To reduce the computational cost at the user side during the integrity verification of their data, the notion of public verifiability has been proposed. However, the challenge is that the computational burden is too heavy for users with resource-constrained devices to compute the public authentication tags of file blocks. To tackle the challenge, we propose OPoR, a new cloud storage scheme involving a cloud storage server and a cloud audit server, where the latter is assumed to be semi-honest. In particular, we consider the task of allowing the cloud audit server, on behalf of the cloud users, to pre-process the data before uploading to the cloud storage server and to later verify the data integrity. OPoR outsources and offloads the heavy computation of tag generation to the cloud audit server and eliminates user involvement in the auditing and pre-processing phases. Furthermore, we strengthen the proof of retrievability (PoR) model to support dynamic data operations, as well as to ensure security against reset attacks launched by the cloud storage server in the upload phase. IEEE 2015

TTA-JC-C1561 Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing This paper presents an initiative data prefetching scheme on the storage servers in distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers directly prefetch data after analyzing the history of disk I/O access events, and then send the prefetched data to the relevant client machines proactively. To put this technique to work, the information about client nodes is piggybacked onto the real client I/O requests
and then forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations and direct what data should be fetched on the storage servers in advance. Finally, the prefetched data can be pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we have demonstrated that our initiative prefetching technique can benefit distributed file systems for cloud environments by achieving better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which contributes to better system performance on them. IEEE 2015
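One simple way to realize the server-side prediction step is a first-order Markov model over observed block accesses. The Python sketch below records successor frequencies from the I/O trace and predicts the most likely next block to prefetch; it is an illustrative guess at such a predictor, not the paper's two algorithms.

from collections import defaultdict, Counter

transitions = defaultdict(Counter)   # block -> Counter of successor blocks
last_block = None

def record_access(block):
    """Feed the observed disk I/O trace, one block address at a time."""
    global last_block
    if last_block is not None:
        transitions[last_block][block] += 1
    last_block = block

def predict_next(block):
    """Most frequent successor seen so far, or None if unknown."""
    successors = transitions[block]
    return successors.most_common(1)[0][0] if successors else None

for b in [10, 11, 42, 10, 11, 42, 10, 11]:
    record_access(b)
print(predict_next(11))   # -> 42: the server would push block 42 to the client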
TTA-JC-C1562 Privacy-Preserving Multikeyword Similarity Search Over Outsourced Cloud Data The amount of data generated by individuals and enterprises is rapidly increasing. With the emerging cloud computing paradigm, data and the corresponding complex management tasks can be outsourced to the cloud for management flexibility and cost savings. Unfortunately, as the data could be sensitive, direct data outsourcing raises the problem of privacy leakage. Encryption can be applied before outsourcing, with the requirement that the operations can still be accomplished by the cloud. We consider multi-keyword similarity search over outsourced cloud data. In particular, considering text data only, multiple keywords are specified by the user, and the cloud returns the files containing more than a threshold number of the input keywords or similar keywords, where similarity is defined according to the edit distance metric. We propose three solutions, in which a blind signature provides user access privacy and a novel use of the Bloom filter's bit pattern speeds up the search task at the cloud side. Our final design is secure against insider threats and efficient in terms of search time at the cloud side. Performance evaluation and analysis are used to demonstrate the practicality of our proposed solutions. IEEE 2015
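A Bloom filter, whose bit pattern the scheme reuses to speed up the search, is compact to sketch. The Python below inserts keywords by setting k hash-derived bit positions and answers membership queries with no false negatives and a small false-positive probability; the sizes m and k are arbitrary illustrative choices.

import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0   # bits kept in one big integer

    def _positions(self, word):
        # derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits |= 1 << p

    def might_contain(self, word):
        return all(self.bits >> p & 1 for p in self._positions(word))

bf = BloomFilter()
for kw in ["cloud", "privacy", "search"]:
    bf.add(kw)
assert bf.might_contain("privacy")
assert not bf.might_contain("unlikely-keyword")   # false positives possible, rare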
TTA-JC-C1563 Provable Multicopy Dynamic Data Possession in Cloud Computing Systems More and more organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSP's storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers. The more copies the CSP is asked to store, the higher the fees the customers are charged. Therefore, customers need a strong guarantee that the CSP is storing all the data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. In this paper, we propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., block-level operations such as modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. We give a comparative analysis of the proposed MB-PMDDP scheme with a reference model obtained by extending existing provable possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, we show security against colluding servers and discuss how to identify corrupted copies by slightly modifying the proposed scheme. IEEE 2015

TTA-JC-C1564 SAE - Toward Efficient Cloud Data Analysis Service for Large-Scale Social Networks Social network analysis is used to extract features of human communities and proves to be very instrumental in a variety of scientific domains. The dataset of a social network is often so large that a cloud data analysis service, in which the computation is performed on a parallel platform in the cloud, becomes a good choice for researchers not experienced in parallel programming. In the cloud, a primary challenge to efficient data analysis is the computation and communication skew (i.e., load imbalance) among computers caused by humanity's group behavior (e.g., the bandwagon effect). Traditional load balancing techniques either require significant effort to re-balance loads on the nodes or cannot cope well with stragglers. In this paper, we propose a general straggler-aware execution approach, SAE, to support the analysis service in the cloud. It offers a novel computational decomposition method that factors straggling feature extraction processes into more fine-grained sub-processes, which are then distributed over clusters of computers for parallel execution. Experimental results show that SAE can speed up the analysis by up to 1.77 times compared with state-of-the-art solutions. IEEE 2015

TTA-JC-C1565 Secure Cloud Storage Meets with Secure Network Coding This paper reveals an intrinsic relationship between secure cloud storage and secure network coding for the first time. Secure cloud storage was proposed only recently, while secure network coding has been studied for more than ten years. Although the two areas are quite different in nature and are studied independently, we show how to construct a secure cloud storage protocol given any secure network coding protocol. This gives rise to a systematic way of constructing secure cloud storage protocols. Our construction is secure under a definition that captures the real-world usage of cloud storage. Furthermore, we propose two specific secure cloud storage protocols based on two recent secure network coding protocols.
In particular, we obtain the first publicly verifiable secure cloud storage protocol in the standard model. We also enhance the proposed generic construction to support user anonymity and third-party public auditing, both of which have received considerable attention recently. Finally, we prototype the newly proposed protocol and evaluate its performance. Experimental results validate the effectiveness of the protocol. IEEE 2015

TTA-JC-C1566 SeDaSC - Secure Data Sharing in Clouds Cloud storage is an application of clouds that liberates organizations from establishing in-house data storage systems. However, cloud storage gives rise to security concerns. In the case of group-shared data, the data face both cloud-specific and conventional insider threats. Secure data sharing among a group that counters the insider threats of legitimate yet malicious users is an important research issue. In this paper, we propose the Secure Data Sharing in Clouds (SeDaSC) methodology, which provides: 1) data confidentiality and integrity; 2) access control; 3) data sharing (forwarding) without using compute-intensive re-encryption; 4) insider threat security; and 5) forward and backward access control. The SeDaSC methodology encrypts a file with a single encryption key. Two different key shares are generated for each user, with the user only getting one share. The possession of a single share of the key allows the SeDaSC methodology to counter insider threats. The other key share is stored by a trusted third party, called the cryptographic server. The SeDaSC methodology is applicable to conventional and mobile cloud computing environments. We implement a working prototype of the SeDaSC methodology and evaluate its performance based on the time consumed during various operations. We formally verify the working of SeDaSC by using High Level Petri nets, the Satisfiability Modulo Theories Library, and the Z3 solver. The results are encouraging and show that SeDaSC has the potential to be effectively used for secure data sharing in the cloud. IEEE 2015
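The two-share key idea at the heart of SeDaSC can be shown with a minimal XOR secret split. In the Python sketch below, neither share alone reveals anything about the file-encryption key, and both are needed to recombine it; the actual methodology layers access control and a trusted cryptographic server on top, which this omits.

import os

def split_key(key):
    """Split a symmetric key into two XOR shares; one share alone reveals nothing."""
    user_share = os.urandom(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recombine(user_share, server_share):
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)           # the single file-encryption key
u, s = split_key(key)          # user keeps u; the cryptographic server keeps s
assert recombine(u, s) == key  # both shares are needed to decrypt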
TTA-JC-C1567 Shared Authority Based Privacy-Preserving Authentication Protocol in Cloud Computing Cloud computing is an emerging data-interactive paradigm in which users' data are remotely stored on an online cloud server. Cloud services make it convenient for users to enjoy on-demand cloud applications without considering local infrastructure limitations. During data accessing, different users may be in a collaborative relationship, and data sharing thus becomes significant for achieving productive benefits. The existing security solutions mainly focus on authentication to ensure that a user's private data cannot be illegally accessed, but they neglect a subtle privacy issue that arises when a user challenges the cloud server to request other users' data for sharing: the access request itself may reveal the user's privacy, regardless of whether or not the data access permissions are obtained. In this paper, we propose a shared authority based privacy-preserving authentication protocol (SAPA) to address the above privacy issue for cloud storage. In SAPA, 1) shared access authority is achieved by an anonymous access request matching mechanism with security and privacy considerations (e.g., authentication, data anonymity, user privacy, and forward security); 2) attribute-based access control is adopted to ensure that a user can only access its own data fields; and 3) proxy re-encryption is applied to provide data sharing among multiple users. Meanwhile, a universal composability (UC) model is established to prove the design correctness of SAPA. The results indicate that the proposed protocol is attractive for multi-user collaborative cloud applications. IEEE 2015

TTA-JC-C1568 Social Recommendation with Cross-Domain Transferable Knowledge Recommender systems can suffer from data sparsity and cold start issues.
However, social networks, which enable users to build relationships and create different types of items, present an unprecedented opportunity to alleviate these issues. In this paper, we represent a social network as a star-structured hybrid graph centered on a social domain, which connects with other item domains. With this innovative representation, useful knowledge from an auxiliary domain can be transferred through the social domain to a target domain. Various factors of item transferability, including popularity and behavioral consistency, are determined. We propose a novel Hybrid Random Walk (HRW) method, which incorporates such factors, to select transferable items in auxiliary domains, bridge cross-domain knowledge with the social domain, and accurately predict user-item links in a target domain. Extensive experiments on a real social dataset demonstrate that HRW significantly outperforms existing approaches. IEEE 2015

TTA-JC-C1569 TMACS - A Robust and Verifiable Threshold Multi-Authority Access Control System in Public Cloud Storage Attribute-based Encryption (ABE) is regarded as a promising cryptographic tool for guaranteeing data owners' direct control over their data in public cloud storage. The earlier ABE schemes involve only one authority to maintain the whole attribute set, which can create a single-point bottleneck for both security and performance. Subsequently, some multi-authority schemes were proposed, in which multiple authorities separately maintain disjoint attribute subsets. However, the single-point bottleneck problem remains unsolved. In this paper, from another perspective, we construct a threshold multi-authority CP-ABE access control scheme for public cloud storage, named TMACS, in which multiple authorities jointly manage a uniform attribute set. In TMACS, by taking advantage of (t, n) threshold secret sharing, the master key can be shared among multiple authorities, and a legal user can generate his or her secret key by interacting with any t
authorities. Security and performance analysis shows that TMACS is not only verifiably secure when fewer than t authorities are compromised, but also robust when no fewer than t authorities are alive in the system. Furthermore, by efficiently combining the traditional multi-authority scheme with TMACS, we construct a hybrid scheme that satisfies the scenario of attributes coming from different authorities while achieving both security and system-level robustness. IEEE 2015

TTA-JC-C1570 Towards Privacy Preserving Publishing of Set-valued Data on Hybrid Cloud Storage as a service has become an important paradigm in cloud computing for its great flexibility and economic savings. However, the development is hampered by data privacy concerns: data owners no longer physically possess the storage of their data. In this work, we study the issue of privacy-preserving set-valued data publishing. Existing data privacy-preserving techniques (such as encryption, suppression, and generalization) are not applicable in many real scenarios, since they would incur large query overhead or high information loss. Motivated by this observation, we present a suite of new techniques that make privacy-aware set-valued data publishing feasible on a hybrid cloud. In the data publishing phase, we propose a data partition technique, named extended quasi-identifier partitioning (EQI-partitioning), which disassociates record terms that participate in identifying combinations. In this way, the cloud server cannot, with high probability, associate a record with rare term combinations. We prove the privacy guarantee of our mechanism. In the data querying phase, we adopt an interactive differential privacy strategy to resist privacy breaches from statistical
queries. We finally evaluate its performance using real-life data sets on our cloud test-bed. Our extensive experiments demonstrate the validity and practicality of the proposed scheme. IEEE 2015

TTA-JC-C1571 Towards Privacy-Preserving Storage and Retrieval in Multiple Clouds Cloud computing is growing exponentially, and there are now hundreds of cloud service providers (CSPs) of various sizes. While cloud consumers may enjoy cheaper data storage and computation in this multi-cloud environment, they also face more complicated reliability issues and privacy preservation problems for their outsourced data. Although searchable encryption allows users to encrypt their stored data while preserving some search capabilities, few efforts have considered the reliability of searchable encrypted data outsourced to the clouds. In this paper, we propose a privacy-preserving Storage and Retrieval (STRE) mechanism that not only ensures security and privacy but also provides reliability guarantees for the outsourced searchable encrypted data. The STRE mechanism enables cloud users to distribute and search their encrypted data across multiple independent clouds managed by different CSPs, and it is robust even when a certain number of CSPs crash. Besides reliability, STRE also offers the benefit of partially hidden search patterns. We evaluate the STRE mechanism on Amazon EC2 using a real-world dataset, and the results demonstrate both the effectiveness and the efficiency of our approach. IEEE 2015

TTA-JC-C1572 Trust Enhanced Cryptographic Role-based Access Control for Secure Cloud Data Storage Cloud data storage has provided significant benefits by allowing users to store massive amounts of data on demand in a cost-effective manner. To protect the privacy of data stored in the cloud, cryptographic role-based access control (RBAC) schemes have been developed to ensure that data can only be accessed by those who are allowed by access policies. However, these cryptographic approaches do not address
the issue of trust. In this paper, we propose trust models to reason about and improve the security of data stored in cloud storage systems that use cryptographic RBAC schemes. The trust models provide an approach for the owners and roles to determine the trustworthiness of individual roles and users, respectively, in the RBAC system. The proposed trust models take role inheritance and hierarchy into account in the evaluation of the trustworthiness of roles. We present a design of a trust-based cloud storage system that shows how the trust models can be integrated into a system that uses cryptographic RBAC schemes. We have also considered practical application scenarios and illustrated how the trust evaluations can be used to reduce risks and enhance the quality of decision making by data owners and roles of the cloud storage service. IEEE 2015

TTA-JC-C1573 Using Ant Colony System to Consolidate VMs for Green Cloud Computing The high energy consumption of cloud data centers is a matter of great concern. Dynamic consolidation of Virtual Machines (VMs) presents a significant opportunity to save energy in data centers. A VM consolidation approach uses live migration of VMs so that some of the under-loaded Physical Machines (PMs) can be switched off or put into a low-power mode. On the other hand, achieving the desired level of Quality of Service (QoS) between cloud providers and their users is critical. Therefore, the main challenge is to reduce the energy consumption of data centers while satisfying QoS requirements. In this paper, we present a distributed system architecture that performs dynamic VM consolidation to reduce the energy consumption of cloud data centers while maintaining the desired QoS. Since the VM consolidation problem is strictly NP-hard, we use an online optimization metaheuristic algorithm called Ant Colony System (ACS). The proposed ACS-based VM Consolidation (ACS-VMC) approach finds a near-optimal solution based on a specified objective function. Experimental results on real workload traces show that ACS-VMC reduces energy consumption while maintaining the required performance levels in a cloud data center. It outperforms existing VM consolidation approaches in terms of energy consumption, number of VM migrations, and QoS requirements concerning performance. IEEE 2015
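For a flavor of how ACS makes placement decisions, the sketch below implements the pseudo-random proportional rule: with probability q0 an ant exploits the best-scoring feasible host, otherwise it explores probabilistically in proportion to pheromone and a heuristic that favors tight fits. The capacities, heuristic, and parameters are illustrative assumptions, not ACS-VMC's actual objective.

import random

def choose_host(vm_demand, hosts, pheromone, alpha=1.0, beta=2.0, q0=0.9):
    """Pick a PM for one VM using the ACS pseudo-random proportional rule."""
    feasible = [h for h, free in hosts.items() if free >= vm_demand]
    # heuristic: prefer tighter fits so under-loaded PMs can be emptied
    def score(h):
        return (pheromone[h] ** alpha) * ((vm_demand / hosts[h]) ** beta)
    if random.random() < q0:                      # exploitation
        return max(feasible, key=score)
    total = sum(score(h) for h in feasible)       # biased exploration
    r, acc = random.uniform(0, total), 0.0
    for h in feasible:
        acc += score(h)
        if acc >= r:
            return h

hosts = {"pm1": 8.0, "pm2": 3.0, "pm3": 6.0}      # free CPU capacity (assumed)
pheromone = {h: 1.0 for h in hosts}
print(choose_host(2.5, hosts, pheromone))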
TTA-JC-C1574 Using Virtual Machine Allocation Policies to Defend against Co-resident Attacks in Cloud Computing Cloud computing enables users to consume various IT resources in an on-demand manner and with low management overhead. However, customers can face new security risks when they use cloud computing platforms. In this paper, we focus on one such threat: the co-resident attack, where malicious users build side channels and extract private information from virtual machines co-located on the same server. Previous works mainly attempt to address the problem by eliminating side channels. However, most of these methods are not suitable for immediate deployment due to the required modifications to current cloud platforms. We choose to solve the problem from a different perspective, by studying how to improve the virtual machine allocation policy so that it is difficult for attackers to co-locate with their targets. Specifically, we (1) define security metrics for assessing the attack; (2) model these metrics and compare the difficulty of achieving co-residence under three commonly used policies; (3) design a new policy that not only mitigates the threat of attack but also satisfies the requirements for workload balance and low power consumption; and (4) implement, test, and prove the effectiveness of the policy on the popular open-source platform OpenStack. IEEE 2015

DOMAIN : BIG DATA

TTA-JB-C1501 FastRAQ - A Fast Approach to Range-Aggregate Queries in Big Data Environments Range-aggregate queries apply a certain aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries are insufficient to quickly provide accurate results in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has O(1) time complexity for data updates and O(N/(P×B)) time complexity for range-aggregate queries, where N is the number of distinct tuples across all dimensions, P is the number of partitions, and B is the number of buckets in the histogram. We implement the FastRAQ approach on the Linux platform and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval. IEEE 2015
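The divide-and-sum structure of FastRAQ is easy to mimic. The Python sketch below partitions records, answers a range-sum locally per partition, and adds up the local answers; for simplicity it computes exact local aggregates, whereas FastRAQ uses histogram-based local estimation sketches, and the data here are synthetic.

import random

P = 4                                       # number of partitions
partitions = [[] for _ in range(P)]

def insert(record):
    # hash partitioning stands in for FastRAQ's balanced partitioning algorithm
    partitions[hash(record[0]) % P].append(record)

def range_sum(lo, hi):
    # each partition answers locally; the results are simply summed up
    local = [sum(v for k, v in part if lo <= k <= hi) for part in partitions]
    return sum(local)

for _ in range(10000):
    insert((random.randrange(1000), 1))
print(range_sum(100, 199))                  # about 1000 tuples fall in the range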
TTA-JB-C1502 Collaboration- and Fairness-Aware Big Data Management in Distributed Clouds With the advancement of information and communication technology, data are being generated at an exponential rate via various instruments and collected at an unprecedented scale. Such a large volume of generated data is referred to as big data, which is now revolutionizing all aspects of our life, ranging from enterprises to individuals and from science communities to governments, as it exhibits great potential to improve the efficiency of enterprises and the quality of life. To obtain nontrivial patterns and derive valuable information from big data, a fundamental problem is how to properly place the data collected by different users in distributed clouds and how to efficiently analyze the collected data to save user costs in data storage and processing, particularly the cost savings of users who share data. This requires close collaboration among the users in sharing and utilizing big data in distributed clouds, given the complexity and volume of big data. Since computing, storage, and bandwidth resources in a distributed cloud are usually limited, and such resource provisioning is typically expensive, collaborating users are required to make use of the resources fairly. In this paper, we study a novel collaboration- and fairness-aware big data management problem in distributed cloud environments that aims to maximize the system throughput while minimizing the operational cost of service providers, subject to resource capacity and user fairness constraints. We first propose a novel optimization framework for the problem. We then devise a fast yet scalable approximation algorithm based on the built optimization framework. We also analyze the time complexity and approximation ratio of the proposed algorithm. We finally conduct experiments through simulations to evaluate the performance of the proposed algorithm. Experimental results demonstrate that the proposed algorithm is promising and outperforms other heuristics. IEEE 2015
TTA-JB-C1503 On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks, which, however, is not traffic-efficient because network topology and the data size associated with each key are not taken into consideration. In this paper, we study reducing the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases. IEEE 2015
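The traffic-aware partitioning objective can be illustrated with a greedy toy. In the Python sketch below, each intermediate key is assigned to the reducer that minimizes cross-rack bytes, given per-rack data sizes and a rack-to-rack cost matrix; the sizes and costs are invented, and the paper's joint optimization and online adjustment are not captured.

key_sizes = {                       # bytes of intermediate data per (key, rack)
    "k1": {"rack0": 900, "rack1": 100},
    "k2": {"rack0": 50,  "rack1": 800},
}
reducers = {"r0": "rack0", "r1": "rack1"}
cost = {("rack0", "rack0"): 0, ("rack0", "rack1"): 1,
        ("rack1", "rack0"): 1, ("rack1", "rack1"): 0}

def traffic_aware_partition(key_sizes, reducers):
    """Greedily place each key on the reducer minimizing weighted traffic."""
    placement = {}
    for key, per_rack in key_sizes.items():
        def traffic(r):
            return sum(size * cost[(rack, reducers[r])]
                       for rack, size in per_rack.items())
        placement[key] = min(reducers, key=traffic)
    return placement

print(traffic_aware_partition(key_sizes, reducers))   # {'k1': 'r0', 'k2': 'r1'}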
TTA-JB-C1504 Privacy-Preserving Ciphertext Multi-Sharing Control for Big Data Storage The need for a secure big data storage service is more pressing than ever. The basic requirement of such a service is to guarantee the confidentiality of the data. However, the anonymity of the service clients, one of the most essential aspects of privacy, should be considered simultaneously. Moreover, the service should also provide practical and fine-grained encrypted data sharing, such that a data owner is allowed to share a ciphertext of data with others under specified conditions. This paper, for the first time, proposes a privacy-preserving ciphertext multi-sharing mechanism to achieve the above properties. It combines the merits of proxy re-encryption with anonymization techniques, so that a ciphertext can be securely and conditionally shared multiple times without leaking either the underlying message or the identity information of the ciphertext senders and recipients. Furthermore, this paper shows that the new primitive is secure against chosen-ciphertext attacks in the standard model. IEEE 2015

TTA-JB-C1505 Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop The MapReduce framework and its open source implementation Hadoop have become the de facto platform for scalable analysis of large data sets in recent years. One of the
    primary concerns inHadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose simple yet effective schemes which use slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our schemes dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop V0.20.2 and evaluated them with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads. TTA-JB- C1506 A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo- distributed Data Centers With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon’s EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs are with different inter-datacenter network costs charged by Internet Service Providers (ISPs). While, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider’s traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational IEEE 2015
TTA-JB-C1506
A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-distributed Data Centers
With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication, and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and different datacenter pairs have different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality of information) is obeyed. This raises the opportunity, but also a challenge, to exploit the inter-datacenter network cost diversity to optimize both VM placement and load balancing towards network cost minimization with a guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we then formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computationally efficient solution based on the MILP formulation. The high efficiency of our proposal is validated by extensive simulation-based studies.
IEEE 2015

TTA-JB-C1507
Data Transfer Scheduling for Maximizing Throughput of Big-Data Computing in Cloud Systems
Many big-data computing applications have been deployed in cloud platforms. These applications normally demand concurrent data transfers among computing nodes for parallel processing. It is important to find the best transfer schedule leading to the least data retrieval time, in other words the maximum throughput. However, existing methods cannot achieve this because they ignore link bandwidths and the diversity of data replicas and paths. In this paper, we aim to develop a max-throughput data transfer schedule that minimizes the data retrieval time of applications. Specifically, the problem is formulated as a mixed-integer program, and an approximation algorithm is proposed, with its approximation ratio analyzed. Extensive simulations demonstrate that our algorithm can obtain near-optimal solutions.
IEEE 2015
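The sketch below shows the flavor of replica-aware transfer scheduling: each fetch can be served from several replica nodes, and a greedy heuristic assigns it to the source whose link would finish it earliest, approximating the max-throughput objective. The topology, block sizes, and bandwidths are made-up examples; the paper itself solves a mixed-integer program with an analyzed approximation algorithm, not this greedy rule.

    def schedule_fetches(fetches, link_bandwidth):
        # fetches: list of (block_size_MB, [candidate_source_nodes]).
        load = {node: 0.0 for node in link_bandwidth}   # scheduled MB per source
        assignment = []
        # Schedule the largest blocks first; they dominate retrieval time.
        for size, sources in sorted(fetches, key=lambda f: f[0], reverse=True):
            # Pick the source that would finish this block the earliest.
            best = min(sources, key=lambda n: (load[n] + size) / link_bandwidth[n])
            load[best] += size
            assignment.append((size, best))
        # Retrieval time is set by the slowest source link (the bottleneck).
        finish = max(load[n] / link_bandwidth[n] for n in link_bandwidth)
        return assignment, finish

    if __name__ == "__main__":
        bw = {"A": 100.0, "B": 50.0, "C": 100.0}        # MB/s per source link
        fetches = [(800, ["A", "B"]), (600, ["B", "C"]), (700, ["A", "C"])]
        plan, t = schedule_fetches(fetches, bw)
        print(plan, f"estimated retrieval time: {t:.1f}s")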
TTA-JB-C1508
Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data
Though Big Data is much hyped, it brings many technical challenges that confront both academic research communities and commercial IT deployment; its root sources are data streams and the curse of dimensionality. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining. Feature selection has been popularly used to lighten the processing load in inducing a data mining model. However, when it comes to mining high-dimensional data, the search space from which an optimal feature subset is derived grows exponentially in size, leading to an intractable demand in computation. In order to tackle this problem, which arises mainly from the high dimensionality and streaming format of data feeds in Big Data, a novel lightweight feature selection method is proposed. The feature selection is designed particularly for mining streaming data on the fly, using an accelerated particle swarm optimization (APSO) type of swarm search that achieves enhanced analytical accuracy within reasonable processing time. In this paper, a collection of Big Data sets with an exceptionally large degree of dimensionality is used to evaluate the new feature selection algorithm's performance.
IEEE 2015
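For intuition, here is a minimal sketch of APSO-style feature selection over binary feature masks. APSO simplifies standard PSO by updating positions directly from the global best plus random perturbation, skipping per-particle velocities. The fitness function is a stand-in (it rewards a known "informative" feature set and penalizes subset size); in the paper, fitness would come from a streaming classifier's accuracy.

    import random

    N_FEATURES = 20
    INFORMATIVE = {1, 4, 7, 13}                     # hypothetical useful features

    def fitness(mask):
        # Accuracy proxy minus a small penalty for subset size.
        selected = {i for i, bit in enumerate(mask) if bit}
        return len(selected & INFORMATIVE) - 0.05 * len(selected)

    def apso_select(n_particles=12, iterations=50, beta=0.4):
        particles = [[random.randint(0, 1) for _ in range(N_FEATURES)]
                     for _ in range(n_particles)]
        gbest = max(particles, key=fitness)[:]      # copy of the best mask so far
        for _ in range(iterations):
            for p in particles:
                for j in range(N_FEATURES):
                    if random.random() < beta:      # drift toward the global best
                        p[j] = gbest[j]
                    elif random.random() < 0.1:     # random exploration (mutation)
                        p[j] = 1 - p[j]
                if fitness(p) > fitness(gbest):
                    gbest = p[:]
        return gbest

    if __name__ == "__main__":
        best = apso_select()
        print("selected features:", [i for i, b in enumerate(best) if b])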
TTA-JB-C1509
An Efficient Privacy-Preserving Ranked Keyword Search Method
Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preserving. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. Moreover, the volume of data in data centers has experienced dramatic growth, making it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support more search semantics and also to meet the demand for fast ciphertext search within a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is met. In the search phase, this approach achieves a linear computational complexity against an exponential increase in the size of the document collection. In order to verify the authenticity of search results, a structure called the minimum hash sub-tree is designed in this paper. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase of documents in the dataset, the search time of the proposed method increases linearly, whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in rank privacy and the relevance of retrieved documents.
IEEE 2015
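The sketch below illustrates the hierarchical-clustering idea in plaintext: documents are split around their least-similar members until clusters respect a maximum size, and a query descends only into the most relevant cluster at each level instead of scanning the whole collection. Plaintext term-frequency scoring stands in for the paper's encrypted-domain relevance computation, and the splitting rule is a simplified assumption.

    from collections import Counter

    def relevance(a, b):
        # Shared-term relevance between two term-frequency vectors.
        return sum(min(a[t], b[t]) for t in a if t in b)

    def centroid(docs):
        c = Counter()
        for d in docs:
            c.update(d)
        return c

    def build_tree(docs, max_size=2):
        node = {"centroid": centroid(docs)}
        if len(docs) <= max_size:
            node["docs"] = docs                     # small enough: leaf cluster
            return node
        pivot = docs[0]
        far = min(docs[1:], key=lambda d: relevance(pivot, d))  # least similar
        left = [d for d in docs if relevance(d, pivot) >= relevance(d, far)]
        right = [d for d in docs if relevance(d, pivot) < relevance(d, far)]
        if not left or not right:                   # degenerate split guard
            left, right = docs[: len(docs) // 2], docs[len(docs) // 2:]
        node["children"] = (build_tree(left, max_size), build_tree(right, max_size))
        return node

    def search(node, query, k=2):
        # Descend into the child cluster whose centroid best matches the query.
        while "children" in node:
            node = max(node["children"], key=lambda c: relevance(c["centroid"], query))
        return sorted(node["docs"], key=lambda d: relevance(d, query), reverse=True)[:k]

    if __name__ == "__main__":
        docs = [Counter(t) for t in (
            "cloud storage encryption".split(), "cloud search ranking".split(),
            "medical image wound".split(), "medical records privacy".split())]
        print(search(build_tree(docs), Counter("cloud ranking".split()), k=1))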
TTA-JB-C1510
Splitting Large Medical Data Sets based on Normal Distribution in Cloud Environment
The surge of medical and e-commerce applications has generated a tremendous amount of data, which brings people to a so-called "Big Data" era. Different from traditional large data sets, the term "Big Data" not only means a large data volume but also indicates a high velocity of data generation. However, current data mining and analytical techniques face the challenge of dealing with large-volume data in a short period of time. This paper explores the efficiency of utilizing the Normal Distribution (ND) method for splitting and processing large-volume medical data in a cloud environment, so that the split data sets retain representative information. The new ND-based model consists of two stages. The first stage adopts the ND method for splitting and processing large data sets, which reduces the volume of the data sets. The second stage implements the ND-based model in a cloud computing infrastructure for allocating the split data sets. The experimental results show substantial efficiency gains of the proposed method over conventional methods that do not split data into small partitions. The ND-based method can generate representative data sets, offering an efficient solution for large data processing, and the split data sets can be processed in parallel in a cloud computing environment.
IEEE 2015
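One plausible reading of distribution-preserving splitting is sketched below: records are bucketed by their z-score under a fitted normal distribution, and each bucket is dealt round-robin across the k splits, so every split keeps a representative spread of the original distribution. This is an illustrative interpretation of the ND-based splitting stage, not the paper's exact procedure; the data are synthetic.

    import random
    import statistics

    def nd_split(values, k=4, n_bins=6):
        mean = statistics.fmean(values)
        std = statistics.stdev(values) or 1.0
        # Bucket records by z-score so typical and extreme values both spread out.
        bins = [[] for _ in range(n_bins)]
        for v in values:
            z = (v - mean) / std
            idx = min(n_bins - 1, max(0, int((z + 3) / 6 * n_bins)))  # clamp [-3, 3]
            bins[idx].append(v)
        splits = [[] for _ in range(k)]
        for b in bins:
            for i, v in enumerate(b):
                splits[i % k].append(v)         # deal each bucket across all splits
        return splits

    if __name__ == "__main__":
        random.seed(0)
        data = [random.gauss(120, 15) for _ in range(10_000)]  # e.g., blood pressure
        for i, s in enumerate(nd_split(data)):
            print(f"split {i}: n={len(s)}, mean={statistics.fmean(s):.1f}")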
DOMAIN : ANDROID

TTA-AA-C1501
MARS Mobile Application Relaunching Speed-up through Flash-Aware Page Swapping
The approach to fast application relaunching on the current Android system is to cache background applications in memory. This mechanism is limited by the available memory size; in addition, the application state may not be easily recovered. We propose a prototype system, MARS, to enable page swapping and cache more applications. MARS can speed up application relaunching and restore the application state. As a new page swapping design optimized for application relaunching, MARS isolates Android runtime garbage collection (GC) from page swapping for compatibility and employs several flash-aware techniques for swap-in speedup. The two main components of MARS are page slot allocation and read/write control. Page slot allocation reorganizes page slots in the swap area to produce sequential reads and improve swap-in performance. Read/write control addresses the read/write interference issue by reducing concurrent and extra internal writes. Compared to conventional Linux page swapping, these two components can scale up the read bandwidth by up to about 3.8 times. Application tests on a Google Nexus 4 phone show that MARS reduces the launching time of applications by 50% to 80%. The modified page swapping mechanism can outperform conventional Linux page swapping by up to 4 times.
IEEE 2015

TTA-AA-C1503
ECG MONITORING SYSTEM USING ANDROID
This paper describes the development and testing of circuitry and software that enable Android mobile phones equipped with Bluetooth to receive an incoming electrocardiogram (ECG) signal from a user and show it in real time on the phone screen. The system comprises three distinct subsystems. The first conditions the analog ECG signal, preparing it for conversion to the digital domain. The second consists of a microcontroller and a Bluetooth module; this unit samples the ECG, serializes the samples, and transmits them via the Bluetooth module to the Android phone. The third subsystem is the phone itself: an application written for the phone receives the ECG samples and charts the ECG signal on the screen for analysis. The good quality of the ECG signal allows for identification of arrhythmias.
IEEE 2015

TTA-AA-C1504
Auto emergency alert using android
In this paper, we describe the Well Phone, a smartphone with additional software that is used as a personal health monitoring device. The Well Phone interfaces various health monitoring devices to the smartphone and collects physiological data from those devices. It employs novel algorithms that perform statistical analyses, relate sequences of disparate measurements from different devices, and correlate physical activity with physiological measurements. The Well Phone provides feedback to the user by means of visualization and speech interaction, and alerts a caregiver, medical professional, or emergency responder as needed.
IEEE 2015
TTA-AA-C1505
Disaster Alert system using android
A robot can perform with ease tasks that seem impossible for a human, and it becomes even more helpful if one can control it wirelessly. Nowadays robots are becoming versatile and feature-rich: one can control them with a smartphone, they can avoid obstacles automatically, sense the environment and send alerts, and they can even defuse bombs and perform many other critical tasks. The feature discussed in this paper is their use in search-and-rescue missions. The robot can be controlled wirelessly using RF technology, has an ultrasonic sensor for obstacle detection, and is equipped with a smartphone camera that provides an omnidirectional view and sends the video stream wirelessly to a remote device, making the bot easier to control. The robot can explore places that humans cannot reach easily, such as areas affected by natural disasters like earthquakes, tsunamis, and hurricanes.
IEEE 2015

TTA-AA-C1506
Farm coop management system using android
This study investigates an intelligent system that employs an embedded system and a smartphone for chicken-farm management and problem solving, built on a Raspberry Pi and an Arduino Uno. An experiment and comparative analysis of the intelligent system were carried out on a sample chicken farm. The findings show that the system can monitor surrounding weather conditions, including humidity, temperature, and air quality, and control the filter fan switch in the chicken farm. The system was found to be convenient for farmers, as they could effectively control the farm anywhere and at any time, resulting in cost reduction, asset savings, and productive management in chicken farming.
IEEE 2015
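A minimal sketch of the kind of control loop such a system runs is shown below: read temperature and humidity, switch the filter fan when thresholds are crossed, and log state the smartphone app could poll. The thresholds are hypothetical, and the sensor read and fan switch are stubs; on the real Raspberry Pi/Arduino setup they would talk to actual hardware.

    import random
    import time

    TEMP_ON_C = 32.0            # hypothetical switch-on thresholds
    HUMIDITY_ON_PCT = 85.0

    def read_sensors():
        # Stub standing in for a humidity/temperature sensor on the Arduino.
        return random.uniform(25, 38), random.uniform(60, 95)

    def set_fan(on):
        # Stub for a GPIO relay write on the Raspberry Pi.
        print("fan ->", "ON" if on else "OFF")

    def control_loop(cycles=5, interval_s=1):
        fan_on = False
        for _ in range(cycles):
            temp, hum = read_sensors()
            want_on = temp > TEMP_ON_C or hum > HUMIDITY_ON_PCT
            if want_on != fan_on:           # only switch on state changes
                fan_on = want_on
                set_fan(fan_on)
            print(f"T={temp:.1f}C H={hum:.1f}% fan={'on' if fan_on else 'off'}")
            time.sleep(interval_s)

    if __name__ == "__main__":
        control_loop()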
TTA-AA-C1507
ACCIDENT TRACKING APP FOR ANDROID MOBILE
The usage of mobile devices has increased dramatically in recent years. These devices serve us in many practical ways and provide us with many services, many of them in real time. The delivery of streaming audio, streaming video, and internet content to these devices has become commonplace. One emerging application in recent years is the use of mobile devices for tracking local traffic incidents, and there are several providers of this content on the Internet, such as Google Maps, here.com, Twitter, various Department of Transportation websites, various radio station websites, and many others. Some sites, such as Twitter, only provide text information but are updated frequently with recent data. Map-enhanced websites provide visual information but are typically updated less often. The goal of this project is to integrate all the sources of traffic information in one place and intelligently filter all the recent incident data so the results are as accurate and up to date as possible, thus minimizing the number of false reports and incidents. This process, implemented for iOS 7 using Xcode and Objective-C, allows the user to view traffic reports for 15 large US cities, with the capability to add many more locations. Results for the app are compared with the major individual sources, and the percentage of additional incidents detected and of false incidents incorrectly identified is provided for several large cities.
IEEE 2015
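One plausible reading of the "filter intelligently" step is sketched below: reports from several feeds are treated as duplicates when they lie within a small distance and time window of each other, and only the freshest of each group is kept. The thresholds, coordinates, and feed names are made up for the example.

    import math

    DIST_KM, TIME_MIN = 0.5, 15                 # assumed duplicate thresholds

    def km_between(a, b):
        # Equirectangular approximation; adequate at city scale.
        dx = math.radians(b[1] - a[1]) * math.cos(math.radians((a[0] + b[0]) / 2))
        dy = math.radians(b[0] - a[0])
        return 6371 * math.hypot(dx, dy)

    def merge(reports):
        # reports: (source, lat, lon, minutes_ago); keep the freshest of each group.
        kept = []
        for rep in sorted(reports, key=lambda r: r[3]):     # freshest first
            if not any(km_between(rep[1:3], k[1:3]) < DIST_KM
                       and abs(rep[3] - k[3]) < TIME_MIN for k in kept):
                kept.append(rep)
        return kept

    if __name__ == "__main__":
        feeds = [("maps", 40.7128, -74.0060, 3), ("twitter", 40.7131, -74.0055, 5),
                 ("dot", 40.7306, -73.9352, 10)]
        for r in merge(feeds):
            print(r)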
TTA-AA-C1508
Friendbook A Semantic-based Friend Recommendation System for Social Networks
Existing social networking services recommend friends to users based on their social graphs, which may not be the most appropriate way to reflect a user's preferences in friend selection in real life. In this paper, we present Friendbook, a novel semantic-based friend recommendation system for social networks, which recommends friends to users based on their lifestyles instead of their social graphs. By taking advantage of sensor-rich smartphones, Friendbook discovers the lifestyles of users from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users if their lifestyles have high similarity. Inspired by text mining, we model a user's daily life as life documents, from which his or her lifestyles are extracted using the Latent Dirichlet Allocation algorithm. We further propose a similarity metric to measure the similarity of lifestyles between users, and calculate users' impact in terms of lifestyles with a friend-matching graph. Upon receiving a request, Friendbook returns a list of people with the highest recommendation scores to the query user. Finally, Friendbook integrates a feedback mechanism to further improve recommendation accuracy. We have implemented Friendbook on Android-based smartphones and evaluated its performance in both small-scale experiments and large-scale simulations. The results show that the recommendations accurately reflect the preferences of users in choosing friends.
IEEE 2015
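To illustrate the similarity step: once LDA has turned each user's life documents into a distribution over latent lifestyle topics, one natural similarity measure is the cosine between those distributions, with friends ranked by score. The sketch below assumes such topic vectors; the values and the use of cosine specifically are illustrative, since the paper defines its own metric and derives the vectors from smartphone sensor data.

    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def recommend(query_user, candidates, top_k=2):
        # Rank candidate users by lifestyle-topic similarity to the query user.
        scored = [(name, cosine(query_user, topics)) for name, topics in candidates]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

    if __name__ == "__main__":
        # Hypothetical 4-topic lifestyle distributions (e.g., sport, work, night-owl, travel).
        me = [0.6, 0.2, 0.1, 0.1]
        others = [("alice", [0.5, 0.3, 0.1, 0.1]), ("bob", [0.1, 0.1, 0.7, 0.1]),
                  ("carol", [0.55, 0.25, 0.1, 0.1])]
        print(recommend(me, others))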
TTA-AA-C1509
Blood Banking System Using Android
Automated Blood Bank is a project that brings voluntary blood donors and those in need of blood onto a common platform. The mission is to fulfill every blood request in the country with a promising Android application and motivated individuals who are willing to donate blood. The proposed work aims to overcome the communication barrier by providing a direct link between donor and recipient using a low-cost, low-power Raspberry Pi B+ kit, which requires only a 5 V, 2 A micro-USB power supply. All communication takes place via SMS (Short Messaging Service), which is compatible with all mobile phone types. The project aims to serve people who seek willing blood donors and to provide blood within the required time frame; it is an endeavour to reach out to those in need of blood and connect them with those willing to donate. The proposed work explores finding blood donors by using a GSM-based smart-card CPU, the Raspberry Pi B+ kit. The vision is to be "the hope of every Indian in search of a voluntary blood donor".
IEEE 2015
DOMAIN : IMAGE PROCESSING

TTA-AI-C1501
Smartphone-Based Wound Assessment System for Patients With Diabetes
Diabetic foot ulcers represent a significant health issue. Currently, clinicians and nurses mainly base their wound assessment on visual examination of wound size and healing status, while the patients themselves seldom have an opportunity to play an active role. Hence, a more quantitative and cost-effective examination method that enables patients and their caregivers to take a more active role in daily wound care can potentially accelerate wound healing, save travel costs, and reduce healthcare expenses. Considering the prevalence of smartphones with high-resolution digital cameras, assessing wounds by analyzing images of chronic foot ulcers is an attractive option. In this paper, we propose a novel wound image analysis system implemented solely on an Android smartphone. The wound image is captured by the camera on the smartphone with the assistance of an image capture box. After that, the smartphone performs wound segmentation by applying an accelerated mean-shift algorithm. Specifically, the outline of the foot is determined based on skin color, and the wound boundary is found using a simple connected-region detection method. Within the wound boundary, the healing status is then assessed based on a red-yellow-black color evaluation model. Moreover, the healing status is quantitatively assessed based on trend analysis of time records for a given patient. Experimental results on wound images collected at the UMASS-Memorial Health Center Wound Clinic (Worcester, MA), following an Institutional Review Board-approved protocol, show that our system can be efficiently used to analyze wound healing status with promising accuracy.
IEEE 2015
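The sketch below conveys the red-yellow-black idea: each pixel inside the wound boundary is assigned to red (granulation), yellow (slough), or black (necrosis) tissue by simple RGB rules, and healing status is summarized as the fraction of each class. The thresholds are illustrative assumptions; the paper couples this evaluation with mean-shift segmentation on the phone.

    def classify_pixel(r, g, b):
        # Crude RGB rules standing in for a calibrated color model.
        if r < 60 and g < 60 and b < 60:
            return "black"
        if r > 120 and g > 100 and b < 80:
            return "yellow"
        if r > 120 and g < 100:
            return "red"
        return "other"

    def tissue_fractions(pixels):
        counts = {"red": 0, "yellow": 0, "black": 0, "other": 0}
        for p in pixels:
            counts[classify_pixel(*p)] += 1
        total = len(pixels) or 1
        return {k: v / total for k, v in counts.items()}

    if __name__ == "__main__":
        # Synthetic wound region: mostly granulation, some slough, a little necrosis.
        wound = [(180, 60, 50)] * 70 + [(150, 130, 60)] * 20 + [(30, 25, 20)] * 10
        print(tissue_fractions(wound))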
TTA-AI-C1502
Hand Gesture Recognition Using Kinect Sensor
Hand gestures are becoming one of the most common ways for people to interact with information technology products, giving users an engaging experience. Recently developed 3D cameras, e.g., the Kinect, provide not only a color image but also a depth map, opening new opportunities in the development of human-computer interaction (HCI) applications. This paper presents a novel hand gesture recognition method based on the depth image obtained from the Kinect sensor. First, the hand region is extracted by thresholding around the hand point detected using the NITE 2 library provided by PrimeSense. Second, we extract a feature vector including the number of open fingers, the angles between the fingertips and the horizontal axis of the hand, the angles between consecutive fingers, and the difference between the distance from the hand center to the fingertips and the radius of the biggest inscribed circle. Finally, a support vector machine (SVM) is applied to identify the different gestures. Experimental results show that the proposed method performs hand gesture recognition with an accuracy of 95% in real time.
IEEE 2015
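A minimal sketch of the final classification stage follows: hand-geometry feature vectors (finger count, fingertip angles, and so on) feed an SVM classifier. The feature values below are fabricated stand-ins for the depth-derived features in the paper; the snippet requires scikit-learn.

    from sklearn.svm import SVC

    # Toy features: [open_fingers, mean_fingertip_angle_deg, max_gap_angle_deg]
    X_train = [
        [5, 18.0, 30.0], [5, 20.0, 28.0],   # "open palm"
        [0, 0.0, 0.0],   [0, 2.0, 1.0],     # "fist"
        [2, 25.0, 45.0], [2, 23.0, 48.0],   # "victory"
    ]
    y_train = ["palm", "palm", "fist", "fist", "victory", "victory"]

    clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    print(clf.predict([[5, 19.0, 29.0], [0, 1.0, 0.5]]))  # -> ['palm' 'fist']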
DOMAIN : MOBILE COMPUTING

TTA-AM-C1501
Timer-based Bloom Filter Aggregation for Reducing Signaling Overhead in Distributed Mobility Management
Distributed mobility management (DMM) is a promising technology for addressing the mobile data traffic explosion. Since the location information of mobile nodes (MNs) is distributed across several mobility agents (MAs), DMM requires an additional mechanism to share the location information of MNs between MAs. In the literature, multicast- or distributed hash table (DHT)-based sharing methods have been suggested; however, they incur significant signaling overhead owing to unnecessary location information updates under frequent handovers. To reduce the signaling overhead, we propose a timer-based Bloom filter aggregation (TBFA) scheme for distributing the location information. In the TBFA scheme, the location information of MNs is maintained in Bloom filters at each MA. Also, since propagating the whole Bloom filter on every MN movement leads to high signaling overhead, each MA propagates only the changed indexes in its Bloom filter when a pre-defined timer expires. To verify the performance of the TBFA scheme, we develop analytical models of the signaling overhead and latency, and devise an algorithm to select an appropriate timer value. Extensive simulation results are given to show the accuracy of the analytical models and the effectiveness of the TBFA scheme over the existing DMM scheme.
IEEE 2015
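The core TBFA mechanism can be sketched compactly: each mobility agent inserts its attached mobile nodes into a local Bloom filter, records which bit indexes changed, and ships only those changed indexes to peers when a timer fires, instead of signaling on every handover. The filter size, hash construction, and timer policy below are simplified assumptions for illustration.

    import hashlib

    M_BITS, K_HASHES = 256, 3

    def bit_indexes(mn_id):
        # Derive K bit positions from a SHA-256 digest of the MN identifier.
        h = hashlib.sha256(mn_id.encode()).digest()
        return [int.from_bytes(h[4 * i: 4 * i + 4], "big") % M_BITS
                for i in range(K_HASHES)]

    class MobilityAgent:
        def __init__(self):
            self.bits = [0] * M_BITS
            self.pending = set()              # indexes changed since last timer

        def attach(self, mn_id):              # MN hands over to this agent
            for i in bit_indexes(mn_id):
                if not self.bits[i]:
                    self.bits[i] = 1
                    self.pending.add(i)

        def on_timer(self, peers):            # propagate only the deltas
            delta, self.pending = self.pending, set()
            for peer in peers:
                for i in delta:
                    peer.bits[i] = 1

        def may_host(self, mn_id):            # Bloom lookup: no false negatives
            return all(self.bits[i] for i in bit_indexes(mn_id))

    if __name__ == "__main__":
        ma1, ma2 = MobilityAgent(), MobilityAgent()
        ma1.attach("mn-42")
        print(ma2.may_host("mn-42"))          # False: delta not yet propagated
        ma1.on_timer([ma2])
        print(ma2.may_host("mn-42"))          # True after timer-driven update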