TTA
FINAL YEAR PROJECTS TITLES
WITH ABSTRACT
www.ttafinalyearprojects.com
IEEE 2015, 2014, 2013, 2012, etc.
Projects for B.E./B.Tech/M.E./MCA/B.Sc/M.Sc
For complete base paper, call now and talk
to our expert
90942066260 | 9042066280 | 044 4353 3393
DOMAIN : NETWORKING
CODE PROJECT TITLE DESCRIPTION REFERENCE
TTA-DN-
C1501
Delay Analysis of
Multichannel
Opportunistic Spectrum
Access MAC Protocols
We provide a comprehensive delay and
queuing analysis for two baseline
medium access control protocols for multi-user
cognitive radio networks with homogeneous
users and channels and investigate the impact
of different network parameters on the system
performance. In addition to an accurate
Markov chain, which follows the queue status
of all users, several lower complexity queuing
theory approximations are provided. Accuracy
and performance of the proposed analytical
approximations are verified with extensive
simulations. It is observed that, using an Aloha-type access to the control channel, a buffering MAC protocol, where in case of interruption the CR user waits for the primary user to vacate the channel before resuming the transmission, outperforms a switching MAC protocol, where the CR user vacates the channel when primary users appear and then competes again to gain access to a new channel. The reason is that
the delay bottleneck for both protocols is the
time required to successfully access the control
channel, which occurs more frequently for the
switching MAC protocol. It is thus shown that
a clustering approach, where users are divided
into clusters with a separate control channel
per cluster, can significantly improve the
performance by reducing the competition over the control channel.
IEEE 2015
TTA-DN-
C1502
LEISURE A Framework
for Load-Balanced
Network - Wide Traffic
Measurement
Network-wide traffic measurement is of
interest to network operators to uncover
global network behavior for the management
tasks of traffic accounting, debugging or
troubleshooting, security, and
traffic engineering. Increasingly,
sophisticated network measurement tasks such
as anomaly detection and security forensic
analysis are requiring in-depth fine-grained
flow-level measurements. However,
performing in-depth per-
flow measurements (e.g., detailed payload
analysis) is often an expensive process. Given
the fast-changing Internet traffic landscape and
large traffic volume, a single monitor is not
capable of accomplishing
the measurement tasks for all applications of
interest due to its resource constraint.
Moreover, uncovering global network behavior
requires network-wide traffic measurements at
multiple monitors across
the network since traffic measured at any
single monitor only provides a partial view and
may not be sufficient or accurate. These
factors call for coordinated measurements
among multiple distributed monitors. In this
paper, we present a centralized
optimization framework, LEISURE (Load-
Equalized measurement), for load-
balancing network measurement workloads
across distributed monitors. Specifically, we
consider various load-balancing problems
under different objectives and study their
extensions to support different deployment
scenarios. We evaluate LEISURE via detailed
simulations on Abilene and
GEANT network traces to show
that LEISURE can achieve much better load-
balanced performance (e.g., 4.75X smaller
peak workload and 70X smaller variance in
workloads) across all coordinated monitors in
comparison to a naive solution (uniform assignment) for accomplishing network-wide traffic measurement tasks.
IEEE 2015
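As a rough, generic illustration of the min-max load-balancing objective described above (not the paper's LEISURE optimization itself; all names and units here are assumed), a greedy longest-processing-time heuristic assigns each measurement task to the currently least-loaded monitor:

import heapq

def balance_tasks(task_loads, num_monitors):
    # Greedy LPT heuristic: largest tasks first, each to the least-loaded
    # monitor. task_loads is a list of per-task workloads (hypothetical units).
    heap = [(0.0, m) for m in range(num_monitors)]  # (current load, monitor)
    assignment = {m: [] for m in range(num_monitors)}
    for task, load in sorted(enumerate(task_loads), key=lambda t: -t[1]):
        cur, m = heapq.heappop(heap)
        assignment[m].append(task)
        heapq.heappush(heap, (cur + load, m))
    peak = max(load for load, _ in heap)
    return assignment, peak

A uniform assignment corresponds to handing tasks out round-robin regardless of load, which is the naive baseline the paper compares against.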
TTA-DN-
C1503
Authenticated Key
Exchange Protocols for
Parallel Network File
Systems
We study the problem of key establishment for
secure many-to-many communications. The
problem is inspired by the proliferation of
large-scale
distributed file systems supporting parallel acc
ess to multiple storage devices. Our work
focuses on the current Internet standard for
such file systems, i.e., parallel
Network File System (pNFS), which makes
use of Kerberos to establish parallel session keys between clients
and storage devices. Our review of the existing
Kerberos-based protocol shows that it has a
number of limitations: (i) a metadata server
facilitating key exchange between the clients
and the storage devices has heavy workload
that restricts the scalability of the protocol; (ii)
the protocol does not provide forward secrecy;
(iii) the metadata server itself generates all the session keys that are used between the clients and storage devices, and this inherently leads
to key escrow. In this paper, we propose a
variety
of authenticated key exchange protocols that
are designed to address the above issues. We
show that our protocols are capable of
reducing up to approximately 54% of the
workload of the metadata server and
concurrently supporting forward secrecy and
escrow-freeness. All this requires only a small
fraction of increased computation overhead at
the client.
IEEE 2015
TTA-DN-
C1504
Diversifying Web
Service
Recommendation
Results via Exploring
Service Usage History
The last decade has witnessed a tremendous
growth of Web services as a major technology
for sharing data, computing resources, and
programs on the Web. With the increasing
adoption and presence of Web services, design
of novel approaches for
effective Web service recommendation to
satisfy users’ potential requirements has
become of paramount importance.
Existing Web service recommendation approaches mainly focus on predicting the missing QoS values of Web service candidates that are interesting to a user, using a collaborative filtering approach, a content-based approach, or a hybrid of the two.
These recommendation approaches assume
that recommended Web services are
independent of each other, which sometimes
may not be true. As a result, many similar or
redundant Web services may exist in
a recommendation list. In this paper, we
propose a novel Web
service recommendation approach incorporating a user's potential QoS preferences and the diversity feature of user
interests on Web services. User’s interests and
QoS preferences on Web services are first
mined
by exploring the Web service usage history.
Then we compute scores of Web service
candidates by measuring their relevance with
historical and potential user interests, and their
QoS utility. We also construct
a Web service graph based on the functional
similarity between Web services. Finally, we
present an innovative diversity-
aware Web service ranking algorithm to rank
the Web service candidates based on their
scores, and diversity degrees derived from
the Web service graph. Extensive experiments
are conducted based on a real
world Web service dataset, indicating that our proposed Web service recommendation approach significantly improves the quality of the recommendation results compared with existing methods.
IEEE 2015
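As a sketch of what diversity-aware re-ranking of this kind can look like (a maximal-marginal-relevance-style heuristic, not the paper's ranking algorithm; the trade_off parameter and similarity inputs are assumptions):

def diversity_rank(scores, similarity, k, trade_off=0.7):
    # scores: service -> relevance/QoS utility; similarity: (s, t) -> [0, 1].
    # Greedily pick the service with the best relevance-minus-redundancy margin.
    selected = []
    candidates = set(scores)
    while candidates and len(selected) < k:
        def marginal(s):
            redundancy = max((similarity.get((s, t), 0.0) for t in selected),
                             default=0.0)
            return trade_off * scores[s] - (1 - trade_off) * redundancy
        best = max(candidates, key=marginal)
        selected.append(best)
        candidates.remove(best)
    return selected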
TTA-DN-
C1505
Virtual Servers Co-
Migration for Mobile
Accesses Online vs.
Offline
In this paper, we study the problem of co-
migrating a set of service replicas residing on
one or more redundant virtual servers in clouds
in order to satisfy a sequence of mobile batch-
request demands in a cost effective way. With
such a migration, we can not only reduce the
service access latency for end users but also
minimize the network costs for service
providers. The co-migration can be achieved at
the cost of bulk-data transfer and increases the
overall monetary costs for the service
providers. To gain the benefits of
service migration while minimizing the overall
costs, we propose a co-migration algorithm
Migk for multiple servers, each hosting a
service replica. Migk is a randomized
algorithm with a competitive cost of O(γ log n / min{1/κ, μ/(λ+μ)}) to migrate κ services in a static n-node network, where γ is the maximal ratio of the migration costs between any pair of neighboring nodes in the network, and where λ and μ represent the maximum wired transmission cost and the wireless link cost,
respectively. For comparison, we also study
this problem in its static off-line form by
proposing a parallel dynamic programming
(hereafter DP) based algorithm that integrates
the branch & bound strategy with sampling
techniques in order to approximate the optimal
DP results. We validate the advantage of the
proposed algorithms via extensive simulation
studies using various request patterns and
cloud network topologies. Our simulation
results show that the proposed algorithms can
effectively adapt to mobile access patterns to
satisfy the service request sequences in a cost-
effective way.
IEEE 2015
TTA-DN-
C1506
Anomaly-Based
Network Intrusion
Detection System
We present POSEIDON, a new anomaly-
based network intrusion detection system.
POSEIDON is payload-based, and has a two-
tier architecture: the first stage consists of a
self-organizing map, while the second one is a
modified PAYL system. Our benchmarks on
the 1999 DARPA data set show a
higher detection rate and lower number of false
positives than PAYL and PHAD.
IEEE 2015
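POSEIDON's first tier is a self-organizing map (SOM), a standard unsupervised grid of prototype vectors. A minimal SOM trainer is sketched below; the grid size, learning-rate schedule, and feature extraction are illustrative assumptions, not POSEIDON's actual configuration:

import numpy as np

def train_som(data, grid=(8, 8), epochs=10, lr0=0.5, sigma0=3.0, seed=0):
    # data: (n_samples, dim) array of payload feature vectors.
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.random((h, w, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in data:
            lr = lr0 * (1 - t / steps)
            sigma = sigma0 * (1 - t / steps) + 1e-3
            # Best-matching unit: grid cell whose prototype is closest to x.
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=2)), (h, w))
            # Pull the BMU and its grid neighborhood toward the input.
            g = np.exp(-np.sum((coords - bmu) ** 2, axis=2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
            t += 1
    return weights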
TTA-DN-
C1507
CEDAR A Low-Latency
and Distributed
Strategy for Packet
Recovery in Wireless
Networks
Underlying link-layer protocols of well-
established wireless networks that use the
conventional “store-and-forward” design
paradigm cannot provide highly sustainable
reliability and stability in wireless communication, which introduces significant barriers and setbacks to the scalability and deployment of wireless networks. In this
paper, we propose a Code
Embedded Distributed Adaptive and Reliable
(CEDAR) link-layer framework that targets low latency and balances the en/decoding load among nodes. CEDAR is the first
comprehensive theoretical framework for
analyzing and designing distributed and
adaptive error recovery for wireless networks.
It employs a theoretically sound framework for
embedding channel codes in each packet and
performs the error correcting process in
selected intermediate nodes in a packet's route.
To identify the intermediate nodes for the
decoding, we mathematically calculate the
average packet delay and formalize the
problem as a nonlinear integer programming
problem. By minimizing the delays, we derive
three propositions that: 1) can identify the
intermediate nodes that minimize the
propagation and transmission delay of
a packet; and 2) and 3) can identify the
intermediate nodes that simultaneously
minimize the queuing delay and maximize the
fairness of en/decoding load of all the nodes.
Guided by the propositions, we then propose a
scalable and distributed scheme in CEDAR to
choose the intermediate en/decoding nodes in a
route to achieve its objective. The results from
real-world test bed “NESTbed” and simulation
with MATLAB prove that CEDAR is superior
to schemes using hop-by-hop decoding and
destination decoding not only in packet delay
and throughput but also in energy-consumption
and load-distribution balance.
IEEE 2015
TTA-DN-
C1508
CoCoWa A
Collaborative Contact-
Based Watchdog for
Detecting Selfish Nodes
Mobile ad hoc networks (MANETs) assume that mobile nodes voluntarily cooperate in order to work properly. This cooperation is a cost-intensive activity, and some nodes can refuse to cooperate, leading to selfish node behavior.
Thus, the overall network performance could
be seriously affected. The use of watchdogs is
a well-known mechanism
to detect selfish nodes. However, the detection
process performed by watchdogs can fail,
generating false positives and false negatives
that can induce wrong operations.
Moreover, relying on local watchdogs alone
can lead to poor performance when detecting selfish nodes, in terms of precision and speed. This is especially important in networks with sporadic contacts, such as delay-tolerant networks (DTNs), where watchdogs sometimes lack enough time or information to detect the selfish nodes. Thus,
we propose collaborative contact-based
watchdog (CoCoWa) as
a collaborative approach based on the diffusion
of local selfish nodes awareness when
a contact occurs, so that information
about selfish nodes is quickly propagated. As
shown in the paper, this collaborative approach
reduces the time and increases the precision
when detecting selfish nodes.
IEEE 2015
TTA-DN-
C1509
Distributed
Opportunistic
Scheduling for Energy-Harvesting-Based Wireless Networks A Two-Stage Probing Approach
This paper considers a heterogeneous ad
hoc network with multiple transmitter-receiver
pairs, in which all transmitters are capable of
harvesting renewable energy from the
environment and compete for one shared
channel by random access. In particular, we
focus on two different scenarios: the constant
energy harvesting (EH) rate model where the
EH rate remains constant within the time of
interest and the i.i.d. EH rate model where the
EH rates are independent and
identically distributed across different
contention slots. To quantify the roles of both
the energy state information (ESI) and the
channel state information (CSI),
a distributed opportunistic scheduling (DOS)
framework with two-stage probing and save-
then-transmit energy utilization is proposed.
Then, the optimal throughput and the optimal scheduling strategy are obtained via a one-dimensional search, i.e., an iterative algorithm
consisting of the following two steps in each
iteration: First, assuming that the stored energy
level at each transmitter is stationary with a
given distribution, the expected throughput
maximization problem is formulated as an
optimal stopping problem, whose solution is
proven to exist and then derived for both
models; second, for a fixed stopping rule, the
energy level at each transmitter is shown to be
stationary and an efficient iterative algorithm
is proposed to compute its steady-state
distribution. Finally, we validate our analysis
by numerical results and quantify the
throughput gain compared with the best-effort
delivery scheme.
IEEE 2015
TTA-DN-
C1510
Enabling Efficient Multi-Keyword Ranked Search Over Encrypted Mobile Cloud Data Through Blind Storage
In mobile cloud computing, a fundamental
application is to outsource the mobile data to
external cloud servers for scalable data storage.
The outsourced data, however, need to be encrypted due to the privacy and
confidentiality concerns of their owner. This
results in significant difficulties for accurate search over
the encrypted mobile cloud data. To tackle this
issue, in this paper, we develop the searchable
encryption for multi-
keyword ranked search over the storage data.
Specifically, by considering the large number
of outsourced documents (data) in the cloud,
we utilize the relevance score and k-nearest
neighbor techniques to develop
an efficient multi-keyword search scheme that
can return the ranked search results based on
the accuracy. Within this framework, we
leverage an efficient index to further improve
the search efficiency, and adopt
the blind storage system to conceal access
pattern of the search user. Security analysis
demonstrates that our scheme can achieve
confidentiality of documents and index,
trapdoor privacy, trapdoor unlinkability, and
concealing access pattern of the search user.
Finally, using extensive simulations, we show
that our proposal can achieve much improved
efficiency in terms of search functionality
and search time compared with the existing
proposals.
IEEE 2015
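The relevance-score side of such schemes can be pictured with plaintext TF-IDF ranking, as sketched below; the paper's construction additionally encrypts the index and queries (e.g., with secure k-nearest-neighbor techniques) and hides access patterns via blind storage, all of which is omitted here:

import math
from collections import Counter

def rank_documents(docs, query_keywords, top_k=5):
    # docs: list of plaintext strings (assumed non-empty). Returns the top-k
    # (score, doc_index) pairs under a standard TF-IDF relevance score.
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for terms in tokenized:
        df.update(set(terms))
    scored = []
    for i, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = sum((tf[k] / len(terms)) * math.log(n / (1 + df[k]))
                    for k in query_keywords if k in tf)
        scored.append((score, i))
    return sorted(scored, reverse=True)[:top_k]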
TTA-DN-
C1511
Energy-Efficient Group
Key Agreement for
Wireless Networks
Advances in lattice-based cryptography are
enabling the use of public key algorithms
(PKAs) in power-constrained ad hoc and
sensor network devices. Unfortunately, while
many wireless networks are dominated
by group communications, PKAs are inherently unicast, i.e., public/private key pairs are generated by data destinations. To fully
realize public key cryptography in
these networks, lightweight PKAs should be
augmented with energy-efficient mechanisms
for group key agreement. We consider a
setting where master keys are loaded on clients
according to an arbitrary distribution. We
present a protocol that uses
session keys derived from those master keys to
establish a group key that is information-theoretically secure. When master keys are
distributed randomly, our protocol requires O(log_b t) multicasts, where 1 − 1/b is the probability that a given client possesses a given master key. The minimum number of
public multicast transmissions required for a
set of clients to agree on a secret key in our
setting was recently characterized. The
proposed protocol achieves the best possible
approximation to that optimum that is
computable in polynomial time. Moreover, the
computational requirements of our protocol
compare favorably to multi-party extensions of
Diffie-Hellman key exchange.
IEEE 2015
TTA-DN-
C1512
iPath Path Inference in
Wireless Sensor
Networks
Recent wireless sensor networks (WSNs) are
becoming increasingly complex with the
growing network scale and the dynamic nature
of wireless communications. Many
measurement and diagnostic approaches
depend on per-packet routing paths for
accurate and fine-grained analysis of the
complex network behaviors. In this paper, we
propose iPath, a novel path inference approach
to reconstructing the per-packet
routing paths in dynamic and large-
scale networks. The basic idea of iPath is to
exploit high path similarity to iteratively infer
long paths from short ones. iPath starts with an
initial known set of paths and performs path inference iteratively. iPath includes a novel design of a lightweight hash function for verification of the inferred paths.
In order to further improve
the inference capability as well as the
execution efficiency, iPath includes a fast
bootstrapping algorithm to reconstruct the initial set of paths. We also
implement iPath and evaluate its performance
using traces from large-scale WSN
deployments as well as extensive simulations.
Results show that iPath achieves much higher
reconstruction ratios under
different network settings compared to other
state-of-the-art approaches.
IEEE 2015
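The hash-based verification idea can be illustrated with a toy scheme: each forwarder folds its node ID into a small order-sensitive hash carried in the packet, and the sink accepts an inferred path only if recomputing the hash over it matches. This sketches the mechanism only; iPath's actual lightweight hash function is defined in the paper:

def path_hash(node_ids, bits=16):
    # Order-sensitive fold of node IDs into a `bits`-wide value (illustrative).
    h = 0
    for node in node_ids:
        h = ((h << 5) ^ (h >> 11) ^ node) & ((1 << bits) - 1)
    return h

def verify_candidate(reported_hash, candidate_path):
    # The sink checks an inferred path against the hash carried in the packet.
    return path_hash(candidate_path) == reported_hash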
TTA-DN-
C1513
Joint Static and
Dynamic Traffic
Scheduling in Data
Center Networks
The advent and continued growth of large data centers have led to much interest in
switch architectures that can economically
meet the high capacities needed for
interconnecting the thousands of servers in
these data centers. Various multilayer
architectures employing thousands of switches
have been proposed in the literature. We make
use of the observation that the traffic in
a data center is a mixture of
relatively static and rapidly fluctuating
components, and develop a combined
scheduler for both these components using a
generalization of the load-balanced scheduler.
The presence of the known static component
introduces asymmetries in the ingress-egress
capacities, which preclude the use of a load-
balanced scheduler as is. We generalize the
load-balanced scheduler and also incorporate
an opportunistic scheduler that sends traffic on
a direct path when feasible to enhance the
overall switch throughput. Our evaluations
show that this scheduler works very well
despite avoiding the use of a central scheduler
for making packet-by-
packet scheduling decisions.
IEEE 2015
TTA-DN-
C1514
On Downlink Beamforming with Small Cells in Wireless Heterogeneous Systems
In this letter, we study downlink beamforming for wireless heterogeneous networks
with two groups of users. The users in one
group (group 1) are supported by the small cell
base station (SBS) as well as the macro cell
base station (MBS), while the users in the
other group (group 2) are supported by the
MBS only. The MBS is equipped with an antenna array for downlink beamforming. We formulate a convex optimization problem, which can be solved by semidefinite programming (SDP) relaxation, for downlink beamforming that takes advantage
of the presence of the SBS for group 1, but
also takes into account the interfering signal
from the SBS for group 2.
IEEE 2015
TTA-DN-
C1515
On-Demand Discovery
of Software Service
Dependencies in
MANETs
The dependencies among the components
of service-oriented software applications
hosted in a mobile ad hoc network (MANET)
are difficult to determine due to the inherent
loose coupling of the services and the transient
communication topologies of the network. Yet
understanding these dependencies is critical to
making good management decisions, since
dependence data underlie important analyses
such as fault localization and impact analysis.
Current methods for discovering dependencies,
developed primarily for fixed networks,
assume that dependencies change only slowly
and require relatively long monitoring periods
as well as substantial memory and
communication resources, all of which are
impractical in the MANET environment. We
describe a new dynamic dependence discovery
method designed specifically for this
environment, yielding dynamic snapshots of
dependence relationships discovered through
observations of service interactions. We
evaluate the performance of our method in
terms of the accuracy of the
discovered dependencies, and draw insights on
the selection of critical parameters under
various operational conditions. Although
operated under more stringent conditions, our
method is shown to provide results comparable
to or better than existing methods.
IEEE 2015
TTA-DN-
C1516
PWDGR Pair-Wise
Directional
Geographical Routing
Based on Wireless
Sensor Network
Multipath routing in wireless multimedia sensor networks makes it possible to transfer data simultaneously so as to reduce delay and congestion, and it is worth researching. However, the current multipath routing strategy may cause the energy consumption of nodes near the sink to become markedly higher than that of other nodes, which makes the network invalid and dead. It also has a serious impact on the performance of wireless multimedia sensor networks (WMSNs). In this paper, we propose a pair-wise
directional geographical routing (PWDGR)
strategy to solve the energy bottleneck
problem. First, the source node can send the
data to the pair-wise node around the sink node
in accordance with certain algorithm and then
it will send the data to the sink node.
These pair-wise nodes are selected uniformly within a 360° scope around the sink according to a certain algorithm. Therefore, the strategy can effectively relieve the serious energy burden around the sink and also strike a balance between energy consumption
and end-to-end delay. Theoretical analysis and extensive simulation experiments on PWDGR have been carried out, and the results indicate that PWDGR is superior to similar existing strategies, both in theory and in the simulation results. With respect to strategies of the same kind, PWDGR is able to prolong the network lifetime by 70%. The delay is also measured, and it is increased by only 8.1% compared with similar strategies.
IEEE 2015
TTA-DN-
C1517
REAL A Reciprocal
Protocol for Location
Privacy in Wireless
Sensor Networks
K-anonymity has been used to
protect location privacy for location monitorin
g services in wireless
sensor networks (WSNs), where sensor nodes
work together to report k-anonymized
aggregate locations to a server. Each k-
anonymized aggregate location is a cloaked
area that contains at least k persons. However, we identify an attack model to show that overlapping aggregate locations still pose privacy risks because an adversary can infer some overlapping areas with fewer than k persons, which violates the k-anonymity privacy requirement. In this paper,
we propose a reciprocal protocol for
location privacy (REAL) in WSNs.
In REAL, sensor nodes are required to
autonomously organize their sensing areas into
a set of non-overlapping and highly accurate k-
anonymized aggregate locations. To confront
the three key challenges in REAL, namely,
self-organization, reciprocity property and high
accuracy, we design a state transition process,
a locking mechanism and a time delay
mechanism, respectively. We compare the
performance of REAL with
current protocols through simulated
experiments. The results show
that REAL protects location privacy, provides
more accurate query answers, and reduces
communication and computational costs.
IEEE 2015
TTA-DN-
C1518
SanGA A Self-Adaptive
Network-Aware
Approach to Service
Composition
Service-Oriented Computing enables
the composition of loosely
coupled services provided with varying
Quality of Service (QoS) levels. Selecting a
near-optimal set of services for
a composition in terms of QoS is crucial when
many functionally equivalent services are
available. As the number of distributed
services, particularly in the cloud, is rising
rapidly, the impact of the network on the QoS
keeps increasing. Despite this,
current approaches do not differentiate
between the QoS of services themselves and
the network. Therefore, the computed latency
differs from the actual latency, resulting in
suboptimal QoS. Thus, we propose a network-
aware approach that handles the QoS
of services and the QoS of
the network independently. First, we build
a network model in order to estimate
the network latency between
arbitrary services and potential users. Our
selection algorithm then leverages this
model to find compositions with a low latency
for a given execution policy. We employ
a self-adaptive genetic algorithm which
balances the optimization of latency and other
QoS as needed and improves the convergence
speed. In our evaluation, we show that
our approach works under realistic network
conditions, efficiently
computing compositions with much lower
latency and otherwise equivalent QoS
compared to current approaches.
IEEE 2015
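A minimal genetic algorithm over per-task candidate sets conveys the selection idea; this sketch scores plans only by a latency estimate from a network model and uses fixed crossover and mutation rates, whereas SanGA self-adapts these and balances further QoS dimensions (all names below are illustrative):

import random

def ga_compose(candidates, latency, generations=50, pop_size=30, p_mut=0.1):
    # candidates[i]: list of service IDs able to perform task i (two or more
    # tasks assumed); latency: service ID -> estimated latency. Lower is better.
    def fitness(plan):
        return sum(latency[s] for s in plan)

    pop = [[random.choice(c) for c in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(candidates))   # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):                  # per-gene mutation
                if random.random() < p_mut:
                    child[i] = random.choice(candidates[i])
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)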
TTA-DN-
C1519
Secure Binary Image Steganography Based on Minimizing the Distortion on the Texture
Most state-of-the-
art binary image steganographic techniques
only consider the flipping distortion according to the human visual system, which makes them insecure when attacked by steganalyzers. In this paper,
a binary image steganographic scheme that
aims to minimize the embedding distortion on
the texture is presented. We extract the
complement, rotation, and mirroring-invariant
local texture patterns (crmiLTPs) from
the binary image first. The weighted sum of
crmiLTP changes when flipping one pixel is
then employed to measure the flipping
distortion corresponding to that pixel. By
testing on both simple binary images and the
constructed image data set, we show that the
proposed measurement can well describe the
distortions on both visual quality and
statistics. Based on the proposed measurement,
a practical steganographic scheme is
developed. The steganographic scheme
generates the cover vector by dividing the
scrambled image into super pixels. Thereafter,
the syndrome-trellis code is employed
to minimize the designed embedding
distortion. Experimental results have
demonstrated that the proposed steganographic
scheme can achieve statistical security without
degrading the image quality or the embedding
capacity.
IEEE 2015
TTA-DN-
C1520
Software Puzzle A Countermeasure to Resource-Inflated Denial-of-Service Attacks
Denial-of-service (DoS) and distributed DoS
(DDoS) are among the major threats to cyber-
security, and client puzzle, which demands a
client to perform computationally expensive
operations before being granted services from
a server, is a well-
known countermeasure to them. However, an
attacker can inflate its capability of
DoS/DDoS attacks with fast puzzle-
solving software and/or built-in graphics
processing unit (GPU)
hardware to significantly weaken the
effectiveness of client puzzles. In this paper,
we study how to prevent DoS/DDoS attackers
from inflating their puzzle-solving
capabilities. To this end, we introduce a new
client puzzle referred to as software puzzle.
Unlike the existing client puzzle schemes,
which publish their puzzle algorithms in
advance, a puzzle algorithm in the present
software puzzle scheme is randomly generated
only after a client request is received at the
server side and the algorithm is generated such
that: 1) an attacker is unable to prepare an
implementation to solve the puzzle in advance
and 2) the attacker needs considerable effort in
translating a central processing
unit puzzle software to its functionally
equivalent GPU version such that the
translation cannot be done in real time.
Moreover, we show
how to implement software puzzle in the
generic server-browser model.
IEEE 2015
TTA-DN-
C1521
Task Allocation for
Wireless Sensor
Network Using Modified
Binary Particle Swarm
Optimization
Many applications
of wireless sensor network (WSN) require the
execution of several computationally intense
in-network processing tasks. Collaborative in-
network processing among multiple nodes is
essential when executing such a task due to the
strictly constrained energy and resources in
single node. Task allocation is essential to
allocate the workload of each task to proper
nodes in an efficient manner. In this paper, a modified version of binary particle swarm optimization (MBPSO), which adopts a different transfer function
and a new position updating procedure with
mutation, is proposed for the
task allocation problem to obtain the best
solution. Each particle in MBPSO is encoded
to represent a complete potential solution
for task allocation. The task workload and
connectivity are ensured by taking them as
constraints for the problem. Multiple metrics,
including task execution time, energy
consumption, and network lifetime, are considered as a whole by designing a hybrid
fitness function to achieve the best overall
performance. Simulation results show the
feasibility of the proposed MBPSO-based
approach for task allocation problem in WSN.
The proposed MBPSO-based approach also
outperforms the approaches based on genetic
algorithm and BPSO in the comparative analysis.
IEEE 2015
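For reference, the baseline that MBPSO modifies looks roughly like the sketch below: a binary PSO in which velocities pass through a sigmoid transfer function to become bit-flip probabilities, plus a small mutation rate. The paper's modified transfer function and position-updating procedure differ from this generic version:

import math
import random

def bpso(fitness, dim, particles=20, iters=100,
         w=0.7, c1=1.5, c2=1.5, p_mut=0.02):
    # fitness: function mapping a 0/1 list to a score (higher is better).
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    X = [[random.randint(0, 1) for _ in range(dim)] for _ in range(particles)]
    V = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in X]
    gbest = max(X, key=fitness)[:]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                V[i][d] = (w * V[i][d]
                           + c1 * random.random() * (pbest[i][d] - X[i][d])
                           + c2 * random.random() * (gbest[d] - X[i][d]))
                V[i][d] = max(-6.0, min(6.0, V[i][d]))  # clamp velocity
                X[i][d] = 1 if random.random() < sig(V[i][d]) else 0
                if random.random() < p_mut:             # mutation step
                    X[i][d] ^= 1
            if fitness(X[i]) > fitness(pbest[i]):
                pbest[i] = X[i][:]
        gbest = max(pbest, key=fitness)[:]
    return gbest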
TTA-DN-
C1522
Towards Distributed
Optimal Movement
Strategy for Data
Gathering in Wireless
Sensor Network
In this paper, we address how to design
a distributed movement strategy for mobile
collectors, which can be either physical mobile
agents or query/collector packets periodically
launched by the sink, to achieve
successful data gathering in wireless sensor net
works. Formulating the problem as general
random walks on a graph composed
of sensor nodes, we analyze how
much data can be successfully gathered in time
under any Markovian random-
walk movement strategies for mobile
collectors moving over a graph (or network),
while each sensor node is equipped with
limited buffer space and data arrival rates are
heterogeneous over different sensor nodes. In
particular, from the analysis, we obtain the
optimal movement strategy among a class of
Markovian strategies so as to minimize
the data loss rate over all sensor nodes, and
explain how such
an optimal movement strategy can be made to
work in a distributed fashion. We demonstrate
that
our distributed optimal movement strategy can
lead to about 2 times smaller loss rate than a
standard random walk strategy under diverse
scenarios. In particular, our strategy results in
up to 70% cost savings for the deployment of
multiple collectors to achieve the target
data loss rate than the standard random
walk strategy.
IEEE 2015
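One standard way to realize a Markovian movement strategy distributedly is a Metropolis-Hastings walk: each node computes its own transition probabilities from purely local information so that the walk's stationary distribution matches desired per-node weights (for instance, data arrival rates). The sketch illustrates the mechanism, not necessarily the paper's exact optimal strategy:

def mh_transition_probs(graph, weight, node):
    # graph: node -> list of neighbors; weight: node -> desired stationary
    # weight (e.g., data arrival rate). Only local degrees and weights of the
    # node and its neighbors are needed, hence the distributed computability.
    deg = lambda v: len(graph[v])
    probs = {}
    for nbr in graph[node]:
        ratio = (weight[nbr] * deg(node)) / (weight[node] * deg(nbr))
        probs[nbr] = (1.0 / deg(node)) * min(1.0, ratio)
    probs[node] = 1.0 - sum(probs.values())   # lazy self-loop mass
    return probs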
TTA-DN-
C1523
Universal Network
Coding-Based
Opportunistic Routing
for Unicast
Network coding-
based opportunistic routing has emerged as an
elegant way to optimize the capacity of lossy
wireless multihop networks by reducing the
amount of required feedback messages. Most
of the works on network coding-
based opportunistic routing in the literature
assume that the links are independent. This
assumption has been invalidated by the recent
empirical studies that showed that the
correlation among the links can be arbitrary. In
this work, we show that the performance
of network coding-
based opportunistic routing is greatly impacted
by the correlation among the links. We
formulate the problem of maximizing the
throughput while achieving fairness under
arbitrary channel conditions, and we identify
the structure of its optimal solution. As is
typical in the literature, the optimal solution
requires a large amount of immediate feedback
messages, which is unrealistic. We propose the
idea of performing network coding on the
feedback messages and show that if the
intermediate node waits until receiving only
one feedback message from each next-hop
node, the optimal level of network coding
redundancy can be computed in a distributed
manner. The coded feedback messages require
a small amount of overhead, as they can be
integrated with the packets. Our approach is
also oblivious to losses and correlations among
the links, as it optimizes the performance
without the explicit knowledge of these two
factors.
IEEE 2015
TTA-JN-
C1524
VEGAS Visual influEnce
GrAph Summarization
on Citation Networks
Visually analyzing citation networks poses
challenges to many fields of the data mining
research. How can we summarize a
large citation graph according to the user's
interest? In particular, how can we illustrate
the impact of a highly influential paper through
the summarization? Can we maintain the
sensory node-link graph structure while
revealing the flow-based influence patterns and
preserving a fine readability? The state-of-the-
art influence maximization algorithms can
detect the most influential node in
a citation network, but fail to summarize
a graph structure to account for its influence.
On the other hand, existing graph summarization methods fold large graphs into clustered views, but cannot reveal the hidden influence patterns underneath
the citation network. In this paper, we first formally define the Influence Graph Summarization (IGS) problem on citation networks. Second, we propose a
matrix decomposition based algorithm pipeline
to solve the IGS problem. Our method can not
only highlight the flow-
based influence patterns, but also easily extend
to support the rich attribute information. A
prototype system called VEGAS implementing
this pipeline is also developed. Third, we
present a theoretical analysis of our main algorithm, which is equivalent to kernel k-means clustering. It can be proved that the
matrix decomposition based algorithm can
approximate the objective of the proposed IGS
problem. Last, we conduct comprehensive
experiments with real-
world citation networks to compare the
proposed algorithm with
classical graph summarization methods.
Evaluation results demonstrate that our method
significantly outperforms the previous ones in
optimizing both the quantitative IGS objective
and the quality of the visual summarizations.
IEEE 2015
TTA-JN-
C1525
Privacy Protection for
Wireless Medical
Sensor Data
In recent
years, wireless sensor networks have
been widely used in healthcare
applications, such as hospital and home
patient
monitoring. Wireless medical sensor networks are more vulnerable to
eavesdropping, modification,
impersonation, and replaying attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the
patient data during transmission, but
cannot stop the inside attack where the
administrator of the patient database
reveals the sensitive patient data. In this
paper, we propose a practical approach
to prevent the inside attack by using
multiple data servers to store
patient data. The main contribution of
this paper is securely distributing the
patient data in multiple data servers and
employing the Paillier and ElGamal
cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy.
IEEE 2015
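The Paillier cryptosystem named above is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, which is what allows the data servers to aggregate readings without decrypting them. A toy-parameter sketch (deliberately insecure demo primes; real deployments use 2048-bit moduli and a vetted library):

import math
import random

p, q = 293, 433                       # demo primes only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # valid because g = n + 1 below

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: the product of ciphertexts decrypts to the sum.
c = encrypt(72) * encrypt(68) % n2    # e.g., two heart-rate readings
assert decrypt(c) == 140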
TTA-JN-
C1526
A Decentralized Cloud
Firewall Framework
with Resources
Provisioning Cost
Optimization
Cloud computing is becoming popular as the
next infrastructure of computing platform.
Despite the promising model and the hype surrounding it, security has become the major concern that makes people hesitate to transfer their applications to clouds. Concretely, the cloud platform is under numerous attacks. As a result, a firewall is clearly needed to protect the cloud from these attacks. However, setting up a
centralized firewall for a whole cloud data
center is infeasible from both performance and
financial aspects. In this paper, we propose
a decentralized cloud firewall framework for
individual cloud customers. We investigate
how to dynamically allocate resources to
optimize resources provisioning cost, while
satisfying QoS requirement specified by
individual customers simultaneously.
Moreover, we establish novel queuing-theory-based M/Geo/1 and M/Geo/m models for
quantitative system analysis, where the service
times follow a geometric distribution. By
employing Z-transform and embedded Markov
chain techniques, we obtain a closed-form
expression of mean packet response time.
Through extensive simulations and
experiments, we conclude that an M/Geo/1
model reflects the cloud firewall real system
much better than a traditional M/M/1 model.
Our numerical results also indicate that we are able to set up a cloud firewall at a cost affordable to cloud customers.
IEEE 2015
TTA-JN-
C1527
A Privacy-Aware
Authentication Scheme
for Distributed Mobile
Cloud Computing
Services
In modern societies, the number
of mobile users has dramatically risen in recent
years. In this paper, an
efficient authentication scheme for distributed
mobile cloud computing services is proposed.
The proposed scheme provides security and
convenience for mobile users to access
multiple mobile cloud
computing services from
multiple service providers using only a single
private key. The security strength of the
proposed scheme is based on bilinear pairing
cryptosystem and dynamic nonce generation.
In addition, the scheme supports
mutual authentication, key exchange, user
anonymity, and user untraceability. From a system implementation point of view,
verification tables are not required for the
trusted smart card generator
(SCG) service and cloud computing service providers when adopting the proposed scheme. Consequently, this scheme reduces the usage
of memory spaces on these
corresponding service providers. In
one mobile user authentication session, only
the targeted cloud service provider needs to
interact with the service requestor (user). The
trusted SCG serves as the secure key
distributor
for distributed cloud service providers
and mobile clients. In the proposed scheme,
the trusted SCG service is not involved in
individual user authentication process. With
this design,
our scheme reduces authentication processing
time required by communication and
computation between cloud service providers
and traditional trusted third party service.
Formal security proof and performance
analyses are conducted to show that
the scheme is both secure and efficient.
IEEE 2015
TTA-JN-
C1528
CPCDN Content
Delivery Powered by
Context and User
Intelligence
There is an unprecedented trend
that content providers (CPs) are building their
own content delivery networks (CDNs) to
provide a variety of content services to
their users. By exploiting powerful CP-level
information in content distribution, these CP-
built CDNs open up a whole new design space
and are changing
the content delivery landscape. In this paper,
we adopt a measurement-based approach to
understanding why, how, and how much CP-
level intelligences can help content delivery.
We first present a measurement study of the
CDN built by Tencent, one of the largest content providers in China. We
observe new characteristics and trends
in content delivery which pose great
challenges to the
conventional content delivery paradigm and
motivate the proposal of CPCDN, a
CDN powered by CP-aware information. We
then reveal the benefits obtained by exploiting
two indispensable CP-level intelligences,
namely context intelligence and user intelligen
ce, in content delivery. Inspired by the insights
learnt from the measurement studies, we
systematically explore the design space of CPCDN and present the novel architecture
and algorithms to address the
new content delivery challenges that have
arisen. Our results not only demonstrate the
potential of CPCDN in
pushing content delivery performance to the
next level, but also identify new research
problems calling for further investigation.
IEEE 2015
TTA-JN-
C1529
QoS Evaluation for Web
Service
Recommendation
Web service recommendation is one of the
most important fields of research in the area
of service computing. The two core problems
of Web service recommendation are the
prediction of unknown QoS property values
and the evaluation of overall QoS according to
user preferences. Aiming to address these two
problems and their current challenges, we
propose two efficient approaches to solve these
problems. First, unknown QoS property values
were predicted by modeling the high-dimensional QoS data as tensors and utilizing an important tensor operation, i.e., tensor composition, to predict these QoS values. Our method, which considers all QoS dimensions
integrally and uniformly, allows us to predict
multi-dimensional QoS values accurately and
easily. Second, the overall QoS was evaluated
by proposing an efficient user preference
learning method, which learns user preferences
based on users' ratings history data, allowing
us to obtain user preferences quantifiably and
accurately. By solving these two core
problems, it became possible to compute a
realistic value for the overall QoS. The
experimental results showed our proposed
methods to be more efficient than existing
methods.
IEEE 2015
TTA-JN-
C1530
Towards Information
Diffusion in Mobile
Social Networks
The emergence of mobile social networks opens opportunities for viral marketing. However,
before fully utilizing mobile social networks as
a platform for viral marketing, many
challenges have to be addressed. In this paper,
we address the problem of identifying a small
number of individuals through whom
the information can be diffused to
the network as soon as possible, referred to as
the diffusion minimization
problem. Diffusion minimization under the probabilistic diffusion model can be formulated as an asymmetric k-center problem, which is NP-hard; the best known approximation algorithm for the asymmetric k-center problem has an approximation ratio of log* n and time complexity O(n^5). Clearly, the performance and the time complexity of the approximation algorithm are not satisfactory in large-scale mobile social networks. To deal
with this problem, we propose a community
based algorithm and a distributed set-cover
algorithm. The performance of the proposed
algorithms is evaluated by extensive
experiments on both synthetic networks and a
real trace. The results show that the
community based algorithm has the best
performance in both synthetic networks and
the real trace compared to existing algorithms,
and the distributed set-cover algorithm
outperforms the approximation algorithm in the real trace in terms of diffusion time.
IEEE 2015
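The set-cover flavor of seed selection can be sketched with the classic greedy rule: repeatedly pick the node whose diffusion range covers the most still-uncovered nodes. The version below is centralized and purely illustrative; the paper's algorithm runs in a distributed manner:

def greedy_seed_selection(coverage):
    # coverage: node -> set of nodes reachable from it within the deadline.
    universe = set().union(*coverage.values())
    uncovered, seeds = set(universe), []
    while uncovered:
        best = max(coverage, key=lambda v: len(coverage[v] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            break                     # remaining nodes are unreachable
        seeds.append(best)
        uncovered -= gained
    return seeds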
TTA-JN-
C1531
Location-Sharing
Systems With
Enhanced Privacy in
Mobile Online Social
Networks
Location sharing is one of the critical
components
in mobile online social networks (mOSNs),
which has attracted much attention recently.
With the advent of mOSNs, more and more
users' location information will be collected by
the service providers in mOSN. However, the
users' privacy, including
location privacy and social network privacy,
cannot be guaranteed in the previous work
without the trust assumption on the service
providers. In this paper, aiming at
achieving enhanced privacy against the insider
attack launched by the service providers in
mOSNs, we introduce a new architecture with
multiple location servers for the first time and
propose a secure solution
supporting location sharing among friends and
strangers in location-based applications. In our
construction, the user's friend set in each
friend’s query submitted to the location servers
is divided into multiple subsets by the social
network server randomly. Each location server
can only get a subset of friends, instead of the whole friend set of the user as in previous work. In addition, for the first time, we propose a location-sharing construction which provides checkability of the search results returned from location servers in an efficient way. We also prove that the new construction
is secure under the stronger security model
with enhanced privacy. Finally, we provide
extensive experimental results to demonstrate
the efficiency of our proposed construction.
IEEE 2015
TTA-JN-
C1532
Mobile Based
Healthcare
Management Using
Artificial Intelligence
In this growing age of technology, it is necessary to have a proper healthcare management system which should not only be completely accurate but also portable, so that every person can carry it as a personalized healthcare system. The proposed healthcare management system will consist of mobile-based heart rate measurement, so that the data can be transferred and a diagnosis based on heart rate can be provided quickly at the click of a button. The system will
consist of video conferencing to connect
remotely with the Doctor. The Doc-Bot which
was developed earlier is now being transferred
to mobile platform and will be further
advanced for diagnosis of common diseases.
The system will also consist of Online Blood
Bank which will provide up-to-date details
about availability of blood in different
hospitals.
IEEE 2015
TTA-JN-
C1533
PSMPA Patient Self-
Controllable and Multi-
Level Privacy-
Preserving Cooperative
Authentication in
Distributed m-
Healthcare Cloud
Computing System
Distributed m-healthcare cloud computing systems significantly facilitate efficient patient treatment for medical consultation by sharing personal health information among healthcare providers. However, this brings about the challenge of keeping both the data confidentiality and patients' identity privacy simultaneously. Many existing
access control and
anonymous authentication schemes cannot be
straightforwardly exploited. To solve the
problem, in this paper, a novel authorized
accessible privacy model (AAPM) is
established. Patients can authorize physicians
by setting an access tree supporting flexible
threshold predicates. Then, based on it, by
devising a new technique of attribute-based
designated verifier signature, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system is
proposed. The directly authorized physicians,
the indirectly authorized physicians and the
unauthorized persons in medical consultation
can respectively decipher the personal health
information and/or verify patients' identities by
satisfying the access tree with their own
attribute sets. Finally, the formal security proof
and simulation results illustrate our scheme
can resist various kinds of attacks and far
outperforms the previous ones in terms of
computational, communication and storage
overhead.
IEEE 2015
TTA-JN-
C1534
Secure and Distributed
Data Discovery and
Dissemination in
Wireless Sensor
Networks
A data discovery and dissemination protocol
for wireless sensor networks (WSNs) is
responsible for updating configuration
parameters of, and distributing management
commands to, the sensor nodes. All
existing data discovery and dissemination protocols suffer from two drawbacks. First, they
are based on the centralized approach; only the
base station can distribute data items. Such an
approach is not suitable for emergent multi-
owner-multi-user WSNs. Second, those
protocols were not designed with security in
mind and hence adversaries can easily launch
attacks to harm the network. This paper
proposes the
first secure and distributed data discovery and
dissemination protocol named DiDrip. It
allows the network owners to authorize
multiple network users with different
privileges to simultaneously and directly
disseminate data items to the sensor nodes.
Moreover, as demonstrated by our theoretical
analysis, it addresses a number of possible
security vulnerabilities that we have identified.
Extensive security analysis shows that DiDrip is provably secure. We also implement DiDrip in
an experimental network of resource-
limited sensor nodes to show its high
efficiency in practice.
IEEE 2015
TTA-JN-
C1535
DDSGA A Data-Driven
Semi-Global Alignment
Approach for Detecting
Masquerade Attacks
A masquerade attacker impersonates a legal
user to utilize the user services and privileges.
The semi-global alignment algorithm (SGA) is
one of the most effective and efficient
techniques to detect these attacks, but it has not yet reached the accuracy and performance required by large-scale, multiuser systems. To improve both the effectiveness and the performance of this algorithm, we propose the Data-Driven Semi-Global Alignment (DDSGA) approach. From the security effectiveness viewpoint,
DDSGA improves the scoring systems by
adopting distinct alignment parameters for
each user. Furthermore, it tolerates small
mutations in user command sequences by
allowing small changes in the low-level
representation of the commands functionality.
It also adapts to changes in the user behavior
by updating the signature of a user according
to its current behavior. To optimize the
runtime overhead, DDSGA minimizes
the alignment overhead and parallelizes the
detection and the update. After describing
the DDSGA phases, we present the
experimental results that show that DDSGA
achieves a high hit ratio of 88.4 percent with a
low false positive rate of 1.7 percent. It
improves the hit ratio of the enhanced SGA by
about 21.9 percent and reduces Maxion-
Townsend cost by 22.5 percent.
Hence, DDSGA results in improving both the
hit ratio and false positive rates with an
acceptable computational overhead.
IEEE 2015
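The core of SGA-style detection is a dynamic-programming alignment of a session's command sequence against the user's signature sequence, with end gaps left free. Below is a simplified overlap-alignment scorer with illustrative, not DDSGA-tuned, parameters; a higher score means the session more closely resembles the signature:

def align_score(signature, session, match=2, mismatch=-1, gap=-1):
    # Row 0 and column 0 stay at zero, so leading gaps are free; taking the
    # maximum over the final row makes trailing gaps free as well.
    m = len(session)
    prev = [0] * (m + 1)
    for a in signature:
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a == session[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return max(prev)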
TTA-JN-
C1536
Revisiting Attribute-
Based Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) is a promising technique for fine-grained access control of encrypted data in cloud storage; however, the decryption involved in ABE is usually too expensive for resource-constrained front-end users, which greatly hinders its practical popularity. In order to reduce
the decryption overhead for a user to recover
the plaintext, Green et al. suggested
to outsource the majority of
the decryption work without revealing the actual data or private keys. To ensure that the third-party
service honestly computes
the outsourced work, Lai et al. provided a
requirement of verifiability to the
decryption of ABE, but their scheme doubled
the size of the underlying ABE ciphertext and
the computation costs. Roughly speaking, their
main idea is to use a
parallel encryption technique, while one of
the encryption components is used for the
verification purpose. Hence, the bandwidth and
the computation cost are doubled. In this
paper, we investigate the same problem. In
particular, we propose a more efficient and
generic construction of ABE
with verifiable outsourced decryption based on
an attribute-based key encapsulation
mechanism, a symmetric-
key encryption scheme and a commitment
scheme. Then, we prove the security and the
verification soundness of our constructed ABE
scheme in the standard model. Finally, we
instantiate our scheme with concrete building
blocks. Compared with Lai et al.'s scheme, our
scheme reduces the bandwidth and the
computation costs almost by half.
IEEE 2015
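The verifiability mechanism in such constructions can be pictured with a hash-based commitment: the encryptor commits to the KEM session key, and the user checks the key recovered from the cloud's transformed ciphertext against that commitment, catching a cheating cloud. This is a schematic stand-in; the paper's concrete commitment and KEM primitives differ:

import hashlib
import hmac
import os

def commit(key, nonce):
    # Hash commitment to the session key (illustrative primitive).
    return hashlib.sha256(nonce + key).digest()

def publish_commitment(session_key):
    # Run by the encryptor alongside producing the ABE ciphertext.
    nonce = os.urandom(16)
    return commit(session_key, nonce), nonce

def verify_outsourced_decryption(recovered_key, nonce, commitment):
    # Run by the user after the cloud returns the transformed ciphertext.
    return hmac.compare_digest(commit(recovered_key, nonce), commitment)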
TTA-JN-
C1537
A Strategy of
Clustering Modification
Directions in Spatial
Image Steganography
Most of the recently proposed
steganographic schemes are based on
minimizing an additive distortion
function defined as the sum of
embedding costs for individual pixels.
In such an approach, mutual embedding
impacts are often ignored. In this paper,
we present an approach that can exploit
the interactions among embedding
changes in order to reduce the risk of
detection by steganalysis. It employs a
novel strategy,
called clustering modification directions
(CMDs), based on the assumption that
when embedding modifications in
heavily textured regions are locally
heading toward the same direction, the
steganographic security might be
improved. To implement the strategy, a
cover image is decomposed into several subimages, in which message segments
are embedded with well-known
schemes using additive distortion
functions. The costs of pixels are
updated dynamically to take mutual
embedding impacts into account.
Specifically, when neighboring pixels
are changed toward a
positive/negative direction, the cost of
the considered pixel is biased toward
the same direction. Experimental results
show that our proposed CMD strategy,
incorporated into existing
steganographic schemes, can effectively
overcome the challenges posed by the
modern steganalyzers with high-
dimensional features.
IEEE 2015
TTA-JN-
C1538
An Access Control
Model for Online Social
Networks Using User-
to-User Relationships
Users and resources
in online social networks (OSNs) are
interconnected via various types of
relationships. In particular, user-to-
user relationships form the basis of the OSN
structure, and play a significant role in
specifying and enforcing access control.
Individual users and the OSN provider should
be enabled to specify which access can be
granted in terms of existing relationships. In
this paper, we propose a novel user-to-
user relationship-
based access control (UURAC) model for
OSN systems that utilizes regular expression
notation for such policy
specification. Access control policies on users
and resources are composed in terms of
requested action, multiple relationship types,
the starting point of the evaluation, and the
number of hops on the path. We present two
path checking algorithms to determine whether
the required relationship path between users
for a given access request exists. We validate
the feasibility of our approach by
implementing a prototype system and
evaluating the performance of these two
algorithms.
IEEE 2015
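Path checking of this kind can be pictured as searching for a path whose sequence of relationship-type labels matches a policy regular expression. A naive breadth-first variant is sketched below (it assumes single-character type labels and relies on the hop bound to stay tractable; the paper's two algorithms are more refined):

import re
from collections import deque

def path_exists(graph, start, target, pattern, max_hops):
    # graph: user -> list of (neighbor, relationship_type) edges;
    # pattern: e.g. r"f{1,3}" for friend paths of at most three hops.
    policy = re.compile(pattern)
    queue = deque([(start, "")])
    while queue:
        user, types = queue.popleft()
        if user == target and policy.fullmatch(types):
            return True
        if len(types) < max_hops:
            for nbr, rel in graph.get(user, []):
                queue.append((nbr, types + rel))
    return False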
TTA-JN-
C1539
An Authenticated Trust
and Reputation
Calculation and
Management System
for Cloud and Sensor
Networks Integration
Enabled by incorporating the powerful data storage and data processing abilities of cloud computing (CC) as well as the ubiquitous data-gathering capability of wireless sensor networks (WSNs), CC-WSN integration has received a lot of attention from both academia and industry. However,
authentication as well as trust and reputation calculation and management of cloud service providers (CSPs) and sensor network providers (SNPs) are two very critical and barely explored issues for this
new paradigm. To fill the gap, this paper
proposes a
novel authenticated trust and reputation calcula
tion and management (ATRCM) system for
CC-WSN integration. Considering the
authenticity of CSP and SNP, the attribute
requirement of cloud service user (CSU) and
CSP, the cost, trust, and reputation of the
service of CSP and SNP, the proposed
ATRCM system achieves the following three
functions: 1) authenticating CSP and SNP to
avoid malicious impersonation attacks; 2)
calculating and managing trust and reputation
regarding the service of CSP and SNP; and 3)
helping CSU choose desirable CSP and
assisting CSP in selecting appropriate SNP.
Detailed analysis and design as well as further
functionality evaluation results are presented to
demonstrate the effectiveness of ATRCM,
followed by a system security analysis.
IEEE 2015
TTA-JN-
C1540
An Efficient Privacy-
Preserving Ranked
Keyword Search
Method
Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preservation. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is
that the relationship between documents will
be normally concealed in the process of
encryption, which will lead to
significant search accuracy performance
degradation. Also the volume of data in data
centers has experienced a dramatic growth.
This will make it even more challenging to
design ciphertext search schemes that can
provide efficient and reliable online
information retrieval on large volume of
encrypted data. In this paper, a hierarchical
clustering method is proposed to support
more search semantics and also to meet the
demand for fast ciphertext search within a big-data environment. The proposed hierarchical
approach clusters the documents based on the
minimum relevance threshold, and then
partitions the resulting clusters into sub-
clusters until the constraint on the maximum
size of cluster is reached. In the search phase,
this approach can reach a linear computational
complexity against an exponential size
increase of document collection. In order to
verify the authenticity of search results, a
structure called minimum hash sub-tree is
designed in this paper. Experiments have been
conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase of documents in the dataset, the search time of the proposed method increases linearly whereas the search time of the traditional method increases exponentially.
Furthermore, the proposed method has an
advantage over the traditional method in
the rank privacy and relevance of retrieved
documents.
IEEE 2015
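As a rough illustration of the threshold-then-split idea, the sketch below recursively splits a set of document vectors until every cluster is both small enough and cohesive enough. The cosine cohesion test, the 2-means splitting step, and all parameter names are assumptions for illustration, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

def build_clusters(vectors, min_relevance, max_size):
    # Normalize rows so dot products are cosine similarities.
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    leaves, queue = [], [np.arange(len(vectors))]
    while queue:
        members = queue.pop()
        centroid = vectors[members].mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        cohesion = float((vectors[members] @ centroid).mean())
        if len(members) < 2 or (len(members) <= max_size
                                and cohesion >= min_relevance):
            leaves.append(members.tolist())
            continue
        labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors[members])
        left, right = members[labels == 0], members[labels == 1]
        if len(left) == 0 or len(right) == 0:   # degenerate split; stop here
            leaves.append(members.tolist())
        else:
            queue.extend([left, right])
    return leaves

Because every accepted cluster is bounded by max_size, a search only has to scan small clusters along one root-to-leaf path, which is consistent with the linear search-time behavior the abstract reports.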
TTA-JN-
C1541
An Internal Intrusion
Detection and
Protection System by
Using Data Mining and
Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and ask these coworkers to assist with shared tasks, thereby making the pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect since most intrusion detection systems and firewalls identify and isolate malicious behaviors launched from outside the system only. In addition, some studies have claimed that analyzing system calls (SCs) generated by commands can identify these commands, making it possible to accurately detect attacks, with attack patterns as the features of an attack. Therefore, in this paper, a security system, named the Internal Intrusion Detection and Protection System (IIDPS), is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of users' usage habits as their forensic features and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29%, whereas the response time is less than 0.45 s, implying that it can protect a system against insider attacks effectively and efficiently.
IEEE 2015
TTA-JN-
C1542
Cloud-Assisted Safety
Message Dissemination
in VANET–Cellular
Heterogeneous
Wireless Network
In vehicular ad hoc networks (VANETs),
efficient message dissemination is critical to
road safety and traffic efficiency. Since many
VANET-based schemes suffer from high
transmission delay and data redundancy, the
integrated VANET–
cellular heterogeneous network has been
proposed recently and attracted significant
attention. However, most existing studies focus
on selecting suitable gateways to deliver safety messages from the source vehicle
to a remote server, whereas
rapid safety message dissemination from the
remote server to a targeted area has not been
well studied. In this paper, we propose a
framework for
rapid message dissemination that combines the
advantages of diverse communication
and cloud computing technologies.
Specifically, we propose a novel Cloud-
assisted
Message Downlink dissemination Scheme
(CMDS), with which the safety messages in
the cloud server are first delivered to the
suitable mobile gateways on relevant roads
with the help of cloud computing (where
gateways are buses with both cellular and
VANET interfaces), and then disseminated among neighboring vehicles via vehicle-to-vehicle (V2V) communication. To evaluate the proposed scheme, we mathematically analyze its performance and conduct extensive simulation experiments. Numerical results confirm the efficiency of CMDS in various urban scenarios.
IEEE 2015
TTA-JN-
C1543
Collaborative Task
Execution in Mobile
Cloud Computing
Under a Stochastic
Wireless Channel
This paper investigates collaborative task execution between a mobile device and a cloud clone for mobile applications under a stochastic wireless channel.
A mobile application is modeled as a sequence
of tasks that can be executed on
the mobile device or on the cloud clone. We
aim to minimize the energy consumption on
the mobile device while meeting a time
deadline, by strategically offloading tasks to
the cloud. We formulate
the collaborative task execution as a
constrained shortest path problem. We derive a
one-climb policy by characterizing the optimal
solution and then propose an enumeration
algorithm for
the collaborative task execution in polynomial
time. Further, we apply the LARAC algorithm
to solving the optimization problem
approximately, which has lower complexity
than the enumeration algorithm. Simulation
results show that the approximate solution of
the LARAC algorithm is close to the optimal
solution of the enumeration algorithm. In
addition, we consider a probabilistic time deadline, which is transformed into a hard deadline via the Markov inequality. Moreover,
compared to the local execution and the
remote execution,
the collaborative task execution can
significantly save the energy consumption on
the mobile device, prolonging its battery life.
IEEE 2015
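The probabilistic-deadline transformation admits a one-line worked derivation. Assuming the completion time T has finite mean, the deadline is t_d, and the allowed violation probability is rho (illustrative symbols, not the paper's notation), the Markov inequality yields a sufficient hard constraint:

\[
\Pr(T > t_d) \;\le\; \frac{\mathbb{E}[T]}{t_d},
\qquad\text{so}\qquad
\mathbb{E}[T] \le \rho\, t_d \;\Longrightarrow\; \Pr(T > t_d) \le \rho .
\]

Enforcing the deterministic constraint on the mean therefore guarantees the probabilistic deadline, at the cost of some conservatism.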
TTA-JN-
C1544
Contact-Aware Data
Replication in Roadside
Unit Aided Vehicular
Delay Tolerant
Networks
Roadside units (RSUs), which enable vehicle-to-infrastructure communications, are deployed along roadsides to handle the ever-growing communication demands caused by the explosive increase of vehicular traffic. How to efficiently utilize them to enhance vehicular delay tolerant network (VDTN) performance is an important problem in designing RSU-aided VDTNs. In this work,
we implement an extensive experiment
involving tens of thousands of operational
vehicles in Beijing city. Based on this newly
collected Beijing trace and the existing
Shanghai trace, we obtain some invariant
properties for communication contacts of large
scale RSU-aided VDTNs. Specifically, we find
that the contact time between RSUs and
vehicles obeys an exponential distribution,
while the contact rate between them follows a
Poisson distribution. According to these
observations, we investigate the problem of
communication contact-
aware mobile data replication for RSU-
aided VDTNs by considering the mobile
data dissemination system that
transmits data from the Internet to vehicles via
RSUs through opportunistic communications.
In particular, we formulate the communication contact-aware RSU-aided vehicular mobile data dissemination
problem as an optimization problem with
realistic VDTN settings, and we provide an
efficient heuristic solution for this NP-hard
problem. By carrying out extensive simulation
using realistic vehicular traces, we demonstrate
the effectiveness of our proposed heuristic
contact-aware data replication scheme, in
comparison with the optimal solution and other
existing schemes.
IEEE 2015
TTA-JN-
C1545
Cost-Aware SEcure
Routing (CASER)
Protocol Design for
Wireless Sensor
Networks
Lifetime optimization and security are two
conflicting design issues for multi-
hop wireless sensor networks (WSNs) with
non-replenishable energy resources. In this
paper, we first propose a novel secure and
efficient Cost-
Aware Secure Routing (CASER) protocol to
address these two conflicting issues through
two adjustable parameters: energy balance
control (EBC) and probabilistic-based random
walking. We then discover that the energy
consumption is severely disproportional to the
uniform energy deployment for the
given network topology, which greatly reduces the lifetime of the sensor networks. To solve
this problem, we propose an efficient non-
uniform energy deployment strategy to
optimize the lifetime and message delivery
ratio under the same energy resource and
security requirement. We also provide a
quantitative security analysis on the
proposed routing protocol. Our theoretical
analysis and OPNET simulation results
demonstrate that the
proposed CASER protocol can provide an
excellent tradeoff between routing efficiency
and energy balance, and can significantly
extend the lifetime of the sensor networks in
all scenarios. For the non-uniform energy
deployment, our analysis shows that we can
increase the lifetime and the total number of
messages that can be delivered by more than
four times under the same assumption. We also
demonstrate that the proposed
CASER protocol can achieve a high message
delivery ratio while preventing routing traceback attacks.
IEEE 2015
TTA-JN-
C1546
Deleting Secret Data
with Public Verifiability
Existing software-based data erasure programs
can be summarized as following the same one-
bit-return protocol: the deletion program
performs data erasure and returns either
success or failure. However, such a one-bit-return protocol turns the data deletion system
into a black box – the user has to trust the
outcome but cannot easily verify it. This is
especially problematic when the deletion
program is encapsulated within a Trusted
Platform Module (TPM), and the user has no
access to the code inside. In this paper, we
present a cryptographic solution that aims to
make the data deletion process more
transparent and verifiable. In contrast to the
conventional black/white assumptions about
TPM (i.e., either completely trust or distrust),
we introduce a third assumption that sits in
between: namely, “trust-but-verify”. Our
solution enables a user to verify the correct
implementation of two important operations inside a TPM without accessing its source code: i.e., the correct encryption of data and
the faithful deletion of the key. Finally, we
present a proof-of-concept implementation of
the SSE system on a resource-constrained Java
card to demonstrate its practical feasibility. To
our knowledge, this is the first systematic
solution to the secure data deletion problem
based on a “trust-but-verify” paradigm,
together with a concrete prototype
implementation.
IEEE 2015
TTA-JN-
C1547
Design and Evaluation
of the Optimal Cache
Allocation for Content-
Centric Networking
Content-Centric Networking (CCN) is a
promising framework to rebuild the Internet’s
forwarding substrate around the concept
of content. CCN advocates ubiquitous in-
network caching to enhance content delivery
and thus each router has storage space
to cache frequently requested content. In this
work, we focus on
the cache allocation problem, namely, how to
distribute the cache capacity across routers
under a constrained total storage budget for
the network. We first formulate this problem
as a content placement problem and obtain
the optimal solution by a two-step method. We
then propose a suboptimal heuristic method
based on node centrality, which is more
practical in dynamic networks with
frequent content publishing. We investigate
through simulations the factors that affect
the optimal cache allocation, and perhaps more
importantly we use a real-life Internet topology
and video access logs from a large scale
Internet video provider to evaluate the
performance of various cache allocation
methods. We observe that network topology
and content popularity are two important
factors that affect where exactly cache capacity should be placed. Further, the
heuristic method comes with only a very
limited performance penalty compared to
the optimal allocation. Finally, using our
findings, we provide recommendations
for network operators on the best deployment of CCN cache capacity over routers.
IEEE 2015
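The node-centrality heuristic lends itself to a short sketch: split a total cache budget across routers in proportion to a centrality score. Degree centrality, the budget model, and the example topology below are assumptions; the paper does not commit to this exact recipe.

import networkx as nx

def allocate_cache(graph, total_budget):
    # Give each router a share of the budget proportional to its
    # degree centrality, so well-connected routers get larger caches.
    centrality = nx.degree_centrality(graph)
    total = sum(centrality.values())
    return {node: total_budget * c / total for node, c in centrality.items()}

g = nx.barabasi_albert_graph(20, 2)          # toy scale-free topology
alloc = allocate_cache(g, total_budget=1000)
print(sorted(alloc.items(), key=lambda kv: -kv[1])[:3])  # hubs get the most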
TTA-JN-
C1548
Designing High
Performance Web-
Based Computing
Services to Promote
Telemedicine Database
Management System
Many web computing systems are running real-time database services whose information changes continuously and expands incrementally. In this
context, web data services have a major role
and draw significant improvements in
monitoring and controlling the information
truthfulness and data propagation.
Currently, web telemedicine database services
are of central
importance to distributed systems. However, the increasing complexity and the rapid growth of challenging real-world healthcare applications make the work of the database administrative staff difficult. In this paper, we build integrated web data services that achieve fast response times for large-scale Tele-health database management systems. Our
focus will be on database management with
application scenarios in
dynamic telemedicine systems to increase care
admissions and decrease care difficulties such
as distance, travel, and time limitations. We
propose a three-fold approach based on data
fragmentation, database websites clustering
and intelligent data distribution. This approach
reduces the amount of data migrated between
websites during applications' execution;
achieves cost-effective communications during
applications' processing and improves
applications' response time and throughput.
The proposed approach is validated internally
by measuring the impact of using
our computing services' techniques on
various performance features like
communications cost, response time, and
throughput. The external validation is achieved
by comparing the performance of our
approach to that of other techniques in the
literature. The results show that our integrated
approach significantly improves the
performance of web database systems and
outperforms its counterparts.
IEEE 2015
TTA-JN-
C1549
Distributed Database
Management
Techniques for Wireless
Sensor Networks
In sensor networks, the large amount of data
generated by sensors greatly influences the
lifetime of the network. To manage this
amount of sensed data in an energy-efficient
way, new methods of storage and data query
are needed. To this end, the distributed database approach for sensor networks has proved to be one of the
most energy-efficient data storage and
query techniques. This paper surveys the state
of the art of the techniques used to manage
data and queries
in wireless sensor networks based on
the distributed paradigm. A classification of
these techniques is also proposed. The goal of
this work is not only to present how data and
query management techniques have advanced
nowadays, but also to show their benefits and
drawbacks, and to identify open issues
providing guidelines for further contributions
in this type of distributed architectures.
IEEE 2015
TTA-JN-
C1550
Diversifying Web
Service
Recommendation
Results via Exploring
Service Usage History
The last decade has witnessed a tremendous
growth of Web services as a major technology
for sharing data, computing resources, and
programs on the Web. With the increasing
adoption and presence of Web services, design
of novel approaches for
effective Web service recommendation to
satisfy users’ potential requirements has
become of paramount importance.
Existing Web service
recommendation approaches mainly focus on
predicting missing QoS values of Web service
candidates which are interesting to a user using
collaborative filtering approach, content-based
approach, or their hybrid.
These recommendation approaches assume
that recommended Web services are independent of each other, which sometimes
may not be true. As a result, many similar or
redundant Web services may exist in
a recommendation list. In this paper, we propose a novel Web
service recommendation approach
incorporating a user’s potential QoS
preferences and diversity feature of user
interests on Web services. User’s interests and
QoS preferences on Web services are first
mined
by exploring the Web service usage history.
Then we compute scores of Web service
candidates by measuring their relevance with
historical and potential user interests, and their
QoS utility. We also construct
a Web service graph based on the functional
similarity between Web services. Finally, we
present an innovative diversity-
aware Web service ranking algorithm to rank
the Web service candidates based on their
scores, and diversity degrees derived from
the Web service graph. Extensive experiments
are conducted based on a real
world Web service dataset, indicating that our
proposed Web service recommendation approach significantly improves the quality of the recommendation results compared with existing methods.
IEEE 2015
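A standard way to operationalize this kind of diversity-aware ranking is maximal marginal relevance (MMR); the sketch below uses MMR as a stand-in for the paper's own ranking algorithm, trading each candidate's relevance score against its similarity to services already selected. The score and similarity inputs and the lambda weight are illustrative.

def diversify(scores, similarity, k, lam=0.7):
    """scores: dict service -> relevance; similarity(a, b) in [0, 1]."""
    selected, candidates = [], set(scores)
    while candidates and len(selected) < k:
        def mmr(s):
            # Penalize candidates that duplicate an already-chosen service.
            redundancy = max((similarity(s, t) for t in selected), default=0.0)
            return lam * scores[s] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected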
TTA-JN-
C1551
DROPS Division and
Replication of Data in
Cloud for Optimal
Performance and
Security
Outsourcing data to a third-party
administrative control, as is done
in cloud computing, gives rise to
security concerns. The data compromise may
occur due to attacks by other users and nodes
within the cloud. Therefore,
high security measures are required to
protect data within the cloud. However, the
employed security strategy must also take into
account the optimization of the data retrieval
time. In this paper, we
propose Division and Replication of Data in
the Cloud for Optimal Performance and Securi
ty (DROPS) that collectively approaches
the security and performance issues. In
the DROPS methodology, we divide a file into
fragments, and replicate the
fragmented data over the cloud nodes. Each of
the nodes stores only a single fragment of a
particular data file, which ensures that even in case of a successful attack, no meaningful information is revealed to the attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring to prohibit an attacker from guessing the locations of the fragments.
Furthermore, the DROPS methodology does
not rely on the traditional cryptographic
techniques for the data security; thereby
relieving the system of computationally
expensive methodologies. We show that the
probability to locate and compromise all of the
nodes storing the fragments of a single file is
extremely low. We also compare
the performance of the DROPS methodology
with ten other schemes. A higher level of security with only a slight performance overhead was observed.
IEEE 2015
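The fragment-and-separate idea can be sketched as follows: split a file into fragments and greedily pick storage nodes that are pairwise at least min_sep hops apart. This greedy hop-distance rule is an assumed simplification of the paper's graph T-coloring placement, and the example topology is invented.

import networkx as nx

def place_fragments(data, graph, n_fragments, min_sep):
    size = -(-len(data) // n_fragments)          # ceiling division
    fragments = [data[i:i + size] for i in range(0, len(data), size)]
    dist = dict(nx.all_pairs_shortest_path_length(graph))
    placement, used = {}, []
    for frag in fragments:
        # Raises StopIteration if the topology cannot honor the separation.
        node = next(n for n in graph.nodes
                    if all(dist[n][u] >= min_sep for u in used))
        placement[node] = frag
        used.append(node)
    return placement

g = nx.cycle_graph(10)
print(place_fragments(b"supersecretpayload", g, n_fragments=3, min_sep=3))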
TTA-JN-
C1552
Dynamic Bin Packing
for On-Demand Cloud
Resource Allocation
Dynamic Bin Packing (DBP) is a variant of
classical bin packing, which assumes that
items may arrive and depart at arbitrary times.
Existing works on DBP generally aim to
minimize the maximum number of bins ever
used in the packing. In this paper, we consider
a new version of the DBP problem, namely,
the MinTotal DBP problem which targets at
minimizing the total cost of the bins used over
time. It is motivated by the request dispatching
problem arising from cloud gaming systems.
We analyze the competitive ratios of the
modified versions of the commonly used First
Fit, Best Fit, and Any Fit packing (the family of packing algorithms that open a new bin only when no currently open bin can accommodate the item to be packed) algorithms for the MinTotal DBP problem. We show that the competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (β > 1 is a constant), the competitive ratio has an upper bound of (β/(β - 1))μ + 3β/(β - 1) + 1. For the general case, the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First Fit packing algorithm that can achieve a competitive ratio no larger than (5/4)μ + 19/4 when μ is not known and can achieve a competitive ratio no larger than μ + 5 when μ is known.
IEEE 2015
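First Fit in the dynamic setting fits in a few lines: items arrive and depart over time, and a new bin opens only when no currently open bin can accommodate the arriving item. The MinTotal cost accounting and the Hybrid variant are not modeled in this sketch.

class Bin:
    def __init__(self, capacity=1.0):
        self.capacity, self.items = capacity, {}
    def free(self):
        return self.capacity - sum(self.items.values())

def first_fit_insert(bins, item_id, size):
    for b in bins:                    # scan bins in opening order
        if b.free() >= size:
            b.items[item_id] = size
            return b
    b = Bin()                         # no open bin fits: open a new one
    b.items[item_id] = size
    bins.append(b)
    return b

def depart(bins, item_id):
    for b in bins:
        b.items.pop(item_id, None)

bins = []
first_fit_insert(bins, "a", 0.6)
first_fit_insert(bins, "b", 0.5)      # does not fit; opens a second bin
depart(bins, "a")
first_fit_insert(bins, "c", 0.4)      # reuses the freed space in bin one
print(len(bins))                      # 2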
TTA-JN-
C1553
Location-Aware and
Personalized
Collaborative Filtering
for Web Service
Recommendation
Collaborative Filtering (CF) is widely
employed for
making Web service recommendation. CF-
based Web service recommendation aims to
predict missing QoS (Quality-of-Service) values of Web services. Although several CF-
based Web service QoS prediction methods
have been proposed in recent years, the
performance still needs significant
improvement. Firstly, existing QoS prediction
methods seldom
consider personalized influence of users
and services when measuring the similarity
between users and between services.
Secondly, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users. However, existing Web service QoS prediction methods have seldom taken this observation into consideration. In this paper, we propose
a location-aware personalized CF method
for Web service recommendation. The
proposed method leverages both locations of
users and Web services when selecting similar
neighbors for the target user or service. The
method also includes an enhanced similarity
measurement for users and Web services, by
taking into account the personalized influence
of them. To evaluate the performance of our
proposed method, we conduct a set of
comprehensive experiments using a real-world Web service dataset. The experimental
results indicate that our approach improves the
QoS prediction accuracy and computational
efficiency significantly, compared to previous
CF-based methods.
IEEE 2015
TTA-JN-
C1554
Location-Based Key
Management Strong
Against Insider Threats
in Wireless Sensor
Networks
To achieve secure communications in wireless sensor networks (WSNs), sensor nodes (SNs) must establish secret shared keys with neighboring nodes.
Moreover, those keys must be updated by
defeating the insider threats of corrupted
nodes. In this paper, we propose a location-
based key management scheme for WSNs,
with special considerations of insider threats.
After reviewing existing location-
based key management schemes and studying
their advantages and disadvantages, we
selected location-
dependent key management (LDK) as a
suitable scheme for our study. To solve a
communication interference problem in LDK
and similar methods, we have devised a
new key revision process that incorporates
grid-based location information. We also
propose a key establishment process using grid
information. Furthermore, we
construct key update and revocation processes
to effectively resist inside attackers. For
analysis, we conducted a rigorous simulation
and confirmed that our method can increase
connectivity while decreasing the compromise
ratio when the minimum number of
common keys required for key establishment is
high. When there was a corrupted node
leveraging insider threats, it was also possible
to effectively rekey every SN except for the
corrupted node using our method. Finally, the
hexagonal deployment of anchor nodes could
reduce network costs.
IEEE 2015
TTA-JN-
C1555
Malware Propagation in
Large-Scale Networks
Malware is pervasive in networks, and poses a
critical threat to network security. However,
we have very limited understanding
of malware behavior in networks to date. In
this paper, we investigate how
malware propagates in networks from a global
perspective. We formulate the problem, and
establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution, a power law distribution with a short exponential tail, and a power law distribution at its early, late, and final stages,
respectively. Extensive experiments have been
performed through two real-world
global scale malware data sets, and the results
confirm our theoretical findings.
IEEE 2015
TTA-JN-
C1556
Optimal Cloudlet
Placement and User to
Cloudlet Allocation in
Wireless Metropolitan
Area Networks
Mobile applications are becoming increasingly
computation-intensive, while the computing
capability of portable mobile devices is
limited. A powerful way to reduce the
completion time of an application in a mobile
device is to offload its tasks to nearby
cloudlets, which consist of clusters of
computers. Although there is a significant
body of research in mobile cloudlet offloading
technology, there has been very little attention
paid to how cloudlets should be placed in a
given network to optimize mobile application
performance. In this paper, we
study cloudlet placement and
mobile user allocation to the cloudlets in
a wireless metropolitan area network (WMAN)
. We devise an algorithm for the problem,
which enables the placement of the cloudlets
at user dense regions of the WMAN, and
assigns mobile users to the placed cloudlets
while balancing their workload. We also
conduct experiments through simulation. The
simulation results indicate that the
performance of the proposed algorithm is very
promising.
IEEE 2015
TTA-JN-
C1557
Predistribution Scheme
for Establishing Group
Keys in Wireless
Sensor Networks
Establishing group keys is challenging in wireless sensor networks (WSNs) because sensor nodes are limited in memory storage and computational power. In 1992,
Blundo et al. proposed a non-interactive group key establishment scheme
using a multivariate polynomial.
Their scheme can establish a group key of
m sensors. Since each share is a polynomial involving m - 1 variables and having degree k, each sensor needs to store (k + 1)^(m-1) coefficients from GF(p), which is exponentially proportional to the size of the group. This makes their scheme practical only when m = 2, for peer-to-peer communication. So far,
most existing predistribution schemes in
WSNs establish pairwise keys for sensor nodes. In this paper, we propose a novel predistribution scheme for establishing group keys in WSNs. Our design uses a special type of multivariate polynomial in Z_N, where N is an RSA modulus. The advantage of using this type of multivariate polynomial is that it limits the storage space of each sensor to m(k + 1), which is linearly proportional to the size of the group communication. In addition, we prove
the security of the proposed scheme and show that it is computationally efficient.
IEEE 2015
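For the m = 2 case that the abstract identifies as the practical regime of Blundo et al.'s construction, the mechanics fit in a few lines: each node stores a univariate share of a symmetric bivariate polynomial, and any two nodes derive the same pairwise key. The prime and coefficients below are toy values, and the paper's own construction over Z_N with an RSA modulus is different.

P = 7919  # small prime, for illustration only

# f(x, y) = 3 + 5(x + y) + 2xy (mod P) is symmetric: f(x, y) = f(y, x).
def f(x, y):
    return (3 + 5 * (x + y) + 2 * x * y) % P

def share(i):
    # Node i stores f(i, y) as two coefficients: constant and y-term.
    return ((3 + 5 * i) % P, (5 + 2 * i) % P)

def key(my_share, other_id):
    c0, c1 = my_share
    return (c0 + c1 * other_id) % P

alice, bob = 17, 42
assert key(share(alice), bob) == key(share(bob), alice) == f(alice, bob)

For general m, the share becomes a polynomial in m - 1 variables, which is exactly where the (k + 1)^(m-1) storage blow-up noted above comes from.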
TTA-JN-
C1558
Privacy-Preserving
Detection of Sensitive
Data Exposure
Statistics from security firms, research
institutions and government organizations
show that the number of data-leak instances has grown rapidly in recent years. Among various data-leak cases, human mistakes are one of the main causes of data loss. Solutions exist that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common
approach is to screen content in storage and
transmission for exposed sensitive information.
Such an approach usually requires
the detection operation to be conducted in
secrecy. However, this secrecy requirement is
challenging to satisfy in practice,
as detection servers may be compromised or
outsourced. In this paper, we present a privacy-
preserving data-leak detection (DLD) solution
to solve the issue where a special set
of sensitive data digests is used in detection.
The advantage of our method is that it enables
the data owner to safely delegate
the detection operation to a semi-honest
provider without revealing the sensitive data to
the provider. We describe how Internet service
providers can offer their customers DLD as an
add-on service with strong privacy guarantees. The evaluation results show that our method can support accurate detection with a very small number of false alarms under various data-leak scenarios.
IEEE 2015
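The digest idea can be sketched with shingle hashing: the data owner releases only hashes of short substrings of the sensitive data, and the detection provider screens content for matching hashes without holding the sensitive plaintext. Shingle length, the scoring rule, and the example strings are assumptions; the paper's construction adds privacy machinery not shown here.

import hashlib

def shingle_digests(text, n=8):
    return {hashlib.sha256(text[i:i + n].encode()).hexdigest()
            for i in range(max(len(text) - n + 1, 1))}

def leak_score(content, sensitive_digests, n=8):
    # Fraction of sensitive shingles that reappear in the content.
    seen = shingle_digests(content, n)
    return len(seen & sensitive_digests) / max(len(sensitive_digests), 1)

digests = shingle_digests("ssn=123-45-6789")
print(leak_score("mail body ... ssn=123-45-6789 ...", digests))  # 1.0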
TTA-JN-
C1559
Providing Privacy-
Aware Incentives in
Mobile Sensing
Systems
Mobile sensing relies on data contributed by
users through their mobile device (e.g., smart
phone) to obtain useful information about
people and their surroundings. However, users
may not want to contribute due to a lack of incentives and concerns about possible privacy leakage. To effectively
promote user participation,
both incentive and privacy issues should be
addressed. Although incentive and
privacy have been addressed separately
in mobile sensing, it is still an open problem to
address them simultaneously. In this paper, we
propose two credit-based privacy-
aware incentive schemes for
mobile sensing systems, where the focus is
on privacy protection instead of on the design
of incentive mechanisms. Our schemes
enable mobile users to earn credits by
contributing data without leaking which data
they have contributed, and ensure that
malicious users cannot abuse the system to
earn unlimited credits. Specifically, the first
scheme considers scenarios where an online
trusted third party (TTP) is available, and
relies on the TTP to protect user privacy and
prevent abuse attacks. The second scheme
considers scenarios where no online TTP is
available. It applies blind signature, partially
blind signature, and a novel extended Merkle
tree technique to protect user privacy and
prevent abuse attacks. Security analysis and
cost evaluations show that our schemes are
secure and efficient.
IEEE 2015
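Since the second scheme builds on a Merkle tree, a plain Merkle commitment sketch shows the flavor: a batch of contributions is committed by a single root, and membership of one leaf can be proven without revealing the others. The paper's extended tree adds machinery not reproduced here.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])               # duplicate odd node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, leaf_is_left in proof:
        node = h(node + sibling) if leaf_is_left else h(sibling + node)
    return node == root

data = [b"reading-1", b"reading-2", b"reading-3", b"reading-4"]
assert verify(data[2], merkle_proof(data, 2), merkle_root(data))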
TTA-JN-
C1560
Response Time Based
Optimal Web Service
Selection
Selecting an optimal web service among a list
of functionally equivalent web services still
remains a challenging issue. For
Internet services, the presence of low-
performance servers, high latency or overall
poor service quality can translate into lost sales, user frustration, and lost customers. In
this paper, we propose a novel method for QoS
metrification based on Hidden Markov Models
(HMM), which further suggests
an optimal path for the execution of user
requests. The technique we show can be used
to measure and predict the behavior
of Web Services in terms of response time, and
can thus be used to rank services quantitatively
rather than just qualitatively. We demonstrate
the feasibility and usefulness of our
methodology by conducting experiments on real-world data. The results have shown how our
proposed method can help the user to
automatically select the most reliable Web Service, taking into account several metrics, among them system predictability and response time variability. An ROC curve analysis further shows a 12 percent improvement in prediction accuracy using HMM.
IEEE 2015
TTA-JN-
C1561
Robust cloud
management of MANET
checkpoint sessions
In a traditional mobile ad-hoc network
(MANET), if two nodes are engaged in
a session and one of them departs suddenly,
their communication is aborted. The session is
not active any more, work is lost and,
consequently, the energy of the batteries has
been wasted. This paper proposes a model that
uses a cloud service to register, save, pause
and
resume sessions between MANET member
nodes so that both work in progress and energy
are saved. A checkpoint technique is
introduced to capture the progress of
a session and allow it to be resumed. This is an
additional service to our cloud management of
the MANET. The model proposed in this paper
was tested on Android-based devices and an
Amazon cloud instance. Experimental results
show that the model is feasible, robust, saves
time and, more importantly, energy
if session breaks occur frequently.
IEEE 2015
TTA-JN-
C1562
Secure Anonymous Key
Distribution Scheme for
Smart Grid
To fully support information management
among various stakeholders
in smart grid domains, how to
establish secure communication sessions has
become an important issue
for smart grid environments. In order to
support secure communications
between smart meters and service
providers, key management for authentication
becomes a crucial security topic. Recently,
several key distribution schemes have been
proposed to provide secure communications
for smart grid. However, these schemes do not
support smart meter anonymity and possess
security weaknesses. This paper utilizes an
identity-based signature scheme and an
identity-based encryption scheme to propose a new anonymous key distribution scheme for smart grid environments. In the proposed scheme,
a smart meter can anonymously access
services provided by service providers using
one private key without the help of the trusted
anchor during authentication. In addition, the
proposed scheme requires only a few computation operations at the smart meter side.
Security analysis is conducted to prove the
proposed scheme is secure under the random oracle model.
IEEE 2015
TTA-JN-
C1563
Secure Data
Aggregation Technique
for Wireless Sensor
Networks in the
Presence of Collusion
Attacks
Due to limited computational power and
energy resources, aggregation of data from
multiple sensor nodes done at the aggregating
node is usually accomplished by simple
methods such as averaging. However
such aggregation is known to be highly
vulnerable to node compromising attacks.
Since WSNs are usually unattended and without
tamper resistant hardware, they are highly
susceptible to such attacks. Thus, ascertaining
trustworthiness of data and reputation
of sensor nodes is crucial for WSN. As the
performance of very low power processors
dramatically improves, future aggregator nodes
will be capable of performing more
sophisticated data aggregation algorithms, thus
making WSNs less vulnerable. Iterative filtering algorithms hold great promise for
such a purpose. Such algorithms
simultaneously aggregate data from multiple
sources and provide trust assessment of these
sources, usually in a form of corresponding
weight factors assigned to data provided by
each source. In this paper we demonstrate that
several existing iterative filtering algorithms,
while significantly more robust
against collusion attacks than the simple
averaging methods, are nevertheless susceptible
to a novel sophisticated collusion attack we
introduce. To address this security issue, we
propose an improvement for iterative
filtering techniques by providing an initial
approximation for such algorithms which
makes them not only collusion robust, but also
more accurate and faster converging.
IEEE 2015
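The iterative filtering loop itself is easy to sketch: alternate between a weighted-mean estimate and re-weighting each sensor inversely to its distance from that estimate. The reciprocal weight function is one common choice, and the paper's contribution, a robust initial approximation, is deliberately not included.

import numpy as np

def iterative_filtering(readings, iters=20, eps=1e-6):
    """readings: array of shape (n_sensors, n_measurements)."""
    weights = np.ones(len(readings))
    for _ in range(iters):
        estimate = weights @ readings / weights.sum()     # weighted mean
        distances = ((readings - estimate) ** 2).mean(axis=1)
        weights = 1.0 / (distances + eps)                 # distrust outliers
    return estimate, weights

honest = np.random.normal(25.0, 0.5, size=(8, 10))
colluders = np.full((2, 10), 35.0)            # coordinated, skewed reports
estimate, weights = iterative_filtering(np.vstack([honest, colluders]))
print(estimate.mean().round(2), weights.round(3))  # colluders get tiny weights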
TTA-JN-
C1564
Secure Distributed
Deduplication Systems
with Improved
Reliability
Data deduplication is a technique for
eliminating duplicate copies of data, and has
been widely used in cloud storage to reduce
storage space and upload bandwidth. However,
there is only one copy of each file stored in the cloud even if such a file is owned by a huge
number of users. As a result, deduplication
system improves storage utilization while
reducing reliability. Furthermore, the challenge
of privacy for sensitive data also arises when they are outsourced by users to the cloud. Aiming
to address the above security challenges, this
paper makes the first attempt to formalize the
notion of distributed reliable
deduplication system. We propose
new distributed deduplication systems with
higher reliability in which the data chunks
are distributed across multiple cloud servers.
The security requirements of data
confidentiality and tag consistency are also
achieved by introducing a deterministic secret
sharing scheme in distributed storage systems,
instead of using convergent encryption as in
previous deduplication systems. Security
analysis demonstrates that
our deduplication systems are secure in terms of the definitions specified in the proposed
security model. As a proof of concept, we
implement the proposed systems and
demonstrate that the incurred overhead is very
limited in realistic environments.
IEEE 2015
TTA-JN-
C1565
TEES An Efficient
Search Scheme over
Encrypted Data on
Mobile Cloud
Cloud storage provides a convenient, massive,
and scalable storage at low cost,
but data privacy is a major concern that
prevents users from storing files on
the cloud trustingly. One way of enhancing
privacy from data owner point of view is
to encrypt the files before outsourcing them
onto the cloud and decrypt the files after
downloading them. However, data encryption
is a heavy overhead for the mobile devices,
and data retrieval process incurs a complicated
communication between the data user and
cloud. Normally with limited bandwidth
capacity and limited battery life, these issues
introduce heavy overhead to computing and
communication as well as a higher power
consumption for mobile device users, which
makes
the encrypted search over mobile cloud very
challenging. In this paper, we
propose TEES (Traffic and Energy
saving Encrypted Search), a bandwidth and
energy efficient encrypted search architecture
over mobile cloud. The proposed architecture
offloads the computation from mobile devices
to the cloud, and we further optimize the
communication between the mobile clients and
the cloud. It is demonstrated that
the data privacy does not degrade when the
performance enhancement methods are
applied. Our experiments show
that TEES reduces the computation time by
23% to 46% and saves the energy consumption by 35% to 55% per file retrieval; meanwhile, the network traffic during file retrievals is also significantly reduced.
IEEE 2015
TTA-JN-
C1566
Transparent Real-Time
Task Scheduling on
Temporal Resource
Partitions
The Hierarchical Real-
Time Scheduling (HiRTS) technique helps
improve overall resource utilization in real-time embedded systems. With HiRTS, a
computation resource is divided into a group
of temporal resource partitions, each of which
accommodates multiple real-time tasks.
Besides the computation resource partitioning problem, real-time task scheduling on resource partitions is
also a major problem of HiRTS. The
existing scheduling techniques for
dedicated resources, like schedulability tests
and utilization bounds, are unable to work
without changes
on temporal resource partitions in most cases.
In this paper, we show how to achieve
maximal transparency for task scheduling on
Regular Partitions, a type
of resource partition introduced by the
Regularity-based Resource Partition (RRP)
Model. We show that several classes of real-
time scheduling problems on a
regular partition can be transformed into
equivalent problems on a dedicated
single resource, such that comprehensive
single-resource scheduling techniques provide
optimal solutions. Furthermore, this
transformation method could be applied to
different types of real-time tasks such as periodic tasks, sporadic tasks, and aperiodic tasks.
IEEE 2015
TTA-JN-
C1567
User-Defined Privacy
Grid System for
Continuous Location-
Based Services
Location-based services (LBS) require users to
continuously report their location to a
potentially untrusted server to
obtain services based on their location, which
can expose them to privacy risks.
Unfortunately, existing privacy-preserving
techniques for LBS have several limitations,
such as requiring a fully-trusted third party,
offering limited privacy guarantees and
incurring high communication overhead. In
this paper, we propose a user-
defined privacy grid system called dynamic grid system (DGS), the first holistic system that fulfills four essential requirements for privacy-preserving snapshot and continuous LBS. (1) The system only requires a semi-trusted third party, responsible
for carrying out simple matching operations
correctly. This semi-trusted third party does
not have any information about
a user's location. (2) Secure snapshot
and continuous location privacy is guaranteed
under our defined adversary models. (3) The
communication cost for the user does not
depend on the user's desired privacy level; it
only depends on the number of relevant points
of interest in the vicinity of the user. (4)
Although we only focus on range and k-
nearest-neighbor queries in this work,
our system can be easily extended to support
other spatial queries without changing the
algorithms run by the semi-trusted third party
and the database server, provided the required
search area of a spatial query can be abstracted
into spatial regions. Experimental results show
that our DGS is more efficient than the state-
of-the-art privacy-preserving technique
for continuous LBS.
IEEE 2015
TTA-JN-
C1568
VoteTrust Leveraging
Friend Invitation Graph
to Defend against
Social Network Sybils
Online social networks (OSNs) suffer from the
creation of fake accounts that introduce fake
product reviews, malware and spam. Existing
defenses focus on using
the social graph structure to isolate fakes.
However, our work shows that Sybils could
befriend a large number of real users,
invalidating the assumption behind social-
graph-based detection. In this paper, we
present VoteTrust, a scalable defense system
that further leverages user-level
activities. VoteTrust models
the friend invitation interactions among users
as a directed, signed graph, and uses two key mechanisms to detect Sybils over the graph: a voting-based Sybil detection to find Sybils that
users vote to reject, and a Sybil community
detection to find other colluding Sybils around
identified Sybils. Through evaluation on the Renren social network, we show that VoteTrust is able to prevent Sybils from generating many unsolicited friend requests. We also deploy VoteTrust in Renren, and our real experience demonstrates that VoteTrust can detect large-scale collusion among Sybils.
IEEE 2015
DOMAIN : DATA MINING
TTA-DD-
C1501
CrowdOp Query
Optimization for
Declarative
Crowdsourcing
Systems
We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the
complexities and relieve the user of the burden
of dealing with the crowd. The user is only
required to submit an SQL-like query and
the system takes the responsibility of
compiling the query, generating the execution plan, and evaluating it in the crowdsourcing marketplace. A given query can have many
alternative execution plans and the difference
in crowdsourcing cost between the best and
the worst plans may be several orders of
magnitude. Therefore, as in relational
database systems, query optimization is important to crowdsourcing systems that
provide declarative query interfaces. In this
paper, we propose CROWDOP, a cost-
based query optimization approach
for declarative crowdsourcing systems. CROWDOP considers both
cost and latency
in query optimization objectives and
generates query plans that provide a good
balance between the cost and latency. We
develop efficient algorithms in
the CROWDOP for optimizing three types
of queries: selection queries, join queries, and
complex selection-join queries. We validate
our approach via extensive experiments by
simulation as well as with the real crowd on
Amazon Mechanical Turk.
IEEE 2015
TTA-DD-
C1502
Time-Series
Classification with
COTE The Collective of
Transformation-Based
Ensembles
Recently, two ideas have been explored that
lead to more accurate algorithms for time-
series classification (TSC). First, it has been
shown that the simplest way to gain
improvement on TSC problems is to transform
into an alternative data space where
discriminatory features are more easily detected. Second, it was demonstrated that
with a single data representation, improved
accuracy can be achieved through
simple ensemble schemes. We combine these
two principles to test the hypothesis that
forming a collective of ensembles of classifiers
on different data transformations improves the
accuracy of time-series classification.
The collective contains classifiers constructed
in the time, frequency, change, and
shapelet transformation domains. For
the time domain, we use a set of elastic
distance measures. For the other domains, we
use a range of standard classifiers. Through
extensive experimentation on 72 datasets,
including all of the 46 UCR datasets, we
demonstrate that the simple collective formed
by including all classifiers in one ensemble is
significantly more accurate than any of its
components and any other previously
published TSC algorithm. We investigate
alternative hierarchical collective structures
and demonstrate the utility of the approach on
a new problem involving classifying Caenorhabditis elegans mutant types.
IEEE 2015
TTA-DD-
C1503
PruDent A Pruned and
Confident Stacking
Approach for Multi-
label Classification
Over the past decade or so, several research
groups have addressed the problem of multi-
label classification where each example can
belong to more than one class at the same time.
A common approach, called Binary Relevance
(BR), addresses this problem by inducing a
separate classifier for each class. Research has
shown that this framework can be improved if
mutual class dependence is exploited: an
example that belongs to class X is likely to belong also to class Y; conversely, belonging to X can make an example less likely to belong
to Z. Several works sought to model this
information by using the vector of class labels
as additional example attributes. To fill the
unknown values of these attributes during
prediction, existing methods resort to using
outputs of other classifiers, and this makes
them prone to errors. This is where our paper wants to contribute. We identified two
potential ways to prune unnecessary
dependencies and to reduce error-propagation
in our new classifier-stacking technique, which
is named PruDent. Experimental results
indicate that the classification performance of
PruDent compares favorably with that of other
state-of-the-art approaches over a broad range
of testbeds. Moreover, its computational costs
grow only linearly in the number of classes.
IEEE 2015
TTA-DD-
C1504
Raw Wind Data
Preprocessing A Data-
Mining Approach
Wind energy integration research generally
relies on complex sensors located at remote
sites. The procedure for generating high-level
synthetic information from databases
containing large amounts of low-
level data must therefore account for possible
sensor failures and imperfect input data.
The data input is highly sensitive
to data quality. To address this problem, this
paper presents an empirical methodology that
can efficiently preprocess and filter
the raw wind data using only aggregated active
power output and the
corresponding wind speed values at
the wind farm. First, raw wind data properties
are analyzed, and all the data are divided into
six categories according to their attribute
magnitudes from a statistical perspective.
Next, the weighted distance, a novel measure of the degree of similarity between individual objects in the wind database, and the local outlier factor (LOF) algorithm are incorporated to compute the outlier factor of every
individual object, and this outlier factor is then
used to assess which category an object
belongs to. Finally, the methodology was
tested successfully on the data collected from a
large wind farm in northwest China.
IEEE 2015
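The outlier-factor step can be tried directly with scikit-learn's LOF implementation on synthetic (wind speed, active power) pairs; the data below is invented, and the paper's weighted-distance variant of the outlier factor is not reproduced.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
speed = rng.uniform(3, 15, 500)
power = 80 * speed + rng.normal(0, 20, 500)   # crude stand-in power curve
power[:10] = 0                                # stuck-sensor records
data = np.column_stack([speed, power])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(data)                # -1 marks outliers
print("flagged:", np.where(labels == -1)[0][:10])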
TTA-DD-
C1505
Removing DUST Using
Multiple Alignment of
Sequences
A large number of URLs collected by web
crawlers correspond to pages with duplicate or
near-duplicate contents. To crawl, store,
and use such duplicated data implies a waste of
resources, the building of low quality rankings,
and poor user experiences. To deal with this problem, several methods have been proposed to
detect and remove duplicate documents
without fetching their contents. To accomplish
this, the proposed methods learn normalization
rules to transform all duplicate URLs into the
same canonical form. A challenging aspect of
this strategy is deriving a set of general and
precise rules. In this work, we present
DUSTER, a new approach to derive quality
rules that take advantage of a multi-
sequence alignment strategy. We demonstrate
that a full multi-sequence alignment of URLs
with duplicated content, before the generation
of the rules, can lead to the deployment of very
effective rules. By evaluating our method, we
observed it achieved larger reductions in the
number of duplicate URLs than our best
baseline, with gains of 82 and 140.74 percent
in two different web collections.
IEEE 2015
TTA-DD-
C1506
Keyword Extraction
and Clustering for
Document
Recommendation in
Conversations
This paper addresses the problem
of keyword extraction from conversations,
with the goal of using these keywords to
retrieve, for each short conversation fragment,
a small number of potentially relevant
documents, which can be recommended to
participants. However, even a short fragment
contains a variety of words, which are
potentially related to several topics; moreover,
using an automatic speech recognition (ASR)
system introduces errors among them.
Therefore, it is difficult to infer precisely the
information needs of
the conversation participants. We first propose
an algorithm to extract keywords from the
output of an ASR system (or a manual
transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors
diversity in the keyword set, to match the
potential diversity of topics and reduce ASR
noise. Then, we propose a method to derive
multiple topically separated queries from
this keyword set, in order to maximize the
chances of making at least one
relevant recommendation when using these queries to search over the English Wikipedia.
The proposed methods are evaluated in terms
of relevance with respect
to conversation fragments from the Fisher,
AMI, and ELEA conversational corpora, rated
by several human judges. The scores show that
our proposal improves over previous methods
that consider only word frequency or topic
similarity, and represents a promising solution
for a document recommender system to be
used in conversations.
IEEE 2015
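The diversity-favoring selection can be sketched with a greedy maximizer of a submodular coverage objective: a concave function of accumulated per-topic weight yields diminishing returns once a topic is already covered, so a second keyword from the same topic adds little. The topic weights and keywords below are invented; in the real system they would come from the topic model.

import math

def coverage(selected, topic_weights):
    totals = {}
    for kw in selected:
        for topic, w in topic_weights[kw].items():
            totals[topic] = totals.get(topic, 0.0) + w
    # sqrt is concave, so the objective is monotone submodular.
    return sum(math.sqrt(v) for v in totals.values())

def pick_keywords(topic_weights, k):
    selected = []
    for _ in range(k):
        best = max((kw for kw in topic_weights if kw not in selected),
                   key=lambda kw: coverage(selected + [kw], topic_weights))
        selected.append(best)
    return selected

tw = {"piano":  {"music": 0.9},
      "guitar": {"music": 0.8},
      "budget": {"finance": 0.7}}
print(pick_keywords(tw, 2))   # ['piano', 'budget']: diverse topics win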
TTA-DD-
C1507
An Internal Intrusion
Detection and
Protection System by
Using Data Mining and
Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and ask these coworkers to assist with shared tasks, thereby making the pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect since most intrusion detection systems and firewalls identify and isolate malicious behaviors launched from outside the system only. In addition, some studies have claimed that analyzing system calls (SCs) generated by commands can identify these commands, making it possible to accurately detect attacks, with attack patterns as the features of an attack. Therefore, in this paper, a security system, named the Internal Intrusion Detection and Protection System (IIDPS), is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of users' usage habits as their forensic features and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29%, whereas the response time is less than 0.45 s, implying that it can protect a system against insider attacks effectively and efficiently.
IEEE 2015
TTA-DD-
C1508
A Critical-time-point
Approach to All-
departure-time
Lagrangian Shortest
Paths
Given a spatio-temporal network, a source, a
destination, and a
desired departure time interval, the All-
departure-
time Lagrangian Shortest Paths (ALSP)
problem determines a set which includes the
shortest path for every departure time in the
given interval. ALSP is important
for critical societal applications such as eco-
routing. However, ALSP is computationally
challenging due to the non-stationary ranking
of the candidate paths across
distinct departure-times. Current related work
for reducing the redundant work, across
consecutive departure-times sharing a common
solution, exploits only partial information, e.g.,
the earliest feasible arrival time of a path. In
contrast, our approach uses all available
information, e.g., the entire time series of
arrival times for all departure-times. This
allows elimination of all knowable redundant
computation based on complete information
available at hand. We operationalize this idea
through the concept of critical-time-points
(CTP), i.e., departure-times before which
ranking among candidate paths cannot change.
In our preliminary work, we proposed a CTP
based forward search strategy. In this paper,
we propose a CTP based temporal bi-
directional search for the ALSP problem via a
novel impromptu rendezvous termination
condition. Theoretical and experimental
analysis show that the
proposed approach outperforms the related
work approaches particularly when there are
few critical-time-points.
IEEE 2015
TTA-DD-
C1509
Co-ClusterD A
Distributed Framework
for Data Co-Clustering
with Sequential
Updates
Co-clustering has emerged to be a
powerful data mining tool for two-
dimensional co-occurrence and dyadic data.
However, co-clustering algorithms often
require significant computational resources
and have been dismissed as impractical for large data sets. Existing studies have provided
strong empirical evidence that expectation-
maximization (EM) algorithms (e.g., k-means
algorithm) with sequential updates can
significantly reduce the computational cost
without degrading the resulting solution.
Motivated by this observation, we
introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms
which are variants of EM algorithms, and also
show that AMCC algorithms with
sequential updates converge. We then propose
two approaches to parallelize AMCC
algorithms with sequential updates in
a distributed environment. Both approaches are
proved to maintain the convergence properties
of AMCC algorithms. Based on these two
approaches, we present a new
distributed framework, Co-ClusterD, which
supports efficient implementations of AMCC
algorithms with sequential updates. We design
and implement Co-ClusterD, and show its
efficiency through two AMCC algorithms: fast
nonnegative matrix tri-factorization (FNMTF)
and information theoretic co-clustering (ITCC).
We evaluate our framework on both a local
cluster of machines and the Amazon EC2
cloud. Empirical results show that AMCC
algorithms implemented in Co-ClusterD can
achieve a much faster convergence and often
obtain better results than their traditional
concurrent counterparts.
IEEE 2015
TTA-DD-
C1511
Differentially Private
Frequent Itemset
Mining via Transaction
Splitting
Recently, there has been a growing interest in
designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one
of the most fundamental problems in
data mining. In this paper, we explore the
possibility of designing
a differentially private FIM algorithm which
can not only achieve high data utility and a
high degree of privacy, but also offer high time
efficiency. To this end, we propose a
differentially private FIM algorithm based on
the FP-growth algorithm, which is referred to
as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and
a mining phase. In the preprocessing phase, to
improve the utility and privacy tradeoff, a
novel smart splitting method is proposed to
transform the database. For a given database,
the preprocessing phase needs to be performed
only once. In the mining phase, to offset the
information loss caused
by transaction splitting, we devise a run-time
estimation method to estimate the actual support of itemsets in the original database. In
addition, by leveraging the downward closure
property, we put forward a dynamic reduction
method to dynamically reduce the amount of
noise added to guarantee privacy during the
mining process. Through formal privacy
analysis, we show that our PFP-growth
algorithm is ε-differentially private. Extensive
experiments on real datasets illustrate that our
PFP-growth algorithm substantially
outperforms the state-of-the-art techniques.
IEEE 2015
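The primitive underneath such differentially private counts is the Laplace mechanism: adding Laplace(sensitivity/ε) noise to a support count makes that single query ε-differentially private. PFP-growth budgets and reduces this noise far more carefully; this sketch shows only the primitive, with illustrative numbers.

import numpy as np

def noisy_support(true_support, epsilon, sensitivity=1.0):
    # One transaction changes a support count by at most `sensitivity`.
    return true_support + np.random.laplace(scale=sensitivity / epsilon)

print(noisy_support(1400, epsilon=0.5))   # e.g. 1398.3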
TTA-DD-
C1512
Efficient Algorithms for
Mining Top-K High
Utility Itemsets
High utility itemsets (HUIs) mining is an
emerging topic in data mining, which refers to
discovering all itemsets having
a utility meeting a user-specified
minimum utility threshold min_util. However,
setting min_util appropriately is a difficult
problem for users. Generally speaking, finding
an appropriate minimum utility threshold by
trial and error is a tedious process for users. If
min_util is set too low, too many HUIs will be
generated, which may cause
the mining process to be very inefficient. On
the other hand, if min_util is set too high, it is
likely that no HUIs will be found. In this
paper, we address the above issues by
proposing a new framework for top-
k high utility itemset mining, where k is the
desired number of HUIs to be mined. Two
types of efficient algorithms named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase) are proposed for mining such itemsets without
the need to set min_util. We provide a
structural comparison of the two algorithms with discussions on their
advantages and limitations. Empirical
evaluations on both real and synthetic datasets
show that the performance of the
proposed algorithms is close to that of the
optimal case of state-of-the-
art utility mining algorithms.
TTA-DD-
C1513
k-Nearest Neighbor
Classification over
Semantically Secure
Encrypted Relational
Data
Data Mining has wide applications in many
areas such as banking, medicine, scientific
research and among government
agencies. Classification is one of the
commonly used tasks in data mining
applications. For the past decade, due to the
rise of various privacy issues, many theoretical
and practical solutions to
the classification problem have been proposed
under different security models. However, with
the recent popularity of cloud computing, users
now have the opportunity to outsource
their data, in encrypted form, as well as
the data mining tasks to the cloud. Since
the data on the cloud is in encrypted form,
existing privacy-
preserving classification techniques are not
applicable. In this paper, we focus on solving
the classification problem over encrypted data.
In particular, we propose a secure k-NN
classifier over encrypted data in the cloud. The
proposed protocol protects the confidentiality
of data, privacy of user's input query, and hides
the data access patterns. To the best of our
knowledge, our work is the first to develop
a secure k-NN classifier
over encrypted data under the semi-honest
model. Also, we empirically analyze the
efficiency of our proposed protocol using a
real-world dataset under different parameter
settings.
IEEE 2015
TTA-DD-
C1514
Location Aware
Keyword Query
Suggestion Based on
Document Proximity
Keyword suggestion in web search helps users
to access relevant information without having
to know how to precisely express their queries.
Existing keyword suggestion techniques do not
consider the locations of the users and
the query results; i.e., the spatial proximity of a
user to the retrieved results is not taken as a
factor in the recommendation. However, the
relevance of search results in many
applications (e.g., location-based services) is
known to be correlated with their
spatial proximity to the query issuer. In this
paper, we design a location-
aware keyword query suggestion framework.
We propose a weighted keyword-
document graph, which captures both the
semantic relevance between
keyword queries and the spatial distance
between the resulting documents and the
user location. The graph is browsed in a
random-walk-with-restart fashion, to select
the keyword queries with the highest scores
as suggestions. To make our framework
scalable, we propose a partition-
based approach that outperforms the baseline
algorithm by up to an order of magnitude. The
appropriateness of our framework and the
performance of the algorithms are evaluated
using real data.
IEEE 2015
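The random-walk-with-restart scoring at the heart of the framework is easy to reproduce on a toy keyword-document graph. The sketch below runs the usual power iteration p = (1 - c) * W^T p + c * e; the four-node transition matrix is invented, and the real framework weights edges by semantic relevance and spatial distance.

    import numpy as np

    # Row-stochastic transitions over nodes {q1, q2, d1, d2} (toy weights).
    W = np.array([[0.0, 0.5, 0.5, 0.0],
                  [0.5, 0.0, 0.0, 0.5],
                  [0.5, 0.0, 0.0, 0.5],
                  [0.0, 0.5, 0.5, 0.0]])

    def rwr(W, start, c=0.15, iters=100):
        # Random walk with restart: repeatedly mix walking with jumping home.
        e = np.zeros(W.shape[0])
        e[start] = 1.0
        p = e.copy()
        for _ in range(iters):
            p = (1 - c) * W.T @ p + c * e
        return p

    print(rwr(W, start=0))  # rank keyword-query nodes by score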
TTA-DD-
C1515
Rank-Based Similarity
Search Reducing the
Dimensional
Dependence
This paper introduces a data structure for k-
NN search, the Rank Cover Tree (RCT),
whose pruning tests rely solely on the
comparison of similarity values; other
properties of the underlying space, such as the
triangle inequality, are not employed. Objects
are selected according to their ranks with
respect to the query object, allowing much
tighter control on the overall execution costs.
A formal theoretical analysis shows that with
very high probability, the RCT returns a
correct query result in time that depends very
competitively on a measure of the intrinsic
dimensionality of the data set. The
experimental results for the RCT show that
non-metric pruning strategies
for similarity search can be practical even
when the representational dimension of the
data is extremely high. They also show that the
RCT is capable of meeting or exceeding the
level of performance of state-of-the-art
methods that make use of metric pruning or
other selection tests involving numerical
constraints on distance values.
IEEE 2015
TTA-DD-
C1516
RANWAR Rank-Based
Weighted Association
Rule Mining from Gene
Expression and
Methylation Data
Ranking of association rules is currently an
interesting topic in data mining and
bioinformatics. The huge number of rules over
items (or genes) produced by association rule
mining (ARM) algorithms confuses the
decision maker. In this
article, we propose a weighted rule-
mining technique (say, RANWAR or rank-
based weighted association rule-mining)
to rank the rules using two novel rule-
interestingness measures, viz., rank-
based weighted condensed support (wcs)
and weighted condensed confidence (wcc)
measures to bypass the problem. These
measures depend on the ranks of the
items (genes). Using the rank, we
assign a weight to each item. RANWAR
generates far fewer frequent itemsets
than the state-of-the-
art association rule mining algorithms. Thus, it
reduces the execution time of the algorithm. We
run RANWAR on gene expression and
methylation datasets. The genes of the
top rules are biologically validated
by Gene Ontologies (GOs) and KEGG
pathway analyses. Many top-
ranked rules extracted by RANWAR that
hold poor ranks under traditional Apriori are
highly biologically significant to the related
diseases. Finally, the top rules produced
by RANWAR that are not found by Apriori are
reported.
IEEE 2015
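The mechanics of rank-derived weighting can be illustrated in a few lines. The sketch below assigns larger weights to better-ranked genes and computes a weighted support as the itemset's mean weight times its relative frequency; this exact formula is an assumption for illustration, not the paper's wcs definition, and the gene list and database are toy data.

    def item_weights(ranked_genes):
        # Rank 0 is best; weights decay linearly with rank (assumed scheme).
        n = len(ranked_genes)
        return {g: (n - r) / n for r, g in enumerate(ranked_genes)}

    def weighted_support(itemset, db, w):
        mean_w = sum(w[i] for i in itemset) / len(itemset)
        count = sum(1 for t in db if itemset <= t)
        return mean_w * count / len(db)

    w = item_weights(["g1", "g2", "g3", "g4"])
    db = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g3", "g4"}]
    print(weighted_support({"g1", "g2"}, db, w))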
TTA-DD-
C1517
Towards Effective Bug
Triage with Software
Data Reduction
Techniques
Software companies spend over 45 percent of
their cost on dealing with software bugs. An
inevitable step of fixing bugs is bug triage,
which aims to correctly assign a developer to a
new bug. To decrease the time cost in manual
work, text classification techniques are applied
to conduct automatic bug triage. In this paper,
we address the problem
of data reduction for bug triage, i.e., how to
reduce the scale and improve the quality
of bug data. We combine instance selection
with feature selection to simultaneously
reduce data scale on the bug dimension and the
word dimension. To determine the order of
applying instance selection and feature
selection, we extract attributes from
historical bug data sets and build a predictive
model for a new bug data set. We empirically
investigate the performance of data reduction
on a total of 600,000 bug reports from two large
open source projects, namely Eclipse and
Mozilla. The results show that
our data reduction can effectively reduce
the data scale and improve the accuracy
of bug triage. Our work provides an approach to
leveraging data processing techniques to
form reduced and high-
quality bug data in software development and
maintenance.
IEEE 2015
TTA-DD-
C1518
Towards Open-World
Person Re-
Identification by One-
Shot Group-based
Verification
Solving the problem of matching people across
non-overlapping multi-camera views, known
as person re-identification (re-id), has received
increasing interest in computer vision. In a
real-world application scenario, a watch-list
(gallery set) of a handful of known target
people are provided with very few (in many
cases only a single) image(s) (shots) per target.
Existing re-id methods are largely unsuitable
to address this open-world re-id challenge
because they are designed for (1) a closed-
world scenario where the gallery and probe
sets are assumed to contain exactly the same
people, (2) person-wise identification whereby
the model attempts to verify exhaustively
against each individual in the gallery set, and
(3) learning a matching model using multi-
shots. In this paper, a novel transfer local
relative distance comparison (t-LRDC) model
is formulated to address the open-
world person re-identification problem by one-
shot group-based verification. The model is
designed to mine and transfer useful
information from a labelled open-world non-
target dataset. Extensive experiments
demonstrate that the proposed approach
outperforms both non-transfer learning and
existing transfer learning based re-id methods.
IEEE 2015
TTA-DD-
C1519
Improving Accuracy
and Robustness of Self-
Tuning Histograms by
Subspace Clustering
In large databases, the amount and the
complexity of the data call for data
summarization techniques. Such summaries
are used to assist fast approximate query
answering or query optimization.
Histograms are a prominent class of model-
free data summaries and are widely used in
database systems. So-called self-
tuning histograms look at query-execution
results to refine themselves. An assumption
with such histograms, which has not been
questioned so far, is that they can learn the
dataset from scratch, that is, starting with an
empty bucket configuration. We show that this
is not the case. Self-tuning methods are very
sensitive to the initial configuration. Three
major problems stem from this.
Traditional self-tuning is unable to learn
projections of multi-dimensional data, is
sensitive to the order of queries, and reaches
only local optima with high estimation errors.
We show how to improve a self-tuning method
significantly by starting with a carefully
chosen initial configuration. We propose
initialization by dense subspace clusters in
projections of the data,
which improves both accuracy and robustness
of self-tuning. Our experiments on different
datasets show that the error rate is typically
halved compared to the uninitialized version.
IEEE 2015
TTA-JD-
C1520
TRIP An Interactive
Retrieving-Inferring
Data Imputation
Approach
Data imputation aims at filling in missing
attribute values in databases. Most
existing imputation methods for string attribute
values are inferring-based approaches, which
usually fail to reach a high imputation recall by
just inferring missing values from the complete
part of the data set. Recently, some retrieving-
based methods are proposed to retrieve
missing values from external resources such as
the World Wide Web, which tend to reach a
much higher imputation recall, but inevitably
bring a large overhead by issuing a large
number of search queries. In this paper, we
investigate the interaction between
the inferring-based methods and the retrieving-
based methods. We show that retrieving a
small number of selected missing values can
greatly improve the imputation recall of
the inferring-based methods. With this
intuition, we propose an interactive Retrieving-
Inferring data imPutation approach (TRIP),
which
performs retrieving and inferring alternately in
filling in missing attribute values in a dataset.
To ensure the high recall at the minimum
cost, TRIP faces a challenge of selecting the
least number of missing values for retrieving to
maximize the number of inferable values. Our
proposed solution is able to identify an
optimal retrieving-inferring scheduling scheme
in deterministic data imputation, and the
optimality of the generated scheme is
theoretically analyzed with proofs. We also
analyze with an example that the optimal
scheme is not feasible to be achieved in τ-
constrained stochastic data imputation (τ-SDI),
but still, our proposed solution identifies an
expected-optimal scheme in τ-SDI. Extensive
experiments on four data collections show
that TRIP retrieves on average only 20 percent of the
missing values yet achieves the same high
recall as the retrieving-based approach.
IEEE 2015
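The retrieve/infer alternation can be written down as a small control loop. The sketch below is only a skeleton under assumed interfaces: retrieve stands for a costly external (e.g., Web) lookup, infer for a completion step over the already-filled part, and the scheduling choice of which value to retrieve, which is the paper's actual contribution, is reduced here to an arbitrary pick.

    def trip(missing, retrieve, infer, budget):
        # Alternate: infer what is now inferable, then buy one retrieval.
        filled = {}
        while missing:
            for cell, value in infer(missing, filled).items():
                filled[cell] = value
                missing.discard(cell)
            if not missing or budget == 0:
                break
            cell = next(iter(missing))       # real TRIP: optimal scheduling here
            filled[cell] = retrieve(cell)    # expensive external lookup
            missing.discard(cell)
            budget -= 1
        return filled

    print(trip({"r1.city", "r2.city"},
               retrieve=lambda c: "retrieved:" + c,
               infer=lambda m, f: {},        # stub inference model
               budget=1))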
TTA-JD-
C1521
Pattern-Aided
Regression Modeling
and Prediction Model
Analysis
This paper first
introduces pattern aided regression (PXR)
models, a new type of regression models designed
to represent accurate and
interpretable prediction models. This was
motivated by two observations:
(1) Regression modeling applications often
involve complex diverse predictor-response
relationships, which occur when the
optimal regression models (of
given regression model type) fitting two or
more distinct logical groups of data are highly
different. (2) State-of-the-
art regression methods are often unable to
adequately model such relationships. This
paper defines PXR models using several
patterns and local regression models, which
respectively serve as logical and behavioral
characterizations of distinct predictor-response
relationships. The paper also introduces a
contrast pattern aided regression (CPXR)
method, to build accurate PXR models. In
experiments, the PXR models built by CPXR
are very accurate in general, often
outperforming state-of-the-art regression
methods by large margins. Usually using (a)
around seven simple patterns and (b) linear
local regression models, those PXR models are
easy to interpret; in fact, their complexity is
just a bit higher than that of (piecewise)
linear regression models and is significantly
lower than that of traditional ensemble based
regression models. CPXR is especially
effective for high-dimensional data. The paper
also discusses how to use CPXR methodology
for analyzing prediction models and correcting
their prediction errors.
IEEE 2015
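How a PXR model predicts can be shown compactly: each pattern carries a local regression model, and an instance is scored by the local models of the patterns it matches, with a default model as fallback. The plain averaging below and the toy patterns are simplifying assumptions; the paper weights the local predictions.

    def pxr_predict(x, pattern_models, default_model):
        # pattern_models: list of (pattern predicate, local model) pairs.
        preds = [model(x) for matches, model in pattern_models if matches(x)]
        return sum(preds) / len(preds) if preds else default_model(x)

    # Toy: two logical groups with different linear behaviour.
    pattern_models = [
        (lambda x: x["age"] >= 50, lambda x: 2.0 * x["bmi"] + 10),
        (lambda x: x["age"] < 50,  lambda x: 0.5 * x["bmi"] + 1),
    ]
    print(pxr_predict({"age": 63, "bmi": 30}, pattern_models,
                      default_model=lambda x: 0.0))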
TTA-JD-
C1522
A Set of Complexity
Measures Designed for
Applying Meta-Learning
to Instance Selection
In recent years, some authors have approached
the instance selection problem from a meta-
learning perspective. In their work, they
try to find relationships between the
performance of some methods from this field
and the values of some data-
complexity measures, with the aim of
determining the best performing method given
a data set, using only the values of
the measures computed on this data.
Nevertheless, most of the data-
complexity measures existing in the literature
were not conceived for this purpose and the
feasibility of their use in this field is yet to be
determined. In this paper, we revise the
definition of some measures that we presented
in previous work, which
were designed for meta-
learning based instance selection. Also, we
assess them in an experimental study involving
three sets of measures, 59 databases,
16 instance selection methods, two classifiers,
and eight regression learners used as meta-
learners. The results suggest that
our measures are more efficient and effective
than those traditionally used by researchers
who have addressed instance selection from
a meta-learning perspective.
IEEE 2015
TTA-JD-
C1522
Efficient Algorithms for
Mining the Concise and
Lossless
Representation of High
Utility Itemsets
Mining high utility itemsets (HUIs) from
databases is an important data mining task,
which refers to the discovery of itemsets
with high utilities (e.g. high profits). However,
it may present too many HUIs to users, which
also degrades the efficiency of
the mining process. To achieve high efficiency
for the mining task and provide
a concise mining result to users, we propose a
novel framework in this paper
for mining closed+ high utility itemsets
(CHUIs), which serves as a compact
and lossless representation of HUIs. We
propose three efficient algorithms named
AprioriHC (Apriori-
based algorithm for mining High utility Closed+
itemsets), AprioriHC-D
(AprioriHC algorithm with Discarding
unpromising and isolated items) and CHUD
(Closed+ High Utility Itemset Discovery) to
find this representation. Further, a method
called DAHU (Derive
All High Utility Itemsets) is proposed to
recover all HUIs from the set of CHUIs
without accessing the original database.
Results on real and synthetic datasets show
that the proposed algorithms are
very efficient and that our approaches achieve
a massive reduction in the number of HUIs. In
addition, when all HUIs can be recovered by
DAHU, the combination of CHUD and DAHU
outperforms the state-of-the-
art algorithms for mining HUIs.
IEEE 2015
TTA-JD-
C1523
Keyword Extraction
and Clustering for
Document
Recommendation in
Conversations
This paper addresses the problem
of keyword extraction from conversations,
with the goal of using these keywords to
retrieve, for each short conversation fragment,
a small number of potentially relevant
documents, which can be recommended to
participants. However, even a short fragment
contains a variety of words, which are
potentially related to several topics; moreover,
using an automatic speech recognition (ASR)
system introduces errors among them.
Therefore, it is difficult to infer precisely the
information needs of
the conversation participants. We first propose
an algorithm to extract keywords from the
output of an ASR system (or a manual
transcript for testing), which makes use of
topic modeling techniques and of a
submodular reward function which favors
diversity in the keyword set, to match the
potential diversity of topics and reduce ASR
noise. Then, we propose a method to derive
multiple topically separated queries from
this keyword set, in order to maximize the
chances of making at least one
relevant recommendation when using these
queries to search over the English Wikipedia.
The proposed methods are evaluated in terms
of relevance with respect
to conversation fragments from the Fisher,
AMI, and ELEA conversational corpora, rated
by several human judges. The scores show that
our proposal improves over previous methods
that consider only word frequency or topic
similarity, and represents a promising solution
for a document recommender system to be
used in conversations.
IEEE 2015
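The diversity-favoring selection can be prototyped with a concave (hence submodular) reward and plain greedy maximization: taking the square root of per-topic coverage makes a second keyword from an already-covered topic worth little. The keyword-to-topic weights below are invented stand-ins for topic-model output.

    import math

    kw = {  # keyword -> topic weights (toy values)
        "match": {"sport": 0.9}, "goal": {"sport": 0.8},
        "election": {"politics": 0.9}, "vote": {"politics": 0.7},
    }

    def reward(selected):
        topics = {t for k in selected for t in kw[k]}
        # sqrt of per-topic mass => diminishing returns per topic
        return sum(math.sqrt(sum(kw[k].get(t, 0.0) for k in selected))
                   for t in topics)

    def greedy_keywords(k):
        chosen = []
        while len(chosen) < k:
            chosen.append(max((w for w in kw if w not in chosen),
                              key=lambda w: reward(chosen + [w]) - reward(chosen)))
        return chosen

    print(greedy_keywords(2))  # one keyword per topic, not two from one topic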
TTA-JD-
C1524
Top-k Similarity Join in
Heterogeneous
Information Networks
As a newly
emerging network model, heterogeneous
information networks (HINs) have received
growing attention. Many data mining tasks
have been explored in HINs, including
clustering, classification,
and similarity search. Similarity join is a
fundamental operation required for many
problems. It is attracting attention from various
applications on network data, such as friend
recommendation, link prediction, and online
advertising. Although similarity join has been
well studied in homogeneous networks, it has
not yet been studied
in heterogeneous networks. In particular, none of
the existing research on similarity join takes the
different semantic meanings behind paths into
consideration, and almost all of it ignores
the heterogeneity and diversity of HINs. In
this paper, we propose a path-
based similarity join (PS-join) method to
return the top k similar pairs of objects based
on any user specified join path in
a heterogeneous information network. We
study how to prune expensive
similarity computation by introducing bucket
pruning based locality sensitive hashing
(BPLSH) indexing. Compared with existing
Link-based Similarity join (LS-join) method,
PS-join can derive various similarity
semantics. Experimental results on real data
sets show the efficiency and effectiveness of
the proposed approach.
IEEE 2015
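The generic bucket-pruning idea behind an LSH index can be sketched with random hyperplanes: objects whose similarity vectors fall in the same bucket become candidate pairs, and everything else is pruned before any exact similarity computation. This is a textbook construction for illustration, not the BPLSH index itself, and the vectors are random toy data.

    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_planes = 8, 4
    planes = rng.normal(size=(n_planes, dim))

    def signature(v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple((planes @ v > 0).astype(int))

    vectors = {i: rng.normal(size=dim) for i in range(100)}
    buckets = {}
    for i, v in vectors.items():
        buckets.setdefault(signature(v), []).append(i)

    # Only pairs sharing a bucket are scored exactly; the rest are pruned.
    candidates = [(a, b) for ids in buckets.values()
                  for k, a in enumerate(ids) for b in ids[k + 1:]]
    print(len(candidates), "candidate pairs instead of", 100 * 99 // 2)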
TTA-JD-
C1525
Active Learning for
Ranking through
Expected Loss
Optimization
Learning to rank arises in many data mining
applications, ranging from web search engines
and online advertising to recommender systems.
In learning to rank, the performance of
a ranking model is strongly affected by the
number of labeled examples in the training set;
on the other hand, obtaining labeled examples
for training data is very expensive and time-
consuming. This presents a great need for
the active learning approaches to select most
informative examples for ranking learning;
however, in the literature there is still very
limited work to
address active learning for ranking. In this
paper, we propose a
general active learning framework, expected
loss optimization (ELO), for ranking. The ELO
framework is applicable to a wide range
of ranking functions. Under this framework,
we derive a novel
algorithm, expected discounted cumulative
gain (DCG) loss optimization (ELO-DCG), to
select most informative examples. Then, we
investigate both query- and document-
level active learning for ranking and propose a
two-stage ELO-DCG algorithm that
incorporates both query and document selection
into active learning. Furthermore, we show
that it is flexible for the algorithm to deal with
the skewed grade distribution problem with the
modification of the loss function. Extensive
experiments on real-world web search data sets
have demonstrated great potential and
effectiveness of the proposed framework and
algorithms.
IEEE 2015
TTA-JD-
C1526
Relational Collaborative
Topic Regression for
Recommender Systems
Due to its successful application
in recommender systems, collaborative
filtering (CF) has become a hot research topic in data
mining and information retrieval. In traditional
CF methods, only the feedback matrix, which
contains either explicit feedback (also called
ratings) or implicit feedback on the items given
by users, is used for training and prediction.
Typically, the feedback matrix is sparse, which
means that most users interact with few items.
Due to this sparsity problem, traditional CF
with only feedback information will suffer
from unsatisfactory performance. Recently,
many researchers have proposed to utilize
auxiliary information, such as item content
(attributes), to alleviate the data sparsity
problem in
CF. Collaborative topic regression (CTR) is
one of these methods which has achieved
promising performance by successfully
integrating both feedback information and item
content information. In many real applications,
besides the feedback and item content
information, there may exist relations (also
known as networks) among the items which
can be helpful for recommendation. In this
paper, we develop a novel hierarchical
Bayesian model
called Relational Collaborative Topic
Regression (RCTR), which extends CTR by
seamlessly integrating the user-item feedback
information, item content information, and
network structure among items into the same
model. Experiments on real-world datasets
show that our model can achieve better
prediction accuracy than the state-of-the-art
methods with lower empirical training time.
Moreover, RCTR can learn good interpretable
latent structures which are useful for
recommendation.
IEEE 2015
TTA-JD-
C1527
Relevance Feature
Discovery for Text
Mining
It is a big challenge to guarantee the quality of
discovered relevance features in text
documents for describing user preferences because of
the large scale of terms and data patterns. Most
existing popular text mining and classification
methods have adopted term-based approaches.
However, they have all suffered from the
problems of polysemy and synonymy. Over
the years, it has often been hypothesized
that pattern-based methods should
perform better than term-based ones in
describing user preferences; yet, how to
effectively use large scale patterns remains a
hard problem in text mining. To make a
breakthrough in this challenging issue, this
paper presents an innovative model
for relevance feature discovery. It discovers
both positive and negative patterns
in text documents as higher level features and
deploys them over low-level features (terms).
It also classifies terms into categories and
updates term weights based on their specificity
and their distributions in patterns. Substantial
experiments using this model on RCV1, TREC
topics and Reuters-21578 show that the
proposed model significantly outperforms both
the state-of-the-art term-based methods and the
pattern based methods.
IEEE 2015
TTA-JD-
C1528
Differentially Private
Frequent Itemset
Mining via Transaction
Splitting
Recently, there has been a growing interest in
designing differentially private data mining
algorithms. Frequent itemset mining (FIM) is one
of the most fundamental problems in
data mining. In this paper, we explore the
possibility of designing
a differentially private FIM algorithm which
can not only achieve high data utility and a
high degree of privacy, but also offer high time
efficiency. To this end, we propose a
differentially private FIM algorithm based on
the FP-growth algorithm, which is referred to
as PFP-growth. The PFP-growth algorithm
consists of a preprocessing phase and
a mining phase. In the preprocessing phase, to
improve the utility and privacy tradeoff, a
novel smart splitting method is proposed to
transform the database. For a given database,
the preprocessing phase needs to be performed
only once. In the mining phase, to offset the
information loss caused
by transaction splitting, we devise a run-time
estimation method to estimate the actual
support of item sets in the original database. In
addition, by leveraging the downward closure
property, we put forward a dynamic reduction
method to dynamically reduce the amount of
noise added to guarantee privacy during the
mining process. Through formal privacy
analysis, we show that our PFP-growth
algorithm is ε-differentially private. Extensive
experiments on real datasets illustrate that our
PFP-growth algorithm substantially
outperforms the state-of-the-art techniques.
IEEE 2015
TTA-JD-
C1529
Backward Path Growth
for Efficient Mobile
Sequential
Recommendation
The problem
of mobile sequential recommendation is to
suggest a route connecting a set of pick-up
points for a taxi driver so that he/she is more
likely to get passengers with less travel cost.
Essentially, a key challenge of this problem is
its high computational complexity. In this
paper, we propose a novel dynamic
programming based method to solve
the mobile sequential recommendation
problem consisting of two separate stages: an offline
pre-processing stage and an online search
stage. The offline stage pre-computes potential
candidate sequences from a set of pick-up
points. A backward incremental sequence
generation algorithm is proposed based on the
identified iterative property of the cost
function. Simultaneously, an incremental
pruning policy is adopted in the process of
sequence generation to reduce the search space
of the potential sequences effectively. In
addition, a batch pruning algorithm is further
applied to the generated potential sequences to
remove some non-optimal sequences of a
given length. Since the pruning effectiveness
keeps growing with the increase of the
sequence length, at the online stage, our
method can efficiently find the optimal driving
route for an unloaded taxi in the remaining
candidate sequences. Moreover, our method
can handle the problem of optimal route search
with a maximum cruising distance or a
destination constraint. Experimental results on
real and synthetic data sets show that both the
pruning ability and the efficiency of our
method surpass the state-of-the-art methods.
Our techniques can therefore be effectively
employed to address the problem
of mobile sequential recommendation with
many pick-up points in real-world
applications.
IEEE 2015
TTA-JD-
C1530
Mining Partially-
Ordered Sequential
Rules Common to
Multiple Sequences
Sequential rule mining is an important
data mining problem
with multiple applications. An important
limitation of algorithms
for mining sequential rules common to multipl
e sequences is that rules are very specific and
therefore many similar rules may represent the
same situation. This can cause three major
problems: (1) similar rules can be rated quite
differently, (2) rules may not be found because
they are individually considered uninteresting,
and (3) rules that are too specific are less
likely to be used for making
predictions. To address these issues, we
explore the idea of mining “partially-ordered
sequential rules” (POSR), a more general form
of sequential rules such that items in the
antecedent and the consequent of each rule are
unordered. To mine POSR, we propose the
RuleGrowth algorithm, which is efficient and
easily extendable. In particular, we present an
extension (TRuleGrowth) that accepts a
sliding-window
constraint to find rules occurring within a
maximum amount of time. A performance
study with four real-life datasets shows that
RuleGrowth and TRuleGrowth have excellent
performance and scalability
compared to baseline algorithms and that the
number of rules discovered can be several
orders of magnitude smaller when the sliding-
window constraint is applied. Furthermore, we
also report results from a real application
showing that POSR can provide much higher
prediction accuracy than
regular sequential rules for sequence prediction.
IEEE 2015
TTA-JD-
C1531
CRoM and HuspExt
Improving Efficiency of
High Utility Sequential
Pattern Extraction
High utility sequential pattern mining has been
considered as an important research problem
and a number of relevant algorithms have been
proposed for this topic. The main challenge
of high utility sequential pattern mining is that
the search space is large and the efficiency of
the solutions is directly affected by the degree
at which they can eliminate the
candidate patterns. Therefore, the efficiency of
any high utility sequential pattern mining
solution depends on its ability to reduce this
big search space, and as a result, lower the
computational complexity of calculating
the utilities of the candidate patterns. In this
paper, we propose efficient data structures and
a pruning technique based on a
Cumulated Rest of Match (CRoM)
upper bound. CRoM, by defining a tighter
upper bound on the utility of the candidates,
allows more conservative pruning before
candidate pattern generation in comparison to
the existing techniques. In addition, we have
developed an efficient
algorithm, High Utility Sequential
Pattern Extraction (HuspExt), which calculates
the utilities of the child patterns based on those
of their parents. Substantial experiments on
both synthetic and real datasets from different
domains show that the proposed solution
efficiently
discovers high utility sequential patterns from
large scale datasets with different data
characteristics, under low utility thresholds.
IEEE 2015
TTA-JD-
C1532
Co-Extracting Opinion
Targets and Opinion
Words from Online
Reviews Based on the
Word Alignment Model
Mining opinion targets and opinion words
from online reviews are important tasks for fine-
grained opinion mining, the key component of
which involves detecting opinion relations
among words. To this end, this paper proposes
a novel approach based on the partially-
supervised alignment model, which regards
identifying opinion relations as
an alignment process. Then, a graph-based co-
ranking algorithm is exploited to estimate the
confidence of each candidate. Finally,
candidates with higher confidence are
extracted as opinion targets or opinion words.
Compared to previous methods based on the
nearest-neighbor rules,
our model captures opinion relations more
precisely, especially for long-span relations.
Compared to syntax-based methods,
our word alignment model effectively
alleviates the negative effects of parsing errors
when dealing with informal online texts. In
particular, compared to the traditional
unsupervised alignment model, the
proposed model obtains better precision
because of the usage of partial supervision. In
addition, when estimating candidate
confidence, we penalize higher-degree vertices
in our graph-based co-ranking algorithm to
decrease the probability of error generation.
Our experimental results on three corpora with
different sizes and languages show that our
approach effectively outperforms state-of-the-
art methods.
IEEE 2015
TTA-JD-
C1533
Global Redundancy
Minimization for
Feature Ranking
Feature selection has been an important
research topic in data mining, because the real
data sets often have high-dimensional features,
such as the bioinformatics and text mining
applications. Many existing
filter feature selection
methods rank features by optimizing
certain feature ranking criterions, such that
correlated features often have similar rankings.
These correlated features are redundant and
do not provide much additional mutual information to help
data mining. Thus, when we select a limited
number of features, we hope to select the top
non-redundant features such that the useful
mutual information can be maximized. In
previous research, Ding et al. recognized this
important issue and proposed the minimum
Redundancy Maximum
Relevance Feature Selection (mRMR) model
to minimize the redundancy between
sequentially selected features. However, this
method used the greedy search, thus the global
feature redundancy wasn't considered and the
results are not optimal. In this paper, we
propose a new feature selection framework to
globally minimize the feature redundancy with
maximizing the given feature ranking scores,
which can come from any supervised or
unsupervised methods. Our new model has no
parameters, making it especially suitable for
practical data mining applications.
Experimental results on benchmark data sets
show that the proposed method consistently
improves the feature selection results
compared to the original methods. Meanwhile,
we introduce a new unsupervised global and
local discriminative feature selection method
which can be unified with the
global feature redundancy minimization
framework and shows superior performance.
IEEE 2015
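To make the contrast concrete, the sketch below is the greedy mRMR baseline the abstract refers to: each step picks the feature with the best relevance-minus-redundancy score. Absolute Pearson correlation stands in for mutual information purely to keep the example short; the data are synthetic.

    import numpy as np

    def mrmr(X, y, k):
        # Greedy: maximize relevance(f, y) minus mean redundancy to selected.
        corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
        selected = []
        remaining = list(range(X.shape[1]))
        while len(selected) < k:
            def score(f):
                red = (np.mean([corr(X[:, f], X[:, s]) for s in selected])
                       if selected else 0.0)
                return corr(X[:, f], y) - red
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=200)   # feature 1 duplicates 0
    y = X[:, 0] + X[:, 2]
    print(mrmr(X, y, 2))   # selects one of the twins (0/1) plus 2, never both twins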
TTA-JD-
C1534
Review Selection Using
Micro-Reviews
Given the proliferation of review content, and
the fact that reviews are highly diverse and
often unnecessarily verbose, users frequently
face the problem of selecting the
appropriate reviews to consume. Micro-
reviews are emerging as a new type of
online review content in the social media.
Micro-reviews are posted by users of check-in
services such as Foursquare. They are concise
(up to 200 characters long) and highly focused,
in contrast to the comprehensive and
verbose reviews. In this paper, we propose a
novel mining problem, which brings together
these two disparate sources of review content.
Specifically, we use coverage of micro-
reviews as an objective for selecting a set of
reviews that cover efficiently the salient
aspects of an entity. Our approach consists of a
two-step process: matching review sentences
to micro-reviews, and selecting a small set
of reviews that cover as many micro-
reviews as possible, with few sentences. We
formulate this objective as a combinatorial
optimization problem, and show how to derive
an optimal solution using Integer Linear
Programming. We also propose an efficient
heuristic algorithm that approximates the
optimal solution. Finally, we perform a
detailed evaluation of all the steps of our
methodology using data collected from
Foursquare and Yelp.
IEEE 2015
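Beside the ILP formulation, a natural greedy heuristic for the coverage objective is easy to state: repeatedly take the review whose sentences cover the most not-yet-covered micro-reviews. The review-to-micro-review matching below is a toy stand-in for the paper's matching step.

    reviews = {  # review id -> micro-review ids its sentences match (toy)
        "r1": {1, 2, 3},
        "r2": {3, 4},
        "r3": {4, 5, 6},
        "r4": {2},
    }

    def greedy_cover(reviews, budget):
        covered, chosen = set(), []
        for _ in range(budget):
            best = max(reviews, key=lambda r: len(reviews[r] - covered))
            if not reviews[best] - covered:
                break                      # nothing new left to cover
            chosen.append(best)
            covered |= reviews[best]
        return chosen, covered

    print(greedy_cover(reviews, budget=2))   # ['r1', 'r3'] cover all six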
TTA-JD-
C1535
A Trust Management
Scheme Based on
Behavior Feedback for
Opportunistic Networks
In the harsh environment where node density is
sparse, the slow-moving nodes cannot
effectively utilize the encountering
opportunities to realize the self-organized
identity authentications, and do not have the
chance to join the network routing. However,
considering most of the communications in
opportunistic networks are caused by
forwarding operations, there is no need to
establish the complete mutual authentications
for each conversation. Accordingly, a
novel trust management scheme is
presented based on the information
of behavior feedback, in order to complement
the insufficiency of identity authentications.
By utilizing the certificate chains based on
social attributes, the mobile nodes build the
local certificate graphs gradually to realize the
web of “Identity Trust” relationship.
Meanwhile, the successors generate
Verified Feedback Packets for each
positive behavior, and consequently the
“Behavior Trust” relationship is formed for
slow-moving nodes. Simulation result shows
that, by implementing our trust scheme, the
delivery probability and trust reconstruction
ratio can be effectively improved when there
are large numbers of compromised nodes, and
it means that our trust management scheme can
efficiently explore and filter the trust nodes for
secure forwarding in opportunistic networks.
IEEE 2015
TTA-JD-
C1536
Extending Association
Rule Summarization
Techniques to Assess
Risk of Diabetes
Mellitus
Early detection of patients with elevated risk of
developing diabetes mellitus is critical to the
improved prevention and overall clinical
management of these patients. We
aim to apply association rule mining to
electronic medical records (EMR) to discover
sets of risk factors and their corresponding
subpopulations that represent patients at
particularly high risk of developing diabetes.
Given the high dimensionality of
EMRs, association rule mining generates a
very large set of rules which we need to
summarize for easy clinical use. We reviewed
four association rule set summarization
techniques and conducted a comparative
evaluation to provide guidance regarding their
applicability, strengths and weaknesses. We
proposed
extensions to incorporate risk of diabetes into
the process of finding an optimal summary.
We evaluated these modified techniques on a
real-world prediabetic patient cohort. We
found that all four methods produced
summaries that described subpopulations at
high risk of diabetes with each method having
its clear strength. For our purpose, our
extension to the Bottom-Up
Summarization (BUS) algorithm produced the
most suitable summary. The subpopulations
identified by this summary covered most high-
risk patients, had low overlap and were at very
high risk of diabetes.
IEEE 2015
TTA-JD-
C1537
A decision-theoretic
rough set approach for
dynamic data mining
Uncertainty and fuzziness generally exist in
real-life data. Approximations are employed to
describe the uncertain information
approximately in rough set theory. Certain and
uncertain rules are induced directly from
different regions partitioned by
approximations. Approximations can further be
applied to data-mining-related tasks, e.g.,
attribute reduction. Nowadays, different types
of data collected from different applications
evolve with time, especially new attributes
may appear while new objects are added. This
paper presents
an approach for dynamic maintenance of
approximations when objects and attributes are
added simultaneously, under the framework
of decision-theoretic rough set (DTRS).
Equivalence feature vector and matrix are
defined first to update approximations of
DTRS in different levels of granularity. Then,
the information system is decomposed into
subspaces, and the equivalence feature matrix
is updated in different subspaces
incrementally. Finally, the approximations of
DTRS are renewed during the process of
updating the equivalence feature matrix.
Extensive experimental results verify the
effectiveness of the proposed methods.
IEEE 2015
TTA-JD-
C1538
A Joint Segmentation
and Classification
Framework for
Sentence Level
Sentiment
Classification
In this paper, we propose
a joint segmentation and classification
framework for sentence-level sentiment classification.
It is widely recognized that phrasal
information is crucial for sentiment
classification. However,
existing sentiment classification algorithms
typically split a sentence as a word sequence,
which does not effectively handle the
inconsistent sentiment polarity between a
phrase and the words it contains, such as {“not
bad,” “bad”} and {“a great deal of,” “great”}.
We address this issue by developing
a joint framework for sentence-
level sentiment classification. It
simultaneously generates
useful segmentations and predicts sentence-
level polarity based on
the segmentation results. Specifically, we
develop a candidate generation model to
produce segmentation candidates of a
sentence; a segmentation ranking model to
score the usefulness of
a segmentation candidate for
sentiment classification; and
a classification model for predicting
the sentiment polarity of a segmentation. We
train the joint framework directly
from sentences annotated with only sentiment
polarity, without using any syntactic
or sentiment annotations in segmentation level.
We conduct experiments
for sentiment classification on two benchmark
datasets: a tweet dataset and a review dataset.
Experimental results show that: 1) our method
performs comparably with state-of-the-art
methods on both datasets;
2) jointly modeling segmentation and
classification outperforms pipelined baseline methods in
various experimental settings.
IEEE 2015
TTA-JD-
C1539
A Similarity-Based
Learning Algorithm
Using Distance
Transformation
Numerous theories and algorithms have been
developed to solve vectorial
data learning problems by searching for the
hypothesis that best fits the observed training
sample. However, many real-world
applications involve samples that are not
described as feature vectors, but as
(dis)similarity data. Converting vectorial data
into (dis)similarity data is more easily
performed than converting (dis)similarity data
into vectorial data. This study proposes a
stochastic
iterative distance transformation model for
similarity-based learning. The proposed model
can be used to identify a clear class boundary
in data by modifying the (dis)similarities
between examples. The experimental results
indicate that the performance of the proposed
method is comparable with those of various
vector-based and proximity-
based learning algorithms.
IEEE 2015
TTA-JD-
C1540
Active Learning from
Relative Comparisons
This work focuses
on active learning from relative comparison
information. A relative comparison specifies, for
a data triplet (xi, xj, xk), that instance xi is
more similar to xj than to xk. Such constraints,
when available, have been shown to be useful
toward learning tasks such as defining
appropriate distance metrics or finding good
clustering solutions. In real-world applications,
acquiring constraints often involves
considerable human effort, as it requires the
user to manually inspect the instances. This
motivates us to study how to select and query
the most useful relative comparisons to
achieve effective learning with minimum user
effort. Given an underlying class concept that
is employed by the user to provide such
constraints, we present an information-
theoretic criterion that selects the triplet whose
answer leads to the highest expected
information gain about the classes of a set of
examples. Directly applying the proposed
criterion requires examining O(n^3) triplets
with n instances, which is prohibitive even for
datasets of moderate size. We show that a
randomized selection strategy can be used to
reduce the selection pool from O(n^3) to O(n)
with minimal loss in efficiency, allowing us to
scale up to considerably larger problems.
Experiments show that the proposed method
consistently outperforms baseline policies.
IEEE 2015
TTA-JD-
C1541
Adaptive Processing for
Distributed Skyline
Queries over Uncertain
Data
Query processing over uncertain data has
gained growing attention, because it is
necessary to deal with uncertain data in many
real-life applications. In this paper, we
investigate skyline queries over uncertain data
in distributed environments (DSUD query)
whose research is only in an early stage. The
state-of-the-art algorithm, called e-DSUD
algorithm, is designed
for processing this query. It has the desirable
characteristics of progressiveness and
minimum bandwidth consumption. However,
it still needs to be perfected in three aspects.
(1) Progressiveness. Each time it only returns
one query result at most. (2) Efficiency. There
is a significant amount of redundant I/O cost,
and numerous iterations cause a long
total query time. (3) Universality. It is
restricted to the case where local skyline tuples
are incomparable. To address these
concerns, we first present a detailed analysis of
the e-DSUD algorithm and then develop an
improved framework for the DSUD query,
namely IDSUD. Based on the new framework,
we propose an adaptive algorithm, called
ADSUD, for the DSUD query. In the
algorithm, we redefine the approximate
global skyline probability and adaptively choose local
representative tuples based on the minimum
probabilistic bounding rectangle.
Furthermore, we design a progressive pruning
method and apply the reuse mechanism to
improve its efficiency. The results of extensive
experiments verify the better overall
performance of our algorithm than the e-
DSUD algorithm.
IEEE 2015
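For orientation, the sketch below computes a plain skyline on certain, centralized data: a point survives if no other point is at least as good in every dimension and strictly better in one. The DSUD setting layers per-tuple probabilities and distribution across sites on top of this, which the sketch deliberately leaves out.

    def dominates(a, b):
        # a dominates b: no worse everywhere, strictly better somewhere.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    def skyline(points):
        return [p for p in points
                if not any(dominates(q, p) for q in points if q != p)]

    hotels = [(50, 2.0), (80, 0.5), (60, 1.0), (90, 2.5)]  # (price, distance)
    print(skyline(hotels))   # (90, 2.5) is dominated and drops out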
TTA-JD-
C1542
Adding Geospatial Data
Provenance into SDI—
A Service-Oriented
Approach
Geospatial data provenance records the
derivation history of a geospatial data product.
It is important in evaluating the quality
of data products. In
a Geospatial Web Service environment
where data are often disseminated and
processed widely and frequently in an
unpredictable way, it is even more important in
identifying original data sources, tracing
workflows, updating or reproducing scientific
results, and evaluating reliability and quality
of geospatial data products. Geospatial data
provenance has become a fundamental issue in
establishing the spatial data infrastructure
(SDI). This paper investigates how to
support provenance awareness in SDI. It
addresses key issues
including provenance modeling, capturing, and
sharing in a SDI enabled by
interoperable geospatial services. A reference
architecture for provenance tracking is
proposed, which can
accommodate geospatial feature provenance at
different levels of granularity. Open standards
from ISO, World Wide Web Consortium
(W3C), and OGC are leveraged to facilitate the
interoperability. At the feature type level, this
paper proposes extensions of W3C PROV-
XML for ISO 19115 lineage and “Parent
Level” provenance registration in
the geospatial catalog service. At the feature
instance level, light-weight lineage information
entities for feature provenance are proposed
and managed by Web Feature Services.
Experiments demonstrate the applicability of
the approach for
creating provenance awareness in an
interoperable geospatial service-
oriented environment.
IEEE 2015
TTA-JD-
C1543
Answering Pattern
Queries Using Views
Answering queries using views has proven
effective for querying relational and
semistructured data. This paper investigates
this issue for graph pattern queries based on
graph simulation. We propose a notion
of pattern containment to characterize
graph pattern matching using graph pattern vie
ws. We show that a pattern query can
be answered using a set of views if and only if
it is contained in the views. Based on this
characterization, we develop efficient
algorithms to answer graph pattern queries. We
also study problems for determining (minimal,
minimum) containment of pattern queries. We
establish their complexity (from cubic-time to
NP-complete) and provide efficient checking
algorithms (approximation when the problem
is intractable). In addition, when
a pattern query is not contained in the views,
we study maximally contained rewriting to
find approximate answers; we show that such a
rewriting can be computed in cubic time, and
present a rewriting algorithm. We
experimentally verify that these methods are
able to efficiently answer pattern queries on
large real-world graphs.
IEEE 2015
TTA-JD-
C1544
CloudKeyBank Privacy
and Owner
Authorization Enforced
Key Management
Framework
Explosive growth in the number of passwords
for Web based applications and
encryption keys for outsourced data storage
well exceeds the management limit of users.
Therefore, outsourcing keys (including
passwords and data encryption keys) to
professional password managers (honest-but-
curious service providers) is attracting the
attention of many users. However, existing
solutions in a traditional data outsourcing
scenario are unable to simultaneously meet the
following three security requirements
for keys outsourcing: (1) Confidentiality
and privacy of keys; (2) Search privacy on
identity attributes tied to keys;
(3) Owner controllable authorization over
his/her shared keys. In this paper, we
propose CloudKeyBank, the first
unified key management framework that
addresses all the three goals above. Under
our framework, the key owner can
perform privacy and controllable
authorization enforced encryption with
minimum information leakage. To
implement CloudKeyBank efficiently, we
propose a new cryptographic primitive named
Searchable Conditional Proxy Re-Encryption
(SC-PRE) which combines the techniques of
Hidden Vector Encryption (HVE) and Proxy
Re-Encryption (PRE) seamlessly, and propose
a concrete SC-PRE scheme based on existing
HVE and PRE schemes. Our experimental
results and security analysis show the
efficiency and security goals are well achieved.
IEEE 2015
TTA-JD-
C1545
Clustering Deviations
for Black Box
Regression Testing of
Database Applications
Regression tests often result in
many deviations (differences between two
system versions), either due to changes
or regression faults. For the tester to analyze
such deviations efficiently, it would be helpful
to accurately group them, such that each group
contains deviations representing one unique
change or regression fault. Because it is
unlikely that a general solution to the above
problem can be found, we focus our work on a
common type of software
system: database applications. We investigate
the use of clustering, based
on database manipulations
and test specifications (from test models), to
group regression test deviations according to
the faults or changes causing them. We also
propose assessment criteria based on the
concept of entropy to compare
alternative clustering strategies. To validate
our approach, we ran a large scale industrial
case study, and our results show that our
clustering approach can indeed serve as an
accurate strategy for
grouping regression test deviations. Among the
four test campaigns
assessed, deviations were clustered perfectly
for two of them, while for the other two,
the clusters were all homogeneous. Our analysis
suggests that this approach can significantly
reduce the effort spent by testers in
analyzing regression test deviations, increase
their level of confidence, and therefore
make regression testing more scalable.
IEEE 2015
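An entropy-style assessment of a deviation clustering can be sketched directly: given ground-truth labels for which fault or change caused each deviation, a perfectly homogeneous cluster has entropy 0 and mixed clusters score higher. The label data below are invented.

    import math
    from collections import Counter

    def cluster_entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(labels).values())

    # Deviations labelled by the true fault/change that caused them (toy).
    clusters = [["faultA", "faultA", "faultA"], ["faultB", "changeC"]]
    for i, c in enumerate(clusters):
        print(i, cluster_entropy(c))   # 0.0 (homogeneous) vs 1.0 (mixed)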
TTA-JD-
C1546
Context-based
Collaborative Filtering
for Citation
Recommendation
Citation recommendation is an interesting and
significant research area as it solves the
information overload in academia by
automatically suggesting relevant references
for a research paper. Recently, with the rapid
proliferation of information technology,
research papers are rapidly published in
various conferences and journals. This
makes citation recommendation a highly
important and challenging discipline. In this
paper, we propose a
novel citation recommendation method that
uses only easily obtained citation relations as
source data. The rationale underlying this
method is that, if two citing papers are
significantly co-occurring with the same citing
paper(s), they should be similar to some
extent. Based on the above rationale, an
association mining technique is employed to
obtain the paper representation of each citing
paper from the citation context. Then, these
paper representations are pairwise compared to
compute similarities between the citing papers
for collaborative filtering. We evaluate our
proposed method through two relevant real-
world data sets. Our experimental results
demonstrate that the proposed method
significantly outperforms the baseline method
in terms of precision, recall, and F1, as well as
mean average precision and mean reciprocal
rank, which are metrics related to the rank
information in the recommendation list.
IEEE 2015
TTA-JD-
C1547
Crowdsourcing for Top-
K Query Processing
over Uncertain Data
Querying uncertain data has become a
prominent application due to the proliferation
of user-generated content from social media
and of data streams from sensors.
When data ambiguity cannot be reduced
algorithmically, crowdsourcing proves a viable
approach, which consists in posting tasks to
humans and harnessing their judgment for
improving the confidence about data values or
relationships. This paper tackles the problem
of processing top-
K queries over uncertain data with the help
of crowdsourcing for quickly converging to the
real ordering of relevant results. Several offline
and online approaches for addressing questions
to a crowd are defined and contrasted on both
synthetic and real data sets, with the aim of
minimizing the crowd interactions necessary to
find the real ordering of the result set.
IEEE 2015
TTA-JD-
C1548
Discovering Latent
Semantics in Web
Documents using Fuzzy
Clustering
Web documents are heterogeneous and
complex. There exist complicated
associations within one web document and
links to other documents. The high interactions
between terms in documents demonstrate
vague and ambiguous meanings. Efficient and
effective clustering methods
to discover latent and coherent meanings in
context are necessary. This paper presents
a fuzzy linguistic topological space along with
a fuzzy clustering algorithm to discover the
contextual meaning in the web documents. The
proposed algorithm extracts features from
the web documents using conditional random
field methods and builds a fuzzy linguistic
topological space based on the associations of
features. The associations of cooccurring
features organize a hierarchy of
connected semantic complexes called
“CONCEPTS,” wherein a fuzzy linguistic
measure is applied on each complex to
evaluate 1) the relevance of
a document belonging to a topic, and 2) the
difference between the other
topics. Web contents are able to
be clustered into topics in the hierarchy
depending on their fuzzy linguistic
measures; web users can further explore the
CONCEPTS of web contents accordingly.
Besides the algorithm applicability in web text
domains, it can be extended to other
applications, such as data mining,
bioinformatics, content-based, or collaborative
information filtering, etc.
IEEE 2015
TTA-JD-
C1549
Discovery of Ranking
Fraud for Mobile Apps
Ranking fraud in the mobile App market refers
to fraudulent or deceptive activities which
have a purpose of bumping up the Apps in the
popularity list. Indeed, it becomes more and
more frequent for App developers to use shady
means, such as inflating their Apps' sales or
posting phony App ratings, to
commit ranking fraud. While the importance of
preventing ranking fraud has been widely
recognized, there is limited understanding and
research in this area. To this end, in this paper,
we provide a holistic view
of ranking fraud and propose
a ranking fraud detection system
for mobile Apps. Specifically, we first propose
to accurately locate the ranking fraud by
mining the active periods, namely leading
sessions, of mobile Apps. Such leading
sessions can be leveraged for detecting the
local anomaly instead of the global anomaly of
App rankings. Furthermore, we investigate
three types of evidences, i.e., ranking based
evidences, rating based evidences and review
based evidences, by modeling Apps' ranking,
rating and review behaviors through statistical
hypotheses tests. In addition, we propose an
optimization based aggregation method to
integrate all the evidences for fraud detection.
Finally, we evaluate the proposed system with
real-world App data collected from the iOS
App Store for a long time period. In the
experiments, we validate the effectiveness of
the proposed system, and show the scalability
of the detection algorithm as well as some
regularity of ranking fraud activities.
IEEE 2015
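Mining "leading sessions" from an App's chart history reduces to scanning a rank time series for maximal top-K periods, merging them across short gaps. The thresholds and the gap rule below are illustrative assumptions, not the paper's exact definition.

    def leading_sessions(ranks, top_k=10, max_gap=2):
        # ranks: daily chart positions, None = unranked. Returns (start, end) spans.
        sessions, start, gap = [], None, 0
        for day, r in enumerate(ranks):
            if r is not None and r <= top_k:
                if start is None:
                    start = day
                gap = 0
            elif start is not None:
                gap += 1
                if gap > max_gap:                 # gap too long: close session
                    sessions.append((start, day - gap))
                    start, gap = None, 0
        if start is not None:
            sessions.append((start, len(ranks) - 1 - gap))
        return sessions

    ranks = [3, 5, None, 4, 50, 50, 50, 2, 1]
    print(leading_sessions(ranks))   # [(0, 3), (7, 8)]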
TTA-JD-
C1550
k-Nearest Neighbor
Classification over
Semantically Secure
Encrypted Relational
Data
Data Mining has wide applications in many
areas such as banking, medicine, scientific
research and among government
agencies. Classification is one of the
commonly used tasks in data mining
applications. For the past decade, due to the
rise of various privacy issues, many theoretical
and practical solutions to
the classification problem have been proposed
under different security models. However, with
the recent popularity of cloud computing, users
now have the opportunity to outsource
their data, in encrypted form, as well as
the data mining tasks to the cloud. Since
the data on the cloud is in encrypted form,
existing privacy-
preserving classification techniques are not
applicable. In this paper, we focus on solving
the classification problem over encrypted data.
In particular, we propose a secure k-NN
classifier over encrypted data in the cloud. The
proposed protocol protects the confidentiality
of data, privacy of user's input query, and hides
the data access patterns. To the best of our
knowledge, our work is the first to develop
a secure k-NN classifier
over encrypted data under the semi-honest
model. Also, we empirically analyze the
efficiency of our proposed protocol using a
real-world dataset under different parameter
settings.
IEEE 2015
TTA-JD-
C1551
Location Aware
Keyword Query
Suggestion Based on
Document Proximity
Keyword suggestion in web search helps users
to access relevant information without having
to know how to precisely express their queries.
Existing keyword suggestion techniques do not
consider the locations of the users and
the query results; i.e., the spatial proximity of a
user to the retrieved results is not taken as a
factor in the recommendation. However, the
relevance of search results in many
applications (e.g., location-based services) is
known to be correlated with their
spatial proximity to the query issuer. In this
paper, we design a location-
aware keyword query suggestion framework.
We propose a weighted keyword-
document graph, which captures both the
semantic relevance between
keyword queries and the spatial distance
between the resulting documents and the
user location. The graph is browsed in a
random-walk-with-restart fashion, to select
the keyword queries with the highest scores
as suggestions. To make our framework
scalable, we propose a partition-
based approach that outperforms the baseline
algorithm by up to an order of magnitude. The
appropriateness of our framework and the
performance of the algorithms are evaluated
using real data.
IEEE 2015
TTA-JD-
C1552
Mining Temporal
Patterns in Time
Interval-based Data
Sequential pattern mining is an important
subfield in data mining. Recently, applications
using time interval-based event data have
attracted considerable efforts in
discovering patterns from events that persist
for some duration. Since the relationship
between two intervals is intrinsically complex,
how to effectively and
efficiently mine interval-based sequences is a
challenging issue. In this paper, two novel
representations, endpoint representation and
end time representation, are proposed to
simplify the processing of complex
relationships among event intervals. Based on
the proposed representations, three types of
interval-based patterns are defined: temporal
patterns, occurrence-probabilistic temporal
patterns, and duration-probabilistic temporal
patterns. In addition, we develop two novel
algorithms, Temporal Pattern Miner (TPMiner)
and Probabilistic Temporal Pattern Miner (P-
TPMiner), to discover three types of interval-
based sequential patterns. We also propose
three pruning techniques to further reduce the
search space of the mining process.
Experimental studies show that both
algorithms are able to find three types
of patterns efficiently. Furthermore, we apply
the proposed algorithms to real datasets to
demonstrate the effectiveness and validate the
practicability of the proposed patterns.
IEEE 2015
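As a rough illustration of the endpoint idea (not the paper's TPMiner algorithm), the sketch below flattens labeled intervals into an ordered sequence of start/end events, so complex pairwise interval relations become plain sequential data:

```python
def to_endpoint_sequence(intervals):
    """Flatten (label, start, end) intervals into an ordered endpoint sequence.

    Each interval contributes a '+' (start) and '-' (end) event, so Allen-style
    pairwise relations are captured implicitly by the endpoint order.
    """
    events = []
    for label, start, end in intervals:
        events.append((start, label + "+"))
        events.append((end, label + "-"))
    events.sort()                        # order events by time
    return [symbol for _, symbol in events]

# Toy event intervals: A overlaps B, C occurs during B.
seq = to_endpoint_sequence([("A", 1, 5), ("B", 3, 9), ("C", 4, 6)])
print(seq)  # ['A+', 'B+', 'C+', 'A-', 'C-', 'B-']
```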
TTA-JD-
C1553
Multi-Objective Service
Composition in
Uncertain
Environments
Web services have the potential to offer
enterprises the ability to compose internal
and external business services in order to
accomplish complex
processes. Service composition then becomes
an increasingly challenging issue when
complex and critical applications are built
upon services with different QoS criteria.
However, most of the existing QoS-
aware service composition techniques are
simply based on the assumption that multiple
QoS criteria, no matter whether these multiple
criteria are conflicting or not, can be combined
into a single criterion to be optimized,
according to some utility functions. In practice,
this can be very difficult as these utility
functions or weights are not well-known a
priori. In addition, the existing approaches are
designed to work in certain environments,
where the QoS parameters are well-known in
advance. These approaches prove fruitless
when facing uncertain and dynamic
environments, e.g., cloud environments,
where no prior knowledge
of the QoS parameters is available. In this
paper, two novel multi-objective approaches
are proposed to handle QoS-aware
Web service composition with conflicting
objectives and various restrictions on the
quality metrics. The proposed approaches use
reinforcement learning in order to deal with the
uncertainty characteristics inherent in open and
dynamic environments. Experimental results
reveal the ability of the proposed approaches to
find a set of Pareto-optimal solutions of
equivalent quality in satisfying multiple
QoS objectives under different user preferences.
IEEE 2015
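For intuition, the Pareto-optimal set the approaches search for can be characterized by a simple dominance check; this sketch is illustrative only, with invented QoS vectors where larger values are better:

```python
def dominates(a, b):
    """True if QoS vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the service compositions no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Toy compositions scored on (availability, throughput, -latency).
plans = [(0.99, 120, -35), (0.95, 200, -25), (0.90, 100, -40), (0.98, 150, -20)]
print(pareto_front(plans))   # only (0.90, 100, -40) is dominated and dropped
```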
TTA-JD-
C1554
Pattern-based Topics
for Document Modelling
in Information Filtering
Many mature term-based or pattern-
based approaches have been used in the field
of information filtering to generate
users' information needs from a collection
of documents. A fundamental assumption for
these approaches is that the documents in the
collection are all about one topic. However, in
reality users' interests can be diverse and
the documents in the collection often involve
multiple topics. Topic modelling, such as
Latent Dirichlet Allocation (LDA), was
proposed to generate statistical models to
represent multiple topics in a collection
of documents, and this has been widely
utilized in the fields of machine learning
and information retrieval, etc. But its
effectiveness in information filtering has not
been so well explored. Patterns are always
thought to be more discriminative than single
terms for describing documents. However, the
enormous number of discovered patterns
hinders their effective and efficient use in
real applications; therefore, selecting the
most discriminative and representative
patterns from the huge number of discovered
patterns becomes crucial. To
deal with the above mentioned limitations and
problems, in this paper, a
novel information filtering model, Maximum
matched Pattern-
based Topic Model (MPBTM), is proposed.
The main distinctive features of the
proposed model include: (1)
user information needs are generated in terms
of multiple topics; (2) each topic is represented
by patterns; (3) patterns are generated
from topic models and are organized in terms
of their statistical and taxonomic features; and
(4) the most discriminative and representative
patterns, called Maximum Matched Patterns,
are proposed to estimate
the document relevance to the
user's information needs in order to filter out
irrelevant documents. Extensive experiments
are conducted to evaluate the effectiveness of
the proposed model by using the TREC data
collection Reuters Corpus Volume 1. The
results show that the
proposed model significantly outperforms both
state-of-the-art term-based models and pattern-
based models.
IEEE 2015
TTA-JD-
C1555
Polarity Consistency
Checking for Domain
Independent Sentiment
Dictionaries
Polarity classification of words is
important for applications such as
Opinion Mining and Sentiment
Analysis. A number
of sentiment word/sense dictionaries have
been manually or (semi)automatically
constructed. We notice that
these sentiment dictionaries have
numerous inaccuracies. Besides obvious
instances, where the same word appears
with different polarities in
different dictionaries, the
dictionaries exhibit complex cases
of polarity inconsistency, which cannot
be detected by mere manual inspection.
We introduce the concept
of polarity consistency of words/senses
in sentiment dictionaries in this paper.
We show that the consistency problem
is NP-complete. We reduce the polarity
consistency problem to the satisfiability
problem and utilize two fast SAT
solvers to detect inconsistencies in
a sentiment dictionary. We perform
experiments on
five sentiment dictionaries and
WordNet to show inter- and intra-
dictionary inconsistencies.
IEEE 2015
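To make the consistency notion concrete, here is a toy brute-force checker; the paper reduces the problem to SAT and runs real SAT solvers, whereas this sketch merely enumerates polarity assignments over a few invented words and synonym/antonym constraints:

```python
from itertools import product

def consistent(words, constraints):
    """Search for a positive/negative polarity assignment satisfying all constraints.

    constraints: (w1, w2, 'same') forces equal polarity (e.g., synonyms),
                 (w1, w2, 'diff') forces opposite polarity (e.g., antonyms).
    Returns one satisfying assignment, or None if the dictionary is inconsistent.
    """
    for bits in product([True, False], repeat=len(words)):
        pol = dict(zip(words, bits))
        if all((pol[a] == pol[b]) == (rel == "same") for a, b, rel in constraints):
            return pol
    return None

words = ["good", "fine", "bad"]
constraints = [("good", "fine", "same"), ("good", "bad", "diff"),
               ("fine", "bad", "same")]          # contradicts the first two
print(consistent(words, constraints))            # None: inconsistency detected
```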
TTA-JD-
C1556
Predicting User-Topic
Opinions in Twitter with
Social and Topical
Context
With popular microblogging services
like Twitter, users are able to share their
real-time feelings online in a more convenient way.
The user generated data in Twitter is thus
regarded as a resource providing individuals'
spontaneous emotional information, and has
attracted much attention from researchers. Prior
work has measured the emotional expressions
in users' tweets and then performed various
analysis and learning. However, how to utilize
the knowledge learned from observed
tweets, together with context information,
to predict users' opinions toward specific
topics they have not yet commented on is a
novel problem presenting both challenges and
opportunities. In this paper, we mainly focus
on solving this problem with
a Social context and Topical context
incorporated Matrix Factorization (ScTcMF) framework.
The experimental results on a real-
world Twitter data set show that this
framework outperforms the state-of-the-art
collaborative filtering methods, and
demonstrate that
both social context and topical context are
effective in improving the user-
topic opinion prediction performance.
IEEE 2015
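As a rough sketch of the general idea (not the ScTcMF model itself), matrix factorization with a social-context regularizer can be written in a few lines; the data, dimensions, and hyperparameters below are invented:

```python
import numpy as np

def social_mf(R, S, k=2, lam=0.1, beta=0.1, lr=0.01, epochs=200):
    """Factorize user-topic opinions R ~= U @ V.T, pulling each user's factors
    toward the average of their social neighbors (rows of S, row-normalized)."""
    rng = np.random.default_rng(0)
    n_users, n_topics = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_topics, k))
    mask = ~np.isnan(R)                     # only observed opinions contribute
    R0 = np.nan_to_num(R)
    for _ in range(epochs):
        E = mask * (R0 - U @ V.T)           # prediction error on observed entries
        U += lr * (E @ V - lam * U - beta * (U - S @ U))   # social pull term
        V += lr * (E.T @ U - lam * V)
    return U @ V.T                          # predicted opinions, incl. unobserved

R = np.array([[1.0, np.nan], [np.nan, -1.0], [1.0, 1.0]])    # +1/-1 opinions
S = np.array([[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])  # social weights
print(np.round(social_mf(R, S), 2))
```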
TTA-JD-
C1557
RankRC Large-scale
Nonlinear Rare Class
Ranking
Rare class problems are common in real-world
applications across a wide range of domains.
Standard classification algorithms are known
to perform poorly in these cases, since they
focus on overall classification accuracy. In
addition, we have seen a significant increase of
data in recent years, resulting in
many large scale rare class problems. In this
paper, we focus on nonlinear kernel-based
classification methods expressed as a
regularized loss minimization problem. We
address the challenges associated with
both rare class problems
and large-scale learning, by 1) optimizing the
area under the receiver operating characteristic
curve in the training process, instead of
classification accuracy, and 2) using
a rare class kernel representation to achieve an
efficient time and space algorithm. We call the
algorithm RankRC. We provide justifications
for the rare class representation and
experimentally illustrate the effectiveness
of RankRC in test performance, computational
complexity, and model robustness.
IEEE 2015
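For a feel of what optimizing AUC instead of accuracy means, this sketch scores a linear model with a pairwise hinge surrogate over (rare-positive, negative) pairs; it is an illustrative toy, not RankRC's kernel formulation:

```python
import numpy as np

def auc_hinge_loss(w, X_pos, X_neg):
    """Pairwise hinge surrogate for AUC: penalize every (positive, negative)
    pair the model fails to rank with margin 1."""
    s_pos = X_pos @ w                      # scores of the rare-class examples
    s_neg = X_neg @ w
    margins = s_pos[:, None] - s_neg[None, :]
    return np.maximum(0.0, 1.0 - margins).mean()

rng = np.random.default_rng(1)
X_neg = rng.normal(0.0, 1.0, size=(200, 5))   # majority class
X_pos = rng.normal(1.0, 1.0, size=(10, 5))    # rare class
w = np.ones(5)
print(auc_hinge_loss(w, X_pos, X_neg))        # loss to minimize during training
```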
TTA-JD-
C1558
Reverse Keyword
Search for Spatio-
Textual Top-kQueries
in Location-Based
Services
Spatio-textual queries retrieve the most similar
objects with respect to a given location and
a keyword set. Existing studies mainly focus on
how to efficiently find the top-k result set
given a spatio-textual query. Nevertheless, in
many application scenarios, users cannot
precisely formulate their keywords and instead
prefer to choose them from some
candidate keyword sets. Moreover, in
information browsing applications, it is useful
to highlight the objects with the tags
(keywords) under which the objects have high
rankings. Driven by these applications, we
propose a novel query paradigm, namely
reverse keyword search for spatio-textual top-k
queries (RSTQ). It returns the keywords under
which a target object will be a spatio-
textual top-k result. To efficiently process the
new query, we devise a novel hybrid index
KcR-tree to store and summarize the spatial
and textual information of objects. By
accessing the high-level nodes of KcR-tree, we
can estimate the rankings of the target object
without accessing the actual objects. To further
improve the performance, we propose three
query optimization techniques, i.e., KcR*-tree,
lazy upper-bound updating, and keyword set
filtering. We also extend RSTQ to allow the
input location to be a spatial region instead of a
point. Extensive experimental evaluation
demonstrates the efficiency of our proposed
query techniques in terms of both the
computational cost and I/O cost.
IEEE 2015
TTA-JD-
C1559
Scalable Distributed
Processing of k-
Nearest Neighbor
Queries over Moving
Objects
Central to many applications
involving moving objects is the task
of processing k-nearest neighbor (k-
NN) queries. Most of the existing approaches
to this problem are designed for the centralized
setting where query processing takes place on
a single server; it is difficult, if not impossible,
for them to scale to a distributed setting to
handle the vast volume of data and
concurrent queries that are increasingly
common in those applications. To address this
problem, we propose a suite of solutions that
can
support scalable distributed processing of k-
NN queries. We first present a new index
structure called Dynamic Strip Index (DSI),
which can better adapt to different data
distributions than existing grid indexes.
Moreover, it can be naturally distributed across
the cluster, therefore lending itself well
to distributed processing. We further propose
a distributed k-NN search (DKNN) algorithm
based on DSI. DKNN avoids having an
uncertain number of potentially expensive
iterations, and is thus more efficient and more
predictable than existing approaches. DSI and
DKNN are implemented on Apache S4, an
open-source platform
for distributed stream processing. We perform
extensive experiments to study the
characteristics of DSI and DKNN, and
compare them with three baseline methods.
Experimental results show that our proposal
scales well and significantly outperforms the
alternative methods.
IEEE 2015
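A minimal sketch of the strip-index idea, with 1-D strips whose boundaries follow the data distribution; the real DSI and the DKNN termination rule are more involved, and all names here are illustrative:

```python
import bisect

class StripIndex:
    """Toy dynamic strip index: partition points into x-axis strips whose
    boundaries are data-driven (quantiles), so skewed data stays balanced."""
    def __init__(self, points, n_strips=4):
        xs = sorted(p[0] for p in points)
        step = max(1, len(xs) // n_strips)
        self.bounds = xs[step::step][: n_strips - 1]   # quantile boundaries
        self.strips = [[] for _ in range(len(self.bounds) + 1)]
        for p in points:
            self.strips[self._strip_of(p[0])].append(p)

    def _strip_of(self, x):
        return bisect.bisect_right(self.bounds, x)

    def knn(self, q, k):
        """Search the query's strip, then widen outward; a full DKNN also
        checks that no unvisited strip can hold a closer point."""
        i = self._strip_of(q[0])
        cand, left, right = list(self.strips[i]), i - 1, i + 1
        while len(cand) < k and (left >= 0 or right < len(self.strips)):
            if left >= 0:
                cand += self.strips[left]; left -= 1
            if right < len(self.strips):
                cand += self.strips[right]; right += 1
        dist = lambda p: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return sorted(cand, key=dist)[:k]

pts = [(1, 1), (2, 5), (3, 2), (8, 3), (9, 9), (15, 4), (16, 1), (20, 7)]
print(StripIndex(pts).knn((10, 5), k=3))   # [(8, 3), (9, 9), (15, 4)]
```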
TTA-JD-
C1560
Sentiment analysis
from opinion mining to
human-agent
interaction
The opinion mining and human-
agent interaction communities are currently
addressing sentiment analysis from different
perspectives that comprise, on the one hand,
disparate sentiment-related phenomena and
computational representations, and on the
other hand, different detection and dialog
management methods. In this paper we
identify and discuss the growing opportunities
for cross-disciplinary work that may amplify
the advances of each
field. Sentiment/opinion detection
methods used in human-agent interaction are
indeed rare and, when they are employed, they
are not different from the ones used
in opinion mining and consequently not
designed for socio-
affective interactions (timing constraint of
the interaction, sentiment analysis as an input
and an output
of interaction strategies). To support our
claims, we present a comparative state of the
art which analyzes the sentiment-related
phenomena and the sentiment detection
methods used in both communities and makes
an overview of the goals of socio-
affective human-agent strategies. We propose
then different possibilities for mutual benefit,
specifying several research tracks and
discussing the open questions and
prospects. To show the feasibility of the
general guidelines proposed we also approach
them from a specific perspective by applying
them to the case of the Greta embodied
conversational agents platform and discuss the
way they can be used to make sentiment
analysis more meaningful for human-
agent interactions in two different use cases:
job interviews and dialogs with museum
visitors.
IEEE 2015
TTA-JD-
C1561
Similarity Measure
Selection for Clustering
Time Series Databases
In the past few years, clustering has become a
popular task associated with time series. The
choice of a suitable distance measure is crucial
to the clustering process and, given the vast
number of distance
measures for time series available in the
literature and their diverse characteristics,
this selection is not straightforward. With the
objective of simplifying this task, we propose a
multi-label classification framework that
provides the means to automatically select the
most suitable distance measures for
clustering a time series database. This
classifier is based on a novel collection of
characteristics that describe the main features
of the time series databases and provide the
predictive information necessary to
discriminate between a set of
distance measures. In order to test the validity
of this classifier, we conduct a complete set of
experiments using both synthetic and
real time series databases and a set of 5
common distance measures. The positive
results obtained by the designed classification
framework for various
performance measures indicate that the
proposed methodology is useful to simplify the
process of distance measure selection in
time series clustering tasks.
IEEE 2015
TTA-JD-
C1562
Splitting Large Medical
Data Sets based on
Normal Distribution in
Cloud Environment
The surge of medical and e-commerce
applications has generated a tremendous
amount of data, which brings people to a so-called
“Big Data” era. Different from
traditional large data sets, the term “Big Data”
not only means the large size of data volume
but also indicates the high velocity
of data generation. However,
current data mining and analytical techniques
are facing the challenge of dealing with large
volume data in a short period of time. This
paper explores the efficiency of utilizing
the Normal Distribution (ND) method
for splitting and
processing large volume medical data in cloud
environment, which can provide representative
information in the split data sets. The ND-
based new model consists of two stages. The
first stage adopts the ND method
for large data sets splitting and processing,
which can reduce the volume of data sets. The
second stage implements the ND-based model
in a cloud computing infrastructure for
allocating the split data sets. The experimental
results show substantial efficiency gains of the
proposed method over the conventional
methods without splitting data into small
partitions. The ND-based method can generate
representative data sets, which can offer an
efficient solution for large data processing.
The split data sets can be processed in parallel
in a cloud computing environment.
IEEE 2015
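One plausible reading of the ND-based splitting, sketched as a toy: stratify records by z-score bands so every partition preserves the overall normal shape; the band edges and sample sizes are invented:

```python
import numpy as np

def nd_split(values, n_parts=4):
    """Split records into parts that each preserve the overall normal shape:
    assign round-robin within z-score bands so every part samples all bands."""
    mu, sigma = values.mean(), values.std()
    z = (values - mu) / sigma
    bands = np.digitize(z, [-1.0, 0.0, 1.0])      # four z-score bands
    parts = [[] for _ in range(n_parts)]
    for band in range(4):
        idx = np.flatnonzero(bands == band)
        for i, record in enumerate(idx):          # round-robin inside the band
            parts[i % n_parts].append(record)
    return parts

rng = np.random.default_rng(0)
data = rng.normal(120, 15, size=1000)             # e.g., blood-pressure readings
for p in nd_split(data):
    print(len(p), round(data[p].mean(), 1))       # similar means across parts
```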
TTA-JD-
C1563
Steganography Using
Reversible Texture
Synthesis
We propose a novel approach
for steganography using reversible texture
synthesis. A texture synthesis process resamples
a smaller texture image, which synthesizes a
new texture image with a similar local
appearance and an arbitrary size. We weave
the texture synthesis process into
steganography to conceal secret messages. In
contrast to using an existing cover image to
hide messages, our algorithm conceals the
source texture image and embeds secret
messages through the process
of texture synthesis. This allows us to extract
the secret messages and source texture from a
stego synthetic texture. Our approach offers
three distinct advantages. First, our scheme
offers the embedding capacity that is
proportional to the size of the
stego texture image. Second, a steganalytic
algorithm is not likely to defeat our
steganographic approach. Third,
the reversible capability inherited from our
scheme provides functionality, which allows
recovery of the source texture. Experimental
results have verified that our proposed
algorithm can provide various embedding
capacities, produce visually plausible
texture images, and recover the source
texture.
IEEE 2015
TTA-JD-
C1564
TASC - Topic-Adaptive
Sentiment
Classification on
Dynamic Tweets Topic
Model for Graph Mining
Sentiment classification is a topic-sensitive
task, i.e., a classifier trained from
one topic will perform worse on another. This
is especially a problem for
the tweets sentiment analysis. Since
the topics in Twitter are very diverse, it is
impossible to train a universal classifier for
all topics. Moreover, compared to product
review, Twitter lacks data labeling and a rating
mechanism to acquire sentiment labels. The
extremely sparse text of tweets also brings
down the performance of
a sentiment classifier. In this paper, we
propose a semi-supervised topic-
adaptive sentiment classification (TASC)
model, which starts with a classifier built on
common features and mixed labeled data from
various topics. It minimizes the hinge loss to
adapt to unlabeled data and features
including topic-related sentiment words,
authors' sentiments and sentiment connections
derived from “@” mentions of tweets, named
as topic-adaptive features. Text and non-text
features are extracted and naturally split into
two views for co-training. The TASC learning
algorithm updates topic-adaptive features
based on the collaborative selection of
unlabeled data, which in turn helps to select
more reliable tweets to boost the performance.
We also design the adapting model along a
timeline (TASC-t) for dynamic tweets. An
experiment on 6 topics from
published tweet corpuses demonstrates
that TASC outperforms other well-known
supervised and ensemble classifiers. It also
beats those semi-supervised learning methods
without feature adaption. Meanwhile, TASC-t
can also achieve impressive accuracy and F-
score. Finally, with a timeline visualization of
the “river” graph, people can intuitively grasp
the ups and downs of sentiment evolution and
its intensity by color gradation.
IEEE 2015
TTA-JD-
C1565
Towards Effective Bug
Triage with Software
Data Reduction
Techniques
Software companies spend over 45 percent of
their cost on dealing with software bugs. An
inevitable step of fixing bugs is bug triage,
which aims to correctly assign a developer to a
new bug. To decrease the time cost in manual
work, text classification techniques are applied
to conduct automatic bug triage. In this paper,
we address the problem
of data reduction for bug triage, i.e., how to
reduce the scale and improve the quality
of bug data. We combine instance selection
with feature selection to simultaneously
reduce data scale on the bug dimension and the
word dimension. To determine the order of
applying instance selection and feature
selection, we extract attributes from
historical bug data sets and build a predictive
model for a new bug data set. We empirically
investigate the performance of data reduction
on a total of 600,000 bug reports of two large
open source projects, namely Eclipse and
Mozilla. The results show that
our data reduction can effectively reduce
the data scale and improve the accuracy of
bug triage. Our work provides an approach to
leveraging techniques on data processing to
form reduced and high-
quality bug data in software development and
maintenance.
IEEE 2015
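A hedged sketch of combining instance selection with feature selection on bug-report text, using scikit-learn's chi-squared feature selection plus a simple near-duplicate instance filter; the ordering the paper predicts per data set is hard-coded here, and all data are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
import numpy as np

# Toy bug reports and their assigned developers (labels).
reports = ["crash when opening editor", "editor crash on startup",
           "memory leak in renderer", "renderer leaks memory slowly",
           "ui freezes on resize", "window freezes while resizing"]
devs = ["alice", "alice", "bob", "bob", "carol", "carol"]

X = CountVectorizer().fit_transform(reports)          # word dimension

# Feature selection: keep the k word features most correlated with developers.
X_fs = SelectKBest(chi2, k=5).fit_transform(X, devs)

# Instance selection (toy): drop near-duplicate reports on the bug dimension,
# keeping one representative per group of very similar rows.
dense = X_fs.toarray().astype(float)
keep = []
for i, row in enumerate(dense):
    dup = any(np.dot(row, dense[j]) /
              (np.linalg.norm(row) * np.linalg.norm(dense[j]) + 1e-9) > 0.9
              for j in keep)
    if not dup:
        keep.append(i)

print("kept instances:", keep, "reduced shape:", dense[keep].shape)
```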
DOMAIN : CLOUD COMPUTING
TTA-DC-
C1501
An Access Control
Model for Online Social
Networks Using User-
to-User Relationships
Users and resources
in online social networks (OSNs) are
interconnected via various types of
relationships. In particular, user-to-
user relationships form the basis of the OSN
structure, and play a significant role in
specifying and enforcing access control.
Individual users and the OSN provider should
be enabled to specify which access can be
granted in terms of existing relationships. In
this paper, we propose a novel user-to-
user relationship-
based access control (UURAC) model for
OSN systems that utilizes regular expression
notation for such policy
specification. Access control policies on users
and resources are composed in terms of
requested action, multiple relationship types,
the starting point of the evaluation, and the
number of hops on the path. We present two
path checking algorithms to determine whether
the required relationship path between users
for a given access request exists. We validate
the feasibility of our approach by
implementing a prototype system and
evaluating the performance of these two
algorithms.
IEEE 2015
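To make the policy idea concrete, a toy sketch: relationship types along a path are concatenated into a string and matched against a policy regular expression under a hop limit; the graph, the type codes (f for friend, c for colleague), and the policies are invented:

```python
import re
from collections import deque

def path_exists(graph, start, target, policy, max_hops):
    """BFS over typed edges; grant access if some path's type string
    (e.g., 'ff' = friend-of-friend) matches the policy regex within max_hops."""
    pattern = re.compile(policy)
    queue = deque([(start, "")])
    while queue:
        node, types = queue.popleft()
        if node == target and pattern.fullmatch(types):
            return True
        if len(types) < max_hops:
            for nxt, rel in graph.get(node, []):
                queue.append((nxt, types + rel))
    return False

# u1 -f- u2 -f- u3, u1 -c- u4; policy: reachable via friends only, <= 2 hops.
graph = {"u1": [("u2", "f"), ("u4", "c")],
         "u2": [("u3", "f")],
         "u4": [("u3", "c")]}
print(path_exists(graph, "u1", "u3", policy=r"f{1,2}", max_hops=2))  # True
print(path_exists(graph, "u1", "u4", policy=r"f+", max_hops=2))      # False
```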
TTA-DC-
C1502
Cost-Effective
Authentic and
Anonymous Data
Sharing with Forward
Security
Data sharing has never been easier with the
advances of cloud computing, and an
accurate analysis of
the shared data provides an array of
benefits to both the society and
individuals. Data sharing with a large
number of participants must take into
account several issues, including
efficiency, data integrity and privacy
of data owner. Ring signature is a promising
candidate to construct an anonymous and
authentic data sharing system. It allows
a data owner to anonymously authenticate
his data which can be put into the cloud for
storage or analysis purpose. Yet the costly
certificate verification in the traditional public
key infrastructure (PKI) setting becomes a
bottleneck for this solution to be scalable.
Identity-based (ID-based) ring signature,
which eliminates the process of certificate
verification, can be used instead. In this
paper, we further enhance the security of ID-
based ring signature by providing
forward security: If a secret key of any user
has been compromised, all previously
generated signatures that include this user
still remain valid. This property is especially
important to any large scale data
sharing system, as it is impossible to ask
all data owners to reauthenticate
their data even if a secret key of one single
user has been compromised. We provide a
concrete and efficient instantiation of our
scheme, prove its security and provide an
implementation to show its practicality.
IEEE 2015
TTA-DC-
C1503
SeDaSC - Shared Data
Authority Scheme
Cloud storage is an application of clouds that
liberates organizations from establishing in-
house data storage systems. However, cloud
storage gives rise to security concerns. In case
of group-shared data, the data face both cloud-
specific and conventional insider threats.
Secure data sharing among a group that
counters insider threats of legitimate yet
malicious users is an important research issue.
In this paper, we propose the Secure Data
Sharing in Clouds (SeDaSC) methodology that
provides: 1) data confidentiality and integrity;
2) access control; 3) data sharing (forwarding)
without using compute-intensive reencryption;
4) insider threat security; and 5) forward and
backward access control. The SeDaSC
methodology encrypts a file with a single
encryption key. Two different key shares for
each of the users are generated, with the user
only getting one share. The possession of a
single share of a key allows the SeDaSC
methodology to counter the insider threats. The
other key share is stored by a trusted third
party, which is called the cryptographic server.
The SeDaSC methodology is applicable to
conventional and mobile cloud computing
environments. We implement a working
prototype of the SeDaSC methodology and
evaluate its performance based on the time
consumed during various operations. We
formally verify the working of SeDaSC by
using high-level Petri nets, the Satisfiability
Modulo Theories Library, and a Z3 solver. The
results proved to be encouraging and show that
SeDaSC has the potential to be effectively
used for secure data sharing in the cloud.
IEEE 2015
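A toy sketch of the two-share idea: the single file key is split so that neither the user's share nor the cryptographic server's share alone reveals anything. XOR splitting, shown here, is one standard way to do this and is assumed for illustration rather than taken from the paper:

```python
import os

def split_key(key: bytes):
    """Split a symmetric key into two shares; each alone is uniformly random."""
    user_share = os.urandom(len(key))
    server_share = bytes(k ^ u for k, u in zip(key, user_share))
    return user_share, server_share

def recover_key(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)                          # the single file-encryption key
user_share, server_share = split_key(key)     # user keeps one share, the
assert recover_key(user_share, server_share) == key  # trusted server the other
print("either share alone reveals nothing about the key")
```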
TTA-DC-
C1504
A Computational
Dynamic Trust Model
for User Authorization
Development of authorization mechanisms for
secure information access by a large
community of users in an open environment is
an important problem in the ever-growing
Internet world. In this paper we propose
a computational dynamic trust model for user
authorization, rooted in findings from social
science. Unlike most
existing computational trust models,
this model distinguishes trusting belief in
integrity from that in competence in different
contexts and accounts for subjectivity in the
evaluation of a particular trustee by different
trusters. Simulation studies were conducted to
compare the performance of the proposed
integrity belief model with
other trust models from the literature for
different user behavior patterns. Experiments
show that the proposed model achieves higher
performance than other models especially in
predicting the behavior of unstable users.
IEEE 2015
TTA-DC-
C1505
Shared Authority Based
Privacy-Preserving
Authentication Protocol
in Cloud Computing
Cloud computing is an emerging data-
interactive paradigm in which users' data are
remotely stored on an
online cloud server. Cloud services provide
great conveniences for the users to enjoy the
on-demand cloud applications without
considering the local infrastructure limitations.
During the data accessing, different users may
be in a collaborative relationship, and thus
data sharing becomes significant to achieve
productive benefits. The existing security
solutions mainly focus on the authentication to
realize that a user's privative data cannot be
illegally accessed, but neglect a
subtle privacy issue during a user challenging
the cloud server to request other users for
data sharing. The challenged access request
itself may reveal the user's privacy no matter
whether or not it can obtain the data access
permissions. In this paper, we propose
a shared authority based privacy-
preserving authentication protocol (SAPA) to
address the above privacy issue for cloud storage.
In the SAPA, 1) shared access authority is
achieved by anonymous access request
matching mechanism with security and privacy
considerations (e.g., authentication, data
anonymity, user privacy, and forward
security); 2) attribute based access control is
adopted to realize that the user can only access
its own data fields; 3) proxy re-encryption is
applied to provide data sharing among the
multiple users. Meanwhile, the universal
composability (UC) model is established to
prove that the SAPA design is theoretically
correct. It indicates that the
proposed protocol is attractive for multi-user
collaborative cloud applications.
IEEE 2015
TTA-DC-
C1506
Provable Multicopy
Dynamic Data
Possession in Cloud
Computing Systems
More and more organizations are opting to
outsource data to
remote cloud service providers (CSPs).
Customers can rent the CSPs' storage
infrastructure to store and retrieve an almost
unlimited amount of data by paying fees
metered in gigabyte/month. For an increased
level of scalability, availability, and durability,
some customers may want their data to be
replicated on multiple servers across
multiple data centers. The more copies the
CSP is asked to store, the more fees the
customers are charged. Therefore, customers
need to have a strong guarantee that the CSP is
storing all data copies that are agreed upon in
the service contract, and all these copies are
consistent with the most recent modifications
issued by the customers. In this paper, we
propose a map-based provable
multicopy dynamic data possession (MB-
PMDDP) scheme that has the following
features: 1) it provides evidence to the
customers that the CSP is not cheating by
storing fewer copies; 2) it supports outsourcing
of dynamic data, i.e., it supports block-level
operations, such as block modification,
insertion, deletion, and append; and 3) it
allows authorized users to seamlessly access
the file copies stored by the CSP. We give a
comparative analysis of the proposed MB-
PMDDP scheme with a reference model
obtained by extending
existing provable possession of dynamic single
-copy schemes. The theoretical analysis is
validated through experimental results on a
commercial cloud platform. In addition, we
show the security against colluding servers,
and discuss how to identify corrupted copies
by slightly modifying the proposed scheme.
IEEE 2015
TTA-DC-
C1507
My Privacy My Decision
- Control of Photo
Sharing on Online
Social Networks
Photo sharing is an attractive feature which
popularizes Online Social Networks (OSNs).
Unfortunately, it may leak users’ privacy if
they are allowed to post, comment, and tag
a photo freely. In this paper, we attempt to
address this issue and study the scenario when
a user shares a photo containing individuals
other than himself/herself (termed co-photo for
short). To prevent possible privacy leakage of
a photo, we design a mechanism to enable each
individual in a photo to be aware of the posting
activity and participate in the decision making
on the photo posting. For this purpose, we
need an efficient facial recognition (FR)
system that can recognize everyone in
the photo. However, a more demanding privacy
setting may limit the number of
the photos publicly available to train the FR
system. To deal with this dilemma, our
mechanism attempts to utilize users’
private photos to design a personalized FR
system specifically trained to differentiate
possible photo co-owners without leaking
their privacy. We also develop a distributed
consensus based method to reduce the
computational complexity and protect the
private training set. We show that our system
is superior to other possible approaches in
terms of recognition ratio and efficiency. Our
mechanism is implemented as a proof-of-
concept Android application on Facebook’s
platform.
IEEE 2015
TTA-DC-
C1508
A Profit Maximization
Scheme with
Guaranteed Quality of
Service in Cloud
Computing
As an effective and efficient way to
provide computing resources and services to
customers on demand, cloud computing has
become more and more popular.
From cloud service providers'
perspective, profit is one of the most important
considerations, and it is mainly determined by
the configuration of a cloud service platform
under given market demand. However, a single
long-term renting scheme is usually adopted to
configure a cloud platform, which
cannot guarantee the service quality but leads
to serious resource waste. In this paper, a
double resource renting scheme, in which
short-term renting and long-term renting are
combined, is first designed to address the
existing issues. This double
renting scheme can
effectively guarantee the quality of service of
all requests and reduce the resource waste
greatly. Secondly, a service system is
considered as an M/M/m+D queuing model
and the performance indicators that affect
the profit of our double renting scheme are
analyzed, e.g., the average charge, the ratio of
requests that need temporary servers, and so
forth. Thirdly, a profit maximization problem
is formulated for the double
renting scheme and the optimized
configuration of a cloud platform is obtained
by solving the profit maximization problem.
Finally, a series of calculations are conducted
to compare the profit of our
proposed scheme with that of the single renting
scheme. The results show that our scheme can
not only guarantee the service quality of all
requests, but also obtain more profit than the
latter.
IEEE 2015
TTA-DC-
C1509
Attribute-based Access
Control with Constant-
size Ciphertext in Cloud
Computing
With the popularity of cloud computing, there
have been increasing concerns about its
security and privacy. Since
the cloud computing environment is distributed
and untrusted, data owners have to encrypt
outsourced data to enforce confidentiality.
Therefore, how to achieve practicable access
control of encrypted data in an untrusted
environment is an urgent issue that needs to be
solved. Attribute-Based Encryption (ABE) is a
promising scheme suitable
for access control in cloud storage systems.
This paper proposes a hierarchical attribute-
based access control scheme with constant-size
ciphertext. The scheme is efficient because
the ciphertext length and the number of bilinear
pairing evaluations are fixed to a constant. Its
computation cost in encryption and decryption
algorithms is low. Moreover, the hierarchical
authorization structure of our scheme reduces
the burden and risk of a single authority
scenario. We prove that the scheme is CCA2-
secure under the decisional q-Bilinear Diffie-
Hellman Exponent assumption. In addition, we
implement our scheme and analyse its
performance. The analysis results show the
proposed scheme is efficient, scalable, and
fine-grained in dealing with access control for
outsourced data in cloud computing.
IEEE 2015
TTA-DC-
C1510
Bidding Strategies for
Spot Instances in Cloud
Computing Markets
In recent times, spot pricing - a dynamic
pricing scheme - is becoming increasingly
popular for cloud services. This new pricing
format, though efficient in terms of cost
and resource use, has added to the
complexity of decision making for
typical cloud computing users. To
recommend bidding strategies in
spot markets, we use a simulation study to
understand the implications that provider-
recommended strategies have
for cloud users. We use data based on
Amazon's
Elastic Compute Cloud spot market to
provide users with guidelines when
considering tradeoffs between cost, wait
time, and interruption rates.
IEEE 2015
TTA-DC-
C1511
CHARM - A Cost-
efficient Multi-cloud
Data Hosting Scheme
with High Availability
Nowadays, more and more enterprises and
organizations are hosting their data into
the cloud, in order to reduce the IT
maintenance cost and enhance
the data reliability. However, facing the
numerous cloud vendors as well as their
heterogeneous pricing policies, customers may
well be perplexed with which cloud(s) are
suitable for storing their data and
what hosting strategy is cheaper. The general
status quo is that customers usually put
their data into a single cloud (which is subject
to the vendor lock-in risk) and then simply
trust to luck. Based on comprehensive analysis
of various state-of-the-art cloud vendors, this
paper proposes a
novel data hosting scheme (named CHARM),
which integrates two key desired functions.
The first is selecting several suitable clouds
and an appropriate redundancy strategy to
store data with minimized monetary cost and
guaranteed availability. The second is
triggering a transition process to re-
distribute data according to the variations
of data access pattern and pricing of clouds.
We evaluate the performance
of CHARM using both trace-driven
simulations and prototype experiments. The
results show that, compared with the major
existing schemes, CHARM not only saves
around 20 percent of monetary cost but also
exhibits sound adaptability to data and price
adjustments.
IEEE 2015
TTA-DC-
C1512
CloudArmor -
Supporting Reputation-
based Trust
Management for Cloud
Services
Trust management is one of the most
challenging issues for the adoption and growth
of cloud computing. The highly dynamic,
distributed, and non-transparent nature
of cloud services introduces several
challenging issues such as privacy, security,
and availability. Preserving consumers’
privacy is not an easy task due to the sensitive
information involved in the interactions
between consumers and
the trust management service.
Protecting cloud services against their
malicious users (e.g., such users might give
misleading feedback to disadvantage a
particular cloud service) is a difficult problem.
Guaranteeing the availability of
the trust management service is another
significant challenge because of the dynamic
nature of cloud environments. In this article,
we describe the design and implementation
of CloudArmor, a reputation-
based trust management framework that
provides a set of functionalities to
deliver Trust as a Service (TaaS), which
includes i) a novel protocol to prove the
credibility of trust feedbacks and preserve
users’ privacy, ii) an adaptive and robust
credibility model for measuring the credibility
of trust feedbacks to
protect cloud services from malicious users
and to compare the trustworthiness
of cloud services, and iii) an availability model
to manage the availability of the decentralized
implementation of
the trust management service. The feasibility
and benefits of our approach have been
validated by a prototype and experimental
studies using a collection of real-world
trust feedbacks on cloud services.
IEEE 2015
TTA-DC-
C1513
DaSCE - Data Security
for Cloud Environment
with Semi-Trusted
Third Party
Off-site data storage is an application
of cloud that relieves the customers from
focusing on data storage system. However,
outsourcing data to a third-party administrative
control entails serious
security concerns. Data leakage may occur due
to attacks by other users and machines in
the cloud. Wholesale disclosure of data by the
cloud service provider is yet another problem
faced in the cloud environment. Consequently,
high-level security measures are required. In this
paper, we
propose Data Security for Cloud Environment
with Semi-Trusted Third Party (DaSCE),
a data security system that provides (a) key
management (b) access control, and (c) file
assured deletion. The DaSCE utilizes Shamir’s
(k, n) threshold scheme to manage the keys,
where k out of n shares are required to
generate the key. We use multiple key
managers, each hosting one share of key.
Multiple key managers avoid a single point of
failure for the cryptographic keys. We (a)
implement a working prototype of DaSCE and
evaluate its performance based on the time
consumed during various operations, (b)
formally model and analyze the working
of DaSCE using High Level Petri nets
(HLPN), and (c) verify the working of
DaSCE using Satisfiability Modulo Theories
Library (SMT-Lib) and Z3 solver. The results
reveal that DaSCE can be effectively used
for security of outsourced data by employing
key management, access control, and file
assured deletion.
IEEE 2015
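A compact sketch of Shamir's (k, n) threshold scheme over a prime field, which the abstract names for key management; the prime and the parameters are illustrative:

```python
import random

P = 2**127 - 1                       # a Mersenne prime; the field for shares

def make_shares(secret, k, n):
    """Shamir (k, n): embed the secret in a random degree-(k-1) polynomial;
    each share is a point (x, f(x)). Any k points recover f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)                    # the data-encryption key
shares = make_shares(key, k=3, n=5)          # five key managers, any 3 suffice
assert recover(shares[:3]) == key and recover(shares[1:4]) == key
```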
TTA-DC-
C1514
Data as a Currency and
Cloud-Based Data
Lockers
With large data volumes being generated
through Google search, Facebook, Twitter,
Instagram, and the increasingly instrumented
physical world (with embedded sensors), the
authors discuss whether such data can be the
basis of a new transactional relationship
between people and companies in which both
sides benefit from new products and services
and increased economic growth. However, the
key distinction from previous discussions is
whether the existence of a
global cloud computing industry (consisting of
datacenters located in different parts of the
world) can be used to facilitate such
transactional relations, with awareness
of data privacy and access management. The
authors propose the use of data as a currency,
to enable consumers to directly monetize their
own data and request services (based on the
"value" their data holds within a marketplace).
IEEE 2015
TTA-DC-
C1515
Dynamic Weight-Based
Individual Similarity
Calculation for
Information Searching
in Social Computing
In the social computing environment, the
complete information about an individual is
usually distributed in
heterogeneous social networks, which are
presented as linked data. Synthetically
recognizing and integrating these distributed
and heterogeneous data for
efficient information searching is an
important but challenging task. In this paper,
a dynamic weight (DW)-
based similarity calculation is proposed to
recognize and integrate
similar individuals from distributed data
environments. First, each link of an
individual is weighted by applying DW. Then,
a semantic similarity metric is proposed to
combine the DW into similarity calculation.
Next, a searching system framework for
a similarity-based individual is designed and
tested in real-world data sets. Finally, massive
experiments are conducted both in benchmark
and real-world social community data sets. The
results show that our approach can produce a
good result in
similar individual searching in social networks.
In addition, it performs significantly better
than the existing state-of-the-art approaches in
similar individual searching.
IEEE 2015
TTA-DC-
C1516
Efficient audit service
outsourcing for data
integrity in clouds
Cloud computing, which provides elastic
computing and storage resources on demand,
has become increasingly important due to the
emergence of “big data”. Cloud computing
resources are a natural fit for processing
big data streams as they allow
big data applications to run at the scale
required for handling their complexities
(data volume, variety, and velocity). With
the data no longer under users' direct
control, data security in cloud computing has
become one of the main concerns in the
adoption of cloud computing resources. In
order to improve data reliability and
availability, storing multiple replicas along
with original datasets is a common strategy
for cloud service providers.
Public data auditing schemes allow users to
verify their outsourced data storage without
having to retrieve the whole dataset. However,
existing data auditing techniques suffer from
efficiency and security problems. First, for
dynamic datasets with multiple replicas, the
communication overhead for update
verifications is very large, because each update
requires updating of all replicas, where
verification for each update requires O(log n)
communication complexity. Second, existing
schemes cannot provide public auditing and
authentication of block indices at the same
time. Without authentication of block indices,
the server can build a valid proof based
on data blocks other than the blocks client
requested to verify. In order to address these
problems, in this paper, we present a novel
public auditing scheme named MuR-DPA. The
new scheme incorporates a novel
authenticated data structure (ADS) based on
the Merkle hash tree (MHT), which we call
MR-MHT. To support fully
dynamic data updates and authentication of
block indices, we included rank and level
values in computation of MHT nodes. In
contrast to existing schemes, level values of
nodes in MR-MHT are assigned in a top-down
order, and all replica blocks for
each data block are organized into the same
replica sub-tree. Such a configuration
allows efficient verification of updates for
multiple replicas. Compared to
existing integrity verification and
public auditing schemes, theoretical analysis
and experimental results show that the
proposed MuR-DPA scheme can not only
incur much less communication overhead for
both update verification
and integrity verification of cloud datasets with
multiple replicas, but also provide enhanced
security against dishonest cloud
service providers.
IEEE 2015
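For background, a minimal Merkle hash tree of the kind MR-MHT extends: the root authenticates every block, and verifying one block needs only a logarithmic path of sibling hashes. This sketch omits the rank/level values and replica sub-trees MuR-DPA adds:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Hash leaves pairwise upward until a single root remains."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks, index):
    """Collect the sibling hashes along the path from leaf `index` to the root."""
    level, proof = [h(b) for b in blocks], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1                    # sibling differs only in the last bit
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(block, proof, root):
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"block0", b"block1", b"block2", b"block3"]
root = merkle_root(blocks)
assert verify(b"block2", merkle_proof(blocks, 2), root)   # integrity check
```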
TTA-DC-
C1517
Generic and Efficient
Constructions of
Attribute-Based
Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) provides a
mechanism for complex access control over
encrypted data. However, in most ABE
systems, the ciphertext size and
the decryption overhead, which grow with the
complexity of the access policy, are becoming
critical barriers in applications running on
resource-limited
devices. Outsourcing decryption of ABE
ciphertexts to a powerful third party is a
practical way to solve this problem.
Since the third party is usually believed to be
untrusted, the security requirements of ABE
with outsourced decryption should include
privacy and verifiability. Namely, any
adversary including the third party should
learn nothing about the encrypted message,
and the correctness of
the outsourced decryption is supposed to be
verified efficiently. We propose generic
constructions of CPA-secure and RCCA-
secure ABE systems
with verifiable outsourced decryption from
CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-
secure construction in the standard model and
then show an implementation of this
instantiation. The experimental results show
that, compared with the existing scheme, our
CPA-secure construction has more compact
ciphertext and less computational costs.
Moreover, the techniques involved in the
RCCA-secure construction can be applied in
generally constructing CCA-secure ABE,
which we believe to be of independent interest.
IEEE 2015
TTA-DC-
C1518
Group Key Agreement
with Local Connectivity
In this paper, we study
a group key agreement problem where a user is
only aware of his neighbors while
the connectivity graph is arbitrary. In our
problem, there is no centralized initialization
for users. A group key agreement with these
features is very suitable for social networks.
Under our setting, we construct two efficient
protocols with passive security. We obtain
lower bounds on the round complexity for this
type of protocol, which demonstrates that our
constructions are round efficient. Finally, we
construct an actively secure protocol from a
passively secure one.
IEEE 2015
TTA-DC-
C1519
Hybrid cloud approach
for secure authorized
deduplication
Data deduplication is one of the important
data compression techniques for eliminating
duplicate copies of repeating data, and has
been widely used in cloud storage to reduce
the amount of storage space and save
bandwidth. To protect the confidentiality of
sensitive data while supporting deduplication,
the convergent encryption technique has been
proposed to encrypt the data before
outsourcing. To better protect data security,
this paper makes the first attempt to formally
address the problem of authorized data
deduplication. Different from traditional
deduplication systems, the differential
privileges of users are further considered in
duplicate check besides the data itself. We also
present several new deduplication
constructions supporting authorized duplicate
check in a hybrid cloud architecture. Security
analysis demonstrates that our scheme
is secure in terms of the definitions specified in
the proposed security model. As a proof of
concept, we implement a prototype of our
proposed authorized duplicate check scheme
and conduct test bed experiments using our
prototype. We show that our
proposed authorized duplicate check scheme
incurs minimal overhead compared to normal
operations.
IEEE 2015
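A minimal sketch of the convergent encryption idea underlying deduplication: the key is derived from the content itself, so identical files produce identical ciphertexts. The fixed nonce keeps this toy deterministic; pycryptodome is an assumed dependency, and none of this is the paper's exact construction:

```python
from Crypto.Cipher import AES          # pycryptodome, assumed available
from hashlib import sha256

def convergent_encrypt(data: bytes):
    """Convergent encryption: key = H(data), so equal plaintexts give equal
    ciphertexts, which lets the cloud deduplicate without reading content."""
    key = sha256(data).digest()        # content-derived key
    tag_id = sha256(key).hexdigest()   # duplicate-check tag sent to the cloud
    cipher = AES.new(key, AES.MODE_GCM, nonce=b"\0" * 12)  # fixed nonce: toy only
    ct, mac = cipher.encrypt_and_digest(data)
    return tag_id, ct, mac, key        # user keeps `key`, uploads the rest

t1, c1, _, k1 = convergent_encrypt(b"quarterly report v7")
t2, c2, _, _ = convergent_encrypt(b"quarterly report v7")
assert t1 == t2 and c1 == c2           # identical files deduplicate
plain = AES.new(k1, AES.MODE_GCM, nonce=b"\0" * 12).decrypt(c1)
assert plain == b"quarterly report v7"
```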
TTA-DC-
C1520
Performing Initiative
Data Prefetching in
Distributed File
Systems for Cloud
Computing
This paper presents
an initiative data prefetching scheme on the
storage servers in distributed file
systems for cloud computing. In
this prefetching technique, the client machines
are not substantially involved in the process
of data prefetching, but the storage servers can
directly prefetch the data after analyzing the
history of disk I/O access events, and then send
the prefetched data to the relevant client
machines proactively. To put this technique to
work, the information about client nodes is
piggybacked onto the real client I/O requests,
and then forwarded to the relevant storage
server. Next, two prediction algorithms have
been proposed to forecast future block access
operations for directing what data should be
fetched on storage servers in advance. Finally,
the prefetched data can be pushed to the
relevant client machine from the storage
server. Through a series of evaluation
experiments with a collection of application
benchmarks, we have demonstrated that our
presented initiative prefetching technique can
benefit distributed file systems for cloud
environments to achieve better I/O performance. In
particular, configuration-limited client
machines in the cloud are not responsible for
predicting I/O access operations, which
contributes to better
system performance on them.
IEEE 2015
TTA-DC-
C1521
Privacy protection for
wireless sensor
medical data
In recent years, wireless sensor networks have
been widely used in healthcare applications,
such as hospital and home patient monitoring.
Wireless medical sensor networks are more
vulnerable to eavesdropping, modification,
impersonation and replaying attacks than the
wired networks. A lot of work has been done
to secure wireless medical sensor networks.
The existing solutions can protect the
patient data during transmission, but cannot
stop the inside attack where the administrator
of the patient database reveals the sensitive
patient data. In this paper, we propose a
practical approach to prevent the inside attack
by using multiple data servers to store
patient data. The main contribution of this
paper is securely distributing the patient data in
multiple data servers and employing the
Paillier and ElGamal cryptosystems to perform
statistical analysis on the patient data without
compromising the patients’ privacy.
IEEE 2015
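To illustrate the kind of privacy-preserving statistics the abstract mentions, here is a sketch using the phe Paillier library (an assumed dependency, not the paper's code): encrypted readings can be summed without decrypting any individual patient's value:

```python
from phe import paillier               # pip install phe; assumed dependency

public_key, private_key = paillier.generate_paillier_keypair()

# Each data server holds only encrypted readings for its shard of patients.
readings = [98.6, 101.2, 99.1, 100.4]              # e.g., temperatures
encrypted = [public_key.encrypt(r) for r in readings]

# Paillier is additively homomorphic: sums (and thus means) can be computed
# directly on ciphertexts, so no server learns any individual value.
enc_sum = sum(encrypted[1:], encrypted[0])
mean = private_key.decrypt(enc_sum) / len(readings)
print(round(mean, 2))                              # mean across all patients
```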
TTA-DC-
C1522
Quality-assured
Secured Load Sharing
in Mobile Cloud
Networking
Environment
In mobile cloud networks (MCNs),
a mobile user is connected with a cloud server
through a network gateway, which is
responsible for providing the required quality-
of-service (QoS) to the users. If a user
increases its service demand, the connecting
gateway may fail to provide the requested QoS
due to the overloaded demand, while the other
gateways remain underloaded. Due to the
increase in load in one gateway,
the sharing of load among all the gateways is
one of the prospective solutions for providing
QoS-guaranteed services to the mobile users.
Additionally, if a user misbehaves, the
situation becomes more challenging. In this
paper, we address the problem of QoS-
guaranteed secured service provisioning in
MCNs. We design a utility maximization
problem for quality-
assured secured load sharing (QuaLShare) in
MCN, and determine its optimal solution using
auction theory. In QuaLShare, the overloaded
gateway detects the misbehaving gateways,
and, then, prevents them from participating in
the auction process. Theoretically, we
characterize both the problem and the solution
approaches in an MCN environment. Finally,
we investigate the existence of a Nash
Equilibrium of the proposed scheme. We
extend the solution for the case of multiple
users, followed by theoretical analysis.
Numerical analysis establishes the correctness
of the proposed algorithms.
IEEE 2015
TTA-DC-
C1523
Secure Audit Service
by Using TPA for Data
Integrity in Cloud
System
Cloud services are used not only to store
data in the cloud but also to
share data among users over the cloud. The
integrity of data on the cloud can easily be
lost or damaged. To ensure cloud storage
correctness, a distributed
storage integrity auditing mechanism enables
secure and efficient operations on cloud data
through a third-party auditor (TPA). The third-
party auditor utilizes ring signatures and a
keyed-hash-based message authentication code
for checking integrity. Data privacy and
identity privacy on
shared data are secured using private-key
encryption during the auditing process by a
public verifier. In the existing process, data
freshness is not proven. So, we propose an HMAC
mechanism to protect the metadata
secrecy, integrity, authentication on
shared data in the cloud storage. This also
supports random checking process by the
public verifiers instead of checking the
entire data on the cloud.
Our audit system ensures
data freshness through secrecy, integrity, and
authentication of metadata, and also requires
low computation, low communication, and
little extra storage for auditing the metadata.
IEEE 2015
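A small sketch of the proposed HMAC idea applied to metadata freshness, using only Python's standard library; the metadata fields (file id, version, timestamp) are invented for illustration:

```python
import hmac, hashlib, json, time

SECRET = b"owner-auditor shared key"         # illustrative key material

def tag_metadata(file_id: str, version: int) -> dict:
    """Owner signs (file id, version, timestamp) so the TPA can later verify
    that the cloud returns fresh, untampered metadata."""
    meta = {"file_id": file_id, "version": version, "ts": int(time.time())}
    payload = json.dumps(meta, sort_keys=True).encode()
    meta["mac"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return meta

def verify_metadata(meta: dict) -> bool:
    claimed = meta["mac"]
    payload = json.dumps({k: v for k, v in meta.items() if k != "mac"},
                         sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)   # constant-time compare

meta = tag_metadata("report.pdf", version=7)
assert verify_metadata(meta)
meta["version"] = 6                          # a rolled-back (stale) version...
assert not verify_metadata(meta)             # ...fails the freshness check
```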
TTA-DC-
C1524
Secure Data
Transmission using
Stegnography and
encryption technique
Transmission of important data such as text,
images, and video over the Internet is
increasing nowadays; hence, secure methods
for multimedia data are necessary.
Image encryption differs from the encryption
of other multimedia components because of
inherent properties such as higher data
capacity and high similarity between pixels.
Older encryption techniques such as AES,
DES, and RTS are not suitable for
highly secure data transmission over wireless
media. Thus, we combine chaos theory
and cryptography to form a
valuable technique for information security. In
the first stage, a user encrypts the original
input image using chaotic map theory. After
that data-hider compresses the LSB bits of the
encrypted image using a data-hiding key to
make space to accommodate some more data.
Nowadays, image encryption is often chaos-
based, owing to unique characteristics such as
correlation between neighboring pixels,
sensitivity to initial conditions, non-
periodicity, and control parameters. A
number of image encryption algorithms based
on chaotic maps have been implemented; some
of them are time-consuming or complex, and
some have a very small key space. In this paper
we implement a nonlinear differential chaos-
based encryption technique where, for the
first time, three differential chaotic maps are
used for position permutation and value
transformation. In the data-hiding
phase, data in binary form is
embedded into the encrypted image using the
least-significant-bit algorithm. We tabulate
correlation coefficient values for both the
horizontal and vertical positions for the cipher
and original images and compare the performance
of our method with some existing methods. We
also discuss different types of attacks, the key
sensitivity, and the key space of our proposed
approach. The given approach is simple, fast,
and accurate, and it has been applied as a
double algorithm in order to give the best
results in highly insecure and complex
environments. Each of these algorithms is
discussed one by one below.
IEEE 2015
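To illustrate the two stages the abstract describes (chaos-based encryption, then LSB data hiding), here is a toy single-map sketch; the paper uses three differential chaotic maps with position permutation and value transformation, while this uses one logistic map as an XOR keystream plus a basic LSB embed:

```python
def logistic_keystream(x0, r, n):
    """Logistic map x' = r*x*(1-x); tiny changes in (x0, r) yield a completely
    different byte stream, which gives the key sensitivity used here."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return out

def chaos_xor(pixels, x0=0.3141592, r=3.99):    # (x0, r) act as the secret key
    ks = logistic_keystream(x0, r, len(pixels))
    return [p ^ k for p, k in zip(pixels, ks)]

def embed_lsb(pixels, bits):
    """Hide one message bit in the least significant bit of each pixel."""
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

image = [52, 200, 173, 89, 14, 230, 99, 180]    # toy 8-pixel grayscale image
cipher = chaos_xor(image)                       # stage 1: chaotic encryption
assert chaos_xor(cipher) == image               # XOR with the same key decrypts
stego = embed_lsb(cipher, [1, 0, 1, 1])         # stage 2: LSB data hiding
assert [p & 1 for p in stego[:4]] == [1, 0, 1, 1]   # hidden message recovered
```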
TTA-DC-
C1525
Smartphone instant
messenger by using
Google Cloud
Messaging
Two of the most important drivers of current
telecommunication markets are the
development of Rich Communication Services
(RCS) and cloud computing. The challenges of
delivering these new services on a cloud-based
architecture are not only on the technical side,
they also concern the definition of feasible
business models for all the involved agents and
the definition and negotiation of proper service
level agreements at different levels. This work
proposes to provide telecommunication
operators with cloud-based infrastructures
capable of offering customers innovative and
reliable rich communication services based on
their phone numbers that cannot be
replicated by the Internet competitors in terms
of flexibility, scalability or security. This
Obliquity as a Service model (MaaS) allows
telecommunication providers to maintain
relevance for their clients offering not only the
common communication services
(instant messaging, group communication and
chat, file sharing or enriched calls services) but
also a new kind of mobiquitous services
related to mobile marketing, smart places,
Internet of Things or health care, exploiting all
the competitive advantages associated with the
development of a vertical cloud in a dynamic
and heterogeneous ecosystem. In addition, the
infrastructure layer needed to support the new
proposed model is defined and a first prototype
is deployed and evaluated with two
real use cases.
IEEE 2015
TTA-DC-
C1526
Social
Recommendation with
Cross-Domain
Transferable
Knowledge
Recommender systems can suffer from data
sparsity and cold start issues.
However, social networks, which enable users
to build relationships and create different types
of items, present an unprecedented opportunity
to alleviate these issues. In this paper, we
represent a social network as a star-structured
hybrid graph centered on a social domain,
which connects with other item domains. With
this innovative representation,
useful knowledge from an
auxiliary domain can be transferred through
the social domain to a target domain. Various
factors of item transferability, including
popularity and behavioral consistency, are
determined. We propose a novel Hybrid
Random Walk (HRW) method, which
incorporates such factors, to
select transferable items in auxiliary domains,
bridge cross-domain knowledge with
the social domain, and accurately predict user-
item links in a target domain. Extensive
experiments on a real social dataset
demonstrate that HRW significantly
outperforms existing approaches.
IEEE 2015
TTA-DC-
C1527
Three-server swapping
for access
confidentiality
We propose an approach to
protect confidentiality of data and accesses to
them when data are stored and managed by
external providers, and hence not under direct
control of their owner. Our approach is based
on the use of distributed data allocation
among three independent servers and on a
dynamic re-allocation of data at every access.
Dynamic re-allocation is enforced
by swapping data involved in an access across
the servers in such a way that accessing a
given node implies re-allocating it to a
different server, thus destroying the ability of
servers to build knowledge by
observing accesses. The use of three servers
makes the result of the swapping operation
uncertain in the eyes of the servers, even in
the presence of collusion among them.
IEEE 2015
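As an illustration of the access-driven swapping idea (a sketch under our own simplifying assumptions, not the authors' protocol), the following Python snippet keeps a client-side map of node locations over three servers and moves every accessed node to a different server, swapping a randomly chosen decoy back so each server sees one departure and one arrival per access.

import random

class SwappingStore:
    def __init__(self, nodes):
        # Distribute nodes across servers 0, 1, 2.
        self.location = {n: i % 3 for i, n in enumerate(nodes)}

    def access(self, node):
        src = self.location[node]
        dst = random.choice([s for s in (0, 1, 2) if s != src])
        # Swap a decoy from the destination back to the source so the
        # observed traffic does not reveal which object was accessed.
        decoys = [n for n, s in self.location.items() if s == dst]
        if decoys:
            self.location[random.choice(decoys)] = src
        self.location[node] = dst   # the accessed node always moves
        return dst

store = SwappingStore(["a", "b", "c", "d", "e", "f"])
store.access("a")   # "a" now resides on a different server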
TTA-DC-
C1528
Trust and Compactness
in Social Network
Groups
Understanding the dynamics
behind group formation and evolution
in social networks is considered an
instrumental milestone to better describe how
individuals gather and form communities, how
they enjoy and share the platform contents,
how they are driven by their preferences/tastes,
and how their behaviors are influenced by
peers. In this context, the notion
of compactness of a social group is particularly
relevant. While the literature usually refers
to compactness as a measure to merely
IEEE 2015
determine how much members of a group are
similar among each other, we argue that the
mutual trustworthiness between the members
should be considered as an important factor in
defining such a term. In fact, trust has
profound effects on the dynamics
of group formation and evolution:
individuals are more likely to join and
stay in a group if they
can trust the other group members. In this paper,
we propose a quantitative measure
of group compactness that takes into account
both the similarity and the trustworthiness
among users, and we present an algorithm to
optimize such a measure. We provide
empirical results, obtained from the
real social networks EPINIONS and CIAO,
that compare our notion of compactness versus
the traditional notion of user similarity, clearly
proving the advantages of our approach.
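A minimal sketch of a trust-aware compactness score (our own illustrative formulation, not the paper's exact measure): it averages a weighted mix of pairwise similarity and pairwise trust over all member pairs; the weight alpha and the toy values are assumptions.

from itertools import combinations

def compactness(group, sim, trust, alpha=0.5):
    # Average of alpha*similarity + (1-alpha)*trust over member pairs.
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    total = sum(alpha * sim[u][v] + (1 - alpha) * trust[u][v]
                for u, v in pairs)
    return total / len(pairs)

sim = {"u": {"v": 0.8}, "v": {"u": 0.8}}
trust = {"u": {"v": 0.6}, "v": {"u": 0.6}}
print(compactness(["u", "v"], sim, trust))   # 0.5*0.8 + 0.5*0.6 = 0.7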
TTA-JC-
C1529
Public Integrity
Auditing for Shared
Dynamic Cloud Data
with Group User
Revocation
The advent of cloud computing has made
storage outsourcing a rising trend and has
made secure remote data auditing a hot
topic in the research literature. Recently, some
research has considered the problem of secure and
efficient public data integrity auditing for
shared dynamic data. However, these schemes are
still not secure against collusion between the
cloud storage server and
revoked group users during user revocation in
practical cloud storage systems. In this paper,
we identify the collusion attack in the existing
scheme and provide an
efficient public integrity auditing scheme with
secure group user revocation based on vector
commitment and verifier-
local revocation group signatures. We design a
concrete scheme based on our scheme
definition. Our scheme supports
public checking and
efficient user revocation, as well as some nice
properties such as confidentiality, efficiency,
countability, and traceability of
secure group user revocation. Finally, the
IEEE 2015
security and experimental analysis show that,
compared with its relevant schemes, our
scheme is both secure and efficient.
TTA-JC-
C1530
Audit-Free Cloud
Storage via Deniable
Attribute-based
Encryption
Cloud storage services have become
increasingly popular. Because of the
importance of privacy,
many cloud storage encryption schemes have
been proposed to protect data from those who
do not have access. All such schemes assumed
that cloud storage providers are safe and
cannot be hacked; however, in practice, some
authorities (i.e., coercers) may
force cloud storage providers to reveal user
secrets or confidential data on the cloud, thus
altogether
circumventing storage encryption schemes. In
this paper, we present our design for a
new cloud storage encryption scheme that
enables cloud storage providers to create
convincing fake user secrets to protect user
privacy. Since coercers cannot tell if obtained
secrets are true or not,
the cloud storage providers ensure that user
privacy is still securely protected.
IEEE 2015
TTA-JC-
C1531
CHARM - A Cost-
efficient Multi-cloud
Data Hosting Scheme
with High Availability
Nowadays, more and more enterprises and
organizations are hosting their data into
the cloud, in order to reduce the IT
maintenance cost and enhance
the data reliability. However, facing the
numerous cloud vendors as well as their
heterogeneous pricing policies, customers may
well be perplexed with which cloud(s) are
suitable for storing their data and
what hosting strategy is cheaper. The general
status quo is that customers usually put
their data into a single cloud (which is subject
to the vendor lock-in risk) and then simply
trust to luck. Based on comprehensive analysis
of various state-of-the-art cloud vendors, this
paper proposes a
novel data hosting scheme (named CHARM)
which integrates two key desired functions.
The first is selecting several suitable clouds
and an appropriate redundancy strategy to
IEEE 2015
store data with minimized monetary cost and
guaranteed availability. The second is
triggering a transition process to re-
distribute data according to the variations
of data access pattern and pricing of clouds.
We evaluate the performance
of CHARM using both trace-driven
simulations and prototype experiments. The
results show that compared with the major
existing schemes, CHARM not only saves
around 20 percent of monetary cost but also
exhibits sound adaptability to data and price
adjustments.
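The first CHARM function (cloud selection under an availability target) can be illustrated with a brute-force Python sketch; the prices, availability figures, replication-only redundancy, and the target value are all assumptions made for the example.

from itertools import combinations

clouds = {                      # price ($/GB/month), availability
    "A": (0.023, 0.999),
    "B": (0.020, 0.995),
    "C": (0.026, 0.9999),
    "D": (0.018, 0.990),
}

def cheapest_placement(target=0.9995, size_gb=100):
    best = None
    for r in range(1, len(clouds) + 1):
        for subset in combinations(clouds, r):
            fail = 1.0
            for c in subset:
                fail *= 1 - clouds[c][1]        # all replicas lost
            avail = 1 - fail
            cost = size_gb * sum(clouds[c][0] for c in subset)
            if avail >= target and (best is None or cost < best[0]):
                best = (cost, subset, avail)
    return best

print(cheapest_placement())   # cheapest cloud set meeting the target

A real placement engine would also weigh erasure coding against replication and fold in bandwidth and request pricing, which is where the scheme's cost model does its work.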
TTA-JC-
C1532
Secure Auditing and
Deduplicating Data in
Cloud
As the cloud computing technology develops
during the last decade,
outsourcing data to cloud service for storage
becomes an attractive trend, as it spares the
effort of heavy data maintenance and
management. Nevertheless, since the
outsourced cloud storage is not fully
trustworthy, it raises security concerns on how
to realize data deduplication in cloud while
achieving integrity auditing. In this work, we
study the problem of
integrity auditing and secure deduplication
on cloud data. Specifically, aiming at
achieving both data integrity and deduplication
in cloud, we propose two secure systems,
namely SecCloud and SecCloud+. SecCloud
introduces an auditing entity that maintains
a MapReduce cloud, which
helps clients generate data tags before
uploading as well as audit the integrity
of data having been stored in cloud. Compared
with previous work, the computation by user in
SecCloud is greatly reduced during the file
uploading and auditing phases. SecCloud+ is
motivated by the fact that customers always
want to encrypt their data before uploading,
and it enables
integrity auditing and secure deduplication on
encrypted data.
IEEE 2015
TTA-JC-
C1533
A Profit Maximization
Scheme with
Guaranteed Quality of
As an effective and efficient way to
provide computing resources and services to
IEEE 2015
Service in Cloud
Computing
customers on demand, cloud computing has
become more and more popular.
From cloud service providers'
perspective, profit is one of the most important
considerations, and it is mainly determined by
the configuration of a cloud service platform
under given market demand. However, a single
long-term renting scheme is usually adopted to
configure a cloud platform, which
cannot guarantee the service quality but leads
to serious resource waste. In this paper, a
double resource renting scheme combining
short-term and long-term renting is first
designed to address these issues. This double
renting scheme can
effectively guarantee the quality of service of
all requests and reduce the resource waste
greatly. Secondly, a service system is
considered as an M/M/m+D queuing model
and the performance indicators that affect
the profit of our double renting scheme are
analyzed, e.g., the average charge, the ratio of
requests that need temporary servers, and so
forth. Thirdly, a profit maximization problem
is formulated for the double
renting scheme and the optimized
configuration of a cloud platform is obtained
by solving the profit maximization problem.
Finally, a series of calculations are conducted
to compare the profit of our
proposed scheme with that of the single renting
scheme. The results show that our scheme can
not only guarantee the service quality of all
requests, but also obtain more profit than the
latter.
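The queueing indicators that drive the profit analysis can be sketched in Python for the plain M/M/m case (the +D impatience bound of the paper's model is omitted here for brevity; the arrival and service rates are made-up inputs).

from math import factorial

def erlang_c(m, lam, mu):
    # Probability that an arriving request finds all m servers busy.
    a = lam / mu                     # offered load in Erlangs
    rho = a / m
    assert rho < 1, "system must be stable"
    tail = (a ** m / factorial(m)) / (1 - rho)
    return tail / (sum(a ** k / factorial(k) for k in range(m)) + tail)

def mean_wait(m, lam, mu):
    # Mean queueing delay before service starts (M/M/m).
    return erlang_c(m, lam, mu) / (m * mu - lam)

print(erlang_c(10, 8.0, 1.0))    # fraction of requests that must wait
print(mean_wait(10, 8.0, 1.0))   # average wait, in service-time units

In the double renting scheme, a waiting probability like this is what determines the ratio of requests that must be served by temporarily rented servers.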
TTA-JC-
C1534
Online Resource
Scheduling under
Concave Pricing for
Cloud Computing
With the booming cloud computing industry,
computational resources are readily and
elastically available to the customers. In order
to attract customers with various demands,
most Infrastructure-as-a-service
(IaaS) cloud service providers offer
several pricing strategies such as pay as you
go, pay less per unit when you use more (so
called volume discount), and pay even less
IEEE 2015
when you reserve. The diverse pricing schemes
among different IaaS service providers or even
in the same provider form a complex economic
landscape that nurtures the market
of cloud brokers. By strategically scheduling
multiple customers’ resource requests,
a cloud broker can fully take advantage of the
discounts offered by cloud service providers.
In this paper, we focus on how a broker can
help a group of customers to fully utilize the
volume discount pricing strategy offered
by cloud service providers through cost-
efficient online resource scheduling. We
present a randomized online stack-
centric scheduling algorithm (ROSA) and
theoretically prove the lower bound of its
competitive ratio. Three special cases of the
offline concave cost scheduling problem and
the corresponding optimal algorithms are
introduced. Our simulation shows that ROSA
achieves a competitive ratio close to the
theoretical lower bound under the special
cases. Trace-driven simulation using Google
cluster data demonstrates that ROSA is
superior to the
conventional online scheduling algorithms in
terms of cost saving.
TTA-JC-
C1535
The Value of
Cooperation -
Minimizing User Costs
in Multi-broker Mobile
Cloud Computing
Networks
We study the problem
of user cost minimization
in mobile cloud computing (MCC) networks.
We consider a MCC model where multiple
brokers assign cloud resources to mobile users.
The model is characterized by a
heterogeneous cloud architecture (which
includes a public cloud and a cloudlet) and by
the heterogeneous pricing strategies
of cloud service providers. In this setting, we
investigate two classes of cloud reservation
strategies, i.e., a competitive strategy, and a
compete-then-cooperate strategy as a
performance bound. We first study a purely
competitive scenario where brokers compete to
reserve computing resources from remote
public clouds (which are affected by long
delays) and from local cloudlets (which have
IEEE 2015
limited computational resources but short
delays). We provide theoretical results
demonstrating the existence of disagreement
points (i.e., the equilibrium reservation
strategy that no broker has incentive to deviate
unilaterally from) and convergence of the best
response strategies of the brokers to
disagreement points. We then consider the
scenario in which brokers agree to cooperate in
exchange for a lower average cost of
resources. We formulate a cooperative
problem where the objective is to minimize the
total average price of all brokers, under the
constraint that no broker should pay a price
higher than the disagreement price (i.e., the
competitive price). We design a new globally
optimal solution algorithm to solve the
resulting non-convex cooperative problem,
based on a combination of the branch and
bound framework and of advanced convex
relaxation techniques. The resulting optimal
solution provides a lower bound on the
achievable user cost without complete
collusion among brokers. Compared with pure
competition, we found that i) noticeable
cooperative gains can be achieved over pure
competition in markets with a few brokers
only, and ii) the cooperative gain is only
marginal in crowded markets, i.e., with a high
number of brokers, hence there is no clear
incentive for brokers to cooperate.
TTA-JC-
C1536
System of Systems for
Quality-of-Service
Observation and
Response in Cloud
Computing
Environments
As military, academic, and
commercial computing systems evolve from
autonomous entities that deliver
computing products into network centric
enterprise systems that deliver computing as
a service, opportunities emerge to
consolidate computing resources, software,
and information through cloud computing.
Along with these opportunities come
challenges, particularly to service providers
and operations centers that struggle to monitor
and manage quality of service (QoS) for these
services in order to meet
customer service commitments. Traditional
IEEE 2015
approaches fall short in addressing these
challenges because they examine QoS from a
limited perspective rather than from a system-
of-systems (SoS) perspective applicable to a
net-centric enterprise system in which any user
from any location can
share computing resources at any time. This
paper presents a SoS approach to enable QoS
monitoring, management, and response for
enterprise systems that deliver computing as
a service through
a cloud computing environment. A concrete
example is provided for application of this new
SoS approach to a real-world scenario (viz.,
distributed denial of service). Simulated results
confirm the efficacy of the approach.
TTA-JC-
C1537
A Computational
Dynamic Trust Model
for User Authorization
Development of authorization mechanisms for
secure information access by a large
community of users in an open environment is
an important problem in the ever-growing
Internet world. In this paper we propose
a computational dynamic trust model for user
authorization, rooted in findings from social
science. Unlike most
existing computational trust models,
this model distinguishes trusting belief in
integrity from that in competence in different
contexts and accounts for subjectivity in the
evaluation of a particular trustee by different
trusters. Simulation studies were conducted to
compare the performance of the proposed
integrity belief model with
other trust models from the literature for
different user behavior patterns. Experiments
show that the proposed model achieves higher
performance than other models especially in
predicting the behavior of unstable users.
IEEE 2015
TTA-JC-
C1538
Generic and Efficient
Constructions of
Attribute-Based
Encryption with
Verifiable Outsourced
Decryption
Attribute-based encryption (ABE) provides a
mechanism for complex access control over
encrypted data. However in most ABE
systems, the ciphertext size and
the decryption overhead, which grow with the
complexity of the access policy, are becoming
critical barriers in applications running on
IEEE 2015
resource-limited
devices. Outsourcing decryption of ABE
ciphertexts to a powerful third party is a
practical way to solve this problem.
Since the third party is usually believed to be
untrusted, the security requirements of ABE
with outsourced decryption should include
privacy and verifiability. Namely, any
adversary including the third party should
learn nothing about the encrypted message,
and the correctness of
the outsourced decryption is supposed to be
verified efficiently. We propose generic
constructions of CPA-secure and RCCA-
secure ABE systems
with verifiable outsourced decryption from
CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-
secure construction in the standard model and
then show an implementation of this
instantiation. The experimental results show
that, compared with the existing scheme, our
CPA-secure construction has more compact
ciphertext and less computational costs.
Moreover, the techniques involved in the
RCCA-secure construction can be applied in
generally constructing CCA-secure ABE,
which we believe to be of independent interest.
TTA-JC-
C1539
Leveraging Data
Deduplication to
Improve the
Performance of Primary
Storage Systems in the
Cloud
With the explosive growth in data volume, the
I/O bottleneck has become an increasingly
daunting challenge for big data analytics in
the Cloud. Recent studies have shown that
moderate to high data redundancy clearly
exists in primary storage systems in the Cloud.
Our experimental studies reveal
that data redundancy exhibits a much higher
level of intensity on the I/O path than that on
disks due to relatively high temporal access
locality associated with small I/O
requests to redundant data. Moreover, directly
applying data deduplication to primary storage
systems in the Cloud will likely cause space
contention in memory and data fragmentation
on disks. Based on these observations, we
propose a performance-oriented
IEEE 2015
I/O deduplication, called POD, rather than a
capacity-oriented I/O deduplication,
exemplified by iDedup, to improve the
I/O performance of primary storage systems in
the Cloud without sacrificing capacity savings
of the latter. POD takes a two-pronged
approach to improving the performance of
primary storage systems and
minimizing performance overhead
of deduplication, namely, a request-based
selective deduplication technique, called
Select-Dedupe, to alleviate the data
fragmentation and an adaptive memory
management scheme, called iCache, to ease
the memory contention between the bursty
read traffic and the bursty write traffic. We
have implemented a prototype of POD as a
module in the Linux operating system. The
experiments conducted on our lightweight
prototype implementation of POD show that
POD significantly outperforms iDedup in the
I/O performance measure by up to 87.9% with
an average of 58.8%. Moreover, our evaluation
results also show that POD achieves
comparable or better capacity savings than
iDedup.
TTA-JC-
C1540
Enabling Fine-grained
Multi-keyword Search
Supporting Classified
Sub-dictionaries over
Encrypted Cloud Data
Using cloud computing, individuals can store
their data on remote servers and
allow data access to public users through
the cloud servers. As the outsourced data are
likely to contain sensitive privacy information,
they are typically encrypted before uploaded to
the cloud. This, however, significantly limits
the usability of outsourced data due to the
difficulty of searching over the encrypted data.
In this paper, we address this issue by
developing the fine-grained multi-
keyword search schemes over encrypted
cloud data. Our original contributions are
three-fold. First, we introduce the relevance
scores and preference factors upon keywords
which enable the precise keyword search and
personalized user experience. Second, we
develop a practical and very efficient multi-
keyword search scheme. The proposed scheme
IEEE 2015
can support complicated logic search, i.e., the
mixed “AND”, “OR” and “NO” operations of
keywords. Third, we further employ
the classified sub-dictionaries technique to
achieve better efficiency on index building,
trapdoor generating and query. Lastly, we
analyze the security of the proposed schemes
in terms of confidentiality of documents,
privacy protection of index and trapdoor, and
unlinkability of trapdoor. Through extensive
experiments using the real-world dataset, we
validate the performance of the proposed
schemes. Both the security analysis and
experimental results demonstrate that the
proposed schemes can achieve the same
security level compared to the existing ones
and better performance in terms of
functionality, query complexity and efficiency.
TTA-JC-
C1541
On the Security of Data
Access Control for
Multiauthority Cloud
Storage Systems
Data access control has become a
challenging issue in cloud storage systems.
Some techniques have been proposed to
achieve the secure data access control in a
semitrusted cloud storage system. Recently,
K. Yang et al. proposed a
basic data access control scheme
for multiauthority cloud storage system (DAC-
MACS) and an
extensive data access control scheme (EDAC-
MACS). They claimed that the DAC-MACS
could achieve efficient decryption and
immediate revocation and the EDAC-MACS
could also achieve these goals even when non-
revoked users reveal their Key Update Keys to
the revoked user. However, through our
cryptanalysis, the revocation security of both
schemes cannot be guaranteed. In this paper,
we first give two attacks on the two schemes.
By the first attack, the revoked user can
eavesdrop to obtain other users’ Key Update
Keys to update its Secret Key, and then it can
obtain proper Token to decrypt any secret
information as a non-revoked user. In addition,
by the second attack, the revoked user can
intercept Ciphertext Update Key to retrieve its
ability to decrypt any secret information as a
IEEE 2015
non-revoked user. Secondly, we propose a new
extensive DAC-MACS scheme (NEDAC-
MACS) to withstand the above two attacks so
as to guarantee more secure attribute
revocation. Then, formal cryptanalysis of
NEDAC-MACS is presented to prove
the security goals of the scheme. Finally, the
performance comparison among NEDAC-
MACS and related schemes is given to
demonstrate that the performance of NEDAC-
MACS is superior to that of DACC, and
roughly the same as that of DAC-MACS.
TTA-JC-
C1542
Verifiable Auditing for
Outsourced Database
in Cloud Computing
The notion of database outsourcing enables the
data owner to delegate
the database management to a cloud service
provider (CSP) that provides
various database services to different users.
Recently, plenty of research work has been
done on the primitive of outsourced database.
However, it seems that no existing solutions
can perfectly support the properties of both
correctness and completeness for the query
results, especially in the case when the
dishonest CSP intentionally returns an empty
set for the query request of the user. In this
paper, we propose a
new verifiable auditing scheme for outsourced
database, which can simultaneously achieve
the correctness and completeness of search
results even if the dishonest CSP purposely
returns an empty set. Furthermore, we can
prove that our construction can achieve the
desired security properties even in the
encrypted outsourced database. Besides, the
proposed scheme can be extended to support
the dynamic database setting by incorporating
the notion of verifiable database with updates.
IEEE 2015
TTA-JC-
C1543
A Cost-Effective
Deadline-Constrained
Dynamic Scheduling
Algorithm for Scientific
Workflows in a Cloud
Environment
Cloud Computing, a distributed computing
paradigm, enables delivery of IT resources
over the Internet and follows the pay-as-you-
go billing model. Workflow scheduling is one
of the most challenging problems
in Cloud computing.
Although workflow scheduling on distributed
IEEE 2015
systems like Grids and Clusters has been
extensively studied, these solutions
are not viable for a Cloud environment,
because a Cloud environment differs from
other distributed environments in two major
ways: on-demand resource provisioning and
pay-as-you-go pricing model. Thus, to achieve
the true benefits of workflow orchestration
on Cloud resources, novel approaches that
can capitalize on the advantages and address
the challenges specific to
a Cloud environment need to be developed.
This work proposes a dynamic cost-
effective deadline-
constrained heuristic algorithm for scheduling
a scientific workflow in a public Cloud. The
proposed technique aims to exploit the
advantages offered by Cloud computing while
taking into account the virtual machine
performance variability and instance
acquisition delay to identify a just-in-
time schedule of
a deadline constrained scientific workflow at
lesser costs. Performance evaluation on some
well-known scientific workflows exhibit that
the proposed algorithm delivers better
performance in comparison to the current
state-of-the-art heuristics.
TTA-JC-
C1544
A Hybrid Cloud
Approach for Secure
Authorized
Deduplication
Data deduplication is one of the important data
compression techniques for eliminating
duplicate copies of repeating data, and has
been widely used in cloud storage to reduce
the amount of storage space and save
bandwidth. To protect the confidentiality of
sensitive data while supporting deduplication,
the convergent encryption technique has been
proposed to encrypt the data before
outsourcing. To better protect data security,
this paper makes the first attempt to formally
address the problem of authorized
data deduplication. Different from
traditional deduplication systems, the
differential privileges of users are further
considered in duplicate check besides the data
itself. We also present several new
IEEE 2015
deduplication constructions
supporting authorized duplicate check in
a hybrid cloud architecture. Security analysis
demonstrates that our scheme is secure in
terms of the definitions specified in the
proposed security model. As a proof of
concept, we implement a prototype of our
proposed authorized duplicate check scheme
and conduct test bed experiments using our
prototype. We show that our
proposed authorized duplicate check scheme
incurs minimal overhead compared to normal
operations.
TTA-JC-
C1545
A Secure and Dynamic
Multi-keyword Ranked
Search Scheme over
Encrypted CloudData
Due to the increasing popularity of cloud
computing, more and more data owners are
motivated to outsource their data to cloud
servers for great convenience and reduced cost
in data management. However, sensitive data
should be encrypted before outsourcing for
privacy requirements, which obsoletes data
utilization like keyword-based document
retrieval. In this paper, we present a secure
multi-
keyword ranked search scheme over encrypted
cloud data, which simultaneously supports
dynamic update operations like deletion and
insertion of documents. Specifically, the vector
space model and the widely-used TF-IDF
model are combined in the index construction
and query generation. We construct a special
tree-based index structure and propose a
“Greedy Depth-first Search” algorithm to
provide efficient multi-keyword ranked search.
The secure kNN algorithm is utilized
to encrypt the index and query vectors, and
meanwhile ensure accurate relevance score
calculation between encrypted index and query
vectors. In order to resist statistical attacks,
phantom terms are added to the index vector
for blinding search results. Due to the use of
our special tree-based index structure, the
proposed scheme can achieve sublinear
search time and deal with the deletion
and insertion of documents flexibly. Extensive
experiments are conducted to demonstrate the
IEEE 2015
efficiency of the proposed scheme.
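The plaintext core of the ranking (vector space model with TF-IDF weights) can be sketched as follows; the documents and the scoring details are illustrative, and the secure kNN encryption of index and query vectors is deliberately left out.

from math import log
from collections import Counter

docs = {"d1": "cloud storage security audit",
        "d2": "keyword search over encrypted cloud data",
        "d3": "dynamic update of documents"}

def ranked_search(query_terms):
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(text.split()))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        # Sum of tf * idf over the queried keywords present in the doc.
        scores[doc_id] = sum(tf[t] * log(n / df[t])
                             for t in query_terms if t in tf)
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(ranked_search(["cloud", "search"]))   # d2 ranks first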
TTA-JC-
C1546
A Universal Fairness
Evaluation Framework
for Resource Allocation
in Cloud Computing
In cloud computing, fairness is one of the most
significant indicators to
evaluate resource allocation algorithms, which
reveals whether each user is allocated as much
as that of all other users having the same
bottleneck. However, how fair
an allocation algorithm is remains an urgent
issue. In this paper, we propose
Dynamic Evaluation Framework for Fairness (
DEFF), a framework to evaluate the fairness of
a resource allocation algorithm. In
our framework, two sub-models, Dynamic
Demand Model (DDM) and Dynamic Node
Model (DNM), are proposed to describe the
dynamic characteristics of resource demand
and the computing node number
under cloud computing environment.
Combining Fairness on Dominant Shares and
the two sub-models above, we finally obtain
DEFF. In our experiment, we adopt several
typical resource allocation algorithms to prove
the effectiveness on fairness evaluation by
using the DEFF framework.
IEEE 2015
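A fairness indicator over dominant shares can be sketched in a few lines of Python (our illustration uses Jain's index over dominant shares; DEFF's dynamic demand and node models are not reproduced, and the capacities and allocations are toy values).

def dominant_share(alloc, capacity):
    # A user's dominant share is its largest per-resource fraction.
    return max(alloc[r] / capacity[r] for r in capacity)

def jain_index(shares):
    # 1.0 means perfectly equal shares; lower means less fair.
    n = len(shares)
    return sum(shares) ** 2 / (n * sum(s * s for s in shares))

capacity = {"cpu": 100, "mem": 200}
allocs = [{"cpu": 20, "mem": 10}, {"cpu": 10, "mem": 60}]
shares = [dominant_share(a, capacity) for a in allocs]
print(shares, jain_index(shares))   # [0.2, 0.3], about 0.96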
TTA-JC-
C1547
Aggressive Resource
Provisioning for
Ensuring QoS in
Virtualized Environments
Elasticity has now become the elemental
feature of cloud computing as it enables the
ability to dynamically add or remove virtual
machine instances when workload changes.
However, effective
virtualized resource management is still one of
the most challenging tasks. When the workload
of a service increases rapidly, existing
approaches cannot respond to the growing
performance requirement efficiently because
of either inaccuracy of adaptation decisions or
the slow process of adjustments, both of which
may result in
insufficient resource provisioning. As a
consequence, the Quality of Service (QoS) of
the hosted applications may degrade and the
Service Level Objective (SLO) will be thus
violated. In this paper, we introduce SPRNT, a
novel resource management framework,
to ensure high-level QoS in the cloud
IEEE 2015
computing system. SPRNT utilizes
an aggressive resource provisioning strategy
which encourages SPRNT to substantially
increase the resource allocation in each
adaptation cycle when workload increases.
This strategy first provisions resources which
are possibly more than actual demands, and
then reduces the over-provisioned resources if
needed. By applying the aggressive strategy,
SPRNT can satisfy the increasing performance
requirement in the first place so that
the QoS can be kept at a high level. The
experimental results show that SPRNT
achieves up to 7.7× speedup in adaptation
time, compared with existing efforts. By
enabling quick adaptation, SPRNT limits the
SLO violation rate to at most 1.3 percent even when
dealing with rapidly increasing workload.
TTA-JC-
C1548
An Intelligent Economic
Approach for Dynamic
Resource Allocation in
Cloud Services
With Inter-Cloud, distributed cloud and
open cloud exchange (OCX) emerging, a
comprehensive resource allocation approach is
fundamental to highly
competitive cloud market. Oriented to
infrastructure as a service (IaaS),
an intelligent economic approach for dynamic
resource allocation(IEDA) is proposed with the
improved combinatorial double auction
protocol devised to enable various kinds
of resources traded among multiple consumers
and multiple providers and, at the same time, to
enable task partitioning among multiple providers. To
make bidding and asking reasonable in each
round of the auction and determine eligible
transaction relationship among providers and
consumers, a price formation mechanism is
proposed, which consists of a back
propagation neural network (BPNN) based
price prediction algorithm and a price
matching algorithm. A reputation system is
proposed and integrated to exclude dishonest
participants from the cloud market. The winner
determination problem (WDP) is solved by the
improved paddy field algorithm (PFA).
Simulation results have shown that IEDA can
not only help maximize market surplus and
IEEE 2015
surplus strength but also encourage
participants to be honest.
TTA-JC-
C1549
ANGEL - Agent-Based
Scheduling for Real-
Time Tasks in
Virtualized Clouds
The success of cloud computing makes an
increasing number of real-time applications
such as signal processing and weather
forecasting run in the cloud.
Meanwhile, scheduling for real-time tasks is
playing an essential role for a cloud provider to
maintain its quality of service and enhance the
system’s performance. In this paper, we devise
a novel agent-based scheduling mechanism
in cloud computing environment to
allocate real-time tasks and dynamically
provision resources. In contrast to traditional
contract net protocols, we employ a
bidirectional announcement-bidding
mechanism and the collaborative process
consists of three phases, i.e., basic matching
phase, forward announcement-bidding phase
and backward announcement-bidding phase.
Moreover, the elasticity is sufficiently
considered while scheduling by dynamically
adding virtual machines to improve
schedulability. Furthermore, we design
calculation rules of the bidding values in both
forward and backward announcement-bidding
phases and two heuristics for selecting
contractors. On the basis of the bidirectional
announcement-bidding mechanism, we
propose an agent-based dynamic scheduling
algorithm named ANGEL for real-time,
independent and aperiodic tasks in clouds.
Extensive experiments are conducted on
CloudSim platform by injecting random
synthetic workloads and the workloads from
the last version of the Google cloud trace logs
to evaluate the performance of our ANGEL.
The experimental results indicate
that ANGEL can efficiently solve the real-
time task scheduling problem
in virtualized clouds.
IEEE 2015
TTA-JC-
C1550
Attribute-based Access
Control with Constant-
size Ciphertext in Cloud
Computing
With the popularity of cloud computing, there
have been increasing concerns about its
security and privacy. Since
IEEE 2015
the cloud computing environment is distributed
and untrusted, data owners have to encrypt
outsourced data to enforce confidentiality.
Therefore, how to achieve practicable access
control of encrypted data in an untrusted
environment is an urgent issue that needs to be
solved. Attribute-Based Encryption (ABE) is a
promising scheme suitable
for access control in cloud storage systems.
This paper proposes a hierarchical attribute-
based access control scheme with constant-
size ciphertext. The scheme is efficient because
the length of the ciphertext and the number of
bilinear pairing evaluations are fixed to a
constant. Its computation cost in encryption and
decryption algorithms is low. Moreover, the
hierarchical authorization structure of our
scheme reduces the burden and risk of a single
authority scenario. We prove the scheme is of
CCA2 security under the decisional q-Bilinear
Diffie-Hellman Exponent assumption. In
addition, we implement our scheme and
analyse its performance. The analysis results
show the proposed scheme is efficient,
scalable, and fine-grained in dealing
with access control for outsourced data
in cloud computing.
TTA-JC-
C1551
Automatic Memory
Control of Multiple
Virtual Machines on a
Consolidated Server
Through
virtualization, multiple virtual machines can
coexist and operate on one physical machine.
When virtual machines (VMs) compete
for memory, the performances of applications
deteriorate, especially those of memory-
intensive applications. In this study, we aim to
optimize memory control techniques using a
balloon driver for server consolidation. Our
contribution is three-fold: (1) We design and
implement an automatic control system
for memory based on a Xen balloon driver. To
avoid interference with VM monitor operation,
our system works in user mode; therefore, the
system is easily applied in practice. (2) We
design an adaptive global-scheduling
algorithm to regulate memory. This algorithm
is based on a dynamic baseline, which can
IEEE 2015
adjust memory allocation according to the
memory used by the VMs. (3) We evaluate our
optimized solution in a real environment with
10 VMs and well-known benchmarks (DaCapo
and Phoronix Test Suites). Experiments
confirm that our system can improve the
performance of memory-intensive and disk-
intensive applications by up to 500% and
300%, respectively. This toolkit has been
released for free download as GNU General
Public License v3 software.
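The baseline-driven reallocation can be illustrated with a small Python sketch (our own simplification: each VM gets its measured usage plus a usage-proportional slice of the spare budget; a real system would drive a balloon driver instead of returning numbers, and the sizes here are invented).

def rebalance(usage_mb, host_total_mb, reserve=0.1):
    # Keep a host reserve, then grant usage plus proportional headroom.
    budget = host_total_mb * (1 - reserve)
    used = sum(usage_mb.values())
    spare = max(budget - used, 0)
    return {vm: u + spare * (u / used if used else 1 / len(usage_mb))
            for vm, u in usage_mb.items()}

print(rebalance({"vm1": 800, "vm2": 1600, "vm3": 400}, 4096))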
TTA-JC-
C1552
Circuit Ciphertext-
policy Attribute-based
Hybrid Encryption with
Verifiable Delegation in
cloud computing
In the cloud, for achieving access control and
keeping data confidential, the data owners
could adopt attribute-based encryption to
encrypt the stored data. Users with
limited computing power are however more
likely to delegate most of the decryption
task to the cloud servers to reduce
the computing cost. As a result, attribute-
based encryption with delegation emerges.
Still, there are caveats and questions remaining
in the previous relevant works. For instance,
during the delegation, the cloud servers could
tamper or replace the delegated ciphertext and
return a forged computing result with
malicious intent. They may also cheat
eligible users by telling them that they are
ineligible for the purpose of cost saving.
Furthermore, during the encryption, the access
policies may not be flexible enough as well.
Since a policy over general circuits enables
the strongest form of access control, a
construction for realizing circuit ciphertext-
policy attribute-based hybrid encryption with
verifiable delegation has been considered in
our work. In such a
system, combined with verifiable computation
and the encrypt-then-MAC mechanism, the data
confidentiality, the fine-grained access control
and the correctness of the
delegated computing results are well
guaranteed at the same time. Besides, our
scheme achieves security against chosen-
plaintext attacks under the k-multilinear
Decisional Diffie-Hellman assumption.
IEEE 2015
Moreover, an extensive simulation campaign
confirms the feasibility and efficiency of the
proposed solution.
TTA-JC-
C1553
CloudArmor -
Supporting Reputation-
based Trust
Management for Cloud
Services
Trust management is one of the most
challenging issues for the adoption and growth
of cloud computing. The highly dynamic,
distributed, and non-transparent nature
of cloud services introduces several
challenging issues such as privacy, security,
and availability. Preserving consumers’
privacy is not an easy task due to the sensitive
information involved in the interactions
between consumers and
the trust management service.
Protecting cloud services against their
malicious users (e.g., such users might give
misleading feedback to disadvantage a
particular cloud service) is a difficult problem.
Guaranteeing the availability of
the trust management service is another
significant challenge because of the dynamic
nature of cloud environments. In this article,
we describe the design and implementation
of CloudArmor, a reputation-
based trust management framework that
provides a set of functionalities to
deliver Trust as a Service (TaaS), which
includes i) a novel protocol to prove the
credibility of trust feedbacks and preserve
users’ privacy, ii) an adaptive and robust
credibility model for measuring the credibility
of trust feedbacks to
protect cloud services from malicious users
and to compare the trustworthiness
of cloud services, and iii) an availability model
to manage the availability of the decentralized
implementation of
the trust management service. The feasibility
and benefits of our approach have been
validated by a prototype and experimental
studies using a collection of real-world
trust feedbacks on cloud services.
IEEE 2015
TTA-JC-
C1554
Cost-Effective
Authentic and
Anonymous Data
Data sharing has never been easier with the
advances of cloud computing, and an accurate
IEEE 2015
Sharing with Forward
Security
analysis on the shared data provides an array
of benefits to both the society and
individuals. Data sharing with a large number
of participants must take into account several
issues, including efficiency, data integrity and
privacy of data owner. Ring signature is a
promising candidate to construct
an anonymous and
authentic data sharing system. It allows
a data owner to anonymously authenticate
his data which can be put into the cloud for
storage or analysis purpose. Yet the costly
certificate verification in the traditional public
key infrastructure (PKI) setting becomes a
bottleneck for this solution to be scalable.
Identity-based (ID-based) ring signature,
which eliminates the process of certificate
verification, can be used instead. In this paper,
we further enhance the security of ID-based
ring signature by providing forward security: If
a secret key of any user has been
compromised, all previously generated
signatures that include this user still remain
valid. This property is especially important to
any large scale data sharing system, as it is
impossible to ask all data owners to
reauthenticate their data even if a secret key of
one single user has been compromised. We
provide a concrete and efficient instantiation of
our scheme, prove its security and provide an
implementation to show its practicality.
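The forward-security property can be illustrated, independently of the ring-signature math, by a hash-chain key-evolution sketch in Python (the period structure, the HMAC signing stand-in, and the names are assumptions; the one-wayness of the update is what keeps old signatures valid after a compromise).

import hashlib, hmac

def evolve(key: bytes) -> bytes:
    # One-way update: the old key cannot be recovered from the new one.
    return hashlib.sha256(b"evolve" + key).digest()

def sign(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()

k0 = b"period-0 secret"
sig0 = sign(k0, b"data signed in period 0")
k1 = evolve(k0)          # move to period 1, then erase k0
# An attacker who steals k1 cannot compute k0, so sig0 stays trustworthy.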
TTA-JC-
C1555
DaSCE - Data Security
for Cloud Environment
with Semi-Trusted
Third Party
Off-site data storage is an application
of cloud that relieves the customers from
focusing on data storage system. However,
outsourcing data to a third-party administrative
control entails serious
security concerns. Data leakage may occur due
to attacks by other users and machines in
the cloud. Wholesale of data by the cloud service
provider is yet another problem faced in
the cloud environment. Consequently, a high
level of security measures is required. In this
paper, we
propose Data Security for Cloud Environment
with Semi-Trusted Third Party (DaSCE),
IEEE 2015
a data security system that provides (a) key
management (b) access control, and (c) file
assured deletion. The DaSCE utilizes Shamir’s
(k, n) threshold scheme to manage the keys,
where k out of n shares are required to
generate the key. We use multiple key
managers, each hosting one share of the key.
Multiple key managers avoid a single point of
failure for the cryptographic keys. We (a)
implement a working prototype of DaSCE and
evaluate its performance based on the time
consumed during various operations, (b)
formally model and analyze the working
of DaSCE using High Level Petri nets
(HLPN), and (c) verify the working of
DaSCE using Satisfiability Modulo Theories
Library (SMT-Lib) and Z3 solver. The results
reveal that DaSCE can be effectively used
for security of outsourced data by employing
key management, access control, and file
assured deletion.
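Shamir's (k, n) threshold sharing, the core of DaSCE's key management, works by evaluating a random degree-(k-1) polynomial whose constant term is the secret; any k points recover it by Lagrange interpolation. A compact Python sketch follows (the small prime and integer secret are simplifications; production code needs a cryptographically sized field and secure randomness).

import random

P = 2**61 - 1                     # prime field modulus (illustrative)

def split(secret, k, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert combine(shares[:3]) == 123456789   # any 3 of the 5 shares suffice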
TTA-JC-
C1556
Discover the Expert -
Context-Adaptive
Expert Selection for
Medical Diagnosis
In this paper, we propose
an expert selection system that learns online
the best expert to assign to each patient
depending on the context of the patient. In
general, the context can include an enormous
number and variety of information related to
the patient's health condition, age, gender,
previous drug doses, and so forth, but the most
relevant information is embedded in only a few
contexts. If these most relevant contexts were
known in advance, learning would be
relatively simple but they are not. Moreover,
the relevant contexts may be different for
different health conditions. To address these
challenges, we develop a new class of
algorithms aimed at discovering the most
relevant contexts and the best clinic
and expert to use to make a diagnosis given a
patient's contexts. We prove that as the number
of patients grows, the proposed context-
adaptive algorithm will discover the
optimal expert to select for patients with a
specific context. Moreover, the algorithm also
provides confidence bounds on the diagnostic
IEEE 2015
accuracy of the expert it selects, which can be
considered by the primary care physician
before making the final decision. While our
algorithm is general and can be applied in
numerous medical scenarios, we illustrate its
functionality and performance by applying it to
a real-world breast cancer diagnosis data set.
Finally, while the application we consider in
this paper is medical diagnosis, our proposed
algorithm can be applied in other environments
where expertise needs to be discovered.
TTA-JC-
C1557
Distributed denial of
service attacks in
software-defined
networking with cloud
computing
Although software-defined networking (SDN)
brings numerous benefits by decoupling the
control plane from the data plane, there is a
contradictory relationship between SDN
and distributed denial-of-
service (DDoS) attacks. On one hand, the
capabilities of SDN make it easy to detect and
to react to DDoS attacks. On the other hand,
the separation of the control plane from the
data plane of SDN introduces new attacks.
Consequently, SDN itself may be a target of
DDoS attacks. In this paper, we first discuss
the new trends and characteristics of
DDoS attacks in cloud computing environments.
We show that SDN brings us a new chance
to defeat
DDoS attacks in cloud computing environments,
and we summarize good features of SDN in
defeating DDoS attacks. Then we review the
studies about launching DDoS attacks on SDN
and the methods against DDoS attacks in SDN.
In addition, we discuss a number of challenges
that need to be addressed to mitigate DDoS
attacks in SDN with cloud computing. This
work can help understand how to make full use
of SDN's advantages to defeat
DDoS attacks in cloud computing environments
and how to prevent SDN itself from
becoming a victim of DDoS attacks.
IEEE 2015
TTA-JC-
C1558
Mathematical
Programming Approach
for Revenue
Maximization in Cloud
Federations
This paper assesses the benefits
of cloud federation for cloud providers.
Outsourcing and insourcing are explored as
means to maximize the revenues of the
IEEE 2015
providers involved in the federation. An exact
method using a linear integer program is
proposed to optimize the partitioning of the
incoming workload across
the federation members. A pricing model is
suggested to enable providers to set their offers
dynamically and achieve highest revenues. The
conditions leading to highest gains are
identified and the benefits
of cloud federation are quantified.
TTA-JC-
C1559
My Privacy My Decision
- Control of Photo
Sharing on Online
Social Networks
Photo sharing is an attractive feature which
popularizes Online Social Networks (OSNs).
Unfortunately, it may leak users’ privacy if
they are allowed to post, comment, and tag
a photo freely. In this paper, we attempt to
address this issue and study the scenario when
a user shares a photo containing individuals
other than himself/herself (termed co-photo for
short). To prevent possible privacy leakage of
a photo, we design a mechanism to enable each
individual in a photo to be aware of the posting
activity and participate in the decision making
on the photo posting. For this purpose, we
need an efficient facial recognition (FR)
system that can recognize everyone in
the photo. However, a more demanding privacy
setting may limit the number of
the photos publicly available to train the FR
system. To deal with this dilemma, our
mechanism attempts to utilize users’
private photos to design a personalized FR
system specifically trained to differentiate
possible photo co-owners without leaking
their privacy. We also develop a distributed
consensus based method to reduce the
computational complexity and protect the
private training set. We show that our system
is superior to other possible approaches in
terms of recognition ratio and efficiency. Our
mechanism is implemented as a proof of
concept Android application on Facebook’s
platform.
IEEE 2015
TTA-JC-
C1560
OPoR - Enabling Proof
of Retrievability in
Cloud Computing with
Cloud computing moves the application
software and databases to the centralized large
IEEE 2015
Resource-Constrained
Devices
data centers, where the management of the
data and services may not be fully trustworthy.
In this work, we study the problem of ensuring
the integrity of data storage
in cloud computing. To reduce the
computational cost at user side during the
integrity verification of their data, the notion of
public verifiability has been proposed.
However, the challenge is that the
computational burden is too huge for the users
with resource-
constrained devices to compute the public
authentication tags of file blocks. To tackle the
challenge, we propose OPoR, a
new cloud storage scheme involving
a cloud storage server and a cloud audit server,
where the latter is assumed to be semi-honest.
In particular, we consider the task of allowing
the cloud audit server, on behalf of
the cloud users, to pre-process the data before
uploading to the cloud storage server and later
verifying the data integrity. OPoR outsources
and offloads the heavy computation of the tag
generation to the cloud audit server and
eliminates the involvement of the user in the
auditing and in the pre-processing phases.
Furthermore, we strengthen
the proof of retrievability (PoR) model to
support dynamic data operations, as well as
ensure security against reset attacks launched
by the cloud storage server in the upload
phase.
TTA-JC-
C1561
Performing Initiative
Data Prefetching in
Distributed File
Systems for Cloud
Computing
This paper presents
an initiative data prefetching scheme on the
storage servers in distributed file
systems for cloud computing. In
this prefetching technique, the client machines
are not substantially involved in the process
of data prefetching, but the storage servers can
directly prefetch the data after analyzing the
history of disk I/O access events, and then send
the prefetched data to the relevant client
machines proactively. To put this technique to
work, the information about client nodes is
piggybacked onto the real client I/O requests,
IEEE 2015
and then forwarded to the relevant storage
server. Next, two prediction algorithms have
been proposed to forecast future block access
operations for directing what data should be
fetched on storage servers in advance. Finally,
the prefetched data can be pushed to the
relevant client machine from the storage
server. Through a series of evaluation
experiments with a collection of application
benchmarks, we have demonstrated that our
presented initiative prefetching technique can
benefit distributed file systems for cloud
environments to achieve better I/O performance. In
particular, configuration-limited client
machines in the cloud are not responsible for
predicting I/O access operations, which can
definitely contribute to
preferable system performance on them.
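A server-side predictor of the kind the scheme relies on can be sketched as a first-order model over observed block IDs (an illustration only; the paper's own prediction algorithms differ, and the class and method names are ours).

from collections import defaultdict, Counter

class NextBlockPredictor:
    def __init__(self):
        self.follows = defaultdict(Counter)   # block -> successor counts
        self.last = None

    def observe(self, block):
        if self.last is not None:
            self.follows[self.last][block] += 1
        self.last = block

    def predict(self):
        # Most frequent successor of the last accessed block, if any.
        succ = self.follows.get(self.last)
        return succ.most_common(1)[0][0] if succ else None

p = NextBlockPredictor()
for b in [1, 2, 3, 1, 2, 3, 1, 2]:
    p.observe(b)
print(p.predict())   # 3: push this block to the client proactively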
TTA-JC-
C1562
Privacy-Preserving
Multikeyword Similarity
Search Over
Outsourced Cloud Data
The amount of data generated by individuals
and enterprises is rapidly increasing. With the
emerging cloud computing paradigm,
the data and corresponding complex
management tasks can be outsourced to
the cloud for the management flexibility and
cost savings. Unfortunately, as the data could
be sensitive, the direct data outsourcing would
have the problem of privacy leakage. The
encryption can be used, before
the data outsourcing, with the concern that the
operations can still be accomplished by
the cloud. We consider the multikeyword
similarity search over outsourced cloud
data. In particular, with the consideration of the
text data only, multiple keywords are specified
by the user. The cloud returns the files
containing more than a threshold number of
input keywords or similar keywords, where
the similarity here is defined according to the
edit distance metric. We propose three
solutions, where blind signature provides the
user access privacy, and a novel use of Bloom
filter's bit pattern provides the speedup
of search task at the cloud side. Our final
design to achieve the search is secure against
insider threats and efficient in terms of
IEEE 2015
the search time at the cloud side. Performance
evaluation and analysis are used to
demonstrate the practicality of our proposed
solutions.
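The Bloom-filter bit-pattern speedup can be illustrated as follows (a plaintext sketch under our own parameter choices; the blind-signature access privacy and the edit-distance matching are omitted): each file is summarized by a bit vector over its keywords, and the server can discard files whose vectors cannot contain the queried terms.

import hashlib

M, K = 256, 3                     # filter size in bits, hash count

def positions(word):
    return [int.from_bytes(hashlib.sha256(f"{i}|{word}".encode())
                           .digest()[:4], "big") % M for i in range(K)]

def make_filter(keywords):
    bits = 0
    for w in keywords:
        for pos in positions(w):
            bits |= 1 << pos
    return bits

def may_contain(bits, word):
    # No false negatives; rare false positives depending on M and K.
    return all(bits >> pos & 1 for pos in positions(word))

f = make_filter(["cloud", "privacy", "search"])
print(may_contain(f, "privacy"), may_contain(f, "blockchain"))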
TTA-JC-
C1563
Provable Multicopy
Dynamic Data
Possession in Cloud
Computing Systems
More and more organizations are
opting for outsourcing data to
remote cloud service providers (CSPs).
Customers can rent the CSPs' storage
infrastructure to store and retrieve almost
unlimited amount of data by paying fees
metered in gigabyte/month. For an increased
level of scalability, availability, and durability,
some customers may want their data to be
replicated on multiple servers across
multiple data centers. The more copies the
CSP is asked to store, the more fees the
customers are charged. Therefore, customers
need to have a strong guarantee that the CSP is
storing all data copies that are agreed upon in
the service contract, and all these copies are
consistent with the most recent modifications
issued by the customers. In this paper, we
propose a map-based provable
multicopy dynamic data possession (MB-
PMDDP) scheme that has the following
features: 1) it provides evidence to the
customers that the CSP is not cheating by
storing fewer copies; 2) it supports outsourcing
of dynamic data, i.e., it supports block-level
operations, such as block modification,
insertion, deletion, and append; and 3) it
allows authorized users to seamlessly access
the file copies stored by the CSP. We give a
comparative analysis of the proposed MB-
PMDDP scheme with a reference model
obtained by extending
existing provable possession of dynamic single
-copy schemes. The theoretical analysis is
validated through experimental results on a
commercial cloud platform. In addition, we
show the security against colluding servers,
and discuss how to identify corrupted copies
by slightly modifying the proposed scheme.
IEEE 2015
TTA-JC-
C1564
SAE - Toward Efficient
Cloud Data Analysis
Service for Large-Scale
Social Networks
Social network analysis is used to extract
features of human communities and proves to
be very instrumental in a variety of scientific
domains. The dataset of a social network is
often so large that a
cloud data analysis service, in which the
computation is performed on a parallel
platform in the cloud, becomes a good choice
for researchers not experienced in parallel
programming. In the cloud, a primary
challenge to efficient data analysis is the
computation and communication skew (i.e.,
load imbalance) among computers caused by
humanity’s group behavior (e.g., bandwagon
effect). Traditional load balancing techniques
either require significant effort to re-balance
loads on the nodes, or cannot well cope with
stragglers. In this paper, we propose a general
straggler-aware execution approach, SAE, to
support the analysis service in the cloud. It
offers a novel computational decomposition
method that factors straggling feature
extraction processes into more fine-grained
sub-processes, which are then distributed over
clusters of computers for parallel execution.
Experimental results show that SAE can speed
up the analysis by up to 1.77 times compared
with state-of-the-art solutions.
IEEE 2015
TTA-JC-
C1565
Secure Cloud Storage
Meets with Secure
Network Coding
This paper reveals an intrinsic relationship
between secure cloud storage and secure
network coding for the first
time. Secure cloud storage was proposed only
recently while secure network coding has been
studied for more than ten years. Although the
two areas are quite different in their nature and
are studied independently, we show how to
construct a secure cloud storage protocol given
any secure network coding protocol. This gives
rise to a systematic way to
construct secure cloud storage protocols. Our
construction is secure under a definition which
captures the real world usage of the
cloud storage. Furthermore, we propose two
specific secure cloud storage protocols based
on two recent secure network coding protocols.
IEEE 2015
In particular, we obtain the first publicly
verifiable secure cloud storage protocol in the
standard model. We also enhance the proposed
generic construction to support user anonymity
and third-party public auditing, which both
have received considerable attention recently.
Finally, we prototype the newly proposed
protocol and evaluate its performance.
Experimental results validate the effectiveness
of the protocol.
TTA-JC-
C1566
SeDaSC - Secure Data
Sharing in Clouds
Cloud storage is an application of clouds that
liberates organizations from establishing in-
house data storage systems.
However, cloud storage gives rise to security
concerns. In case of group-shared data,
the data face both cloud-specific and
conventional insider
threats. Secure data sharing among a group
that counters insider threats of legitimate yet
malicious users is an important research issue.
In this paper, we propose
the Secure Data Sharing in Clouds (SeDaSC)
methodology that provides:
1) data confidentiality and integrity; 2) access
control; 3) data sharing (forwarding) without
using compute-intensive reencryption; 4)
insider threat security; and 5) forward and
backward access control. The
SeDaSC methodology encrypts a file with a
single encryption key. Two different
key shares for each of the users are generated,
with the user only getting one share. The
possession of a single share of a key allows
the SeDaSC methodology to counter the
insider threats. The other key share is stored by
a trusted third party, which is called the
cryptographic server.
The SeDaSC methodology is applicable to
conventional and mobile cloud computing
environments. We implement a working
prototype of the SeDaSC methodology and
evaluate its performance based on the time
consumed during various operations. We
formally verify the working of SeDaSC by
using high-level Petri nets, the Satisfiability
IEEE 2015
Modulo Theories Library, and a Z3 solver. The
results proved to be encouraging and show that
SeDaSC has the potential to be effectively
used for secure data sharing in the cloud.
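The two-share defence against insiders can be illustrated with an XOR key split in Python (a stand-in for the methodology's actual share generation: the user's share and the cryptographic server's share are individually useless, and both are needed to rebuild the file key).

import secrets

def split_key(key: bytes):
    user_share = secrets.token_bytes(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recover_key(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

k = secrets.token_bytes(32)        # the single file-encryption key
u, s = split_key(k)
assert recover_key(u, s) == k      # neither share alone reveals k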
TTA-JC-
C1567
Shared Authority Based
Privacy-Preserving
Authentication Protocol
in Cloud Computing
Cloud computing is an emerging data-
interactive paradigm that realizes users' data
remotely stored on an
online cloud server. Cloud services provide
great conveniences for the users to enjoy the
on-demand cloud applications without
considering the local infrastructure limitations.
During the data accessing, different users may
be in a collaborative relationship, and thus
data sharing becomes significant to achieve
productive benefits. The existing security
solutions mainly focus on authentication to
ensure that a user's private data cannot be
illegally accessed, but they neglect a
subtle privacy issue that arises when a user
challenges the cloud server to request other
users' data for sharing. The access request
itself may reveal the user's privacy, no matter
whether or not it obtains the data access
permissions. In this paper, we propose
a shared authority based privacy-
preserving authentication protocol (SAPA) to
address the above privacy issue for cloud storage.
In the SAPA, 1) shared access authority is
achieved by anonymous access request
matching mechanism with security and privacy
considerations (e.g., authentication, data
anonymity, user privacy, and forward
security); 2) attribute based access control is
adopted to realize that the user can only access
its own data fields; 3) proxy re-encryption is
applied to provide data sharing among the
multiple users. Meanwhile, a universal
composability (UC) model is established to
prove that the SAPA theoretically achieves
design correctness. The analysis indicates that the
proposed protocol is attractive for multi-user
collaborative cloud applications.
IEEE 2015
TTA-JC-
C1568
Social
Recommendation with
Cross-Domain
Transferable
Knowledge
Recommender systems can suffer from data
sparsity and cold start issues.
However, social networks, which enable users
to build relationships and create different
types of items, present an unprecedented
opportunity to alleviate these issues. In this
paper, we represent a social network as a
star-structured hybrid graph centered on
a social domain, which connects with other
item domains. With this innovative
representation, useful knowledge from an
auxiliary domain can be transferred through
the social domain to a target domain. Various
factors of item transferability, including
popularity and behavioral consistency, are
determined. We propose a novel Hybrid
Random Walk (HRW) method, which
incorporates such factors, to
select transferable items in auxiliary domains,
bridge cross-domain knowledge with
the social domain, and accurately predict
user-item links in a target domain. Extensive
experiments on a real social dataset
demonstrate that HRW significantly
outperforms existing approaches.
IEEE 2015
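
A hedged sketch of the underlying machinery: a plain random walk with restart over a toy hybrid graph, scoring target-domain items by visit frequency. The real HRW additionally weights walks by the transferability factors (popularity, behavioral consistency) described above; the node names here are invented.

import random

def random_walk_scores(graph, start, restart=0.15, steps=100_000):
    # visit frequency of target-domain nodes serves as the link score
    visits = {}
    node = start
    for _ in range(steps):
        if random.random() < restart or not graph.get(node):
            node = start                      # teleport back to the user
        else:
            node = random.choice(graph[node])
        visits[node] = visits.get(node, 0) + 1
    return visits

# toy hybrid graph: user u1 reaches a target domain (t*) through the
# social domain (u*) and an auxiliary item domain (a*)
graph = {
    "u1": ["u2", "a1"], "u2": ["u1", "t1", "a1"],
    "a1": ["u1", "u2"], "t1": ["u2", "t2"], "t2": ["t1"],
}
scores = random_walk_scores(graph, "u1")
print(sorted((n, c) for n, c in scores.items() if n.startswith("t")))
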
TTA-JC-
C1569
TMACS - A Robust and
Verifiable Threshold
Multi-Authority Access
Control System in
Public Cloud Storage
Attribute-based Encryption (ABE) is regarded
as a promising cryptographic tool to
guarantee data owners' direct control over
their data in public cloud storage. The earlier
ABE schemes involve only one authority to
maintain the whole attribute set, which can
bring a single-point bottleneck on both security
and performance. Subsequently, some multi-
authority schemes are proposed, in which
multiple authorities separately maintain
disjoint attribute subsets. However, the single-
point bottleneck problem remains unsolved. In
this paper, from another perspective, we
construct a threshold multi-authority CP-
ABE access control scheme
for public cloud storage, named TMACS, in
which multiple authorities jointly manage a
uniform attribute set. In TMACS, taking
advantage of (t, n) threshold secret sharing, the
master key can be shared among multiple
authorities, and a legal user can generate
his/her secret key by interacting with any t
authorities. Security and performance analysis
results show that TMACS is not
only verifiably secure when fewer than t
authorities are compromised, but also robust
when at least t authorities are alive in
the system. Furthermore, by efficiently
combining the traditional multi-
authority scheme with TMACS, we construct a
hybrid scheme, which satisfies the scenario of
attributes coming from different authorities as
well as achieving security and system-level
robustness.
IEEE 2015
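
The (t, n) sharing at the heart of TMACS can be illustrated with textbook Shamir secret sharing, sketched below; the prime field and the integer encoding of the master key are illustrative assumptions, and TMACS itself applies the sharing inside an interactive key-issuing protocol.

import secrets

P = 2**127 - 1                  # a Mersenne prime; the field for shares

def split(secret, t, n):
    # Shamir (t, n) sharing: any t shares rebuild the master secret
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over GF(P)
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = split(secret=123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789
assert reconstruct(shares[1:4]) == 123456789
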
TTA-JC-
C1570
Towards Privacy
Preserving Publishing
of set-valued Data on
Hybrid Cloud
Storage as a service has become an
important paradigm in cloud computing for
its great flexibility and economic savings.
However, the development is hampered
by data privacy concerns: data owners no
longer physically possess the storage of
their data. In this work, we study the issue
of privacy-preserving set-
valued data publishing.
Existing data privacy-
preserving techniques (such as encryption,
suppression, and generalization) are not
applicable in many real-world scenarios, since they
would incur large overhead for data queries
or high information loss. Motivated by this
observation, we present a suite of new
techniques that make privacy-aware set-
valued data publishing feasible
on a hybrid cloud. In the data publishing phase,
we propose a data partition technique,
named extended quasi-identifier-
partitioning (EQI-partitioning), which
disassociates record terms that participate
in identifying combinations. This way
the cloud server cannot, with high
probability, associate a record with rare term
combinations. We prove
the privacy guarantee of our mechanism.
In the data querying phase, we adopt an
interactive differential privacy strategy to
resist privacy breaches from statistical
queries. We finally evaluate the scheme's
performance using real-life data sets on
our cloud test-bed. Our extensive
experiments demonstrate the validity and
practicality of the proposed scheme.
IEEE 2015
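
As a sketch of the querying-phase idea, the Laplace mechanism below answers a COUNT query with epsilon-differential privacy (counts have sensitivity 1). The epsilon value is an arbitrary assumption, and the paper's interactive strategy manages a privacy budget across many such queries.

import random

def dp_count(true_count, epsilon=0.5):
    # Laplace mechanism: noise ~ Laplace(0, 1/epsilon), sampled here as
    # the difference of two exponentials with rate epsilon
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

# an analyst in the querying phase sees only noisy answers
print(round(dp_count(1234), 1))
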
TTA-JC-
C1571
Towards Privacy-
Preserving Storage and
Retrieval in Multiple
Clouds
Cloud computing is growing exponentially,
and there are now hundreds
of cloud service providers (CSPs) of various
sizes. While cloud consumers may enjoy
cheaper data storage and computation in
this multi-cloud environment, they also
face more complicated reliability issues
and privacy preservation problems for their
outsourced data. Though searchable encryption
allows users to encrypt their stored data
while preserving some search capabilities, few
efforts have sought to consider the reliability
of the searchable encrypted data outsourced to
the clouds. In this paper, we propose a privacy-
preserving Storage and Retrieval (STRE)
mechanism that not only ensures security and
privacy but also provides reliability guarantees
for the outsourced searchable encrypted data.
The STRE mechanism enables the cloud users
to distribute and search their encrypted data
across multiple independent clouds managed
by different CSPs, and is robust even when a
certain number of CSPs crash. Besides the
reliability, STRE also offers the benefit of
partially hidden search pattern. We evaluate
the STRE mechanism on Amazon EC2 using a
real world dataset and the results demonstrate
both effectiveness and efficiency of our
approach.
IEEE 2015
TTA-JC-
C1572
Trust Enhanced
Cryptographic Role-
based Access Control
for Secure Cloud Data
Storage
Cloud data storage has provided significant
benefits by allowing users to store massive
amounts of data on demand in a cost-effective
manner. To protect the privacy of data stored
in the cloud, cryptographic role-
based access control (RBAC) schemes have
been developed to ensure that the data can only
be accessed by those who are allowed
by access policies. However,
these cryptographic approaches do not address
the issues of trust. In this paper, we
propose trust models to reason about and to
improve the security for
stored data in cloud storage systems that
use cryptographic RBAC schemes. The trust
models provide an approach for the owners
and roles to determine the trustworthiness of
individual roles and users, respectively, in the
RBAC system. The proposed trust models
consider role inheritance and hierarchy in the
evaluation of trustworthiness of roles. We
present a design of a trust-
based cloud storage system, which shows how
the trust models can be integrated into a
system that uses cryptographic RBAC
schemes. We have also considered practical
application scenarios and illustrated how
the trust evaluations can be used to reduce
risks and to enhance the quality of decision
making by data owners and roles
in a cloud storage service.
IEEE 2015
TTA-JC-
C1573
Using ant colony
system to consolidate
VMs for green cloud
computing
High energy consumption of cloud data centers
is a matter of great concern. Dynamic
consolidation of Virtual Machines (VMs)
presents a significant opportunity to save
energy in data centers. A VM consolidation
approach uses live migration of VMs so that
some of the under-loaded Physical Machines
(PMs) can be switched-off or put into a low-
power mode. On the other hand, achieving the
desired level of Quality of Service (QoS)
between cloud providers and their users is
critical. Therefore, the main challenge
is to reduce energy consumption of data
centers while satisfying QoS requirements. In
this paper, we present a
distributed system architecture to perform
dynamic VM consolidation to reduce energy
consumption of cloud data centers while
maintaining the desired QoS. Since the VM
consolidation problem is strictly NP-hard,
we use an online optimization metaheuristic
algorithm called Ant Colony System (ACS).
The proposed ACS-based VM Consolidation
(ACS-VMC) approach finds a near-optimal
solution based on a specified objective
function. Experimental results on real
workload traces show that ACS-VMC reduces
energy consumption while maintaining the
required performance levels in a cloud data
center. It outperforms existing VM
consolidation approaches in terms of energy
consumption, number of VM migrations, and
QoS requirements concerning performance.
IEEE 2015
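
A compact sketch of how an Ant Colony System can drive consolidation, assuming the count of active PMs as the energy proxy. It follows the standard ACS choice rule (exploitation vs. biased exploration plus a global pheromone update on the best solution) but omits details such as local pheromone updates and the paper's full objective function.

import random

def acs_consolidate(vm_cpu, pm_cap, n_ants=20, n_iter=50,
                    alpha=1.0, beta=2.0, rho=0.1, q0=0.9):
    n_vms, n_pms = len(vm_cpu), len(pm_cap)
    tau = [[1.0] * n_pms for _ in range(n_vms)]   # pheromone trails
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            load, assign = [0.0] * n_pms, []
            for v in range(n_vms):
                feasible = [p for p in range(n_pms)
                            if load[p] + vm_cpu[v] <= pm_cap[p]]
                if not feasible:
                    assign = None
                    break
                eta = [1.0 + load[p] for p in feasible]   # prefer tight packing
                scores = [(tau[v][p] ** alpha) * (eta[i] ** beta)
                          for i, p in enumerate(feasible)]
                if random.random() < q0:          # exploitation
                    p = feasible[max(range(len(feasible)), key=scores.__getitem__)]
                else:                             # biased exploration
                    r, acc = random.uniform(0, sum(scores)), 0.0
                    for i, s in enumerate(scores):
                        acc += s
                        if acc >= r:
                            p = feasible[i]
                            break
                load[p] += vm_cpu[v]
                assign.append(p)
            if assign is None:
                continue
            cost = sum(1 for p in range(n_pms) if load[p] > 0)  # active PMs
            if cost < best_cost:
                best, best_cost = assign, cost
        if best:
            for v, p in enumerate(best):          # global update on best tour
                tau[v][p] = (1 - rho) * tau[v][p] + rho / best_cost
    return best, best_cost

print(acs_consolidate([0.2, 0.5, 0.3, 0.4, 0.1, 0.6], pm_cap=[1.0] * 4))
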
TTA-JC-
C1574
Using Virtual Machine
Allocation Policies to
Defend against Co-
resident Attacks in
Cloud Computing
Cloud computing enables users to consume
various IT resources in an on-demand manner,
and with low management overhead. However,
customers can face new security risks when
they use cloud computing platforms. In this
paper, we focus on one such threat: the co-
resident attack, where malicious users build
side channels and extract private information
from virtual machines co-located on the same
server. Previous works mainly
attempt to address the problem by eliminating
side channels. However, most of these
methods are not suitable for immediate
deployment due to the required
modifications to current cloud platforms. We
choose to solve the problem from a different
perspective, by studying how to improve
the virtual machine allocation policy, so that it
is difficult for attackers to co-locate with their
targets. Specifically, we (1) define security
metrics for assessing the attack; (2) model
these metrics, and compare the difficulty of
achieving co-residence under three
commonly used policies; (3) design a
new policy that not only mitigates the threat
of attack, but also satisfies the requirements for
workload balance and low power consumption;
and (4) implement, test, and prove the
effectiveness of the policy on the popular
open-source platform OpenStack.
IEEE 2015
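
To illustrate what such a security metric can look like, the simulation below estimates the probability that an attacker co-locates with a victim VM under two toy allocation policies. All sizes are invented, and the policies are simpler than the three commonly used policies the paper analyzes.

import random

def coresidence_rate(policy, n_servers=50, tenant_vms=200,
                     attacker_vms=10, trials=1000):
    # toy metric: chance that at least one attacker VM lands on the
    # same server as a randomly chosen victim VM
    hits = 0
    for _ in range(trials):
        load = [0] * n_servers            # VMs per server
        placement = []                    # server of each tenant VM
        for _ in range(tenant_vms):
            s = policy(load)
            load[s] += 1
            placement.append(s)
        victim_server = random.choice(placement)
        for _ in range(attacker_vms):
            s = policy(load)
            load[s] += 1
            if s == victim_server:
                hits += 1
                break
    return hits / trials

def random_policy(load):
    return random.randrange(len(load))

def least_loaded_policy(load):
    return min(range(len(load)), key=load.__getitem__)

print("random:      ", coresidence_rate(random_policy))
print("least-loaded:", coresidence_rate(least_loaded_policy))
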
DOMAIN : BIG DATA
TTA-JB-
C1501
FastRAQ A Fast
Approach to Range-
Aggregate Queries in
Big Data Environments
Range-aggregate queries apply a
certain aggregate function to all tuples within
given query ranges.
Existing approaches to range-
aggregate queries are insufficient to quickly
provide accurate results
in big data environments. In this paper, we
propose FastRAQ, a fast approach to range-
aggregate queries in big data environments.
FastRAQ first divides big data into independent
partitions with a balanced partitioning
algorithm, and then generates a local
estimation sketch for each partition. When
a range-aggregate query request arrives,
FastRAQ obtains the result directly by
summarizing local estimates from all
partitions. FastRAQ has O(1) time complexity
for data updates and O(N/(P×B)) time
complexity for range-aggregate queries, where
N is the number of distinct tuples for all
dimensions, P is the partition number, and B is
the bucket number in the histogram. We
implement the FastRAQ approach on the
Linux platform, and evaluate its performance
with about 10 billion data records.
Experimental results demonstrate that FastRAQ
provides range-aggregate query results
within a time period two orders of magnitude
lower than that of Hive, while the relative error
is less than 3 percent within the given
confidence interval.
IEEE 2015
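
A minimal sketch of the partition-plus-local-sketch idea, assuming an equi-width histogram as the local estimation sketch (the paper's sketch is more elaborate): updates are O(1), and a query just sums bucket estimates across partitions, trading exactness for speed.

class PartitionSketch:
    # equi-width histogram kept per partition
    def __init__(self, lo, hi, buckets=64):
        self.lo, self.width = lo, (hi - lo) / buckets
        self.sums = [0.0] * buckets
    def insert(self, key, value):          # O(1) update
        b = min(int((key - self.lo) / self.width), len(self.sums) - 1)
        self.sums[b] += value
    def estimate_sum(self, lo, hi):        # approximate SUM over [lo, hi)
        b0 = max(int((lo - self.lo) / self.width), 0)
        b1 = min(int((hi - self.lo) / self.width), len(self.sums) - 1)
        return sum(self.sums[b0:b1 + 1])

def range_sum(partitions, lo, hi):
    # answer a range-aggregate query by summarizing local estimates
    return sum(p.estimate_sum(lo, hi) for p in partitions)

parts = [PartitionSketch(0, 1000) for _ in range(4)]
for i in range(100_000):
    parts[i % 4].insert(key=i % 1000, value=1.0)
print(range_sum(parts, 100, 200))   # roughly 10,000, up to bucket granularity
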
TTA-JB-
C1502
Collaboration- and
Fairness-Aware Big
Data Management in
Distributed Clouds
With the advancement of information and
communication technology, data are being
generated at an exponential rate via various
instruments and collected at an unprecedented
scale. Such large volume of data generated is
referred to as big data, which now are
revolutionizing all aspects of our life ranging
from enterprises to individuals, from science
communities to governments, as they exhibit
great potentials to improve efficiency of
enterprises and the quality of life. To obtain
nontrivial patterns and derive valuable
information from big data, a fundamental
problem is how to properly place the collected
data by different users
to distributed clouds and to efficiently analyze
the collected data to save user costs
in data storage and processing, particularly the
cost savings of users who share data. Achieving
this requires close collaboration among the
users, who share and utilize
big data in distributed clouds, due to the
complexity and volume of big data. Since
computing, storage, and bandwidth resources in
a distributed cloud usually are limited, and
such resource provisioning typically is
expensive, collaborative users are required to
make use of the resources fairly. In this paper,
we study a novel collaboration- and fairness-
aware big data management problem
in distributed cloud environments that aims to
maximize the system throughput, while
minimizing the operational cost of service
providers to achieve that throughput,
subject to resource capacity and user fairness
constraints. We first propose a novel
optimization framework for the problem. We
then devise a fast yet scalable approximation
algorithm based on the built optimization
framework. We also analyze the time
complexity and approximation ratio of the
proposed algorithm. We finally conduct
experiments by simulations to evaluate the
performance of the proposed algorithm.
Experimental results demonstrate that the
proposed algorithm is promising, and
outperforms other heuristics.
IEEE 2015
TTA-JB-
C1503
On Traffic-Aware
Partition and
Aggregation in
MapReduce for Big
Data Applications
The MapReduce programming model
simplifies large-scale data processing on
commodity clusters by exploiting parallel map
tasks and reduce tasks. Although many efforts
have been made to improve the performance
of MapReduce jobs, they ignore the
network traffic generated in the shuffle phase,
which plays a critical role in performance
enhancement. Traditionally, a hash function is
used to partition intermediate data among
reduce tasks, which, however, is not traffic-
efficient because network topology
and data size associated with each key are not
taken into consideration. In this paper, we
study to reduce network traffic cost for
a MapReduce job by designing a novel
intermediate data partition scheme.
Furthermore, we jointly consider the
aggregator placement problem, where each
aggregator can reduce merged traffic from
multiple map tasks. A decomposition-based
distributed algorithm is proposed to deal with
the large-scale optimization problem
for big data application and an online
algorithm is also designed to
adjust data partition and aggregation in a
dynamic manner. Finally, extensive simulation
results demonstrate that our proposals can
significantly reduce network traffic cost under
both offline and online cases.
IEEE 2015
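
The core idea of the partition scheme can be sketched as a greedy assignment: place each intermediate key on the reducer node that minimizes weighted shuffle traffic, given per-mapper data sizes and network costs. Everything below is a toy instance, not the paper's decomposition-based distributed algorithm.

def traffic_aware_partition(key_sizes, dist):
    # key_sizes[k][m] -- bytes produced for key k on mapper node m
    # dist[m][r]      -- network cost of moving one byte from m to r
    assignment = {}
    n_reducers = len(dist[0])
    for k, sizes in key_sizes.items():
        cost = lambda r: sum(sz * dist[m][r] for m, sz in enumerate(sizes))
        assignment[k] = min(range(n_reducers), key=cost)
    return assignment

# toy example: 2 mapper nodes, 2 reducer nodes
key_sizes = {"apple": [100, 10], "banana": [5, 80]}
dist = [[0, 1], [1, 0]]          # co-located transfer is free
print(traffic_aware_partition(key_sizes, dist))   # apple -> 0, banana -> 1
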
TTA-JB-
C1504
Privacy-Preserving
Ciphertext Multi-
Sharing Control for Big
Data Storage
The need for a secure big data storage service
is greater than ever. The basic
requirement of the service is to guarantee the
confidentiality of the data. However, the
anonymity of the service clients, one of the
most essential aspects of privacy, should be
considered simultaneously. Moreover, the
service also should provide practical and fine-
grained encrypted data sharing such that a data
owner is allowed to share
a ciphertext of data among others under some
specified conditions. This paper, for the first
time, proposes a privacy-
preserving ciphertext multi-sharing mechanism
to achieve the above properties. It combines
the merits of proxy re-encryption with
anonymous technique in which
a ciphertext can be securely and conditionally
shared multiple times without leaking either the
underlying message or the identity information
of ciphertext senders/recipients. Furthermore,
of ciphertext senders/recipients. Furthermore,
this paper shows that the new primitive is
secure against chosen-ciphertext attacks in the
standard model.
IEEE 2015
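
For flavor, here is a toy proxy re-encryption in the style of the classic ElGamal-based BBS scheme: the proxy transforms a ciphertext for Alice into one for Bob without learning the message. The paper's primitive additionally provides anonymity and conditional, multi-time sharing, and the group parameters below are far too small for real security.

import secrets

p, q, g = 1019, 509, 4          # p = 2q + 1; g generates the order-q subgroup

def keygen():
    sk = secrets.randbelow(q - 1) + 1
    return sk, pow(g, sk, p)

def encrypt(pk, m):             # m must be a subgroup element
    r = secrets.randbelow(q - 1) + 1
    return (m * pow(g, r, p) % p, pow(pk, r, p))    # (m*g^r, g^{a r})

def rekey(sk_a, sk_b):          # rk = b / a  (mod q)
    return sk_b * pow(sk_a, -1, q) % q

def reencrypt(rk, ct):          # g^{a r} -> g^{b r}; proxy learns nothing of m
    c1, c2 = ct
    return (c1, pow(c2, rk, p))

def decrypt(sk, ct):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, q), p)                # (g^{sk*r})^{1/sk} = g^r
    return c1 * pow(g_r, -1, p) % p

a_sk, a_pk = keygen()
b_sk, _ = keygen()
msg = pow(g, 42, p)             # encode messages as subgroup elements
ct_a = encrypt(a_pk, msg)
ct_b = reencrypt(rekey(a_sk, b_sk), ct_a)
assert decrypt(b_sk, ct_b) == msg == decrypt(a_sk, ct_a)
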
TTA-JB-
C1505
Self-Adjusting Slot
Configurations for
Homogeneous and
Heterogeneous Hadoop
The MapReduce framework and its open
source implementation Hadoop have become
the de facto platform for scalable analysis of
large data sets in recent years. One of the
primary concerns in Hadoop is how to
minimize the completion length (i.e.,
makespan) of a set of MapReduce jobs. The
current Hadoop only allows
static slot configuration, i.e., fixed numbers of
map slots and reduce slots throughout the
lifetime of a cluster. However, we found that
such a static configuration may lead to low
system resource utilizations as well as long
completion length. Motivated by this, we
propose simple yet effective schemes which
use the slot ratio between map and reduce tasks as
a tunable knob for reducing the makespan of a
given job set. By leveraging the workload
information of recently completed jobs, our
schemes dynamically allocate resources
(or slots) to map and reduce tasks. We
implemented the presented schemes
in Hadoop V0.20.2 and evaluated them with
representative MapReduce benchmarks at
Amazon EC2. The experimental results
demonstrate the effectiveness and robustness
of our schemes under both simple workloads
and more complex mixed workloads.
IEEE 2015
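
A hedged sketch of the tunable-knob idea: given aggregate map and reduce work estimated from recently completed jobs, pick the slot split whose slower phase finishes soonest. Real Hadoop pipelines the phases, so the max() below is only a coarse model, not the paper's scheme.

def tune_slot_ratio(map_work, reduce_work, total_slots):
    # map_work / reduce_work: aggregate task-seconds estimated from
    # recently completed jobs of the same workload
    best, best_makespan = None, float("inf")
    for map_slots in range(1, total_slots):
        reduce_slots = total_slots - map_slots
        # phases overlap in practice; max() is only a coarse bound
        makespan = max(map_work / map_slots, reduce_work / reduce_slots)
        if makespan < best_makespan:
            best, best_makespan = (map_slots, reduce_slots), makespan
    return best

print(tune_slot_ratio(map_work=600.0, reduce_work=200.0, total_slots=8))  # (6, 2)
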
TTA-JB-
C1506
A General
Communication Cost
Optimization
Framework for Big
Data Stream
Processing in Geo-
distributed Data
Centers
With the explosion
of big data, processing large numbers of
continuous data streams, i.e., big data
stream processing (BDSP), has become a
crucial requirement for many scientific and
industrial applications in recent years. By
offering a pool of
computation, communication and storage
resources, public clouds, like Amazon’s EC2,
are undoubtedly the most efficient platforms to
meet the ever-growing needs of BDSP. Public
cloud service providers usually operate a
number of geo-distributed datacenters across
the globe. Different datacenter pairs have
different inter-datacenter network
costs charged by Internet Service Providers
(ISPs). Meanwhile, inter-datacenter traffic in BDSP
constitutes a large portion of a cloud provider’s
traffic demand over the Internet and incurs
substantial communication cost, which may
even become the dominant operational
expenditure factor. As the datacenter resources
are provided in a virtualized way, the virtual
machines (VMs) for stream processing tasks
can be freely deployed onto any datacenters,
provided that the Service Level Agreement
(SLA, e.g., quality-of-information) is obeyed.
This raises the opportunity, but also a
challenge, to explore the inter-datacenter
network cost diversities to optimize both VM
placement and load balancing towards
network cost minimization with guaranteed
SLA. In this paper, we first propose
a general modeling framework that describes
all representative intertask relationship
semantics in BDSP. Based on our
novel framework, we then formulate
the communication cost minimization problem
for BDSP into a mixed-integer linear
programming (MILP) problem and prove it to
be NP-hard. We then propose a computation-
efficient solution based on MILP. The high
efficiency of our proposal is validated by
extensive simulation-based studies.
IEEE 2015
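
A minimal MILP sketch of the placement side of the problem using the open-source PuLP modeller (an assumed tool choice, not the paper's): binary variables place tasks in datacenters, and the objective charges each task-graph edge by its rate times the inter-datacenter cost. Task names, rates, costs, and the absence of SLA and capacity constraints are all simplifications.

import pulp

tasks = ["src", "filter", "sink"]
edges = [("src", "filter", 10.0), ("filter", "sink", 4.0)]   # (u, v, rate)
dcs = ["dc1", "dc2"]
cost = {("dc1", "dc2"): 3.0, ("dc2", "dc1"): 3.0,
        ("dc1", "dc1"): 0.0, ("dc2", "dc2"): 0.0}            # per rate unit

prob = pulp.LpProblem("bdsp_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, dcs), cat="Binary")   # task -> DC
y = pulp.LpVariable.dicts("y", (range(len(edges)), dcs, dcs), cat="Binary")

# objective: total inter-datacenter communication cost
prob += pulp.lpSum(rate * cost[d1, d2] * y[e][d1][d2]
                   for e, (_, _, rate) in enumerate(edges)
                   for d1 in dcs for d2 in dcs)

for t in tasks:                         # each task sits in exactly one DC
    prob += pulp.lpSum(x[t][d] for d in dcs) == 1
for e, (u, v, _) in enumerate(edges):   # y is forced to 1 when both ends match
    for d1 in dcs:
        for d2 in dcs:
            prob += y[e][d1][d2] >= x[u][d1] + x[v][d2] - 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({t: d for t in tasks for d in dcs if x[t][d].value() == 1})

With nonnegative costs, minimization keeps each y at zero unless both endpoint placements force it to one, so no explicit upper-bound constraints are needed in this stripped-down model.
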
TTA-JB-
C1507
Data Transfer
Scheduling for
Maximizing Throughput
of Big-Data Computing
in Cloud Systems
Many big-data computing applications have
been deployed in cloud platforms. These
applications normally demand
concurrent data transfers among computing
nodes for parallel processing. It is important to
find the best transfer scheduling leading to the
least data retrieval time, in other words the
maximum throughput. However, the
existing methods cannot achieve this, because
they ignore link bandwidths and the diversity
of data replicas and paths. In this paper, we
aim to develop a max-
throughput data transfer scheduling to
minimize the data retrieval time of
applications. Specifically, the problem is
formulated into mixed integer programming,
and an approximation algorithm is proposed,
with its approximation ratio analyzed. The
extensive simulations demonstrate that our
algorithm can obtain near optimal solutions.
IEEE 2015
TTA-JB-
C1508
Accelerated PSO
Swarm Search Feature
Selection for Data
Stream Mining Big Data
Although Big Data is widely hyped, it brings
many technical challenges that confront both
academic research communities and
commercial IT deployments, and its root sources
are data streams and
the curse of dimensionality. It is generally
known that data which are sourced from data
streams accumulate continuously making
traditional batch-based model induction
algorithms infeasible for real-
time data mining. Feature selection has been
popularly used to lighten the processing load in
inducing a data mining model. However, when
it comes to mining over high-
dimensional data, the search space from which
an optimal feature subset is derived grows
exponentially in size, leading to an intractable
demand in computation. In order to tackle this
problem which is mainly based on the high-
dimensionality and streaming format
of data feeds in Big Data, a novel
lightweight feature selection is proposed.
The feature selection is designed particularly
for mining streaming data on the fly, by using
accelerated particle swarm optimization
(APSO) type of swarm search that achieves
enhanced analytical accuracy within
reasonable processing time. In this paper,
collections of Big Data with an exceptionally
high degree of dimensionality are used to
evaluate the performance of our new feature
selection algorithm.
IEEE 2015
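
As an illustration, the sketch below runs Yang-style accelerated PSO (velocity-free: pull toward the global best plus shrinking random jitter) over binary feature masks. The fitness function is a toy stand-in for the analytical accuracy the paper measures, and all constants are assumptions.

import random

def apso_feature_select(fitness, n_features, n_particles=20, n_iter=50,
                        alpha=0.3, beta=0.5):
    pos = [[random.random() for _ in range(n_features)]
           for _ in range(n_particles)]
    to_mask = lambda x: [v > 0.5 for v in x]      # threshold to a feature mask
    best = max(pos, key=lambda x: fitness(to_mask(x)))[:]
    for t in range(n_iter):
        a = alpha * 0.9 ** t                      # shrink randomness over time
        for x in pos:
            for j in range(n_features):
                # APSO update: global best attraction plus random jitter
                x[j] = (1 - beta) * x[j] + beta * best[j] \
                       + a * (random.random() - 0.5)
            if fitness(to_mask(x)) > fitness(to_mask(best)):
                best = x[:]
    return to_mask(best)

# toy fitness: reward using features 0 and 2, penalize mask size
fit = lambda m: 2 * m[0] + 2 * m[2] - 0.5 * sum(m)
print(apso_feature_select(fit, n_features=5))
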
TTA-JB-
C1509
An Efficient Privacy-
Preserving Ranked
Keyword Search
Method
Cloud data owners prefer to outsource
documents in an encrypted form for the
purpose of privacy preservation. It is therefore
essential to develop efficient and reliable
ciphertext search techniques. One challenge is
that the relationship between documents will
be normally concealed in the process of
encryption, which will lead to
significant search accuracy performance
degradation. Also the volume of data in data
centers has experienced a dramatic growth.
This will make it even more challenging to
design ciphertext search schemes that can
provide efficient and reliable online
information retrieval on large volume of
encrypted data. In this paper, a hierarchical
clustering method is proposed to support
more search semantics and also to meet the
demand for fast ciphertext search within a big
data environment. The proposed hierarchical
approach clusters the documents based on the
minimum relevance threshold, and then
partitions the resulting clusters into sub-
clusters until the constraint on the maximum
size of cluster is reached. In the search phase,
this approach can reach a linear computational
complexity against an exponential size
increase of document collection. In order to
verify the authenticity of search results, a
structure called minimum hash sub-tree is
designed in this paper. Experiments have been
conducted using a collection set built from
IEEE Xplore. The results show that with a
sharp increase of documents in the dataset
the search time of the proposed method
increases linearly whereas the search time of
the traditional method increases exponentially.
Furthermore, the proposed method has an
advantage over the traditional method in
the rank privacy and relevance of retrieved
documents.
IEEE 2015
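
A hedged sketch of the splitting rule: recursively bisect document clusters until each respects the maximum-size constraint. The 2-means-style seeding and Jaccard similarity are assumptions, and the relevance-threshold clustering and the minimum hash sub-tree verification are omitted.

import random

def split_cluster(docs, similarity, max_size=50):
    if len(docs) <= max_size:
        return [docs]
    seed_a, seed_b = random.sample(docs, 2)       # 2-means style seeding
    left, right = [], []
    for d in docs:
        (left if similarity(d, seed_a) >= similarity(d, seed_b)
         else right).append(d)
    if not left or not right:                     # degenerate split: halve
        mid = len(docs) // 2
        left, right = docs[:mid], docs[mid:]
    return (split_cluster(left, similarity, max_size)
            + split_cluster(right, similarity, max_size))

# toy run: documents are keyword sets, similarity = Jaccard
docs = [{random.choice("abcdef") for _ in range(3)} for _ in range(500)]
jaccard = lambda a, b: len(a & b) / len(a | b)
clusters = split_cluster(docs, jaccard, max_size=60)
print(len(clusters), max(len(c) for c in clusters))
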
TTA-JB-
C1510
Splitting Large Medical
Data Sets based on
Normal Distribution in
Cloud Environment
The surge of medical and e-commerce
applications has generated tremendous amounts
of data, which brings people to a so-called
“Big Data” era. Different from
traditional large data sets, the term “Big Data”
not only means the large size of data volume
but also indicates the high velocity
of data generation. However,
current data mining and analytical techniques
are facing the challenge of dealing with large
volume data in a short period of time. This
paper explores the efficiency of utilizing
the Normal Distribution (ND) method
for splitting and
processing large volume medical data in cloud
environment, which can provide representative
information in the split data sets. The ND-
based new model consists of two stages. The
first stage adopts the ND method
for large data sets splitting and processing,
which can reduce the volume of data sets. The
second stage implements the ND-based model
in a cloud computing infrastructure for
allocating the split data sets. The experimental
results show substantial efficiency gains of the
proposed method over the conventional
methods without splitting data into small
partitions. The ND-based method can generate
representative data sets, which can offer
efficient solutions for large data processing.
The split data sets can be processed in parallel
in a Cloud computing environment.
IEEE 2015
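
One plausible reading of the ND-based splitting, sketched below with invented parameters: deal z-score-sorted records out round-robin so that every partition roughly preserves the global mean and spread, and can therefore serve as a representative subset for parallel processing.

import random, statistics

def nd_split(values, n_parts=4):
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0
    ordered = sorted(values, key=lambda v: (v - mu) / sd)   # z-score order
    parts = [ordered[i::n_parts] for i in range(n_parts)]   # round-robin deal
    for p in parts:
        random.shuffle(p)    # remove ordering before downstream processing
    return parts

data = [random.gauss(50, 10) for _ in range(10_000)]
for p in nd_split(data):     # each partition echoes the global mean/std
    print(round(statistics.fmean(p), 2), round(statistics.pstdev(p), 2))
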
DOMAIN : ANDROID
TTA-AA-
C1501
MARS Mobile
Application
Relaunching Speed-up
through Flash-Aware
Page Swapping
The approach for
fast application relaunching on the current
Android system is to cache background
applications in memory. This mechanism is
limited by the available memory size. In
addition, the application state may not be
easily recovered. We propose a prototype
system, MARS, to enable page swapping and
cache
more applications. MARS can speed up
application relaunching and restore
the application state. As a
new page swapping design for
optimizing application relaunching, MARS
isolates Android runtime Garbage Collection
(GC) from page swapping for compatibility
and employs several flash-aware techniques
for swap-in speedup. Two main components
of MARS are page slot allocation and
read/write control. Page slot allocation
reorganizes page slots in swap area to produce
sequential reads and improve the performance
of swap-in. Read/Write control addresses the
read/write interference issue by reducing
concurrent and extra internal writes. Compared
to the conventional Linux page swapping,
these two components can scale up the read
bandwidth up to about 3.8 times.
Application tests on a Google Nexus 4 phone
show that MARS reduces the launching time
of applications by 50% to 80%. The
modified page swapping mechanism can
outperform the conventional
Linux page swapping by up to 4 times.
IEEE 2015
TTA-AA-
C1503
ECG MONITORING
SYSTEM USING
ANDROID
This paper describes the development and
testing of circuitry and software to enable
the use of Android mobile phones equipped
with Bluetooth to receive the incoming
electrocardiogram (ECG) signal from a user
and show it in real-time on the cell phone
screen. The system comprises three distinct
subsystems. The first one is dedicated to
condition the analog ECG signal, preparing it
for conversion to the digital world. The second
one consists of a microcontroller and a
Bluetooth module. This unit samples the ECG,
serializes the samples and transmits them via
the Bluetooth module to the Android cell
phone. The third subsystem is the cell phone
itself. An application program written for the
cell phone receives the ECG samples and
suitably charts the ECG signal on the screen
for analysis. The good quality of the ECG
signal allows for identification of arrhythmias.
IEEE 2015
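
On the phone side, receiving the serialized samples reduces to unpacking frames from the Bluetooth payload. The sketch below assumes a toy little-endian framing of (sequence number, ADC value) pairs and a 12-bit ADC with a 3.3 V reference, none of which is specified by the paper.

import struct

def parse_ecg_frames(payload: bytes, frame_fmt: str = "<HH"):
    # assumed framing: little-endian (sequence_number, adc_value) pairs
    size = struct.calcsize(frame_fmt)
    for off in range(0, len(payload) - size + 1, size):
        seq, adc = struct.unpack_from(frame_fmt, payload, off)
        # map a hypothetical 12-bit reading (0..4095) to millivolts,
        # centred on half-scale with a 3.3 V reference
        mv = (adc - 2048) * 3300.0 / 4096
        yield seq, mv

packet = struct.pack("<HHHH", 0, 2048, 1, 2100)
print(list(parse_ecg_frames(packet)))
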
TTA-AA-
C1504
Auto emergency alert
using android
In this paper, we describe the Well Phone, a
smart phone with additional software that
is used as a personal health monitoring device.
The Well Phone interfaces various health
monitoring devices to the smart phone, and
collects physiological data from those devices.
It employs novel algorithms that perform
statistical analyses, relate sequences of
disparate measurements from different devices,
and correlate physical activity with
physiological measurements. The Well Phone
provides feedback to the user by means of
visualization and speech interaction,
and alerts a caregiver, medical professional, or
emergency responder, as needed.
IEEE 2015
TTA-AA-
C1505
Disaster Alert system
using android
A robot can perform with ease work that seems
impossible for a human, and it becomes even
more helpful if it can be controlled wirelessly.
Nowadays robots are becoming versatile, with
many features: a robot can be controlled by a
smartphone, avoid obstacles
automatically, sense the environment and
send alerts, and even defuse
bombs and perform other critical
tasks. The feature discussed in this
paper is the robot's use in search and rescue
missions. The robot can be controlled
wirelessly using RF technology, has an
ultrasonic sensor for obstacle detection, and is
also equipped with a smart phone camera to
provide an omnidirectional view; it can send
the video stream wirelessly to a remote device,
which makes it easier to control the bot.
The robot can explore places that
humans cannot easily reach, such as places
struck by natural disasters like earthquakes,
tsunamis and hurricanes.
IEEE 2015
TTA-AA-
C1506
Farm corps
management system
using android
This study aimed to investigate an
establishment using an
Intelligent System which employed an
Embedded System and Smart Phone for
chicken farming management and problem
solving using Raspberry Pi and Arduino Uno.
An experiment and comparative analysis of the
intelligent system was applied in a sample
chicken farm in this study. The findings show
that the system could monitor
surrounding weather conditions, including
humidity, temperature, and air quality, and
could also control the filter fan switch in the
chicken farm. The system was found to be
comfortable for farmers to use as they could
effectively control the farm anywhere at
any time, resulting in cost reduction, asset
saving, and productive management in
chicken farming.
IEEE 2015
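
A hedged sketch of what the Raspberry Pi side of such a control loop can look like, using the common Adafruit_DHT and RPi.GPIO libraries. The pin numbers, sensor model, and thresholds are invented, and the paper's own stack (which also involves an Arduino Uno) may differ.

import time
import Adafruit_DHT              # common DHT temperature/humidity library
import RPi.GPIO as GPIO

FAN_PIN, SENSOR_PIN = 17, 4      # hypothetical wiring
GPIO.setmode(GPIO.BCM)
GPIO.setup(FAN_PIN, GPIO.OUT)

while True:
    humidity, temp_c = Adafruit_DHT.read_retry(Adafruit_DHT.DHT22, SENSOR_PIN)
    if temp_c is not None:
        # run the filter fan when the coop gets too hot or too humid
        GPIO.output(FAN_PIN, temp_c > 32.0 or (humidity or 0) > 80.0)
    time.sleep(30)
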
TTA-AA-
C1507
ACCIDENT TRACKING
APP FOR ANDROID
MOBILE
The usage of mobile devices has increased
dramatically in recent years. These devices
serve us in many practical ways and provide us
with many services -- many of them in real-
time. The delivery of streaming audio,
streaming video and internet content to these
devices has become commonplace. One
emerging application in recent years is the use
of mobile devices for tracking local traffic
incidents, and there are several providers
of this content on the Internet, such as Google Maps,
here.com, Twitter, various Departments of
Transportation's web sites, various radio
stations' websites, and many others. Some sites,
such as Twitter, only provide text information but
are updated often with recent data. Map-
enhanced websites provide visual information
but are updated less often. The goal of
this project is to integrate all the sources of
traffic information together in one place and
filter intelligently all the recent incident data so
the results are as accurate and up to date as
possible thus minimizing the number of false
reports and incidents. This process,
implemented for iOS 7 using XCode and
Objective-C, allows the user to view traffic
reports for 15 large US cities with the
capabilities for the addition of many more
locations. Results for the app are compared
with the major individual sources and the
percentage of additional incidents detected and
false incidents incorrectly identified for several
large cities are provided.
IEEE 2015
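
The integrate-and-filter step can be sketched as spatio-temporal deduplication: merge reports that are close in both space and time, then keep only incidents corroborated by at least two independent feeds, which suppresses false reports. The thresholds and report schema below are assumptions, and the project itself is written in Objective-C for iOS 7; Python is used here purely for illustration.

import math

def dedupe_incidents(reports, km=0.5, minutes=30):
    def haversine_km(lat1, lon1, lat2, lon2):
        lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
        h = (math.sin((lat2 - lat1) / 2) ** 2 +
             math.cos(lat1) * math.cos(lat2) *
             math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))
    merged = []
    for r in sorted(reports, key=lambda r: r["ts"]):
        for m in merged:
            if (haversine_km(r["lat"], r["lon"], m["lat"], m["lon"]) < km
                    and abs(r["ts"] - m["ts"]) < minutes * 60):
                m["sources"].add(r["source"])   # corroboration, not a new event
                break
        else:
            merged.append({**r, "sources": {r["source"]}})
    return [m for m in merged if len(m["sources"]) >= 2]

reports = [
    {"lat": 40.71, "lon": -74.00, "ts": 1000, "source": "twitter"},
    {"lat": 40.712, "lon": -74.001, "ts": 1300, "source": "dot"},
    {"lat": 41.00, "lon": -73.90, "ts": 1100, "source": "radio"},
]
print(dedupe_incidents(reports))    # only the corroborated incident survives
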
TTA-AA-
C1508
Friendbook - A
Semantic-based Friend
Recommendation
System for Social
Networks
Existing social networking services
recommend friends to users based on
their social graphs, which may not be the most
appropriate to reflect a user's preferences
on friend selection in real life. In this paper, we
present Friendbook, a novel semantic-
based friend recommendation system for social
networks, which recommends friends to
users based on their life styles instead
of social graphs. By taking advantage of
sensor-rich smart phones, Friendbook
discovers life styles of users from user-centric
sensor data, measures the similarity of life
styles between users, and
recommends friends to users if their life styles
have high similarity. Inspired by text mining,
we model a user's daily life as life documents,
from which his/her life styles are extracted by
using the Latent Dirichlet Allocation
algorithm. We further propose a similarity
metric to measure the similarity of life styles
between users, and calculate users' impact in
terms of life styles with a friend-matching
graph. Upon receiving a request, Friendbook
returns a list of people with the
highest recommendation scores to the query
user. Finally, Friendbook integrates a
feedback mechanism to further improve
the recommendation accuracy. We have
implemented Friendbook on Android-
based smart phones, and evaluated its
performance in both small-scale experiments
and large-scale simulations. The results show
that the recommendations accurately reflect the
preferences of users in choosing friends.
IEEE 2015
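
A sketch of the pipeline with scikit-learn (an assumed tool choice, not the paper's implementation): LDA turns each user's "life document" into a topic mixture, and cosine similarity between mixtures ranks candidate friends. The activity vocabulary is invented, and the friend-matching graph and feedback loop are omitted.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

life_documents = [
    "run gym run protein sleep run gym",          # user 0
    "code coffee code code meeting coffee",       # user 1
    "gym run sleep gym protein run",              # user 2
]
counts = CountVectorizer().fit_transform(life_documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
life_styles = lda.fit_transform(counts)           # per-user topic mixtures

sim = cosine_similarity(life_styles)              # life-style similarity
user = 0
ranked = sorted((s, u) for u, s in enumerate(sim[user]) if u != user)
print("friends for user 0, best first:", [u for _, u in reversed(ranked)])
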
TTA-AA-
C1509
Blood Banking System
Using Android
Automated Blood Bank is a project that brings
voluntary blood donors and those in
need of blood onto a common platform. The
mission is to fulfill every blood request in the
country with a promising Android application
and motivated individuals who are willing to
donate blood. The proposed work aims to
overcome the communication barrier by
providing a direct link between the donor and
the recipient, using a low-cost and low-power
Raspberry Pi B+ kit that requires only a micro
USB 5 V, 2 A power supply. The entire
communication takes place via SMS (Short
Message Service), which is compatible
with all mobile phone types. The project aims
at serving persons who seek donors willing to
donate blood within the required time frame;
it is an endeavor to reach out to those in
want of blood and connect them with those
willing to donate. The proposed work explores
finding blood donors by using a GSM-based
smart card CPU (Raspberry Pi B+ kit). The
vision is to be “The hope of every Indian in
search of a voluntary blood donor”.
IEEE 2015
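
The SMS link can be sketched with standard GSM AT commands over a serial line (via pyserial); the device path, phone number, and timing delays below are placeholders, not the project's actual code.

import serial, time

def send_sms(port, number, text):
    with serial.Serial(port, 9600, timeout=5) as modem:
        modem.write(b"AT+CMGF=1\r")           # switch modem to text mode
        time.sleep(0.5)
        modem.write(f'AT+CMGS="{number}"\r'.encode())
        time.sleep(0.5)
        modem.write(text.encode() + b"\x1a")  # Ctrl-Z terminates the message
        time.sleep(2)
        return modem.read_all()

send_sms("/dev/ttyUSB0", "+911234567890", "Donor match found: O+ near you")
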
DOMAIN : IMAGE PROCESSING
TTA-AI-
C1501
Smartphone-Based
Wound Assessment
System for Patients
With Diabetes
Diabetic foot ulcers represent a significant
health issue. Currently, clinicians and nurses
mainly base their wound assessment on visual
examination of wound size and healing status,
while the patients themselves seldom have an
opportunity to play an active role. Hence, a
more quantitative and cost-effective
examination method that enables
the patients and their caregivers to take a more
active role in daily wound care potentially can
accelerate wound healing, save travel cost and
reduce healthcare expenses. Considering the
prevalence of smart phones with a high-
resolution digital camera, assessing wounds by
analyzing images of chronic foot ulcers is an
attractive option. In this paper, we propose a
novel wound image
analysis system implemented solely on the
Android smart phone. The wound image is
captured by the camera on the smart
phone with the assistance of an image capture
box. After that, the smart
phone performs wound segmentation by
applying the accelerated mean-shift algorithm.
Specifically, the outline of the foot is
determined based on skin color, and
the wound boundary is found using a simple
connected region detection method. Within
the wound boundary, the healing status is next
assessed based on the red-yellow-black color
evaluation model. Moreover, the healing status
is quantitatively assessed, based on trend
analysis of time records for a given patient.
Experimental results on wound images
collected in UMASS-Memorial Health
Center Wound Clinic (Worcester, MA)
following an Institutional Review Board
approved protocol show that our system can be
efficiently used to analyze the wound healing
status with promising accuracy.
IEEE 2015
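
A loose OpenCV sketch of the segmentation and color-evaluation stages, assuming cv2.pyrMeanShiftFiltering as the mean-shift step and ad hoc skin and color thresholds; the paper's accelerated mean-shift, capture-box calibration, and clinical thresholds all differ, and the file name is a placeholder.

import cv2
import numpy as np

img = cv2.imread("foot_ulcer.jpg")                       # placeholder image
smooth = cv2.pyrMeanShiftFiltering(img, sp=15, sr=30)    # mean-shift step

# crude skin mask in YCrCb to outline the foot
ycrcb = cv2.cvtColor(smooth, cv2.COLOR_BGR2YCrCb)
skin = cv2.inRange(ycrcb, (0, 135, 85), (255, 180, 135))

# non-skin pixels inside the closed foot silhouette are wound candidates
foot = cv2.morphologyEx(skin, cv2.MORPH_CLOSE, np.ones((25, 25), np.uint8))
wound = cv2.bitwise_and(foot, cv2.bitwise_not(skin))

# red-yellow-black healing evaluation over the wound pixels only
ys, xs = np.where(wound > 0)
b, g, r = [img[ys, xs][:, c].astype(int) for c in range(3)]
red = int(((r > 120) & (r > g + 30)).sum())              # granulation
yellow = int(((r > 120) & (g > 120) & (b < 100)).sum())  # slough
black = int(((r < 60) & (g < 60) & (b < 60)).sum())      # necrosis
total = max(len(ys), 1)
print({"red": red / total, "yellow": yellow / total, "black": black / total})
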
TTA-AI-
C1502
Hand Gesture
Recognition Using
Kinect Sensor
Hand gestures are becoming one of the most
common ways for people to interact with
information technology products,
bringing users an interesting experience.
Recently developed 3D cameras, e.g., the
Kinect, provide not only color images but also
depth maps. This opens new opportunities in
the development of
human-computer interaction (HCI) applications.
This paper presents a
novel hand gesture recognition method based
on the depth image obtained from
the Kinect sensor. Firstly, the hand region is
extracted by thresholding around
the hand point detected using the NITE 2 library
provided by PrimeSense. Secondly, we extract
the feature vector including the number of
open fingers, the angles between the fingertips
and horizontal of the hand, the angles between
two consecutive fingers, and the difference
between the distance from the hand center to
the fingertips and the radius of the biggest
inscribed circle. Finally, a support vector
machine (SVM) is applied to identify
different gestures. The experimental result
shows that the proposed method
performs hand gesture recognition at accuracy
of 95% in real time.
IEEE 2015
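
The classification stage reduces to a standard SVM over the geometric features listed above; the sketch below uses scikit-learn with fabricated feature values purely to show the shape of the training and prediction calls.

from sklearn.svm import SVC

# each row: [open_fingers, mean_fingertip_angle, mean_inter_finger_angle,
#            mean(fingertip_dist - inscribed_circle_radius)]
X_train = [
    [5, 90.0, 22.0, 1.8],   # open palm
    [0, 0.0, 0.0, 0.1],     # fist
    [2, 75.0, 30.0, 1.5],   # "V" sign
    [5, 88.0, 20.0, 1.7],
    [0, 0.0, 0.0, 0.2],
    [2, 78.0, 28.0, 1.4],
]
y_train = ["palm", "fist", "v", "palm", "fist", "v"]

clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print(clf.predict([[5, 91.0, 21.0, 1.75]]))     # -> ['palm']
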
DOMAIN : MOBILE COMPUTING
TTA-AM-
C1501
Timer-based Bloom
Filter Aggregation for
Reducing Signaling
Overhead in
Distributed Mobility
Management
Distributed mobility management (DMM) is a
promising technology to address the mobile
data traffic explosion problem. Since the
location information of mobile nodes (MNs)
is distributed across several mobility agents
(MAs), DMM requires an additional
mechanism to share the location information of
MNs between MAs. In the literature, multicast
or distributed hash table (DHT)-based sharing
methods have been suggested; however, they
incur significant signaling overhead owing to
unnecessary location information updates
under frequent handovers.
To reduce the signaling overhead, we propose
a timer-
based Bloom filter aggregation (TBFA)
scheme for distributing the location
information. In the TBFA scheme, the location
information of MNs is maintained
by Bloom filters at each MA. Also, since the
propagation of the whole Bloom filter for
every MN movement leads to
high signaling overhead, each MA only
propagates changed indexes in
the Bloom filter when a pre-
defined timer expires. To verify the
performance of the TBFA scheme, we develop
analytical models on
the signaling overhead and the latency and
devise an algorithm to select an
appropriate timer value. Extensive simulation
results are given to show the accuracy of
analytical models and effectiveness of the
TBFA scheme over the existing DMM scheme.
IEEE 2015
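
A minimal sketch of the TBFA idea: each MA keeps a Bloom filter of its attached MNs, queues the bit indexes that change, and ships only that delta to peer MAs when the timer fires. The filter size, hash count, and the external timer trigger are illustrative assumptions, not the paper's tuned values.

import hashlib

class TimerBloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = bytearray(m)
        self.pending = set()                 # indexes changed since last flush

    def _indexes(self, mn_id):
        digest = hashlib.sha256(mn_id.encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m
                for i in range(self.k)]

    def register(self, mn_id):               # MN attaches to this MA
        for i in self._indexes(mn_id):
            if not self.bits[i]:
                self.bits[i] = 1
                self.pending.add(i)          # defer propagation

    def on_timer(self):                      # pre-defined timer expires
        delta, self.pending = self.pending, set()
        return sorted(delta)                 # propagate only changed indexes

    def might_host(self, mn_id):             # peer-side membership lookup
        return all(self.bits[i] for i in self._indexes(mn_id))

ma = TimerBloomFilter()
for mn in ("mn-1", "mn-2", "mn-3"):
    ma.register(mn)
print("delta to propagate:", ma.on_timer())
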

Final Year Project IEEE 2015

  • 1.
    TTA FINAL YEAR PROJECTSTITLES WITH ABSTRACT www.ttafinalyearprojects.com IEEE 2015, 2014, 2013, 2012, etc.., Projects for B.E/B.Tech/M.E/MCA/Bsc/Msc For complete base paper, call now and talk to our expert 90942066260 | 9042066280| 044 4353 3393
  • 2.
    DOMAIN : NETWORKING CODEPROJECT TITLE DESCRIPTION REFERENCE TTA-DN- C1501 Delay Analysis of Multichannel Opportunistic Spectrum Access MAC Protocols We provide a comprehensive delay and queuing analysis for two baseline medium access control protocols for multi-user cognitive radio networks with homogeneous users and channels and investigate the impact of different network parameters on the system performance. In addition to an accurate Markov chain, which follows the queue status of all users, several lower complexity queuing theory approximations are provided. Accuracy and performance of the proposed analytical approximations are verified with extensive simulations. It is observed that using an Aloha- type access to the control channel, a buffering MAC protocol, where in case of interruption the CR user waits for the primary user to vacate the channel before resuming the transmission, outperforms a switching MAC protocol, where the CR user vacates the channel in case of appearance of primary users and then compete again to gain access to a new channel. The reason is that the delay bottleneck for both protocols is the time required to successfully access the control channel, which occurs more frequently for the switching MAC protocol. It is thus shown that a clustering approach, where users are divided into clusters with a separate control channel per cluster, can significantly improve the performance by reducing the competitions over control channel. IEEE 2015 TTA-DN- C1502 LEISURE A Framework for Load-Balanced Network - Wide Traffic Measurement Network-wide traffic measurement is of interest to network operators to uncover global network behavior for the management tasks of traffic accounting, debugging or troubleshooting, security, and traffic engineering. Increasingly, sophisticated network measurement tasks such as anomaly detection and security forensic analysis are requiring in-depth fine-grained IEEE 2015
  • 3.
    flow-level measurements. However, performingin-depth per- flow measurements (e.g., detailed payload analysis) is often an expensive process. Given the fast-changing Internet traffic landscape and large traffic volume, a single monitor is not capable of accomplishing the measurement tasks for all applications of interest due to its resource constraint. Moreover, uncovering global network behavior requires network-wide traffic measurements at multiple monitors across the network since traffic measured at any single monitor only provides a partial view and may not be sufficient or accurate. These factors call for coordinated measurements among multiple distributed monitors. In this paper, we present a centralized optimization framework, LEISURE (Load- Equalized measurement), for load- balancing network measurement workloads across distributed monitors. Specifically, we consider various load-balancing problems under different objectives and study their extensions to support different deployment scenarios. We evaluate LEISURE via detailed simulations on Abilene and GEANT network traces to show that LEISURE can achieve much better load- balanced performance (e.g., 4.75X smaller peak workload and 70X smaller variance in workloads) across all coordinated monitors in comparison to naive solution (uniform assignment) to accomplish network- wide traffic measurement tasks. TTA-DN- C1503 Authenticated Key Exchange Protocols for Parallel Network File Systems We study the problem of key establishment for secure many-to-many communications. The problem is inspired by the proliferation of large-scale distributed file systems supporting parallel acc ess to multiple storage devices. Our work focuses on the current Internet standard for such file systems, i.e., parallel Network File System (pNFS), which makes use of Kerberos to IEEE 2015
  • 4.
    establish parallel sessionkeys between clients and storage devices. Our review of the existing Kerberos-based protocol shows that it has a number of limitations: (i) a metadata server facilitating key exchange between the clients and the storage devices has heavy workload that restricts the scalability of the protocol; (ii) the protocol does not provide forward secrecy; (iii) the metadata server generates itself all the session keys that are used between the clients and storage devices, and this inherently leads to key escrow. In this paper, we propose a variety of authenticated key exchange protocols that are designed to address the above issues. We show that our protocols are capable of reducing up to approximately 54% of the workload of the metadata server and concurrently supporting forward secrecy and escrow-freeness. All this requires only a small fraction of increased computation overhead at the client. TTA-DN- C1504 Diversifying Web Service Recommendation Results via Exploring Service Usage History The last decade has witnessed a tremendous growth of Web services as a major technology for sharing data, computing resources, and programs on the Web. With the increasing adoption and presence of Web services, design of novel approaches for effective Web service recommendation to satisfy users’ potential requirements has become of paramount importance. Existing Web service commendation approaches mainly focus on predicting missing QoS values of Web service candidates which are interesting to a user using collaborative filtering approach, content-based approach, or their hybrid. These recommendation approaches assume that recommended Web services are independent to each other, which sometimes may not be true. As a result, many similar or redundant Web services may exist in a recommendation list. In this paper, we propose a novel Web service recommendation approach IEEE 2015
  • 5.
    incorporating a user’spotential QoS preferences and diversity feature of user interests on Web services. User’s interests and QoS preferences on Web services are first mined by exploring the Web service usage history. Then we compute scores of Web service candidates by measuring their relevance with historical and potential user interests, and their QoS utility. We also construct a Web service graph based on the functional similarity between Web services. Finally, we present an innovative diversity- aware Web service ranking algorithm to rank the Web service candidates based on their scores, and diversity degrees derived from the Web service graph. Extensive experiments are conducted based on a real world Web service dataset, indicating that our proposed Web service recommendation approa ch significantly improves the quality of their commendation results compared with existing methods. TTA-DN- C1505 Virtual Servers Co- Migration for Mobile Accesses Online vs. Offline In this paper, we study the problem of co- migrating a set of service replicas residing on one or more redundant virtual servers in clouds in order to satisfy a sequence of mobile batch- request demands in a cost effective way. With such a migration, we can not only reduce the service access latency for end users but also minimize the network costs for service providers. The co-migration can be achieved at the cost of bulk-data transfer and increases the overall monetary costs for the service providers. To gain the benefits of service migration while minimizing the overall costs, we propose a co-migration algorithm Migk for multiple servers, each hosting a service replica. Migk is a randomized algorithm with a competitive cost of O(γ log n/min{1/k, μ/λ+μ}) to migrate κ services in a static n-node network where γ is the maximal ratio of the migration costs between any pair of neighbor nodes in the network, and where λ and μ represent the maximum wired IEEE 2015
  • 6.
    transmission cost andthe wireless link cost respectively. For comparison, we also study this problem in its static off-line form by proposing a parallel dynamic programming (hereafter DP) based algorithm that integrates the branch & bound strategy with sampling techniques in order to approximate the optimal DP results. We validate the advantage of the proposed algorithms via extensive simulation studies using various requests patterns and cloud network topologies. Our simulation results show that the proposed algorithms can effectively adapt to mobile access patterns to satisfy the service request sequences in a cost- effective way. TTA-DN- C1506 Anomaly-Based Network Intrusion Detection System We present POSEIDON, a new anomaly- based network intrusion detection system. POSEIDON is payload-based, and has a two- tier architecture: the first stage consists of a self-organizing map, while the second one is a modified PAYL system. Our benchmarks on the 1999 DARPA data set show a higher detection rate and lower number of false positives than PAYL and PHAD IEEE 2015 TTA-DN- C1507 CEDAR A Low-Latency and Distributed Strategy for Packet Recovery in Wireless Networks Underlying link-layer protocols of well- established wireless networks that use the conventional “store-and-forward” design paradigm cannot provide highly sustainable reliability and stability in wireless communication, which introduce significant barriers and setbacks in scalability and deployments of wireless networks. In this paper, we propose a Code Embedded Distributed Adaptive and Reliable (CEDAR) link-layer framework that targets low latency and balancing en/decoding load among nodes. CEDAR is the first comprehensive theoretical framework for analyzing and designing distributed and adaptive error recovery for wireless networks. It employs a theoretically sound framework for embedding channel codes in each packet and performs the error correcting process in selected intermediate nodes in a packet's route. To identify the intermediate nodes for the IEEE 2015
  • 7.
    decoding, we mathematicallycalculate the average packet delay and formalize the problem as a nonlinear integer programming problem. By minimizing the delays, we derive three propositions that: 1) can identify the intermediate nodes that minimize the propagation and transmission delay of a packet; and 2) and 3) can identify the intermediate nodes that simultaneously minimize the queuing delay and maximize the fairness of en/decoding load of all the nodes. Guided by the propositions, we then propose a scalable and distributed scheme in CEDAR to choose the intermediate en/decoding nodes in a route to achieve its objective. The results from real-world test bed “NESTbed” and simulation with MATLAB prove that CEDAR is superior to schemes using hop-by-hop decoding and destination decoding not only in packet delay and throughput but also in energy-consumption and load distribution balance. TTA-DN- C1508 CoCoWa A Collaborative Contact- Based Watchdog for Detecting Selfish Nodes Mobile ad-hoc networks (MANETs) assume that mobile nodes voluntary cooperate in order to work properly. This cooperation is a cost- intensive activity and some nodes can refuse to cooperate, leading to a selfish node behavior. Thus, the overall network performance could be seriously affected. The use of watchdogs is a well-known mechanism to detect selfish nodes. However, the detection process performed by watchdogs can fail, generating false positives and false negatives that can induce to wrong operations. Moreover, relying on local watchdogs alone can lead to poor performance when detecting selfish nodes, in term of precision and speed. This is specially important on networks with sporadic contacts, such as delay tolerant networks (DTNs), where sometimes watchdogs lack of enough time or information to detect the selfish nodes. Thus, we propose collaborative contact-based watchdog (CoCoWa) as a collaborative approach based on the diffusion of local selfish nodes awareness when IEEE 2015
  • 8.
    a contact occurs,so that information about selfish nodes is quickly propagated. As shown in the paper, this collaborative approach reduces the time and increases the precision when detecting selfish nodes. TTA-DN- C1509 Distributed Opportunistic Scheduling for EnergyHarvesting Based Wireless Networks A Two- StageProbing Approach This paper considers a heterogeneous ad hoc network with multiple transmitter-receiver pairs, in which all transmitters are capable of harvesting renewable energy from the environment and compete for one shared channel by random access. In particular, we focus on two different scenarios: the constant energy harvesting (EH) rate model where the EH rate remains constant within the time of interest and the i.i.d. EH rate model where the EH rates are independent and identically distributed across different contention slots. To quantify the roles of both the energy state information (ESI) and the channel state information (CSI), a distributed opportunistic scheduling (DOS) framework with two-stage probing and save- then-transmit energy utilization is proposed. Then, the optimal throughput and the optimal scheduling strategy are obtained via one- dimension search, i.e., an iterative algorithm consisting of the following two steps in each iteration: First, assuming that the stored energy level at each transmitter is stationary with a given distribution, the expected throughput maximization problem is formulated as an optimal stopping problem, whose solution is proven to exist and then derived for both models; second, for a fixed stopping rule, the energy level at each transmitter is shown to be stationary and an efficient iterative algorithm is proposed to compute its steady-state distribution. Finally, we validate our analysis by numerical results and quantify the throughput gain compared with the best-effort delivery scheme. IEEE 2015 TTA-DN- C1510 Enabling Efficient Multi- Keyword Ranked Search Over Encrypted Mobile Cloud Data In mobile cloud computing, a fundamental application is to outsource the mobile data to external cloud servers for scalable data storage. The outsourced data, however, need to IEEE 2015
  • 9.
    Through Blind Storagebe encrypted due to the privacy and confidentiality concerns of their owner. This results in the distinguished difficulties on the accurate search over the encrypted mobile cloud data. To tackle this issue, in this paper, we develop the searchable encryption for multi- keyword ranked search over the storage data. Specifically, by considering the large number of outsourced documents (data) in the cloud, we utilize the relevance score and k-nearest neighbor techniques to develop an efficient multi-keyword search scheme that can return the ranked search results based on the accuracy. Within this framework, we leverage an efficient index to further improve the search efficiency, and adopt the blind storage system to conceal access pattern of the search user. Security analysis demonstrates that our scheme can achieve confidentiality of documents and index, trapdoor privacy, trapdoor unlinkability, and concealing access pattern of the search user. Finally, using extensive simulations, we show that our proposal can achieve much improved efficiency in terms of search functionality and search time compared with the existing proposals. TTA-DN- C1511 Energy-Efficient Group Key Agreement for Wireless Networks Advances in lattice-based cryptography are enabling the use of public key algorithms (PKAs) in power-constrained ad hoc and sensor network devices. Unfortunately, while many wireless networks are dominated by group communications, PKAs are inherently unicast i.e., public/private key pairs are generated by data destinations. To fully realize public key cryptography in these networks, lightweight PKAs should be augmented with energy-efficient mechanisms for group key agreement. We consider a setting where master keys are loaded on clients according to an arbitrary distribution. We present a protocol that uses session keys derived from those master keys to establish a group key that is information- IEEE 2015
TTA-DN-C1511  Energy-Efficient Group Key Agreement for Wireless Networks

Advances in lattice-based cryptography are enabling the use of public key algorithms (PKAs) in power-constrained ad hoc and sensor network devices. Unfortunately, while many wireless networks are dominated by group communications, PKAs are inherently unicast, i.e., public/private key pairs are generated by data destinations. To fully realize public key cryptography in these networks, lightweight PKAs should be augmented with energy-efficient mechanisms for group key agreement. We consider a setting where master keys are loaded on clients according to an arbitrary distribution. We present a protocol that uses session keys derived from those master keys to establish a group key that is information-theoretically secure. When master keys are distributed randomly, our protocol requires O(log_b t) multicasts, where 1 - b^-1 is the probability that a given client possesses a given master key. The minimum number of public multicast transmissions required for a set of clients to agree on a secret key in our setting was recently characterized. The proposed protocol achieves the best possible approximation to that optimum that is computable in polynomial time. Moreover, the computational requirements of our protocol compare favorably to multi-party extensions of Diffie-Hellman key exchange.

IEEE 2015

TTA-DN-C1512  iPath: Path Inference in Wireless Sensor Networks

Recent wireless sensor networks (WSNs) are becoming increasingly complex with the growing network scale and the dynamic nature of wireless communications. Many measurement and diagnostic approaches depend on per-packet routing paths for accurate and fine-grained analysis of complex network behaviors. In this paper, we propose iPath, a novel path inference approach for reconstructing per-packet routing paths in dynamic and large-scale networks. The basic idea of iPath is to exploit high path similarity to iteratively infer long paths from short ones. iPath starts with an initial known set of paths and performs path inference iteratively. iPath includes a novel design of a lightweight hash function for verification of the inferred paths. In order to further improve the inference capability as well as the execution efficiency, iPath includes a fast bootstrapping algorithm to reconstruct the initial set of paths. We also implement iPath and evaluate its performance using traces from large-scale WSN deployments as well as extensive simulations. Results show that iPath achieves much higher reconstruction ratios under different network settings compared with other state-of-the-art approaches.

IEEE 2015
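The iterate-and-verify loop behind iPath is easy to picture: the sink keeps a set of known paths, and a longer path is accepted only when a candidate one-hop extension reproduces the path digest reported in the packet. A minimal sketch under those assumptions, with a hypothetical four-node topology and a truncated SHA-1 standing in for the paper's lightweight hash:

import hashlib

def path_hash(path):
    """Path digest carried in each packet (stand-in for iPath's lightweight hash)."""
    return hashlib.sha1("->".join(path).encode()).hexdigest()[:8]

# hypothetical topology: node -> possible next hops toward the sink S
neighbors = {"A": ["B", "C"], "B": ["C", "S"], "C": ["S"], "S": []}

# packets observed at the sink: (source node, reported path hash)
packets = [("A", path_hash(["A", "B", "C", "S"])),
           ("B", path_hash(["B", "C", "S"])),
           ("C", path_hash(["C", "S"]))]

known = {"S": ["S"]}             # bootstrap: the sink's own trivial path
changed = True
while changed:                   # iteratively infer long paths from short ones
    changed = False
    for src, h in packets:
        if src in known:
            continue
        for nxt in neighbors[src]:
            if nxt in known and path_hash([src] + known[nxt]) == h:
                known[src] = [src] + known[nxt]   # extension verified by the hash
                changed = True
print(known)

Note how A's path cannot be verified until B's has been inferred, which is exactly why the inference must iterate.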
TTA-DN-C1513  Joint Static and Dynamic Traffic Scheduling in Data Center Networks

The advent and continued growth of large data centers has led to much interest in switch architectures that can economically meet the high capacities needed for interconnecting the thousands of servers in these data centers. Various multilayer architectures employing thousands of switches have been proposed in the literature. We make use of the observation that the traffic in a data center is a mixture of relatively static and rapidly fluctuating components, and develop a combined scheduler for both of these components using a generalization of the load-balanced scheduler. The presence of the known static component introduces asymmetries in the ingress-egress capacities, which preclude the use of a load-balanced scheduler as is. We generalize the load-balanced scheduler and also incorporate an opportunistic scheduler that sends traffic on a direct path when feasible, to enhance the overall switch throughput. Our evaluations show that this scheduler works very well despite avoiding the use of a central scheduler for making packet-by-packet scheduling decisions.

IEEE 2015
TTA-DN-C1514  On Downlink Beamforming with Small Cells in Wireless Heterogeneous Systems

In this letter, we study downlink beamforming for wireless heterogeneous networks with two groups of users. The users in one group (group 1) are supported by the small cell base station (SBS) as well as the macro cell base station (MBS), while the users in the other group (group 2) are supported by the MBS only. The MBS is equipped with an antenna array for downlink beamforming. We formulate a convex optimization problem, which can be solved by semidefinite programming (SDP) relaxation, for downlink beamforming that takes advantage of the presence of the SBS for group 1, but also takes into account the interfering signal from the SBS for group 2.

IEEE 2015

TTA-DN-C1515  On-Demand Discovery of Software Service Dependencies in MANETs

The dependencies among the components of service-oriented software applications hosted in a mobile ad hoc network (MANET) are difficult to determine due to the inherent loose coupling of the services and the transient communication topologies of the network. Yet understanding these dependencies is critical to making good management decisions, since dependence data underlie important analyses such as fault localization and impact analysis. Current methods for discovering dependencies, developed primarily for fixed networks, assume that dependencies change only slowly, and require relatively long monitoring periods as well as substantial memory and communication resources, all of which are impractical in the MANET environment. We describe a new dynamic dependence discovery method designed specifically for this environment, yielding dynamic snapshots of dependence relationships discovered through observations of service interactions. We evaluate the performance of our method in terms of the accuracy of the discovered dependencies, and draw insights on the selection of critical parameters under various operational conditions. Although operated under more stringent conditions, our method is shown to provide results comparable to or better than those of existing methods.

IEEE 2015
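One way to picture observation-based dependence discovery: if a service emits an outgoing request shortly after receiving an incoming one, the two are likely causally related, and aggregating such events yields a dependence snapshot. A toy sketch under that assumption, with a hypothetical event log and window length (the paper's actual inference, aging, and snapshotting are more involved):

WINDOW = 0.05   # seconds: an outgoing call soon after an incoming one implies dependence

# observed message events: (time, service, direction, peer)
log = [
    (0.010, "Order", "in",  "Client"),
    (0.020, "Order", "out", "Stock"),
    (0.025, "Order", "out", "Billing"),
    (0.900, "Stock", "in",  "Order"),
    (0.910, "Stock", "out", "DB"),
]

deps = set()
for i, (t, svc, ev, peer) in enumerate(log):
    if ev != "in":
        continue
    for t2, svc2, ev2, peer2 in log[i + 1:]:
        if svc2 == svc and ev2 == "out" and t2 - t <= WINDOW:
            deps.add((svc, peer2))     # svc depends on peer2 to serve its requests
print(sorted(deps))                    # snapshot of the current dependence graph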
TTA-DN-C1516  PWDGR: Pair-Wise Directional Geographical Routing Based on Wireless Sensor Network

Multipath routing in wireless multimedia sensor networks makes it possible to transfer data simultaneously so as to reduce delay and congestion, and it is worth researching. However, current multipath routing strategies may cause the energy consumption of nodes near the sink to become markedly higher than that of other nodes, which prematurely exhausts those nodes and invalidates the network. This also has a serious impact on the performance of the wireless multimedia sensor network (WMSN). In this paper, we propose a pair-wise directional geographical routing (PWDGR) strategy to solve this energy bottleneck problem. First, the source node sends the data to a pair-wise node around the sink node in accordance with a certain algorithm, and that node then forwards the data to the sink node. These pair-wise nodes are selected equally within the 360-degree scope around the sink according to a certain algorithm. Therefore, PWDGR can effectively relieve the serious energy burden around the sink and also strike a balance between energy consumption and end-to-end delay. Theoretical analysis and extensive simulation experiments indicate that PWDGR is superior to similar strategies, both in theory and in the simulation results. Compared with strategies of the same kind, PWDGR is able to prolong network lifetime by 70 percent, while the measured delay increases by only 8.1 percent relative to those strategies.

IEEE 2015
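The selection step is essentially geometric: spread the candidate relay ("pair-wise") nodes evenly over a ring around the sink so that relaying load is not concentrated on one side. A small sketch of one plausible reading of that step, with random node positions and a hypothetical ring radius and direction count (the paper's actual selection algorithm may differ in detail):

import math, random

random.seed(1)
nodes = [(random.uniform(-100, 100), random.uniform(-100, 100)) for _ in range(200)]
SINK = (0.0, 0.0)
RING_R = 60.0      # assumed distance of pair-wise nodes from the sink
K = 8              # spread pair-wise nodes over K evenly spaced directions

def pairwise_nodes():
    """Pick, for each direction on the 360-degree ring, the node closest to it."""
    chosen = []
    for i in range(K):
        theta = 2 * math.pi * i / K
        target = (SINK[0] + RING_R * math.cos(theta),
                  SINK[1] + RING_R * math.sin(theta))
        chosen.append(min(nodes, key=lambda n: math.dist(n, target)))
    return chosen

for i, n in enumerate(pairwise_nodes()):
    print(f"direction {i * 360 // K:3d} deg -> relay at ({n[0]:6.1f}, {n[1]:6.1f})")

Rotating which relay a source uses then spreads the forwarding burden over the whole ring instead of the sink's immediate neighborhood.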
TTA-DN-C1517  REAL: A Reciprocal Protocol for Location Privacy in Wireless Sensor Networks

K-anonymity has been used to protect location privacy for location monitoring services in wireless sensor networks (WSNs), where sensor nodes work together to report k-anonymized aggregate locations to a server. Each k-anonymized aggregate location is a cloaked area that contains at least k persons. However, we identify an attack model showing that overlapping aggregate locations still pose privacy risks, because an adversary can infer some overlapping areas with fewer than k persons, which violates the k-anonymity privacy requirement. In this paper, we propose a reciprocal protocol for location privacy (REAL) in WSNs. In REAL, sensor nodes are required to autonomously organize their sensing areas into a set of non-overlapping and highly accurate k-anonymized aggregate locations. To confront the three key challenges in REAL, namely self-organization, the reciprocity property, and high accuracy, we design a state transition process, a locking mechanism, and a time delay mechanism, respectively. We compare the performance of REAL with that of current protocols through simulated experiments. The results show that REAL protects location privacy, provides more accurate query answers, and reduces communication and computational costs.

IEEE 2015

TTA-DN-C1518  SanGA: A Self-Adaptive Network-Aware Approach to Service Composition

Service-Oriented Computing enables the composition of loosely coupled services provided with varying Quality of Service (QoS) levels. Selecting a near-optimal set of services for a composition in terms of QoS is crucial when many functionally equivalent services are available. As the number of distributed services, particularly in the cloud, is rising rapidly, the impact of the network on the QoS keeps increasing. Despite this, current approaches do not differentiate between the QoS of the services themselves and that of the network. Therefore, the computed latency differs from the actual latency, resulting in suboptimal QoS. Thus, we propose a network-aware approach that handles the QoS of services and the QoS of the network independently. First, we build a network model in order to estimate the network latency between arbitrary services and potential users. Our selection algorithm then leverages this model to find compositions with a low latency for a given execution policy. We employ a self-adaptive genetic algorithm which balances the optimization of latency and other QoS as needed and improves the convergence speed. In our evaluation, we show that our approach works under realistic network conditions, efficiently computing compositions with much lower latency and otherwise equivalent QoS compared with current approaches.

IEEE 2015
TTA-DN-C1519  Secure Binary Image Steganography Based on Minimizing the Distortion on the Texture

Most state-of-the-art binary image steganographic techniques only consider the flipping distortion according to the human visual system, which leaves them insecure when attacked by steganalyzers. In this paper, a binary image steganographic scheme that aims to minimize the embedding distortion on the texture is presented. We first extract the complement, rotation, and mirroring-invariant local texture patterns (crmiLTPs) from the binary image. The weighted sum of crmiLTP changes when flipping one pixel is then employed to measure the flipping distortion corresponding to that pixel. By testing on both simple binary images and the constructed image data set, we show that the proposed measurement can well describe the distortions on both visual quality and statistics. Based on the proposed measurement, a practical steganographic scheme is developed. The steganographic scheme generates the cover vector by dividing the scrambled image into superpixels. Thereafter, the syndrome-trellis code is employed to minimize the designed embedding distortion. Experimental results demonstrate that the proposed steganographic scheme can achieve statistical security without degrading image quality or embedding capacity.

IEEE 2015
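The key ingredient is a per-pixel flipping cost derived from local texture. As a crude stand-in for the weighted crmiLTP measure, the sketch below scores a flip by how uniform the pixel's 3x3 neighborhood is: flips inside smooth regions are visible and costly, flips on busy texture boundaries are cheap. A syndrome-trellis coder would then embed the payload while minimizing the summed cost:

import random

random.seed(0)
H, W = 8, 8
img = [[random.randint(0, 1) for _ in range(W)] for _ in range(H)]

def flip_cost(r, c):
    """Texture-based flipping cost: flipping a pixel deep inside a uniform
    region is visually obvious (high cost), while flipping on a busy texture
    boundary is not. A simplified stand-in for the weighted crmiLTP score."""
    same = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr or dc) and 0 <= r + dr < H and 0 <= c + dc < W:
                same += img[r + dr][c + dc] == img[r][c]
    return same

cost = [[flip_cost(r, c) for c in range(W)] for r in range(H)]
for row in cost:
    print(row)    # the embedder prefers pixels where the cost is lowest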
TTA-DN-C1520  Software Puzzle: A Countermeasure to Resource-Inflated Denial-of-Service Attacks

Denial-of-service (DoS) and distributed DoS (DDoS) attacks are among the major threats to cyber-security, and client puzzles, which demand that a client perform computationally expensive operations before being granted services by a server, are a well-known countermeasure to them. However, an attacker can inflate its DoS/DDoS capability with fast puzzle-solving software and/or built-in graphics processing unit (GPU) hardware, significantly weakening the effectiveness of client puzzles. In this paper, we study how to prevent DoS/DDoS attackers from inflating their puzzle-solving capabilities. To this end, we introduce a new client puzzle referred to as a software puzzle. Unlike existing client puzzle schemes, which publish their puzzle algorithms in advance, a puzzle algorithm in the present software puzzle scheme is randomly generated only after a client request is received at the server side, and the algorithm is generated such that: 1) an attacker is unable to prepare an implementation to solve the puzzle in advance, and 2) the attacker needs considerable effort to translate a central processing unit puzzle software to its functionally equivalent GPU version, such that the translation cannot be done in real time. Moreover, we show how to implement the software puzzle in the generic server-browser model.

IEEE 2015
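For contrast with the paper's dynamically generated puzzles, the classic client puzzle it hardens is a fixed, published work function such as hash-preimage search: the client must find a nonce giving a digest with a prescribed number of leading zero bits, which is expensive to solve but takes one hash call to verify. A minimal sketch (the difficulty value is arbitrary):

import hashlib, os, time

DIFFICULTY = 18   # leading zero bits required; tuned to target a solving time

def new_puzzle():
    return os.urandom(8)                      # fresh server challenge

def solve(challenge):
    """Brute-force a nonce so SHA-256(challenge || nonce) starts with
    DIFFICULTY zero bits: the classic client-puzzle work function."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0:
            return nonce
        nonce += 1

def verify(challenge, nonce):
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY) == 0

c = new_puzzle()
t0 = time.time()
n = solve(c)                                  # client pays the cost ...
print(f"solved in {time.time() - t0:.2f}s, verified: {verify(c, n)}")  # ... server checks cheaply

Because this work function is public and data-parallel, a GPU can solve it orders of magnitude faster than a CPU, which is precisely the inflation the software puzzle is designed to block.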
TTA-DN-C1521  Task Allocation for Wireless Sensor Networks Using Modified Binary Particle Swarm Optimization

Many applications of wireless sensor networks (WSNs) require the execution of several computationally intense in-network processing tasks. Collaborative in-network processing among multiple nodes is essential when executing such tasks, due to the strictly constrained energy and resources of a single node. Task allocation is essential to assign the workload of each task to suitable nodes in an efficient manner. In this paper, a modified version of binary particle swarm optimization (MBPSO), which adopts a different transfer function and a new position updating procedure with mutation, is proposed for the task allocation problem. Each particle in MBPSO is encoded to represent a complete potential solution for task allocation. The task workload and connectivity requirements are ensured by taking them as constraints of the problem. Multiple metrics, including task execution time, energy consumption, and network lifetime, are considered as a whole by designing a hybrid fitness function to achieve the best overall performance. Simulation results show the feasibility of the proposed MBPSO-based approach for the task allocation problem in WSNs. The proposed approach also outperforms approaches based on the genetic algorithm and standard BPSO in the comparative analysis.

IEEE 2015
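In binary PSO the velocity update is the usual PSO rule, but positions are bits: a transfer function (classically a sigmoid) maps each velocity component to the probability of the bit being 1, and MBPSO additionally mutates bits to keep the swarm exploring. A compact sketch on a deliberately tiny problem: two nodes, balance-the-makespan fitness, and the standard sigmoid (the paper's modified transfer function, connectivity constraints, and hybrid fitness are simplified away):

import math, random

random.seed(7)
COST = [4, 2, 7, 1, 5, 3, 6, 2]      # hypothetical per-task execution cost
T, SWARM, ITER = len(COST), 12, 60
W_INERTIA, C1, C2, PM = 0.7, 1.5, 1.5, 0.02

def fitness(bits):
    """Makespan of a 2-node allocation: bit t sends task t to node 0 or 1."""
    load = [0, 0]
    for t, b in enumerate(bits):
        load[b] += COST[t]
    return max(load)

def sigmoid(v):                       # transfer function: velocity -> P(bit = 1)
    return 1 / (1 + math.exp(-v))

pos = [[random.randint(0, 1) for _ in range(T)] for _ in range(SWARM)]
vel = [[0.0] * T for _ in range(SWARM)]
pbest = [p[:] for p in pos]
gbest = min(pos, key=fitness)[:]

for _ in range(ITER):
    for i in range(SWARM):
        for d in range(T):
            vel[i][d] = (W_INERTIA * vel[i][d]
                         + C1 * random.random() * (pbest[i][d] - pos[i][d])
                         + C2 * random.random() * (gbest[d] - pos[i][d]))
            pos[i][d] = 1 if random.random() < sigmoid(vel[i][d]) else 0
            if random.random() < PM:  # mutation keeps the swarm exploring
                pos[i][d] ^= 1
        if fitness(pos[i]) < fitness(pbest[i]):
            pbest[i] = pos[i][:]
    gbest = min(pbest, key=fitness)[:]

print("allocation:", gbest, "makespan:", fitness(gbest))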
TTA-DN-C1522  Towards Distributed Optimal Movement Strategy for Data Gathering in Wireless Sensor Networks

In this paper, we address how to design a distributed movement strategy for mobile collectors, which can be either physical mobile agents or query/collector packets periodically launched by the sink, to achieve successful data gathering in wireless sensor networks. Formulating the problem as general random walks on a graph composed of sensor nodes, we analyze how much data can be successfully gathered in time under any Markovian random-walk movement strategy for mobile collectors moving over a graph (or network), while each sensor node is equipped with limited buffer space and data arrival rates are heterogeneous over different sensor nodes. In particular, from the analysis, we obtain the optimal movement strategy among a class of Markovian strategies so as to minimize the data loss rate over all sensor nodes, and explain how such an optimal movement strategy can be made to work in a distributed fashion. We demonstrate that our distributed optimal movement strategy can achieve a loss rate about two times smaller than that of a standard random walk strategy under diverse scenarios. In particular, our strategy yields up to 70 percent cost savings in the deployment of multiple collectors needed to achieve a target data loss rate, compared with the standard random walk strategy.

IEEE 2015
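The mechanism behind such biased-walk designs can be illustrated with a Metropolis-Hastings walk: each step uses only local information (the two endpoints' degrees and arrival rates), yet the long-run visit frequencies converge to any desired target, e.g. proportional to how fast each node's buffer fills. A sketch on a hypothetical 8-node topology (the paper derives the actual loss-minimizing Markovian strategy; the proportional-to-arrival-rate target here is only illustrative):

import random

random.seed(3)
# ring of 8 sensor nodes with one shortcut; heterogeneous data arrival rates
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
adj[0].append(4); adj[4].append(0)
rate = [5, 1, 1, 1, 5, 1, 1, 1]               # nodes 0 and 4 fill buffers fastest
target = [r / sum(rate) for r in rate]        # desired visit frequencies

def walk(steps, biased):
    """Plain random walk vs. a Metropolis-Hastings walk whose stationary
    distribution matches the target visit frequencies, using only local data."""
    visits, cur = [0] * 8, 0
    for _ in range(steps):
        nxt = random.choice(adj[cur])
        if biased:
            accept = min(1.0, (target[nxt] * len(adj[cur])) /
                              (target[cur] * len(adj[nxt])))
            if random.random() > accept:
                nxt = cur                     # proposal rejected: stay put
        visits[nxt] += 1
        cur = nxt
    return [v / steps for v in visits]

print("plain :", [f"{v:.2f}" for v in walk(200_000, False)])
print("biased:", [f"{v:.2f}" for v in walk(200_000, True)])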
TTA-DN-C1523  Universal Network Coding-Based Opportunistic Routing for Unicast

Network coding-based opportunistic routing has emerged as an elegant way to optimize the capacity of lossy wireless multihop networks by reducing the amount of required feedback messages. Most works on network coding-based opportunistic routing in the literature assume that the links are independent. This assumption has been invalidated by recent empirical studies, which showed that the correlation among the links can be arbitrary. In this work, we show that the performance of network coding-based opportunistic routing is greatly impacted by the correlation among the links. We formulate the problem of maximizing the throughput while achieving fairness under arbitrary channel conditions, and we identify the structure of its optimal solution. As is typical in the literature, the optimal solution requires a large amount of immediate feedback messages, which is unrealistic. We propose the idea of performing network coding on the feedback messages and show that if the intermediate node waits until receiving only one feedback message from each next-hop node, the optimal level of network coding redundancy can be computed in a distributed manner. The coded feedback messages require only a small amount of overhead, as they can be integrated with the packets. Our approach is also oblivious to losses and correlations among the links, as it optimizes the performance without explicit knowledge of these two factors.

IEEE 2015

TTA-JN-C1524  VEGAS: Visual influEnce GrAph Summarization on Citation Networks

Visually analyzing citation networks poses challenges to many fields of data mining research. How can we summarize a large citation graph according to the user's interest? In particular, how can we illustrate the impact of a highly influential paper through the summarization? Can we maintain the sensory node-link graph structure while revealing the flow-based influence patterns and preserving fine readability? State-of-the-art influence maximization algorithms can detect the most influential node in a citation network, but fail to summarize a graph structure to account for its influence. On the other hand, existing graph summarization methods fold large graphs into clustered views, but cannot reveal the hidden influence patterns underneath the citation network. In this paper, we first formally define the Influence Graph Summarization (IGS) problem on citation networks. Second, we propose a matrix-decomposition-based algorithm pipeline to solve the IGS problem. Our method can not only highlight the flow-based influence patterns, but also easily extend to support rich attribute information. A prototype system called VEGAS implementing this pipeline is also developed. Third, we present a theoretical analysis of our main algorithm, which is equivalent to kernel k-means clustering; it can be proved that the matrix-decomposition-based algorithm approximates the objective of the proposed IGS problem. Last, we conduct comprehensive experiments with real-world citation networks to compare the proposed algorithm with classical graph summarization methods. Evaluation results demonstrate that our method significantly outperforms the previous ones in optimizing both the quantitative IGS objective and the quality of the visual summarizations.

IEEE 2015
TTA-JN-C1525  Privacy Protection for Wireless Medical Sensor Data

In recent years, wireless sensor networks have been widely used in healthcare applications, such as hospital and home patient monitoring. Wireless medical sensor networks are more vulnerable to eavesdropping, modification, impersonation, and replaying attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the patient data during transmission, but cannot stop the inside attack in which the administrator of the patient database reveals the sensitive patient data. In this paper, we propose a practical approach to prevent the inside attack by using multiple data servers to store patient data. The main contribution of this paper is securely distributing the patient data among multiple data servers and employing the Paillier and ElGamal cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy.

IEEE 2015
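The Paillier half of that toolbox is additively homomorphic: multiplying ciphertexts adds the underlying plaintexts, so a server can aggregate encrypted readings without decrypting any individual one. A self-contained toy sketch with deliberately tiny primes and made-up readings (real deployments use moduli of 2048 bits or more):

import math, random

p, q = 293, 433                      # toy Paillier primes, illustration only
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
mu = pow(lam, -1, n)                 # valid because g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# each server holds encrypted readings; the sum is computed without decryption
readings = [72, 85, 78, 91]          # e.g. heart-rate samples
cipher_sum = 1
for m in readings:
    cipher_sum = (cipher_sum * encrypt(m)) % n2   # additive homomorphism
print("decrypted sum:", decrypt(cipher_sum), "==", sum(readings))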
TTA-JN-C1526  A Decentralized Cloud Firewall Framework with Resource Provisioning Cost Optimization

Cloud computing is becoming popular as the next infrastructure of computing platforms. Despite the promising model and the surrounding hype, security has become the major concern that makes people hesitate to move their applications to clouds. Concretely, cloud platforms are under numerous attacks, so it is essential to establish a firewall to protect the cloud from them. However, setting up a centralized firewall for a whole cloud data center is infeasible from both performance and financial perspectives. In this paper, we propose a decentralized cloud firewall framework for individual cloud customers. We investigate how to dynamically allocate resources to optimize the resource provisioning cost, while simultaneously satisfying the QoS requirements specified by individual customers. Moreover, we establish novel queuing-theory-based M/Geo/1 and M/Geo/m models for quantitative system analysis, where the service times follow a geometric distribution. By employing Z-transform and embedded Markov chain techniques, we obtain a closed-form expression for the mean packet response time. Through extensive simulations and experiments, we conclude that an M/Geo/1 model reflects the real cloud firewall system much better than a traditional M/M/1 model. Our numerical results also indicate that cloud firewalls can be set up at a cost affordable to cloud customers.

IEEE 2015
TTA-JN-C1527  A Privacy-Aware Authentication Scheme for Distributed Mobile Cloud Computing Services

In modern societies, the number of mobile users has risen dramatically in recent years. In this paper, an efficient authentication scheme for distributed mobile cloud computing services is proposed. The proposed scheme provides security and convenience for mobile users to access multiple mobile cloud computing services from multiple service providers using only a single private key. The security strength of the proposed scheme is based on a bilinear pairing cryptosystem and dynamic nonce generation. In addition, the scheme supports mutual authentication, key exchange, user anonymity, and user untraceability. From a system implementation point of view, verification tables are not required for the trusted smart card generator (SCG) service or for the cloud computing service providers when adopting the proposed scheme; consequently, the scheme reduces the memory usage on these service providers. In one mobile user authentication session, only the targeted cloud service provider needs to interact with the service requestor (user). The trusted SCG serves as the secure key distributor for distributed cloud service providers and mobile clients, but is not involved in individual user authentication. With this design, our scheme reduces the authentication processing time required by communication and computation between cloud service providers and a traditional trusted third-party service. Formal security proofs and performance analyses are conducted to show that the scheme is both secure and efficient.

IEEE 2015

TTA-JN-C1528  CPCDN: Content Delivery Powered by Context and User Intelligence

There is an unprecedented trend of content providers (CPs) building their own content delivery networks (CDNs) to provide a variety of content services to their users. By exploiting powerful CP-level information in content distribution, these CP-built CDNs open up a whole new design space and are changing the content delivery landscape. In this paper, we adopt a measurement-based approach to understanding why, how, and how much CP-level intelligence can help content delivery. We first present a measurement study of the CDN built by Tencent, one of the largest content providers in China. We observe new characteristics and trends in content delivery that pose great challenges to the conventional content delivery paradigm and motivate the proposal of CPCDN, a CDN powered by CP-aware information. We then reveal the benefits obtained by exploiting two indispensable CP-level intelligences, namely context intelligence and user intelligence, in content delivery. Inspired by the insights learnt from the measurement studies, we systematically explore the design space of CPCDN and present a novel architecture and algorithms to address the new content delivery challenges that have arisen. Our results not only demonstrate the potential of CPCDN in pushing content delivery performance to the next level, but also identify new research problems calling for further investigation.

IEEE 2015
TTA-JN-C1529  QoS Evaluation for Web Service Recommendation

Web service recommendation is one of the most important fields of research in the area of service computing. The two core problems of Web service recommendation are the prediction of unknown QoS property values and the evaluation of overall QoS according to user preferences. Aiming to address these two problems and their current challenges, we propose two efficient approaches. First, unknown QoS property values are predicted by modeling the high-dimensional QoS data as tensors and utilizing an important tensor operation, i.e., tensor composition, to predict them. Our method, which considers all QoS dimensions integrally and uniformly, allows us to predict multi-dimensional QoS values accurately and easily. Second, the overall QoS is evaluated by an efficient user preference learning method, which learns user preferences from users' rating history, allowing us to obtain user preferences quantifiably and accurately. By solving these two core problems, it becomes possible to compute a realistic value for the overall QoS. The experimental results show our proposed methods to be more efficient than existing methods.

IEEE 2015
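The preference-learning half can be pictured as a small regression: given the normalized QoS vectors of services a user has rated and the user's overall ratings, a least-squares fit recovers how heavily the user weighs each QoS dimension, and the learned weights then score new candidates. A toy sketch with made-up numbers (one simple instantiation of the idea; the paper's learning method and the tensor-based prediction step are not reproduced here):

import numpy as np

# QoS of services the user rated, normalized to [0, 1]
# columns: response time, throughput, availability
Q = np.array([[0.9, 0.2, 0.8],
              [0.4, 0.9, 0.7],
              [0.8, 0.5, 0.9],
              [0.2, 0.8, 0.6],
              [0.7, 0.7, 0.8]])
ratings = np.array([4.5, 3.1, 4.4, 2.6, 4.0])    # the user's historical ratings

# least-squares fit: ratings ~= Q @ w reveals the per-dimension preference weights
w, *_ = np.linalg.lstsq(Q, ratings, rcond=None)
prefs = np.clip(w, 0, None)
prefs /= prefs.sum()                             # normalized preference vector
print("learned preferences:", prefs.round(2))

candidates = np.array([[0.6, 0.6, 0.9], [0.9, 0.3, 0.7]])
print("overall QoS of candidates:", (candidates @ prefs).round(2))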
TTA-JN-C1530  Towards Information Diffusion in Mobile Social Networks

The emergence of mobile social networks opens up opportunities for viral marketing. However, before mobile social networks can be fully utilized as a platform for viral marketing, many challenges have to be addressed. In this paper, we address the problem of identifying a small number of individuals through whom information can be diffused to the network as soon as possible, referred to as the diffusion minimization problem. Diffusion minimization under the probabilistic diffusion model can be formulated as an asymmetric k-center problem, which is NP-hard; the best known approximation algorithm for the asymmetric k-center problem has an approximation ratio of log n and time complexity O(n^5). Clearly, the performance and the time complexity of that approximation algorithm are not satisfactory for large-scale mobile social networks. To deal with this problem, we propose a community-based algorithm and a distributed set-cover algorithm. The performance of the proposed algorithms is evaluated by extensive experiments on both synthetic networks and a real trace. The results show that the community-based algorithm has the best performance in both the synthetic networks and the real trace compared with existing algorithms, and the distributed set-cover algorithm outperforms the approximation algorithm in the real trace in terms of diffusion time.

IEEE 2015
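A simple way to see the k-center flavor of seed selection: repeatedly add the seed that most reduces the number of hops needed for information to reach the farthest node. A greedy sketch on a hypothetical small graph (a heuristic in the same spirit; not the paper's community-based or distributed set-cover algorithms):

from collections import deque

graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4],
         4: [3, 5], 5: [4, 6], 6: [5, 7], 7: [6]}

def dists_from(sources):
    """Multi-source BFS: hop count until information reaches each node."""
    d = {s: 0 for s in sources}
    dq = deque(sources)
    while dq:
        u = dq.popleft()
        for v in graph[u]:
            if v not in d:
                d[v] = d[u] + 1
                dq.append(v)
    return d

def greedy_seeds(k):
    """Greedily add the seed that most reduces worst-case diffusion time."""
    seeds = []
    for _ in range(k):
        best = min((n for n in graph if n not in seeds),
                   key=lambda n: max(dists_from(seeds + [n]).values()))
        seeds.append(best)
    return seeds, max(dists_from(seeds).values())

print(greedy_seeds(2))   # two individuals and the resulting diffusion time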
TTA-JN-C1531  Location-Sharing Systems With Enhanced Privacy in Mobile Online Social Networks

Location sharing is one of the critical components of mobile online social networks (mOSNs) and has attracted much attention recently. With the advent of mOSNs, more and more of the users' location information is collected by the service providers in an mOSN. However, the users' privacy, including location privacy and social network privacy, cannot be guaranteed in previous work without a trust assumption on the service providers. In this paper, aiming at achieving enhanced privacy against insider attacks launched by the service providers in mOSNs, we introduce for the first time a new architecture with multiple location servers, and propose a secure solution supporting location sharing among friends and strangers in location-based applications. In our construction, the user's friend set in each friend query submitted to the location servers is divided into multiple subsets by the social network server at random. Each location server can only get a subset of friends, instead of the user's whole friend set as in previous work. In addition, for the first time, we propose a location-sharing construction which efficiently provides checkability of the search results returned from the location servers. We also prove that the new construction is secure under a stronger security model with enhanced privacy. Finally, we provide extensive experimental results to demonstrate the efficiency of our proposed construction.

IEEE 2015

TTA-JN-C1532  Mobile Based Healthcare Management Using Artificial Intelligence

In this growing age of technology, a proper healthcare management system is needed that is not only highly accurate but also portable, so that every person can carry it as a personalized healthcare system. The proposed system consists of mobile-based heart rate measurement, so that the data can be transferred and a diagnosis based on heart rate can be provided quickly at the click of a button. The system includes video conferencing to connect remotely with a doctor. The Doc-Bot, which was developed earlier, is now being ported to the mobile platform and will be further advanced for the diagnosis of common diseases. The system also includes an online blood bank, which provides up-to-date details about the availability of blood in different hospitals.

IEEE 2015
TTA-JN-C1533  PSMPA: Patient Self-Controllable and Multi-Level Privacy-Preserving Cooperative Authentication in Distributed m-Healthcare Cloud Computing Systems

A distributed m-healthcare cloud computing system significantly facilitates efficient patient treatment for medical consultation by sharing personal health information among healthcare providers. However, it brings about the challenge of keeping both the data confidentiality and the patients' identity privacy simultaneously. Many existing access control and anonymous authentication schemes cannot be straightforwardly exploited. To solve the problem, in this paper, a novel authorized accessible privacy model (AAPM) is established. Patients can authorize physicians by setting an access tree supporting flexible threshold predicates. Then, based on it, by devising a new technique of attribute-based designated verifier signatures, a patient self-controllable multi-level privacy-preserving cooperative authentication scheme (PSMPA) realizing three levels of security and privacy requirements in the distributed m-healthcare cloud computing system is proposed. The directly authorized physicians, the indirectly authorized physicians, and the unauthorized persons in medical consultation can, respectively, decipher the personal health information and/or verify patients' identities by satisfying the access tree with their own attribute sets. Finally, the formal security proof and simulation results illustrate that our scheme can resist various kinds of attacks and far outperforms previous ones in terms of computational, communication, and storage overhead.

IEEE 2015

TTA-JN-C1534  Secure and Distributed Data Discovery and Dissemination in Wireless Sensor Networks

A data discovery and dissemination protocol for wireless sensor networks (WSNs) is responsible for updating configuration parameters of, and distributing management commands to, the sensor nodes. All existing data discovery and dissemination protocols suffer from two drawbacks. First, they are based on the centralized approach: only the base station can distribute data items. Such an approach is not suitable for emergent multi-owner multi-user WSNs. Second, those protocols were not designed with security in mind, and hence adversaries can easily launch attacks to harm the network. This paper proposes the first secure and distributed data discovery and dissemination protocol, named DiDrip. It allows the network owners to authorize multiple network users with different privileges to simultaneously and directly disseminate data items to the sensor nodes. Moreover, as demonstrated by our theoretical analysis, it addresses a number of possible security vulnerabilities that we have identified. Extensive security analysis shows that DiDrip is provably secure. We also implement DiDrip in an experimental network of resource-limited sensor nodes to show its high efficiency in practice.

IEEE 2015
TTA-JN-C1535  DDSGA: A Data-Driven Semi-Global Alignment Approach for Detecting Masquerade Attacks

A masquerade attacker impersonates a legitimate user to utilize the user's services and privileges. The semi-global alignment algorithm (SGA) is one of the most effective and efficient techniques for detecting these attacks, but it has not yet reached the accuracy and performance required by large-scale, multiuser systems. To improve both the effectiveness and the performance of this algorithm, we propose the Data-Driven Semi-Global Alignment (DDSGA) approach. From the security effectiveness viewpoint, DDSGA improves the scoring systems by adopting distinct alignment parameters for each user. Furthermore, it tolerates small mutations in user command sequences by allowing small changes in the low-level representation of the commands' functionality. It also adapts to changes in user behavior by updating the signature of a user according to its current behavior. To optimize the runtime overhead, DDSGA minimizes the alignment overhead and parallelizes the detection and the update. After describing the DDSGA phases, we present experimental results showing that DDSGA achieves a high hit ratio of 88.4 percent with a low false positive rate of 1.7 percent. It improves the hit ratio of the enhanced SGA by about 21.9 percent and reduces the Maxion-Townsend cost by 22.5 percent. Hence, DDSGA improves both the hit ratio and the false positive rate with an acceptable computational overhead.

IEEE 2015
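The core primitive here is sequence alignment of a session against a stored user signature. In a semi-global (fitting) alignment, gaps at the ends of the signature are free, so a short session is matched against the best-fitting region of the signature; sessions that align poorly are flagged. A minimal sketch with hypothetical command sequences and arbitrary match/gap scores (DDSGA's per-user parameterization and signature updates are omitted):

MATCH, MISMATCH, GAP = 2, -1, -1

def align(sig, session):
    """Semi-global alignment score: skipping a prefix or suffix of the user
    signature is free, so the session fits the best region of the signature."""
    n, m = len(sig), len(session)
    prev = [j * GAP for j in range(m + 1)]   # row 0: session tokens unmatched
    best = prev[m]
    for i in range(1, n + 1):
        cur = [0] * (m + 1)                  # column 0: free signature prefix gap
        for j in range(1, m + 1):
            sub = MATCH if sig[i - 1] == session[j - 1] else MISMATCH
            cur[j] = max(prev[j - 1] + sub, prev[j] + GAP, cur[j - 1] + GAP)
        best = max(best, cur[m])             # free signature suffix gap
        prev = cur
    return best

user_sig = "ls cd vim make gcc gdb make ls cd".split()
session  = "ls cd vim make gdb".split()
attack   = "wget chmod nc nmap ssh".split()
print("own session :", align(user_sig, session))   # high score: consistent behavior
print("masquerader :", align(user_sig, attack))    # low score: raise an alert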
TTA-JN-C1536  Revisiting Attribute-Based Encryption with Verifiable Outsourced Decryption

Attribute-based encryption (ABE) is a promising technique for fine-grained access control of encrypted data in cloud storage; however, the decryption involved in ABE is usually too expensive for resource-constrained front-end users, which greatly hinders its practical popularity. In order to reduce the decryption overhead for a user recovering the plaintext, Green et al. suggested outsourcing the majority of the decryption work without revealing actual data or private keys. To ensure that the third-party service honestly computes the outsourced work, Lai et al. added a verifiability requirement to the decryption of ABE, but their scheme doubled the size of the underlying ABE ciphertext and the computation costs. Roughly speaking, their main idea is a parallel encryption technique in which one of the encryption components is used for verification purposes; hence, the bandwidth and computation costs are doubled. In this paper, we investigate the same problem. In particular, we propose a more efficient and generic construction of ABE with verifiable outsourced decryption based on an attribute-based key encapsulation mechanism, a symmetric-key encryption scheme, and a commitment scheme. We then prove the security and the verification soundness of our constructed ABE scheme in the standard model. Finally, we instantiate our scheme with concrete building blocks. Compared with Lai et al.'s scheme, our scheme reduces the bandwidth and computation costs almost by half.

IEEE 2015
TTA-JN-C1537  A Strategy of Clustering Modification Directions in Spatial Image Steganography

Most recently proposed steganographic schemes are based on minimizing an additive distortion function defined as the sum of embedding costs for individual pixels. In such an approach, mutual embedding impacts are often ignored. In this paper, we present an approach that can exploit the interactions among embedding changes in order to reduce the risk of detection by steganalysis. It employs a novel strategy, called clustering modification directions (CMDs), based on the assumption that when embedding modifications in heavily textured regions locally head in the same direction, the steganographic security might be improved. To implement the strategy, a cover image is decomposed into several subimages, in which message segments are embedded with well-known schemes using additive distortion functions. The costs of pixels are updated dynamically to take mutual embedding impacts into account. Specifically, when neighboring pixels are changed toward a positive/negative direction, the cost of the considered pixel is biased toward the same direction. Experimental results show that our proposed CMD strategy, incorporated into existing steganographic schemes, can effectively overcome the challenges posed by modern steganalyzers with high-dimensional features.

IEEE 2015
TTA-JN-C1538  An Access Control Model for Online Social Networks Using User-to-User Relationships

Users and resources in online social networks (OSNs) are interconnected via various types of relationships. In particular, user-to-user relationships form the basis of the OSN structure and play a significant role in specifying and enforcing access control. Individual users and the OSN provider should be enabled to specify which access can be granted in terms of existing relationships. In this paper, we propose a novel user-to-user relationship-based access control (UURAC) model for OSN systems that utilizes regular expression notation for such policy specification. Access control policies on users and resources are composed in terms of the requested action, multiple relationship types, the starting point of the evaluation, and the number of hops on the path. We present two path-checking algorithms to determine whether the required relationship path between users exists for a given access request. We validate the feasibility of our approach by implementing a prototype system and evaluating the performance of these two algorithms.

IEEE 2015

TTA-JN-C1539  An Authenticated Trust and Reputation Calculation and Management System for Cloud and Sensor Networks Integration

Induced by incorporating the powerful data storage and data processing abilities of cloud computing (CC) as well as the ubiquitous data gathering capability of wireless sensor networks (WSNs), CC-WSN integration has received a lot of attention from both academia and industry. However, authentication as well as trust and reputation calculation and management of cloud service providers (CSPs) and sensor network providers (SNPs) are two very critical and barely explored issues for this new paradigm. To fill the gap, this paper proposes a novel authenticated trust and reputation calculation and management (ATRCM) system for CC-WSN integration. Considering the authenticity of CSPs and SNPs, the attribute requirements of cloud service users (CSUs) and CSPs, and the cost, trust, and reputation of the services of CSPs and SNPs, the proposed ATRCM system achieves the following three functions: 1) authenticating CSPs and SNPs to avoid malicious impersonation attacks; 2) calculating and managing the trust and reputation of the services of CSPs and SNPs; and 3) helping CSUs choose desirable CSPs and assisting CSPs in selecting appropriate SNPs. Detailed analysis and design as well as further functionality evaluation results are presented to demonstrate the effectiveness of ATRCM, followed by a system security analysis.

IEEE 2015
TTA-JN-C1540  An Efficient Privacy-Preserving Ranked Keyword Search Method

Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preservation. It is therefore essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. In addition, the volume of data in data centers has experienced dramatic growth, which makes it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support more search semantics and to meet the demand for fast ciphertext search within a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is reached. In the search phase, this approach achieves linear computational complexity against an exponential increase in the size of the document collection. In order to verify the authenticity of search results, a structure called the minimum hash sub-tree is also designed. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase in the number of documents in the dataset, the search time of the proposed method increases linearly, whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in terms of rank privacy and the relevance of retrieved documents.

IEEE 2015
TTA-JN-C1541  An Internal Intrusion Detection and Protection System Using Data Mining and Forensic Techniques

Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and request these coworkers to assist with co-tasks, thereby making shared login patterns one of the weakest points of computer security. Insider attackers, the valid users of a system who attack the system internally, are hard to detect, since most intrusion detection systems and firewalls identify and isolate only malicious behaviors launched from the outside world. In addition, some studies have claimed that analyzing the system calls (SCs) generated by commands can identify these commands, and thereby accurately detect attacks, with attack patterns serving as the features of an attack. Therefore, in this paper, a security system named the Internal Intrusion Detection and Protection System (IIDPS) is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of their usage habits as forensic features, and determines whether a valid login user is the account holder or not by comparing his/her current computer usage behaviors with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29 percent, whereas its response time is less than 0.45 s, implying that it can protect a system from insider attacks effectively and efficiently.

IEEE 2015

TTA-JN-C1542  Cloud-Assisted Safety Message Dissemination in VANET-Cellular Heterogeneous Wireless Networks

In vehicular ad hoc networks (VANETs), efficient message dissemination is critical to road safety and traffic efficiency. Since many VANET-based schemes suffer from high transmission delay and data redundancy, the integrated VANET-cellular heterogeneous network has been proposed recently and has attracted significant attention. However, most existing studies focus on selecting suitable gateways to deliver safety messages from a source vehicle to a remote server, whereas rapid safety message dissemination from the remote server to a targeted area has not been well studied. In this paper, we propose a framework for rapid message dissemination that combines the advantages of diverse communication and cloud computing technologies. Specifically, we propose a novel Cloud-assisted Message Downlink dissemination Scheme (CMDS), with which the safety messages in the cloud server are first delivered to suitable mobile gateways on relevant roads with the help of cloud computing (where the gateways are buses with both cellular and VANET interfaces), and are then disseminated among neighboring vehicles via vehicle-to-vehicle (V2V) communication. To evaluate the proposed scheme, we mathematically analyze its performance and conduct extensive simulation experiments. Numerical results confirm the efficiency of CMDS in various urban scenarios.

IEEE 2015
TTA-JN-C1543  Collaborative Task Execution in Mobile Cloud Computing Under a Stochastic Wireless Channel

This paper investigates collaborative task execution between a mobile device and a cloud clone for mobile applications under a stochastic wireless channel. A mobile application is modeled as a sequence of tasks that can be executed either on the mobile device or on the cloud clone. We aim to minimize the energy consumption on the mobile device while meeting a time deadline, by strategically offloading tasks to the cloud. We formulate collaborative task execution as a constrained shortest path problem. We derive a one-climb policy by characterizing the optimal solution, and then propose an enumeration algorithm for collaborative task execution that runs in polynomial time. Furthermore, we apply the LARAC algorithm to solve the optimization problem approximately, with lower complexity than the enumeration algorithm. Simulation results show that the approximate solution of the LARAC algorithm is close to the optimal solution of the enumeration algorithm. In addition, we consider a probabilistic time deadline, which is transformed into a hard deadline via the Markov inequality. Moreover, compared with purely local and purely remote execution, collaborative task execution can significantly reduce the energy consumption on the mobile device, prolonging its battery life.

IEEE 2015
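The constrained-shortest-path view is concrete: each state is (task index, execution location, elapsed time), edges carry the energy and time of executing the next task locally or remotely plus any migration cost, and the goal is the minimum-energy path that finishes before the deadline. A brute-force dynamic programming sketch with invented per-task costs (the paper solves this via the one-climb policy and LARAC rather than exhaustive state enumeration):

import math

local = [(4, 2), (6, 3), (5, 2), (7, 3)]   # hypothetical (energy, time) on the device
cloud = [(1, 1), (1, 2), (1, 1), (1, 2)]   # on the clone, device energy is mostly idle
TX, RX = (3, 1), (2, 1)                    # (energy, time) to offload / bring back
DEADLINE = 9

def solve():
    """DP over (location, elapsed time) states: cheapest device energy for the
    task chain subject to the completion deadline."""
    frontier = {(0, 0): 0}                 # start on the device at time 0
    for i in range(len(local)):
        nxt = {}
        for (loc, t), e in frontier.items():
            for new_loc in (0, 1):
                de, dt = (0, 0)
                if new_loc != loc:
                    de, dt = (TX if new_loc else RX)   # migrate execution point
                ce, ct = local[i] if new_loc == 0 else cloud[i]
                t2, e2 = t + dt + ct, e + de + ce
                if t2 <= DEADLINE and e2 < nxt.get((new_loc, t2), math.inf):
                    nxt[(new_loc, t2)] = e2
        frontier = nxt
    answers = []
    for (loc, t), e in frontier.items():   # results must return to the device
        if loc == 1:
            e, t = e + RX[0], t + RX[1]
        if t <= DEADLINE:
            answers.append(e)
    return min(answers)

print("minimum device energy within deadline:", solve())

With these numbers the best plan offloads the whole chain once and brings the result back, which is exactly the single up-down shape the one-climb policy formalizes.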
TTA-JN-C1544  Contact-Aware Data Replication in Roadside Unit Aided Vehicular Delay-Tolerant Networks

Roadside units (RSUs), which enable vehicle-to-infrastructure communications, are deployed along roadsides to handle the ever-growing communication demands caused by the explosive increase of vehicular traffic. How to efficiently utilize them to enhance the performance of vehicular delay-tolerant networks (VDTNs) is an important problem in designing RSU-aided VDTNs. In this work, we implement an extensive experiment involving tens of thousands of operational vehicles in the city of Beijing. Based on this newly collected Beijing trace and the existing Shanghai trace, we obtain some invariant properties of the communication contacts of large-scale RSU-aided VDTNs. Specifically, we find that the contact time between RSUs and vehicles obeys an exponential distribution, while the contact rate between them follows a Poisson distribution. Based on these observations, we investigate the problem of communication contact-aware mobile data replication for RSU-aided VDTNs, considering a mobile data dissemination system that transmits data from the Internet to vehicles via RSUs through opportunistic communications. In particular, we formulate the communication contact-aware RSU-aided vehicular mobile data dissemination problem as an optimization problem with realistic VDTN settings, and we provide an efficient heuristic solution for this NP-hard problem. By carrying out extensive simulations using realistic vehicular traces, we demonstrate the effectiveness of our proposed heuristic contact-aware data replication scheme in comparison with the optimal solution and other existing schemes.

IEEE 2015

TTA-JN-C1545  Cost-Aware SEcure Routing (CASER) Protocol Design for Wireless Sensor Networks

Lifetime optimization and security are two conflicting design issues for multi-hop wireless sensor networks (WSNs) with non-replenishable energy resources. In this paper, we first propose a novel secure and efficient Cost-Aware SEcure Routing (CASER) protocol to address these two conflicting issues through two adjustable parameters: energy balance control (EBC) and probabilistic-based random walking. We then discover that energy consumption is severely disproportionate under a uniform energy deployment for a given network topology, which greatly reduces the lifetime of the sensor networks. To solve this problem, we propose an efficient non-uniform energy deployment strategy to optimize the lifetime and message delivery ratio under the same energy resource and security requirements. We also provide a quantitative security analysis of the proposed routing protocol. Our theoretical analysis and OPNET simulation results demonstrate that the proposed CASER protocol can provide an excellent tradeoff between routing efficiency and energy balance, and can significantly extend the lifetime of the sensor networks in all scenarios. For the non-uniform energy deployment, our analysis shows that the lifetime and the total number of messages that can be delivered increase by more than four times under the same assumptions. We also demonstrate that the proposed CASER protocol can achieve a high message delivery ratio while preventing routing traceback attacks.

IEEE 2015
TTA-JN-C1546  Deleting Secret Data with Public Verifiability

Existing software-based data erasure programs can be summarized as following the same one-bit-return protocol: the deletion program performs data erasure and returns either success or failure. However, such a one-bit-return protocol turns the data deletion system into a black box: the user has to trust the outcome but cannot easily verify it. This is especially problematic when the deletion program is encapsulated within a Trusted Platform Module (TPM) and the user has no access to the code inside. In this paper, we present a cryptographic solution that aims to make the data deletion process more transparent and verifiable. In contrast to the conventional black/white assumptions about TPMs (i.e., either complete trust or complete distrust), we introduce a third assumption that sits in between, namely "trust-but-verify". Our solution enables a user to verify the correct implementation of two important operations inside a TPM without accessing its source code: the correct encryption of data and the faithful deletion of the key. Finally, we present a proof-of-concept implementation of the SSE system on a resource-constrained Java card to demonstrate its practical feasibility. To our knowledge, this is the first systematic solution to the secure data deletion problem based on a "trust-but-verify" paradigm, together with a concrete prototype implementation.

IEEE 2015
TTA-JN-C1547  Design and Evaluation of the Optimal Cache Allocation for Content-Centric Networking

Content-Centric Networking (CCN) is a promising framework for rebuilding the Internet's forwarding substrate around the concept of content. CCN advocates ubiquitous in-network caching to enhance content delivery, and thus each router has storage space to cache frequently requested content. In this work, we focus on the cache allocation problem, namely, how to distribute the cache capacity across routers under a constrained total storage budget for the network. We first formulate this problem as a content placement problem and obtain the optimal solution by a two-step method. We then propose a suboptimal heuristic method based on node centrality, which is more practical in dynamic networks with frequent content publishing. We investigate through simulations the factors that affect the optimal cache allocation and, perhaps more importantly, we use a real-life Internet topology and video access logs from a large-scale Internet video provider to evaluate the performance of various cache allocation methods. We observe that network topology and content popularity are two important factors that determine where cache capacity should be placed. Further, the heuristic method comes with only a very limited performance penalty compared with the optimal allocation. Finally, using our findings, we provide recommendations to network operators on the best deployment of CCN cache capacity over routers.

IEEE 2015
TTA-JN-C1548  Designing High-Performance Web-Based Computing Services to Promote Telemedicine Database Management Systems

Many web computing systems run real-time database services in which their information changes continuously and expands incrementally. In this context, web data services play a major role and bring significant improvements in monitoring and controlling information truthfulness and data propagation. Currently, web telemedicine database services are of central importance to distributed systems. However, the increasing complexity and rapid growth of real-world healthcare applications make the database administration task increasingly challenging. In this paper, we build integrated web data services that achieve fast response times for large-scale telehealth database management systems. Our focus is on database management, with application scenarios in dynamic telemedicine systems, to increase care admissions and decrease care difficulties such as distance, travel, and time limitations. We propose a three-fold approach based on data fragmentation, database website clustering, and intelligent data distribution. This approach reduces the amount of data migrated between websites during application execution, achieves cost-effective communications during application processing, and improves application response time and throughput. The proposed approach is validated internally by measuring the impact of our computing services' techniques on various performance features such as communication cost, response time, and throughput. External validation is achieved by comparing the performance of our approach with that of other techniques in the literature. The results show that our integrated approach significantly improves the performance of web database systems and outperforms its counterparts.

IEEE 2015
TTA-JN-C1549  Distributed Database Management Techniques for Wireless Sensor Networks

In sensor networks, the large amount of data generated by sensors greatly influences the lifetime of the network. To manage this amount of sensed data in an energy-efficient way, new methods of storage and data query are needed. The distributed database approach for sensor networks has proved to be one of the most energy-efficient data storage and query techniques. This paper surveys the state of the art of the techniques used to manage data and queries in wireless sensor networks based on the distributed paradigm, and proposes a classification of these techniques. The goal of this work is not only to present how data and query management techniques have advanced to date, but also to show their benefits and drawbacks, and to identify open issues, providing guidelines for further contributions in this type of distributed architecture.

IEEE 2015

TTA-JN-C1550  Diversifying Web Service Recommendation Results via Exploring Service Usage History

The last decade has witnessed tremendous growth of Web services as a major technology for sharing data, computing resources, and programs on the Web. With the increasing adoption and presence of Web services, the design of novel approaches for effective Web service recommendation that satisfy users' potential requirements has become of paramount importance. Existing Web service recommendation approaches mainly focus on predicting the missing QoS values of Web service candidates that are of interest to a user, using a collaborative filtering approach, a content-based approach, or a hybrid of the two. These recommendation approaches assume that the recommended Web services are independent of each other, which is sometimes not true. As a result, many similar or redundant Web services may appear in a recommendation list. In this paper, we propose a novel Web service recommendation approach incorporating a user's potential QoS preferences and the diversity feature of user interests in Web services. The user's interests and QoS preferences are first mined by exploring the Web service usage history. We then compute scores of Web service candidates by measuring their relevance to historical and potential user interests, and their QoS utility. We also construct a Web service graph based on the functional similarity between Web services. Finally, we present an innovative diversity-aware Web service ranking algorithm to rank the candidates based on their scores and on the diversity degrees derived from the Web service graph. Extensive experiments conducted on a real-world Web service dataset indicate that our proposed approach significantly improves the quality of the recommendation results compared with existing methods.

IEEE 2015
TTA-JN-C1551  DROPS: Division and Replication of Data in Cloud for Optimal Performance and Security

Outsourcing data to a third-party administrative control, as is done in cloud computing, gives rise to security concerns. The data may be compromised due to attacks by other users and nodes within the cloud. Therefore, strong security measures are required to protect data within the cloud. However, the employed security strategy must also take into account the optimization of the data retrieval time. In this paper, we propose Division and Replication of Data in the Cloud for Optimal Performance and Security (DROPS), which collectively addresses the security and performance issues. In the DROPS methodology, we divide a file into fragments and replicate the fragmented data over the cloud nodes. Each of the nodes stores only a single fragment of a particular data file, which ensures that even in the case of a successful attack, no meaningful information is revealed to the attacker. Moreover, the nodes storing the fragments are separated by a certain distance by means of graph T-coloring, to prevent an attacker from guessing the locations of the fragments. Furthermore, the DROPS methodology does not rely on traditional cryptographic techniques for data security, thereby relieving the system of computationally expensive methodologies. We show that the probability of locating and compromising all of the nodes storing the fragments of a single file is extremely low. We also compare the performance of the DROPS methodology with that of ten other schemes. A higher level of security with only a slight performance overhead was observed.

IEEE 2015
  • 40.
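A minimal sketch of the placement idea, assuming the node topology is given as a hop-distance function: the paper uses graph T-coloring, while this simplification just greedily enforces a minimum hop distance between any two nodes holding fragments of the same file. All names and the min_sep value are hypothetical.

def place_fragments(fragments, nodes, hop_distance, min_sep=2):
    """Assign each fragment to a node at least min_sep hops from the others."""
    placement = {}
    used = []
    for frag in fragments:
        for node in nodes:
            if node not in used and all(hop_distance(node, u) >= min_sep for u in used):
                placement[frag] = node
                used.append(node)
                break
        else:
            raise RuntimeError("no node satisfies the separation constraint")
    return placement

Keeping fragment holders far apart is what makes compromising one node (or a neighborhood of nodes) yield at most one fragment of the file.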
TTA-JN-C1552 Dynamic Bin Packing for On-Demand Cloud Resource Allocation
Dynamic Bin Packing (DBP) is a variant of classical bin packing in which items may arrive and depart at arbitrary times. Existing work on DBP generally aims to minimize the maximum number of bins ever used in the packing. In this paper, we consider a new version of the DBP problem, the MinTotal DBP problem, which targets minimizing the total cost of the bins used over time. It is motivated by the request dispatching problem arising in cloud gaming systems. We analyze the competitive ratios of modified versions of the commonly used First Fit, Best Fit, and Any Fit packing algorithms (the family of packing algorithms that open a new bin only when no currently open bin can accommodate the item to be packed) for the MinTotal DBP problem. We show that the competitive ratio of Any Fit packing cannot be better than μ + 1, where μ is the ratio of the maximum item duration to the minimum item duration. The competitive ratio of Best Fit packing is not bounded for any given μ. For First Fit packing, if all the item sizes are smaller than 1/β of the bin capacity (where β > 1 is a constant), the competitive ratio has an upper bound of (β/(β-1))μ + 3β/(β-1) + 1. For the general case, the competitive ratio of First Fit packing has an upper bound of 2μ + 7. We also propose a Hybrid First Fit packing algorithm that achieves a competitive ratio no larger than (5/4)μ + 19/4 when μ is not known, and no larger than μ + 5 when μ is known.
IEEE 2015
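To make the MinTotal objective concrete, here is a small First Fit simulation sketch, assuming items are (size, arrival, departure) tuples and, as a simplification, that a bin's cost is the span from the first arrival it serves to the last departure. This is illustrative only, not the paper's analyzed algorithm variants.

def first_fit_total_cost(items, capacity=1.0):
    """Pack items at arrival into the first open bin that fits; return total bin-time."""
    events = sorted(items, key=lambda it: it[1])        # process by arrival time
    bins = []                                           # {"live": items, "open": t, "close": t}
    for size, arr, dep in events:
        placed = False
        for b in bins:
            # Drop items that departed before this arrival.
            b["live"] = [it for it in b["live"] if it[2] > arr]
            if sum(it[0] for it in b["live"]) + size <= capacity:
                b["live"].append((size, arr, dep))
                b["close"] = max(b["close"], dep)
                placed = True
                break
        if not placed:
            bins.append({"live": [(size, arr, dep)], "open": arr, "close": dep})
    # Total cost: sum over bins of the time span each bin stayed open.
    return sum(b["close"] - b["open"] for b in bins)

Running this on workloads with very long-lived items next to short-lived ones shows why the duration ratio μ dominates the competitive ratios quoted above: one long item can keep a nearly empty bin open, and thus paying, for a long time.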
TTA-JN-C1553 Location-Aware and Personalized Collaborative Filtering for Web Service Recommendation
Collaborative Filtering (CF) is widely employed for making Web service recommendations. CF-based Web service recommendation aims to predict missing QoS (Quality-of-Service) values of Web services. Although several CF-based Web service QoS prediction methods have been proposed in recent years, their performance still needs significant improvement. First, existing QoS prediction methods seldom consider the personalized influence of users and services when measuring the similarity between users and between services. Second, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users, yet existing Web service QoS prediction methods seldom take this observation into consideration. In this paper, we propose a location-aware personalized CF method for Web service recommendation. The proposed method leverages the locations of both users and Web services when selecting similar neighbors for the target user or service. The method also includes an enhanced similarity measurement for users and Web services that takes their personalized influence into account. To evaluate the performance of our proposed method, we conduct a set of comprehensive experiments using a real-world Web service dataset. The experimental results indicate that our approach significantly improves QoS prediction accuracy and computational efficiency compared to previous CF-based methods.
IEEE 2015
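A minimal sketch of location-aware neighbor selection, assuming each user has a coordinate and a dict of observed QoS values. The paper's enhanced similarity measure is not reproduced; plain Pearson similarity restricted to geographically close users stands in for it, and all names are hypothetical.

import math

def geo_distance(a, b):
    # Rough planar distance; adequate for filtering nearby users in a sketch.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def similar_neighbors(target, users, locations, radius, top_n=10):
    """Rank users near the target by QoS similarity on co-invoked services."""
    def pearson(u, v):
        common = set(u) & set(v)
        if len(common) < 2:
            return 0.0
        mu_u = sum(u[s] for s in common) / len(common)
        mu_v = sum(v[s] for s in common) / len(common)
        num = sum((u[s] - mu_u) * (v[s] - mu_v) for s in common)
        den = math.sqrt(sum((u[s] - mu_u) ** 2 for s in common) *
                        sum((v[s] - mu_v) ** 2 for s in common))
        return num / den if den else 0.0
    nearby = [u for u in users
              if u != target and geo_distance(locations[u], locations[target]) <= radius]
    return sorted(nearby, key=lambda u: pearson(users[u], users[target]), reverse=True)[:top_n]

The location filter reflects the observation in the abstract: response time and throughput correlate with network proximity, so far-away users are poor predictors even when their rating patterns look similar.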
TTA-JN-C1554 Location-Based Key Management Strong Against Insider Threats in Wireless Sensor Networks
To achieve secure communications in wireless sensor networks (WSNs), sensor nodes (SNs) must establish secret shared keys with neighboring nodes. Moreover, those keys must be updated in a way that defeats the insider threats posed by corrupted nodes. In this paper, we propose a location-based key management scheme for WSNs with special consideration of insider threats. After reviewing existing location-based key management schemes and studying their advantages and disadvantages, we selected location-dependent key management (LDK) as a suitable scheme for our study. To solve a communication interference problem in LDK and similar methods, we devised a new key revision process that incorporates grid-based location information. We also propose a key establishment process using grid information, and we construct key update and revocation processes to effectively resist inside attackers. For analysis, we conducted a rigorous simulation and confirmed that our method can increase connectivity while decreasing the compromise ratio when the minimum number of common keys required for key establishment is high. When a corrupted node mounted insider threats, our method was also able to effectively rekey every SN except the corrupted node. Finally, a hexagonal deployment of anchor nodes can reduce network costs.
IEEE 2015

TTA-JN-C1555 Malware Propagation in Large-Scale Networks
Malware is pervasive in networks and poses a critical threat to network security. However, we have very limited understanding of malware behavior in networks to date. In this paper, we investigate how malware propagates in networks from a global perspective. We formulate the problem and establish a rigorous two-layer epidemic model for malware propagation from network to network. Based on the proposed model, our analysis indicates that the distribution of a given malware follows an exponential distribution at its early stage, a power-law distribution with a short exponential tail at its late stage, and a power-law distribution at its final stage. Extensive experiments have been performed on two real-world global-scale malware data sets, and the results confirm our theoretical findings.
IEEE 2015
TTA-JN-C1556 Optimal Cloudlet Placement and User to Cloudlet Allocation in Wireless Metropolitan Area Networks
Mobile applications are becoming increasingly computation-intensive, while the computing capability of portable mobile devices remains limited. A powerful way to reduce the completion time of an application on a mobile device is to offload its tasks to nearby cloudlets, which consist of clusters of computers. Although there is a significant body of research on mobile cloudlet offloading technology, very little attention has been paid to how cloudlets should be placed in a given network to optimize mobile application performance. In this paper, we study cloudlet placement and mobile user allocation to the cloudlets in a wireless metropolitan area network (WMAN). We devise an algorithm for the problem that places the cloudlets at user-dense regions of the WMAN and assigns mobile users to the placed cloudlets while balancing their workload. We also conduct experiments through simulation. The simulation results indicate that the performance of the proposed algorithm is very promising.
IEEE 2015

TTA-JN-C1557 Predistribution Scheme for Establishing Group Keys in Wireless Sensor Networks
Special care is needed when designing key establishment schemes for wireless sensor networks (WSNs). This is because sensor nodes are limited in memory storage and computational power. In 1992, Blundo et al. proposed a non-interactive group key establishment scheme using a multivariate polynomial. Their scheme can establish a group key for m sensors. Since each share is a polynomial involving m - 1 variables and having degree k, each sensor needs to store (k + 1)^(m-1) coefficients from GF(p), which is exponentially proportional to the size of the group. This makes their scheme practical only when m = 2, i.e., for peer-to-peer communication. So far, most existing predistribution schemes in WSNs establish pairwise keys for sensor nodes. In this paper, we propose a novel predistribution scheme for establishing group keys in WSNs. Our design uses a special type of multivariate polynomial over Z_N, where N is an RSA modulus. The advantage of using this type of multivariate polynomial is that it limits the storage space of each sensor to m(k + 1), which is linearly proportional to the size of the group. In addition, we prove the security of the proposed scheme and show that its computational complexity is efficient.
IEEE 2015
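A minimal sketch of Blundo-style key predistribution for the pairwise case (m = 2), which the abstract above identifies as the practical setting. A trusted server picks a symmetric bivariate polynomial f(x, y) mod p; node i receives the share f(i, y), and nodes i and j independently compute f(i, j) = f(j, i) as their pairwise key. The parameter values are toy values, not production ones.

import random

P = 2_147_483_647                      # a public prime (toy size)
K = 3                                  # polynomial degree (collusion threshold)

def symmetric_poly(k):
    """Random coefficients a[u][v] with a[u][v] == a[v][u], so f(x,y) = f(y,x)."""
    a = [[0] * (k + 1) for _ in range(k + 1)]
    for u in range(k + 1):
        for v in range(u, k + 1):
            a[u][v] = a[v][u] = random.randrange(P)
    return a

def share_for(a, node_id):
    """Node's share: the coefficients of the univariate polynomial f(node_id, y)."""
    return [sum(a[u][v] * pow(node_id, u, P) for u in range(len(a))) % P
            for v in range(len(a))]

def pairwise_key(share, other_id):
    return sum(c * pow(other_id, v, P) for v, c in enumerate(share)) % P

a = symmetric_poly(K)
s1, s2 = share_for(a, 17), share_for(a, 42)
assert pairwise_key(s1, 42) == pairwise_key(s2, 17)   # both sides derive the same key

Note the storage: each node keeps k + 1 coefficients, matching the (k + 1)^(m-1) figure for m = 2; the paper's contribution is keeping storage linear, m(k + 1), for group sizes m > 2.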
TTA-JN-C1558 Privacy-Preserving Detection of Sensitive Data Exposure
Statistics from security firms, research institutions, and government organizations show that the number of data-leak instances has grown rapidly in recent years. Among the various causes of data leaks, human mistakes are one of the main sources of data loss. Solutions exist that detect inadvertent sensitive data leaks caused by human mistakes and provide alerts for organizations. A common approach is to screen content in storage and transmission for exposed sensitive information. Such an approach usually requires the detection operation to be conducted in secrecy. However, this secrecy requirement is challenging to satisfy in practice, as detection servers may be compromised or outsourced. In this paper, we present a privacy-preserving data-leak detection (DLD) solution that addresses this issue, in which a special set of sensitive data digests is used in detection. The advantage of our method is that it enables the data owner to safely delegate the detection operation to a semi-honest provider without revealing the sensitive data to the provider. We describe how Internet service providers can offer their customers DLD as an add-on service with strong privacy guarantees. The evaluation results show that our method supports accurate detection with a very small number of false alarms under various data-leak scenarios.
IEEE 2015
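A minimal sketch of digest-based leak screening, assuming the owner releases only hashed n-gram fingerprints of the sensitive data, so the provider can match against observed traffic without seeing the plaintext secrets. This illustrates the digest idea only; the paper's construction additionally bounds what the digests themselves reveal. The threshold in the comment is hypothetical.

import hashlib

def fingerprints(data: bytes, n: int = 8) -> set:
    """Hashes of all n-byte shingles of the sensitive data."""
    return {hashlib.sha256(data[i:i + n]).hexdigest()
            for i in range(len(data) - n + 1)}

def leak_score(traffic: bytes, digests: set, n: int = 8) -> float:
    """Fraction of traffic shingles that hit the sensitive-digest set."""
    shingles = [hashlib.sha256(traffic[i:i + n]).hexdigest()
                for i in range(len(traffic) - n + 1)]
    if not shingles:
        return 0.0
    return sum(s in digests for s in shingles) / len(shingles)

# Owner side:    digests = fingerprints(b"card=4111111111111111")
# Provider side: alert if leak_score(packet_payload, digests) > 0.05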
TTA-JN-C1559 Providing Privacy-Aware Incentives in Mobile Sensing Systems
Mobile sensing relies on data contributed by users through their mobile devices (e.g., smartphones) to obtain useful information about people and their surroundings. However, users may not want to contribute due to a lack of incentives and concerns about possible privacy leakage. To effectively promote user participation, both the incentive and the privacy issues should be addressed. Although incentives and privacy have been addressed separately in mobile sensing, addressing them simultaneously remains an open problem. In this paper, we propose two credit-based privacy-aware incentive schemes for mobile sensing systems, where the focus is on privacy protection rather than on the design of incentive mechanisms. Our schemes enable mobile users to earn credits by contributing data without leaking which data they have contributed, and ensure that malicious users cannot abuse the system to earn unlimited credits. Specifically, the first scheme considers scenarios where an online trusted third party (TTP) is available, and relies on the TTP to protect user privacy and prevent abuse attacks. The second scheme considers scenarios where no online TTP is available; it applies blind signatures, partially blind signatures, and a novel extended Merkle tree technique to protect user privacy and prevent abuse attacks. Security analysis and cost evaluations show that our schemes are secure and efficient.
IEEE 2015

TTA-JN-C1560 Response Time Based Optimal Web Service Selection
Selecting an optimal web service from a list of functionally equivalent web services remains a challenging issue. For Internet services, the presence of low-performance servers, high latency, or overall poor service quality can translate into lost sales, user frustration, and lost customers. In this paper, we propose a novel method for QoS metrification based on Hidden Markov Models (HMMs), which further suggests an optimal path for the execution of user requests. The technique can be used to measure and predict the behavior of Web services in terms of response time, and can thus be used to rank services quantitatively rather than just qualitatively. We demonstrate the feasibility and usefulness of our methodology through experiments on real-world data. The results show how the proposed method can help the user automatically select the most reliable Web service, taking into account several metrics, among them system predictability and response time variability. An ROC curve analysis shows a 12 percent improvement in prediction accuracy using HMMs.
IEEE 2015
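In the spirit of the response-time entry above, here is a deliberately simplified sketch that replaces the HMM with a fully observable Markov chain: response times are discretized into states, a transition matrix is estimated per service, and services are compared by their long-run probability of responding fast. State names and thresholds are hypothetical, and this is not the paper's model.

import numpy as np

STATES = ["fast", "medium", "slow"]

def to_states(response_times, fast=0.2, slow=1.0):
    return [0 if t < fast else (1 if t < slow else 2) for t in response_times]

def transition_matrix(states, n=3):
    counts = np.ones((n, n))                # Laplace smoothing
    for a, b in zip(states, states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def steady_state(T, iters=200):
    """Long-run share of time spent in each state (power iteration)."""
    p = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(iters):
        p = p @ T
    return p

# Rank services by the long-run probability of the "fast" state:
# score = steady_state(transition_matrix(to_states(observed_times)))[0]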
TTA-JN-C1561 Robust Cloud Management of MANET Checkpoint Sessions
In a traditional mobile ad hoc network (MANET), if two nodes are engaged in a session and one of them departs suddenly, their communication is aborted: the session is no longer active, work is lost, and, consequently, battery energy has been wasted. This paper proposes a model that uses a cloud service to register, save, pause, and resume sessions between MANET member nodes so that both work in progress and energy are saved. A checkpoint technique is introduced to capture the progress of a session and allow it to be resumed. This is an additional service to our cloud management of the MANET. The model proposed in this paper was tested on Android-based devices and an Amazon cloud instance. Experimental results show that the model is feasible and robust, and that it saves time and, more importantly, energy when session breaks occur frequently.
IEEE 2015
TTA-JN-C1562 Secure Anonymous Key Distribution Scheme for Smart Grid
To fully support information management among the various stakeholders in smart grid domains, establishing secure communication sessions has become an important issue for smart grid environments. To support secure communications between smart meters and service providers, key management for authentication becomes a crucial security topic. Recently, several key distribution schemes have been proposed to provide secure communications for the smart grid. However, these schemes do not support smart meter anonymity and possess security weaknesses. This paper utilizes an identity-based signature scheme and an identity-based encryption scheme to propose a new anonymous key distribution scheme for smart grid environments. In the proposed scheme, a smart meter can anonymously access services provided by service providers using a single private key, without the help of the trusted anchor during authentication. In addition, the proposed scheme requires only a few computation operations on the smart meter side. Security analysis proves that the proposed scheme is secure in the random oracle model.
IEEE 2015

TTA-JN-C1563 Secure Data Aggregation Technique for Wireless Sensor Networks in the Presence of Collusion Attacks
Due to limited computational power and energy resources, aggregation of data from multiple sensor nodes at an aggregating node is usually accomplished by simple methods such as averaging. However, such aggregation is known to be highly vulnerable to node-compromising attacks. Since WSNs are usually unattended and lack tamper-resistant hardware, they are highly susceptible to such attacks. Thus, ascertaining the trustworthiness of data and the reputation of sensor nodes is crucial for WSNs. As the performance of very low power processors dramatically improves, future aggregator nodes will be capable of performing more sophisticated data aggregation algorithms, making WSNs less vulnerable. Iterative filtering algorithms hold great promise for such a purpose: they simultaneously aggregate data from multiple sources and provide a trust assessment of these sources, usually in the form of corresponding weight factors assigned to the data provided by each source. In this paper, we demonstrate that several existing iterative filtering algorithms, while significantly more robust against collusion attacks than simple averaging methods, are nevertheless susceptible to a novel sophisticated collusion attack that we introduce. To address this security issue, we propose an improvement for iterative filtering techniques that provides an initial approximation for such algorithms, making them not only collusion-robust, but also more accurate and faster converging.
IEEE 2015
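A minimal iterative filtering sketch, assuming readings[i][j] is sensor i's report of quantity j: estimates and sensor weights are refined alternately, with sensors that disagree with the current estimate receiving lower weight. This is the generic scheme discussed above, not the paper's improved initial approximation.

import numpy as np

def iterative_filter(readings, rounds=20, eps=1e-9):
    R = np.asarray(readings, dtype=float)      # shape: (sensors, quantities)
    w = np.ones(R.shape[0])
    for _ in range(rounds):
        est = w @ R / w.sum()                  # weighted aggregate per quantity
        dist = ((R - est) ** 2).sum(axis=1)    # each sensor's disagreement
        w = 1.0 / (dist + eps)                 # down-weight outliers
    return est, w / w.sum()

# estimate, trust = iterative_filter([[10.1, 20.2], [9.9, 19.8], [30.0, 5.0]])
# The third (faulty) sensor ends up with near-zero trust.

The attack the paper introduces exploits exactly this feedback loop: colluders who coordinate their false reports can drag the estimate toward themselves, which is why a robust initial approximation matters.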
TTA-JN-C1564 Secure Distributed Deduplication Systems with Improved Reliability
Data deduplication is a technique for eliminating duplicate copies of data, and has been widely used in cloud storage to reduce storage space and upload bandwidth. However, there is only one copy of each file stored in the cloud, even if such a file is owned by a huge number of users. As a result, a deduplication system improves storage utilization while reducing reliability. Furthermore, the challenge of privacy for sensitive data also arises when it is outsourced by users to the cloud. Aiming to address the above security challenges, this paper makes the first attempt to formalize the notion of a distributed reliable deduplication system. We propose new distributed deduplication systems with higher reliability, in which the data chunks are distributed across multiple cloud servers. The security requirements of data confidentiality and tag consistency are also achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous deduplication systems. Security analysis demonstrates that our deduplication systems are secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement the proposed systems and demonstrate that the incurred overhead is very limited in realistic environments.
IEEE 2015
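A minimal sketch of splitting a chunk across servers by XOR secret sharing. Note the caveats: this is an n-of-n scheme (all shares are needed to reconstruct, so it improves confidentiality but not loss tolerance), and it is randomized, whereas the paper uses a deterministic ramp scheme that also survives server failures. It is illustrative only.

import os
from functools import reduce

def split(chunk: bytes, n: int) -> list:
    """n shares; any n - 1 of them reveal nothing about the chunk."""
    shares = [os.urandom(len(chunk)) for _ in range(n - 1)]
    last = bytes(reduce(lambda x, y: x ^ y, col) for col in zip(chunk, *shares))
    return shares + [last]

def combine(shares: list) -> bytes:
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*shares))

shares = split(b"deduplicated data chunk", n=4)   # one share per cloud server
assert combine(shares) == b"deduplicated data chunk"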
TTA-JN-C1565 TEES: An Efficient Search Scheme over Encrypted Data on Mobile Cloud
Cloud storage provides convenient, massive, and scalable storage at low cost, but data privacy is a major concern that prevents users from storing files on the cloud trustingly. One way of enhancing privacy from the data owner's point of view is to encrypt the files before outsourcing them to the cloud and decrypt them after downloading. However, data encryption is a heavy overhead for mobile devices, and data retrieval involves complicated communication between the data user and the cloud. Given the typically limited bandwidth and battery life of mobile devices, these issues impose heavy computing and communication overhead as well as higher power consumption on mobile device users, which makes encrypted search over the mobile cloud very challenging. In this paper, we propose TEES (Traffic and Energy saving Encrypted Search), a bandwidth- and energy-efficient encrypted search architecture over the mobile cloud. The proposed architecture offloads the computation from mobile devices to the cloud, and we further optimize the communication between the mobile clients and the cloud. We demonstrate that data privacy does not degrade when the performance enhancement methods are applied. Our experiments show that TEES reduces the computation time by 23% to 46% and saves energy consumption by 35% to 55% per file retrieval, while network traffic during file retrieval is also significantly reduced.
IEEE 2015

TTA-JN-C1566 Transparent Real-Time Task Scheduling on Temporal Resource Partitions
The Hierarchical Real-Time Scheduling (HiRTS) technique helps improve overall resource utilization in real-time embedded systems. With HiRTS, a computation resource is divided into a group of temporal resource partitions, each of which accommodates multiple real-time tasks. Besides the computation-resource partitioning problem, real-time task scheduling on resource partitions is also a major problem of HiRTS. The existing scheduling techniques for dedicated resources, such as schedulability tests and utilization bounds, are in most cases unable to work unchanged on temporal resource partitions. In this paper, we show how to achieve maximal transparency for task scheduling on Regular Partitions, a type of resource partition introduced by the Regularity-based Resource Partition (RRP) model. We show that several classes of real-time scheduling problems on a regular partition can be transformed into equivalent problems on a dedicated single resource, such that comprehensive single-resource scheduling techniques provide optimal solutions. Furthermore, this transformation method can be applied to different types of real-time tasks, such as periodic, sporadic, and aperiodic tasks.
IEEE 2015
TTA-JN-C1567 User-Defined Privacy Grid System for Continuous Location-Based Services
Location-based services (LBS) require users to continuously report their location to a potentially untrusted server to obtain services based on their location, which can expose them to privacy risks. Unfortunately, existing privacy-preserving techniques for LBS have several limitations, such as requiring a fully trusted third party, offering limited privacy guarantees, and incurring high communication overhead. In this paper, we propose a user-defined privacy grid system called the dynamic grid system (DGS), the first holistic system that fulfills four essential requirements for privacy-preserving snapshot and continuous LBS. (1) The system only requires a semi-trusted third party, responsible for carrying out simple matching operations correctly; this semi-trusted third party does not have any information about a user's location. (2) Secure snapshot and continuous location privacy is guaranteed under our defined adversary models. (3) The communication cost for the user does not depend on the user's desired privacy level; it only depends on the number of relevant points of interest in the vicinity of the user. (4) Although we only focus on range and k-nearest-neighbor queries in this work, our system can be easily extended to support other spatial queries without changing the algorithms run by the semi-trusted third party and the database server, provided the required search area of a spatial query can be abstracted into spatial regions. Experimental results show that our DGS is more efficient than the state-of-the-art privacy-preserving technique for continuous LBS.
IEEE 2015
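A minimal sketch of the grid idea behind a system like DGS: the user maps a location to a cell of a self-chosen grid and reveals only cell identifiers, never exact coordinates. The grid origin and cell size are user-defined parameters here; the full protocol layers encrypted matching on top of this discretization.

def cell_id(lat, lon, origin=(0.0, 0.0), cell_deg=0.01):
    """Discretize a coordinate into an integer grid cell."""
    row = int((lat - origin[0]) // cell_deg)
    col = int((lon - origin[1]) // cell_deg)
    return (row, col)

def cells_covering(lat, lon, radius_deg, origin=(0.0, 0.0), cell_deg=0.01):
    """All cells intersecting a query range; these are sent instead of the location."""
    r0, c0 = cell_id(lat - radius_deg, lon - radius_deg, origin, cell_deg)
    r1, c1 = cell_id(lat + radius_deg, lon + radius_deg, origin, cell_deg)
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

Because the user chooses the grid, the server and the semi-trusted matcher only ever see cell identifiers, and the number of cells sent depends on the query range and nearby points of interest rather than on the privacy level, mirroring requirement (3) above.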
TTA-JN-C1568 VoteTrust: Leveraging Friend Invitation Graph to Defend against Social Network Sybils
Online social networks (OSNs) suffer from the creation of fake accounts that introduce fake product reviews, malware, and spam. Existing defenses focus on using the social graph structure to isolate fakes. However, our work shows that Sybils can befriend a large number of real users, invalidating the assumption behind social-graph-based detection. In this paper, we present VoteTrust, a scalable defense system that further leverages user-level activities. VoteTrust models the friend invitation interactions among users as a directed, signed graph, and uses two key mechanisms to detect Sybils over the graph: voting-based Sybil detection, to find Sybils that users vote to reject, and Sybil community detection, to find other colluding Sybils around the identified Sybils. Through an evaluation on the Renren social network, we show that VoteTrust is able to prevent Sybils from generating many unsolicited friend requests. We have also deployed VoteTrust in Renren, and our real-world experience demonstrates that VoteTrust can detect large-scale collusion among Sybils.
IEEE 2015
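A minimal sketch of the voting intuition described above: each friend request is an edge labelled +1 (accepted) or -1 (rejected), and accounts whose requests are mostly rejected, especially by trusted voters, accumulate a low score. The smoothing prior is hypothetical, and the real system propagates voting power over the graph rather than using static trust values.

def sybil_scores(requests, voter_trust, prior=2.0):
    """requests: iterable of (sender, receiver, vote) with vote in {+1, -1}."""
    accept, total = {}, {}
    for sender, receiver, vote in requests:
        w = voter_trust.get(receiver, 1.0)      # rejection by trusted users counts more
        total[sender] = total.get(sender, 0.0) + w
        if vote > 0:
            accept[sender] = accept.get(sender, 0.0) + w
    # Smoothed acceptance rate; low values suggest Sybil behaviour.
    return {s: (accept.get(s, 0.0) + prior * 0.5) / (total[s] + prior)
            for s in total}

scores = sybil_scores([("bot", "alice", -1), ("bot", "bob", -1),
                       ("mia", "bob", +1)], voter_trust={"alice": 2.0, "bob": 1.5})
# scores["bot"] comes out far below scores["mia"].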
DOMAIN : DATA MINING

TTA-DD-C1501 CrowdOp: Query Optimization for Declarative Crowdsourcing Systems
We study the query optimization problem in declarative crowdsourcing systems. Declarative crowdsourcing is designed to hide the complexities and relieve the user of the burden of dealing with the crowd. The user is only required to submit an SQL-like query, and the system takes responsibility for compiling the query, generating an execution plan, and evaluating it in the crowdsourcing marketplace. A given query can have many alternative execution plans, and the difference in crowdsourcing cost between the best and the worst plans may be several orders of magnitude. Therefore, as in relational database systems, query optimization is important for crowdsourcing systems that provide declarative query interfaces. In this paper, we propose CROWDOP, a cost-based query optimization approach for declarative crowdsourcing systems. CROWDOP considers both cost and latency in its query optimization objectives and generates query plans that provide a good balance between the two. We develop efficient algorithms in CROWDOP for optimizing three types of queries: selection queries, join queries, and complex selection-join queries. We validate our approach via extensive experiments by simulation as well as with the real crowd on Amazon Mechanical Turk.
IEEE 2015

TTA-DD-C1502 Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles
Recently, two ideas have been explored that lead to more accurate algorithms for time-series classification (TSC). First, it has been shown that the simplest way to gain improvement on TSC problems is to transform into an alternative data space where discriminatory features are more easily detected. Second, it was demonstrated that with a single data representation, improved accuracy can be achieved through simple ensemble schemes. We combine these two principles to test the hypothesis that forming a collective of ensembles of classifiers on different data transformations improves the accuracy of time-series classification. The collective contains classifiers constructed in the time, frequency, change, and shapelet transformation domains. For the time domain, we use a set of elastic distance measures; for the other domains, we use a range of standard classifiers. Through extensive experimentation on 72 datasets, including all of the 46 UCR datasets, we demonstrate that the simple collective formed by including all classifiers in one ensemble is significantly more accurate than any of its components and any other previously published TSC algorithm. We investigate alternative hierarchical collective structures and demonstrate the utility of the approach on a new problem involving classifying Caenorhabditis elegans mutant types.
IEEE 2015
TTA-DD-C1503 PruDent: A Pruned and Confident Stacking Approach for Multi-label Classification
Over the past decade or so, several research groups have addressed the problem of multi-label classification, where each example can belong to more than one class at the same time. A common approach, called Binary Relevance (BR), addresses this problem by inducing a separate classifier for each class. Research has shown that this framework can be improved if mutual class dependence is exploited: an example that belongs to class X is likely to belong also to class Y; conversely, belonging to X can make an example less likely to belong to Z. Several works sought to model this information by using the vector of class labels as additional example attributes. To fill in the unknown values of these attributes during prediction, existing methods resort to using the outputs of other classifiers, which makes them prone to errors. This is where our paper contributes. We identified two potential ways to prune unnecessary dependencies and to reduce error propagation in our new classifier-stacking technique, named PruDent. Experimental results indicate that the classification performance of PruDent compares favorably with that of other state-of-the-art approaches over a broad range of testbeds. Moreover, its computational costs grow only linearly with the number of classes.
IEEE 2015
TTA-DD-C1504 Raw Wind Data Preprocessing: A Data-Mining Approach
Wind energy integration research generally relies on complex sensors located at remote sites. The procedure for generating high-level synthetic information from databases containing large amounts of low-level data must therefore account for possible sensor failures and imperfect input data, since the results are highly sensitive to data quality. To address this problem, this paper presents an empirical methodology that can efficiently preprocess and filter raw wind data using only the aggregated active power output and the corresponding wind speed values at the wind farm. First, the properties of the raw wind data are analyzed, and all the data are divided into six categories according to their attribute magnitudes from a statistical perspective. Next, the weighted distance, a novel concept for the degree of similarity between individual objects in the wind database, and the local outlier factor (LOF) algorithm are incorporated to compute the outlier factor of every individual object, and this outlier factor is then used to assess which category an object belongs to. Finally, the methodology was tested successfully on data collected from a large wind farm in northwest China.
IEEE 2015

TTA-DD-C1505 Removing DUST Using Multiple Alignment of Sequences
A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate content. Crawling, storing, and using such duplicated data implies a waste of resources, the building of low-quality rankings, and poor user experiences. To deal with this problem, several studies have proposed to detect and remove duplicate documents without fetching their contents. To accomplish this, the proposed methods learn normalization rules that transform all duplicate URLs into the same canonical form. A challenging aspect of this strategy is deriving a set of general and precise rules. In this work, we present DUSTER, a new approach for deriving quality rules that takes advantage of a multi-sequence alignment strategy. We demonstrate that a full multi-sequence alignment of URLs with duplicated content, performed before the generation of the rules, can lead to the deployment of very effective rules. In our evaluation, the method achieved larger reductions in the number of duplicate URLs than the best baseline, with gains of 82 and 140.74 percent on two different web collections.
IEEE 2015
TTA-DD-C1506 Keyword Extraction and Clustering for Document Recommendation in Conversations
This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents that can be recommended to participants. However, even a short fragment contains a variety of words that are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. It is therefore difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function that favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. We then propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
IEEE 2015
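A minimal sketch of diversity-rewarding keyword selection, assuming each candidate keyword comes with a relevance weight and a topic distribution (e.g., from a topic model). The square-root coverage function below is a standard submodular choice with diminishing returns that favors covering many topics; it stands in for the paper's reward function, and all names are hypothetical.

import math

def pick_keywords(weights, topics, k):
    """weights: {kw: relevance}; topics: {kw: {topic: prob}}; greedy top-k selection."""
    def coverage(selection):
        per_topic = {}
        for kw in selection:
            for t, p in topics[kw].items():
                per_topic[t] = per_topic.get(t, 0.0) + weights[kw] * p
        # sqrt gives diminishing returns per topic, so new topics beat repeats.
        return sum(math.sqrt(v) for v in per_topic.values())
    chosen = []
    while len(chosen) < k:
        best = max((kw for kw in weights if kw not in chosen),
                   key=lambda kw: coverage(chosen + [kw]))
        chosen.append(best)
    return chosen

Greedy maximization of a monotone submodular function of this kind carries a (1 - 1/e) approximation guarantee, which is why this selection style is a common choice for diversity objectives.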
TTA-DD-C1507 An Internal Intrusion Detection and Protection System by Using Data Mining and Forensic Techniques
Currently, most computer systems use user IDs and passwords as the login patterns to authenticate users. However, many people share their login patterns with coworkers and request these coworkers to assist with co-tasks, thereby making the login pattern one of the weakest points of computer security. Insider attackers, the valid users of a system who attack it internally, are hard to detect, since most intrusion detection systems and firewalls identify and isolate only malicious behaviors launched from the outside world. In addition, some studies have claimed that analyzing the system calls (SCs) generated by commands can identify these commands and thereby accurately detect attacks, with attack patterns serving as the features of an attack. Therefore, in this paper, a security system named the Internal Intrusion Detection and Protection System (IIDPS) is proposed to detect insider attacks at the SC level by using data mining and forensic techniques. The IIDPS creates users' personal profiles to keep track of their usage habits as forensic features, and determines whether a valid login user is the account holder or not by comparing the user's current computer usage behavior with the patterns collected in the account holder's personal profile. The experimental results demonstrate that the IIDPS's user identification accuracy is 94.29% with a response time of less than 0.45 s, implying that it can protect a system from insider attacks effectively and efficiently.
IEEE 2015
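A minimal sketch of the profiling idea behind the entry above, using system-call n-grams: an account holder's habitual SC patterns form a profile, and a live session is scored by how many of its patterns the profile contains. The 0.6 threshold is hypothetical, and the real IIDPS mines weighted patterns rather than a plain set.

def ngrams(calls, n=3):
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

def build_profile(sessions, n=3):
    """Union of SC n-grams over a user's historical sessions."""
    profile = set()
    for s in sessions:
        profile |= ngrams(s, n)
    return profile

def looks_like_owner(profile, live_calls, n=3, threshold=0.6):
    live = ngrams(live_calls, n)
    overlap = len(live & profile) / len(live) if live else 1.0
    return overlap >= threshold

history = [["open", "read", "write", "close"], ["open", "read", "read", "close"]]
profile = build_profile(history)
print(looks_like_owner(profile, ["open", "read", "write", "close"]))  # True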
TTA-DD-C1508 A Critical-Time-Point Approach to All-Departure-Time Lagrangian Shortest Paths
Given a spatio-temporal network, a source, a destination, and a desired departure time interval, the All-departure-time Lagrangian Shortest Paths (ALSP) problem determines a set which includes the shortest path for every departure time in the given interval. ALSP is important for critical societal applications such as eco-routing. However, ALSP is computationally challenging due to the non-stationary ranking of the candidate paths across distinct departure times. Current related work for reducing the redundant computation across consecutive departure times sharing a common solution exploits only partial information, e.g., the earliest feasible arrival time of a path. In contrast, our approach uses all available information, e.g., the entire time series of arrival times for all departure times. This allows the elimination of all knowable redundant computation based on the complete information available at hand. We operationalize this idea through the concept of critical time points (CTPs), i.e., departure times before which the ranking among candidate paths cannot change. In our preliminary work, we proposed a CTP-based forward search strategy. In this paper, we propose a CTP-based temporal bi-directional search for the ALSP problem via a novel impromptu rendezvous termination condition. Theoretical and experimental analysis shows that the proposed approach outperforms related approaches, particularly when there are few critical time points.
IEEE 2015

TTA-DD-C1509 Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
Co-clustering has emerged as a powerful data mining tool for two-dimensional co-occurrence and dyadic data. However, co-clustering algorithms often require significant computational resources and have been dismissed as impractical for large data sets. Existing studies have provided strong empirical evidence that expectation-maximization (EM) algorithms (e.g., the k-means algorithm) with sequential updates can significantly reduce the computational cost without degrading the resulting solution. Motivated by this observation, we introduce sequential updates for alternate minimization co-clustering (AMCC) algorithms, which are variants of EM algorithms, and show that AMCC algorithms with sequential updates converge. We then propose two approaches to parallelize AMCC algorithms with sequential updates in a distributed environment, and both approaches are proved to maintain the convergence properties of AMCC algorithms. Based on these two approaches, we present a new distributed framework, Co-ClusterD, which supports efficient implementations of AMCC algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two AMCC algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information-theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Empirical results show that AMCC algorithms implemented in Co-ClusterD can achieve much faster convergence and often obtain better results than their traditional concurrent counterparts.
IEEE 2015
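To make the sequential-update idea concrete, here is an online k-means sketch in MacQueen's style, the one-dimensional analogue of what the Co-ClusterD entry above builds on: centroids are refreshed after every single point assignment, so later assignments in the same sweep already see the updated centroids, instead of waiting for a batch (concurrent) update at sweep end. This is an illustration, not the paper's AMCC algorithms.

import numpy as np

def kmeans_sequential(X, centroids, sweeps=10):
    X = np.asarray(X, dtype=float)
    C = np.asarray(centroids, dtype=float).copy()
    counts = np.ones(len(C))                 # running counts per cluster
    assign = np.zeros(len(X), dtype=int)
    for _ in range(sweeps):
        for i, x in enumerate(X):
            j = int(np.argmin(((C - x) ** 2).sum(axis=1)))
            assign[i] = j
            counts[j] += 1
            C[j] += (x - C[j]) / counts[j]   # update immediately, not at sweep end
    return C, assign

The immediate centroid refresh is exactly what tends to speed up convergence, and it is also what makes naive parallelization hard, hence the paper's two dedicated parallelization approaches.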
TTA-DD-C1511 Differentially Private Frequent Itemset Mining via Transaction Splitting
Recently, there has been growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm that not only achieves high data utility and a high degree of privacy, but also offers high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility-privacy tradeoff, a novel smart splitting method is proposed to transform the database; for a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
IEEE 2015
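A minimal sketch of the Laplace mechanism, the standard building block behind ε-differentially private counts such as itemset supports. Transaction splitting matters here because capping each transaction's length keeps the sensitivity of a support query small; in this sketch the sensitivity is simply passed in as a parameter.

import numpy as np

def noisy_support(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release count + Laplace(sensitivity / epsilon) noise."""
    rng = rng or np.random.default_rng()
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

# e.g., releasing support({beer, nappies}) = 1042 under epsilon = 0.5:
# print(noisy_support(1042, epsilon=0.5))

Smaller epsilon means stronger privacy but larger noise; the paper's dynamic reduction method is about adding no more of this noise than the privacy analysis actually requires.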
TTA-DD-C1512 Efficient Algorithms for Mining Top-K High Utility Itemsets
High utility itemset (HUI) mining is an emerging topic in data mining, which refers to discovering all itemsets whose utility meets a user-specified minimum utility threshold min_util. However, setting min_util appropriately is a difficult problem for users. Generally speaking, finding an appropriate minimum utility threshold by trial and error is a tedious process. If min_util is set too low, too many HUIs will be generated, which may make the mining process very inefficient; on the other hand, if min_util is set too high, it is likely that no HUIs will be found. In this paper, we address these issues by proposing a new framework for top-k high utility itemset mining, where k is the desired number of HUIs to be mined. Two types of efficient algorithms, named TKU (mining Top-K Utility itemsets) and TKO (mining Top-K utility itemsets in One phase), are proposed for mining such itemsets without the need to set min_util. We provide a structural comparison of the two algorithms with discussions on their advantages and limitations. Empirical evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.
IEEE 2015
TTA-DD-C1513 k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data mining has wide applications in many areas such as banking, medicine, scientific research, and government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks, to the cloud. Since the data on the cloud is in encrypted form, existing privacy-preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed protocol protects the confidentiality of the data, preserves the privacy of the user's input query, and hides the data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. We also empirically analyze the efficiency of our proposed protocol using a real-world dataset under different parameter settings.
IEEE 2015

TTA-DD-C1514 Location Aware Keyword Query Suggestion Based on Document Proximity
Keyword suggestion in web search helps users access relevant information without having to know how to precisely express their queries. Existing keyword suggestion techniques do not consider the locations of the users and the query results; i.e., the spatial proximity of a user to the retrieved results is not taken as a factor in the recommendation. However, the relevance of search results in many applications (e.g., location-based services) is known to be correlated with their spatial proximity to the query issuer. In this paper, we design a location-aware keyword query suggestion framework. We propose a weighted keyword-document graph that captures both the semantic relevance between keyword queries and the spatial distance between the resulting documents and the user location. The graph is browsed in a random-walk-with-restart fashion to select the keyword queries with the highest scores as suggestions. To make our framework scalable, we propose a partition-based approach that outperforms the baseline algorithm by up to an order of magnitude. The appropriateness of our framework and the performance of the algorithms are evaluated using real data.
IEEE 2015
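A minimal random-walk-with-restart sketch in the scoring style the entry above describes, assuming the keyword-document graph is given as a column-stochastic transition matrix. The restart mass is pinned on the query node, and the resulting stationary vector ranks candidate suggestions; the matrix construction and parameter values are assumptions, not the paper's.

import numpy as np

def rwr_scores(T, query_idx, restart=0.15, iters=100):
    """T: column-stochastic transition matrix (n x n); returns RWR scores."""
    n = T.shape[0]
    e = np.zeros(n)
    e[query_idx] = 1.0                      # restart at the user's query node
    p = e.copy()
    for _ in range(iters):
        p = (1 - restart) * (T @ p) + restart * e
    return p

# In the framework above, edge weights would mix keyword-keyword semantic
# similarity with keyword-document relevance discounted by the documents'
# spatial distance to the user.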
TTA-DD-C1515 Rank-Based Similarity Search: Reducing the Dimensional Dependence
This paper introduces a data structure for k-NN search, the Rank Cover Tree (RCT), whose pruning tests rely solely on the comparison of similarity values; other properties of the underlying space, such as the triangle inequality, are not employed. Objects are selected according to their ranks with respect to the query object, allowing much tighter control on the overall execution costs. A formal theoretical analysis shows that with very high probability, the RCT returns a correct query result in time that depends very competitively on a measure of the intrinsic dimensionality of the data set. The experimental results for the RCT show that non-metric pruning strategies for similarity search can be practical even when the representational dimension of the data is extremely high. They also show that the RCT is capable of meeting or exceeding the performance of state-of-the-art methods that make use of metric pruning or other selection tests involving numerical constraints on distance values.
IEEE 2015
TTA-DD-C1516 RANWAR: Rank-Based Weighted Association Rule Mining from Gene Expression and Methylation Data
Ranking of association rules is currently an interesting topic in data mining and bioinformatics. The huge number of rules over items (or genes) produced by association rule mining (ARM) algorithms confuses the decision maker. In this article, we propose a weighted rule-mining technique, RANWAR (rank-based weighted association rule mining), which ranks the rules using two novel rule-interestingness measures, viz., the rank-based weighted condensed support (wcs) and weighted condensed confidence (wcc) measures, to bypass this problem. These measures depend on the rank of the items (genes); using the rank, we assign a weight to each item. RANWAR generates far fewer frequent itemsets than the state-of-the-art association rule mining algorithms, and thus saves execution time. We ran RANWAR on gene expression and methylation datasets. The genes of the top rules are biologically validated by Gene Ontology (GO) and KEGG pathway analyses. Many top-ranked rules extracted by RANWAR that hold poor ranks in traditional Apriori are highly biologically significant to the related diseases. Finally, the top rules evolved from RANWAR that are not found by Apriori are reported.
IEEE 2015

TTA-DD-C1517 Towards Effective Bug Triage with Software Data Reduction Techniques
Software companies spend over 45 percent of their cost on dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce the data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports from two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance.
IEEE 2015
TTA-DD-C1518 Towards Open-World Person Re-Identification by One-Shot Group-Based Verification
Solving the problem of matching people across non-overlapping multi-camera views, known as person re-identification (re-id), has received increasing interest in computer vision. In a real-world application scenario, a watch list (gallery set) of a handful of known target people is provided with very few (in many cases only a single) image(s) (shots) per target. Existing re-id methods are largely unsuitable for this open-world re-id challenge because they are designed for (1) a closed-world scenario, where the gallery and probe sets are assumed to contain exactly the same people; (2) person-wise identification, whereby the model attempts to verify exhaustively against each individual in the gallery set; and (3) learning a matching model using multiple shots. In this paper, a novel transfer local relative distance comparison (t-LRDC) model is formulated to address the open-world person re-identification problem by one-shot group-based verification. The model is designed to mine and transfer useful information from a labelled open-world non-target dataset. Extensive experiments demonstrate that the proposed approach outperforms both non-transfer learning and existing transfer learning based re-id methods.
IEEE 2015
TTA-DD-C1519 Improving Accuracy and Robustness of Self-Tuning Histograms by Subspace Clustering
In large databases, the amount and the complexity of the data call for data summarization techniques. Such summaries are used to assist fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms look at query-execution results to refine themselves. An assumption behind such histograms, which has not been questioned so far, is that they can learn the dataset from scratch, that is, starting with an empty bucket configuration. We show that this is not the case: self-tuning methods are very sensitive to the initial configuration. Three major problems stem from this. Traditional self-tuning is unable to learn projections of multi-dimensional data, is sensitive to the order of queries, and reaches only local optima with high estimation errors. We show how to improve a self-tuning method significantly by starting with a carefully chosen initial configuration. We propose initialization by dense subspace clusters in projections of the data, which improves both the accuracy and the robustness of self-tuning. Our experiments on different datasets show that the error rate is typically halved compared to the uninitialized version.
IEEE 2015

TTA-JD-C1520 TRIP: An Interactive Retrieving-Inferring Data Imputation Approach
Data imputation aims at filling in missing attribute values in databases. Most existing imputation methods for string attribute values are inferring-based approaches, which usually fail to reach a high imputation recall by just inferring missing values from the complete part of the data set. Recently, some retrieving-based methods have been proposed to retrieve missing values from external resources such as the World Wide Web; they tend to reach a much higher imputation recall, but inevitably bring a large overhead by issuing a large number of search queries. In this paper, we investigate the interaction between the inferring-based methods and the retrieving-based methods. We show that retrieving a small number of selected missing values can greatly improve the imputation recall of the inferring-based methods. With this intuition, we propose an interactive Retrieving-Inferring data imPutation approach (TRIP), which performs retrieving and inferring alternately in filling in the missing attribute values of a dataset. To ensure high recall at minimum cost, TRIP faces the challenge of selecting the smallest number of missing values for retrieval that maximizes the number of inferable values. Our proposed solution identifies an optimal retrieving-inferring scheduling scheme in deterministic data imputation, and the optimality of the generated scheme is theoretically analyzed with proofs. We also show by example that the optimal scheme is not feasible in τ-constrained stochastic data imputation (τ-SDI), but our proposed solution still identifies an expected-optimal scheme in τ-SDI. Extensive experiments on four data collections show that TRIP retrieves on average only 20 percent of the missing values while achieving the same high recall reached by the retrieving-based approach.
IEEE 2015
TTA-JD-C1521 Pattern-Aided Regression Modeling and Prediction Model Analysis
This paper first introduces pattern-aided regression (PXR) models, a new type of regression model designed to represent accurate and interpretable prediction models. This work was motivated by two observations: (1) regression modeling applications often involve complex diverse predictor-response relationships, which occur when the optimal regression models (of a given regression model type) fitting two or more distinct logical groups of data are highly different; and (2) state-of-the-art regression methods are often unable to adequately model such relationships. This paper defines PXR models using several patterns and local regression models, which respectively serve as logical and behavioral characterizations of distinct predictor-response relationships. The paper also introduces a contrast pattern aided regression (CPXR) method to build accurate PXR models. In experiments, the PXR models built by CPXR are very accurate in general, often outperforming state-of-the-art regression methods by large margins. Usually using (a) around seven simple patterns and (b) linear local regression models, these PXR models are easy to interpret; in fact, their complexity is just a bit higher than that of (piecewise) linear regression models and significantly lower than that of traditional ensemble-based regression models. CPXR is especially effective for high-dimensional data. The paper also discusses how to use the CPXR methodology for analyzing prediction models and correcting their prediction errors.
IEEE 2015
TTA-JD-C1522 A Set of Complexity Measures Designed for Applying Meta-Learning to Instance Selection
In recent years, some authors have approached the instance selection problem from a meta-learning perspective. In their work, they try to find relationships between the performance of methods from this field and the values of some data-complexity measures, with the aim of determining the best-performing method for a given data set using only the values of the measures computed on that data. Nevertheless, most of the data-complexity measures existing in the literature were not conceived for this purpose, and the feasibility of their use in this field is yet to be determined. In this paper, we revise the definition of some measures presented in a previous work that were designed for meta-learning-based instance selection. We also assess them in an experimental study involving three sets of measures, 59 databases, 16 instance selection methods, two classifiers, and eight regression learners used as meta-learners. The results suggest that our measures are more efficient and effective than those traditionally used by researchers who have addressed instance selection from a meta-learning perspective.
IEEE 2015
TTA-JD-C1522 Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets
Mining high utility itemsets (HUIs) from databases is an important data mining task, which refers to the discovery of itemsets with high utilities (e.g., high profits). However, it may present too many HUIs to users, which also degrades the efficiency of the mining process. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework in this paper for mining closed+ high utility itemsets (CHUIs), which serve as a compact and lossless representation of HUIs. We propose three efficient algorithms, named AprioriHC (Apriori-based algorithm for mining High utility Closed+ itemsets), AprioriHC-D (AprioriHC algorithm with Discarding unpromising and isolated items), and CHUD (Closed+ High Utility itemset Discovery), to find this representation. Further, a method called DAHU (Derive All High Utility itemsets) is proposed to recover all HUIs from the set of CHUIs without accessing the original database. Results on real and synthetic datasets show that the proposed algorithms are very efficient and that our approaches achieve a massive reduction in the number of HUIs. In addition, when all HUIs can be recovered by DAHU, the combination of CHUD and DAHU outperforms the state-of-the-art algorithms for mining HUIs.
IEEE 2015

TTA-JD-C1523 Keyword Extraction and Clustering for Document Recommendation in Conversations
This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents that can be recommended to participants. However, even a short fragment contains a variety of words that are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. It is therefore difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function that favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. We then propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations.
IEEE 2015
    documents, which canbe recommended to participants. However, even a short fragment contains a variety of words, which are potentially related to several topics; moreover, using an automatic speech recognition (ASR) system introduces errors among them. Therefore, it is difficult to infer precisely the information needs of the conversation participants. We first propose an algorithm to extract keywords from the output of an ASR system (or a manual transcript for testing), which makes use of topic modeling techniques and of a submodular reward function which favors diversity in the keyword set, to match the potential diversity of topics and reduce ASR noise. Then, we propose a method to derive multiple topically separated queries from this keyword set, in order to maximize the chances of making at least one relevant recommendation when using these queries to search over the English Wikipedia. The proposed methods are evaluated in terms of relevance with respect to conversation fragments from the Fisher, AMI, and ELEA conversational corpora, rated by several human judges. The scores show that our proposal improves over previous methods that consider only word frequency or topic similarity, and represents a promising solution for a document recommender system to be used in conversations. TTA-JD- C1524 Top-k Similarity Join in Heterogeneous Information Networks As a newly emerging network model, heterogeneous infor mation networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has IEEE 2015
TTA-JD-C1524
Top-k Similarity Join in Heterogeneous Information Networks
As a newly emerging network model, heterogeneous information networks (HINs) have received growing attention. Many data mining tasks have been explored in HINs, including clustering, classification, and similarity search. Similarity join is a fundamental operation required for many problems. It is attracting attention from various applications on network data, such as friend recommendation, link prediction, and online advertising. Although similarity join has been well studied in homogeneous networks, it has not yet been studied in heterogeneous networks. In particular, none of the existing research on similarity join takes the different semantic meanings behind paths into consideration, and almost all of it ignores the heterogeneity and diversity of HINs. In this paper, we propose a path-based similarity join (PS-join) method to return the top k similar pairs of objects based on any user-specified join path in a heterogeneous information network. We study how to prune expensive similarity computation by introducing bucket pruning based locality sensitive hashing (BPLSH) indexing. Compared with the existing link-based similarity join (LS-join) method, PS-join can derive various similarity semantics. Experimental results on real data sets show the efficiency and effectiveness of the proposed approach.
IEEE 2015
TTA-JD-C1525
Active Learning for Ranking through Expected Loss Optimization
Learning to rank arises in many data mining applications, ranging from web search engines and online advertising to recommendation systems. In learning to rank, the performance of a ranking model is strongly affected by the number of labeled examples in the training set; on the other hand, obtaining labeled examples for training data is very expensive and time-consuming. This presents a great need for active learning approaches that select the most informative examples for ranking; however, in the literature there is still very limited work addressing active learning for ranking. In this paper, we propose a general active learning framework, expected loss optimization (ELO), for ranking. The ELO framework is applicable to a wide range of ranking functions. Under this framework, we derive a novel algorithm, expected discounted cumulative gain (DCG) loss optimization (ELO-DCG), to select the most informative examples. Then, we investigate both query-level and document-level active learning for ranking and propose a two-stage ELO-DCG algorithm which incorporates both query and document selection into active learning. Furthermore, we show that the algorithm can flexibly deal with the skewed grade distribution problem by modifying the loss function. Extensive experiments on real-world web search data sets have demonstrated the great potential and effectiveness of the proposed framework and algorithms.
IEEE 2015
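An illustrative sketch of the DCG-based expected loss that gives ELO-DCG its name: the expected DCG shortfall of the model's current ordering, estimated here by naively sampling plausible relevance grades. The per-document grade distributions and the Monte-Carlo estimation are invented placeholders, not the paper's derivation:

```python
import math, random

def dcg(grades):
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

def expected_dcg_loss(grade_dists, trials=1000, seed=0):
    """grade_dists: per-document (grade, probability) pairs, listed in the
    order the current model would rank the documents."""
    rng, loss = random.Random(seed), 0.0
    for _ in range(trials):
        sample = [rng.choices([g for g, _ in d], [p for _, p in d])[0]
                  for d in grade_dists]
        loss += dcg(sorted(sample, reverse=True)) - dcg(sample)  # vs. ideal order
    return loss / trials

# A query whose documents have uncertain grades is the more informative one to label.
uncertain = [[(0, 0.5), (2, 0.5)], [(1, 0.5), (2, 0.5)]]
confident = [[(2, 0.9), (0, 0.1)], [(0, 0.9), (2, 0.1)]]
print(expected_dcg_loss(uncertain), expected_dcg_loss(confident))
```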
TTA-JD-C1526
Relational Collaborative Topic Regression for Recommender Systems
Due to its successful application in recommender systems, collaborative filtering (CF) has become a hot research topic in data mining and information retrieval. In traditional CF methods, only the feedback matrix, which contains either explicit feedback (also called ratings) or implicit feedback on the items given by users, is used for training and prediction. Typically, the feedback matrix is sparse, which means that most users interact with few items. Due to this sparsity problem, traditional CF with only feedback information will suffer from unsatisfactory performance. Recently, many researchers have proposed to utilize auxiliary information, such as item content (attributes), to alleviate the data sparsity problem in CF. Collaborative topic regression (CTR) is one of these methods, and it has achieved promising performance by successfully integrating both feedback information and item content information. In many real applications, besides the feedback and item content information, there may exist relations (also known as networks) among the items which can be helpful for recommendation. In this paper, we develop a novel hierarchical Bayesian model called Relational Collaborative Topic Regression (RCTR), which extends CTR by seamlessly integrating the user-item feedback information, item content information, and network structure among items into the same model. Experiments on real-world datasets show that our model can achieve better prediction accuracy than the state-of-the-art methods with lower empirical training time. Moreover, RCTR can learn good interpretable latent structures which are useful for recommendation.
IEEE 2015
TTA-JD-C1527
Relevance Feature Discovery for Text Mining
It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of large-scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, there has often been the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences; yet, how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.
IEEE 2015
TTA-JD-C1528
Differentially Private Frequent Itemset Mining via Transaction Splitting
Recently, there has been a growing interest in designing differentially private data mining algorithms. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this paper, we explore the possibility of designing a differentially private FIM algorithm which can not only achieve high data utility and a high degree of privacy, but also offer high time efficiency. To this end, we propose a differentially private FIM algorithm based on the FP-growth algorithm, which is referred to as PFP-growth. The PFP-growth algorithm consists of a preprocessing phase and a mining phase. In the preprocessing phase, to improve the utility and privacy tradeoff, a novel smart splitting method is proposed to transform the database. For a given database, the preprocessing phase needs to be performed only once. In the mining phase, to offset the information loss caused by transaction splitting, we devise a run-time estimation method to estimate the actual support of itemsets in the original database. In addition, by leveraging the downward closure property, we put forward a dynamic reduction method to dynamically reduce the amount of noise added to guarantee privacy during the mining process. Through formal privacy analysis, we show that our PFP-growth algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFP-growth algorithm substantially outperforms the state-of-the-art techniques.
IEEE 2015
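A minimal sketch of the differential-privacy ingredient such algorithms rely on: perturbing itemset supports with Laplace noise scaled to sensitivity/ε. The sensitivity value and toy supports are assumptions, and the real PFP-growth additionally splits long transactions and corrects the resulting support loss:

```python
import math, random

def laplace_noise(scale, rng):
    # Laplace sample via inverse CDF
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_support(true_support, epsilon, sensitivity=1.0, rng=random.Random(42)):
    # Adding Laplace(sensitivity / epsilon) noise makes the released count
    # epsilon-differentially private for this single query.
    return true_support + laplace_noise(sensitivity / epsilon, rng)

supports = {("bread",): 120, ("bread", "milk"): 45}   # toy true supports
eps = 0.5
private = {k: noisy_support(v, eps) for k, v in supports.items()}
print(private)
```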
TTA-JD-C1529
Backward Path Growth for Efficient Mobile Sequential Recommendation
The problem of mobile sequential recommendation is to suggest a route connecting a set of pick-up points for a taxi driver so that he/she is more likely to get passengers with less travel cost. Essentially, a key challenge of this problem is its high computational complexity. In this paper, we propose a novel dynamic programming based method to solve the mobile sequential recommendation problem, consisting of two separate stages: an offline pre-processing stage and an online search stage. The offline stage pre-computes potential candidate sequences from a set of pick-up points. A backward incremental sequence generation algorithm is proposed based on the identified iterative property of the cost function. Simultaneously, an incremental pruning policy is adopted in the process of sequence generation to reduce the search space of the potential sequences effectively. In addition, a batch pruning algorithm is further applied to the generated potential sequences to remove some non-optimal sequences of a given length. Since the pruning effectiveness keeps growing with the increase of the sequence length, at the online stage, our method can efficiently find the optimal driving route for an unloaded taxi in the remaining candidate sequences. Moreover, our method can handle the problem of optimal route search with a maximum cruising distance or a destination constraint. Experimental results on real and synthetic data sets show that both the pruning ability and the efficiency of our method surpass the state-of-the-art methods. Our techniques can therefore be effectively employed to address the problem of mobile sequential recommendation with many pick-up points in real-world applications.
IEEE 2015
TTA-JD-C1530
Mining Partially-Ordered Sequential Rules Common to Multiple Sequences
Sequential rule mining is an important data mining problem with multiple applications. An important limitation of algorithms for mining sequential rules common to multiple sequences is that rules are very specific, and therefore many similar rules may represent the same situation. This can cause three major problems: (1) similar rules can be rated quite differently, (2) rules may not be found because they are individually considered uninteresting, and (3) rules that are too specific are less likely to be used for making predictions. To address these issues, we explore the idea of mining “partially-ordered sequential rules” (POSR), a more general form of sequential rules such that items in the antecedent and the consequent of each rule are unordered. To mine POSR, we propose the RuleGrowth algorithm, which is efficient and easily extendable. In particular, we present an extension (TRuleGrowth) that accepts a sliding-window constraint to find rules occurring within a maximum amount of time. A performance study with four real-life datasets shows that RuleGrowth and TRuleGrowth have excellent performance and scalability compared to baseline algorithms, and that the number of rules discovered can be several orders of magnitude smaller when the sliding-window constraint is applied. Furthermore, we also report results from a real application showing that POSR can provide much higher prediction accuracy than regular sequential rules for sequence prediction.
IEEE 2015
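A brute-force Python sketch of what a partially-ordered rule X ⇒ Y means under one simple reading of the definition: all items of X appear (first occurrences) before all items of Y, with no order imposed inside X or Y. RuleGrowth discovers such rules far more cleverly; the toy sequences below only illustrate support and confidence:

```python
def rule_holds(seq, antecedent, consequent):
    if not antecedent <= set(seq):
        return None                                  # antecedent absent: irrelevant
    last_x = max(seq.index(i) for i in antecedent)   # latest first-occurrence of X
    return consequent <= set(seq[last_x + 1:])       # all of Y strictly after X

def support_confidence(sequences, X, Y):
    holds = sum(1 for s in sequences if rule_holds(s, X, Y))
    has_x = sum(1 for s in sequences if rule_holds(s, X, Y) is not None)
    return holds / len(sequences), (holds / has_x if has_x else 0.0)

seqs = [["a", "b", "c", "d"], ["b", "a", "d", "c"], ["a", "c", "b"]]
# {a, b} => {c, d} holds in the first two sequences regardless of internal order.
print(support_confidence(seqs, {"a", "b"}, {"c", "d"}))
```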
TTA-JD-C1531
CRoM and HuspExt: Improving Efficiency of High Utility Sequential Pattern Extraction
High utility sequential pattern mining has been considered an important research problem, and a number of relevant algorithms have been proposed for this topic. The main challenge of high utility sequential pattern mining is that the search space is large and the efficiency of the solutions is directly affected by the degree at which they can eliminate candidate patterns. Therefore, the efficiency of any high utility sequential pattern mining solution depends on its ability to reduce this big search space and, as a result, lower the computational complexity of calculating the utilities of the candidate patterns. In this paper, we propose efficient data structures and a pruning technique based on a Cumulated Rest of Match (CRoM) based upper bound. CRoM, by defining a tighter upper bound on the utility of the candidates, allows more conservative pruning before candidate pattern generation in comparison to the existing techniques. In addition, we have developed an efficient algorithm, High Utility Sequential Pattern Extraction (HuspExt), which calculates the utilities of the child patterns based on those of the parents. Substantial experiments on both synthetic and real datasets from different domains show that the proposed solution efficiently discovers high utility sequential patterns from large-scale datasets with different data characteristics, under low utility thresholds.
IEEE 2015
TTA-JD-C1532
Co-Extracting Opinion Targets and Opinion Words from Online Reviews Based on the Word Alignment Model
Mining opinion targets and opinion words from online reviews are important tasks for fine-grained opinion mining, the key component of which involves detecting opinion relations among words. To this end, this paper proposes a novel approach based on the partially-supervised alignment model, which regards identifying opinion relations as an alignment process. Then, a graph-based co-ranking algorithm is exploited to estimate the confidence of each candidate. Finally, candidates with higher confidence are extracted as opinion targets or opinion words. Compared to previous methods based on nearest-neighbor rules, our model captures opinion relations more precisely, especially for long-span relations. Compared to syntax-based methods, our word alignment model effectively alleviates the negative effects of parsing errors when dealing with informal online texts. In particular, compared to the traditional unsupervised alignment model, the proposed model obtains better precision because of the usage of partial supervision. In addition, when estimating candidate confidence, we penalize higher-degree vertices in our graph-based co-ranking algorithm to decrease the probability of error generation. Our experimental results on three corpora with different sizes and languages show that our approach effectively outperforms state-of-the-art methods.
IEEE 2015
TTA-JD-C1533
Global Redundancy Minimization for Feature Ranking
Feature selection has been an important research topic in data mining, because real data sets often have high-dimensional features, as in bioinformatics and text mining applications. Many existing filter feature selection methods rank features by optimizing certain feature ranking criteria, such that correlated features often have similar rankings. These correlated features are redundant and don't provide large mutual information to help data mining. Thus, when we select a limited number of features, we hope to select the top non-redundant features such that the useful mutual information can be maximized. In previous research, Ding et al. recognized this important issue and proposed the minimum Redundancy Maximum Relevance Feature Selection (mRMR) model to minimize the redundancy between sequentially selected features. However, this method used greedy search, so the global feature redundancy wasn't considered and the results are not optimal. In this paper, we propose a new feature selection framework to globally minimize the feature redundancy while maximizing the given feature ranking scores, which can come from any supervised or unsupervised method. Our new model has no parameter, so it is especially suitable for practical data mining applications. Experimental results on benchmark data sets show that the proposed method consistently improves the feature selection results compared to the original methods. Meanwhile, we introduce a new unsupervised global and local discriminative feature selection method which can be unified with the global feature redundancy minimization framework and shows superior performance.
IEEE 2015
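To make the relevance-vs-redundancy tradeoff concrete, here is a sketch of the classic greedy mRMR-style rule that this paper improves upon with a global formulation: repeatedly add the feature with the best (ranking score minus mean correlation to already-selected features). The scores, data, and alpha weight are invented:

```python
import numpy as np

def greedy_low_redundancy(X, scores, k, alpha=1.0):
    corr = np.abs(np.corrcoef(X, rowvar=False))    # feature-feature |correlation|
    selected = [int(np.argmax(scores))]            # start from the top-ranked feature
    while len(selected) < k:
        best, best_val = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = corr[j, selected].mean()  # similarity to what we already have
            val = scores[j] - alpha * redundancy
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=100)    # feature 5 duplicates feature 0
scores = np.array([0.9, 0.4, 0.5, 0.3, 0.2, 0.88])
print(greedy_low_redundancy(X, scores, k=3))       # skips the near-duplicate feature
```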
TTA-JD-C1534
Review Selection Using Micro-Reviews
Given the proliferation of review content, and the fact that reviews are highly diverse and often unnecessarily verbose, users frequently face the problem of selecting the appropriate reviews to consume. Micro-reviews are emerging as a new type of online review content in social media. Micro-reviews are posted by users of check-in services such as Foursquare. They are concise (up to 200 characters long) and highly focused, in contrast to comprehensive and verbose reviews. In this paper, we propose a novel mining problem, which brings together these two disparate sources of review content. Specifically, we use coverage of micro-reviews as an objective for selecting a set of reviews that efficiently cover the salient aspects of an entity. Our approach consists of a two-step process: matching review sentences to micro-reviews, and selecting a small set of reviews that cover as many micro-reviews as possible, with few sentences. We formulate this objective as a combinatorial optimization problem, and show how to derive an optimal solution using Integer Linear Programming. We also propose an efficient heuristic algorithm that approximates the optimal solution. Finally, we perform a detailed evaluation of all the steps of our methodology using data collected from Foursquare and Yelp.
IEEE 2015
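The selection step described above is a maximum-coverage problem; the paper solves it exactly with ILP, and the standard greedy approximation below conveys the idea. The sentence-to-micro-review matching is assumed precomputed, and the ids are invented:

```python
def greedy_review_selection(review_coverage, budget):
    """review_coverage: review id -> set of micro-review ids it matches."""
    covered, chosen = set(), []
    for _ in range(budget):
        # pick the review adding the most not-yet-covered micro-reviews
        rid = max(review_coverage, key=lambda r: len(review_coverage[r] - covered))
        if not review_coverage[rid] - covered:
            break                                    # nothing new can be covered
        chosen.append(rid)
        covered |= review_coverage[rid]
    return chosen, covered

coverage = {"r1": {1, 2, 3}, "r2": {3, 4}, "r3": {4, 5, 6}, "r4": {1, 6}}
print(greedy_review_selection(coverage, budget=2))   # r1 then r3 covers all six
```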
TTA-JD-C1535
A Trust Management Scheme Based on Behavior Feedback for Opportunistic Networks
In harsh environments where node density is sparse, slow-moving nodes cannot effectively utilize encountering opportunities to realize self-organized identity authentications, and do not have the chance to join the network routing. However, considering that most of the communications in opportunistic networks are caused by forwarding operations, there is no need to establish complete mutual authentications for each conversation. Accordingly, a novel trust management scheme is presented based on the information of behavior feedback, in order to complement the insufficiency of identity authentications. By utilizing certificate chains based on social attributes, the mobile nodes gradually build local certificate graphs to realize the web of “Identity Trust” relationships. Meanwhile, the successors generate Verified Feedback Packets for each positive behavior, and consequently the “Behavior Trust” relationship is formed for slow-moving nodes. Simulation results show that, by implementing our trust scheme, the delivery probability and trust reconstruction ratio can be effectively improved when there are large numbers of compromised nodes, which means that our trust management scheme can efficiently explore and filter the trust nodes for secure forwarding in opportunistic networks.
IEEE 2015
TTA-JD-C1536
Extending Association Rule Summarization Techniques to Assess Risk of Diabetes Mellitus
Early detection of patients with elevated risk of developing diabetes mellitus is critical to the improved prevention and overall clinical management of these patients. We aim to apply association rule mining to electronic medical records (EMR) to discover sets of risk factors and their corresponding subpopulations that represent patients at particularly high risk of developing diabetes. Given the high dimensionality of EMRs, association rule mining generates a very large set of rules, which we need to summarize for easy clinical use. We reviewed four association rule set summarization techniques and conducted a comparative evaluation to provide guidance regarding their applicability, strengths and weaknesses. We proposed extensions to incorporate risk of diabetes into the process of finding an optimal summary. We evaluated these modified techniques on a real-world prediabetic patient cohort. We found that all four methods produced summaries that described subpopulations at high risk of diabetes, with each method having its clear strength. For our purpose, our extension to the Bottom-Up Summarization (BUS) algorithm produced the most suitable summary. The subpopulations identified by this summary covered most high-risk patients, had low overlap, and were at very high risk of diabetes.
IEEE 2015
TTA-JD-C1537
A Decision-Theoretic Rough Set Approach for Dynamic Data Mining
Uncertainty and fuzziness generally exist in real-life data. In rough set theory, approximations are employed to describe uncertain information approximately. Certain and uncertain rules are induced directly from the different regions partitioned by approximations. Approximations can further be applied to data-mining-related tasks, e.g., attribute reduction. Nowadays, different types of data collected from different applications evolve with time; in particular, new attributes may appear while new objects are added. This paper presents an approach for the dynamic maintenance of approximations when objects and attributes are added simultaneously, under the framework of decision-theoretic rough sets (DTRS). An equivalence feature vector and matrix are defined first to update the approximations of DTRS at different levels of granularity. Then, the information system is decomposed into subspaces, and the equivalence feature matrix is updated in different subspaces incrementally. Finally, the approximations of DTRS are renewed during the process of updating the equivalence feature matrix. Extensive experimental results verify the effectiveness of the proposed methods.
IEEE 2015
TTA-JD-C1538
A Joint Segmentation and Classification Framework for Sentence Level Sentiment Classification
In this paper, we propose a joint segmentation and classification framework for sentence-level sentiment classification. It is widely recognized that phrasal information is crucial for sentiment classification. However, existing sentiment classification algorithms typically split a sentence as a word sequence, which does not effectively handle the inconsistent sentiment polarity between a phrase and the words it contains, such as {“not bad,” “bad”} and {“a great deal of,” “great”}. We address this issue by developing a joint framework for sentence-level sentiment classification. It simultaneously generates useful segmentations and predicts sentence-level polarity based on the segmentation results. Specifically, we develop a candidate generation model to produce segmentation candidates of a sentence; a segmentation ranking model to score the usefulness of a segmentation candidate for sentiment classification; and a classification model for predicting the sentiment polarity of a segmentation. We train the joint framework directly from sentences annotated with only sentiment polarity, without using any syntactic or sentiment annotations at the segmentation level. We conduct experiments for sentiment classification on two benchmark datasets: a tweet dataset and a review dataset. Experimental results show that: 1) our method performs comparably with state-of-the-art methods on both datasets; 2) joint modeling of segmentation and classification outperforms pipelined baseline methods in various experimental settings.
IEEE 2015
TTA-JD-C1539
A Similarity-Based Learning Algorithm Using Distance Transformation
Numerous theories and algorithms have been developed to solve vectorial data learning problems by searching for the hypothesis that best fits the observed training sample. However, many real-world applications involve samples that are not described as feature vectors, but as (dis)similarity data. Converting vectorial data into (dis)similarity data is more easily performed than converting (dis)similarity data into vectorial data. This study proposes a stochastic iterative distance transformation model for similarity-based learning. The proposed model can be used to identify a clear class boundary in data by modifying the (dis)similarities between examples. The experimental results indicate that the performance of the proposed method is comparable with those of various vector-based and proximity-based learning algorithms.
IEEE 2015
TTA-JD-C1540
Active Learning from Relative Comparisons
This work focuses on active learning from relative comparison information. A relative comparison specifies, for a data triplet (xi, xj, xk), that instance xi is more similar to xj than to xk. Such constraints, when available, have been shown to be useful toward learning tasks such as defining appropriate distance metrics or finding good clustering solutions. In real-world applications, acquiring constraints often involves considerable human effort, as it requires the user to manually inspect the instances. This motivates us to study how to select and query the most useful relative comparisons to achieve effective learning with minimum user effort. Given an underlying class concept that is employed by the user to provide such constraints, we present an information-theoretic criterion that selects the triplet whose answer leads to the highest expected information gain about the classes of a set of examples. Directly applying the proposed criterion requires examining O(n³) triplets with n instances, which is prohibitive even for datasets of moderate size. We show that a randomized selection strategy can be used to reduce the selection pool from O(n³) to O(n) with minimal loss in efficiency, allowing us to scale up to considerably larger problems. Experiments show that the proposed method consistently outperforms baseline policies.
IEEE 2015
TTA-JD-C1541
Adaptive Processing for Distributed Skyline Queries over Uncertain Data
Query processing over uncertain data has gained growing attention, because it is necessary to deal with uncertain data in many real-life applications. In this paper, we investigate skyline queries over uncertain data in distributed environments (DSUD queries), whose research is only in an early stage. The state-of-the-art algorithm, called the e-DSUD algorithm, is designed for processing this query. It has the desirable characteristics of progressiveness and minimum bandwidth consumption. However, it still needs to be perfected in three aspects. (1) Progressiveness: each time it returns at most one query result. (2) Efficiency: there is a significant amount of redundant I/O cost and numerous iterations, which causes a long total query time. (3) Universality: it is restricted to the case where local skyline tuples are incomparable. To address these concerns, we first present a detailed analysis of the e-DSUD algorithm and then develop an improved framework for the DSUD query, namely IDSUD. Based on the new framework, we propose an adaptive algorithm, called ADSUD, for the DSUD query. In the algorithm, we redefine the approximate global skyline probability and choose local representative tuples adaptively according to the minimum probabilistic bounding rectangle. Furthermore, we design a progressive pruning method and apply a reuse mechanism to improve its efficiency. The results of extensive experiments verify the better overall performance of our algorithm compared with the e-DSUD algorithm.
IEEE 2015
TTA-JD-C1542
Adding Geospatial Data Provenance into SDI—A Service-Oriented Approach
Geospatial data provenance records the derivation history of a geospatial data product. It is important in evaluating the quality of data products. In a Geospatial Web Service environment, where data are often disseminated and processed widely and frequently in an unpredictable way, it is even more important in identifying original data sources, tracing workflows, updating or reproducing scientific results, and evaluating the reliability and quality of geospatial data products. Geospatial data provenance has become a fundamental issue in establishing the spatial data infrastructure (SDI). This paper investigates how to support provenance awareness in SDI. It addresses key issues including provenance modeling, capturing, and sharing in an SDI enabled by interoperable geospatial services. A reference architecture for provenance tracking is proposed, which can accommodate geospatial feature provenance at different levels of granularity. Open standards from ISO, the World Wide Web Consortium (W3C), and OGC are leveraged to facilitate interoperability. At the feature type level, this paper proposes extensions of W3C PROV-XML for ISO 19115 lineage and “Parent Level” provenance registration in the geospatial catalog service. At the feature instance level, lightweight lineage information entities for feature provenance are proposed and managed by Web Feature Services. Experiments demonstrate the applicability of the approach for creating provenance awareness in an interoperable geospatial service-oriented environment.
IEEE 2015
TTA-JD-C1543
Answering Pattern Queries Using Views
Answering queries using views has proven effective for querying relational and semistructured data. This paper investigates this issue for graph pattern queries based on graph simulation. We propose a notion of pattern containment to characterize graph pattern matching using graph pattern views. We show that a pattern query can be answered using a set of views if and only if it is contained in the views. Based on this characterization, we develop efficient algorithms to answer graph pattern queries. We also study problems for determining (minimal, minimum) containment of pattern queries. We establish their complexity (from cubic-time to NP-complete) and provide efficient checking algorithms (approximation when the problem is intractable). In addition, when a pattern query is not contained in the views, we study maximally contained rewriting to find approximate answers; we show that such a rewriting can be computed in cubic time, and present a rewriting algorithm. We experimentally verify that these methods are able to efficiently answer pattern queries on large real-world graphs.
IEEE 2015
TTA-JD-C1544
CloudKeyBank: Privacy and Owner Authorization Enforced Key Management Framework
Explosive growth in the number of passwords for web-based applications and encryption keys for outsourced data storage well exceeds the management limit of users. Therefore, outsourcing keys (including passwords and data encryption keys) to professional password managers (honest-but-curious service providers) is attracting the attention of many users. However, existing solutions in the traditional data outsourcing scenario are unable to simultaneously meet the following three security requirements for key outsourcing: (1) confidentiality and privacy of keys; (2) search privacy on identity attributes tied to keys; (3) owner-controllable authorization over his/her shared keys. In this paper, we propose CloudKeyBank, the first unified key management framework that addresses all three goals above. Under our framework, the key owner can perform privacy- and authorization-enforced encryption with minimum information leakage. To implement CloudKeyBank efficiently, we propose a new cryptographic primitive named Searchable Conditional Proxy Re-Encryption (SC-PRE), which seamlessly combines the techniques of Hidden Vector Encryption (HVE) and Proxy Re-Encryption (PRE), and we propose a concrete SC-PRE scheme based on existing HVE and PRE schemes. Our experimental results and security analysis show that the efficiency and security goals are well achieved.
IEEE 2015
TTA-JD-C1545
Clustering Deviations for Black-Box Regression Testing of Database Applications
Regression tests often result in many deviations (differences between two system versions), caused either by changes or by regression faults. For the tester to analyze such deviations efficiently, it would be helpful to accurately group them, such that each group contains deviations representing one unique change or regression fault. Because it is unlikely that a general solution to the above problem can be found, we focus our work on a common type of software system: database applications. We investigate the use of clustering, based on database manipulations and test specifications (from test models), to group regression test deviations according to the faults or changes causing them. We also propose assessment criteria based on the concept of entropy to compare alternative clustering strategies. To validate our approach, we ran a large-scale industrial case study, and our results show that our clustering approach can indeed serve as an accurate strategy for grouping regression test deviations. Among the four test campaigns assessed, deviations were clustered perfectly for two of them, while for the other two, the clusters were all homogeneous. Our analysis suggests that this approach can significantly reduce the effort spent by testers in analyzing regression test deviations, increase their level of confidence, and therefore make regression testing more scalable.
IEEE 2015
TTA-JD-C1546
Context-Based Collaborative Filtering for Citation Recommendation
Citation recommendation is an interesting and significant research area, as it addresses information overload in academia by automatically suggesting relevant references for a research paper. Recently, with the rapid proliferation of information technology, research papers are being published in rapidly growing numbers across various conferences and journals. This makes citation recommendation a highly important and challenging discipline. In this paper, we propose a novel citation recommendation method that uses only easily obtained citation relations as source data. The rationale underlying this method is that, if two citing papers significantly co-occur with the same cited paper(s), they should be similar to some extent. Based on the above rationale, an association mining technique is employed to obtain the paper representation of each citing paper from the citation context. Then, these paper representations are compared pairwise to compute similarities between the citing papers for collaborative filtering. We evaluate our proposed method through two relevant real-world data sets. Our experimental results demonstrate that the proposed method significantly outperforms the baseline method in terms of precision, recall, and F1, as well as mean average precision and mean reciprocal rank, which are metrics related to the rank information in the recommendation list.
IEEE 2015
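A tiny sketch of the co-citation rationale stated above: papers that share many cited neighbors score as similar. The citation lists are invented, Jaccard similarity stands in for the paper's association-mining step, and this is not the authors' code:

```python
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

citations = {                     # citing paper -> set of references it cites (toy)
    "p1": {"x", "y", "z"},
    "p2": {"x", "y", "w"},
    "p3": {"q", "r"},
}
pairs = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
sims = {(a, b): jaccard(citations[a], citations[b]) for a, b in pairs}
print(sims)   # p1/p2 share the most references, so they score highest
```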
TTA-JD-C1547
Crowdsourcing for Top-K Query Processing over Uncertain Data
Querying uncertain data has become a prominent application due to the proliferation of user-generated content from social media and of data streams from sensors. When data ambiguity cannot be reduced algorithmically, crowdsourcing proves a viable approach, which consists in posting tasks to humans and harnessing their judgment for improving the confidence about data values or relationships. This paper tackles the problem of processing top-K queries over uncertain data with the help of crowdsourcing for quickly converging to the real ordering of relevant results. Several offline and online approaches for addressing questions to a crowd are defined and contrasted on both synthetic and real data sets, with the aim of minimizing the crowd interactions necessary to find the real ordering of the result set.
IEEE 2015
TTA-JD-C1548
Discovering Latent Semantics in Web Documents Using Fuzzy Clustering
Web documents are heterogeneous and complex. Complicated associations exist within one web document and in its links to others. The high interactions between terms in documents produce vague and ambiguous meanings. Efficient and effective clustering methods to discover latent and coherent meanings in context are therefore necessary. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called “CONCEPTS,” wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) its difference from the other topics. Web contents can be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides its applicability in web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
IEEE 2015
TTA-JD-C1549
Discovery of Ranking Fraud for Mobile Apps
Ranking fraud in the mobile App market refers to fraudulent or deceptive activities which have the purpose of bumping up Apps in the popularity list. Indeed, it has become more and more frequent for App developers to use shady means, such as inflating their Apps' sales or posting phony App ratings, to commit ranking fraud. While the importance of preventing ranking fraud has been widely recognized, there is limited understanding and research in this area. To this end, in this paper, we provide a holistic view of ranking fraud and propose a ranking fraud detection system for mobile Apps. Specifically, we first propose to accurately locate the ranking fraud by mining the active periods, namely leading sessions, of mobile Apps. Such leading sessions can be leveraged for detecting the local anomaly instead of the global anomaly of App rankings. Furthermore, we investigate three types of evidences, i.e., ranking based evidences, rating based evidences and review based evidences, by modeling Apps' ranking, rating and review behaviors through statistical hypothesis tests. In addition, we propose an optimization based aggregation method to integrate all the evidences for fraud detection. Finally, we evaluate the proposed system with real-world App data collected from the iOS App Store over a long time period. In the experiments, we validate the effectiveness of the proposed system, and show the scalability of the detection algorithm as well as some regularity of ranking fraud activities.
IEEE 2015
TTA-JD-C1550
k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data
Data mining has wide applications in many areas such as banking, medicine, scientific research and among government agencies. Classification is one of the commonly used tasks in data mining applications. For the past decade, due to the rise of various privacy issues, many theoretical and practical solutions to the classification problem have been proposed under different security models. However, with the recent popularity of cloud computing, users now have the opportunity to outsource their data, in encrypted form, as well as the data mining tasks, to the cloud. Since the data on the cloud is in encrypted form, existing privacy-preserving classification techniques are not applicable. In this paper, we focus on solving the classification problem over encrypted data. In particular, we propose a secure k-NN classifier over encrypted data in the cloud. The proposed protocol protects the confidentiality of the data and the privacy of the user's input query, and hides the data access patterns. To the best of our knowledge, our work is the first to develop a secure k-NN classifier over encrypted data under the semi-honest model. Also, we empirically analyze the efficiency of our proposed protocol using a real-world dataset under different parameter settings.
IEEE 2015
TTA-JD-C1551
Location Aware Keyword Query Suggestion Based on Document Proximity
Keyword suggestion in web search helps users to access relevant information without having to know how to precisely express their queries. Existing keyword suggestion techniques do not consider the locations of the users and the query results; i.e., the spatial proximity of a user to the retrieved results is not taken as a factor in the recommendation. However, the relevance of search results in many applications (e.g., location-based services) is known to be correlated with their spatial proximity to the query issuer. In this paper, we design a location-aware keyword query suggestion framework. We propose a weighted keyword-document graph, which captures both the semantic relevance between keyword queries and the spatial distance between the resulting documents and the user location. The graph is browsed in a random-walk-with-restart fashion, to select the keyword queries with the highest scores as suggestions. To make our framework scalable, we propose a partition-based approach that outperforms the baseline algorithm by up to an order of magnitude. The appropriateness of our framework and the performance of the algorithms are evaluated using real data.
IEEE 2015
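A compact sketch of random walk with restart (RWR), the traversal this framework uses on its keyword-document graph, via power iteration on a toy adjacency matrix. The node layout, edge weights, and restart probability are illustrative assumptions:

```python
import numpy as np

def rwr_scores(adj, start, restart=0.15, iters=100):
    P = adj / adj.sum(axis=0, keepdims=True)     # column-stochastic transitions
    e = np.zeros(adj.shape[0]); e[start] = 1.0   # restart distribution
    r = e.copy()
    for _ in range(iters):
        r = (1 - restart) * P @ r + restart * e  # walk, occasionally jump home
    return r

# Nodes 0-2 = keyword queries; 3-4 = documents. In the paper, edge weights mix
# semantic relevance with spatial proximity to the user; here they are 0/1 toys.
adj = np.array([[0, 1, 0, 1, 0],
                [1, 0, 1, 1, 1],
                [0, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 1, 1, 0, 0]], dtype=float)
print(rwr_scores(adj, start=0).round(3))   # proximity scores relative to query 0
```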
TTA-JD-C1552
Mining Temporal Patterns in Time Interval-Based Data
Sequential pattern mining is an important subfield of data mining. Recently, applications using time interval-based event data have attracted considerable effort in discovering patterns from events that persist for some duration. Since the relationship between two intervals is intrinsically complex, how to effectively and efficiently mine interval-based sequences is a challenging issue. In this paper, two novel representations, the endpoint representation and the end time representation, are proposed to simplify the processing of complex relationships among event intervals. Based on the proposed representations, three types of interval-based patterns are defined: the temporal pattern, the occurrence-probabilistic temporal pattern, and the duration-probabilistic temporal pattern. In addition, we develop two novel algorithms, Temporal Pattern Miner (TPMiner) and Probabilistic Temporal Pattern Miner (P-TPMiner), to discover these three types of interval-based sequential patterns. We also propose three pruning techniques to further reduce the search space of the mining process. Experimental studies show that both algorithms are able to find all three types of patterns efficiently. Furthermore, we apply the proposed algorithms to real datasets to demonstrate the effectiveness and validate the practicability of the proposed patterns.
IEEE 2015
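A minimal sketch of the endpoint-representation idea: each event interval (label, start, finish) becomes two timestamped endpoints, so complex interval relations reduce to an ordinary sequence. The tie-breaking and "+/-" encoding below are simplified assumptions, not the paper's exact scheme:

```python
def to_endpoint_sequence(intervals):
    points = []
    for label, start, finish in intervals:
        points.append((start, label + "+"))    # interval opens
        points.append((finish, label + "-"))   # interval closes
    return [p for _, p in sorted(points)]      # order by time

intervals = [("A", 1, 5), ("B", 3, 9), ("C", 6, 8)]
print(to_endpoint_sequence(intervals))
# ['A+', 'B+', 'A-', 'C+', 'C-', 'B-']  -> A overlaps B, and B contains C
```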
TTA-JD-C1553
Multi-Objective Service Composition in Uncertain Environments
Web services have the potential to offer enterprises the ability to compose internal and external business services in order to accomplish complex processes. Service composition then becomes an increasingly challenging issue when complex and critical applications are built upon services with different QoS criteria. However, most of the existing QoS-aware service composition techniques are simply based on the assumption that multiple QoS criteria, whether conflicting or not, can be combined into a single criterion to be optimized according to some utility functions. In practice, this can be very difficult, as these utility functions or weights are not well known a priori. In addition, the existing approaches are designed to work in certain environments, where the QoS parameters are well known in advance. These approaches prove fruitless when facing uncertain and dynamic environments, e.g., cloud environments, where no prior knowledge of the QoS parameters is available. In this paper, two novel multi-objective approaches are proposed to handle QoS-aware Web service composition with conflicting objectives and various restrictions on the quality matrices. The proposed approaches use reinforcement learning to deal with the uncertainty characteristics inherent in open and dynamic environments. Experimental results reveal the ability of the proposed approaches to find a set of Pareto optimal solutions, which have equivalent quality to satisfy multiple QoS objectives with different user preferences.
IEEE 2015
TTA-JD-C1554
Pattern-Based Topics for Document Modelling in Information Filtering
Many mature term-based or pattern-based approaches have been used in the field of information filtering to generate users' information needs from a collection of documents. A fundamental assumption of these approaches is that the documents in the collection are all about one topic. However, in reality users' interests can be diverse, and the documents in the collection often involve multiple topics. Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models to represent multiple topics in a collection of documents, and this has been widely utilized in the fields of machine learning, information retrieval, etc. But its effectiveness in information filtering has not been so well explored. Patterns are always thought to be more discriminative than single terms for describing documents. However, the enormous number of discovered patterns hinders them from being effectively and efficiently used in real applications; therefore, selection of the most discriminative and representative patterns from the huge number of discovered patterns becomes crucial. To deal with the above-mentioned limitations and problems, in this paper, a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), is proposed. The main distinctive features of the proposed model include: (1) user information needs are generated in terms of multiple topics; (2) each topic is represented by patterns; (3) patterns are generated from topic models and are organized in terms of their statistical and taxonomic features; and (4) the most discriminative and representative patterns, called Maximum Matched Patterns, are proposed to estimate the document relevance to the user's information needs in order to filter out irrelevant documents. Extensive experiments are conducted to evaluate the effectiveness of the proposed model using the TREC data collection Reuters Corpus Volume 1. The results show that the proposed model significantly outperforms both state-of-the-art term-based models and pattern-based models.
IEEE 2015
TTA-JD-C1555
Polarity Consistency Checking for Domain Independent Sentiment Dictionaries
Polarity classification of words is important for applications such as Opinion Mining and Sentiment Analysis. A number of sentiment word/sense dictionaries have been manually or (semi)automatically constructed. We notice that these sentiment dictionaries contain numerous inaccuracies. Besides obvious instances, where the same word appears with different polarities in different dictionaries, the dictionaries exhibit complex cases of polarity inconsistency which cannot be detected by mere manual inspection. In this paper, we introduce the concept of polarity consistency of words/senses in sentiment dictionaries. We show that the consistency problem is NP-complete. We reduce the polarity consistency problem to the satisfiability problem and utilize two fast SAT solvers to detect inconsistencies in a sentiment dictionary. We perform experiments on five sentiment dictionaries and WordNet to show inter- and intra-dictionary inconsistencies.
IEEE 2015
TTA-JD-C1556
Predicting User-Topic Opinions in Twitter with Social and Topical Context
With popular microblogging services like Twitter, users are able to share their real-time feelings online in a more convenient way. The user-generated data in Twitter is thus regarded as a resource providing individuals' spontaneous emotional information, and it has attracted much attention from researchers. Prior work has measured the emotional expressions in users' tweets and then performed various analysis and learning. However, how to utilize the knowledge learned from the observed tweets and the context information to predict users' opinions toward specific topics on which they have not yet directly commented is a novel problem presenting both challenges and opportunities. In this paper, we mainly focus on solving this problem with a Social context and Topical context incorporated Matrix Factorization (ScTcMF) framework. The experimental results on a real-world Twitter data set show that this framework outperforms the state-of-the-art collaborative filtering methods, and demonstrate that both social context and topical context are effective in improving the user-topic opinion prediction performance.
IEEE 2015
TTA-JD-C1557
RankRC: Large-Scale Nonlinear Rare Class Ranking
Rare class problems are common in real-world applications across a wide range of domains. Standard classification algorithms are known to perform poorly in these cases, since they focus on overall classification accuracy. In addition, we have seen a significant increase of data in recent years, resulting in many large-scale rare class problems. In this paper, we focus on nonlinear kernel-based classification methods expressed as a regularized loss minimization problem. We address the challenges associated with both rare class problems and large-scale learning by 1) optimizing the area under the receiver operating characteristic curve in the training process, instead of classification accuracy, and 2) using a rare class kernel representation to achieve an efficient time and space algorithm. We call the algorithm RankRC. We provide justifications for the rare class representation and experimentally illustrate the effectiveness of RankRC in test performance, computational complexity, and model robustness.
IEEE 2015
TTA-JD-C1558
Reverse Keyword Search for Spatio-Textual Top-k Queries in Location-Based Services
Spatio-textual queries retrieve the most similar objects with respect to a given location and a keyword set. Existing studies mainly focus on how to efficiently find the top-k result set for a given spatio-textual query. Nevertheless, in many application scenarios, users cannot precisely formulate their keywords and instead prefer to choose them from some candidate keyword sets. Moreover, in information browsing applications, it is useful to highlight the objects with the tags (keywords) under which the objects have high rankings. Driven by these applications, we propose a novel query paradigm, namely reverse keyword search for spatio-textual top-k queries (RSTQ). It returns the keywords under which a target object will be a spatio-textual top-k result. To efficiently process the new query, we devise a novel hybrid index, the KcR-tree, to store and summarize the spatial and textual information of objects. By accessing the high-level nodes of the KcR-tree, we can estimate the rankings of the target object without accessing the actual objects. To further improve the performance, we propose three query optimization techniques, i.e., the KcR*-tree, lazy upper-bound updating, and keyword set filtering. We also extend RSTQ to allow the input location to be a spatial region instead of a point. Extensive experimental evaluation demonstrates the efficiency of our proposed query techniques in terms of both computational cost and I/O cost.
IEEE 2015
TTA-JD-C1559
Scalable Distributed Processing of k-Nearest Neighbor Queries over Moving Objects
Central to many applications involving moving objects is the task of processing k-nearest neighbor (k-NN) queries. Most of the existing approaches to this problem are designed for the centralized setting where query processing takes place on a single server; it is difficult, if not impossible, for them to scale to a distributed setting to handle the vast volume of data and concurrent queries that are increasingly common in those applications. To address this problem, we propose a suite of solutions that can support scalable distributed processing of k-NN queries. We first present a new index structure called the Dynamic Strip Index (DSI), which can better adapt to different data distributions than existing grid indexes. Moreover, it can be naturally distributed across a cluster, therefore lending itself well to distributed processing. We further propose a distributed k-NN search (DKNN) algorithm based on DSI. DKNN avoids having an uncertain number of potentially expensive iterations, and is thus more efficient and more predictable than existing approaches. DSI and DKNN are implemented on Apache S4, an open-source platform for distributed stream processing. We perform extensive experiments to study the characteristics of DSI and DKNN, and compare them with three baseline methods. Experimental results show that our proposal scales well and significantly outperforms the alternative methods.
IEEE 2015
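A rough, centralized sketch of the strip-index idea: partition space into strips along one axis, route each object to its strip, and answer a k-NN query by scanning strips outward from the query until no closer strip can improve the answer. The strip boundaries, class layout, and stopping rule below are our own simplifications, not the paper's DSI or DKNN:

```python
import bisect, math

class StripIndex:
    def __init__(self, boundaries):
        self.bounds = boundaries                      # sorted strip edges on x
        self.strips = [[] for _ in range(len(boundaries) + 1)]

    def insert(self, pt):
        self.strips[bisect.bisect(self.bounds, pt[0])].append(pt)

    def _x_gap(self, x, s):
        # horizontal distance from x to strip s: a lower bound on any member's distance
        lo = self.bounds[s - 1] if s > 0 else -math.inf
        hi = self.bounds[s] if s < len(self.bounds) else math.inf
        return 0.0 if lo <= x <= hi else min(abs(x - lo), abs(x - hi))

    def knn(self, q, k):
        order = sorted(range(len(self.strips)), key=lambda s: self._x_gap(q[0], s))
        best = []                                     # (distance, point), kept sorted
        for s in order:
            if len(best) == k and self._x_gap(q[0], s) >= best[-1][0]:
                break                                 # remaining strips can't help
            best = sorted(best + [(math.dist(p, q), p) for p in self.strips[s]])[:k]
        return [p for _, p in best]

idx = StripIndex([1.0, 2.0, 3.0])
for pt in [(0.5, 0.5), (1.5, 1.0), (2.5, 2.0), (3.5, 0.0)]:
    idx.insert(pt)
print(idx.knn((1.6, 1.1), k=2))
```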
TTA-JD-C1560
Sentiment Analysis: From Opinion Mining to Human-Agent Interaction
The opinion mining and human-agent interaction communities are currently addressing sentiment analysis from different perspectives that comprise, on the one hand, disparate sentiment-related phenomena and computational representations, and on the other hand, different detection and dialog management methods. In this paper we identify and discuss the growing opportunities for cross-disciplinary work that may increase individual advances. Sentiment/opinion detection methods used in human-agent interaction are indeed rare and, when they are employed, they are no different from the ones used in opinion mining and consequently not designed for socio-affective interactions (timing constraints of the interaction, sentiment analysis as an input and an output of interaction strategies). To support our claims, we present a comparative state of the art which analyzes the sentiment-related phenomena and the sentiment detection methods used in both communities, and provides an overview of the goals of socio-affective human-agent strategies. We then propose different possibilities for mutual benefit, specifying several research tracks and discussing the open questions and prospects. To show the feasibility of the general guidelines proposed, we also approach them from a specific perspective by applying them to the case of the Greta embodied conversational agent platform, and discuss the way they can be used to make sentiment analysis more significant for human-agent interaction in two different use cases: job interviews and dialogs with museum visitors.
IEEE 2015
TTA-JD-C1561
Similarity Measure Selection for Clustering Time Series Databases
In the past few years, clustering has become a popular task associated with time series. The choice of a suitable distance measure is crucial to the clustering process and, given the vast number of distance measures for time series available in the literature and their diverse characteristics, this selection is not straightforward. With the objective of simplifying this task, we propose a multi-label classification framework that provides the means to automatically select the most suitable distance measures for clustering a time series database. This classifier is based on a novel collection of characteristics that describe the main features of the time series databases and provide the predictive information necessary to discriminate between a set of distance measures. In order to test the validity of this classifier, we conduct a complete set of experiments using both synthetic and real time series databases and a set of five common distance measures. The positive results obtained by the designed classification framework for various performance measures indicate that the proposed methodology is useful for simplifying the process of distance selection in time series clustering tasks.
IEEE 2015
TTA-JD-C1562
Splitting Large Medical Data Sets Based on Normal Distribution in Cloud Environment
The surge of medical and e-commerce applications has generated a tremendous amount of data, which brings people to a so-called “Big Data” era. Different from traditional large data sets, the term “Big Data” not only refers to the large size of the data volume but also indicates the high velocity of data generation. However, current data mining and analytical techniques face the challenge of dealing with large-volume data in a short period of time. This paper explores the efficiency of utilizing the Normal Distribution (ND) method for splitting and processing large-volume medical data in a cloud environment, such that the split data sets retain representative information. The new ND-based model consists of two stages. The first stage adopts the ND method for splitting and processing large data sets, which can reduce the volume of the data sets. The second stage implements the ND-based model in a cloud computing infrastructure for allocating the split data sets. The experimental results show substantial efficiency gains of the proposed method over conventional methods that do not split data into small partitions. The ND-based method can generate representative data sets, offering an efficient solution for large data processing. The split data sets can be processed in parallel in a cloud computing environment.
IEEE 2015
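An illustrative sketch of distribution-aware splitting: order records by their z-score under a fitted normal and deal them round-robin, so each partition keeps a representative spread of the attribute and can be processed in parallel. The round-robin scheme and toy vitals data are assumptions, not the paper's exact procedure:

```python
import statistics, random

def nd_split(values, n_parts):
    mu, sigma = statistics.mean(values), statistics.stdev(values)
    parts = [[] for _ in range(n_parts)]
    # deal z-ordered records round-robin so every partition spans the distribution
    for i, v in enumerate(sorted(values, key=lambda v: (v - mu) / sigma)):
        parts[i % n_parts].append(v)
    return parts

random.seed(1)
systolic = [random.gauss(120, 15) for _ in range(12)]   # toy blood-pressure readings
for p in nd_split(systolic, 3):
    print([round(v) for v in p], "mean ≈", round(statistics.mean(p)))
```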
mechanism to acquire sentiment labels. The extremely sparse text of tweets also brings down the performance of a sentiment classifier. In this paper, we propose a semi-supervised topic-adaptive sentiment classification (TASC) model, which starts with a classifier built on common features and mixed labeled data from various topics. It minimizes the hinge loss to adapt to unlabeled data and features, including topic-related sentiment words, authors' sentiments, and sentiment connections derived from "@" mentions of tweets, named topic-adaptive features. Text and non-text features are extracted and naturally split into two views for co-training. The TASC learning algorithm updates topic-adaptive features based on the collaborative selection of unlabeled data, which in turn helps to select more reliable tweets to boost the performance. We also design an adapting model along a timeline (TASC-t) for dynamic tweets. An experiment on 6 topics from published tweet corpora demonstrates that TASC outperforms other well-known supervised and ensemble classifiers. It also beats those semi-supervised learning methods without feature adaptation. Meanwhile, TASC-t can also achieve impressive accuracy and F-score. Finally, with a timeline visualization of a "river" graph, people can intuitively grasp the ups and downs of sentiment evolution, and its intensity by color gradation. IEEE 2015

TTA-JD-C1565 Towards Effective Bug Triage with Software Data Reduction Techniques
Software companies spend over 45 percent of cost in dealing with software bugs. An inevitable step of fixing bugs is bug triage, which aims to correctly assign a developer to a new bug. To decrease the time cost of manual work, text classification techniques are applied to conduct automatic bug triage. In this paper, we address the problem of data reduction for bug triage, i.e., how to reduce the scale and improve the quality of bug data. We combine instance selection with feature selection to simultaneously reduce data scale on the bug dimension and the word dimension. To determine the order of applying instance selection and feature selection, we extract attributes from historical bug data sets and build a predictive model for a new bug data set. We empirically investigate the performance of data reduction on a total of 600,000 bug reports of two large open source projects, namely Eclipse and Mozilla. The results show that our data reduction can effectively reduce the data scale and improve the accuracy of bug triage. Our work provides an approach to leveraging data processing techniques to form reduced and high-quality bug data in software development and maintenance. IEEE 2015
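As a loose illustration of combining feature selection with instance selection (the paper's own FS and IS algorithms are not named in the abstract), the sketch below uses chi-squared feature selection and an edited-nearest-neighbor-style instance filter as stand-ins, on a dense bag-of-words matrix X with developer labels y:

import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neighbors import NearestNeighbors

def reduce_bug_data(X, y, n_features=500):
    # 1) Feature selection on the word dimension: keep the n_features
    #    words most correlated with the developer labels.
    fs = SelectKBest(chi2, k=min(n_features, X.shape[1]))
    X_fs = fs.fit_transform(X, y)
    # 2) Instance selection on the bug dimension: drop reports whose
    #    nearest neighbor carries a different label (noisy/ambiguous data).
    nn = NearestNeighbors(n_neighbors=2).fit(X_fs)
    _, idx = nn.kneighbors(X_fs)          # idx[i, 1] = nearest other report
    keep = np.array([y[i] == y[idx[i, 1]] for i in range(X_fs.shape[0])])
    return X_fs[keep], y[keep]

The "FS then IS" versus "IS then FS" ordering is exactly what the paper's predictive model chooses per data set; the sketch hard-codes one order for brevity.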
DOMAIN : CLOUD COMPUTING

TTA-DC-C1501 An Access Control Model for Online Social Networks Using User-to-User Relationships
Users and resources in online social networks (OSNs) are interconnected via various types of relationships. In particular, user-to-user relationships form the basis of the OSN structure and play a significant role in specifying and enforcing access control. Individual users and the OSN provider should be enabled to specify which access can be granted in terms of existing relationships. In this paper, we propose a novel user-to-user relationship-based access control (UURAC) model for OSN systems that utilizes regular expression notation for such policy specification. Access control policies on users and resources are composed in terms of the requested action, multiple relationship types, the starting point of the evaluation, and the number of hops on the path. We present two path-checking algorithms to determine whether the required relationship path between users exists for a given access request. We validate the feasibility of our approach by implementing a prototype system and evaluating the performance of these two algorithms. IEEE 2015
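The abstract does not give the two path-checking algorithms themselves; a minimal sketch of the underlying check, a hop-bounded breadth-first search over typed relationship edges, might look like this (all names hypothetical):

from collections import deque

def path_exists(graph, start, target, rel_type, max_hops):
    """Return True if `target` is reachable from `start` via edges labeled
    `rel_type` in at most `max_hops` hops. `graph` maps a user to a list
    of (neighbor, relationship_type) pairs."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        user, hops = queue.popleft()
        if user == target:
            return True
        if hops == max_hops:
            continue          # hop budget exhausted along this branch
        for neighbor, rel in graph.get(user, []):
            if rel == rel_type and neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return False

# e.g. a "friends-of-friends may view" policy:
#   path_exists(g, owner, requester, "friend", max_hops=2)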
TTA-DC-C1502 Cost-Effective Authentic and Anonymous Data Sharing with Forward Security
Data sharing has never been easier with the advances of cloud computing, and an accurate analysis of the shared data provides an array of benefits to both society and individuals. Data sharing with a large number of participants must take into account several issues, including efficiency, data integrity, and privacy of the data owner. Ring signatures are a promising candidate for constructing an anonymous and authentic data sharing system. They allow a data owner to anonymously authenticate his data, which can be put into the cloud for storage or analysis purposes. Yet the costly certificate verification in the traditional public key infrastructure (PKI) setting becomes a bottleneck for this solution to be scalable. Identity-based (ID-based) ring signatures, which eliminate the process of certificate verification, can be used instead. In this paper, we further enhance the security of ID-based ring signatures by providing forward security: if a secret key of any user has been compromised, all previously generated signatures that include this user still remain valid. This property is especially important to any large-scale data sharing system, as it is impossible to ask all data owners to re-authenticate their data even if a secret key of one single user has been compromised. We provide a concrete and efficient instantiation of our scheme, prove its security, and provide an implementation to show its practicality. IEEE 2015
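The paper's forward security comes from a specific ID-based ring signature construction; the generic key-evolution idea behind forward security can be sketched independently of it: evolve the signing key one-way each time period and delete the old key, so stealing today's key does not let an attacker forge past-period signatures (a toy MAC stands in for the real signature here):

import hashlib, hmac

def evolve_key(sk: bytes) -> bytes:
    # One-way update: from sk_t it is easy to get sk_{t+1}, but not the reverse.
    return hashlib.sha256(b"key-evolution" + sk).digest()

def sign(sk: bytes, period: int, message: bytes) -> bytes:
    # Toy MAC-based stand-in for a real signature, bound to the period.
    return hmac.new(sk, period.to_bytes(4, "big") + message, hashlib.sha256).digest()

sk = b"\x01" * 32              # period-0 secret key
sig0 = sign(sk, 0, b"report")  # signature made in period 0
sk = evolve_key(sk)            # move to period 1 and erase the old key
# An attacker who steals the period-1 key cannot recompute the period-0 key,
# so period-0 signatures remain valid and trustworthy.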
TTA-DC-C1503 SeDaSC - Shared Data Authority Scheme
Cloud storage is an application of clouds that liberates organizations from establishing in-house data storage systems. However, cloud storage gives rise to security concerns. In the case of group-shared data, the data face both cloud-specific and conventional insider threats. Secure data sharing among a group that counters insider threats from legitimate yet malicious users is an important research issue. In this paper, we propose the Secure Data Sharing in Clouds (SeDaSC) methodology that provides: 1) data confidentiality and integrity; 2) access control; 3) data sharing (forwarding) without using compute-intensive re-encryption; 4) insider threat security; and 5) forward and backward access control. The SeDaSC methodology encrypts a file with a single encryption key. Two different key shares are generated for each user, with the user getting only one share. The possession of a single share of a key allows the SeDaSC methodology to counter the insider threats. The other key share is stored by a trusted third party, called the cryptographic server. The SeDaSC methodology is applicable to conventional and mobile cloud computing environments. We implement a working prototype of SeDaSC and evaluate its performance based on the time consumed during various operations. We formally verify the working of SeDaSC by using high-level Petri nets, the Satisfiability Modulo Theories Library, and the Z3 solver. The results proved to be encouraging and show that SeDaSC has the potential to be effectively used for secure data sharing in the cloud. IEEE 2015
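The abstract leaves open how the two key shares are produced; one standard way to realize a 2-of-2 split of the file key between the user and the cryptographic server is a simple XOR split, sketched here (illustrative only, not SeDaSC's actual construction):

import os

def split_key(key: bytes):
    """2-of-2 split: neither share alone reveals anything about the key."""
    user_share = os.urandom(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recombine(user_share: bytes, server_share: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)           # the single file-encryption key
u, s = split_key(key)          # user keeps u, cryptographic server keeps s
assert recombine(u, s) == key  # both parties must cooperate to decrypt

Because a lone share is indistinguishable from random bytes, a malicious insider holding only the user share (or only the server share) learns nothing about the file key.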
TTA-DC-C1504 A Computational Dynamic Trust Model for User Authorization
Development of authorization mechanisms for secure information access by a large community of users in an open environment is an important problem in the ever-growing Internet world. In this paper, we propose a computational dynamic trust model for user authorization, rooted in findings from social science. Unlike most existing computational trust models, this model distinguishes trusting belief in integrity from that in competence in different contexts, and it accounts for subjectivity in the evaluation of a particular trustee by different trusters. Simulation studies were conducted to compare the performance of the proposed integrity belief model with other trust models from the literature for different user behavior patterns. Experiments show that the proposed model achieves higher performance than other models, especially in predicting the behavior of unstable users. IEEE 2015

TTA-DC-C1505 Shared Authority Based Privacy-Preserving Authentication Protocol in Cloud Computing
Cloud computing is an emerging data-interactive paradigm in which users' data are stored remotely on an online cloud server. Cloud services provide great conveniences for users to enjoy on-demand cloud applications without considering the local infrastructure limitations. During data access, different users may be in a collaborative relationship, and thus data sharing becomes significant to achieve productive benefits. Existing security solutions mainly focus on authentication to ensure that a user's private data cannot be illegally accessed, but they neglect a subtle privacy issue that arises when a user challenges the cloud server to request data sharing from other users: the access request itself may reveal the user's privacy, no matter whether or not it obtains the data access permissions. In this paper, we propose a shared authority based privacy-preserving authentication protocol (SAPA) to address the above privacy issue for cloud storage. In the SAPA, 1) shared access authority is achieved by an anonymous access request matching mechanism with security and privacy considerations (e.g., authentication, data anonymity, user privacy, and forward security); 2) attribute-based access control is adopted so that a user can only access its own data fields; 3) proxy re-encryption is applied to provide data sharing among multiple users. Meanwhile, a universal
composability (UC) model is established to prove that the SAPA theoretically has design correctness. It indicates that the proposed protocol is attractive for multi-user collaborative cloud applications. IEEE 2015

TTA-DC-C1506 Provable Multicopy Dynamic Data Possession in Cloud Computing Systems
Increasingly, more and more organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSPs' storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers. The more copies the CSP is asked to store, the more fees the customers are charged. Therefore, customers need a strong guarantee that the CSP is storing all data copies that are agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. In this paper, we propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., block-level operations such as block modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. We give a comparative analysis of the proposed MB-PMDDP scheme against a reference model obtained by extending existing provable possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, we show security against colluding servers and discuss how to identify corrupted copies by slightly modifying the proposed scheme. IEEE 2015
TTA-DC-C1507 My Privacy My Decision - Control of Photo Sharing on Online Social Networks
Photo sharing is an attractive feature that popularizes online social networks (OSNs). Unfortunately, it may leak users' privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study the scenario in which a user shares a photo containing individuals other than himself/herself (termed a co-photo for short). To prevent possible privacy leakage of a photo, we design a mechanism to enable each individual in a photo to be aware of the posting activity and to participate in the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognize everyone in the photo. However, a more demanding privacy setting may limit the number of photos publicly available to train the FR system. To deal with this dilemma, our mechanism attempts to utilize users' private photos to design a personalized FR system specifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus-based method to reduce the computational complexity and protect the private training set. We show that our system is superior to other possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof-of-concept Android application on Facebook's platform. IEEE 2015

TTA-DC-C1508 A Profit Maximization Scheme with Guaranteed Quality of Service in Cloud Computing
As an effective and efficient way to provide computing resources and services to customers on demand, cloud computing has become more and more popular. From cloud service providers' perspective, profit is one of the most important considerations, and it is mainly determined by the configuration of a cloud service platform under given market demand. However, a single long-term renting scheme is usually adopted to configure a cloud platform, which cannot guarantee the service quality and leads to serious resource waste. In this paper, a double resource renting scheme is first designed, in which short-term renting and long-term renting are combined to address the existing issues. This double renting scheme can effectively guarantee the quality of service of all requests and greatly reduce resource waste. Secondly, a service system is modeled as an M/M/m+D queuing model, and the performance indicators that affect the profit of our double renting scheme are analyzed, e.g., the average charge and the ratio of requests that need temporary servers. Thirdly, a profit maximization problem is formulated for the double renting scheme, and the optimized configuration of a cloud platform is obtained by solving it. Finally, a series of calculations are conducted to compare the profit of our proposed scheme with that of the single renting scheme. The results show that our scheme can not only guarantee the service quality of all requests but also obtain more profit than the latter. IEEE 2015
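The paper's full M/M/m+D analysis (with deadline D) is involved; to give a flavor of the kind of queueing computation behind the "ratio of requests that need temporary servers" indicator, the plain M/M/m waiting probability (Erlang C) can be computed as below, with made-up arrival and service rates:

from math import factorial

def erlang_c(m: int, lam: float, mu: float) -> float:
    """P(wait) in an M/M/m queue with arrival rate lam and service rate mu
    per server (the +D deadline of the paper's M/M/m+D model is omitted)."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / m                       # per-server utilization
    assert rho < 1, "queue is unstable"
    top = a**m / (factorial(m) * (1 - rho))
    bottom = sum(a**k / factorial(k) for k in range(m)) + top
    return top / bottom

# e.g. 20 long-term servers, 18 requests/s, each served at 1 request/s:
p_wait = erlang_c(20, 18.0, 1.0)
print(f"fraction of requests that would wait "
      f"(candidates for short-term temporary servers): {p_wait:.3f}")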
TTA-DC-C1509 Attribute-based Access Control with Constant-size Ciphertext in Cloud Computing
With the popularity of cloud computing, there have been increasing concerns about its security and privacy. Since the cloud computing environment is distributed and untrusted, data owners have to encrypt outsourced data to enforce confidentiality. Therefore, how to achieve practicable access control of encrypted data in an untrusted environment is an urgent issue that needs to be solved. Attribute-Based Encryption (ABE) is a promising scheme suitable for access control in cloud storage systems. This paper proposes a hierarchical attribute-based access control scheme with constant-size ciphertext. The scheme is efficient because the length of the ciphertext and the number of bilinear pairing evaluations are fixed to a constant. Its computation cost in the encryption and decryption algorithms is low. Moreover, the hierarchical authorization structure of our scheme reduces
the burden and risk of a single-authority scenario. We prove the scheme is CCA2-secure under the decisional q-Bilinear Diffie-Hellman Exponent assumption. In addition, we implement our scheme and analyse its performance. The analysis results show the proposed scheme is efficient, scalable, and fine-grained in dealing with access control for outsourced data in cloud computing. IEEE 2015

TTA-DC-C1510 Bidding Strategies for Spot Instances in Cloud Computing Markets
In recent times, spot pricing, a dynamic pricing scheme, has become increasingly popular for cloud services. This new pricing format, though efficient in terms of cost and resource use, has added to the complexity of decision making for typical cloud computing users. To recommend bidding strategies in spot markets, we use a simulation study to understand the implications that provider-recommended strategies have for cloud users. We use data based on Amazon's Elastic Compute Cloud spot market to provide users with guidelines when considering trade-offs between cost, wait time, and interruption rates. IEEE 2015
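A minimal simulation of the cost/interruption trade-off the entry describes, run over a synthetic price trace rather than real EC2 spot history (all parameters hypothetical):

import random

def simulate_bid(prices, bid):
    """Walk an hourly spot-price trace: the instance runs while
    bid >= price (paying the spot price, as on EC2), and is interrupted
    whenever the price rises above the bid."""
    cost, hours, interruptions, running = 0.0, 0, 0, False
    for price in prices:
        if bid >= price:
            cost += price
            hours += 1
            running = True
        else:
            if running:
                interruptions += 1
            running = False
    return cost, hours, interruptions

# Synthetic 30-day trace standing in for an EC2 spot-price history.
trace = [round(random.uniform(0.05, 0.30), 3) for _ in range(24 * 30)]
for bid in (0.10, 0.15, 0.25):
    cost, hrs, ints = simulate_bid(trace, bid)
    print(f"bid ${bid:.2f}: {hrs} compute-hours, {ints} interruptions, total ${cost:.2f}")

Sweeping the bid level makes the trade-off concrete: higher bids buy more uninterrupted hours at a higher total cost, which is exactly the kind of guideline the study derives.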
TTA-DC-C1511 CHARM - A Cost-efficient Multi-cloud Data Hosting Scheme with High Availability
Nowadays, more and more enterprises and organizations are hosting their data in the cloud in order to reduce IT maintenance costs and enhance data reliability. However, facing numerous cloud vendors and their heterogeneous pricing policies, customers may well be perplexed about which cloud(s) are suitable for storing their data and what hosting strategy is cheaper. The general status quo is that customers usually put their data into a single cloud (which is subject to the vendor lock-in risk) and then simply trust to luck. Based on a comprehensive analysis of various state-of-the-art cloud vendors, this paper proposes a novel data hosting scheme (named CHARM) which integrates two desired key functions. The first is selecting several suitable clouds and an appropriate redundancy strategy to store data with minimized monetary cost and guaranteed availability. The second is triggering a transition process to re-distribute data according to variations in data access patterns and cloud pricing. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The results show that, compared with the major existing schemes, CHARM not only saves around 20 percent of monetary cost but also exhibits sound adaptability to data and price adjustments. IEEE 2015

TTA-DC-C1512 CloudArmor - Supporting Reputation-based Trust Management for Cloud Services
Trust management is one of the most challenging issues for the adoption and growth of cloud computing. The highly dynamic, distributed, and non-transparent nature of cloud services introduces several challenging issues such as privacy, security, and availability. Preserving consumers' privacy is not an easy task due to the sensitive information involved in the interactions between consumers and the trust management service. Protecting cloud services against their malicious users (e.g., such users might give misleading feedback to disadvantage a particular cloud service) is a difficult problem. Guaranteeing the availability of the trust management service is another significant challenge because of the dynamic nature of cloud environments. In this article, we describe the design and implementation of CloudArmor, a reputation-based trust management framework that provides a set of functionalities to deliver Trust as a Service (TaaS), which includes i) a novel protocol to prove the credibility of trust feedbacks and preserve users' privacy, ii) an adaptive and robust credibility model for measuring the credibility of trust feedbacks to
protect cloud services from malicious users and to compare the trustworthiness of cloud services, and iii) an availability model to manage the availability of the decentralized implementation of the trust management service. The feasibility and benefits of our approach have been validated by a prototype and experimental studies using a collection of real-world trust feedbacks on cloud services. IEEE 2015

TTA-DC-C1513 DaSCE - Data Security for Cloud Environment with Semi-Trusted Third Party
Off-site data storage is an application of the cloud that relieves customers from focusing on data storage systems. However, outsourcing data to a third-party administrative control entails serious security concerns. Data leakage may occur due to attacks by other users and machines in the cloud. Wholesale disclosure of data by the cloud service provider is yet another problem faced in the cloud environment. Consequently, a high level of security measures is required. In this paper, we propose Data Security for Cloud Environment with Semi-Trusted Third Party (DaSCE), a data security system that provides (a) key management, (b) access control, and (c) assured file deletion. DaSCE utilizes Shamir's (k, n) threshold scheme to manage the keys, where k out of n shares are required to generate the key. We use multiple key managers, each hosting one share of the key. Multiple key managers avoid a single point of failure for the cryptographic keys. We (a) implement a working prototype of DaSCE and evaluate its performance based on the time consumed during various operations, (b) formally model and analyze the working of DaSCE using High-Level Petri Nets (HLPN), and (c) verify the working of DaSCE using the Satisfiability Modulo Theories Library (SMT-Lib) and the Z3 solver. The results reveal that DaSCE can be effectively used to secure outsourced data by employing key management, access control, and assured file deletion. IEEE 2015
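Shamir's (k, n) threshold scheme, which the abstract names explicitly, can be sketched in a few lines over a prime field; this is the generic textbook version, not DaSCE's implementation:

import random

P = 2**127 - 1  # a Mersenne prime, large enough for 16-byte keys

def make_shares(secret: int, k: int, n: int):
    """Shamir (k, n): evaluate a random degree-(k-1) polynomial with
    constant term `secret` at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation at x = 0 from any k shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)            # the file-encryption key
shares = make_shares(key, k=3, n=5)  # one share per key manager
assert recover(shares[:3]) == key    # any 3 of the 5 managers suffice
assert recover(shares[2:]) == key

With k = 3 and n = 5, any two colluding (or failed) key managers can neither reconstruct the key nor block its recovery, which is precisely why the scheme avoids a single point of failure.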
TTA-DC-C1514 Data as a Currency and Cloud-Based Data Lockers
With large data volumes being generated through Google search, Facebook, Twitter, Instagram, and the increasingly instrumented physical world (with embedded sensors), the authors discuss whether such data can be the basis of a new transactional relationship between people and companies, in which both sides benefit from new products and services and increased economic growth. However, the key distinction from previous discussions is whether the existence of a global cloud computing industry (consisting of data centers located in different parts of the world) can be used to facilitate such transactional relations, with awareness of data privacy and access management. The authors propose the use of data as a currency, to enable consumers to directly monetize their own data and request services (based on the "value" their data holds within a marketplace). IEEE 2015

TTA-DC-C1515 Dynamic Weight-Based Individual Similarity Calculation for Information Searching in Social Computing
In the social computing environment, the complete information about an individual is usually distributed across heterogeneous social networks, which are presented as linked data. Synthetically recognizing and integrating these distributed and heterogeneous data for efficient information searching is an important but challenging task. In this paper, a dynamic weight (DW)-based similarity calculation is proposed to recognize and integrate similar individuals from distributed data environments. First, each link of an individual is weighted by applying DW. Then, a semantic similarity metric is proposed to combine the DW into the similarity calculation. Next, a searching system framework for similarity-based individual search is designed and tested on real-world data sets. Finally, extensive experiments are conducted on both benchmark and real-world social community data sets. The results show that our approach can produce
good results for similar-individual searching in social networks. In addition, it performs significantly better than the existing state-of-the-art approaches in similar-individual searching. IEEE 2015

TTA-DC-C1516 Efficient Audit Service Outsourcing for Data Integrity in Clouds
Cloud computing, which provides elastic computing and storage resources on demand, has become increasingly important due to the emergence of "big data". Cloud computing resources are a natural fit for processing big data streams, as they allow big data applications to run at the scale required for handling their complexities (data volume, variety, and velocity). With data no longer under users' direct control, data security in cloud computing has become one of the main concerns in the adoption of cloud computing resources. In order to improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verification is very large, because each update requires updating all replicas, where verification of each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time. Without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. In order to address these problems, in this paper we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support full dynamic data updates and authentication of block indices, we include rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in the MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into the same replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme not only incurs much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provides enhanced security against dishonest cloud service providers. IEEE 2015
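The paper's MR-MHT additionally stores rank and level values; the plain Merkle hash tree underneath it, with its O(log n) membership proofs, looks like this:

import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(blocks):
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])            # duplicate the last node on odd levels
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(blocks, index):
    """Sibling hashes from leaf to root: O(log n) data for the verifier."""
    level, proof = [h(b) for b in blocks], []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2))  # (sibling, am-I-the-right-child?)
        level = [h(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(block, proof, root):
    node = h(block)
    for sibling, node_is_right in proof:
        node = h(sibling, node) if node_is_right else h(node, sibling)
    return node == root

blocks = [f"block-{i}".encode() for i in range(8)]
root = merkle_root(blocks)
assert verify(blocks[5], merkle_proof(blocks, 5), root)

Because the proof path encodes the leaf's position, a tree that also authenticates indices (as MR-MHT does with ranks) prevents a dishonest server from answering a challenge on block 5 with some other block it still holds.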
TTA-DC-C1517 Generic and Efficient Constructions of Attribute-Based Encryption with Verifiable Outsourced Decryption
Attribute-based encryption (ABE) provides a mechanism for complex access control over encrypted data. However, in most ABE systems, the ciphertext size and the decryption overhead, which grow with the complexity of the access policy, are becoming critical barriers in applications running on resource-limited devices. Outsourcing the decryption of ABE ciphertexts to a powerful third party is a reasonable way to solve this problem. Since the third party is usually assumed to be untrusted, the security requirements of ABE with outsourced decryption should include privacy and verifiability: any adversary, including the third party, should learn nothing about the encrypted message, and the correctness of the outsourced decryption should be verifiable efficiently. We propose generic constructions of CPA-secure and RCCA-secure ABE systems with verifiable outsourced decryption from CPA-secure ABE with outsourced decryption,
respectively. We also instantiate our CPA-secure construction in the standard model and then show an implementation of this instantiation. The experimental results show that, compared with the existing scheme, our CPA-secure construction has more compact ciphertexts and lower computational costs. Moreover, the techniques involved in the RCCA-secure construction can be applied to generically constructing CCA-secure ABE, which we believe to be of independent interest. IEEE 2015

TTA-DC-C1518 Group Key Agreement with Local Connectivity
In this paper, we study a group key agreement problem where a user is only aware of his neighbors while the connectivity graph is arbitrary. In our problem, there is no centralized initialization for users. A group key agreement with these features is very suitable for social networks. Under our setting, we construct two efficient protocols with passive security. We obtain lower bounds on the round complexity for this type of protocol, which demonstrates that our constructions are round-efficient. Finally, we construct an actively secure protocol from a passively secure one. IEEE 2015

TTA-DC-C1519 Hybrid Cloud Approach for Secure Authorized Deduplication
Data deduplication is one of the important data compression techniques for eliminating duplicate copies of repeated data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in the duplicate check, besides the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct test-bed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations. IEEE 2015
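Convergent encryption, which the abstract builds on, derives the encryption key from the content itself, so identical plaintexts produce identical ciphertexts and can be deduplicated even though the cloud never sees the data in the clear. A toy sketch (the SHA-256-counter keystream is illustrative only and not secure; a real system would use AES):

import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256-in-counter-mode keystream, for illustration only.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def convergent_encrypt(data: bytes):
    key = hashlib.sha256(data).digest()        # key derived from the content
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    tag = hashlib.sha256(key).hexdigest()      # tag used for the duplicate check
    return tag, ct, key

t1, c1, _ = convergent_encrypt(b"same attachment")
t2, c2, _ = convergent_encrypt(b"same attachment")
assert t1 == t2 and c1 == c2  # identical plaintexts dedupe even when encrypted

The paper's contribution layers differential user privileges on top of this duplicate check, so matching tags alone are not enough: the uploader must also hold the right privilege before the duplicate is accepted.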
TTA-DC-C1520 Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing
This paper presents an initiative data prefetching scheme on the storage servers of distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers directly prefetch data after analyzing the history of disk I/O access events and then proactively push the prefetched data to the relevant client machines. To put this technique to work, information about client nodes is piggybacked onto real client I/O requests and forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations, directing what data should be fetched on storage servers in advance. Finally, the prefetched data can be pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we demonstrate that our initiative prefetching technique can help distributed file systems for cloud environments achieve better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which definitely contributes to preferable system performance on them. IEEE 2015

TTA-DC-C1521 Privacy Protection for Wireless Sensor Medical Data
In recent years, wireless sensor networks have been widely used in healthcare applications, such as hospital and home patient monitoring. Wireless medical sensor networks are more vulnerable to eavesdropping, modification, impersonation, and replay attacks than wired networks. A lot of work has been done to secure wireless medical sensor networks. The existing solutions can protect the patient data during transmission, but cannot stop the inside attack in which the administrator of the patient database reveals the sensitive patient data. In this paper, we propose a practical approach to prevent the inside attack by using multiple data servers to store patient data. The main contribution of this paper is securely distributing the patient data among multiple data servers and employing the Paillier and ElGamal cryptosystems to perform statistical analysis on the patient data without compromising the patients' privacy. IEEE 2015
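The Paillier cryptosystem named in the abstract is additively homomorphic, which is what lets a data server aggregate encrypted readings without ever seeing them; a toy sketch with tiny primes (real deployments use 2048-bit moduli):

import math, random

# Toy Paillier keypair (tiny primes, for illustration only).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse of L(g^lam mod n^2)

def encrypt(m: int) -> int:
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

# Additive homomorphism: a server can sum encrypted readings blindly.
readings = [98, 101, 97]              # e.g. heart rates from bedside sensors
agg = 1
for r in readings:
    agg = (agg * encrypt(r)) % n2     # product of ciphertexts = sum of plaintexts
assert decrypt(agg) == sum(readings)

A database administrator holding only ciphertexts can compute aggregate statistics such as sums and means, while individual patient readings stay hidden, which is the inside-attack protection the entry targets.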
TTA-DC-C1522 Quality-assured Secured Load Sharing in Mobile Cloud Networking Environment
In mobile cloud networks (MCNs), a mobile user is connected with a cloud server through a network gateway, which is responsible for providing the required quality of service (QoS) to the users. If a user increases its service demand, the connecting gateway may fail to provide the requested QoS due to the overloaded demand, while the other gateways remain under-loaded. Given the increased load on one gateway, sharing the load among all gateways is a prospective solution for providing QoS-guaranteed services to mobile users. Additionally, if a user misbehaves, the situation becomes more challenging. In this paper, we address the problem of QoS-guaranteed secured service provisioning in MCNs. We design a utility maximization problem for quality-assured secured load sharing (QuaLShare) in an MCN and determine its optimal solution using auction theory. In QuaLShare, the overloaded gateway detects the misbehaving gateways and then prevents them from participating in the auction process. Theoretically, we characterize both the problem and the solution approaches in an MCN environment. Finally,
we investigate the existence of a Nash equilibrium of the proposed scheme. We extend the solution to the case of multiple users, followed by theoretical analysis. Numerical analysis establishes the correctness of the proposed algorithms. IEEE 2015

TTA-DC-C1523 Secure Audit Service by Using TPA for Data Integrity in Cloud System
Cloud services are used not only to store data in the cloud but also to share data with other users. The integrity of data in the cloud can easily be lost or damaged. To ensure cloud storage correctness, a distributed storage integrity auditing mechanism supports secure and efficient operations on cloud data, performed by a third-party auditor (TPA). The third-party auditor utilizes a ring signature and a keyed-hash message authentication code for checking integrity. Data privacy and identity privacy on shared data are secured using private-key encryption during the auditing process by the public verifier. In the existing process, data freshness is not proven, so we propose an HMAC mechanism to protect the secrecy, integrity, and authentication of metadata on shared data in cloud storage. This also supports a random checking process by the public verifiers instead of checking the entire data in the cloud. Our audit system establishes data freshness through the secrecy, integrity, and authentication of metadata, and it requires low computation and communication and little extra storage for auditing the metadata. IEEE 2015

TTA-DC-C1524 Secure Data Transmission Using Steganography and Encryption Technique
The transmission of important data such as text, images, and video over the Internet is increasing nowadays, hence the need for secure methods for multimedia data. Image encryption is more demanding than the encryption of other multimedia components because of inherent properties such as higher data capacity and high similarity between pixels. Older encryption techniques such as AES, DES, and RTS are not well suited for highly secure data transmission over wireless media. Thus, we combine chaos theory and cryptography to form a valuable technique for information security. In the first stage, a user encrypts the original input image using chaotic map theory. After that, a data hider compresses the LSB bits of the encrypted image using a data-hiding key to make space to accommodate some more data. Image encryption today is often chaos-based because of unique characteristics such as correlation between neighboring pixels, sensitivity to initial conditions, non-periodicity, and control parameters. A number of image encryption algorithms based on chaotic maps have been implemented; some of them are time-consuming or complex, and some have very little key space. In this paper, we implement a nonlinear differential chaos-based encryption technique in which, for the first time, three differential chaotic maps are used for position permutation and value transformation. In the data-hiding phase, data in binary form is embedded into the encrypted image using the least-significant-bit algorithm. We tabulate correlation coefficient values in both the horizontal and vertical directions for the cipher and original images, and we compare the performance of our method with some existing methods. We also discuss different types of attack, key sensitivity, and the key space of our proposed approach. The given approach is simple, fast, and accurate, and the two algorithms are applied together to give the best results in a highly insecure and complex environment. IEEE 2015
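A minimal single-map sketch of the two stages the entry describes, chaotic encryption followed by LSB embedding; the paper itself uses three differential chaotic maps for permutation and value transformation, so this logistic-map version is illustrative only:

def logistic_keystream(x0: float, r: float, n: int) -> bytes:
    """Chaotic keystream from the logistic map x <- r*x*(1-x);
    the pair (x0, r) acts as the secret key."""
    x, out = x0, bytearray()
    for _ in range(n):
        x = r * x * (1 - x)
        out.append(int(x * 256) % 256)
    return bytes(out)

def xor_encrypt(pixels: bytes, x0=0.613, r=3.9999) -> bytes:
    ks = logistic_keystream(x0, r, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, ks))

def lsb_embed(pixels: bytes, bits: str) -> bytes:
    """Hide one message bit in the least significant bit of each pixel."""
    out = bytearray(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & 0xFE) | int(b)
    return bytes(out)

image = bytes(range(64))               # stand-in for grayscale pixel data
cipher = xor_encrypt(image)            # stage 1: chaotic encryption
stego = lsb_embed(cipher, "10110011")  # stage 2: LSB data hiding
extracted = "".join(str(p & 1) for p in stego[:8])
assert extracted == "10110011"
assert xor_encrypt(cipher) == image    # XOR with the same keystream decrypts

The key sensitivity the entry mentions comes from the map's chaotic behavior: changing x0 in the tenth decimal place yields an entirely different keystream after a few iterations.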
TTA-DC-C1525 Smartphone Instant Messenger Using Google Cloud Messaging
Two of the most important drivers of current telecommunication markets are the development of Rich Communication Services (RCS) and cloud computing. The challenges of delivering these new services on a cloud-based architecture are not only on the technical side; they also concern the definition of feasible
business models for all the involved agents and the definition and negotiation of proper service-level agreements at different levels. This work proposes to provide telecommunication operators with cloud-based infrastructures capable of offering customers innovative and reliable rich communication services, based on their phone numbers, that cannot be replicated by Internet competitors in terms of flexibility, scalability, or security. This Mobiquity as a Service (MaaS) model allows telecommunication providers to maintain relevance for their clients by offering not only the common communication services (instant messaging, group communication and chat, file sharing, or enriched call services) but also a new kind of mobiquitous services related to mobile marketing, smart places, the Internet of Things, or health care, exploiting all the competitive advantages associated with the development of a vertical cloud in a dynamic and heterogeneous ecosystem. In addition, the infrastructure layer needed to support the newly proposed model is defined, and a first prototype is deployed and evaluated with two real use cases. IEEE 2015

TTA-DC-C1526 Social Recommendation with Cross-Domain Transferable Knowledge
Recommender systems can suffer from data sparsity and cold-start issues. However, social networks, which enable users to build relationships and create different types of items, present an unprecedented opportunity to alleviate these issues. In this paper, we represent a social network as a star-structured hybrid graph centered on a social domain, which connects with other item domains. With this innovative representation, useful knowledge from an auxiliary domain can be transferred through the social domain to a target domain. Various factors of item transferability, including popularity and behavioral consistency, are
determined. We propose a novel Hybrid Random Walk (HRW) method, which incorporates such factors, to select transferable items in auxiliary domains, bridge cross-domain knowledge with the social domain, and accurately predict user-item links in a target domain. Extensive experiments on a real social dataset demonstrate that HRW significantly outperforms existing approaches. IEEE 2015

TTA-DC-C1527 Three-Server Swapping for Access Confidentiality
We propose an approach to protect the confidentiality of data, and of accesses to them, when data are stored and managed by external providers and hence are not under the direct control of their owner. Our approach is based on distributing data allocation among three independent servers and on dynamically re-allocating data at every access. Dynamic re-allocation is enforced by swapping the data involved in an access across the servers, in such a way that accessing a given node implies re-allocating it to a different server, thus destroying the ability of servers to build knowledge by observing accesses. The use of three servers makes the result of the swapping operation uncertain in the eyes of the servers, even in the presence of collusion among them. IEEE 2015

TTA-DC-C1528 Trust and Compactness in Social Network Groups
Understanding the dynamics behind group formation and evolution in social networks is considered an instrumental milestone to better describe how individuals gather and form communities, how they enjoy and share the platform contents, how they are driven by their preferences/tastes, and how their behaviors are influenced by peers. In this context, the notion of compactness of a social group is particularly relevant. While the literature usually refers to compactness as a measure that merely
determines how similar the members of a group are to each other, we argue that the mutual trustworthiness between the members should be considered an important factor in defining such a term. In fact, trust has profound effects on the dynamics of group formation and evolution: individuals are more likely to join and stay in a group if they can trust the other group members. In this paper, we propose a quantitative measure of group compactness that takes into account both the similarity and the trustworthiness among users, and we present an algorithm to optimize such a measure. We provide empirical results, obtained from the real social networks EPINIONS and CIAO, that compare our notion of compactness with the traditional notion of user similarity, clearly demonstrating the advantages of our approach. IEEE 2015

TTA-JC-C1529 Public Integrity Auditing for Shared Dynamic Cloud Data with Group User Revocation
The advent of cloud computing has made storage outsourcing a rising trend, which makes secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against collusion between the cloud storage server and revoked group users during user revocation in a practical cloud storage system. In this paper, we identify the collusion attack on the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation, based on vector commitments and verifier-local revocation group signatures. We design a concrete scheme following our scheme definition. Our scheme supports public checking and efficient user revocation, as well as some nice properties such as confidentiality, efficiency, countability, and traceability of secure group user revocation. Finally, the
security and experimental analysis show that, compared with its relevant schemes, our scheme is also secure and efficient. IEEE 2015

TTA-JC-C1530 Audit-Free Cloud Storage via Deniable Attribute-based Encryption
Cloud storage services have become increasingly popular. Because of the importance of privacy, many cloud storage encryption schemes have been proposed to protect data from those who do not have access. All such schemes assumed that cloud storage providers are safe and cannot be hacked; however, in practice, some authorities (i.e., coercers) may force cloud storage providers to reveal user secrets or confidential data on the cloud, thus altogether circumventing storage encryption schemes. In this paper, we present our design for a new cloud storage encryption scheme that enables cloud storage providers to create convincing fake user secrets to protect user privacy. Since coercers cannot tell whether obtained secrets are true or not, the cloud storage providers ensure that user privacy is still securely protected. IEEE 2015

TTA-JC-C1531 CHARM - A Cost-efficient Multi-cloud Data Hosting Scheme with High Availability
Nowadays, more and more enterprises and organizations are hosting their data in the cloud in order to reduce IT maintenance costs and enhance data reliability. However, facing numerous cloud vendors and their heterogeneous pricing policies, customers may well be perplexed about which cloud(s) are suitable for storing their data and what hosting strategy is cheaper. The general status quo is that customers usually put their data into a single cloud (which is subject to the vendor lock-in risk) and then simply trust to luck. Based on a comprehensive analysis of various state-of-the-art cloud vendors, this paper proposes a novel data hosting scheme (named CHARM) which integrates two desired key functions. The first is selecting several suitable clouds and an appropriate redundancy strategy to
store data with minimized monetary cost and guaranteed availability. The second is triggering a transition process to re-distribute data according to variations in data access patterns and cloud pricing. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The results show that, compared with the major existing schemes, CHARM not only saves around 20 percent of monetary cost but also exhibits sound adaptability to data and price adjustments. IEEE 2015

TTA-JC-C1532 Secure Auditing and Deduplicating Data in Cloud
As cloud computing technology has developed over the last decade, outsourcing data to cloud services for storage has become an attractive trend, which spares the effort of heavy data maintenance and management. Nevertheless, since outsourced cloud storage is not fully trustworthy, it raises security concerns about how to realize data deduplication in the cloud while achieving integrity auditing. In this work, we study the problem of integrity auditing and secure deduplication of cloud data. Specifically, aiming at achieving both data integrity and deduplication in the cloud, we propose two secure systems, namely SecCloud and SecCloud+. SecCloud introduces an auditing entity with a maintained MapReduce cloud, which helps clients generate data tags before uploading as well as audit the integrity of data stored in the cloud. Compared with previous work, the computation by the user in SecCloud is greatly reduced during the file uploading and auditing phases. SecCloud+ is motivated by the fact that customers always want to encrypt their data before uploading, and it enables integrity auditing and secure deduplication of encrypted data. IEEE 2015

TTA-JC-C1533 A Profit Maximization Scheme with Guaranteed Quality of Service in Cloud Computing
As an effective and efficient way to provide computing resources and services to
customers on demand, cloud computing has become more and more popular. From cloud service providers' perspective, profit is one of the most important considerations, and it is mainly determined by the configuration of a cloud service platform under given market demand. However, a single long-term renting scheme is usually adopted to configure a cloud platform, which cannot guarantee the service quality and leads to serious resource waste. In this paper, a double resource renting scheme is first designed, in which short-term renting and long-term renting are combined to address the existing issues. This double renting scheme can effectively guarantee the quality of service of all requests and greatly reduce resource waste. Secondly, a service system is modeled as an M/M/m+D queuing model, and the performance indicators that affect the profit of our double renting scheme are analyzed, e.g., the average charge and the ratio of requests that need temporary servers. Thirdly, a profit maximization problem is formulated for the double renting scheme, and the optimized configuration of a cloud platform is obtained by solving it. Finally, a series of calculations are conducted to compare the profit of our proposed scheme with that of the single renting scheme. The results show that our scheme can not only guarantee the service quality of all requests but also obtain more profit than the latter. IEEE 2015

TTA-JC-C1534 Online Resource Scheduling under Concave Pricing for Cloud Computing
With the booming cloud computing industry, computational resources are readily and elastically available to customers. In order to attract customers with various demands, most Infrastructure-as-a-Service (IaaS) cloud service providers offer several pricing strategies, such as pay-as-you-go, pay less per unit when you use more (so-called volume discounts), and pay even less
when you reserve. The diverse pricing schemes among different IaaS service providers, or even within the same provider, form a complex economic landscape that nurtures the market of cloud brokers. By strategically scheduling multiple customers' resource requests, a cloud broker can fully take advantage of the discounts offered by cloud service providers. In this paper, we focus on how a broker can help a group of customers fully utilize the volume discount pricing strategy offered by cloud service providers through cost-efficient online resource scheduling. We present a randomized online stack-centric scheduling algorithm (ROSA) and theoretically prove the lower bound of its competitive ratio. Three special cases of the offline concave cost scheduling problem and the corresponding optimal algorithms are introduced. Our simulation shows that ROSA achieves a competitive ratio close to the theoretical lower bound under the special cases. Trace-driven simulation using Google cluster data demonstrates that ROSA is superior to the conventional online scheduling algorithms in terms of cost saving. IEEE 2015

TTA-JC-C1535 The Value of Cooperation - Minimizing User Costs in Multi-broker Mobile Cloud Computing Networks
We study the problem of user cost minimization in mobile cloud computing (MCC) networks. We consider an MCC model in which multiple brokers assign cloud resources to mobile users. The model is characterized by a heterogeneous cloud architecture (which includes a public cloud and a cloudlet) and by the heterogeneous pricing strategies of cloud service providers. In this setting, we investigate two classes of cloud reservation strategies, i.e., a competitive strategy, and a compete-then-cooperate strategy as a performance bound. We first study a purely competitive scenario where brokers compete to reserve computing resources from remote public clouds (which are affected by long delays) and from local cloudlets (which have
limited computational resources but short delays). We provide theoretical results demonstrating the existence of disagreement points (i.e., equilibrium reservation strategies from which no broker has an incentive to deviate unilaterally) and the convergence of the brokers' best-response strategies to disagreement points. We then consider the scenario in which brokers agree to cooperate in exchange for a lower average cost of resources. We formulate a cooperative problem whose objective is to minimize the total average price of all brokers, under the constraint that no broker should pay a price higher than the disagreement price (i.e., the competitive price). We design a new globally optimal solution algorithm to solve the resulting non-convex cooperative problem, based on a combination of the branch-and-bound framework and advanced convex relaxation techniques. The resulting optimal solution provides a lower bound on the achievable user cost without complete collusion among brokers. Compared with pure competition, we find that i) noticeable cooperative gains can be achieved over pure competition in markets with only a few brokers, and ii) the cooperative gain is only marginal in crowded markets, i.e., with a high number of brokers, hence there is no clear incentive for brokers to cooperate. IEEE 2015

TTA-JC-C1536 System of Systems for Quality-of-Service Observation and Response in Cloud Computing Environments
As military, academic, and commercial computing systems evolve from autonomous entities that deliver computing products into network-centric enterprise systems that deliver computing as a service, opportunities emerge to consolidate computing resources, software, and information through cloud computing. Along with these opportunities come challenges, particularly for service providers and operations centers that struggle to monitor and manage quality of service (QoS) for these services in order to meet customer service commitments. Traditional
approaches fall short in addressing these challenges because they examine QoS from a limited perspective rather than from the system-of-systems (SoS) perspective applicable to a net-centric enterprise system, in which any user from any location can share computing resources at any time. This paper presents an SoS approach to enable QoS monitoring, management, and response for enterprise systems that deliver computing as a service through a cloud computing environment. A concrete example is provided for the application of this new SoS approach to a real-world scenario (viz., distributed denial of service). Simulated results confirm the efficacy of the approach. IEEE 2015

TTA-JC-C1537 A Computational Dynamic Trust Model for User Authorization
Development of authorization mechanisms for secure information access by a large community of users in an open environment is an important problem in the ever-growing Internet world. In this paper, we propose a computational dynamic trust model for user authorization, rooted in findings from social science. Unlike most existing computational trust models, this model distinguishes trusting belief in integrity from that in competence in different contexts, and it accounts for subjectivity in the evaluation of a particular trustee by different trusters. Simulation studies were conducted to compare the performance of the proposed integrity belief model with other trust models from the literature for different user behavior patterns. Experiments show that the proposed model achieves higher performance than other models, especially in predicting the behavior of unstable users. IEEE 2015

TTA-JC-C1538 Generic and Efficient Constructions of Attribute-Based Encryption with Verifiable Outsourced Decryption
Attribute-based encryption (ABE) provides a mechanism for complex access control over encrypted data. However, in most ABE systems, the ciphertext size and the decryption overhead, which grow with the complexity of the access policy, are becoming critical barriers in applications running on
resource-limited devices. Outsourcing the decryption of ABE ciphertexts to a powerful third party is a reasonable way to solve this problem. Since the third party is usually assumed to be untrusted, the security requirements of ABE with outsourced decryption should include privacy and verifiability: any adversary, including the third party, should learn nothing about the encrypted message, and the correctness of the outsourced decryption should be verifiable efficiently. We propose generic constructions of CPA-secure and RCCA-secure ABE systems with verifiable outsourced decryption from CPA-secure ABE with outsourced decryption, respectively. We also instantiate our CPA-secure construction in the standard model and then show an implementation of this instantiation. The experimental results show that, compared with the existing scheme, our CPA-secure construction has more compact ciphertexts and lower computational costs. Moreover, the techniques involved in the RCCA-secure construction can be applied to generically constructing CCA-secure ABE, which we believe to be of independent interest. IEEE 2015

TTA-JC-C1539 Leveraging Data Deduplication to Improve the Performance of Primary Storage Systems in the Cloud
With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the cloud. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than on disks, due to the relatively high temporal access locality associated with small I/O requests to redundant data. Moreover, directly applying data deduplication to primary storage systems in the cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a performance-oriented
I/O deduplication scheme, called POD, rather than a capacity-oriented I/O deduplication scheme, exemplified by iDedup, to improve the I/O performance of primary storage systems in the cloud without sacrificing the capacity savings of the latter. POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing the performance overhead of deduplication, namely, a request-based selective deduplication technique, called Select-Dedupe, to alleviate data fragmentation, and an adaptive memory management scheme, called iCache, to ease the memory contention between bursty read traffic and bursty write traffic. We have implemented a prototype of POD as a module in the Linux operating system. The experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in the I/O performance measure by up to 87.9 percent, with an average of 58.8 percent. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup. IEEE 2015

TTA-JC-C1540 Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictionaries over Encrypted Cloud Data
Using cloud computing, individuals can store their data on remote servers and allow data access to public users through the cloud servers. As the outsourced data are likely to contain sensitive private information, they are typically encrypted before being uploaded to the cloud. This, however, significantly limits the usability of outsourced data, due to the difficulty of searching over encrypted data. In this paper, we address this issue by developing fine-grained multi-keyword search schemes over encrypted cloud data. Our original contributions are three-fold. First, we introduce relevance scores and preference factors for keywords, which enable precise keyword search and a personalized user experience. Second, we develop a practical and very efficient multi-keyword search scheme. The proposed scheme
  • 129.
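As an aside for readers implementing the idea, the selective-deduplication half of the design can be illustrated in a few lines. The following Python sketch, under assumed parameters (4 KB blocks, a two-block request threshold, an in-memory fingerprint index, and a list standing in for the disk), deduplicates only sufficiently large write requests, which is the spirit of Select-Dedupe; it is not POD's actual implementation.

import hashlib

BLOCK = 4096          # assumed fixed block size
SMALL_REQUEST = 2     # requests under 2 blocks bypass dedupe (illustrative)

fingerprints = {}     # SHA-1 digest -> physical block address
storage = []          # stand-in for the disk: a list of blocks

def write(data):
    """Write one request; dedupe only large requests to limit fragmentation."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    dedupe = len(blocks) >= SMALL_REQUEST
    addresses = []
    for b in blocks:
        digest = hashlib.sha1(b).digest()
        if dedupe and digest in fingerprints:
            addresses.append(fingerprints[digest])   # duplicate: reuse address
            continue
        storage.append(b)
        addr = len(storage) - 1
        if dedupe:
            fingerprints[digest] = addr
        addresses.append(addr)
    return addresses

first = write(b"x" * 8192)
second = write(b"x" * 8192)          # fully redundant request
assert first == second               # no new blocks were stored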
TTA-JC-C1540 Enabling Fine-grained Multi-keyword Search Supporting Classified Sub-dictionaries over Encrypted Cloud Data Using cloud computing, individuals can store their data on remote servers and allow data access to public users through the cloud servers. As the outsourced data are likely to contain sensitive privacy information, they are typically encrypted before being uploaded to the cloud. This, however, significantly limits the usability of outsourced data due to the difficulty of searching over the encrypted data. In this paper, we address this issue by developing fine-grained multi-keyword search schemes over encrypted cloud data. Our original contributions are three-fold. First, we introduce relevance scores and preference factors upon keywords, which enable precise keyword search and a personalized user experience. Second, we develop a practical and very efficient multi-keyword search scheme that can support complicated logic search, i.e., mixed “AND”, “OR”, and “NO” operations on keywords. Third, we further employ the classified sub-dictionaries technique to achieve better efficiency in index building, trapdoor generation, and querying. Lastly, we analyze the security of the proposed schemes in terms of confidentiality of documents, privacy protection of index and trapdoor, and unlinkability of trapdoor. Through extensive experiments using a real-world dataset, we validate the performance of the proposed schemes. Both the security analysis and the experimental results demonstrate that the proposed schemes achieve the same security level as the existing ones and better performance in terms of functionality, query complexity, and efficiency. IEEE 2015

TTA-JC-C1541 On the Security of Data Access Control for Multiauthority Cloud Storage Systems Data access control has become a challenging issue in cloud storage systems. Some techniques have been proposed to achieve secure data access control in a semi-trusted cloud storage system. Recently, K. Yang et al. proposed a basic data access control scheme for multiauthority cloud storage systems (DAC-MACS) and an extensive data access control scheme (EDAC-MACS). They claimed that DAC-MACS could achieve efficient decryption and immediate revocation, and that EDAC-MACS could also achieve these goals even when non-revoked users reveal their Key Update Keys to the revoked user. However, our cryptanalysis shows that the revocation security of both schemes cannot be guaranteed. In this paper, we first give two attacks on the two schemes. By the first attack, the revoked user can eavesdrop to obtain other users' Key Update Keys to update its Secret Key, and then it can obtain a proper Token to decrypt any secret information as a non-revoked user. In addition, by the second attack, the revoked user can intercept the Ciphertext Update Key to retrieve its ability to decrypt any secret information as a
non-revoked user. Secondly, we propose a new extensive DAC-MACS scheme (NEDAC-MACS) to withstand the above two attacks and thus guarantee more secure attribute revocation. Formal cryptanalysis of NEDAC-MACS is then presented to prove the security goals of the scheme. Finally, a performance comparison among NEDAC-MACS and related schemes is given to demonstrate that the performance of NEDAC-MACS is superior to that of DACC and roughly the same as that of DAC-MACS. IEEE 2015

TTA-JC-C1542 Verifiable Auditing for Outsourced Database in Cloud Computing The notion of database outsourcing enables the data owner to delegate database management to a cloud service provider (CSP) that provides various database services to different users. Recently, plenty of research work has been done on the primitive of outsourced databases. However, it seems that no existing solution can perfectly support both correctness and completeness of query results, especially in the case where the dishonest CSP intentionally returns an empty set for the user's query request. In this paper, we propose a new verifiable auditing scheme for outsourced databases, which can simultaneously achieve the correctness and completeness of search results even if the dishonest CSP purposely returns an empty set. Furthermore, we can prove that our construction achieves the desired security properties even for an encrypted outsourced database. Besides, the proposed scheme can be extended to support the dynamic database setting by incorporating the notion of verifiable database with updates. IEEE 2015

TTA-JC-C1543 A Cost-Effective Deadline-Constrained Dynamic Scheduling Algorithm for Scientific Workflows in a Cloud Environment Cloud Computing, a distributed computing paradigm, enables delivery of IT resources over the Internet and follows the pay-as-you-go billing model. Workflow scheduling is one of the most challenging problems in Cloud computing. Although workflow scheduling on distributed
systems such as Grids and Clusters has been extensively studied, these solutions are not viable for a Cloud environment. This is because a Cloud environment differs from other distributed environments in two major ways: on-demand resource provisioning and the pay-as-you-go pricing model. Thus, to realize the true benefits of workflow orchestration on Cloud resources, novel approaches that can capitalize on the advantages and address the challenges specific to a Cloud environment need to be developed. This work proposes a dynamic, cost-effective, deadline-constrained heuristic algorithm for scheduling a scientific workflow in a public Cloud. The proposed technique aims to exploit the advantages offered by Cloud computing while taking into account virtual machine performance variability and instance acquisition delay, in order to identify a just-in-time schedule of a deadline-constrained scientific workflow at lower cost. Performance evaluation on some well-known scientific workflows shows that the proposed algorithm delivers better performance than current state-of-the-art heuristics. IEEE 2015

TTA-JC-C1544 A Hybrid Cloud Approach for Secure Authorized Deduplication Data deduplication is an important data compression technique for eliminating duplicate copies of repeating data, and it has been widely used in cloud storage to reduce the amount of storage space and save bandwidth. To protect the confidentiality of sensitive data while supporting deduplication, the convergent encryption technique has been proposed to encrypt the data before outsourcing. To better protect data security, this paper makes the first attempt to formally address the problem of authorized data deduplication. Different from traditional deduplication systems, the differential privileges of users are further considered in the duplicate check, in addition to the data itself. We also present several new deduplication constructions supporting authorized duplicate check in a hybrid cloud architecture. Security analysis demonstrates that our scheme is secure in terms of the definitions specified in the proposed security model. As a proof of concept, we implement a prototype of our proposed authorized duplicate check scheme and conduct testbed experiments using our prototype. We show that our proposed authorized duplicate check scheme incurs minimal overhead compared to normal operations. IEEE 2015
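The convergent encryption technique mentioned above is simple enough to sketch. In the illustrative Python below, the encryption key is derived from the content itself, so identical plaintexts always produce identical ciphertexts that the cloud can match for duplicate checks; the counter-mode keystream built from SHA-256 is an assumption for self-containment, not production cryptography, and the scheme's differential-privilege checks are omitted.

import hashlib

def _keystream(key, n):
    # counter-mode keystream from SHA-256; illustrative, not production crypto
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(data):
    key = hashlib.sha256(data).digest()          # key derived from the content
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))
    tag = hashlib.sha256(cipher).hexdigest()     # duplicate-check tag
    return cipher, key, tag

c1, k1, t1 = convergent_encrypt(b"same file")
c2, k2, t2 = convergent_encrypt(b"same file")
assert c1 == c2 and t1 == t2   # identical plaintexts yield identical ciphertexts,
                               # so duplicates are detectable without seeing the data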
TTA-JC-C1545 A Secure and Dynamic Multi-keyword Ranked Search Scheme over Encrypted Cloud Data Due to the increasing popularity of cloud computing, more and more data owners are motivated to outsource their data to cloud servers for great convenience and reduced cost in data management. However, sensitive data should be encrypted before outsourcing for privacy reasons, which renders traditional data utilization, such as keyword-based document retrieval, obsolete. In this paper, we present a secure multi-keyword ranked search scheme over encrypted cloud data, which simultaneously supports dynamic update operations such as deletion and insertion of documents. Specifically, the vector space model and the widely used TF-IDF model are combined in index construction and query generation. We construct a special tree-based index structure and propose a “Greedy Depth-first Search” algorithm to provide efficient multi-keyword ranked search. The secure kNN algorithm is utilized to encrypt the index and query vectors while ensuring accurate relevance score calculation between encrypted index and query vectors. In order to resist statistical attacks, phantom terms are added to the index vector to blind search results. Owing to our special tree-based index structure, the proposed scheme achieves sub-linear search time and handles the deletion and insertion of documents flexibly. Extensive experiments are conducted to demonstrate the efficiency of the proposed scheme. IEEE 2015
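For intuition, the vector space model with TF-IDF reduces ranked keyword search to scoring documents by weighted term matches. The plaintext-side Python sketch below shows only the ranking logic; the actual scheme encrypts these index and query vectors with the secure kNN algorithm, which is not shown, and the tiny corpus is invented for illustration.

import math
from collections import Counter

docs = {
    "d1": "cloud data encryption search",
    "d2": "secure cloud storage audit",
    "d3": "keyword search over encrypted cloud data",
}

def tfidf_rank(query_terms, docs):
    n = len(docs)
    # document frequency of each term across the corpus
    df = Counter(t for text in docs.values() for t in set(text.split()))
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        scores[doc_id] = sum(
            tf[t] * math.log(1 + n / df[t]) for t in query_terms if t in tf
        )
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(tfidf_rank(["encrypted", "search"], docs))   # d3 ranks first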
TTA-JC-C1546 A Universal Fairness Evaluation Framework for Resource Allocation in Cloud Computing In cloud computing, fairness is one of the most significant indicators for evaluating resource allocation algorithms; it reflects whether each user receives as much as every other user with the same bottleneck. However, evaluating how fair an allocation algorithm is remains an open issue. In this paper, we propose the Dynamic Evaluation Framework for Fairness (DEFF), a framework to evaluate the fairness of a resource allocation algorithm. In our framework, two sub-models, the Dynamic Demand Model (DDM) and the Dynamic Node Model (DNM), are proposed to describe the dynamic characteristics of resource demand and the number of computing nodes in a cloud computing environment. Combining Fairness on Dominant Shares with the two sub-models above, we finally obtain DEFF. In our experiments, we adopt several typical resource allocation algorithms to demonstrate the effectiveness of fairness evaluation using the DEFF framework. IEEE 2015
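Although the abstract does not spell out DEFF's exact metric, fairness on dominant shares can be made concrete. The Python sketch below computes each user's dominant share, as in dominant resource fairness, and then applies Jain's index as one illustrative fairness measure; the capacities and allocations are made-up numbers, not anything from the paper.

capacity = {"cpu": 100.0, "mem": 200.0}          # cluster totals (assumed units)
alloc = {                                         # per-user allocations
    "u1": {"cpu": 20.0, "mem": 10.0},
    "u2": {"cpu": 10.0, "mem": 80.0},
}

def dominant_share(user_alloc):
    # a user's dominant share is the largest fraction it holds of any resource
    return max(user_alloc[r] / capacity[r] for r in capacity)

shares = [dominant_share(a) for a in alloc.values()]

# Jain's index: 1.0 means perfectly equal dominant shares
jain = sum(shares) ** 2 / (len(shares) * sum(s * s for s in shares))
print(shares, round(jain, 3))                     # [0.2, 0.4] -> 0.9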
TTA-JC-C1547 Aggressive Resource Provisioning for Ensuring QoS in Virtualized Environments Elasticity has become the elemental feature of cloud computing, as it enables virtual machine instances to be added or removed dynamically as the workload changes. However, effective virtualized resource management is still one of the most challenging tasks. When the workload of a service increases rapidly, existing approaches cannot respond to the growing performance requirement efficiently, because of either inaccurate adaptation decisions or a slow adjustment process, both of which may result in insufficient resource provisioning. As a consequence, the Quality of Service (QoS) of the hosted applications may degrade and the Service Level Objective (SLO) will thus be violated. In this paper, we introduce SPRNT, a novel resource management framework, to ensure high-level QoS in the cloud computing system. SPRNT utilizes an aggressive resource provisioning strategy, which drives SPRNT to substantially increase the resource allocation in each adaptation cycle when the workload increases. This strategy first provisions resources that are possibly more than the actual demand, and then reduces the over-provisioned resources if needed. By applying the aggressive strategy, SPRNT satisfies the increasing performance requirement in the first place, so that the QoS can be kept at a high level. The experimental results show that SPRNT achieves up to 7.7× speedup in adaptation time compared with existing efforts. By enabling quick adaptation, SPRNT keeps the SLO violation rate as low as 1.3 percent even when dealing with rapidly increasing workloads. IEEE 2015

TTA-JC-C1548 An Intelligent Economic Approach for Dynamic Resource Allocation in Cloud Services With Inter-Cloud, distributed clouds, and open cloud exchanges (OCX) emerging, a comprehensive resource allocation approach is fundamental to a highly competitive cloud market. Oriented towards infrastructure as a service (IaaS), an intelligent economic approach for dynamic resource allocation (IEDA) is proposed, with an improved combinatorial double auction protocol devised to enable various kinds of resources to be traded among multiple consumers and multiple providers while also enabling task partitioning among multiple providers. To make bidding and asking reasonable in each round of the auction and to determine eligible transaction relationships among providers and consumers, a price formation mechanism is proposed, which consists of a back propagation neural network (BPNN) based price prediction algorithm and a price matching algorithm. A reputation system is proposed and integrated to exclude dishonest participants from the cloud market. The winner determination problem (WDP) is solved by an improved paddy field algorithm (PFA). Simulation results have shown that IEDA can not only help maximize market surplus and
surplus strength but also encourage participants to be honest. IEEE 2015

TTA-JC-C1549 ANGEL - Agent-Based Scheduling for Real-Time Tasks in Virtualized Clouds The success of cloud computing makes an increasing number of real-time applications, such as signal processing and weather forecasting, run in the cloud. Meanwhile, scheduling of real-time tasks plays an essential role in helping a cloud provider maintain its quality of service and enhance the system's performance. In this paper, we devise a novel agent-based scheduling mechanism in the cloud computing environment to allocate real-time tasks and dynamically provision resources. In contrast to traditional contract net protocols, we employ a bidirectional announcement-bidding mechanism whose collaborative process consists of three phases, i.e., a basic matching phase, a forward announcement-bidding phase, and a backward announcement-bidding phase. Moreover, elasticity is explicitly considered during scheduling by dynamically adding virtual machines to improve schedulability. Furthermore, we design calculation rules for the bidding values in both the forward and backward announcement-bidding phases, and two heuristics for selecting contractors. On the basis of the bidirectional announcement-bidding mechanism, we propose an agent-based dynamic scheduling algorithm named ANGEL for real-time, independent, and aperiodic tasks in clouds. Extensive experiments are conducted on the CloudSim platform by injecting random synthetic workloads and workloads from the last version of the Google cloud trace logs to evaluate the performance of ANGEL. The experimental results indicate that ANGEL can efficiently solve the real-time task scheduling problem in virtualized clouds. IEEE 2015

TTA-JC-C1550 Attribute-based Access Control with Constant-size Ciphertext in Cloud Computing With the popularity of cloud computing, there have been increasing concerns about its security and privacy. Since
the cloud computing environment is distributed and untrusted, data owners have to encrypt outsourced data to enforce confidentiality. Therefore, how to achieve practicable access control of encrypted data in an untrusted environment is an urgent issue that needs to be solved. Attribute-Based Encryption (ABE) is a promising scheme suitable for access control in cloud storage systems. This paper proposes a hierarchical attribute-based access control scheme with constant-size ciphertext. The scheme is efficient because the ciphertext length and the number of bilinear pairing evaluations are fixed constants. Its computation cost in the encryption and decryption algorithms is low. Moreover, the hierarchical authorization structure of our scheme reduces the burden and risk of a single-authority scenario. We prove that the scheme is CCA2-secure under the decisional q-Bilinear Diffie-Hellman Exponent assumption. In addition, we implement our scheme and analyze its performance. The analysis results show that the proposed scheme is efficient, scalable, and fine-grained in dealing with access control for outsourced data in cloud computing. IEEE 2015

TTA-JC-C1551 Automatic Memory Control of Multiple Virtual Machines on a Consolidated Server Through virtualization, multiple virtual machines can coexist and operate on one physical machine. When virtual machines (VMs) compete for memory, the performance of applications deteriorates, especially that of memory-intensive applications. In this study, we aim to optimize memory control techniques using a balloon driver for server consolidation. Our contribution is three-fold: (1) We design and implement an automatic memory control system based on a Xen balloon driver. To avoid interference with VM monitor operation, our system works in user mode; therefore, the system is easily applied in practice. (2) We design an adaptive global-scheduling algorithm to regulate memory. This algorithm is based on a dynamic baseline, which can
adjust memory allocation according to the memory used by the VMs. (3) We evaluate our optimized solution in a real environment with 10 VMs and well-known benchmarks (the DaCapo and Phoronix Test Suites). Experiments confirm that our system can improve the performance of memory-intensive and disk-intensive applications by up to 500 percent and 300 percent, respectively. The toolkit has been released for free download as GNU General Public License v3 software. IEEE 2015

TTA-JC-C1552 Circuit Ciphertext-policy Attribute-based Hybrid Encryption with Verifiable Delegation in Cloud Computing In the cloud, to achieve access control and keep data confidential, data owners can adopt attribute-based encryption to encrypt the stored data. Users with limited computing power are, however, more likely to delegate most of the decryption task to the cloud servers to reduce the computing cost. As a result, attribute-based encryption with delegation emerges. Still, there are caveats and questions remaining in the previous relevant works. For instance, during the delegation, the cloud servers could tamper with or replace the delegated ciphertext and return a forged computing result with malicious intent. They may also cheat eligible users by telling them that they are ineligible, for the purpose of cost saving. Furthermore, during encryption, the access policies may not be flexible enough. Since policies for general circuits achieve the strongest form of access control, our work considers a construction for circuit ciphertext-policy attribute-based hybrid encryption with verifiable delegation. In such a system, combined with verifiable computation and an encrypt-then-MAC mechanism, the data confidentiality, fine-grained access control, and correctness of the delegated computing results are well guaranteed at the same time. Besides, our scheme achieves security against chosen-plaintext attacks under the k-multilinear Decisional Diffie-Hellman assumption. Moreover, an extensive simulation campaign confirms the feasibility and efficiency of the proposed solution. IEEE 2015
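The encrypt-then-MAC mechanism referred to above is a standard composition and easy to demonstrate. In the Python sketch below, the ciphertext is authenticated with HMAC-SHA256 so any tampering by the delegated server is detected before decryption; the XOR keystream cipher is an illustrative stand-in, and the ABE and verifiable-computation layers of the actual scheme are omitted.

import hashlib
import hmac
import os

def _stream(key, n):
    # SHA-256 counter keystream; illustrative only
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def encrypt_then_mac(enc_key, mac_key, plaintext):
    cipher = bytes(a ^ b for a, b in zip(plaintext, _stream(enc_key, len(plaintext))))
    tag = hmac.new(mac_key, cipher, hashlib.sha256).digest()   # MAC the ciphertext
    return cipher, tag

def verify_and_decrypt(enc_key, mac_key, cipher, tag):
    expected = hmac.new(mac_key, cipher, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was tampered with by the server or in transit")
    return bytes(a ^ b for a, b in zip(cipher, _stream(enc_key, len(cipher))))

ek, mk = os.urandom(32), os.urandom(32)
c, t = encrypt_then_mac(ek, mk, b"delegated message")
assert verify_and_decrypt(ek, mk, c, t) == b"delegated message"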
TTA-JC-C1553 CloudArmor - Supporting Reputation-based Trust Management for Cloud Services Trust management is one of the most challenging issues for the adoption and growth of cloud computing. The highly dynamic, distributed, and non-transparent nature of cloud services introduces several challenging issues such as privacy, security, and availability. Preserving consumers' privacy is not an easy task due to the sensitive information involved in the interactions between consumers and the trust management service. Protecting cloud services against their malicious users (e.g., users who give misleading feedback to disadvantage a particular cloud service) is a difficult problem. Guaranteeing the availability of the trust management service is another significant challenge because of the dynamic nature of cloud environments. In this article, we describe the design and implementation of CloudArmor, a reputation-based trust management framework that provides a set of functionalities to deliver Trust as a Service (TaaS), which includes i) a novel protocol to prove the credibility of trust feedbacks and preserve users' privacy, ii) an adaptive and robust credibility model for measuring the credibility of trust feedbacks to protect cloud services from malicious users and to compare the trustworthiness of cloud services, and iii) an availability model to manage the availability of the decentralized implementation of the trust management service. The feasibility and benefits of our approach have been validated by a prototype and experimental studies using a collection of real-world trust feedbacks on cloud services. IEEE 2015

TTA-JC-C1554 Cost-Effective Authentic and Anonymous Data Sharing with Forward Security Data sharing has never been easier with the advances of cloud computing, and an accurate
analysis of the shared data provides an array of benefits to both society and individuals. Data sharing with a large number of participants must take into account several issues, including efficiency, data integrity, and the privacy of the data owner. Ring signature is a promising candidate for constructing an anonymous and authentic data sharing system, as it allows a data owner to anonymously authenticate data that can be put into the cloud for storage or analysis. Yet the costly certificate verification in the traditional public key infrastructure (PKI) setting becomes a bottleneck for this solution to be scalable. Identity-based (ID-based) ring signature, which eliminates the process of certificate verification, can be used instead. In this paper, we further enhance the security of ID-based ring signature by providing forward security: if a secret key of any user has been compromised, all previously generated signatures that include this user still remain valid. This property is especially important to any large-scale data sharing system, as it is impossible to ask all data owners to re-authenticate their data even if the secret key of one single user has been compromised. We provide a concrete and efficient instantiation of our scheme, prove its security, and provide an implementation to show its practicality. IEEE 2015

TTA-JC-C1555 DaSCE - Data Security for Cloud Environment with Semi-Trusted Third Party Off-site data storage is an application of the cloud that relieves customers from focusing on data storage systems. However, outsourcing data to a third-party administrative control entails serious security concerns. Data leakage may occur due to attacks by other users and machines in the cloud. The wholesale of data by the cloud service provider is yet another problem faced in the cloud environment. Consequently, a high level of security is required. In this paper, we propose Data Security for Cloud Environment with Semi-Trusted Third Party (DaSCE), a data security system that provides (a) key management, (b) access control, and (c) file assured deletion. DaSCE utilizes Shamir's (k, n) threshold scheme to manage the keys, where k out of n shares are required to generate the key. We use multiple key managers, each hosting one share of the key. Multiple key managers avoid a single point of failure for the cryptographic keys. We (a) implement a working prototype of DaSCE and evaluate its performance based on the time consumed during various operations, (b) formally model and analyze the working of DaSCE using High Level Petri nets (HLPN), and (c) verify the working of DaSCE using the Satisfiability Modulo Theories Library (SMT-Lib) and the Z3 solver. The results reveal that DaSCE can be effectively used to secure outsourced data by employing key management, access control, and file assured deletion. IEEE 2015
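Shamir's (k, n) threshold scheme, on which DaSCE's key management rests, can be sketched directly. The Python below splits a secret into n shares such that any k of them reconstruct it by Lagrange interpolation over a prime field; the field size and the example secret are illustrative choices, not DaSCE parameters.

import random

P = 2 ** 127 - 1   # a Mersenne prime field (illustrative size)

def split(secret, k, n):
    # random polynomial of degree k-1 with the secret as constant term
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    # Lagrange interpolation at x = 0 recovers the secret
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, k=3, n=5)
assert combine(shares[:3]) == 123456789   # any 3 of the 5 shares suffice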
TTA-JC-C1556 Discover the Expert - Context-Adaptive Expert Selection for Medical Diagnosis In this paper, we propose an expert selection system that learns online the best expert to assign to each patient depending on the patient's context. In general, the context can include an enormous number and variety of information items related to the patient's health condition, age, gender, previous drug doses, and so forth, but the most relevant information is embedded in only a few contexts. If these most relevant contexts were known in advance, learning would be relatively simple, but they are not. Moreover, the relevant contexts may differ for different health conditions. To address these challenges, we develop a new class of algorithms aimed at discovering the most relevant contexts and the best clinic and expert to use for making a diagnosis given a patient's contexts. We prove that as the number of patients grows, the proposed context-adaptive algorithm will discover the optimal expert to select for patients with a specific context. Moreover, the algorithm also provides confidence bounds on the diagnostic accuracy of the expert it selects, which can be considered by the primary care physician before making the final decision. While our algorithm is general and can be applied in numerous medical scenarios, we illustrate its functionality and performance by applying it to a real-world breast cancer diagnosis data set. Finally, while the application considered in this paper is medical diagnosis, our proposed algorithm can be applied in other environments where expertise needs to be discovered. IEEE 2015
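The online learning loop behind such context-adaptive selection can be approximated with a simple bandit-style sketch. The Python below uses epsilon-greedy selection over a fixed, discretized context, which is only a crude stand-in for the paper's algorithm (the paper adaptively partitions the context space and maintains confidence bounds); the expert names and context values are hypothetical.

import random
from collections import defaultdict

EXPERTS = ["clinic_a", "clinic_b", "clinic_c"]
stats = defaultdict(lambda: [0, 0])   # (context, expert) -> [successes, trials]

def select_expert(context, eps=0.1):
    # explore with probability eps, otherwise exploit best empirical accuracy
    if random.random() < eps:
        return random.choice(EXPERTS)
    def mean(e):
        s, t = stats[(context, e)]
        return s / t if t else 0.5    # optimistic prior for unseen pairs
    return max(EXPERTS, key=mean)

def update(context, expert, correct):
    s, t = stats[(context, expert)]
    stats[(context, expert)] = [s + int(correct), t + 1]

ctx = ("age_40_60", "female")          # a coarse, discretized patient context
e = select_expert(ctx)
update(ctx, e, correct=True)           # feedback after the diagnosis is confirmed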
TTA-JC-C1557 Distributed Denial of Service Attacks in Software-Defined Networking with Cloud Computing Although software-defined networking (SDN) brings numerous benefits by decoupling the control plane from the data plane, there is a contradictory relationship between SDN and distributed denial-of-service (DDoS) attacks. On one hand, the capabilities of SDN make it easy to detect and react to DDoS attacks. On the other hand, the separation of the control plane from the data plane in SDN introduces new attacks; consequently, SDN itself may become a target of DDoS attacks. In this paper, we first discuss the new trends and characteristics of DDoS attacks in cloud computing environments. We show that SDN brings a new chance to defeat DDoS attacks in cloud computing environments, and we summarize the features of SDN that help defeat DDoS attacks. Then we review the studies on launching DDoS attacks on SDN and on methods against DDoS attacks in SDN. In addition, we discuss a number of challenges that need to be addressed to mitigate DDoS attacks in SDN with cloud computing. This work can help in understanding how to make full use of SDN's advantages to defeat DDoS attacks in cloud computing environments and how to prevent SDN itself from becoming a victim of DDoS attacks. IEEE 2015

TTA-JC-C1558 Mathematical Programming Approach for Revenue Maximization in Cloud Federations This paper assesses the benefits of cloud federation for cloud providers. Outsourcing and insourcing are explored as means to maximize the revenues of the
providers involved in the federation. An exact method using a linear integer program is proposed to optimize the partitioning of the incoming workload across the federation members. A pricing model is suggested to enable providers to set their offers dynamically and achieve the highest revenues. The conditions leading to the highest gains are identified, and the benefits of cloud federation are quantified. IEEE 2015

TTA-JC-C1559 My Privacy My Decision - Control of Photo Sharing on Online Social Networks Photo sharing is an attractive feature that popularizes Online Social Networks (OSNs). Unfortunately, it may leak users' privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study the scenario in which a user shares a photo containing individuals other than himself/herself (termed a co-photo for short). To prevent possible privacy leakage from a photo, we design a mechanism to enable each individual in a photo to be aware of the posting activity and to participate in the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognize everyone in the photo. However, more demanding privacy settings may limit the number of photos publicly available to train the FR system. To deal with this dilemma, our mechanism attempts to utilize users' private photos to design a personalized FR system specifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus-based method to reduce the computational complexity and protect the private training set. We show that our system is superior to other possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof-of-concept Android application on Facebook's platform. IEEE 2015

TTA-JC-C1560 OPoR - Enabling Proof of Retrievability in Cloud Computing with Resource-Constrained Devices Cloud computing moves the application software and databases to large centralized
data centers, where the management of the data and services may not be fully trustworthy. In this work, we study the problem of ensuring the integrity of data storage in cloud computing. To reduce the computational cost at the user side during the integrity verification of their data, the notion of public verifiability has been proposed. However, the challenge is that the computational burden is too heavy for users with resource-constrained devices to compute the public authentication tags of file blocks. To tackle the challenge, we propose OPoR, a new cloud storage scheme involving a cloud storage server and a cloud audit server, where the latter is assumed to be semi-honest. In particular, we consider the task of allowing the cloud audit server, on behalf of the cloud users, to pre-process the data before uploading to the cloud storage server and to later verify the data integrity. OPoR outsources and offloads the heavy computation of tag generation to the cloud audit server and eliminates user involvement in the auditing and pre-processing phases. Furthermore, we strengthen the proof of retrievability (PoR) model to support dynamic data operations, as well as to ensure security against reset attacks launched by the cloud storage server in the upload phase. IEEE 2015

TTA-JC-C1561 Performing Initiative Data Prefetching in Distributed File Systems for Cloud Computing This paper presents an initiative data prefetching scheme on the storage servers in distributed file systems for cloud computing. In this prefetching technique, the client machines are not substantially involved in the process of data prefetching; instead, the storage servers directly prefetch data after analyzing the history of disk I/O access events, and then send the prefetched data to the relevant client machines proactively. To put this technique to work, the information about client nodes is piggybacked onto the real client I/O requests
and then forwarded to the relevant storage server. Next, two prediction algorithms are proposed to forecast future block access operations and direct what data should be fetched on the storage servers in advance. Finally, the prefetched data can be pushed to the relevant client machine from the storage server. Through a series of evaluation experiments with a collection of application benchmarks, we have demonstrated that our initiative prefetching technique can benefit distributed file systems for cloud environments by achieving better I/O performance. In particular, configuration-limited client machines in the cloud are not responsible for predicting I/O access operations, which contributes to better system performance on them. IEEE 2015
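One simple way to realize the server-side prediction step is a first-order Markov model over observed block accesses. The Python sketch below records successor frequencies from the I/O trace and predicts the most likely next block to prefetch; it is an illustrative guess at such a predictor, not the paper's two algorithms.

from collections import defaultdict, Counter

transitions = defaultdict(Counter)   # block -> Counter of successor blocks
last_block = None

def record_access(block):
    """Feed the observed disk I/O trace, one block address at a time."""
    global last_block
    if last_block is not None:
        transitions[last_block][block] += 1
    last_block = block

def predict_next(block):
    """Most frequent successor seen so far, or None if unknown."""
    successors = transitions[block]
    return successors.most_common(1)[0][0] if successors else None

for b in [10, 11, 42, 10, 11, 42, 10, 11]:
    record_access(b)
print(predict_next(11))   # -> 42: the server would push block 42 to the client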
TTA-JC-C1562 Privacy-Preserving Multikeyword Similarity Search Over Outsourced Cloud Data The amount of data generated by individuals and enterprises is rapidly increasing. With the emerging cloud computing paradigm, data and the corresponding complex management tasks can be outsourced to the cloud for management flexibility and cost savings. Unfortunately, as the data could be sensitive, direct data outsourcing raises the problem of privacy leakage. Encryption can be applied before outsourcing, with the requirement that the operations can still be accomplished by the cloud. We consider multi-keyword similarity search over outsourced cloud data. In particular, considering text data only, multiple keywords are specified by the user, and the cloud returns the files containing more than a threshold number of the input keywords or similar keywords, where similarity is defined according to the edit distance metric. We propose three solutions, in which a blind signature provides user access privacy and a novel use of the Bloom filter's bit pattern speeds up the search task at the cloud side. Our final design is secure against insider threats and efficient in terms of search time at the cloud side. Performance evaluation and analysis are used to demonstrate the practicality of our proposed solutions. IEEE 2015
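A Bloom filter, whose bit pattern the scheme reuses to speed up the search, is compact to sketch. The Python below inserts keywords by setting k hash-derived bit positions and answers membership queries with no false negatives and a small false-positive probability; the sizes m and k are arbitrary illustrative choices.

import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k, self.bits = m, k, 0   # bits kept in one big integer

    def _positions(self, word):
        # derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits |= 1 << p

    def might_contain(self, word):
        return all(self.bits >> p & 1 for p in self._positions(word))

bf = BloomFilter()
for kw in ["cloud", "privacy", "search"]:
    bf.add(kw)
assert bf.might_contain("privacy")
assert not bf.might_contain("unlikely-keyword")   # false positives possible, rare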
TTA-JC-C1563 Provable Multicopy Dynamic Data Possession in Cloud Computing Systems More and more organizations are opting to outsource data to remote cloud service providers (CSPs). Customers can rent the CSP's storage infrastructure to store and retrieve an almost unlimited amount of data by paying fees metered in gigabytes per month. For an increased level of scalability, availability, and durability, some customers may want their data to be replicated on multiple servers across multiple data centers. The more copies the CSP is asked to store, the higher the fees the customers are charged. Therefore, customers need a strong guarantee that the CSP is storing all the data copies agreed upon in the service contract, and that all these copies are consistent with the most recent modifications issued by the customers. In this paper, we propose a map-based provable multicopy dynamic data possession (MB-PMDDP) scheme that has the following features: 1) it provides evidence to the customers that the CSP is not cheating by storing fewer copies; 2) it supports outsourcing of dynamic data, i.e., block-level operations such as modification, insertion, deletion, and append; and 3) it allows authorized users to seamlessly access the file copies stored by the CSP. We give a comparative analysis of the proposed MB-PMDDP scheme with a reference model obtained by extending existing provable possession schemes for dynamic single-copy data. The theoretical analysis is validated through experimental results on a commercial cloud platform. In addition, we show security against colluding servers and discuss how to identify corrupted copies by slightly modifying the proposed scheme. IEEE 2015

TTA-JC-C1564 SAE - Toward Efficient Cloud Data Analysis Service for Large-Scale Social Networks Social network analysis is used to extract features of human communities and proves to be very instrumental in a variety of scientific domains. The dataset of a social network is often so large that a cloud data analysis service, in which the computation is performed on a parallel platform in the cloud, becomes a good choice for researchers not experienced in parallel programming. In the cloud, a primary challenge to efficient data analysis is the computation and communication skew (i.e., load imbalance) among computers caused by humanity's group behavior (e.g., the bandwagon effect). Traditional load balancing techniques either require significant effort to re-balance loads on the nodes or cannot cope well with stragglers. In this paper, we propose a general straggler-aware execution approach, SAE, to support the analysis service in the cloud. It offers a novel computational decomposition method that factors straggling feature extraction processes into more fine-grained sub-processes, which are then distributed over clusters of computers for parallel execution. Experimental results show that SAE can speed up the analysis by up to 1.77 times compared with state-of-the-art solutions. IEEE 2015

TTA-JC-C1565 Secure Cloud Storage Meets with Secure Network Coding This paper reveals an intrinsic relationship between secure cloud storage and secure network coding for the first time. Secure cloud storage was proposed only recently, while secure network coding has been studied for more than ten years. Although the two areas are quite different in nature and are studied independently, we show how to construct a secure cloud storage protocol given any secure network coding protocol. This gives rise to a systematic way of constructing secure cloud storage protocols. Our construction is secure under a definition that captures the real-world usage of cloud storage. Furthermore, we propose two specific secure cloud storage protocols based on two recent secure network coding protocols.
In particular, we obtain the first publicly verifiable secure cloud storage protocol in the standard model. We also enhance the proposed generic construction to support user anonymity and third-party public auditing, both of which have received considerable attention recently. Finally, we prototype the newly proposed protocol and evaluate its performance. Experimental results validate the effectiveness of the protocol. IEEE 2015

TTA-JC-C1566 SeDaSC - Secure Data Sharing in Clouds Cloud storage is an application of clouds that liberates organizations from establishing in-house data storage systems. However, cloud storage gives rise to security concerns. In the case of group-shared data, the data face both cloud-specific and conventional insider threats. Secure data sharing among a group that counters the insider threats of legitimate yet malicious users is an important research issue. In this paper, we propose the Secure Data Sharing in Clouds (SeDaSC) methodology, which provides: 1) data confidentiality and integrity; 2) access control; 3) data sharing (forwarding) without using compute-intensive re-encryption; 4) insider threat security; and 5) forward and backward access control. The SeDaSC methodology encrypts a file with a single encryption key. Two different key shares are generated for each user, with the user only getting one share. The possession of a single share of the key allows the SeDaSC methodology to counter insider threats. The other key share is stored by a trusted third party, called the cryptographic server. The SeDaSC methodology is applicable to conventional and mobile cloud computing environments. We implement a working prototype of the SeDaSC methodology and evaluate its performance based on the time consumed during various operations. We formally verify the working of SeDaSC by using High Level Petri nets, the Satisfiability Modulo Theories Library, and the Z3 solver. The results are encouraging and show that SeDaSC has the potential to be effectively used for secure data sharing in the cloud. IEEE 2015
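The two-share key idea at the heart of SeDaSC can be shown with a minimal XOR secret split. In the Python sketch below, neither share alone reveals anything about the file-encryption key, and both are needed to recombine it; the actual methodology layers access control and a trusted cryptographic server on top, which this omits.

import os

def split_key(key):
    """Split a symmetric key into two XOR shares; one share alone reveals nothing."""
    user_share = os.urandom(len(key))
    server_share = bytes(a ^ b for a, b in zip(key, user_share))
    return user_share, server_share

def recombine(user_share, server_share):
    return bytes(a ^ b for a, b in zip(user_share, server_share))

key = os.urandom(32)           # the single file-encryption key
u, s = split_key(key)          # user keeps u; the cryptographic server keeps s
assert recombine(u, s) == key  # both shares are needed to decrypt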
TTA-JC-C1567 Shared Authority Based Privacy-Preserving Authentication Protocol in Cloud Computing Cloud computing is an emerging data-interactive paradigm in which users' data are remotely stored on an online cloud server. Cloud services make it convenient for users to enjoy on-demand cloud applications without considering local infrastructure limitations. During data accessing, different users may be in a collaborative relationship, and data sharing thus becomes significant for achieving productive benefits. The existing security solutions mainly focus on authentication to ensure that a user's private data cannot be illegally accessed, but they neglect a subtle privacy issue that arises when a user challenges the cloud server to request other users' data for sharing: the access request itself may reveal the user's privacy, regardless of whether or not the data access permissions are obtained. In this paper, we propose a shared authority based privacy-preserving authentication protocol (SAPA) to address the above privacy issue for cloud storage. In SAPA, 1) shared access authority is achieved by an anonymous access request matching mechanism with security and privacy considerations (e.g., authentication, data anonymity, user privacy, and forward security); 2) attribute-based access control is adopted to ensure that a user can only access its own data fields; and 3) proxy re-encryption is applied to provide data sharing among multiple users. Meanwhile, a universal composability (UC) model is established to prove the design correctness of SAPA. The results indicate that the proposed protocol is attractive for multi-user collaborative cloud applications. IEEE 2015

TTA-JC-C1568 Social Recommendation with Cross-Domain Transferable Knowledge Recommender systems can suffer from data sparsity and cold start issues.
However, social networks, which enable users to build relationships and create different types of items, present an unprecedented opportunity to alleviate these issues. In this paper, we represent a social network as a star-structured hybrid graph centered on a social domain, which connects with other item domains. With this innovative representation, useful knowledge from an auxiliary domain can be transferred through the social domain to a target domain. Various factors of item transferability, including popularity and behavioral consistency, are determined. We propose a novel Hybrid Random Walk (HRW) method, which incorporates such factors, to select transferable items in auxiliary domains, bridge cross-domain knowledge with the social domain, and accurately predict user-item links in a target domain. Extensive experiments on a real social dataset demonstrate that HRW significantly outperforms existing approaches. IEEE 2015

TTA-JC-C1569 TMACS - A Robust and Verifiable Threshold Multi-Authority Access Control System in Public Cloud Storage Attribute-based Encryption (ABE) is regarded as a promising cryptographic tool for guaranteeing data owners' direct control over their data in public cloud storage. The earlier ABE schemes involve only one authority to maintain the whole attribute set, which can create a single-point bottleneck for both security and performance. Subsequently, some multi-authority schemes were proposed, in which multiple authorities separately maintain disjoint attribute subsets. However, the single-point bottleneck problem remains unsolved. In this paper, from another perspective, we construct a threshold multi-authority CP-ABE access control scheme for public cloud storage, named TMACS, in which multiple authorities jointly manage a uniform attribute set. In TMACS, by taking advantage of (t, n) threshold secret sharing, the master key can be shared among multiple authorities, and a legal user can generate his or her secret key by interacting with any t
authorities. Security and performance analysis shows that TMACS is not only verifiably secure when fewer than t authorities are compromised, but also robust when no fewer than t authorities are alive in the system. Furthermore, by efficiently combining the traditional multi-authority scheme with TMACS, we construct a hybrid scheme that satisfies the scenario of attributes coming from different authorities while achieving both security and system-level robustness. IEEE 2015

TTA-JC-C1570 Towards Privacy Preserving Publishing of Set-valued Data on Hybrid Cloud Storage as a service has become an important paradigm in cloud computing for its great flexibility and economic savings. However, the development is hampered by data privacy concerns: data owners no longer physically possess the storage of their data. In this work, we study the issue of privacy-preserving set-valued data publishing. Existing data privacy-preserving techniques (such as encryption, suppression, and generalization) are not applicable in many real scenarios, since they would incur large query overhead or high information loss. Motivated by this observation, we present a suite of new techniques that make privacy-aware set-valued data publishing feasible on a hybrid cloud. In the data publishing phase, we propose a data partition technique, named extended quasi-identifier partitioning (EQI-partitioning), which disassociates record terms that participate in identifying combinations. In this way, the cloud server cannot, with high probability, associate a record with rare term combinations. We prove the privacy guarantee of our mechanism. In the data querying phase, we adopt an interactive differential privacy strategy to resist privacy breaches from statistical
queries. We finally evaluate its performance using real-life data sets on our cloud test-bed. Our extensive experiments demonstrate the validity and practicality of the proposed scheme. IEEE 2015

TTA-JC-C1571 Towards Privacy-Preserving Storage and Retrieval in Multiple Clouds Cloud computing is growing exponentially, and there are now hundreds of cloud service providers (CSPs) of various sizes. While cloud consumers may enjoy cheaper data storage and computation in this multi-cloud environment, they also face more complicated reliability issues and privacy preservation problems for their outsourced data. Although searchable encryption allows users to encrypt their stored data while preserving some search capabilities, few efforts have considered the reliability of searchable encrypted data outsourced to the clouds. In this paper, we propose a privacy-preserving Storage and Retrieval (STRE) mechanism that not only ensures security and privacy but also provides reliability guarantees for the outsourced searchable encrypted data. The STRE mechanism enables cloud users to distribute and search their encrypted data across multiple independent clouds managed by different CSPs, and it is robust even when a certain number of CSPs crash. Besides reliability, STRE also offers the benefit of partially hidden search patterns. We evaluate the STRE mechanism on Amazon EC2 using a real-world dataset, and the results demonstrate both the effectiveness and the efficiency of our approach. IEEE 2015

TTA-JC-C1572 Trust Enhanced Cryptographic Role-based Access Control for Secure Cloud Data Storage Cloud data storage has provided significant benefits by allowing users to store massive amounts of data on demand in a cost-effective manner. To protect the privacy of data stored in the cloud, cryptographic role-based access control (RBAC) schemes have been developed to ensure that data can only be accessed by those who are allowed by access policies. However, these cryptographic approaches do not address
the issue of trust. In this paper, we propose trust models to reason about and improve the security of data stored in cloud storage systems that use cryptographic RBAC schemes. The trust models provide an approach for the owners and roles to determine the trustworthiness of individual roles and users, respectively, in the RBAC system. The proposed trust models take role inheritance and hierarchy into account in the evaluation of the trustworthiness of roles. We present a design of a trust-based cloud storage system that shows how the trust models can be integrated into a system that uses cryptographic RBAC schemes. We have also considered practical application scenarios and illustrated how the trust evaluations can be used to reduce risks and enhance the quality of decision making by data owners and roles of the cloud storage service. IEEE 2015

TTA-JC-C1573 Using Ant Colony System to Consolidate VMs for Green Cloud Computing The high energy consumption of cloud data centers is a matter of great concern. Dynamic consolidation of Virtual Machines (VMs) presents a significant opportunity to save energy in data centers. A VM consolidation approach uses live migration of VMs so that some of the under-loaded Physical Machines (PMs) can be switched off or put into a low-power mode. On the other hand, achieving the desired level of Quality of Service (QoS) between cloud providers and their users is critical. Therefore, the main challenge is to reduce the energy consumption of data centers while satisfying QoS requirements. In this paper, we present a distributed system architecture that performs dynamic VM consolidation to reduce the energy consumption of cloud data centers while maintaining the desired QoS. Since the VM consolidation problem is strictly NP-hard, we use an online optimization metaheuristic algorithm called Ant Colony System (ACS). The proposed ACS-based VM Consolidation (ACS-VMC) approach finds a near-optimal solution based on a specified objective function. Experimental results on real workload traces show that ACS-VMC reduces energy consumption while maintaining the required performance levels in a cloud data center. It outperforms existing VM consolidation approaches in terms of energy consumption, number of VM migrations, and QoS requirements concerning performance. IEEE 2015
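For a flavor of how ACS makes placement decisions, the sketch below implements the pseudo-random proportional rule: with probability q0 an ant exploits the best-scoring feasible host, otherwise it explores probabilistically in proportion to pheromone and a heuristic that favors tight fits. The capacities, heuristic, and parameters are illustrative assumptions, not ACS-VMC's actual objective.

import random

def choose_host(vm_demand, hosts, pheromone, alpha=1.0, beta=2.0, q0=0.9):
    """Pick a PM for one VM using the ACS pseudo-random proportional rule."""
    feasible = [h for h, free in hosts.items() if free >= vm_demand]
    # heuristic: prefer tighter fits so under-loaded PMs can be emptied
    def score(h):
        return (pheromone[h] ** alpha) * ((vm_demand / hosts[h]) ** beta)
    if random.random() < q0:                      # exploitation
        return max(feasible, key=score)
    total = sum(score(h) for h in feasible)       # biased exploration
    r, acc = random.uniform(0, total), 0.0
    for h in feasible:
        acc += score(h)
        if acc >= r:
            return h

hosts = {"pm1": 8.0, "pm2": 3.0, "pm3": 6.0}      # free CPU capacity (assumed)
pheromone = {h: 1.0 for h in hosts}
print(choose_host(2.5, hosts, pheromone))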
TTA-JC-C1574 Using Virtual Machine Allocation Policies to Defend against Co-resident Attacks in Cloud Computing Cloud computing enables users to consume various IT resources in an on-demand manner and with low management overhead. However, customers can face new security risks when they use cloud computing platforms. In this paper, we focus on one such threat: the co-resident attack, where malicious users build side channels and extract private information from virtual machines co-located on the same server. Previous works mainly attempt to address the problem by eliminating side channels. However, most of these methods are not suitable for immediate deployment due to the required modifications to current cloud platforms. We choose to solve the problem from a different perspective, by studying how to improve the virtual machine allocation policy so that it is difficult for attackers to co-locate with their targets. Specifically, we (1) define security metrics for assessing the attack; (2) model these metrics and compare the difficulty of achieving co-residence under three commonly used policies; (3) design a new policy that not only mitigates the threat of attack but also satisfies the requirements for workload balance and low power consumption; and (4) implement, test, and prove the effectiveness of the policy on the popular open-source platform OpenStack. IEEE 2015

DOMAIN : BIG DATA

TTA-JB-C1501 FastRAQ - A Fast Approach to Range-Aggregate Queries in Big Data Environments Range-aggregate queries apply a certain aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries are insufficient to quickly provide accurate results in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has O(1) time complexity for data updates and O(N/(P×B)) time complexity for range-aggregate queries, where N is the number of distinct tuples across all dimensions, P is the number of partitions, and B is the number of buckets in the histogram. We implement the FastRAQ approach on the Linux platform and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval. IEEE 2015
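The divide-and-sum structure of FastRAQ is easy to mimic. The Python sketch below partitions records, answers a range-sum locally per partition, and adds up the local answers; for simplicity it computes exact local aggregates, whereas FastRAQ uses histogram-based local estimation sketches, and the data here are synthetic.

import random

P = 4                                       # number of partitions
partitions = [[] for _ in range(P)]

def insert(record):
    # hash partitioning stands in for FastRAQ's balanced partitioning algorithm
    partitions[hash(record[0]) % P].append(record)

def range_sum(lo, hi):
    # each partition answers locally; the results are simply summed up
    local = [sum(v for k, v in part if lo <= k <= hi) for part in partitions]
    return sum(local)

for _ in range(10000):
    insert((random.randrange(1000), 1))
print(range_sum(100, 199))                  # about 1000 tuples fall in the range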
TTA-JB-C1502 Collaboration- and Fairness-Aware Big Data Management in Distributed Clouds With the advancement of information and communication technology, data are being generated at an exponential rate via various instruments and collected at an unprecedented scale. Such a large volume of generated data is referred to as big data, which is now revolutionizing all aspects of our life, ranging from enterprises to individuals and from science communities to governments, as it exhibits great potential to improve the efficiency of enterprises and the quality of life. To obtain nontrivial patterns and derive valuable information from big data, a fundamental problem is how to properly place the data collected by different users in distributed clouds and how to efficiently analyze the collected data to save user costs in data storage and processing, particularly the cost savings of users who share data. This requires close collaboration among the users in sharing and utilizing big data in distributed clouds, given the complexity and volume of big data. Since computing, storage, and bandwidth resources in a distributed cloud are usually limited, and such resource provisioning is typically expensive, collaborating users are required to make use of the resources fairly. In this paper, we study a novel collaboration- and fairness-aware big data management problem in distributed cloud environments that aims to maximize the system throughput while minimizing the operational cost of service providers, subject to resource capacity and user fairness constraints. We first propose a novel optimization framework for the problem. We then devise a fast yet scalable approximation algorithm based on the built optimization framework. We also analyze the time complexity and approximation ratio of the proposed algorithm. We finally conduct experiments through simulations to evaluate the performance of the proposed algorithm. Experimental results demonstrate that the proposed algorithm is promising and outperforms other heuristics. IEEE 2015
TTA-JB-C1503 On Traffic-Aware Partition and Aggregation in MapReduce for Big Data Applications The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance enhancement. Traditionally, a hash function is used to partition intermediate data among reduce tasks, which, however, is not traffic-efficient because network topology and the data size associated with each key are not taken into consideration. In this paper, we study reducing the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. Furthermore, we jointly consider the aggregator placement problem, where each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to deal with the large-scale optimization problem for big data applications, and an online algorithm is also designed to adjust data partition and aggregation in a dynamic manner. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases. IEEE 2015
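The traffic-aware partitioning objective can be illustrated with a greedy toy. In the Python sketch below, each intermediate key is assigned to the reducer that minimizes cross-rack bytes, given per-rack data sizes and a rack-to-rack cost matrix; the sizes and costs are invented, and the paper's joint optimization and online adjustment are not captured.

key_sizes = {                       # bytes of intermediate data per (key, rack)
    "k1": {"rack0": 900, "rack1": 100},
    "k2": {"rack0": 50,  "rack1": 800},
}
reducers = {"r0": "rack0", "r1": "rack1"}
cost = {("rack0", "rack0"): 0, ("rack0", "rack1"): 1,
        ("rack1", "rack0"): 1, ("rack1", "rack1"): 0}

def traffic_aware_partition(key_sizes, reducers):
    """Greedily place each key on the reducer minimizing weighted traffic."""
    placement = {}
    for key, per_rack in key_sizes.items():
        def traffic(r):
            return sum(size * cost[(rack, reducers[r])]
                       for rack, size in per_rack.items())
        placement[key] = min(reducers, key=traffic)
    return placement

print(traffic_aware_partition(key_sizes, reducers))   # {'k1': 'r0', 'k2': 'r1'}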
TTA-JB-C1504 Privacy-Preserving Ciphertext Multi-Sharing Control for Big Data Storage The need for a secure big data storage service is more pressing than ever. The basic requirement of such a service is to guarantee the confidentiality of the data. However, the anonymity of the service clients, one of the most essential aspects of privacy, should be considered simultaneously. Moreover, the service should also provide practical and fine-grained encrypted data sharing, such that a data owner is allowed to share a ciphertext of data with others under specified conditions. This paper, for the first time, proposes a privacy-preserving ciphertext multi-sharing mechanism to achieve the above properties. It combines the merits of proxy re-encryption with anonymization techniques, so that a ciphertext can be securely and conditionally shared multiple times without leaking either the underlying message or the identity information of the ciphertext senders and recipients. Furthermore, this paper shows that the new primitive is secure against chosen-ciphertext attacks in the standard model. IEEE 2015

TTA-JB-C1505 Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop The MapReduce framework and its open source implementation Hadoop have become the de facto platform for scalable analysis of large data sets in recent years. One of the
    primary concerns inHadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose simple yet effective schemes which use slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our schemes dynamically allocates resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop V0.20.2 and evaluated them with representative MapReduce benchmarks at Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads. TTA-JB- C1506 A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo- distributed Data Centers With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon’s EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs are with different inter-datacenter network costs charged by Internet Service Providers (ISPs). While, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider’s traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational IEEE 2015
TTA-JB-C1506
A General Communication Cost Optimization Framework for Big Data Stream Processing in Geo-distributed Data Centers
With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication, and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe, and different datacenter pairs have different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality of information) is obeyed. This raises the opportunity, but also a challenge, to exploit the inter-datacenter network cost diversity to optimize both VM placement and load balancing towards network cost minimization with a guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on this framework, we then formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computationally efficient solution based on the MILP formulation. The high efficiency of our proposal is validated by extensive simulation-based studies.
IEEE 2015

TTA-JB-C1507
Data Transfer Scheduling for Maximizing Throughput of Big-Data Computing in Cloud Systems
Many big-data computing applications have been deployed in cloud platforms. These applications normally demand concurrent data transfers among computing nodes for parallel processing. It is important to find the best transfer schedule leading to the least data retrieval time, in other words the maximum throughput. However, existing methods cannot achieve this because they ignore link bandwidths and the diversity of data replicas and paths. In this paper, we aim to develop a max-throughput data transfer schedule that minimizes the data retrieval time of applications. Specifically, the problem is formulated as a mixed-integer program, and an approximation algorithm is proposed, with its approximation ratio analyzed. Extensive simulations demonstrate that our algorithm can obtain near-optimal solutions.
IEEE 2015
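The sketch below shows the flavor of replica-aware transfer scheduling: each fetch can be served from several replica nodes, and a greedy heuristic assigns it to the source whose link would finish it earliest, approximating the max-throughput objective. The topology, block sizes, and bandwidths are made-up examples; the paper itself solves a mixed-integer program with an analyzed approximation algorithm, not this greedy rule.

    def schedule_fetches(fetches, link_bandwidth):
        # fetches: list of (block_size_MB, [candidate_source_nodes]).
        load = {node: 0.0 for node in link_bandwidth}   # scheduled MB per source
        assignment = []
        # Schedule the largest blocks first; they dominate retrieval time.
        for size, sources in sorted(fetches, key=lambda f: f[0], reverse=True):
            # Pick the source that would finish this block the earliest.
            best = min(sources, key=lambda n: (load[n] + size) / link_bandwidth[n])
            load[best] += size
            assignment.append((size, best))
        # Retrieval time is set by the slowest source link (the bottleneck).
        finish = max(load[n] / link_bandwidth[n] for n in link_bandwidth)
        return assignment, finish

    if __name__ == "__main__":
        bw = {"A": 100.0, "B": 50.0, "C": 100.0}        # MB/s per source link
        fetches = [(800, ["A", "B"]), (600, ["B", "C"]), (700, ["A", "C"])]
        plan, t = schedule_fetches(fetches, bw)
        print(plan, f"estimated retrieval time: {t:.1f}s")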
TTA-JB-C1508
Accelerated PSO Swarm Search Feature Selection for Data Stream Mining Big Data
Though Big Data is much hyped, it brings many technical challenges that confront both academic research communities and commercial IT deployment; its root sources are data streams and the curse of dimensionality. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining. Feature selection has been popularly used to lighten the processing load in inducing a data mining model. However, when it comes to mining high-dimensional data, the search space from which an optimal feature subset is derived grows exponentially in size, leading to an intractable demand in computation. In order to tackle this problem, which arises mainly from the high dimensionality and streaming format of data feeds in Big Data, a novel lightweight feature selection method is proposed. The feature selection is designed particularly for mining streaming data on the fly, using an accelerated particle swarm optimization (APSO) type of swarm search that achieves enhanced analytical accuracy within reasonable processing time. In this paper, a collection of Big Data sets with an exceptionally large degree of dimensionality is used to evaluate the new feature selection algorithm's performance.
IEEE 2015
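For intuition, here is a minimal sketch of APSO-style feature selection over binary feature masks. APSO simplifies standard PSO by updating positions directly from the global best plus random perturbation, skipping per-particle velocities. The fitness function is a stand-in (it rewards a known "informative" feature set and penalizes subset size); in the paper, fitness would come from a streaming classifier's accuracy.

    import random

    N_FEATURES = 20
    INFORMATIVE = {1, 4, 7, 13}                     # hypothetical useful features

    def fitness(mask):
        # Accuracy proxy minus a small penalty for subset size.
        selected = {i for i, bit in enumerate(mask) if bit}
        return len(selected & INFORMATIVE) - 0.05 * len(selected)

    def apso_select(n_particles=12, iterations=50, beta=0.4):
        particles = [[random.randint(0, 1) for _ in range(N_FEATURES)]
                     for _ in range(n_particles)]
        gbest = max(particles, key=fitness)[:]      # copy of the best mask so far
        for _ in range(iterations):
            for p in particles:
                for j in range(N_FEATURES):
                    if random.random() < beta:      # drift toward the global best
                        p[j] = gbest[j]
                    elif random.random() < 0.1:     # random exploration (mutation)
                        p[j] = 1 - p[j]
                if fitness(p) > fitness(gbest):
                    gbest = p[:]
        return gbest

    if __name__ == "__main__":
        best = apso_select()
        print("selected features:", [i for i, b in enumerate(best) if b])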
TTA-JB-C1509
An Efficient Privacy-Preserving Ranked Keyword Search Method
Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preserving. Therefore, it is essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. Moreover, the volume of data in data centers has experienced dramatic growth, making it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval over large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support more search semantics and also to meet the demand for fast ciphertext search within a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is met. In the search phase, this approach achieves a linear computational complexity against an exponential increase in the size of the document collection. In order to verify the authenticity of search results, a structure called the minimum hash sub-tree is designed in this paper. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that, with a sharp increase of documents in the dataset, the search time of the proposed method increases linearly, whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in rank privacy and the relevance of retrieved documents.
IEEE 2015
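The sketch below illustrates the hierarchical-clustering idea in plaintext: documents are split around their least-similar members until clusters respect a maximum size, and a query descends only into the most relevant cluster at each level instead of scanning the whole collection. Plaintext term-frequency scoring stands in for the paper's encrypted-domain relevance computation, and the splitting rule is a simplified assumption.

    from collections import Counter

    def relevance(a, b):
        # Shared-term relevance between two term-frequency vectors.
        return sum(min(a[t], b[t]) for t in a if t in b)

    def centroid(docs):
        c = Counter()
        for d in docs:
            c.update(d)
        return c

    def build_tree(docs, max_size=2):
        node = {"centroid": centroid(docs)}
        if len(docs) <= max_size:
            node["docs"] = docs                     # small enough: leaf cluster
            return node
        pivot = docs[0]
        far = min(docs[1:], key=lambda d: relevance(pivot, d))  # least similar
        left = [d for d in docs if relevance(d, pivot) >= relevance(d, far)]
        right = [d for d in docs if relevance(d, pivot) < relevance(d, far)]
        if not left or not right:                   # degenerate split guard
            left, right = docs[: len(docs) // 2], docs[len(docs) // 2:]
        node["children"] = (build_tree(left, max_size), build_tree(right, max_size))
        return node

    def search(node, query, k=2):
        # Descend into the child cluster whose centroid best matches the query.
        while "children" in node:
            node = max(node["children"], key=lambda c: relevance(c["centroid"], query))
        return sorted(node["docs"], key=lambda d: relevance(d, query), reverse=True)[:k]

    if __name__ == "__main__":
        docs = [Counter(t) for t in (
            "cloud storage encryption".split(), "cloud search ranking".split(),
            "medical image wound".split(), "medical records privacy".split())]
        print(search(build_tree(docs), Counter("cloud ranking".split()), k=1))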
TTA-JB-C1510
Splitting Large Medical Data Sets based on Normal Distribution in Cloud Environment
The surge of medical and e-commerce applications has generated a tremendous amount of data, which brings people to a so-called "Big Data" era. Different from traditional large data sets, the term "Big Data" not only means a large data volume but also indicates a high velocity of data generation. However, current data mining and analytical techniques face the challenge of dealing with large-volume data in a short period of time. This paper explores the efficiency of utilizing the Normal Distribution (ND) method for splitting and processing large-volume medical data in a cloud environment, so that the split data sets retain representative information. The new ND-based model consists of two stages. The first stage adopts the ND method for splitting and processing large data sets, which reduces the volume of the data sets. The second stage implements the ND-based model in a cloud computing infrastructure for allocating the split data sets. The experimental results show substantial efficiency gains of the proposed method over conventional methods that do not split data into small partitions. The ND-based method can generate representative data sets, offering an efficient solution for large data processing, and the split data sets can be processed in parallel in a cloud computing environment.
IEEE 2015
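One plausible reading of distribution-preserving splitting is sketched below: records are bucketed by their z-score under a fitted normal distribution, and each bucket is dealt round-robin across the k splits, so every split keeps a representative spread of the original distribution. This is an illustrative interpretation of the ND-based splitting stage, not the paper's exact procedure; the data are synthetic.

    import random
    import statistics

    def nd_split(values, k=4, n_bins=6):
        mean = statistics.fmean(values)
        std = statistics.stdev(values) or 1.0
        # Bucket records by z-score so typical and extreme values both spread out.
        bins = [[] for _ in range(n_bins)]
        for v in values:
            z = (v - mean) / std
            idx = min(n_bins - 1, max(0, int((z + 3) / 6 * n_bins)))  # clamp [-3, 3]
            bins[idx].append(v)
        splits = [[] for _ in range(k)]
        for b in bins:
            for i, v in enumerate(b):
                splits[i % k].append(v)         # deal each bucket across all splits
        return splits

    if __name__ == "__main__":
        random.seed(0)
        data = [random.gauss(120, 15) for _ in range(10_000)]  # e.g., blood pressure
        for i, s in enumerate(nd_split(data)):
            print(f"split {i}: n={len(s)}, mean={statistics.fmean(s):.1f}")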
DOMAIN : ANDROID

TTA-AA-C1501
MARS Mobile Application Relaunching Speed-up through Flash-Aware Page Swapping
The approach to fast application relaunching on the current Android system is to cache background applications in memory. This mechanism is limited by the available memory size; in addition, the application state may not be easily recovered. We propose a prototype system, MARS, to enable page swapping and cache more applications. MARS can speed up application relaunching and restore the application state. As a new page swapping design optimized for application relaunching, MARS isolates Android runtime garbage collection (GC) from page swapping for compatibility and employs several flash-aware techniques for swap-in speedup. The two main components of MARS are page slot allocation and read/write control. Page slot allocation reorganizes page slots in the swap area to produce sequential reads and improve swap-in performance. Read/write control addresses the read/write interference issue by reducing concurrent and extra internal writes. Compared to conventional Linux page swapping, these two components can scale up the read bandwidth by up to about 3.8 times. Application tests on a Google Nexus 4 phone show that MARS reduces the launching time of applications by 50% to 80%. The modified page swapping mechanism can outperform conventional Linux page swapping by up to 4 times.
IEEE 2015

TTA-AA-C1503
ECG MONITORING SYSTEM USING ANDROID
This paper describes the development and testing of circuitry and software that enable Android mobile phones equipped with Bluetooth to receive an incoming electrocardiogram (ECG) signal from a user and show it in real time on the phone screen. The system comprises three distinct subsystems. The first conditions the analog ECG signal, preparing it for conversion to the digital domain. The second consists of a microcontroller and a Bluetooth module; this unit samples the ECG, serializes the samples, and transmits them via the Bluetooth module to the Android phone. The third subsystem is the phone itself: an application written for the phone receives the ECG samples and charts the ECG signal on the screen for analysis. The good quality of the ECG signal allows for identification of arrhythmias.
IEEE 2015

TTA-AA-C1504
Auto emergency alert using android
In this paper, we describe the Well Phone, a smartphone with additional software that is used as a personal health monitoring device. The Well Phone interfaces various health monitoring devices to the smartphone and collects physiological data from those devices. It employs novel algorithms that perform statistical analyses, relate sequences of disparate measurements from different devices, and correlate physical activity with physiological measurements. The Well Phone provides feedback to the user by means of visualization and speech interaction, and alerts a caregiver, medical professional, or emergency responder as needed.
IEEE 2015
TTA-AA-C1505
Disaster Alert system using android
A robot can perform with ease tasks that seem impossible for a human, and it becomes even more helpful if one can control it wirelessly. Nowadays robots are becoming versatile and feature-rich: one can control them with a smartphone, they can avoid obstacles automatically, sense the environment and send alerts, and they can even defuse bombs and perform many other critical tasks. The feature discussed in this paper is their use in search-and-rescue missions. The robot can be controlled wirelessly using RF technology, has an ultrasonic sensor for obstacle detection, and is equipped with a smartphone camera that provides an omnidirectional view and sends the video stream wirelessly to a remote device, making the bot easier to control. The robot can explore places that humans cannot reach easily, such as areas affected by natural disasters like earthquakes, tsunamis, and hurricanes.
IEEE 2015

TTA-AA-C1506
Farm coop management system using android
This study investigates an intelligent system that employs an embedded system and a smartphone for chicken-farm management and problem solving, built on a Raspberry Pi and an Arduino Uno. An experiment and comparative analysis of the intelligent system were carried out on a sample chicken farm. The findings show that the system can monitor surrounding weather conditions, including humidity, temperature, and air quality, and control the filter fan switch in the chicken farm. The system was found to be convenient for farmers, as they could effectively control the farm anywhere and at any time, resulting in cost reduction, asset savings, and productive management in chicken farming.
IEEE 2015
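A minimal sketch of the kind of control loop such a system runs is shown below: read temperature and humidity, switch the filter fan when thresholds are crossed, and log state the smartphone app could poll. The thresholds are hypothetical, and the sensor read and fan switch are stubs; on the real Raspberry Pi/Arduino setup they would talk to actual hardware.

    import random
    import time

    TEMP_ON_C = 32.0            # hypothetical switch-on thresholds
    HUMIDITY_ON_PCT = 85.0

    def read_sensors():
        # Stub standing in for a humidity/temperature sensor on the Arduino.
        return random.uniform(25, 38), random.uniform(60, 95)

    def set_fan(on):
        # Stub for a GPIO relay write on the Raspberry Pi.
        print("fan ->", "ON" if on else "OFF")

    def control_loop(cycles=5, interval_s=1):
        fan_on = False
        for _ in range(cycles):
            temp, hum = read_sensors()
            want_on = temp > TEMP_ON_C or hum > HUMIDITY_ON_PCT
            if want_on != fan_on:           # only switch on state changes
                fan_on = want_on
                set_fan(fan_on)
            print(f"T={temp:.1f}C H={hum:.1f}% fan={'on' if fan_on else 'off'}")
            time.sleep(interval_s)

    if __name__ == "__main__":
        control_loop()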
TTA-AA-C1507
ACCIDENT TRACKING APP FOR ANDROID MOBILE
The usage of mobile devices has increased dramatically in recent years. These devices serve us in many practical ways and provide us with many services, many of them in real time. The delivery of streaming audio, streaming video, and internet content to these devices has become commonplace. One emerging application in recent years is the use of mobile devices for tracking local traffic incidents, and there are several providers of this content on the Internet, such as Google Maps, here.com, Twitter, various Department of Transportation websites, various radio station websites, and many others. Some sites, such as Twitter, only provide text information but are updated frequently with recent data. Map-enhanced websites provide visual information but are typically updated less often. The goal of this project is to integrate all the sources of traffic information in one place and intelligently filter all the recent incident data so the results are as accurate and up to date as possible, thus minimizing the number of false reports and incidents. This process, implemented for iOS 7 using Xcode and Objective-C, allows the user to view traffic reports for 15 large US cities, with the capability to add many more locations. Results for the app are compared with the major individual sources, and the percentage of additional incidents detected and of false incidents incorrectly identified is provided for several large cities.
IEEE 2015
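One plausible reading of the "filter intelligently" step is sketched below: reports from several feeds are treated as duplicates when they lie within a small distance and time window of each other, and only the freshest of each group is kept. The thresholds, coordinates, and feed names are made up for the example.

    import math

    DIST_KM, TIME_MIN = 0.5, 15                 # assumed duplicate thresholds

    def km_between(a, b):
        # Equirectangular approximation; adequate at city scale.
        dx = math.radians(b[1] - a[1]) * math.cos(math.radians((a[0] + b[0]) / 2))
        dy = math.radians(b[0] - a[0])
        return 6371 * math.hypot(dx, dy)

    def merge(reports):
        # reports: (source, lat, lon, minutes_ago); keep the freshest of each group.
        kept = []
        for rep in sorted(reports, key=lambda r: r[3]):     # freshest first
            if not any(km_between(rep[1:3], k[1:3]) < DIST_KM
                       and abs(rep[3] - k[3]) < TIME_MIN for k in kept):
                kept.append(rep)
        return kept

    if __name__ == "__main__":
        feeds = [("maps", 40.7128, -74.0060, 3), ("twitter", 40.7131, -74.0055, 5),
                 ("dot", 40.7306, -73.9352, 10)]
        for r in merge(feeds):
            print(r)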
TTA-AA-C1508
Friendbook A Semantic-based Friend Recommendation System for Social Networks
Existing social networking services recommend friends to users based on their social graphs, which may not be the most appropriate way to reflect a user's preferences in friend selection in real life. In this paper, we present Friendbook, a novel semantic-based friend recommendation system for social networks, which recommends friends to users based on their lifestyles instead of their social graphs. By taking advantage of sensor-rich smartphones, Friendbook discovers the lifestyles of users from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users if their lifestyles have high similarity. Inspired by text mining, we model a user's daily life as life documents, from which his or her lifestyles are extracted using the Latent Dirichlet Allocation algorithm. We further propose a similarity metric to measure the similarity of lifestyles between users, and calculate users' impact in terms of lifestyles with a friend-matching graph. Upon receiving a request, Friendbook returns a list of people with the highest recommendation scores to the query user. Finally, Friendbook integrates a feedback mechanism to further improve recommendation accuracy. We have implemented Friendbook on Android-based smartphones and evaluated its performance in both small-scale experiments and large-scale simulations. The results show that the recommendations accurately reflect the preferences of users in choosing friends.
IEEE 2015
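To illustrate the similarity step: once LDA has turned each user's life documents into a distribution over latent lifestyle topics, one natural similarity measure is the cosine between those distributions, with friends ranked by score. The sketch below assumes such topic vectors; the values and the use of cosine specifically are illustrative, since the paper defines its own metric and derives the vectors from smartphone sensor data.

    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def recommend(query_user, candidates, top_k=2):
        # Rank candidate users by lifestyle-topic similarity to the query user.
        scored = [(name, cosine(query_user, topics)) for name, topics in candidates]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

    if __name__ == "__main__":
        # Hypothetical 4-topic lifestyle distributions (e.g., sport, work, night-owl, travel).
        me = [0.6, 0.2, 0.1, 0.1]
        others = [("alice", [0.5, 0.3, 0.1, 0.1]), ("bob", [0.1, 0.1, 0.7, 0.1]),
                  ("carol", [0.55, 0.25, 0.1, 0.1])]
        print(recommend(me, others))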
TTA-AA-C1509
Blood Banking System Using Android
Automated Blood Bank is a project that brings voluntary blood donors and those in need of blood onto a common platform. The mission is to fulfill every blood request in the country with a promising Android application and motivated individuals who are willing to donate blood. The proposed work aims to overcome the communication barrier by providing a direct link between donor and recipient using a low-cost, low-power Raspberry Pi B+ kit, which requires only a 5 V, 2 A micro-USB power supply. All communication takes place via SMS (Short Messaging Service), which is compatible with all mobile phone types. The project aims to serve people who seek willing blood donors and to provide blood within the required time frame; it is an endeavour to reach out to those in need of blood and connect them with those willing to donate. The proposed work explores finding blood donors by using a GSM-based smart-card CPU, the Raspberry Pi B+ kit. The vision is to be "the hope of every Indian in search of a voluntary blood donor".
IEEE 2015
DOMAIN : IMAGE PROCESSING

TTA-AI-C1501
Smartphone-Based Wound Assessment System for Patients With Diabetes
Diabetic foot ulcers represent a significant health issue. Currently, clinicians and nurses mainly base their wound assessment on visual examination of wound size and healing status, while the patients themselves seldom have an opportunity to play an active role. Hence, a more quantitative and cost-effective examination method that enables patients and their caregivers to take a more active role in daily wound care can potentially accelerate wound healing, save travel costs, and reduce healthcare expenses. Considering the prevalence of smartphones with high-resolution digital cameras, assessing wounds by analyzing images of chronic foot ulcers is an attractive option. In this paper, we propose a novel wound image analysis system implemented solely on an Android smartphone. The wound image is captured by the camera on the smartphone with the assistance of an image capture box. After that, the smartphone performs wound segmentation by applying an accelerated mean-shift algorithm. Specifically, the outline of the foot is determined based on skin color, and the wound boundary is found using a simple connected-region detection method. Within the wound boundary, the healing status is then assessed based on a red-yellow-black color evaluation model. Moreover, the healing status is quantitatively assessed based on trend analysis of time records for a given patient. Experimental results on wound images collected at the UMASS-Memorial Health Center Wound Clinic (Worcester, MA), following an Institutional Review Board-approved protocol, show that our system can be efficiently used to analyze wound healing status with promising accuracy.
IEEE 2015
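The sketch below conveys the red-yellow-black idea: each pixel inside the wound boundary is assigned to red (granulation), yellow (slough), or black (necrosis) tissue by simple RGB rules, and healing status is summarized as the fraction of each class. The thresholds are illustrative assumptions; the paper couples this evaluation with mean-shift segmentation on the phone.

    def classify_pixel(r, g, b):
        # Crude RGB rules standing in for a calibrated color model.
        if r < 60 and g < 60 and b < 60:
            return "black"
        if r > 120 and g > 100 and b < 80:
            return "yellow"
        if r > 120 and g < 100:
            return "red"
        return "other"

    def tissue_fractions(pixels):
        counts = {"red": 0, "yellow": 0, "black": 0, "other": 0}
        for p in pixels:
            counts[classify_pixel(*p)] += 1
        total = len(pixels) or 1
        return {k: v / total for k, v in counts.items()}

    if __name__ == "__main__":
        # Synthetic wound region: mostly granulation, some slough, a little necrosis.
        wound = [(180, 60, 50)] * 70 + [(150, 130, 60)] * 20 + [(30, 25, 20)] * 10
        print(tissue_fractions(wound))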
TTA-AI-C1502
Hand Gesture Recognition Using Kinect Sensor
Hand gestures are becoming one of the most common ways for people to interact with information technology products, giving users an engaging experience. Recently developed 3D cameras, e.g., the Kinect, provide not only a color image but also a depth map, opening new opportunities in the development of human-computer interaction (HCI) applications. This paper presents a novel hand gesture recognition method based on the depth image obtained from the Kinect sensor. First, the hand region is extracted by thresholding around the hand point detected using the NITE 2 library provided by PrimeSense. Second, we extract a feature vector including the number of open fingers, the angles between the fingertips and the horizontal axis of the hand, the angles between consecutive fingers, and the difference between the distance from the hand center to the fingertips and the radius of the biggest inscribed circle. Finally, a support vector machine (SVM) is applied to identify the different gestures. Experimental results show that the proposed method performs hand gesture recognition with an accuracy of 95% in real time.
IEEE 2015
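A minimal sketch of the final classification stage follows: hand-geometry feature vectors (finger count, fingertip angles, and so on) feed an SVM classifier. The feature values below are fabricated stand-ins for the depth-derived features in the paper; the snippet requires scikit-learn.

    from sklearn.svm import SVC

    # Toy features: [open_fingers, mean_fingertip_angle_deg, max_gap_angle_deg]
    X_train = [
        [5, 18.0, 30.0], [5, 20.0, 28.0],   # "open palm"
        [0, 0.0, 0.0],   [0, 2.0, 1.0],     # "fist"
        [2, 25.0, 45.0], [2, 23.0, 48.0],   # "victory"
    ]
    y_train = ["palm", "palm", "fist", "fist", "victory", "victory"]

    clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
    print(clf.predict([[5, 19.0, 29.0], [0, 1.0, 0.5]]))  # -> ['palm' 'fist']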
DOMAIN : MOBILE COMPUTING

TTA-AM-C1501
Timer-based Bloom Filter Aggregation for Reducing Signaling Overhead in Distributed Mobility Management
Distributed mobility management (DMM) is a promising technology for addressing the mobile data traffic explosion. Since the location information of mobile nodes (MNs) is distributed across several mobility agents (MAs), DMM requires an additional mechanism to share the location information of MNs between MAs. In the literature, multicast- or distributed hash table (DHT)-based sharing methods have been suggested; however, they incur significant signaling overhead owing to unnecessary location information updates under frequent handovers. To reduce the signaling overhead, we propose a timer-based Bloom filter aggregation (TBFA) scheme for distributing the location information. In the TBFA scheme, the location information of MNs is maintained in Bloom filters at each MA. Also, since propagating the whole Bloom filter on every MN movement leads to high signaling overhead, each MA propagates only the changed indexes in its Bloom filter when a pre-defined timer expires. To verify the performance of the TBFA scheme, we develop analytical models of the signaling overhead and latency, and devise an algorithm to select an appropriate timer value. Extensive simulation results are given to show the accuracy of the analytical models and the effectiveness of the TBFA scheme over the existing DMM scheme.
IEEE 2015
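The core TBFA mechanism can be sketched compactly: each mobility agent inserts its attached mobile nodes into a local Bloom filter, records which bit indexes changed, and ships only those changed indexes to peers when a timer fires, instead of signaling on every handover. The filter size, hash construction, and timer policy below are simplified assumptions for illustration.

    import hashlib

    M_BITS, K_HASHES = 256, 3

    def bit_indexes(mn_id):
        # Derive K bit positions from a SHA-256 digest of the MN identifier.
        h = hashlib.sha256(mn_id.encode()).digest()
        return [int.from_bytes(h[4 * i: 4 * i + 4], "big") % M_BITS
                for i in range(K_HASHES)]

    class MobilityAgent:
        def __init__(self):
            self.bits = [0] * M_BITS
            self.pending = set()              # indexes changed since last timer

        def attach(self, mn_id):              # MN hands over to this agent
            for i in bit_indexes(mn_id):
                if not self.bits[i]:
                    self.bits[i] = 1
                    self.pending.add(i)

        def on_timer(self, peers):            # propagate only the deltas
            delta, self.pending = self.pending, set()
            for peer in peers:
                for i in delta:
                    peer.bits[i] = 1

        def may_host(self, mn_id):            # Bloom lookup: no false negatives
            return all(self.bits[i] for i in bit_indexes(mn_id))

    if __name__ == "__main__":
        ma1, ma2 = MobilityAgent(), MobilityAgent()
        ma1.attach("mn-42")
        print(ma2.may_host("mn-42"))          # False: delta not yet propagated
        ma1.on_timer([ma2])
        print(ma2.may_host("mn-42"))          # True after timer-driven update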