Report_Summer

Summer Research Internship 2014
Privacy Preserving and Integrity Protecting Data Aggregation
in Wireless Sensor Networks
Rutvij Shah (ID-201101042)
Guide: Prof. Manik Lal Das
Abstract:
Wireless sensor networks (WSNs) typically comprise a large number of
sensors deployed at random for detection and monitoring tasks. Data
aggregation is one of the most important tasks in a WSN, because it avoids
transmitting redundant data to the base station. Deploying wirelessly
communicating sensor nodes in hostile or unattended environments makes
attacks easier, and the limited resources of the nodes make conventional
security algorithms infeasible. WSNs therefore always have to deal with
the problem of privacy preservation and integrity protection in data
aggregation.
Privacy preservation means that a node should not reveal its raw data to
any other node, to the aggregator in the network, or to nodes outside the
network. Maintaining the privacy of data ensures that the data is accessed
only by its intended users, thus protecting personal data.
Integrity protection guarantees that data is not tampered with during
aggregation in a sensor network, that is, the data received is the same as
the data transmitted. Data might be changed by an attacker or corrupted
accidentally. Data is useful only when it is reliable, so validation and
verification are used to ensure its integrity.
Study: We studied the following algorithm and improved its integrity
protection of data.
Consider a wireless sensor network in which a number of sources collect and
produce data. Among these sources, two or more parties hold private data
that should be aggregated for a third party without revealing its content.
The sources do not want to reveal the data they collect, so the data is
aggregated by an aggregator (which may be the third party). The data must
be hidden from the aggregator as well, because the sources do not trust it;
the data therefore needs to be secured and its privacy protected.
In this scheme, privacy is preserved through a randomization process, and
security is provided by a random key pre-distribution method.
The proposed scheme therefore has two parts:

1. Secure key management.
2. Privacy preservation.
Consider the following network scenario:
[Figure: Data Source 1, Data Source 2 and Data Source 3]
When the service provider (the server) sends a query to these sources, the
sources collect the data and send it back to the server. The sources never
want their data to be revealed, so they never send it in raw form; instead
they apply a perturbation technique to the raw data before sending it. The
server therefore cannot identify the individual data and can only aggregate
the data received from these n sources.
Secure Key Distribution and Key Establishment Phase:
K keys are stored in every source node:
K-k keys -> shared with the server/aggregator for secure source-to-aggregator
communication
k keys -> source-to-source communication
The secure key distribution method has two parts:
1. Aggregator-to-source key exchange
2. Source-to-source key exchange
[Figure: an aggregator connected to Source 1, Source 2 and Source 3]

Aggregator-to-source key establishment:
Each source node has K-k keys shared with the server. However, because all
source nodes possess the same keys, communication between a source node and
the server under a shared key is insecure: any malicious source node can
read the communication between the server and the other source nodes and
can launch an attack very easily.
Solution:
To avoid this, in the pre-distribution phase the source-aggregator key bank
is permuted, so that the keys are reordered differently for each
source-aggregator pair. The ordering for each node is stored in the server.
The source node then communicates with the server using one of its shared
keys. For communication and data aggregation, the source node first
generates a random number Rn between 1 and K-k and sends it to the server in
plain text. The server then knows that the source node will encrypt the next
message with the Rn-th key of that node’s key bank. The source node repeats
these steps every time it wants to communicate with the server and send
data.
How is this random number communicated to the server?
The random number Rn is sent to the server in plain text. If a malicious
node sends a different random number, this does no harm because the key bank
order is randomized: the Rn-th key is different in different source nodes.
The mapping is stored in the server offline during the pre-distribution
phase. This completes the key establishment phase.
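The aggregator-to-source key selection described above can be sketched as
follows. This is only an illustration under our own assumptions: the function
names, the 16-byte key length and the use of the operating system's random
source for the permutation are hypothetical choices, not details fixed by the
scheme.

```python
import random
import secrets


def make_key_bank(num_keys, key_len=16):
    """Create the common bank of K-k source-aggregator keys (assumed 16 bytes each)."""
    return [secrets.token_bytes(key_len) for _ in range(num_keys)]


def predistribute(key_bank, node_ids):
    """Pre-distribution: give each node its own ordering of the same key bank.

    Returns (node_banks, server_orderings); the server keeps the orderings offline.
    """
    rng = random.SystemRandom()
    node_banks, server_orderings = {}, {}
    for node in node_ids:
        order = list(range(len(key_bank)))
        rng.shuffle(order)                       # per-node permutation
        node_banks[node] = [key_bank[j] for j in order]
        server_orderings[node] = order
    return node_banks, server_orderings


def node_pick_key(node_bank):
    """Source side: draw Rn in 1..K-k (sent in plain text) and use the Rn-th key."""
    rn = secrets.randbelow(len(node_bank)) + 1
    return rn, node_bank[rn - 1]


def server_lookup_key(key_bank, server_orderings, node, rn):
    """Server side: map the received Rn to the key through the stored ordering."""
    return key_bank[server_orderings[node][rn - 1]]
```

Because every node holds a different ordering, the same Rn announced by two
different nodes generally selects two different keys.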
Source-to-source key establishment:

It is assumed that the source-to-aggregator/server key has been securely
established. Source-to-source key establishment is therefore done only
through the server, not directly between the sources.
Another problem:
In this scenario the k keys are the same for every node, so it is easy for
any source node to read the data of other source nodes; for example, source
node 3 can decrypt what source nodes 1 and 2 are communicating.
The solution is the same as the one discussed for the source-to-aggregator
case. Source node 1 and source node 2 each separately permute the order of
the key bank of k keys dedicated to source-to-source communication. They
then pass their permutation functions to each other through the server,
using their pair-wise keys with the server. After successful delivery of the
permutation functions, one of the source nodes (source node 1, for example)
sends a random number between 1 and k to the other source node (source node
2); this number indicates a particular key of the permuted key bank. This
pair-wise key between the source nodes is used for subsequent communication
until the data aggregation is complete. For the next round of data
aggregation, the same key establishment procedure is followed.
Privacy Preservation: In this wireless sensor network there are n source
nodes. Each source i owns a value xi, its raw data, which is not shared with
other parties or nodes. Suppose that the sum lies in the range [0, K), so
that reducing it modulo K leaves it unchanged (here K is the modulus used
for aggregation, not the number of keys used above). Our aim is to compute
the sum X of the raw data xi, where i = 1, 2, ..., n, while keeping the
individual data and the sum private and secure from the other nodes as well
as from the server.
The server starts the process. It randomly chooses one of the source nodes
and signals it to initiate the process. This first source node is denoted by
s1. It holds its private data x1 and generates a random number r1 in the
range [0, K). It then computes R1:
R1 = (r1 + x1) mod K
After computing R1, the source node s1 performs neighbourhood discovery to
find the other source nodes it is connected to, and passes this information
to the server.

The server keeps track of the nodes that have already participated. If some
source nodes connected to s1 have not yet participated, the server randomly
chooses one of them and sends this choice to s1. Let this next source node
be s2. Accordingly, s1 passes R1 to s2.
s2 then computes R2 = (R1 + x2) mod K.
s2 follows the same procedure as s1 and sends R2 to s3.
In this way sn is reached, which computes Rn:
Rn = (R(n-1) + xn) mod K
When the server learns that all the nodes have participated, it asks the
last node to send Rn to it. The server then asks the first source node s1 to
compute the summation X = (Rn - r1) mod K, which equals the sum of all xi,
i.e. the summation of the individual nodes’ data. The server then forwards
this result or keeps it for itself.
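The chain of computations R1, R2, ..., Rn and the final unmasking can be
simulated in a few lines. The sketch below runs all nodes in one process and
leaves out the server, the neighbour selection and the encryption; the
function name secure_sum and the sample values are our own illustrative
assumptions.

```python
import secrets


def secure_sum(values, modulus):
    """Simulate one aggregation round over x1..xn with modulus K.

    Returns (X, [R1, ..., Rn], r1). The true sum must lie in [0, K).
    """
    r1 = secrets.randbelow(modulus)            # s1's random mask in [0, K)
    running = r1
    partials = []
    for x in values:
        running = (running + x) % modulus      # Ri = (R(i-1) + xi) mod K, with R0 = r1
        partials.append(running)
    total = (partials[-1] - r1) % modulus      # X = (Rn - r1) mod K, computed by s1
    return total, partials, r1


# Example round with four sources and K = 1000 (hypothetical values).
total, partials, r1 = secure_sum([12, 7, 30, 5], 1000)
```

Each intermediate node only ever sees a value masked by r1, so on its own it
cannot separate the preceding nodes' data from the mask.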
Security Analysis:
The assumption is that the data shared by the nodes is correct. If there is
no collusion, a node that learns the sum and knows its own raw data can
compute only the sum of all other nodes’ data, i.e. (X - xi) mod K. However,
if two or more source nodes collude, they can disclose more information. For
example, if the two neighbours of node i (that is, parties i - 1 and i + 1)
collude, they can learn xi = (Ri - R(i-1)) mod K.
To avoid this, we let the server or aggregator choose the next node, and the
server does so by choosing from the eligible neighbours.
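As a small worked example of the collusion attack, the following sketch shows
how the two neighbours of node i can recover xi from the values they handle.
The function name and the numbers are hypothetical and only illustrate the
formula xi = (Ri - R(i-1)) mod K.

```python
def recover_by_collusion(r_prev, r_curr, modulus):
    """Node i-1 knows R(i-1) (it sent it); node i+1 knows Ri (it received it)."""
    return (r_curr - r_prev) % modulus


# With K = 1000, R(i-1) = 950 and xi = 120, node i forwards Ri = 70.
K = 1000
r_prev = 950
x_i = 120
r_curr = (r_prev + x_i) % K
leaked = recover_by_collusion(r_prev, r_curr, K)   # equals x_i
```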
Another interesting case:
Collusion is also possible by bypassing the server: a source node may send
its computed Ri directly to a colluding node. To prevent this, the scheme
needs to be slightly modified so that source-to-source communication goes
strictly via the server. In that case, however, the communication overhead
increases.
[Figure: source nodes computing R1 = r1 + x1, R2 = R1 + x2, ...,
Rn = R(n-1) + xn]

A Vulnerability: Server Trust Issues
The server may attempt to learn a source’s private data.
This can be done in the following way:
1) s1 passes the information about its neighbours to the server so that R1
can be forwarded to other sources. The server may declare that the
neighbours of s1 have already participated and that no nodes are left,
and then ask s1 to send its data for the computation of SUM. The data
that s1 sends is then exactly the data that s1 holds. In this way the
server can identify the private data of each and every node by
successively choosing each node as both the initiator and the terminator.
To prevent this, each time the initiator source node asks the server to
send the computed SUM value, it compares that value with its own private
value; if the two are the same, the initiator sends the server the
message “operation cannot be performed”.
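The initiator's check can be sketched as below. This is a minimal
illustration under our own assumptions: the function name, the return
convention and the sample numbers are hypothetical. Note that in this simple
form a legitimate round in which the other nodes' data happens to sum to
zero modulo K would be refused as well.

```python
REFUSAL = "operation cannot be performed"


def initiator_finalize(rn, r1, x1, modulus):
    """Initiator s1 unmasks the sum and refuses if it equals its own value x1.

    Returns (sum, None) on success or (None, REFUSAL) when only x1 would leak.
    """
    total = (rn - r1) % modulus
    if total == x1:
        return None, REFUSAL
    return total, None


# K = 1000, r1 = 17, x1 = 42 (hypothetical values).
alone = initiator_finalize((17 + 42) % 1000, 17, 42, 1000)       # server skipped everyone
normal = initiator_finalize((17 + 42 + 8) % 1000, 17, 42, 1000)  # one more node with x2 = 8
```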
How to preserve integrity along with privacy in this protocol?
We have analysed and tried to implement two techniques:
1) Integrity protection through perturbation technique.
2) Integrity protection through shuffling and aggregation tree.
1) Integrity protection through perturbation technique:
First the server (the base station, BS) sends the query to all the source
nodes. After receiving a query from the BS, each sensor node customizes its
data into a complex number by combining its sensitive data with a private
real number and adjoining an imaginary number to it. The scheme uses the
additive property of complex numbers to check integrity in data aggregation
and to achieve privacy from other trusted participating nodes as well as
from adversaries. Of the two parts of the complex number, the real part is
used for privacy preservation and the imaginary part for integrity
checking. Every node shares two keys: one with the master device and the
other with the sensor nodes lying on the aggregation tree. This protocol
requires considerable memory at each node.
Each node encrypts the customized data and sends it to its parent node using
the key shared between them.
R1 = [(r1 + x1) + c·i] mod K
(c·i is the imaginary part adjoined for integrity checking)

This data is sent to the other nodes as discussed in the privacy
preservation part. The customized data is aggregated (i.e. summed) using the
additive property of complex numbers and sent to the sink after encryption.
After the entire privacy preservation process, the server/aggregator holds
the aggregated value. To obtain the actual data and to check integrity, it
first separates the real part and the imaginary part of the sum. The base
station recovers the real data by subtracting the real seed values from the
real part, and it removes the imaginary part. The sink then compares the
imaginary part of the sum it received with the sum of the imaginary parts of
the individual nodes’ data. If the two agree, the data has not been tampered
with and privacy is preserved.
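The complex-number check can be sketched as follows. The sketch assumes that
the sink knows every node's real seed and imaginary value in advance, leaves
out encryption and the modulus, and uses small integers so that Python's
complex arithmetic is exact; all names and numbers are hypothetical. In this
simplified form only modifications that disturb the imaginary part are
detected.

```python
def customize(x, real_seed, imag_seed):
    """Node side: hide x behind a private real seed and adjoin an imaginary part."""
    return complex(x + real_seed, imag_seed)


def aggregate(customized):
    """Intermediate nodes: add the complex values (additive property)."""
    return sum(customized, complex(0, 0))


def sink_check(total, real_seeds, imag_seeds):
    """Sink side: recover the data sum and compare the imaginary parts."""
    data_sum = total.real - sum(real_seeds)
    intact = total.imag == sum(imag_seeds)
    return data_sum, intact


data = [12, 7, 30]
real_seeds = [101, 202, 303]
imag_seeds = [5, 9, 14]
total = aggregate(customize(x, a, b) for x, a, b in zip(data, real_seeds, imag_seeds))
data_sum, intact = sink_check(total, real_seeds, imag_seeds)

# An adversary who adds its own complex offset changes the imaginary part.
_, intact_after_tamper = sink_check(total + complex(3, 1), real_seeds, imag_seeds)
```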

Problem with this scheme: partial participation of nodes
If only some of the nodes participate in data aggregation, the above
integrity check fails, because the base station does not know which nodes
are participating. Instead of subtracting only the imaginary values of the
participating nodes, it subtracts the sum of the imaginary values of all
nodes.
Solution: We devised a solution to this problem of partial participation
of nodes by introducing a bit vector as follows:
0 0 0 0 0
The length of this bit vector equals the number of nodes that a single path
in the sensor network can contain. Initially all bits are set to zero, where
zero denotes the absence of a node from the data aggregation process. As the
vector is passed along the nodes of a specific path in the network, each
participating node flips its corresponding bit from 0 to 1 to indicate its
presence. The assumption here is that each node knows its bit position.
Finally, the base station receives a bit vector indicating which nodes have
participated, and it can subtract the corresponding imaginary values. Since
a single path in the sensor network can contain up to 32 nodes, the bit
vector is at most 32 bits long, so the communication cost is not a problem.
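A minimal sketch of the participation bit vector, assuming the vector is held
in a single integer and bit position p belongs to the p-th node of the path
(function names and sample values are hypothetical):

```python
def mark_present(bit_vector, position):
    """A participating node sets its own bit from 0 to 1."""
    return bit_vector | (1 << position)


def participants(bit_vector, path_length):
    """Base station: list the positions whose bit is set."""
    return [p for p in range(path_length) if (bit_vector >> p) & 1]


def expected_imag_sum(bit_vector, imag_seeds):
    """Sum the imaginary values of the participating nodes only."""
    return sum(imag_seeds[p] for p in participants(bit_vector, len(imag_seeds)))


# Path of five nodes; only nodes at positions 0, 2 and 3 take part.
imag_seeds = [5, 9, 14, 2, 7]
vector = 0
for position in (0, 2, 3):
    vector = mark_present(vector, position)
expected = expected_imag_sum(vector, imag_seeds)
```

A 32-node path fits into one 32-bit word, which matches the communication
cost argument above.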
Security problem: There is one major problem with this solution: any
malicious node can flip any other node’s bit and thus convey incorrect
information to the base station. This problem can be solved with the
following scheme.
Each node updates its bit and encrypts its position vector with its own key.
For example, the position vector of the first node in a path of three nodes
is
0 0 1
The node sends the updated bit vector and its own encrypted position vector
to the next node in the path. The next node updates its own bit, encrypts
its position vector with its own key, and XORs the result with the encrypted
position vector received from the previous node. This pattern is repeated
until the complete bit vector and the XOR of the encrypted position vectors
of all nodes along the path reach the base station. Finally, the base
station inspects the received bit vector and, since it knows the key of each
node, computes the XOR of the encrypted position vectors of the nodes that
are marked present; it then compares this with the received XOR value. If
the values match, the data is assumed not to have been tampered with and is
accepted; otherwise it is rejected.
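The XOR-based check can be sketched as follows. One assumption should be
stated clearly: instead of an encryption algorithm, the sketch uses
HMAC-SHA256 as the keyed function applied to the position vector, which is
sufficient here because the base station only recomputes and compares the
values and never decrypts them. The function names and the keys are
hypothetical.

```python
import hashlib
import hmac


def position_tag(key, position, path_length):
    """Keyed transform of a node's position vector (HMAC stands in for encryption)."""
    vec = (1 << position).to_bytes((path_length + 7) // 8, "big")
    return hmac.new(key, vec, hashlib.sha256).digest()


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def node_update(bit_vector, tag, key, position, path_length):
    """A participating node sets its bit and XORs in its own tag."""
    new_tag = xor_bytes(tag, position_tag(key, position, path_length))
    return bit_vector | (1 << position), new_tag


def base_station_verify(bit_vector, tag, keys):
    """Recompute the XOR over the nodes marked present and compare."""
    expected = bytes(32)                       # SHA-256 digests are 32 bytes
    for pos, key in enumerate(keys):
        if (bit_vector >> pos) & 1:
            expected = xor_bytes(expected, position_tag(key, pos, len(keys)))
    return hmac.compare_digest(expected, tag)


keys = [b"key-node-1", b"key-node-2", b"key-node-3"]
vector, tag = 0, bytes(32)
for pos in (0, 2):                             # nodes 1 and 3 participate
    vector, tag = node_update(vector, tag, keys[pos], pos, len(keys))
accepted = base_station_verify(vector, tag, keys)
```

A node that flips another node's bit without knowing that node's key cannot
adjust the XOR value accordingly, so the comparison at the base station
fails.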
[Figure: path of nodes 1, 2 and 3 to the base station (B.S). Labels along
the path: {001}, c1 = {001}k1; {011}, c2 = {010}k2, c = c1 XOR c2; {111},
c3 = {100}k3, c = c1 XOR c2 XOR c3. At the base station: compute c1, c2, c3
by looking at the bit vector, compute c by XORing, and compare.]

Another interesting study: Creation of an Energy-Efficient Tree in
Wireless Sensor Networks
Most wireless sensor networks use spanning trees to aggregate data
efficiently. When sensor nodes sense data, the relevant data must be
forwarded to the sink. The sink is the root of the spanning tree, and all
the other nodes that sense the event construct the tree. Each intermediate
node aggregates its own data with the data sent by its children and then
transmits the aggregated data to its parent. This procedure continues until
the data arrives at the sink.
Parameters related to the protocol:
1) To compute the aggregation tree, the energy consumed in the
intermediate nodes for sending data from the leaves to the sink
must be considered.
2) The tree’s delay, which is equal to the tree’s depth, should be
considered.
3) The scheduling mechanism and queuing delay can be considered as
aggregation tree evaluation parameters.
4) To decrease the number of failed nodes and to increase the network
lifetime, both the remaining energy and the distance parameters are
considered.
• The first parameter that should be considered is the remaining
energy in each node.
• The distance between the nodes is considered as the second
parameter, i.e. each node selects as its parent the neighbour with
the most energy. If several neighbours have equal energy, the
nearest one is selected. With this strategy, a node with low
remaining energy stays alive longer. This increases the lifetime of
the network and supports better coverage.
• To provide fairness in energy consumption, a third parameter is
considered in addition to the residual energy and distance: the
maximum number of children permitted. In the proposed algorithm,
each node can have a predetermined maximum number of children.
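The three selection parameters (residual energy, distance and the maximum
number of children) can be combined as in the following sketch. The data
structure, the field names and the sample values are our own assumptions for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    node_id: int
    residual_energy: float   # joules
    distance: float          # metres
    children: int = 0        # children already attached to this neighbour


def select_parent(neighbours, max_children):
    """Pick the neighbour with the most energy; break ties by smallest distance.

    Neighbours that already have max_children children are skipped.
    Returns None when no neighbour is eligible.
    """
    eligible = [n for n in neighbours if n.children < max_children]
    if not eligible:
        return None
    return max(eligible, key=lambda n: (n.residual_energy, -n.distance))


candidates = [
    Neighbour(node_id=1, residual_energy=8.0, distance=3.0),
    Neighbour(node_id=2, residual_energy=8.0, distance=2.0),
    Neighbour(node_id=3, residual_energy=10.0, distance=5.0, children=2),
]
parent = select_parent(candidates, max_children=2)
```

In this example node 3 has the most energy but is already full, so the tie
between nodes 1 and 2 is decided by distance.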

Because transmission power is proportional to the distance, the proposed
algorithm uses the average path energy as a new parameter in order to avoid
high power consumption. This parameter is calculated as the sum of the
residual energy of the nodes along the path divided by the path length. In
any path, the node with the highest energy is chosen as the parent node.
If residual energy = 5 J
distance between nodes = 2 m
parameter = 5/2 = 2.5 J/m
In this way the path is chosen and the tree is formed.
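The average path energy computation can be written out as a short sketch.
The function names and the two candidate paths are hypothetical; the path
length is taken in metres, as in the numeric example above.

```python
def average_path_energy(residual_energies, path_length):
    """Sum of the residual energies (J) along the path divided by its length (m)."""
    if path_length <= 0:
        raise ValueError("path length must be positive")
    return sum(residual_energies) / path_length


def best_path(candidates):
    """Return the name of the candidate path with the highest average path energy."""
    return max(candidates, key=lambda name: average_path_energy(*candidates[name]))


single_hop = average_path_energy([5.0], 2.0)       # the 5 J / 2 m case above

candidates = {
    "path_a": ([6.0, 10.0], 4.0),                  # 16 J over 4 m = 4.0 J/m
    "path_b": ([3.0, 10.0], 5.0),                  # 13 J over 5 m = 2.6 J/m
}
chosen = best_path(candidates)
```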

For example:
The remaining energy of nodes 1, 2, 3, 4, 5, 6, 7 and 8 is 10J, 2J, 8J,
3J, 6J, 8J, 7J and 9J, respectively. Suppose that node 8 wants to select its
parent. Node 5, which has the higher average path energy, is selected as the
parent of node 8.
After performing the same procedure for each node, the resulting spanning
tree looks like:
[Figure: resulting spanning tree]

This concludes our research and development during the summer internship.
Thank You.
References:
1) Arijit Ukil, "Privacy Preserving Data Aggregation in Wireless Sensor
Networks", IEEE ICWCMC 2010, Valencia, Spain.
2) Joyce Jose, M. Princy and Josna Jose, "Integrity Protecting and Privacy
Preserving Data Aggregation Protocols in Wireless Sensor Networks: A
Survey".
3) Zahra Eskandari, Mohammad Hossien Yaghmaee and AmirHossien Mohajerzadeh,
"Energy Efficient Spanning Tree for Data Aggregation in Wireless Sensor
Networks".
4) Marc Lee and Vincent W.S. Wong, "An Energy-Aware Spanning Tree Algorithm
for Data Aggregation in Wireless Sensor Networks".
