When a sensor network is first activated, various tasks must be performed to establish the
necessary infrastructure that will allow useful collaborative work to be performed. In particular,
each node must discover which other nodes it can directly communicate with, and its radio
power must be set appropriately to ensure adequate connectivity. Nodes near one another may
wish to organize themselves into clusters, so that sensing redundancy can be avoided and
scarce resources, such as radio frequency, may be reused across nonoverlapping clusters.
A wireless sensor network may consist of a large number of sensor nodes, and each node is
equipped with sensors, microprocessors, memory, wireless transceiver, and battery. Once
deployed, the sensor nodes form a network through short-range wireless communication. They
collect environmental surveillance data and send them back to the data processing center, which
is also called the sink node or base station. In many applications, wireless sensor networks are
used to monitor some measures of interest, such as temperature, light intensity, and air pressure.
The wireless sensors are mostly deployed in remote and hazardous locations, where manual
monitoring is very difficult or almost impossible. Due to the low cost of wireless sensors, these
can be deployed in large numbers. Apart from sensing, sensor nodes are equipped with data
processing and communication capabilities. The sensing circuitry measures the parameters of
interest (temperature, pressure, etc.) within its sensing range and transforms them into electrical
signals. These electrical signals are processed and then transmitted by the onboard radio to the
remotely located sink node. Because wireless sensors are deployed in unattended, harsh
environments, it is usually not possible to recharge or replace their batteries. Energy-efficient
operation that prolongs the lifetime of the overall wireless sensor network (WSN) is therefore of
utmost importance. Most of the energy consumed by a wireless sensor node is attributed to
transmitting, receiving, processing, and forwarding data to neighboring nodes. The dense
deployment of WSNs makes recharging node batteries harder still, so energy efficiency is a
major design goal in these networks.
Grouping sensor nodes into clusters has been widely used to achieve this objective. Clustering is
especially important for sensor network applications where a large number of ad-hoc sensors are
deployed for sensing purposes. If every sensor transmits its data through the network
independently, severe congestion and data collisions result, which drains the limited energy
of the network. Node clustering addresses these issues. It also improves the scalability of
WSNs, which helps to meet load balancing and efficient resource utilization constraints.
In clustered networks, sensors are partitioned into smaller clusters and a cluster head (CH) is
elected for each cluster. Sensor nodes in each cluster transmit their data to their CH, which
aggregates the data and forwards it to a central base station. By creating a hierarchical WSN,
clustering facilitates efficient use of the limited energy of sensor nodes and hence extends
network lifetime. Although sensor nodes in clusters transmit messages over a short distance
(within clusters), CHs drain more energy than other sensor nodes in the cluster because they
transmit messages over long distances (from CH to base station). Periodic re-election of CHs
within clusters based on their residual energy is a possible way to balance the power
consumption of each cluster. In addition, clustering increases the efficiency of data
transmission by reducing the number of sensors attempting to transmit data in the WSN,
aggregating data at CHs via intra-cluster communication, and reducing total data packet losses.
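The re-election and aggregation ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node type, the tie-break on node ID, and the use of a mean as the aggregate are hypothetical choices, not something the surveyed algorithms prescribe.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    residual_energy: float  # remaining energy, in arbitrary illustrative units


def reelect_cluster_head(cluster):
    """Pick the member with the most residual energy; ties go to the lower ID (assumed rule)."""
    return max(cluster, key=lambda n: (n.residual_energy, -n.node_id))


def aggregate(readings):
    """Stand-in aggregation: the CH forwards one mean value instead of every reading."""
    return sum(readings) / len(readings)


# Hypothetical cluster of four nodes; nodes 2 and 3 tie on energy.
cluster = [Node(1, 0.40), Node(2, 0.75), Node(3, 0.75), Node(4, 0.10)]
head = reelect_cluster_head(cluster)
print("new cluster head:", head.node_id)
print("value forwarded to base station:", aggregate([20.0, 22.0, 24.0]))
```

Running the election again after each round of transmissions rotates the CH role toward whichever node currently has the most energy left.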
General Sensor Network Architecture
The following figure shows the general sensor network architecture.
Fig. 1. General Sensor Network Architecture
• Sensor Node: A sensor node is the core component of a WSN. Sensor nodes can take on
multiple roles in a network, such as simple sensing, data storage, routing, and data processing.
• Clusters: Clusters are the organizational unit for WSNs. The dense nature of these networks
requires that they be broken down into clusters to simplify network tasks.
• Clusterheads: Clusterheads are the organizational leaders of clusters. They are often required
to organize activities in the cluster. These tasks include, but are not limited to, data
aggregation and organizing the communication schedule of a cluster.
• Base Station: The base station is at the upper level of the hierarchical WSN. It provides the
communication link between the sensor network and the end-user.
• End User: The data in a sensor network can be used for a wide-range of applications.
Therefore, a particular application may make use of the network data over the internet, using a
PDA, or even a desktop computer. In a queried sensor network (where the required data is
gathered from a query sent through the network), this query is generated by the end user.
Clustering plays an important role not just in the organization of the network; it can also
dramatically affect network performance.
Pros and Cons of Clustering
The pros of clustering are as follows. It enables bandwidth reuse and can thus improve system
capacity. Because all normal nodes within a cluster send their data to the CH, energy is saved
through the absence of flooding, multiple routes, and routing loops. Clustering enables
efficient resource allocation and thus helps in designing better power control. Finally, any
change in node behavior within a cluster affects only that cluster and not the entire network,
which is therefore robust to such changes.
Existing clustering schemes in WSNs also have several cons. In selecting cluster heads, some
algorithms consider only the ID number or the residual energy of the sensor nodes. Since all
the data in a sensor network is sent to the base station, the traffic near the base station is
higher, and the sensor nodes in these areas will run out of energy earlier. The base station
will then be isolated and, as a result, the residual energy stored in the other sensor nodes
will be wasted. Another disadvantage is that energy is wasted by flooding
in route discovery and duplicated transmission of data by multiple routes from the source to the
Design challenges in clustering algorithms
Wireless Sensor Networks present vast challenges in terms of implementation. Design goals
targeted in traditional networking provide little more than a basis for the design of wireless
sensor networks. Decomposing a WSN into smaller clusters is considered a convenient
and efficient approach to prolonging network lifetime through efficient energy utilization.
Some important considerations in designing clustering algorithms are discussed below.
• Limited Energy: Wireless sensor nodes have limited energy storage and once they are
deployed, it is not practical to recharge or replace their batteries. Because they reduce
the amount of data transmission, clustering algorithms are more energy efficient than
direct routing algorithms. Energy consumption across sensor nodes can be balanced by
optimizing the cluster formation, periodically re-electing CHs based on their
residual energy, and using efficient intra-cluster and inter-cluster communication. However,
clustering algorithms should avoid energy-expensive cluster reconstruction.
• Network Lifetime: The energy limitation on nodes results in a limited lifetime for the
network as a whole. Clustering helps to prolong the lifetime of WSNs by reducing
the number of nodes contending for channel access, by aggregating data at CHs via intra-cluster
communication, and by having CHs communicate with a base station directly or over multiple
hops. Proper design should focus on increasing network lifetime.
• Limited Abilities: The small physical size and small amount of stored energy in a sensor node
limit many of the abilities of nodes, including processing, memory, and storage.
• Application Dependency: When designing protocols for WSNs, application robustness must be
focused on, as protocols should be able to adapt to a variety of application requirements.
Changes in the deployment environment can also be observed due to variations in the
• Secure Communication: The ability of a WSN to provide secure communication is ever more
important when considering these networks for military applications. The self-organization of a
network has a huge dependence on the application it is required for. An establishment of secure
and energy efficient intra-cluster and inter-cluster communication is one of the important
challenges in designing clustering algorithms since these tiny nodes are deployed unattended in
• Cluster formation and CH selection: Cluster formation and CHs selection are two of the
important operations in clustering algorithms. Energy wastage in sensors in WSN due to direct
transmission between sensors and a base station can be avoided by clustering the WSN.
Clustering further enhances scalability of WSN in real world applications. Selecting optimum
cluster size, election and reelection of CHs, and cluster maintenance are the main issues to be
addressed in designing of clustering algorithms. The selection criteria to isolate clusters and to
choose the CHs should maximize energy utilization, as well as function for a variety of
• Synchronization: Slotted transmission schemes such as TDMA allow nodes to regularly
schedule sleep intervals to minimize the energy used. Such schemes require synchronization
mechanisms to set up and maintain the transmission schedule, and the effectiveness of these
mechanisms must be considered.
• Data Aggregation: Data aggregation allows the differentiation between sensed data and useful
data. In a densely populated network there are often multiple nodes sensing similar information.
In-network processing makes this aggregation possible, and it is now
fundamental in many sensor network schemes, as the power required for processing tasks is
substantially less than that required for communication tasks. As such, the amount of data
transferred in-network should be minimized.
• Repair Mechanisms: Due to the nature of Wireless Sensor Networks, they are often prone to
node mobility, node death, and interference. All of these situations can result in link failure. When
looking at clustering schemes, it is important to look at the mechanisms in place for link
recovery and reliable data communication.
• Quality of Service (QoS): Existing clustering algorithms for WSNs mainly focus on providing
energy-efficient network utilization but pay less attention to QoS support. Many QoS
requirements in WSNs are application dependent, such as acceptable delay and packet loss
tolerance. For example, in applications such as habitat monitoring there is no bound on
acceptable delay, whereas in military tracking even a small delay is unacceptable. QoS metrics
must be taken into account in the design process.
Classification of clustering algorithms
Clustering in WSNs involves grouping nodes into clusters and electing a CH such that:
• The members of a cluster can communicate with their CH directly.
• A CH can forward the aggregated data to the central base station through other CHs.
Thus, the collection of CHs in the network forms a connected dominating set. Research on
clustering in WSNs has focused on developing centralized and distributed algorithms to compute
connected dominating sets. Distributed approaches are more practical for large-scale deployment
scenarios. Since obtaining an optimal dominating set is an NP-complete problem, the proposed
algorithms are heuristic in nature.
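The two conditions above can be checked mechanically for a candidate set of CHs. The following is a minimal sketch, assuming the network is given as an undirected adjacency map; the function name and the five-node line topology are hypothetical and chosen only for illustration.

```python
from collections import deque


def is_connected_dominating_set(adjacency, heads):
    """Return True if every node is a head or adjacent to one, and heads form a connected subgraph."""
    heads = set(heads)
    if not heads:
        return False
    # Domination: each cluster member must reach some CH directly (one hop).
    for node, neighbors in adjacency.items():
        if node not in heads and not heads & set(neighbors):
            return False
    # Connectivity: a breadth-first search that only walks through heads must reach all heads.
    start = next(iter(heads))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor in heads and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == heads


# Hypothetical line topology: 1 - 2 - 3 - 4 - 5
line = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(is_connected_dominating_set(line, {2, 3, 4}))  # heads cover every node and form a chain
print(is_connected_dominating_set(line, {2, 4}))     # heads cover every node but are not linked
```

Verifying a given set is cheap; it is finding the smallest such set that is hard, which is why the algorithms discussed here settle for heuristics.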
We classify the clustering techniques based on two criteria:
• The parameter(s) used for electing CHs
• The execution nature of a clustering algorithm (probabilistic or iterative)
In self-organization schemes, CHs consume more energy than cluster members
because they are involved in aggregating, processing, and routing data. Because optimal
clustering is NP-hard, practical clustering algorithms are heuristic in nature. Distributed
clustering algorithms are more feasible than centralized ones, since central control of a large
number of sensor nodes is not practical. Only distributed clustering algorithms are therefore
considered in this analysis.
Existing clustering algorithms can be categorized into four groups depending on the
cluster formation criteria and the parameters used for CH election. Some algorithms make
decisions on cluster formation and CH selection based on pre-collected network information or on
heuristics with specific assumptions about some desirable properties. Node identifiers, node
weights based on the significance of the sensor node, number of neighboring nodes, probabilities
assigned to nodes, and residual energy of nodes are common parameters in selecting CHs. If the
overhead of collecting prior information about the network is significant, or the heuristics and
assumptions are not realistic, energy efficiency and a longer network lifetime may not be
achieved. Based on the cluster formation methodology and CH selection criteria, clustering
algorithms are classified into:
1. Identity-based clustering algorithms,
2. Neighborhood information based clustering algorithms,
3. Probabilistic clustering algorithms and
4. Biologically inspired clustering algorithms
We will now provide an overview of the clustering algorithms that are most commonly
considered when investigating the self-organization of WSNs.
1. Identity-based clustering algorithms: Uniformly assigned unique identifiers are the
key parameter for selecting CHs in identity-based clustering algorithms. For a
sensor node to be the CH, it should have the highest identity among all one-hop sensor
nodes, or the lowest identity among all nodes that are neither a cluster head nor
within one hop of an already selected CH. These algorithms may not suit sensor networks
with limited energy, since they drain the power of some nodes in the network.
Generally these algorithms are static clustering algorithms and do not
change the CHs once selected; energy efficiency is not a primary objective of
most identity-based clustering algorithms. A load balancing heuristic can be added to
these algorithms so that longer, low-variance CH durations can be achieved. The
following are identity-based clustering algorithms:
Linked Cluster Algorithm (LCA): In LCA, each node is assigned a unique ID
number and has two ways of becoming a CH. The first is for the node to have the highest ID
number in the set including all neighbor nodes and the node itself. The second is for
none of its neighbors to be CHs, in which case it becomes a CH. LCA2, an extension
of LCA, was proposed to eliminate the election of an unnecessary number of CHs. Here
the concept of a node being covered or non-covered is introduced. A node is covered if
one of its neighbors is a CH. CHs are elected starting with the node having the lowest ID
among non-covered neighbors. LCA2 generates a smaller number of clusters than
LCA. LCA and LCA2 have very limited scope as clustering algorithms for WSNs since they do
not address the issue of limited energy. Both algorithms form 1-hop clusters and
require clock synchronization, with a time complexity of O(n). Load balancing in
LCA/LCA2 focuses only on intra-cluster communication and is not favorable for real
applications.
2. Neighborhood information based clustering algorithms:
In neighborhood information based clustering algorithms, sensors should have information about
their neighbors and should be able to determine the number of neighbors within a pre-specified
transmission range (cluster range). Using connectivity-based heuristics that consider the
number of neighbors, some algorithms elect the sensors with the maximum number of 1-hop
neighbors as CHs. Other algorithms in this category use a combination of metrics in addition to
node degree, such as transmission power, mobility, and the remaining energy of the nodes.
Depending on the specific application, any or all of these parameters can be used for CH
selection. The power consumption at CHs can be reduced by using a load balancing heuristic in
these algorithms, but this may build a larger number of clusters within the network, creating
congestion in data routing to a base station. Re-clustering or CH re-election is not considered
in these algorithms, and most are static clustering algorithms. The following are
neighborhood information based clustering algorithms:
Highest-Connectivity Cluster Algorithm: In this scheme each node broadcasts the
number of neighbors it has to the surrounding nodes. The node with the highest
connectivity (highest degree) is elected CH; in
the case of a tie, the node with the lowest ID prevails. A node that has already selected a
CH withdraws its intention to be a CH. The connectivity-based heuristic used in this
scheme thus elects the sensor with the maximum number of 1-hop neighbors as the CH. The
creation of one-hop clusters and the clock synchronization requirement limit the practical
usage of the algorithm.
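The election rule described for the highest-connectivity scheme can be emulated centrally as follows. This is a sketch under assumptions: the real scheme is distributed and relies on broadcasts, whereas this version simply visits nodes in rank order; the function name and example topology are hypothetical.

```python
def highest_connectivity_elect(adjacency):
    """Greedy emulation: highest degree wins, lowest ID breaks ties, covered nodes withdraw."""
    # Rank candidates by degree (descending), then by node ID (ascending).
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    heads, covered = set(), set()
    for node in order:
        if node in covered:
            continue  # this node already has a CH (or is one), so it does not stand
        heads.add(node)
        covered.add(node)
        covered.update(adjacency[node])  # its 1-hop neighbors join its cluster
    return heads


# Hypothetical topology with edges 1-2, 1-3, 1-4, 2-3, 4-5, 5-6.
topology = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5], 5: [4, 6], 6: [5]}
print(sorted(highest_connectivity_elect(topology)))
```

In this topology node 1 has the highest degree and claims nodes 2, 3, and 4; node 5 is the best remaining uncovered candidate and claims node 6.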
Max-Min D-Cluster Algorithm: This is a distributed algorithm for CH election in which
no node is more than d hops away from its CH, where d (>1) is a value selected for the
heuristic in the algorithm. CHs are selected based on their node ID, so a change in
the network topology does not have much influence on the node clusters and CHs.
The time complexity of generating d-hop clusters is O(d), and unlike LCA/LCA2 and the highest
connectivity algorithm, this algorithm does not require clock synchronization. It also
provides better load balancing than the LCA/LCA2 and highest connectivity
algorithms.
Weighted Clustering Algorithm (WCA): This algorithm is a non-periodic procedure
for CH election. It is invoked on demand every time a reconfiguration of the network’s
topology is unavoidable. A new election is invoked every time a sensor loses the
connection with any CH, thus saving power. WCA is based on a combination of metrics
that take into account several system parameters such as: the ideal node degree;
transmission power; mobility; and the remaining energy of the nodes. Depending on the
specific application, any or all of these parameters can be used as a metric to elect CHs.
The election procedure is based upon a global parameter that is called combined weight.
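A combined weight of this kind can be sketched as a weighted sum of the four metrics named above, with the lowest-weight node elected. The field names, the proxies used for transmission power and remaining energy, and the weighting factors below are illustrative assumptions rather than values taken from this survey.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    node_id: int
    degree: int          # number of 1-hop neighbors
    distance_sum: float  # sum of distances to neighbors, a proxy for transmission power
    mobility: float      # average speed of the node
    energy_used: float   # energy consumed so far; lower means more remaining energy


def combined_weight(c, ideal_degree, w=(0.7, 0.2, 0.05, 0.05)):
    """Weighted sum of the four metrics; the factors in w are assumed, not prescribed."""
    degree_gap = abs(c.degree - ideal_degree)  # distance from the ideal node degree
    return w[0] * degree_gap + w[1] * c.distance_sum + w[2] * c.mobility + w[3] * c.energy_used


def elect(candidates, ideal_degree):
    """Elect the candidate with the smallest combined weight; the lower ID breaks ties."""
    return min(candidates, key=lambda c: (combined_weight(c, ideal_degree), c.node_id))


candidates = [
    Candidate(1, degree=4, distance_sum=10.0, mobility=1.0, energy_used=2.0),
    Candidate(2, degree=7, distance_sum=8.0, mobility=0.5, energy_used=1.0),
    Candidate(3, degree=3, distance_sum=12.0, mobility=2.0, energy_used=5.0),
]
print("elected CH:", elect(candidates, ideal_degree=4).node_id)
```

Setting a weighting factor to zero drops that metric, which is one way to read the remark that any or all of the parameters can be used depending on the application.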
Linked Cluster Algorithm (LCA): LCA was one of the first clustering algorithms developed.
It was meant for wired sensors, but has since been implemented for Wireless Sensor
Networks. In LCA, each node is assigned a unique ID number and has two ways of
becoming a CH. The first is for the node to have the highest ID number in the set including
all neighbor nodes and the node itself. The second is for none of its neighbors
to be CHs, in which case it becomes a CH. LCA2, an extension of LCA, was proposed to eliminate
the election of an unnecessary number of CHs. LCA2 generates a smaller number of
clusters than LCA. LCA and LCA2 have very limited scope as clustering
algorithms for WSNs since they do not address the issue of limited energy. Both
algorithms form 1-hop clusters and require clock synchronization, with a time complexity of
O(n). Load balancing in LCA/LCA2 focuses only on intra-cluster communication and is
not favorable for real applications. This algorithm attempted to provide better load
balancing through a reduced number of sensors in a cluster, but the requirement of clock
synchronization limits its applications.
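The two LCA rules, as worded above, can be sketched as two passes over the network. This is an illustration of the description in this section rather than the published protocol; in particular, visiting nodes in ascending ID order during the second pass is an assumption, and the function name and topology are hypothetical.

```python
def lca_elect(adjacency):
    """Apply the two LCA rules as described in the text to an undirected adjacency map."""
    heads = set()
    # Rule 1: a node with the highest ID in its closed neighborhood becomes a CH.
    for node, neighbors in adjacency.items():
        if all(node > neighbor for neighbor in neighbors):
            heads.add(node)
    # Rule 2: visiting nodes in ascending ID order (assumed order),
    # a node with no CH among its neighbors becomes a CH.
    for node in sorted(adjacency):
        if node not in heads and not heads & set(adjacency[node]):
            heads.add(node)
    return heads


# Hypothetical line topology: 1 - 2 - 3 - 4 - 5
line = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(sorted(lca_elect(line)))
```

On this five-node line, three of the five nodes end up as CHs, which illustrates the "unnecessary number of CHs" that LCA2 was proposed to reduce.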
Grid-clustering ROUting Protocol (GROUP): In this algorithm one of the sinks
(called the primary sink) dynamically and randomly builds the cluster grid, where CHs
are arranged in a grid-like manner. Data queries from the sink to a source
node are propagated from the Grid Seed (GS) to its CHs, and so on. In the case of a
location-unaware data query, the query is passed from the centralmost sink in the network
to its nearest CH. That CH then broadcasts the message to neighboring CHs. If the
data query is location-aware, the requests are sent down the chain of CHs towards the
specified region using unicast packets. For both kinds of query, data is transmitted upstream
through the chain of CHs established during cluster formation. Energy conservation is
achieved due to the lower transmission distance for upstream data.
PEGASIS: It offers promising improvements with relation to network lifetime, however
reliability may not be as promising. In PEGASIS, each node communicates with its
nearest neighbor. This implementation may be more susceptible to failure due to gaps in
Clustering Algorithm via Waiting Timer (CWAT): A decentralized algorithm for
organizing clusters has been proposed for homogeneous sensors with the same
transmission range. The performance of CWAT was evaluated using simplified
simulations. The algorithm needs to be generalized before its performance with respect to
load balancing, CH re-election, and energy usage across the network can be assessed.
3. Probabilistic clustering algorithms:
In probabilistic clustering algorithms, a prior probability assigned to each sensor node is used
to determine CHs. The probability assigned to each node lets the node decide on its own
election as a CH while considering a few other parameters. In addition to the probability
assigned to each node, the residual energy of the node or its
degree is taken as the primary parameter for electing a CH. Clustering algorithms in this
category show faster convergence in addition to energy-efficient network utilization, efficient
load balancing, and low message overhead. The following are probabilistic clustering algorithms:
Low-Energy Adaptive Clustering Hierarchy (LEACH): This algorithm was one of
the first major improvements on conventional clustering. It balances
energy usage by random rotation of CHs. The algorithm is also organized in such a
manner that data fusion can be used to reduce the amount of data transmission. The
decision of whether a node elevates itself to CH is made dynamically at each interval, to
minimize overhead in CH establishment. This decision is a function of the percentage of
optimal CHs in a network (determined a priori by the application), in combination with how
often and how recently a given node has been a CH. Within a cluster, communication follows a
TDMA schedule. This scheduling scheme allows for energy minimization, as nodes can turn off
their radios during all but their scheduled timeslot. LEACH therefore provides uniform load
balancing in one-hop sensor networks. The localized coordination scheme used in LEACH provides
better scalability for cluster formation, and better load balancing enhances the network
lifetime.
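The self-election decision described for LEACH is commonly written as a threshold T(n) = P / (1 - P * (r mod 1/P)) for nodes that have not recently served as CH, and 0 otherwise, where P is the desired fraction of CHs and r is the current round. The formula is not given in this section, so the sketch below states it as an assumption; the function names and the eligible flag are hypothetical.

```python
import random


def leach_threshold(p, round_number, eligible):
    """Threshold for self-election; `eligible` means the node has not been CH in this cycle."""
    if not eligible:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation cycle
    return p / (1 - p * (round_number % period))


def decides_to_be_head(p, round_number, eligible, rng=random):
    """A node draws a number in [0, 1) and becomes CH if it falls below the threshold."""
    return rng.random() < leach_threshold(p, round_number, eligible)


# With p = 0.25 the cycle is four rounds long and the threshold rises toward 1.
for r in range(4):
    print(r, leach_threshold(0.25, r, eligible=True))
```

The rising threshold means that any node still eligible in the last round of a cycle is certain to elect itself, which is how the CH role rotates across all nodes.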
Two-Level LEACH (TL-LEACH): This algorithm is an extension of LEACH
that uses two levels of CHs (primary and secondary). The primary CH in each cluster
communicates with the secondaries, and the corresponding secondaries communicate with
the nodes in their subcluster. Data fusion can also be performed as in LEACH. In
addition, communication within a cluster is still scheduled using TDMA timeslots. A
round begins by selecting the primary and secondary CHs
using the same mechanism as LEACH, with the a priori probability of being elevated to a
primary CH lower than that of being elevated to a secondary CH. The two-level structure of
TL-LEACH reduces the number of nodes that need to transmit to the base station, effectively
reducing the total energy usage.
Energy Efficient Clustering Scheme (EECS): EECS is similar to LEACH with some
enhancements in the cluster formation and cluster head selection processes. A constant
number of CHs is elected according to the residual energy of sensor nodes, using a localized
competition process without iteration. In EECS, clusters are sized dynamically
based on their distance from the base station. The result is an algorithm that
addresses the problem that clusters at a greater distance from the base station require
more energy for transmission than those that are closer. EECS provides much lower
message overhead and a more uniform distribution of CHs than LEACH.
Hybrid Energy Efficient Distributed Clustering (HEED): HEED is a multi-hop
clustering algorithm for Wireless Sensor Networks. CHs are chosen based on two
important parameters: residual energy and intra-cluster communication cost. Residual
energy of each node is used to probabilistically choose the initial set of CHs, as
commonly done in other clustering schemes. In HEED, Intra-Cluster Communication
Cost reflects the node degree or node’s proximity to the neighbor and is used by the
nodes in deciding to join the cluster. Low cluster power levels promote an increase in
spatial reuse while high cluster power levels are required for inter-cluster communication
as they span two or more cluster areas. HEED provides a uniform CH distribution across
the network and better load balancing. However, knowledge of the entire network is
needed to determine intra-cluster communication cost, and configuring those
parameters might be difficult in practice.
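The probabilistic choice of initial CHs from residual energy is often described as an initial probability proportional to the fraction of energy remaining, bounded below by a small constant and doubled on each iteration until it reaches 1. That form is not spelled out in this section, so the sketch below treats it as an assumption; the parameter names c_prob and p_min and the function names are hypothetical.

```python
def initial_ch_probability(c_prob, e_residual, e_max, p_min=1e-4):
    """Initial probability of becoming CH, scaled by remaining energy and floored at p_min."""
    return max(c_prob * e_residual / e_max, p_min)


def probability_schedule(ch_prob):
    """Probabilities a node would use across iterations, doubling until 1.0 is reached."""
    schedule = [ch_prob]
    while schedule[-1] < 1.0:
        schedule.append(min(2 * schedule[-1], 1.0))
    return schedule


# A node with half of its energy left and an assumed initial percentage of 25%.
start = initial_ch_probability(c_prob=0.25, e_residual=1.0, e_max=2.0)
print(probability_schedule(start))
```

Nodes with more residual energy start from a higher probability and so reach 1.0 in fewer iterations, which biases the election toward them.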
Time Controlled Clustering Algorithm (TCCA): Similar to LEACH, the operation of
TCCA is divided into rounds, enabling better load distribution among sensor nodes. Each
round consists of a cluster setup phase, in which clusters are formed and CHs selected,
and a steady state phase, in which data is cyclically collected, aggregated, and transferred by
the CH to a base station. A node's residual energy and a desired CH probability are considered
in the eligibility criteria for CH selection. Once a CH is selected, it advertises its selection
to the neighboring nodes by sending an advertisement message (ADV), which
includes its node ID, initial Time-To-Live (TTL), residual energy, and a timestamp.
The TTL, selected based on the residual energy of the node, is used to limit the size of the
cluster to be formed.
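The ADV message and its TTL-limited propagation can be sketched as below. The four message fields follow the description above; the linear mapping from residual energy to TTL, the max_hops parameter, and the function names are hypothetical, since this section does not say how the TTL is derived.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Adv:
    head_id: int
    ttl: int                 # remaining hops the advertisement may travel
    residual_energy: float
    timestamp: float


def initial_ttl(residual_energy, max_energy, max_hops):
    """Assumed mapping: TTL grows linearly with the remaining energy fraction, at least 1."""
    return max(1, math.ceil(max_hops * residual_energy / max_energy))


def forward(adv):
    """Return the ADV a receiving node would rebroadcast, or None once the TTL is exhausted."""
    if adv.ttl <= 1:
        return None
    return replace(adv, ttl=adv.ttl - 1)


# A CH with half of its energy left advertises with a TTL of 2 out of a possible 4 hops.
adv = Adv(head_id=7, ttl=initial_ttl(0.5, 1.0, 4), residual_energy=0.5, timestamp=0.0)
first_hop = forward(adv)
print(adv.ttl, first_hop.ttl, forward(first_hop))
```

Because a CH with less energy starts with a smaller TTL, its advertisement dies out sooner and it ends up serving a smaller cluster.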
4. Biologically inspired clustering algorithms:
Recently proposed biologically inspired clustering algorithms use swarm intelligence
techniques, which model the collective behavior of social insects such as ants. These
algorithms are not yet mature, and improvements are still being sought. They use a colonial
closure model derived from ant colonies. Biologically inspired
clustering algorithms can dynamically control CH selection while achieving a
uniform distribution of CHs and an optimal number of clusters. The following is a biologically
inspired clustering algorithm:
ANTCLUST-based clustering: This is a swarm intelligence based clustering algorithm.
Swarm intelligence is a technique used to model the collective
behavior of social insects such as ants; it shows the properties of robustness, distributed
problem solving, and decentralized operation. ANTCLUST is a model of
ant colonial closure used to solve clustering problems. In the ANTCLUST-based clustering
method, sensor nodes with more residual energy become cluster heads independently.
Then, randomly chosen nodes meet with each other, and clusters are created, merged, and
discarded through local meetings. Each sensor node with less residual energy chooses a
cluster based on the residual energy of the cluster head, its distance to the cluster head,
and an estimate of the cluster size. Eventually, energy-efficient clusters are formed that
extend the lifetime of the sensor network.
Wireless sensor networks (WSNs) have attracted significant attention over the past few years. A
growing list of civil and military applications can employ WSNs for increased effectiveness,
especially in hostile and remote areas. Examples include disaster management, border protection,
and combat field surveillance. In these applications a large number of sensors is expected,
requiring careful architecture and management of the network.
Grouping nodes into clusters has been the most popular approach for supporting scalability in
WSNs. Clustering is an important technique that
▫ Prolongs network lifetime
▫ Reduces channel contention
▫ Reduces collisions
Significant attention has been paid to clustering algorithms.
 Ossama Younis, Marwan Krunz, and Srinivas Ramasubramanian, “Node Clustering in
Wireless Sensor Networks: Recent Developments and Deployment Challenges”, IEEE
 P. Kumarawadu, D. J. Dechene, M. Luccini, and A. Sauer, “Algorithms for
Node Clustering in Wireless Sensor Networks: A Survey”, IEEE, 2008
 D. J. Baker and A. Ephremides, “The Architectural Organization of a Mobile Radio
Network via a Distributed Algorithm,” IEEE Transactions on Communications, 1981,
 M. Chatterjee, S. K. Das and D. Turgut, “WCA: A Weighted Clustering
Algorithm for Mobile Ad Hoc Networks,” Cluster Computing, 2002, vol. 5,
 A. Amis, R. Prakash, T. Vuong and D. Huynh, “Max-Min D-Cluster Formation in
Wireless Ad Hoc Networks,” IEEE INFOCOM, March 2000.
 M. Ye, C. Li, G. Chen and J. Wu, “An Energy Efficient Clustering Scheme
in Wireless Sensor Networks,” Ad Hoc & Sensor Wireless Networks,2006,
 Mao YE, Chengfa LI, Guihai CHEN and Jie WU, “An Energy Efficient
Clustering Scheme in Wireless Sensor Networks”, Ad Hoc & Sensor
Wireless Networks, 2005, Vol. 3, pp. 99-119