INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
Authored by
Ishitva Minocha
Reliance Jio Infocomm Ltd
Navi Mumbai
Training and Seminar (CSN-499)
Wi-Fi based Geolocation Engine – Call Clustering Algorithm
Under the guidance of
Mr. Nikola Sucevic
Introduction
• As technology advances across the globe, there is a clear need for
more precise geolocation of its users.
• Most devices presently track location using GPS and services based on
the Google Maps geolocation APIs for Android applications.
• Because these cannot track devices inside buildings, the project
proposes a more reliable method.
Motivation behind the project
GPS-based location services and their drawbacks
• The Global Positioning System (GPS), originally developed by the
US government for military navigation, is a network of about 30
satellites orbiting the Earth at an altitude of about 20,000 km.
• Wherever you are on the planet, at least four GPS satellites are
'visible' at any time.
• However, these radio signals are blocked once they meet obstacles,
which means that inside buildings we cannot rely on GPS to locate
the device. In that case it reports the last tracked location of
the device.
Work Plan to meet the objective
• The proposed method tries to overcome this drawback by shifting the basic
principle from GPS to "Network Availability", which has an edge over the former
because its signals propagate through buildings.
• Routers (Access Points, APs) are the basis of the proposed algorithm.
Fig: Overview of the Work-Flow
Development of Logging Application and its use
• The data required for analysis and development is collected by an Android
application that only probes and logs the available APs and writes the
log file to the device running the application.
Features of the developed application (logged fields)
• UNIX time (POSIX time)
• Logging date
• Time stamp
• Latitude of the logging device
• Longitude of the logging device
• BSSID (Basic Service Set Identifier), which is unique
for any network-providing device.
• RSSI (Received Signal Strength Indicator) at a
particular lat-long from all scanned APs.
• SSID (Service Set Identifier), the identifying name
of the AP.
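The logged fields could be held in a small record type. The sketch below is only an illustration: the slides do not give the log file layout, so the comma-separated format and the field order are assumptions, and `LogRecord` and `parse_line` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    unix_time: int    # UNIX (POSIX) time of the scan
    date: str         # logging date, kept as text
    timestamp: str    # time stamp, kept as text
    latitude: float   # latitude of the logging device
    longitude: float  # longitude of the logging device
    bssid: str        # BSSID of the scanned AP
    rssi: int         # received signal strength (Android reports dBm)
    ssid: str         # network name of the AP


def parse_line(line: str) -> LogRecord:
    """Parse one log line; the comma-separated layout is an assumption."""
    # Split at most 7 times so that commas inside the SSID are preserved.
    parts = line.strip().split(",", 7)
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    unix_time, date, ts, lat, lon, bssid, rssi, ssid = parts
    return LogRecord(int(unix_time), date, ts, float(lat), float(lon),
                     bssid.lower(), int(rssi), ssid)
```

Lower-casing the BSSID is a design choice made here so that the same AP always compares equal in later steps.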
Fig: The log file generated by the logging application
Work Description
Some Methodologies (Trials)
From the collected data, the basic idea was to choose parameters that could be
fed into known classification methods for cluster formation, such as
K-Means, K-Nearest Neighbor, Hierarchical Clustering and others. The trials were:
• Using latitude-longitude location and the corresponding RSSI values.
• Histogram formation from mapped BSSIDs.
• Using K-Means for clustering known distance parameters.
We carried the K-Means trial forward despite random errors in position,
because it forms clusters on the basis of intra-cluster closeness and
inter-cluster separation, which small errors hardly affect.
Final Model and Algorithm Development
The final principle is based on developing a distance function that can be used in
one of the clustering methods.
• The idea is to develop three codes, ordered by their precision levels and named
A.Code (Area Code), V.Code (Visibility Code) and H.Code (Power Code),
such that each individual group or cluster of BSSIDs has a unique code
of tags.
• A.Code is found by using K-Means with lat-long pairs as the distance function, as
described in the last trial step. All points possessing the same A.Code are therefore
at least close to a particular lat-long pair.
• V.Code is a classification on the basis of visibility: clusters of BSSIDs and their
time stamps that share the same visibility code were scanned within a small interval
of time, or are highly visible in two or more time stamps, and are therefore
closer to each other.
• Following V.Code, the algorithm clusters the already found clusters again on the
basis of RSSI values; this is the H.Code of the algorithm.
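As an illustration of the A.Code step, here is a minimal K-Means sketch over lat-long pairs. It is not the author's implementation: the plain Euclidean distance on raw coordinates, the random initialization and the function name `kmeans` are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means on (lat, lon) pairs; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k random points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


# Hypothetical scan positions: three near one spot, two near another.
points = [(19.1000, 72.9000), (19.1001, 72.9001), (19.1002, 72.9000),
          (28.6000, 77.2000), (28.6001, 77.2001)]
a_codes = kmeans(points, k=2)
```

Points that receive the same label would share an A.Code. Over small areas the Euclidean distance on raw coordinates is a reasonable stand-in for ground distance; over larger areas a geodesic distance would be more appropriate.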
Steps to find V.Code (clustering on the basis of visibility)
• Segregating the BSSIDs at each time stamp
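A minimal sketch of this segregation step, assuming each log entry has already been reduced to a `(timestamp, bssid)` pair; the function name `bssids_by_timestamp` is hypothetical.

```python
from collections import defaultdict


def bssids_by_timestamp(records):
    """Map each time stamp to the set of unique BSSIDs scanned at it."""
    seen = defaultdict(set)
    for timestamp, bssid in records:
        seen[timestamp].add(bssid)
    return dict(seen)


# Hypothetical entries: the same AP scanned twice at t1 is counted once.
records = [("t1", "aa:01"), ("t1", "aa:02"), ("t1", "aa:01"), ("t2", "aa:02")]
groups = bssids_by_timestamp(records)
```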
• Finding an affinity matrix based on a visibility closeness formula:
$$\text{visibility closeness} = \frac{\text{number of common unique BSSIDs between a pair of points}}{\text{total scanned BSSIDs at each point}}$$
• Once the affinity matrix is formed, it provides a distance function that can be used
in one of the clustering methods.
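The visibility closeness formula can be sketched as follows. The slides do not say exactly how the denominator is counted, so this example assumes it is the number of distinct BSSIDs seen at either point of the pair (which makes the value a Jaccard index); the function names are hypothetical.

```python
def visibility_affinity(a, b):
    """Closeness of two scans, each given as a set of BSSIDs."""
    common = len(a & b)  # BSSIDs visible at both points
    total = len(a | b)   # assumption: distinct BSSIDs seen at either point
    return common / total if total else 0.0


def affinity_matrix(scans):
    """scans: list of BSSID sets, one per time stamp."""
    return [[visibility_affinity(a, b) for b in scans] for a in scans]


# Hypothetical scans at three time stamps.
scans = [{"a", "b", "c"}, {"b", "c", "d"}, {"x"}]
matrix = affinity_matrix(scans)
```

With this choice the matrix is symmetric, has 1.0 on the diagonal for non-empty scans, and is 0.0 for scans that share no AP.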
The best-suited method for this data is Hierarchical Clustering, which uses the
matrix to find Euclidean distances and applies the average (average-linkage) method
to cluster the timestamps.
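A minimal sketch of this clustering step, assuming that each row of the affinity matrix is treated as a feature vector, that rows are compared by Euclidean distance, and that "average value method" means average linkage. The function name `average_linkage` and the sample matrix are hypothetical.

```python
import math


def average_linkage(rows, n_clusters):
    """Agglomerative clustering of matrix rows with Euclidean distance
    and average linkage; returns one cluster label per row."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(math.dist(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(1, n_clusters):
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 4x4 affinity matrix: time stamps 0,1 and 2,3 see similar APs.
rows = [[1.0, 0.8, 0.0, 0.1],
        [0.8, 1.0, 0.1, 0.0],
        [0.0, 0.1, 1.0, 0.9],
        [0.1, 0.0, 0.9, 1.0]]
v_codes = average_linkage(rows, n_clusters=2)
```

For larger data sets, SciPy's `scipy.cluster.hierarchy.linkage(rows, method="average", metric="euclidean")` computes the same kind of merge tree far more efficiently and can be drawn with `scipy.cluster.hierarchy.dendrogram`.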
Fig: Result of phase one, a sample affinity matrix for V.Code
Cluster Analysis with the help of Dendrogram
From the cluster analysis or dendrogram visualization, an optimal number of
clusters can be chosen either as the number of clusters required or on the basis
of an optimal-value function.
Fig: Horizontal basic clustering using V.Code
Fig: Vertical clustering or 3-D clustering up to V.Code
Further Analysis for H.Code
(Clustering on the basis of signal strength)
• For H.Code formulation, one cluster is considered at a time, and a formula is
evaluated for each pair of timestamps in it that share some number of BSSIDs.
• The threshold value is given by:
$$\text{threshold} = \frac{\text{sum of all differences}}{\text{total number of common IDs}}$$
• This gives the number of IDs within the defined range, which is used in the following
steps of the algorithm.
• For affinity matrix formation on the basis of RSSI values, the function is defined as
follows:
$$\text{affinity} = \frac{\text{number of IDs in range}}{\text{total common BSSIDs}}$$
• The values for all pairs form a single affinity matrix per cluster, so there will
be as many as K matrices, where K is the number of clusters, either given by the user
or found optimally.
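One possible reading of the H.Code formulas is sketched below. The slides do not spell out what the "differences" are or what "in range" means, so this example assumes the differences are absolute RSSI differences over the BSSIDs common to a pair of time stamps, and that an ID is in range when its difference does not exceed the threshold. The function name `hcode_affinity` is hypothetical.

```python
def hcode_affinity(scan_a, scan_b):
    """scan_a, scan_b: dicts mapping BSSID -> RSSI for two time stamps."""
    common = scan_a.keys() & scan_b.keys()
    if not common:
        return 0.0
    # Assumption: "differences" are absolute RSSI differences per common AP.
    diffs = [abs(scan_a[b] - scan_b[b]) for b in common]
    # Threshold = sum of all differences / total number of common IDs.
    threshold = sum(diffs) / len(common)
    # Assumption: an ID is "in range" if its difference is within the threshold.
    in_range = sum(1 for d in diffs if d <= threshold)
    # Affinity = number of IDs in range / total common BSSIDs.
    return in_range / len(common)


# Hypothetical RSSI readings at two time stamps of the same V.Code cluster.
t1 = {"a": -50, "b": -60, "c": -70}
t2 = {"a": -52, "b": -61, "c": -90, "d": -40}
affinity = hcode_affinity(t1, t2)
```

Evaluating this for every pair of time stamps inside one V.Code cluster would give that cluster's RSSI affinity matrix, which can then be clustered in the same way as the visibility matrix.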
Fig: A particular cluster, further clustered on the basis of H.Code
Clusters appearing within the V.Code clusters imply validation of H.Code, as shown above
Fig: The overall algorithm workflow
Conclusion
• The developed algorithm therefore proposes a more reliable method, which has an
edge over existing geolocation methods and opens up indoor location services
based simply on Access Point availability.
• Although the proposed algorithm appears to be the best option for indoor and
3-dimensional geolocation for now, certain factors affect its accuracy,
especially in noisy environments, and these would need special attention
as outliers.
Thanks for
Attending
