On the Impact of Data Collection on the Quality of Signal Strength Signatures in
                           Wi-Fi Indoor Localization
                     John Nicholson and Vladimir Kulyukin
               Computer Science Assistive Technology Laboratory
                        Department of Computer Science
                              Utah State University
Logan, UT 84322-4205
ABSTRACT

Wi-Fi signals can be used to localize navigators at topological landmarks in indoor and
outdoor environments. A major issue with Wi-Fi topological localization is calibration.
This paper describes the impact of data collection on the quality of signal strength
signatures.

KEYWORDS

Visual impairment; blindness; assisted navigation; indoor localization; Wi-Fi; 802.11.

BACKGROUND

Using Wi-Fi 802.11 signals for localization is growing in popularity [1,2,3] due to their
wide deployment and affordability. Some projects, such as PlaceLab [1], use Wi-Fi to
replace or supplement GPS in outdoor environments. The objective is to minimize
calibration time by pairing single GPS readings with the Wi-Fi signal strengths available
at known locations. However, the reported localization accuracy ranges from 13 to 40
meters, which may be too inaccurate for the indoor wayfinding needs of the visually
impaired.

Another method for Wi-Fi indoor localization is to create a topological map of the
environment and develop signal signatures of selected landmarks through data collection
and pre-processing. The quality of signal signatures is critical, because Wi-Fi signals vary
over time and are susceptible to interference that comes from other wireless devices in
the same frequency range, solid objects, human bodies, and multi-path issues [2].

HYPOTHESIS

We hypothesize that the quality of signal signatures depends on the time of day when
data collection occurs.

METHOD

--------------------------------
Insert Figures 1 and 2 here
--------------------------------
Data collection was done with a wearable multi-sensor wayfinding test bed called the
Wayfinder (see Figure 1). The device uses a wireless card to collect signal strengths
from five wireless access points placed at different locations in the USU Computer
Science Department (see Figure 2). Data were collected in two ways: statically and
dynamically.
---------------------------
Insert Figure 3 here
---------------------------
Static data were collected over a period of a month and a half. Data collection for
locations 1 through 5 was completed first, and then data for the remaining locations were
collected. Data were collected once per day for each location in a group (1-5 and 6-12),
with the locations in a group sampled one immediately after another. Each location had
at least two collection positions. A collection position (see Figure 3) is where the data
collector stood while gathering signal strength readings. As in other systems [2, 3], the
user's orientation was taken into consideration: at each collection position, data were
collected for 2 minutes with the collector facing each direction of the hall. For example,
if a collection position was in a hall running north/south, data were collected for 2
minutes facing north and then 2 minutes facing south. Directionality was taken into
consideration because of the effect of the human body on the signal strength. Data
collection was performed on 10 different days, so 20 minutes of data were collected at
each collection position for each direction of the position's hall, for a total of 40 minutes
of data per collection position. A minimal sketch of this collection loop appears below.
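
To make the protocol concrete, the following minimal Python sketch illustrates the
static collection loop. The scan_access_points() helper and the 1-second sampling
interval are assumptions for illustration; actual RSSI scanning is platform-specific,
and the paper does not state a sampling rate.

    import time

    def scan_access_points():
        """Hypothetical helper: return {ap_id: rssi_dbm} for all visible APs.
        Real scanning is platform-specific (e.g., parsing `iwlist scan` output
        on Linux)."""
        raise NotImplementedError("platform-specific Wi-Fi scan goes here")

    def collect_static(location, position, direction, duration_s=120):
        """Log labeled RSSI samples for one collection position and one facing
        direction (2 minutes per direction, as described above)."""
        samples = []
        end = time.time() + duration_s
        while time.time() < end:
            reading = scan_access_points()
            # Each sample carries the landmark label it will later train on.
            samples.append((location, position, direction, time.time(), reading))
            time.sleep(1.0)  # assumed sampling interval
        return samples

    # Example: a position in a north/south hall gets 2 minutes per direction.
    # data = collect_static(location=3, position="A", direction="north")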

Dynamic data were collected by walking a series of four routes around the CS
Department. Each route was walked 15 times in each direction, yielding eight directed
routes in total. Figure 2 shows the path of one route. To record when the user was at the
locations, masking tape was placed on the floor 0.5 meters before and after each
collection position. During a walk, the user pressed a key on the system whenever they
passed over a piece of tape. All walks for all routes were completed on the same day in a
single data collection session. Note that although the static data were collected over
multiple days, all dynamic data were collected on a single day.
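
The marker-logging step can be sketched in the same way. The helper below simply
timestamps each keypress so that the RSSI stream can later be segmented into
on-location and off-location stretches; the function name and the use of the Enter key
are assumptions.

    import time

    def log_tape_markers(num_expected):
        """Record a timestamp each time the walker crosses a piece of tape
        (0.5 m before and after each collection position) and presses a key."""
        marks = []
        for _ in range(num_expected):
            input("Press Enter when crossing tape: ")
            marks.append(time.time())
        return marks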

Naive Bayes and C4.5 were used to pre-process the collected data; signal signatures
were thus the numerical classes created by these two algorithms. To measure the impact
of data collection on the quality of signal signatures, three types of validation were
attempted at run time: 1) static on static, 2) static on dynamic, and 3) dynamic on
dynamic. For the static-on-static validation, signatures were created from each day of
static data and validated against the other static datasets. For the static-on-dynamic
validation, signatures were created from each day of static data and validated against the
dynamic data. Finally, for the dynamic-on-dynamic test, signatures were created from
the dynamic data for each route and validated against the dynamic data from all the
routes. If the training and validation datasets were the same, e.g. the same day of static
data, the dataset was split so that 66% of the data were used for training and 33% for
validation. Otherwise, 100% of the training dataset and 100% of the validation dataset
were used.
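
This validation scheme can be sketched as follows, under stated assumptions:
scikit-learn's GaussianNB stands in for Naive Bayes, and DecisionTreeClassifier (a
CART variant) approximates C4.5, since the paper does not name an implementation.
The feature matrices are assumed to hold one RSSI vector (one value per access point)
per sample, with the labels holding location classes.

    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split

    def validate(train_X, train_y, valid_X=None, valid_y=None):
        """Same-dataset case: 66/33 split of one dataset; otherwise train on
        100% of the training set and score on 100% of the validation set."""
        if valid_X is None:  # training and validation datasets are the same
            train_X, valid_X, train_y, valid_y = train_test_split(
                train_X, train_y, train_size=0.66, random_state=0)
        results = {}
        for name, clf in (("Bayes", GaussianNB()),
                          ("C4.5-like", DecisionTreeClassifier())):
            clf.fit(train_X, train_y)
            results[name] = clf.score(valid_X, valid_y)  # fraction correct
        return results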

RESULTS
---------------------------
Insert Tables 1, 2, 3, and 4 here
---------------------------
The static-on-static and the dynamic-on-dynamic tests with identical training and
validation datasets tend to give accurate results for both classifiers, Bayes and C4.5:
94% accuracy or higher. However, this performance does not carry over to training and
validation on different datasets: when signal signatures are created from a dataset
collected on a day different from that of the validation dataset, localization accuracy
varies. Although the dynamic data were collected on only one day, it is reasonable to
conjecture that a walk over the same route on a different day would exhibit the same
problems as static data across different days, but this remains to be verified. The quality
of signal signatures thus appears to depend on the time of day when data collection
occurs. How the quality of signal signatures depends on the amount of collected data
remains to be investigated.
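
For example, the per-training-day summaries in Table 2 could be derived from the
validate() sketch above by scoring each training day against every other day and taking
the mean, maximum, and minimum (shown here for the Bayes column only; the
datasets dictionary is an assumed structure):

    def summarize_cross_day(datasets):
        """datasets: {day: (X, y)}. Returns {train_day: (avg, max, min)} of
        Bayes accuracy over all other days, as in Table 2."""
        summary = {}
        for train_day, (Xt, yt) in datasets.items():
            scores = [validate(Xt, yt, Xv, yv)["Bayes"]
                      for day, (Xv, yv) in datasets.items() if day != train_day]
            summary[train_day] = (sum(scores) / len(scores),
                                  max(scores), min(scores))
        return summary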

REFERENCES

1. Cheng, Y.-C., Chawathe, Y., LaMarca, A., and Krumm, J. (2005). Accuracy
characterization for metropolitan-scale Wi-Fi localization. In Proceedings of the 3rd
International Conference on Mobile Systems, Applications, and Services (MobiSys '05),
Seattle, Washington.
2. Ladd, A., Bekris, K., Rudys, A., Wallach, D., and Kavraki, L. (2004). On the
feasibility of using wireless Ethernet for indoor localization. IEEE Transactions on
Robotics and Automation, 20(3).
3. Seshadri, V., Zaruba, G. V., and Huber, M. (2005). A Bayesian sampling approach to
in-door localization of wireless devices using received signal strength indication. In
Proceedings of the Third IEEE International Conference on Pervasive Computing and
Communications (PerCom 2005), Kauai Island, Hawaii.

ACKNOWLEDGMENTS

The study was funded by two Community University Research Initiative (CURI) grants
from the State of Utah (2003-04 and 2004-05) and NSF Grant IIS-0346880. The authors
would like to thank Mr. Sachin Pavithran, a visually impaired training and development
specialist at the USU Center for Persons with Disabilities, for his feedback on the
localization experiments.

Author Contact Information:

Vladimir Kulyukin, Ph.D., Assistive Technology Laboratory, Department of Computer
Science, Utah State University, 4205 Old Main Hill, Logan, UT 84322-4205, Office
Phone (435) 797-8163. EMAIL: vladimir.kulyukin@usu.edu.
---------------------------
Figure 1: Wayfinder System
---------------------------




Alternative Text Description for Figure 1.
Figure 1 shows a photograph of the current Wayfinder system prototype. The system is
mounted on a vest and does not require the user to carry anything in their hands. It has
a GPS unit on one shoulder and a compass on the other shoulder. The computation unit
sits on the front of the user's chest. A numeric keypad in front allows the user to
respond to system prompts. The system produces speech output through an attached
headphone.




---------------------------
Figure 2: Map of the USU Computer Science Department with an example route shown.
Black circles represent access points. Circled numbers represent locations. The route
covers locations 1 to 5.
---------------------------
Alternative Text Description for Figure 2.
Figure 2 shows a map of the Utah State University Computer Science Department. It
shows the locations of five access points in the department, as well as the locations
used for localization. The twelve locations are the intersections of halls where a person
can turn. The figure also shows an example route that takes a path from location 1 to
location 5.




------------------------------------------------
Figure 3: Collection positions at a corner location. Black dots represent collection
positions. Collection positions were 1.5 meters from the actual location. Narrow halls
had one collection position; wide halls had two.
------------------------------------------------




Alternative Text Description for Figure 3.
Figure 3 shows an example of collection positions for a corner where a narrow hall and a
wide hall intersect. There is one collection position in the middle of the narrow hall 1.5
meters from the corner. There are two collection positions in the wide hall which divide
the width of the hall into thirds. They are also 1.5 meters from the corner.
---------------------------
Table 1: Static-on-static results when the training and validation datasets are from the
same day. The Bayes and C4.5 columns give the fraction of validation samples
correctly classified by each algorithm.
---------------------------
Dataset Day                          Bayes                       C4.5
2005-01-03                           0.982892                    0.999602
2005-01-04                           0.979402                    0.999830
2005-01-05                           0.970853                    0.999678
2005-01-11                           0.988162                    0.999830
2005-01-12                           0.968064                    0.999016
2005-01-13                           0.992520                    0.999792
2005-01-20                           0.954853                    0.998335
2005-01-26                           0.943970                    0.998845
2005-02-01                           0.974570                    0.999659
2005-02-02                           0.957489                    0.997861
---------------------------
Table 2: Static-on-static results when the training and validation datasets are from
different days. The data from the training day were used to classify all the other days.
The Average columns give the mean accuracy over the validation days; the Max and
Min columns give the highest and lowest fraction correct achieved on any validation
day.
---------------------------
Training Day   Bayes Average   Bayes Max   Bayes Min   C4.5 Average   C4.5 Max   C4.5 Min
2005-01-03 0.963299 0.985881 0.938110 0.947295 0.985117 0.900284
2005-01-04 0.936516 0.979617 0.873530 0.928698 0.979009 0.856595
2005-01-05 0.946472 0.986248 0.871559 0.920312 0.957741 0.869446
2005-01-11 0.949799 0.978475 0.930835 0.952160 0.976834 0.925226
2005-01-12 0.940417 0.986158 0.854863 0.929767 0.991057 0.859771
2005-01-13 0.937436 0.973018 0.856570 0.920374 0.972020 0.845039
2005-01-20 0.912735 0.978265 0.827666 0.887077 0.956317 0.753148
2005-01-26 0.936857 0.979829 0.843126 0.943860 0.975973 0.915822
2005-02-01 0.935468 0.989429 0.865968 0.931121 0.976527 0.867669
2005-02-02 0.934006 0.989783 0.846740 0.930395 0.981845 0.836240




---------------------------
Table 3: Static-on-dynamic results. The Route column is the route used for validation;
the route "both" means data from both routes 1 and 5 were used. The Bayes and C4.5
columns give the fraction of validation samples correctly classified by each algorithm.
---------------------------
Training Day               Route   Bayes      C4.5
2005-01-03                 1       0.916771   0.872834
2005-01-03                 5       0.889491   0.867207
2005-01-03                 both    0.903284   0.870052
2005-01-04                 1       0.891945   0.885039
2005-01-04                 5       0.860265   0.866719
2005-01-04                 both    0.876283   0.875982
2005-01-05                 1       0.885456   0.789962
2005-01-05                 5       0.894362   0.878775
2005-01-05                 both    0.889859   0.833870
2005-01-11                 1       0.840686   0.896172
2005-01-11                 5       0.868668   0.865136
2005-01-11                 both    0.854520   0.880828
2005-01-12                 1       0.888492   0.879324
2005-01-12                 5       0.876279   0.860753
2005-01-12                 both    0.882454   0.870142
2005-01-13                 1       0.804310   0.785378
2005-01-13                 5       0.904165   0.880723
2005-01-13                 both    0.853677   0.832516
2005-01-20                 1       0.790439   0.865750
2005-01-20                 5       0.874452   0.856612
2005-01-20                 both    0.831974   0.861232
2005-01-26                 1       0.879086   0.908912
2005-01-26                 5       0.890283   0.877679
2005-01-26                 both    0.884621   0.893471
2005-02-01                 1       0.799131   0.879145
2005-02-01                 5       0.908427   0.879688
2005-02-01                 both    0.853165   0.879414
2005-02-02                 1       0.826874   0.891647
2005-02-02                 5       0.891805   0.892840
2005-02-02                 both    0.858975   0.892237
---------------------------
Table 4: Dynamic-on-dynamic results. The route "both" means data from both routes 1
and 5 were used. The Bayes and C4.5 columns give the fraction of validation samples
correctly classified by each algorithm.
---------------------------
Training Route             Validation Route   Bayes             C4.5
1                          1                  0.944843          0.996148
1                          5                  0.603385          0.713833
1                          both               0.776406          0.856777
5                          1                  0.554921          0.642496
5                          5                  0.982274          0.999463
5                          both               0.767105          0.818549
both                       1                  0.882062          0.991129
both                       5                  0.899476          0.991963
both                       both               0.890315          0.991856
