Expanding typologies of tourists’ spatio-temporal activities using the sequence alignment method
1. ENTER 2016 Research Track Slide Number 1
Expanding Typologies of Tourists’
Spatio-temporal Activities
Using the Sequence Alignment Method
Junya Kawase and Fumiko Ito
Department of Urban System Science
Tokyo Metropolitan University, Japan
j.kawase0922@gmail.com
Slide 2
Introduction
• Activities have TWO continuous aspects
– Space
– Time
• GPS devices are useful for activity surveys
– When and where the subjects passed
– Where the subjects stayed
Slide 3
Introduction
• We often group subjects by their attribute data:
– Age
– Gender
– Accompanying person
• Is such grouping always an appropriate method?
Slide 4
Introduction
• We have to divide the subjects
according to their REAL activities
• What is the important INDEX of tourists’ activities?
– Combination of sites tourists visited
– Order of visits to the sites
– Time they spend at each site
Slide 5
Challenge
• A quantitative method is required to clarify and compare the characteristics of tourists’ activities.
Slide 6
Sequence Alignment Method
• SAM is:
– the basic tool of bioinformatics
– a method of comparing sequences of characters and measuring their similarity and difference
Slide 7
Levenshtein Distance
• Counts the minimal number of edit operations required to change one sequence of characters into the other
[Figure: >AAABBCCC and >AAABBBCC differ by 1 edit operation (one C replaced by a B); >AAABBCCC and >AAABCCC differ by 1 edit operation (one B deleted)]
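As an illustration, the Levenshtein distance can be computed with the standard dynamic-programming recurrence. This is a minimal Python sketch that assumes unit costs for insertion, deletion and substitution; the notes point out that there is no consensus on how to set alignment parameters, so these costs are only an assumption. The function accepts strings or lists of tokens.

```python
def levenshtein(a, b):
    """Minimal number of single-item insertions, deletions and substitutions
    (each with cost 1) needed to turn sequence a into sequence b.

    Works on strings or on lists of tokens such as two-letter zone codes.
    """
    # prev[j] = distance between the items of a processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr[j] = min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x by y (free if equal)
            )
        prev = curr
    return prev[-1]


# The sequences on the slide, without the leading ">" marker
print(levenshtein("AAABBCCC", "AAABBBCC"))  # 1 (one C substituted by a B)
print(levenshtein("AAABBCCC", "AAABCCC"))   # 1 (one B deleted)
```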
Slide 8
Making Sequences of Characters
[Figure: study area divided into zones A to E; tourists’ GPS trajectories across the zones are converted into sequences of characters]
>AAABBCCCDDEE
>AAAACCCDDDE
>AAAAAAADDDEE
Analysis by Using SAM
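The conversion step can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: it assumes each zone is a simple polygon in planar coordinates (metres) and that the GPS fixes have already been sampled at the chosen interval, and it uses a standard ray-casting point-in-polygon test. The zone layout and the track are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` inside the simple polygon given by its vertices?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that cross the horizontal line through the point matter
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right toggles the state
    return inside


def trajectory_to_sequence(track, zones, outside="?"):
    """Replace each sampled (x, y) position with the character of the zone containing it."""
    chars = []
    for point in track:
        code = next((c for c, poly in zones.items() if point_in_polygon(point, poly)), outside)
        chars.append(code)
    return "".join(chars)


# Hypothetical study area: three 10 m x 10 m zones side by side
ZONES = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
    "C": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
# Hypothetical positions, one per sampling interval
TRACK = [(1, 5), (2, 5), (3, 5), (12, 5), (15, 5), (25, 5), (26, 5)]
print(trajectory_to_sequence(TRACK, ZONES))  # AAABBCC
```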
Slide 9
Previous studies
• Wilson (1998):
illustrated SAM use in the analysis of daily
activity patterns derived from time-use
diaries.
Slide 10
Previous studies
• Shoval and Isaacson (2007):
conducted GPS-tracking activity analysis of
tourists visiting the Old City of Akko (Israel)
and obtained a taxonomic guide tree from
which they derived clusters of typical
patterns by applying SAM.
• Shoval et al. (2015) conducted typologies of
tourists visiting Hong Kong.
Slide 11
Preliminary Analysis
• Techniques of SAM for typologies of spatio-temporal activities have not been sufficiently confirmed
• As a first step,
to follow previous studies
Slide 12
Study Area
• Ueno Zoo (Tokyo) has:
– 14.2 ha site area
– 2 entrance gates
– 3 exit gates
– East Garden
– West Garden
– bridge
– small monorail line
– footfall of 3.6 million / year
[Figure: map of Ueno Zoo showing the East Garden and the West Garden]
Slide 13
Data Collection by GPS Loggers
• To distribute
GPS loggers
at the Main Gate
• To collect loggers
at the 3 exit gates
• We obtained
113 valid sets in one day
[Figure: map marking the Main Gate]
Slide 14
Zoning the Zoo Site
[Figure: map of the zoo site divided into 30 zones, labelled Ea–En, Es, Wa–Wk, Ws, Wv, Ba and Ma]
• To divide the zoo
site into 30 zones
• Each of the zones
is assigned
a two-letter code
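Because each position in a sequence is a two-letter code, comparing concatenated strings character by character would count a single inserted or deleted zone as two edits (for example, "EaEbEc" versus "EaEc" is 2 edits at the character level but only 1 at the zone level). A hypothetical pre-processing step, sketched below, maps each zone code to one symbol so that a character-based alignment tool sees one symbol per minute. Whether the software used in this study needs such a step is not stated in the source; the symbol set and ordering are assumptions.

```python
import string

# The 30 zone codes that appear on the zone map of the zoo site
ZONE_CODES = [
    "Ea", "Eb", "Ec", "Ed", "Ee", "Ef", "Eg", "Eh", "Ei", "Ej", "Ek", "El", "Em", "En", "Es",
    "Wa", "Wb", "Wc", "Wd", "We", "Wf", "Wg", "Wh", "Wi", "Wj", "Wk", "Ws", "Wv",
    "Ba", "Ma",
]


def build_alphabet(codes):
    """Assign one distinct single-character symbol to each two-letter zone code."""
    symbols = string.ascii_uppercase + string.ascii_lowercase + string.digits
    unique = sorted(set(codes))
    if len(unique) > len(symbols):
        raise ValueError("more zones than available symbols")
    return {code: symbols[i] for i, code in enumerate(unique)}


def encode(zone_sequence, alphabet):
    """Turn a list of zone codes (one per minute) into a one-character-per-minute string."""
    return "".join(alphabet[code] for code in zone_sequence)


ALPHABET = build_alphabet(ZONE_CODES)
print(encode(["Ea", "Eb", "Ec"], ALPHABET))  # BCD ("Ba" sorts first and becomes "A")
```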
Slide 15
Zoning the Zoo Site
• To convert
the subjects’ locations
into codes once every minute
[Figure: the same zone map of the zoo site]
Slide 16
Application
• ClustalTXY (Wilson, 2008)
Slide 18
Limitation of Preliminary Analysis
Group (i): East Garden → West Garden (46 subjects)
Group (ii): East Garden → West Garden → East Garden (54 subjects)
Group (iii): other routes (13 subjects)
Slide 20
Main Analysis
• To conduct typologies by SAM for:
– Group (i)
– Group (ii)
• Group (iii) excluded from this analysis
Slide 22
Time-Space Path Map
• Time-space path maps represent subjects’
movement by lines that rise in height
by one meter for every elapsed minute.
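The height rule amounts to z = (t - t0) × 1 m per minute. A minimal Python sketch, assuming the fixes are already projected to planar coordinates in metres and timestamped in minutes; the resulting vertices could then be drawn as a 3-D polyline in a GIS or plotting tool. The fixes are hypothetical.

```python
def time_space_path(fixes, metres_per_minute=1.0):
    """Turn (minute, x, y) fixes into 3-D vertices (x, y, z) for a time-space path.

    z grows by `metres_per_minute` for every minute elapsed since the first fix,
    so with the default of 1.0 the line rises one metre per minute.
    """
    if not fixes:
        return []
    t0 = fixes[0][0]
    return [(x, y, (t - t0) * metres_per_minute) for t, x, y in fixes]


# Hypothetical per-minute fixes: (minute, x in metres, y in metres)
path = time_space_path([(0, 10.0, 20.0), (1, 12.0, 20.0), (5, 15.0, 22.0)])
print(path)  # [(10.0, 20.0, 0.0), (12.0, 20.0, 1.0), (15.0, 22.0, 5.0)]
```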
Slide 23
Kernel Density Map
• Kernel density estimation maps represent
the hot spots
in the subjects’ GPS logs.
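The source does not state the kernel or bandwidth used. As an assumption, the sketch below uses a Gaussian kernel with a fixed bandwidth h, evaluated at chosen locations; places where many fixes accumulate, such as spots where visitors halt, get high values, which is what the hot spots on such maps show. The fixes and grid are hypothetical.

```python
import math


def kernel_density(points, locations, bandwidth):
    """Gaussian kernel density estimate of 2-D points, evaluated at each location.

    f(s) = 1 / (n * 2 * pi * h^2) * sum_i exp(-|s - p_i|^2 / (2 * h^2))
    """
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)
    densities = []
    for sx, sy in locations:
        total = 0.0
        for px, py in points:
            d2 = (sx - px) ** 2 + (sy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Hypothetical GPS fixes (metres): ten at a resting spot, one elsewhere
fixes = [(0.0, 0.0)] * 10 + [(100.0, 100.0)]
grid = [(0.0, 0.0), (100.0, 100.0), (50.0, 50.0)]
print(kernel_density(fixes, grid, bandwidth=10.0))  # highest value at (0, 0)
```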
Slide 24
Group (i), Cluster 1 to 4
• Clusters 1, 2, 3 & 4 have the same characteristics
– Cluster 1 is the most typical type
– Clusters 2, 3 & 4 are derivatives of Cluster 1
A Typical Route of Cluster 1
Slide 25
Group (i), Cluster 6
• Cluster 6 stayed
for tens of minutes
at Shinobazu Pond
Terrace in West Garden
Slide 26
Group (i), Cluster 7
• Cluster 7 also went around the north side of
East Garden, but in the opposite direction
to the other clusters.
A Typical Route of Cluster 7
Slide 27
Main Analysis
[Figure: tree obtained by SAM for Group (ii); leaves are labelled with subject IDs and attribute codes, and Clusters 8 to 12 are marked]
Slide 28
Group (ii), Cluster 8 to 10
• Clusters 8, 9 and 10 have:
– the same typical routes as Clusters 1, 2, 3 & 4
– different routes in West Garden
[Figure: time-space path maps of Cluster 1 and Cluster 8]
Slide 29
Group (ii), Cluster 11 & 12
• Cluster 11 stayed
for tens of minutes
at the East Garden Cafeteria
• Cluster 12
contains many routes
Slide 30
Summary of Results
• The preliminary analysis:
– gives overall pictures of visitors
– lacks some spatio-temporal aspects
• The main analysis represents:
– some typical patterns in their activities
– the detailed characteristics of the patterns
Slide 31
Conclusions
• Bias in sampling tourists’ tracking data has a very important influence on the results of typologies using SAM
• Samples affect each other as noise
Slide 32
Future Works
• A quantitative method is required to evaluate the clusters resulting from SAM
– To adopt validity measures
– Useful for checking the bias of sampling and grouping
• Relationship between clusters and attribute data of subjects
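One common validity measure that works directly on SAM-style pairwise distances is the silhouette width. The sketch below illustrates that measure only; it is not something the authors report using, and the distance matrix is hypothetical (in practice it would come from the pairwise alignment scores).

```python
def silhouette(dist, labels):
    """Mean silhouette width of a clustering, given a precomputed distance matrix.

    Values near 1 mean tight, well-separated clusters; values below 0 suggest
    that many items would fit better in another cluster.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(labels)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance to the item's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


# Hypothetical pairwise edit distances between four subjects' sequences
D = [[0, 1, 6, 7],
     [1, 0, 6, 6],
     [6, 6, 0, 1],
     [7, 6, 1, 0]]
print(silhouette(D, [0, 0, 1, 1]))  # about 0.84: two well-separated clusters
print(silhouette(D, [0, 1, 0, 1]))  # negative: a poor partition
```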
Editor's Notes
In this presentation, I will talk about “HOW TO GET TYPICAL PATTERNS of tourists’ spatio-temporal activities”.
Such a typical pattern is called a TYPOLOGY.
We use a method based on the Sequence Alignment Method, which is called SAM.
I will introduce some previous studies using SAM and our new attempts.
First, I will explain why we need to understand the spatio-temporal activities of tourists.
People’s activities have two continuous aspects: SPACE and TIME.
Today, we have a variety of GPS devices.
GPS devices are very useful for activity surveys, because they can measure the accurate, continuous, three-dimensional positions of human subjects anywhere in the world.
So, by using GPS logs, we can illustrate where the subjects passed and tell where they stayed.
When conducting such an activity survey, we often group the subjects by their attribute data, such as age, gender, accompanying persons and other attributes.
However, is such grouping always an appropriate method?
The answer would be NO.
Sometimes, we have to divide the subjects according to their real activities.
However, we then face a problem.
What is the important index of their real activities for grouping?
The combination of sites tourists visited? The order of visits to the sites? The time they spend at each site?
We don’t have the answer yet, because tourists are very arbitrary in choosing the sites they visit.
So, a quantitative method is required to clarify and compare the characteristics of tourists’ activities.
To solve this difficulty, some researchers have applied the sequence alignment method to the typologies of tourists’ activities.
The sequence alignment method is the basic tool of bioinformatics.
In other words, SAM is a method of comparing sequences of characters and measuring their similarity and difference.
Sequence similarity and difference are measured using the concept of Levenshtein distance.
Levenshtein distance is defined as the minimal number of edit operations (insertions, deletions and substitutions of single characters) required to change one sequence of characters into the other.
To analyze spatio-temporal activities, we first have to convert tourists’ activities into sequences of characters.
Let me explain the conversion procedure.
First, we divide the study area into polygons and assign a character to each polygon.
Next, for example, if we get a tourist’s trajectory from GPS data like this, we can convert the trajectory into a sequence of characters like this.
In this way, the subjects’ trajectories are converted into sequences.
By analyzing these sequences with SAM, we obtain a tree, as in cluster analysis.
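The step from sequences to a tree can be sketched as follows. Alignment packages build their trees with their own algorithms; this minimal Python sketch instead uses plain average-linkage agglomerative clustering on pairwise Levenshtein distances (an assumption chosen for brevity), merging the closest clusters until k remain. The sequences are hypothetical.

```python
from itertools import combinations


def levenshtein(a, b):
    """Unit-cost edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y))
        prev = curr
    return prev[-1]


def average_linkage(seqs, k):
    """Agglomerative clustering with average linkage on pairwise edit distances.

    Repeatedly merges the two clusters with the smallest mean pairwise distance
    until k clusters remain. Returns the clusters (lists of sequence indices) and
    the merge history, i.e. the tree read from the leaves upward.
    """
    dist = {(i, j): levenshtein(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def mean_distance(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(seqs))]
    merges = []
    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: mean_distance(clusters[pq[0]], clusters[pq[1]]))
        merges.append((clusters[p], clusters[q]))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return clusters, merges


# Hypothetical sequences of two route types
seqs = ["AAABBB", "AAABBBB", "CCCDDD", "CCDDDD"]
groups, history = average_linkage(seqs, k=2)
print(groups)  # [[0, 1], [2, 3]]
```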
Let me introduce some previous studies.
We have shown you some previous studies.
However, the techniques of SAM for typologies of spatio-temporal activities have not been sufficiently confirmed.
Some aspects still limit the potential of SAM for the analysis of spatio-temporal activities.
Thus, we have to identify the issues of SAM for typologies of spatio-temporal activities.
So, as a first step, we decided to follow previous studies: as a preliminary analysis, we conducted a GPS survey of tourists’ activities and typologies using SAM.
We selected the Ueno Zoo for our study because it is well suited for the application of SAM to GPS tracking data of tourists’ activity surveys.
There are some advantages in conducting a GPS tracking survey in a zoo.
Most exhibits are located outdoors.
The number of entrance and exit gates is limited.
Zoo visitors’ activities are limited in time and space because the zoo area is limited.
Ueno Zoo is one of the most popular tourist facilities in Japan.
Ueno Zoo has a 14.2 ha site area, two entrance gates and three exit gates.
The site is clearly partitioned into two gardens: East Garden and West Garden.
A bridge and a small monorail line connect the two gardens.
Ueno Zoo has a footfall of 3.6 million or more visitors per year.
On holidays, tens of thousands of visitors crowd into the zoo site.
Although the managers of Ueno Zoo have difficulty with visitor crowding, they had no methodology for clarifying visitors’ activities.
They just counted visitors at the entrance gates.
They had no information about which exit gates visitors used or which routes they took.
So, we conducted the survey to obtain GPS data of visitors.
At the Main Gate, we distributed small GPS loggers to visitors who agreed to participate in our survey.
We set each GPS logger to record its location ONCE EVERY SECOND.
When the visitors left the zoo, we collected the loggers from them and asked them to answer questionnaires to obtain their attribute data.
We obtained 113 valid sets of GPS logs and visitors’ attribute data in one day.
We divided the zoo site into thirty zones based on their spatial connections and functions.
Each zone was assigned a two-letter code.
Using the obtained GPS data, we made sequences of codes.
We converted the subjects’ locations into codes representing zones once every minute.
We also tested other time resolutions, for example once every thirty seconds, once every 3 minutes and once every 5 minutes.
As a result, LOW time resolutions could not represent the subjects’ trajectories; the subjects appeared to teleport between zones.
HIGH time resolutions produced very complex trees that could not be interpreted as clusters.
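The source does not say how one code per minute was derived from the per-second fixes. The sketch below shows two assumed options (the code at the start of each interval, and the most frequent code within it) and how a coarse interval can drop a briefly visited zone from the route, which matches the “teleportation” effect described above. It assumes the per-second fixes have already been converted to zone codes; the log is hypothetical.

```python
from collections import Counter
from itertools import groupby


def snapshot(codes_per_second, interval_s):
    """Keep the zone code at the start of every interval (one code per interval)."""
    return codes_per_second[::interval_s]


def majority(codes_per_second, interval_s):
    """Keep the most frequent zone code within every interval."""
    return [Counter(codes_per_second[start:start + interval_s]).most_common(1)[0][0]
            for start in range(0, len(codes_per_second), interval_s)]


def route(codes):
    """Collapse runs of identical codes to the order of zones visited."""
    return [zone for zone, _ in groupby(codes)]


# Hypothetical 5-minute log at one fix per second: 120 s in Ea, 45 s in Eb, 135 s in Ec
log = ["Ea"] * 120 + ["Eb"] * 45 + ["Ec"] * 135
print(snapshot(log, 60))          # ['Ea', 'Ea', 'Eb', 'Ec', 'Ec']
print(route(snapshot(log, 180)))  # ['Ea', 'Ec']: the short visit to Eb disappears
```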
The alignment software package ClustalTXY was applied.
As a preliminary analysis, we conducted typologies of all subjects.
As a result, we obtained the tree and assigned nomenclature to the clusters like this.
We found that the clusters are characterized by two points:
・whether the subjects stayed somewhere for tens of minutes or not
・in which zone they stayed for tens of minutes
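Both characteristics can be read from a per-minute sequence as its longest run of one code. A minimal sketch; the 20-minute threshold standing in for “tens of minutes” is a hypothetical parameter, and this summary is only an aid for interpreting clusters, not part of SAM itself.

```python
from itertools import groupby


def longest_halt(codes_per_minute):
    """Return (zone, minutes) for the longest run of one zone code, or (None, 0)."""
    best_zone, best_len = None, 0
    for zone, run in groupby(codes_per_minute):
        length = sum(1 for _ in run)
        if length > best_len:
            best_zone, best_len = zone, length
    return best_zone, best_len


def halt_zone(codes_per_minute, threshold_min=20):
    """Zone where the subject stayed at least `threshold_min` minutes in a row, else None."""
    zone, minutes = longest_halt(codes_per_minute)
    return zone if minutes >= threshold_min else None


visitor = ["Ea"] * 3 + ["Ws"] * 25 + ["Eb"] * 2  # hypothetical per-minute codes
print(longest_halt(visitor))  # ('Ws', 25)
print(halt_zone(visitor))     # Ws
```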
From this result, we could understand some characteristics of visitors’ activities as overall picture.
This is a beneficial outcome in that we are able to find their rough tendencies.
However, we found an incomplete aspect of these typologies.
Every cluster contained many types of routes.
We tried to visualize their typical routes, but we couldn’t interpret their patterns.
As I mentioned earlier, the site of Ueno Zoo is partitioned clearly into two gardens.
So, we divided the subjects into 3 groups based on their broad routes.
Group (i) moved from East Garden to West Garden.
Group (ii) got back to East Garden from West Garden.
Group (iii) took other routes.
Their spatio-temporal activities are clearly different from each other.
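A division of this kind can be automated from the per-minute codes by collapsing each sequence to the order of gardens visited. The sketch assumes, hypothetically, that the first letter of each zone code identifies the garden (E for East, W for West) and ignores the remaining codes; the source does not say how the grouping was actually performed.

```python
from itertools import groupby


def garden_route(codes_per_minute):
    """Order of gardens visited, e.g. "EWE".

    Assumption: codes starting with "E" lie in East Garden and codes starting
    with "W" in West Garden; any other codes (such as "Ba" or "Ma") are skipped.
    """
    gardens = [code[0] for code in codes_per_minute if code[0] in ("E", "W")]
    return "".join(garden for garden, _ in groupby(gardens))


def broad_group(codes_per_minute):
    """Group (i): East -> West; Group (ii): East -> West -> East; Group (iii): other routes."""
    route = garden_route(codes_per_minute)
    if route == "EW":
        return "i"
    if route == "EWE":
        return "ii"
    return "iii"


print(broad_group(["Ea", "Eb", "Ba", "Wa", "Wb"]))  # i
print(broad_group(["Ea", "Wa", "Ba", "Ec"]))        # ii
```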
However, the subjects of each group are interspersed throughout the tree.
This is most likely due to the influence of long halts in one zone.
Halting for tens of minutes in one zone is represented by a long run of the same code.
Such runs may be given unfairly high priority when the mismatch cost is calculated.
Some previous studies pointed out this problem.
There is no consensus method or standard calibration procedure for the setting of sequence alignment parameters.
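A small hypothetical example makes the effect concrete. With unit edit costs (an assumption; the real costs depend on the alignment parameters), a visitor who follows exactly the same route but halts for 30 minutes ends up much farther from a walker than a visitor who takes a different route with the same timing. The sketch repeats the Levenshtein function so that it runs on its own.

```python
def levenshtein(a, b):
    """Unit-cost edit distance between two sequences of zone codes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y))
        prev = curr
    return prev[-1]


# Hypothetical per-minute sequences
walker = ["Ea"] * 5 + ["Eb"] * 5 + ["Ec"] * 5            # Ea -> Eb -> Ec, no long stop
same_route_halt = ["Ea"] * 5 + ["Eb"] * 30 + ["Ec"] * 5  # same route, 30-minute halt in Eb
other_route = ["Ea"] * 5 + ["Wa"] * 5 + ["Ec"] * 5       # different middle zone

print(levenshtein(walker, same_route_halt))  # 25
print(levenshtein(walker, other_route))      # 5
```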
Then, as the main analysis, we conducted typologies by SAM separately for the groups mentioned earlier; Group (iii) was excluded from this analysis.
As a result of this simple idea, we observed some typical routes in their spatio-temporal activities, and the detailed characteristics of the routes.
First, we obtained this tree as the result of analysis for Group (i) and assigned nomenclature to the clusters like this.
Group (i) has seven clusters, Clusters 1 to 7.
We tried to interpret the details of typical routes of each cluster by using time-space path maps and Kernel density estimation maps.
Time-space path maps represent subjects’ movement by lines that rise in height by one meter for every elapsed minute.
Kernel density estimation maps represent the hot spots from their GPS logs.
Red areas have high density.
These are the typical routes of Group (i).
Clusters 1, 2, 3 and 4 have the same characteristics of spatio-temporal activities.
Cluster 1 is the most typical type of these 4 clusters.
The subjects of Cluster 1 went around all over the East Garden, like this.
They probably viewed almost all the exhibits in East Garden.
Clusters 2, 3 and 4 are considered derivatives of Cluster 1.
The subjects of Cluster 6 stayed for tens of minutes at Shinobazu Pond Terrace in West Garden.
The subjects of Cluster 7 also went around the north side of East Garden, but in the opposite direction to the other clusters.
The characteristics of Cluster 7 are beneficial data for zoo management.
Because some aisles in the zoo are narrow, their behavior will probably affect crowding.
Next, we obtained this tree as the result of analysis for Group (ii) and assigned nomenclature to the clusters like this.
The subjects of Group (ii) got back to East Garden from West Garden.
Group (ii) has five clusters, Clusters 8 to 12.
The subjects of Clusters 8, 9 and 10 have the same typical routes as Clusters 1 to 4, but differ in their routes in West Garden.
The subjects of Cluster 11 stayed for tens of minutes at the East Garden Cafeteria or near the free rest area.
In contrast, the subjects of Group (i) tend not to stop at the East Garden Cafeteria.
Cluster 12 contains many routes.
Let me summarize the result of our analysis.
As a preliminary analysis, following previous studies, we conducted a GPS survey of tourists’ activities and typologies of their spatio-temporal activities using SAM.
These results are useful as overall pictures of visitors, but they are lacking some spatio-temporal aspects.
Therefore, we divided the subjects into 3 groups based on their broad routes and conducted typologies for 2 groups.
As a result, we observed some typical patterns in their activities and the detailed characteristics of the patterns.
Now, this is our conclusion.
What we can say from our study is that bias in sampling tourists’ tracking data has a very important influence on the results of typologies using SAM.
Samples probably affect each other as noise.
However, we mitigated this problem with a simple step: dividing the subjects by their broad routes before applying SAM.
We have to generalize these results.
As future work, first, we consider that a quantitative method is required to evaluate the clusters resulting from SAM.
More precisely, we have to adopt validity measures in order to evaluate clusters.
Second, we are concerned with the relationship between clusters and the attribute data of subjects.
In addition, we have to conduct this analysis with a larger number of subjects.
Thank you for your attention.