REVEALING SPATIAL AND TEMPORAL PATTERNS FROM FLICKR
A CASE STUDY WITH TOURISTS IN AMSTERDAM
TOURISM IN AMSTERDAM
RAPID GROWTH
Source: Nicky Otten (Flickr)
MORE AND MORE CONCERNS ABOUT TOURISM
A SELECTION OF RECENT NEWS ARTICLES
They are puking and peeing on the Zeedijk
NOS, December 5 2014
Is Amsterdam becoming a second Venice?
De Morgen, March 27 2015
The center of Amsterdam should not become too popular
Volkskrant, October 25 2014
Amsterdam taken over by tourists
RTL, April 3 2015
Amsterdam will welcome twice as many tourists in 2030
Het Parool, December 9 2014
INITIAL RESEARCH TOPIC
WAGENINGEN UNIVERSITY AND AMS
Explore the possibilities to use (geo)tweets for detecting
spatial and temporal patterns of tourists in Amsterdam
But why Twitter? How about Flickr?
                                      Twitter   Flickr
Number of users                       ++        +/-
Amount of data                        ++        +
Connection of data to real location   +/-       ++
Use by tourists                       +/-       ++
Interval between subsequent posts     +/-       ++
RESEARCH PROJECT
OBJECTIVE
The objective of this exploratory research project is to develop,
implement and test methods that reveal spatial and temporal patterns
of tourists from a large dataset of geotagged Flickr photos
RESEARCH QUESTIONS
RQ-01: What methods are available to detect spatial and temporal patterns from geosocial data?
RQ-02: What methods need to be implemented to identify temporal distributions, spatial clusters and popular routes of tourists from the metadata of Flickr photos?
RQ-03: How well do the identified temporal distributions, spatial clusters and popular routes resemble the spatial and temporal behaviour of tourists?
FLICKR DATA COLLECTION
OVERVIEW OF STEPS & TECHNIQUES
Java application sends a request to the Flickr database (API)
Flickr database returns an XML file
Java application stores the metadata in the local database (PostgreSQL)
Restriction: 1 request per second
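A minimal sketch of how a harvester might respect the limit of 1 request per second. The deck's application was written in Java; this Python version and the `Throttle` class are illustrative assumptions, not the author's code.

```python
import time


class Throttle:
    """Blocks so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` before every API request keeps the request rate at or below one per second without sleeping longer than necessary.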
FLICKR DATA COLLECTION
STEP 1: HARVESTING PHOTO IDS WITHIN BOUNDING BOXES (1550)
Search parameters:
• Xmin, Xmax, Ymin, Ymax
• Min date: January 1, 2005
• Max date: December 31, 2014
Search result:
• Photo ID
• User ID
• Photo title
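The search area could be split into small bounding boxes along these lines. This is a sketch only: it assumes the "1550" on the slide is the number of boxes, and the extent coordinates and the 50 x 31 grid shape are hypothetical values chosen to reach that count.

```python
def tile_bounding_box(xmin, ymin, xmax, ymax, n_cols, n_rows):
    """Split an extent into n_cols * n_rows equal boxes (xmin, ymin, xmax, ymax)."""
    width = (xmax - xmin) / n_cols
    height = (ymax - ymin) / n_rows
    boxes = []
    for row in range(n_rows):
        for col in range(n_cols):
            boxes.append((
                xmin + col * width,
                ymin + row * height,
                xmin + (col + 1) * width,
                ymin + (row + 1) * height,
            ))
    return boxes


# Hypothetical extent roughly around Amsterdam and a hypothetical 50 x 31 grid.
boxes = tile_bounding_box(4.70, 52.25, 5.10, 52.45, n_cols=50, n_rows=31)
print(len(boxes))  # 1550
```

Each box would then be sent as the Xmin, Xmax, Ymin, Ymax search parameters, together with the minimum and maximum date.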
FLICKR DATA COLLECTION
STEP 2: REQUESTING ADDITIONAL METADATA
Search parameters:
• Photo ID
Search result:
• Latitude, longitude
• Date and time
• User name
• User home location
• Tags
• Photo URL
• Location accuracy
2,849,261 photos
About 5 weeks of harvesting
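The harvesting time is consistent with the rate limit stated earlier in the deck. A quick check, assuming one metadata request per photo and continuous harvesting:

```python
photos = 2_849_261          # metadata requests, one per photo (from the slide)
seconds_per_request = 1     # rate limit: 1 request per second

total_seconds = photos * seconds_per_request
days = total_seconds / 86_400
weeks = days / 7
print(f"{days:.1f} days, {weeks:.1f} weeks")  # 33.0 days, 4.7 weeks
```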
FLICKR DATA COLLECTION
STEP 2: REQUESTING ADDITIONAL METADATA
484,346 photos
FLICKR DATA EXPLORATION
PHOTOS IN QGIS
SELECTION OF PHOTOS IN GOOGLE EARTH
TOURIST CLASSIFICATION
BASED ON USER’S HOME LOCATION
TOURIST CLASSIFICATION
SQL AND ONLINE GEOCODING
1. Classification of user location by SQL (8,628 users, 54%)
UPDATE users
SET countryname = 'Japan', istourist = 'True', classification = 'SQL'
WHERE geoname = '' AND userid IN
(SELECT userid FROM users WHERE (userlocation ~* '\y(japan|nippon|日本)\y'));
2. Classification of user location by online geocoding (450 users, 3%)
Java application connects PostgreSQL (local database) to the Geonames API (external database)
Example: user location = Tokyo; Tokyo = Japan
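The keyword rule in step 1 can also be sketched outside the database. This is an illustration under assumptions: only the Japan pattern comes from the slide, the Netherlands pattern and the function name are hypothetical, and Python's `\b` is used in place of PostgreSQL's `\y` word boundary.

```python
import re

# Hypothetical keyword table; only the Japan pattern comes from the slide.
COUNTRY_PATTERNS = {
    "Japan": re.compile(r"\b(japan|nippon|日本)\b", re.IGNORECASE),
    "Netherlands": re.compile(r"\b(netherlands|nederland|holland)\b", re.IGNORECASE),
}


def classify_location(user_location):
    """Return the first country whose pattern matches, or None if unclassified."""
    for country, pattern in COUNTRY_PATTERNS.items():
        if pattern.search(user_location):
            return country
    return None


print(classify_location("Tokyo, JAPAN"))  # Japan
print(classify_location("Tokyo"))         # None
```

A location such as "Tokyo" is not caught by the keyword rule, which is the kind of case the deck hands over to online geocoding in step 2.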
TOURIST CLASSIFICATION
NUMBER OF UNIQUE USERS
Locals: 2,821 (17.6%)
Tourists: 6,257 (39.1%)
Unclassified: 6,914 (43.2%)
Overall accuracy = 99%
TOURIST CLASSIFICATION
NUMBER OF UNIQUE PHOTOS
Local photos: 154,599 (39.3%)
Tourist photos: 107,016 (27.2%)
Unclassified photos: 132,213 (33.6%)
CLASSIFICATION RESULTS AMSTERDAM
RELATIVE NUMBER OF TOURISTS PER NATIONALITY (2013)
[Bar chart: share of tourists (0% to 20%) for United States, United Kingdom, Germany, Italy, Spain and France; series: Flickr nationalities 2013, CBS hotel nationalities 2013]
TEMPORAL DISTRIBUTIONS
DIFFERENT GRANULARITIES
TEMPORAL DISTRIBUTIONS
RELATIVE NUMBER OF TOURISTS AND PHOTOS PER HOUR (2005-2014)
[Line chart: share per hour of the day (0% to 10%, 0:00 to 23:00); series: Tourists, Tourist photos]
Annotation: many daytime photos
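An hourly distribution like the one on this slide could be computed as follows. This is a sketch under assumptions: each user is counted once per hour of the day, and shares are normalised so that they sum to 1. The deck does not state its exact counting rule, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hourly_share_of_users(photos):
    """photos: iterable of (user_id, datetime). Returns {hour: share of user-hours}."""
    users_per_hour = defaultdict(set)
    for user_id, taken in photos:
        users_per_hour[taken.hour].add(user_id)
    total = sum(len(users) for users in users_per_hour.values())
    return {hour: len(users_per_hour[hour]) / total for hour in sorted(users_per_hour)}
```

Replacing the set of users by a plain photo count per hour would give the corresponding distribution of photos.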
TEMPORAL DISTRIBUTIONS
RELATIVE NUMBER OF TOURISTS AND LOCALS PER HOUR (2005-2014)
[Line chart: share per hour of the day (0% to 10%, 0:00 to 23:00); series: Tourists, Locals]
Annotations:
• Maximums shifted
• Relatively more tourist photos in the night
• More local photos in the evening
TIMESTAMP VALIDATION
TIME DIFFERENCE BETWEEN PHOTO TIMESTAMP AND REAL TIME
[Example photos: one exact match, one 2 hours off]
Selecting
• all photos tagged with ‘clock’
• all photos near Central Station
1,032 photos of locals
1,134 photos of tourists
Result
• 70 suitable photos of tourists
• 50 suitable photos of locals
[Histogram: share of photos (0% to 80%) by time difference between photo timestamp and real time (-10:00:00 to +10:00:00); series: Locals, Tourists]
TEMPORAL DISTRIBUTIONS
PHOTOGRAPHERS PER DAY OF THE WEEK (2005-2014)
[Chart: share of photographers (0% to 20%) for Monday to Sunday; series: Tourists, Locals]
TEMPORAL DISTRIBUTIONS
PHOTOGRAPHERS PER MONTH (2005-2014)
[Chart: share of photographers (0% to 12%) for January to December; series: Tourists, Locals]
TEMPORAL DISTRIBUTIONS
TOURISTS AND FOREIGN HOTEL GUESTS PER MONTH (2012+2013)
[Chart: share per month (0% to 12%) for January to December; series: Tourists (Flickr 2012 + 2013), Hotel guests (CBS 2012 + 2013)]
TEMPORAL DISTRIBUTIONS
PHOTOGRAPHERS PER DAY OF THE YEAR (2005-2014)
[Chart: number of photographers (0 to 200) per day of the year (1 to 365); series: Locals, Tourists]
Annotation: Queen's Day
SPATIAL DISTRIBUTION
GRID-BASED CLUSTERING
[Illustration: number of points counted per grid cell]
EXPLORING THE DATA
TOURIST COUNT PER HEXAGON IN GOOGLE EARTH
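Grid-based counting can be sketched as below. For brevity this uses square cells rather than the hexagons shown in the deck, and it assumes coordinates in a projected system in metres; the function name and input layout are hypothetical.

```python
import math
from collections import defaultdict


def tourists_per_cell(points, cell_size):
    """points: iterable of (user_id, x, y) in projected metres.
    Returns {(col, row): number of distinct users} for square cells of `cell_size`."""
    users = defaultdict(set)
    for user_id, x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        users[cell].add(user_id)
    return {cell: len(ids) for cell, ids in users.items()}
```

Counting distinct users rather than photos keeps a single prolific photographer from dominating a cell.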
SPATIAL DISTRIBUTION
DENSITY-BASED CLUSTERING
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Detects clusters with different shapes and sizes
• Not sensitive to noise, so very suitable for geosocial data
• Eps: radius of the search area
• MinPts: minimum number of points in the neighborhood
[Illustration: Eps search radius, noise points, MinPts = 4]
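A compact, unoptimised sketch of DBSCAN with the two parameters named above. It assumes planar coordinates and Euclidean distance, and counts a point as part of its own neighborhood; the deck does not show which implementation or parameter values were used.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or NOISE."""

    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=4))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) has too few neighbours within Eps and is labelled as noise, which is the behaviour that makes the method attractive for scattered geosocial data.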
TOURISTIC ROUTES
ONE DAY IN THE LIFE OF A TOURIST
TOURISTIC ROUTES
LINEAR TRAJECTORIES OF MANY TOURISTS
TOURISTIC ROUTES
LINEAR TRAJECTORIES BETWEEN CLUSTERS
TOURISTIC ROUTES
RELATING TRAJECTORIES TO STREET NETWORK USING ROUTING ALGORITHM
[Panels: as the crow flies; trajectory over network]
TOURISTIC ROUTES
STEP 1: CREATE A SIMPLIFIED PEDESTRIAN NETWORK
[Panels: original; aggregate road links; densify road links]
TOURISTIC ROUTES
TOURISTS TAKE THE MOST POPULAR ROUTES
TOURISTIC ROUTES
STEP 2: REDUCE TRAVEL COST PER ROAD SEGMENT BASED ON PHOTO DENSITY
[Illustration: road segments labelled with the values 2.6, 1.9, 1.4, 4.2, 3.1, 1.8, 6.9, 6.2, 4.1, 7.3, 9.3, 9.6]
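The deck does not state how the travel cost is reduced. One possible scheme, shown purely as an assumption, discounts a segment's length in proportion to its photo density; the function name, the linear form and the `max_discount` parameter are all hypothetical.

```python
def adjusted_cost(length_m, photo_density, max_density, max_discount=0.5):
    """Travel cost of a segment: its length, discounted by up to `max_discount`
    in proportion to the segment's photo density (hypothetical formula)."""
    if max_density <= 0:
        return length_m
    share = min(photo_density / max_density, 1.0)
    return length_m * (1.0 - max_discount * share)


print(adjusted_cost(100, 0, 50))   # 100.0
print(adjusted_cost(100, 25, 50))  # 75.0
print(adjusted_cost(100, 50, 50))  # 50.0
```

With costs lowered on heavily photographed segments, a shortest-path search prefers popular streets over slightly shorter but rarely photographed ones.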
TOURISTIC ROUTES
STEP 3: CREATE PHOTO PAIRS FOR ROUTING
1. Create pairs of time-ordered photo locations per user (Point A to Point B, Point B to Point C, …)
2. Calculate distance, time interval and speed per photo pair
3. Select all photo pairs within thresholds:
• Distance > 50 m and < 750 m
• Time interval > 0 sec and < 600 sec
• Speed > 1 km/h and < 5 km/h
4. Calculate closest network node for start and end of every pair
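Items 1 to 3 of this step can be sketched as follows, using the thresholds from the slide. The haversine distance and the input layout `(user_id, datetime, lat, lon)` are assumptions; snapping to the closest network node (item 4) is not covered.

```python
import math
from datetime import datetime, timedelta
from itertools import groupby


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def photo_pairs(photos):
    """photos: iterable of (user_id, datetime, lat, lon).
    Yields consecutive photo pairs per user that pass the slide's thresholds."""
    ordered = sorted(photos, key=lambda p: (p[0], p[1]))
    for _, group in groupby(ordered, key=lambda p: p[0]):
        track = list(group)
        for a, b in zip(track, track[1:]):
            dist = haversine_m(a[2], a[3], b[2], b[3])
            secs = (b[1] - a[1]).total_seconds()
            # Distance > 50 m and < 750 m, time interval > 0 s and < 600 s
            if not (0 < secs < 600 and 50 < dist < 750):
                continue
            speed_kmh = dist / secs * 3.6
            # Walking speed > 1 km/h and < 5 km/h
            if 1 < speed_kmh < 5:
                yield a, b
```

The speed filter removes pairs that were probably covered by tram or bicycle, and the time filter removes pairs with a long stop in between, so the remaining pairs approximate short walks.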
TOURISTIC ROUTES
STEP 4: CALCULATE ROUTES AND AGGREGATE INTO ROUTE DENSITY MAP
1. Calculate route for 6,477 photo pairs with pgRouting
2. Aggregate and count overlaying route segments
3. Visualize touristic route densities
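The routing and aggregation steps can be illustrated with a small in-memory graph. This is a stand-in, not the deck's method: the project used pgRouting inside PostgreSQL, and the slides do not say which pgRouting function was called. Here a plain Dijkstra search replaces it, and the graph layout `{node: {neighbour: cost}}` is an assumption.

```python
import heapq
from collections import Counter


def shortest_path(graph, start, goal):
    """Dijkstra on graph = {node: {neighbour: cost}}. Returns a list of nodes or None."""
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return None


def route_density(graph, node_pairs):
    """Count how many routes use each undirected segment."""
    counts = Counter()
    for start, goal in node_pairs:
        path = shortest_path(graph, start, goal)
        if path is None:
            continue
        for a, b in zip(path, path[1:]):
            counts[tuple(sorted((a, b)))] += 1
    return counts


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(route_density(graph, [("A", "D"), ("B", "D")]))
```

The per-segment counts are what a route density map would visualise, for example as line width or colour.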
TOURISTIC CLUSTERS AND ROUTES
VALIDATION OF RESULTS
Problem: No comparable quantitative data available
Solution: Expert judgement by a questionnaire
Participants: 8 tourism experts from different departments of the municipality of Amsterdam
TOURISTIC ROUTES
VALIDATION OF RESULTS BY 8 TOURISM EXPERTS
Match: 75% Match: 38% Match: 75%
Match: 100% Match: 100% Match: 63%
Match: 100% Match: 67% Match: 67%
Match: 100% Match: 100% Match: 100%
WITH HIGH CONFIDENCE (5/5)3
TOURISTIC CLUSTERS AND ROUTES
VALIDATION OF RESULTS
Expert #   Profession                                   Validity of results* [1-5]   Usefulness of results** [1-5]
1          Policy Advisor Traffic & Public Space        4                            5
2          Data Analyst, Information and Statistics     4                            4
3          Senior Advisor Traffic Management            4                            4
4          Researcher, Information and Statistics       3                            4
5          Senior Advisor Traffic Research              5                            4
6          Urban Planner                                5                            5
7          Urban Planner                                4                            5
8          Urban Designer                               4                            5
Mean                                                    4.1                          4.5
* How well do the study outcomes resemble the real world?
** Are the study outcomes useful for you or for your organization?
SUGGESTIONS FOR FUTURE WORK
AND POTENTIAL THESIS TOPICS
• Calibrate thresholds with quantitative data
• Extensive validation of results in cooperation with tourism experts
• Cooperate with municipality to define objectives; some suggestions:
    • Additional data sources: Instagram, Twitter, Sina Weibo
    • Divide spatial distributions into different temporal intervals
    • Compare spatial distribution of locals and tourists
    • Divide the spatial distributions by nationality
    • Use the presented patterns as input for an agent-based model
    • Discover typical tourism problems with other geosocial data types
THANK YOU FOR YOUR ATTENTION!
ANY QUESTIONS OR REMARKS?

Revealing spatial and temporal patterns from Flickr photography: a case study with tourists in Amsterdam
