2. Introduction: Places and geosocial media
Methods: Discovering relationships between
semantic similarity and geographic location
Results: London case study
Outlook: Current and future work
29.09.2015 F.O. Ostermann - ISSDQ 2015
EXTRACTING AND COMPARING PLACES USING
GEOSOCIAL MEDIA
LONDON CASE STUDY
3.
RESEARCH OBJECTIVES
INTRODUCTION
Contribute to our understanding of the semantics of places
by combining
methods from diverse disciplines
and
datasets from several sources,
in particular geosocial media.
4.
WHY GEOSOCIAL MEDIA? WHY SEMANTICS OF PLACES?
INTRODUCTION
• rich and multi-faceted view on the perception and
semantics of geographic places
• improve interoperability between existing geospatial
datasets and geographic information retrieval for
future streams of geographic information
5.
GEOSOCIAL MEDIA
INTRODUCTION
Typology by geography and participation (each explicit or implicit):
• Explicit participation, explicit geography: Volunteered Geographic Information (VGI), e.g. OpenStreetMap
• Explicit participation, implicit geography: Volunteered Geographic Content (VGC), e.g. Wikipedia articles on non-geographic topics containing place names, Foursquare
• Implicit participation, explicit geography: Contributed / Ambient Geographic Information (CGI/AGI), e.g. public Tweets referring to the properties of an identifiable place
• Implicit participation, implicit geography: User-Generated Geographic Content (UGGC), e.g. public Flickr images containing a place name or being georeferenced
Adapted from [1]
6.
CONCEPTUALIZING PLACES
INTRODUCTION
Agnew (1987):
• Specific location (where?)
• Locale (properties)
• Sense of place (attachment)
Winter and Freksa (2012):
• Place through contrast
• Spatio-temporal location
• Semantics
7.
DISCOVERING PLACES IN/FROM GEOSOCIAL MEDIA
INTRODUCTION
• Theory-guided research and local case study:
• How do people see and understand the places they frequent?
• What is different across media sources?
• More than one (volunteered) data source
• Identification of places and their semantics
• Comparison of places between data sources
• Comparison of places with geographic features and authoritative
data sources
8.
CASE STUDY RESEARCH QUESTIONS
INTRODUCTION
• Can we identify distinct places in one data source
(Flickr) based on purely spatial clustering from another
data source (Twitter)?
• How does semantic similarity between places vary
over space and scale? How well does Tobler’s first law of
geography hold with regard to scale and space?
• Study Area: Greater London
10.
OVERALL WORKFLOW
METHODS
Mine Twitter
for potential
places
Mine Flickr
for matching
images
Build term
vectors for
image
clusters
Calculate
cosine
similarities
Analyse
correlation
and spatial
variations
11.
GEOSOCIAL DATA SET #1: TWITTER
METHODS
Twitter: abundant but noisy and not very rich in content
What: All geo-referenced Tweets
Where: Greater London Area
When: Nov 5, 2012 - October 3, 2013 (334 days)
Who: No tourists, please. Only users with Tweets spanning 30+ days
15,246,565 Tweets from 40,246 users.
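The "no tourists" filter can be sketched as follows. This is a minimal illustration, not the study's actual code: the tuple layout of `tweets`, the function names, and the reading of "spanning 30+ days" as the gap between a user's first and last Tweet are all assumptions.

```python
from datetime import datetime

def active_span_days(timestamps):
    """Days between a user's first and last Tweet."""
    return (max(timestamps) - min(timestamps)).days

def filter_resident_users(tweets, min_span_days=30):
    """Keep Tweets from users whose activity spans at least min_span_days.

    `tweets` is a list of (user_id, timestamp) tuples (assumed layout).
    """
    by_user = {}
    for user_id, ts in tweets:
        by_user.setdefault(user_id, []).append(ts)
    keep = {u for u, stamps in by_user.items()
            if active_span_days(stamps) >= min_span_days}
    return [(u, ts) for u, ts in tweets if u in keep]

# Toy data: one long-term user, one short-stay visitor.
tweets = [
    ("resident", datetime(2012, 11, 5)),
    ("resident", datetime(2013, 1, 20)),
    ("tourist", datetime(2013, 3, 1)),
    ("tourist", datetime(2013, 3, 4)),
]
filtered = filter_resident_users(tweets)
```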
13.
GEOSOCIAL DATA SET
METHODS
• Resulted in 55,000 potential places
• Too high for an exploratory analysis
• Filter out clusters with < 100
distinct users
• 3501 clusters for further analysis
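The cluster filter can be sketched like this. The clustering step itself is not shown on the slide and is not reproduced; the sketch assumes each Tweet already carries a cluster id, and the `(cluster_id, user_id)` layout is hypothetical.

```python
def filter_clusters(assignments, min_users=100):
    """Return {cluster_id: distinct_user_count} for clusters that pass.

    `assignments` holds one (cluster_id, user_id) pair per Tweet
    (assumed layout).
    """
    users = {}
    for cluster_id, user_id in assignments:
        users.setdefault(cluster_id, set()).add(user_id)
    return {c: len(u) for c, u in users.items() if len(u) >= min_users}

# Toy data: cluster "a" has 150 distinct users; cluster "b" has
# 500 Tweets but only 5 distinct users, so it is dropped.
assignments = [("a", f"user{i}") for i in range(150)]
assignments += [("b", f"user{i % 5}") for i in range(500)]
kept = filter_clusters(assignments)
```

Counting distinct users rather than Tweets keeps a handful of very active accounts from creating a "place" on their own.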
14.
FROM TWEETS TO PLACES
METHODS
Flickr: less abundant, but more stable, focused on geo, and richer
What: All geo-referenced Flickr images
Where: Greater London Area bounding box
When: until November 2014
More than five million images.
15.
MEASURING SEMANTIC SIMILARITY
METHODS
Cosine similarity
• Cosine of the angle between two vectors
• If vectors have same orientation, angle is 0°, cosine similarity is 1
• Orthogonal vectors have cosine similarity of 0
• Serves as an approximation of semantic similarity between places
• Has been used successfully in Geographic Information Retrieval
Hypothesis
Based on Tobler’s first law of geography, we expect a negative
correlation between distance and cosine similarity
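The cosine similarity described above can be sketched in a few lines. The function name and the convention of returning 0 for an all-zero vector are assumptions made for this illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u == 0 or sq_v == 0:
        return 0.0  # assumed convention for empty term vectors
    return dot / math.sqrt(sq_u * sq_v)

# Same orientation, orthogonal, and partially overlapping vectors.
same = cosine_similarity([1, 0, 1, 1], [1, 0, 1, 1])
orthogonal = cosine_similarity([1, 1, 0, 0], [0, 0, 1, 1])
partial = cosine_similarity([1, 1, 0, 0], [1, 0, 1, 0])
```

For binary term vectors the value depends only on how many terms two places share relative to how many each contains.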
16.
MEASURING SENSE OF PLACE
METHODS
1. Buffer Tweet clusters
2. Point-in-polygon intersection of Flickr images with buffered Tweet clusters using
PostGIS (inner join; an image can match multiple clusters).
3. Build term vectors for remaining Flickr images through lexical
matching, using set of activities, elements, or qualities (Purves et al.
2011) in titles, descriptions, tags
4. Aggregate to Twitter clusters
5. Normalize to binary (present, not present)
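The five steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the PostGIS workflow used in the study: the buffer is simplified to a circle around a cluster centre in planar coordinates, the 100 m radius is arbitrary, and `VOCABULARY` is a made-up stand-in for the activities, elements, and qualities of Purves et al. (2011).

```python
import math
import re

# Hypothetical vocabulary; the actual term set is not reproduced here.
VOCABULARY = ["walking", "shopping", "river", "bridge", "park", "busy", "quiet"]

def in_buffer(point, centre, radius):
    """Steps 1-2: circular buffer test in planar coordinates (metres)."""
    return math.dist(point, centre) <= radius

def term_vector(text):
    """Step 3: count vocabulary terms by exact lexical matching."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in VOCABULARY]

def cluster_vectors(clusters, images, radius=100.0):
    """Steps 2, 4, 5: assign images to clusters, aggregate, binarize.

    An image inside several buffers contributes to each of them,
    like an inner join that returns multiple rows per image.
    """
    result = {}
    for cluster_id, centre in clusters.items():
        totals = [0] * len(VOCABULARY)
        for point, text in images:
            if in_buffer(point, centre, radius):
                totals = [t + c for t, c in zip(totals, term_vector(text))]
        result[cluster_id] = [1 if t > 0 else 0 for t in totals]
    return result

# Toy data: text stands for concatenated title, description, and tags.
clusters = {"south_bank": (0.0, 0.0), "market": (500.0, 0.0)}
images = [
    ((10.0, 5.0), "Walking along the river, bridge in view"),
    ((20.0, -30.0), "river river river"),
    ((495.0, 10.0), "Busy shopping street"),
    ((5000.0, 5000.0), "Quiet park"),
]
vectors = cluster_vectors(clusters, images)
```

Binarizing after aggregation means a term mentioned in thousands of images weighs the same as one mentioned once, which dampens the influence of prolific uploaders.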
18.
SEMANTIC SIMILARITY BETWEEN NEIGHBORS
RESULTS
High similarity suggests that neighbouring clusters
might not be distinct places in
the sense of Winter and
Freksa (2012)
Exploratory analysis
Cosine similarity for combined term vectors with nearest neighbor
(asymmetrical)
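A minimal sketch of the nearest-neighbor comparison, assuming each place is a planar position plus a binary term vector (a hypothetical layout). It also shows why the relation is asymmetrical: the nearest neighbor of c is b, but the nearest neighbor of b is a.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    sq = sum(a * a for a in u) * sum(b * b for b in v)
    return dot / math.sqrt(sq) if sq else 0.0

def nearest_neighbour_similarity(places):
    """Map each place id to (neighbour id, distance, cosine similarity).

    `places` maps id -> ((x, y), term_vector) (assumed layout).
    """
    out = {}
    for pid, (pos, vec) in places.items():
        nn_id = min((q for q in places if q != pid),
                    key=lambda q: math.dist(pos, places[q][0]))
        nn_pos, nn_vec = places[nn_id]
        out[pid] = (nn_id, math.dist(pos, nn_pos),
                    cosine_similarity(vec, nn_vec))
    return out

places = {
    "a": ((0.0, 0.0), [1, 1, 0]),
    "b": ((1.0, 0.0), [1, 1, 0]),
    "c": ((3.0, 0.0), [0, 1, 1]),
}
neighbours = nearest_neighbour_similarity(places)
```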
20.
SIMILARITY AND DISTANCE NEAREST NEIGHBORS
METHODS
Normality not given (Shapiro-Wilk)
Non-parametric Kendall’s Tau correlation tests
• Weak to moderate negative correlation
• Kendall's rank correlation tau: z = -25.8158, p-value < 0.001,
sample estimate tau = -0.2921797
• Negative correlation is consistent with Tobler’s first law of
geography
• Shows that near things are in general more related than distant
things
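For readers unfamiliar with the statistic, Kendall's tau can be sketched as a direct pair count. This illustrative version computes the tau-b variant (which corrects for ties) in quadratic time and omits the z-score and p-value; it is not the statistical routine that produced the figures above, and the toy numbers are invented.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b by comparing all pairs; O(n^2), fine for a sketch."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables; counted nowhere
            elif dx == 0:
                ties_x += 1  # tied in x only
            elif dy == 0:
                ties_y += 1  # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Invented example: similarity mostly falls as distance grows.
distances = [100, 200, 300, 400, 500]
similarities = [0.9, 0.7, 0.8, 0.4, 0.2]
tau = kendall_tau_b(distances, similarities)
```

A negative tau means that pairs ordered by increasing distance tend to be ordered by decreasing similarity, which is the pattern the hypothesis predicts.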
21.
CORRELATION OVER DISTANCE FOR ALL PAIRS
RESULTS
• Calculated cosine similarity
for all cluster pairs
• Calculated Kendall’s Tau
for all clusters
• Heterogeneous spatial
variation (non-stationarity)
26.
OUTLIERS OR PLACES?
RESULTS
• Correlation between distance and cosine similarity is stronger in the
city centre
• City-centre clusters have shorter distances to all others, and the
correlation breaks down beyond some distance
• Downtown London has higher average similarity scores than the
periphery
• The shortest distance band clearly shows clusters of high average
similarity scores, suggesting areas that are internally more
semantically similar than others
27.
BACK TO SQUARE 1
RESULTS
• We can identify several coarse places when comparing the average
cosine similarity for low distance bands
• Negative correlation between distance and cosine similarity is
strongest for smaller distances, and flattens out over longer
distances. Consistent with Li et al. (2014), showing that Tobler’s first
law of geography is only consistently true within a specific distance
• Results support our assumption that distinct locales are discoverable
through geographic semantics in user-generated geographic content.
29.
FUTURE WORK
OUTLOOK
• Measure correlation between similarity and distance independently in the 3
dimensions of activities, elements and qualities.
• Measure the impact of the temporal dimension by investigating time
slices of Twitter and Flickr data.
• Merge neighbouring places with cosine similarity greater than some
given threshold value in an iterative clustering process
• Ground the resulting places through POIs from OSM and in-depth
qualitative analysis
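The proposed iterative merging is future work, so the following is only one possible reading of it, with every detail assumed: places are neighbours if their centres lie within `max_distance`, the most similar neighbouring pair above `threshold` is merged first, the merged centre is the member-weighted mean, and the merged binary vector is the element-wise OR.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    sq = sum(a * a for a in u) * sum(b * b for b in v)
    return dot / math.sqrt(sq) if sq else 0.0

def merge_places(places, max_distance, threshold):
    """Repeatedly merge the most similar neighbouring pair.

    `places` is a list of (members, (x, y), binary_vector) tuples
    (assumed layout). Stops when no neighbouring pair exceeds threshold.
    """
    places = list(places)
    while True:
        best = None
        for i, j in combinations(range(len(places)), 2):
            _, pos_i, vec_i = places[i]
            _, pos_j, vec_j = places[j]
            if math.dist(pos_i, pos_j) > max_distance:
                continue  # not neighbours
            sim = cosine_similarity(vec_i, vec_j)
            if sim > threshold and (best is None or sim > best[0]):
                best = (sim, i, j)
        if best is None:
            return places
        _, i, j = best
        (m_i, p_i, v_i), (m_j, p_j, v_j) = places[i], places[j]
        n_i, n_j = len(m_i), len(m_j)
        merged = (
            m_i + m_j,
            ((p_i[0] * n_i + p_j[0] * n_j) / (n_i + n_j),
             (p_i[1] * n_i + p_j[1] * n_j) / (n_i + n_j)),
            [max(a, b) for a, b in zip(v_i, v_j)],
        )
        places = [p for k, p in enumerate(places) if k not in (i, j)]
        places.append(merged)

places = [
    (["a"], (0.0, 0.0), [1, 1, 0, 0]),
    (["b"], (1.0, 0.0), [1, 1, 0, 0]),
    (["c"], (2.0, 0.0), [1, 1, 1, 0]),
    (["d"], (10.0, 0.0), [0, 0, 0, 1]),
]
merged_places = merge_places(places, max_distance=1.5, threshold=0.7)
```

In this toy run a and b merge first, the merged place then absorbs c, and the distant, dissimilar d stays separate.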
30. 23.06.2015F.O.Ostermann - ifgi GI-Forum 30
HYBRID GEO-INFORMATION PROCESSING
OUTLOOK
Time-consuming and resource-intensive
• Manual annotation and ground truthing
• Parameterization of spatio-temporal clustering
Other challenges:
• Dependency on data quality
• Overfitting
• Diversity of contexts and tasks
• Near real-time
Crowdsourced Supervision
32.
EXTRACTING AND COMPARING PLACES USING
GEOSOCIAL MEDIA
Thank you!
f.o.ostermann@utwente.nl
@f_ostermann
nl.linkedin.com/in/foost
[1] Craglia, M., Ostermann, F., & Spinsanti, L. (2012). Digital Earth from vision to
practice: making sense of citizen-generated content. International Journal of Digital
Earth, 5(5), 398–416.