4. +
Introduction
n Twitter has Large volume of timely data
n Social and business value.
n Users casually or implicitly reveal their
locations and short term visiting plans.
4
6. +
Problem Definition
n Given:
n a tweet t published by a user from a
predefined geographical region
n a set of temporal awareness label
n c ∈ { past, present, future, non-temporal }
n Goal:
n extract all POI and temporal awareness label
pairs from t.
n ( POI name, temporal awareness label )
6
13. +
Data analysis and observations
n Data set
n 4.33M tweets from 19,256 unique Singapore-
based users during June 2010
n 222,201 tweets mentions at least one
candidate POI name from by 13,758 unique
users.
13
14. +
Data analysis and observations
n Observation 1
Many users reveal their fine-grained locations
in their tweets.
n 71.4% of all users in the dataset
n 91.3% of the users who had published at
least 20 tweets
14
15. +
Data analysis and observations
n Observation 2
The candidate POI mentions are mostly very
short with one or two words. Many of the
mentions are partial location names.
n 46.7% of the candidate POI names are
unigrams
n 41.6%+ of the candidate POI names are
partial POI names
15
16. +
Data analysis and observations
n Observation 3
About half of the candidate POI mentions
indeed refer to locations and their associated
temporal awareness can be determined.
16
17. +
Data analysis and observations
n Observation 3
About half of the candidate POI mentions
indeed refer to locations and their associated
temporal awareness can be determined.
17
18. +
Data analysis and observations
n Observation 4
Among all POIs that were visited, or to be
visited, about 90% of the visits to these POIs
happen within a day.
18
20. +
Time-aware POI tagger
n Lexical features
n Basic: word itself, lowercase form, all-cap …
n Contextual: context window, preceding,
following
n Off to jp now ! Hope it DOESN’t rain
n Off to jp now ! Hope it DOESN’t rain
n Off to jp now ! Hope it DOESN’t rain
20
21. +
Time-aware POI tagger
n Grammatical Features
n Part-of-speech (POS)
n The time-trend score T(t) of the tweet
n The closest verb/time-trend word
n its time-trend score, distance, left/right-hand side
indicator
21
22. +
Time-aware POI tagger
n a dictionary D of time-trend words,
containing commonly used words with
manually assigned time-trend scores
n D = { “future”:1, “present”:0, “past”:-1, “is”:0,
“was”:-1 … }
n a tweet t: “Soccer fever at mac now!”
n Is w ∈ t matching an entry in D?
n Considering tags of Verbs.
22
24. +
Time-aware POI tagger
n Geographical Features
n Spatial randomness : POI name entorpy
n Location name confidence.
n Multiple candidate POI mention.
24
26. +
Time-aware POI tagger
n Geographical Features
n Spatial randomness. – entorpy/identifiability
n Location name confidence. – count
n Multiple candidate POI mention.
26
27. +
Time-aware POI tagger
n BILOU Schema Features
n Identify Beginning, Inside and Last word of a
multi-word POI name, and Unit-length POI
name, and the words Outside of any POI
names
27
31. +
Experiment: feature analysis
n Lex features are better for POI mention
disambiguation
n Gra features are better for resolving
temporal awareness
31
34. +
Conclusion
n PETAR facilitate the fine-grained location-
based services/marketing and
personalization.
n a POI inventory exploits the crowd wisdom
of Foursquare community to enable fine-
grained location extraction.
n a time-aware POI tagger conducts the
location extraction and temporal
awareness resolution in an effective and
efficient way.
34