Knowledge Enabled Location Prediction of Twitter Users
Master’s Thesis
Revathy Krishnamurthy
Committee
Amit P. Sheth (Advisor)
Krishnaprasad Thirunarayan
Derek Doran
Collaborator
Pavan Kapanipathi
Background knowledge can improve a machine’s ability to interpret text
BUCKEYE STATE
BACKGROUND KNOWLEDGE
Geographic footprint of a Twitter user
WHY IS LOCATION IMPORTANT?
[Figure: a news recommender system (“Recommended for you”) showing local stories such as “Beavercreek preschool to open in 2015” by Sharon D. Boykin and “Ohio’s health exchange to include more competition” by Randy Tucker]
Other applications
• Targeted advertising
• Opinion analysis
• Disaster response
• Location-based services
LOCATION PUBLISHED BY USER
Geo-tagged Tweets | Profile Information
• Less than 4% of tweets contain geo-spatial tags
• The location field in the profile is either empty or contains invalid information such as “Justin Bieber’s heart”
INFERRING LOCATION OF A TWITTER USER
• Network based: Friends, Followers, Followees
• Content based: the user’s tweets, e.g.
  “Just drove around Golden Gate Park two times trying to get in”
  “Cleveland Browns confuse me. When I give up on them, they actually show up to play.”
NETWORK BASED APPROACHES
[Figure: a user connected to Friends, Followers, and Followees]
• Infer a user’s location from the known locations of their friends and followers
CONTENT BASED APPROACHES
“Just drove around Golden Gate Park two times trying to get in”
“Cleveland Browns confuse me. When I give up on them, they actually show up to play.”
• Supervised Approaches
• Probabilistic Models – (Cheng, Caverlee, and Lee, 2010)
• Cascading Topic Models – (Eisenstein, O’Connor, Smith, and Xing, 2010)
• Gaussian Mixture Model – (Chang, Lee, Eltaher, and Lee, 2012)
• Language Models – (Doran, Gokhale, and Dagnino, 2014)
• Ensemble of Statistical and Heuristic Classifiers – (Mahmud, Nichols, and Drews, 2014)
APPROACHES TO LOCATE A TWITTER USER
Content-based approach
The geographic location of a user influences the contents of their tweets
Reference: Cheng, Caverlee, and Lee, 2010
PROBLEM STATEMENT
Predict the location of a Twitter user based on their tweets, by exploiting Wikipedia to create a location-specific knowledge base
CONTRIBUTIONS
• A knowledge-enabled approach to predict the location of Twitter users from the contents of their tweets, without using any training dataset of geo-tagged tweets
• Creation of a location-specific knowledge base extracted from Wikipedia, introducing the concept of Local Entities
• Evaluation of the approach on a publicly available dataset, with 55% accuracy and an Average Error Distance of 429 miles
KNOWLEDGE-BASE ENABLED APPROACH
San Francisco: Golden Gate Bridge, San Francisco 49ers, San Francisco Chronicle …
Entity                    Count
Golden Gate Bridge        4
San Francisco 49ers       2
San Francisco Chronicle   1
Top-k predictions: San Francisco, Oakland, Palo Alto
KNOWLEDGE-BASE ENABLED APPROACH
[Figure: system overview with three components. KNOWLEDGE BASE GENERATOR: Internal Links Extraction yields LocalEntity-1 … LocalEntity-n for each of city-1 … city-k, producing Weighted Local Entities. USER PROFILE GENERATOR: Entity Recognition and Scoring over Annotated Tweets. LOCATION PREDICTION: a Location Predictor outputs Ranked cities for user.]
LOCAL ENTITIES
[Figure: local entities of SAN FRANCISCO, NEW YORK CITY, and HOUSTON]
WIKIPEDIA
• Collaborative encyclopedia
• As of 2014, English Wikipedia had 4.6 million articles, 18 billion page views, and 500 million unique visitors per month
• Category Structure
  • Used for document clustering, tweet classification, personalization systems, etc.
  • At Kno.e.sis, used in applications such as
    • Doozer (Thomas, Mehra, Brooks, and Sheth, 2008)
    • BLOOMS (Jain, Hitzler, Sheth, Verma, and Yeh, 2010)
    • Hierarchical Interest Graph (Kapanipathi, Jain, Venkataramani, and Sheth, 2014)
• Link Structure
  • Used for word sense disambiguation, semantic relatedness between terms, etc.
LINK STRUCTURE OF WIKIPEDIA
“In general, links should be created to relevant
connections to the subject of another article that will
help readers understand the article more fully. This
can include people, events, and topics that already
have an article or that clearly deserve one, so long
as the link is relevant to the article in question.”
Source: http://en.wikipedia.org/wiki/Help:Link#Wikilinks
LOCAL ENTITIES
• We consider the internal links of a city’s Wikipedia page to be the Local Entities of that city
• Although a city’s page does not contain a link to itself, we also use the city itself as a local entity
[Figure: Local Entities of San Francisco]
LOCAL ENTITIES
San Francisco, California – 717 local entities
Fairborn, Ohio – 110 local entities
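The Internal Links Extraction step can be sketched as follows. This is a minimal illustration, assuming the city’s page is available as raw wikitext; the regular expression, the list of skipped namespaces, and the name `extract_local_entities` are assumptions rather than the thesis implementation, and templates, redirects, and disambiguation pages are ignored.

```python
import re

# Hypothetical simplification: matches [[Target]], [[Target|label]], and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Namespaced links (files, categories, etc.) are not articles, so they are skipped.
SKIP_PREFIXES = ("File:", "Image:", "Category:", "Wikipedia:", "Help:", "Template:")


def extract_local_entities(city: str, wikitext: str) -> set[str]:
    """Return the internal-link targets of a city's page plus the city itself."""
    entities = set()
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if not target or target.startswith(SKIP_PREFIXES):
            continue
        # The first letter of a Wikipedia title is case-insensitive; normalize it.
        entities.add(target[0].upper() + target[1:])
    # A city's page does not link to itself, so the city is added explicitly.
    entities.add(city)
    return entities


sample = (
    "The [[Golden Gate Bridge]] spans the [[Golden Gate]]. "
    "The [[San Francisco 49ers|49ers]] play nearby. "
    "[[Category:Cities in California]]"
)
print(sorted(extract_local_entities("San Francisco", sample)))
# ['Golden Gate', 'Golden Gate Bridge', 'San Francisco', 'San Francisco 49ers']
```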
ARE ALL ENTITIES EQUALLY LOCAL?
[Figure: San Francisco Chronicle, San Francisco Examiner, SF Weekly versus MSNBC, CNN, BBC, Al Jazeera America]
LOCALNESS MEASURE OF ENTITIES
Association-based Measure
• Pointwise Mutual Information (PMI): a standard measure of association between two variables
• Assumption: the more local an entity is with respect to a city, the higher the statistical dependence between them
• Computed as:
  $PMI(c, e) = \log_2 \frac{P(c, e)}{P(c)\,P(e)}$
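The slides do not say how $P(c)$, $P(e)$, and $P(c, e)$ are estimated. The sketch below assumes document frequencies over a corpus of Wikipedia pages (how many pages mention c, e, or both); the function name `pmi` and its parameters are illustrative, not taken from the thesis.

```python
import math


def pmi(n_city: int, n_entity: int, n_both: int, n_docs: int) -> float:
    """PMI(c, e) = log2(P(c, e) / (P(c) * P(e))).

    Assumption: probabilities are document frequencies over n_docs pages, where
    n_city and n_entity count pages mentioning c and e, and n_both counts pages
    mentioning both.
    """
    if n_both == 0:
        return float("-inf")  # c and e never co-occur
    p_c = n_city / n_docs
    p_e = n_entity / n_docs
    p_ce = n_both / n_docs
    return math.log2(p_ce / (p_c * p_e))


# Hypothetical counts over a corpus of 1,000 pages.
print(pmi(10, 20, 5, 1000))  # frequent pair: log2(25), about 4.64
print(pmi(1, 1, 1, 1000))    # rare pair seen together once: log2(1000), about 9.97
```

The second call shows one way an unnormalized PMI can favor rarely mentioned entities: a single co-occurrence of two rare items outscores a well-attested pair.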
LOCALNESS MEASURE OF ENTITIES
Graph-based Measure
[Figure: linked Wikipedia pages Boston, Boston Red Sox, and American League, with excerpts: “The Boston Red Sox, a founding member of the American League of Major League Baseball in 1901 ...”; “The Boston Red Sox are an American professional baseball team based in Boston, Massachusetts ... They are members of American League (AL).”]
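LOCALNESS MEASURE OF ENTITIES
Graph-based Measure
[Figure: Directed Graph of Local Entities of Boston]
• Betweenness Centrality (BC): measures the importance of a node relative to the rest of the nodes in the graph
• A high BC score indicates that a vertex lies on a considerable fraction of the shortest paths connecting other vertices
• Computed as:
  $C_B(c, e) = \sum_{e_i \neq e \neq e_j} \frac{\sigma_{e_i e_j}(e)}{\sigma_{e_i e_j}}$
  where $\sigma_{e_i e_j}$ is the number of shortest paths from $e_i$ to $e_j$ in the graph of local entities of city $c$, and $\sigma_{e_i e_j}(e)$ is the number of those paths that pass through $e$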
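A direct, unoptimized transcription of the betweenness formula for an unweighted directed graph, written in plain Python so the correspondence with $\sigma_{e_i e_j}(e)$ and $\sigma_{e_i e_j}$ stays visible. The graph representation (a dict of adjacency lists) and the edges in the example are hypothetical; the scores are unnormalized, so they are not on the same scale as the values shown for Boston. For real graphs, Brandes’ algorithm (as used by `networkx.betweenness_centrality`) is far faster than this O(n³) version.

```python
from collections import deque


def bfs_counts(graph: dict[str, list[str]], source: str) -> tuple[dict[str, int], dict[str, int]]:
    """Shortest-path distances and shortest-path counts from `source` (unweighted, directed)."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, []):
            if w not in dist:  # first visit: w is one hop farther than u
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # edge u -> w lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph: dict[str, list[str]]) -> dict[str, float]:
    """Unnormalized C_B(e) = sum over e_i != e != e_j of sigma_{e_i e_j}(e) / sigma_{e_i e_j}."""
    nodes = set(graph) | {w for targets in graph.values() for w in targets}
    paths = {v: bfs_counts(graph, v) for v in nodes}
    scores = {v: 0.0 for v in nodes}
    for s in nodes:
        dist_s, sigma_s = paths[s]
        for t in dist_s:  # only pairs where t is reachable from s
            if t == s:
                continue
            for v in dist_s:
                if v in (s, t):
                    continue
                dist_v, sigma_v = paths[v]
                # v lies on a shortest s -> t path iff d(s, v) + d(v, t) == d(s, t);
                # the number of such paths is sigma(s, v) * sigma(v, t).
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return scores


# Hypothetical link graph among a few local entities of Boston (edges invented for illustration).
boston = {
    "Boston": ["Boston Red Sox", "Fenway Park"],
    "Boston Red Sox": ["American League"],
    "Fenway Park": ["American League"],
    "American League": [],
}
print(betweenness(boston))
# Boston Red Sox and Fenway Park each get 0.5; Boston and American League get 0.0
```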
LOCALNESS MEASURE OF ENTITIES
[Figure: Directed Graph of Local Entities of Boston]
Betweenness centrality scores:
Boston Red Sox: 0.004540
American League: 0.000046
LOCALNESS MEASURE OF ENTITIES
Semantic Overlap Measure
[Figure: three groups of linked Wikipedia entities. (1) Alcatraz Island, Treasure Island, Alameda Island, Financial District, Market Street, Fisherman’s Wharf, San Francisco 49ers, Cow Hollow, Silicon Valley, South Beach, …. (2) Suspension Bridge, Hyde Street Pier, Irving Morrow, Angelo Rossi, Art Deco, Charles Alton Ellis, Bethlehem Steel, Half Way to Hell Club, International Orange, …. (3) San Francisco Bay, Golden Gate, San Francisco Chronicle, U.S. Route 101, Marin County, Sausalito, Bay Area, …]
LOCALNESS MEASURE OF ENTITIES
Semantic Overlap Measure
• Measures the relatedness between concepts, with the intuition that related concepts are connected to similar entities
• Jaccard Index: the overlap between two sets
  $jaccard(c, e) = \frac{|O(c) \cap O(e)|}{|O(c) \cup O(e)|}$
  where $O(x)$ is the set of entities linked from the Wikipedia page of $x$
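A direct transcription of the Jaccard Index, assuming $O(x)$ is represented as the set of entity titles linked from the Wikipedia page of $x$. The entity sets below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def jaccard(o_c: set[str], o_e: set[str]) -> float:
    """jaccard(c, e) = |O(c) ∩ O(e)| / |O(c) ∪ O(e)|; defined as 0.0 when both sets are empty."""
    union = o_c | o_e
    return len(o_c & o_e) / len(union) if union else 0.0


# Hypothetical link sets for a city page and one of its local entities.
o_city = {"Golden Gate", "Market Street", "Alcatraz Island", "Marin County", "Nob Hill"}
o_bridge = {"Golden Gate", "Marin County", "Art Deco", "Irving Morrow"}
print(jaccard(o_city, o_bridge))  # 2 shared entities / 7 in the union
```

Because the union appears in the denominator, every entity on the city’s page that the local entity’s page lacks lowers the score, which matters when the city page is much larger.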
LOCALNESS MEASURE OF ENTITIES
Semantic Overlap Measure
• Tversky Index: an asymmetric similarity measure between two sets
  $ti(c, e) = \frac{|O(c) \cap O(e)|}{|O(c) \cap O(e)| + \alpha\,|O(c) \setminus O(e)| + \beta\,|O(e) \setminus O(c)|}$
• We choose α = 0 and β = 1
• The local entity is penalized for every entity on its page that is not found on the page of the city
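With α = 0 and β = 1 the denominator is $|O(c) \cap O(e)| + |O(e) \setminus O(c)| = |O(e)|$, so the index reduces to $ti(c, e) = \frac{|O(c) \cap O(e)|}{|O(e)|}$: the fraction of the local entity’s own links that also appear on the city’s page. The sketch below assumes each page is represented by the set of entities it links to and compares against a Jaccard one-liner. The sets are hypothetical, built only to mimic the shape of the Eureka Valley versus California contrast (a small page versus a large one), not the real pages.

```python
def tversky(o_c: set[str], o_e: set[str], alpha: float = 0.0, beta: float = 1.0) -> float:
    """ti(c, e) = |O(c) ∩ O(e)| / (|O(c) ∩ O(e)| + alpha*|O(c) - O(e)| + beta*|O(e) - O(c)|)."""
    common = len(o_c & o_e)
    denom = common + alpha * len(o_c - o_e) + beta * len(o_e - o_c)
    return common / denom if denom else 0.0


def jaccard(o_c: set[str], o_e: set[str]) -> float:
    """Symmetric measure for comparison: |O(c) ∩ O(e)| / |O(c) ∪ O(e)|."""
    union = o_c | o_e
    return len(o_c & o_e) / len(union) if union else 0.0


# Hypothetical link sets: a city page, a small neighborhood-like page, and a large state-like page.
city = {"Castro District", "Twin Peaks", "Market Street", "Golden Gate Park", "Muni", "Mission District"}
small = {"Castro District", "Twin Peaks", "Noe Valley"}
large = {"Castro District", "Market Street", "Golden Gate Park", "Muni",
         "Los Angeles", "Sacramento", "Sierra Nevada", "San Diego", "Fresno", "Lake Tahoe"}

print(jaccard(city, small), jaccard(city, large))  # 2/7 < 4/12: Jaccard prefers the large page
print(tversky(city, small), tversky(city, large))  # 2/3 > 4/10: Tversky prefers the small page
```

Setting α = 0 means links on the city page that the local entity lacks cost nothing, so a small page is no longer punished for the city page being large.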
KNOWLEDGE-BASE OF LOCAL ENTITIES
[Figure: Local Entities of San Francisco (Localness measure: Tversky Index)]
CREATION OF USER PROFILE
Step 1: Entity Linking
Just drove around Golden Gate Park trying to get in.
We use Zemanta for Entity Linking
Step 2: Entity Scoring
Entity                    Count
Golden Gate Bridge        4
San Francisco 49ers       2
San Francisco Chronicle   1
User Profile for user $u$ defined as:
$P(u) = \{(e, s) \mid e \in W,\ s \in \mathbb{R}\}$
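The slides do not spell out how the score $s$ is computed; the Entity Count example suggests a mention count. A minimal sketch under that assumption, taking tweets that have already been annotated by an entity linker (Zemanta is not called here; `build_user_profile` and the input format are hypothetical):

```python
from collections import Counter


def build_user_profile(annotated_tweets: list[list[str]]) -> dict[str, float]:
    """P(u) = {(e, s)}: map each linked Wikipedia entity e to a score s.

    Assumption: s is the raw mention count across the user's tweets; the
    entity-linking step itself is not modeled here.
    """
    counts = Counter(entity for tweet in annotated_tweets for entity in tweet)
    return {entity: float(count) for entity, count in counts.items()}


# Hypothetical entity-linker output, one list of linked entities per tweet.
tweets = [
    ["Golden Gate Bridge"],
    ["Golden Gate Bridge", "San Francisco 49ers"],
    ["Golden Gate Bridge", "San Francisco Chronicle"],
    ["Golden Gate Bridge", "San Francisco 49ers"],
]
profile = build_user_profile(tweets)
print(profile)
# {'Golden Gate Bridge': 4.0, 'San Francisco 49ers': 2.0, 'San Francisco Chronicle': 1.0}
```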
LOCATION PREDICTION
• Compute an aggregate score for each city whose local entities are found in a user’s tweets:
  $locScore(c, u) = \sum_{j=1}^{|I_{cu}|} locl(c, e_j) \times s_{e_j}$
  where $I_{cu}$ is the set of local entities of city $c$ found in the tweets of user $u$, $e_j \in I_{cu}$, $locl(c, e_j)$ is the localness score of entity $e_j$ with respect to city $c$, and $s_{e_j}$ is the score of $e_j$ in the user profile
• Rank cities by $locScore(c, u)$ in descending order to predict the top-k locations of a user
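A minimal sketch of locScore and top-k ranking, assuming the knowledge base maps each city to a dictionary of {local entity: localness} and the user profile maps entities to their scores $s$. The function names and the toy localness values are hypothetical (chosen as exact binary fractions so the sums print cleanly), not values from the thesis.

```python
def loc_score(localness: dict[str, float], profile: dict[str, float]) -> float:
    """locScore(c, u) = sum over e_j in I_cu of locl(c, e_j) * s_{e_j}."""
    # I_cu: the local entities of city c that also occur in the user's profile.
    return sum(locl * profile[e] for e, locl in localness.items() if e in profile)


def predict_top_k(
    kb: dict[str, dict[str, float]], profile: dict[str, float], k: int = 3
) -> list[tuple[str, float]]:
    """Score every city with at least one matched local entity; return the k best, highest first."""
    scores = {
        city: loc_score(localness, profile)
        for city, localness in kb.items()
        if any(e in profile for e in localness)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy knowledge base {city: {local entity: localness}} and user profile {entity: s}; values are hypothetical.
kb = {
    "San Francisco": {"Nob Hill": 0.5, "Golden Gate Park": 0.25, "San Francisco Bay Area": 0.125},
    "Oakland": {"Fox Oakland Theatre": 0.375, "San Francisco Bay Area": 0.125},
    "Detroit": {"Detroit Red Wings": 0.25},
}
profile = {"Nob Hill": 3.0, "Golden Gate Park": 1.0, "San Francisco Bay Area": 1.0, "Fox Oakland Theatre": 2.0}
print(predict_top_k(kb, profile))  # [('San Francisco', 1.875), ('Oakland', 0.875)]
```

Detroit is not scored at all because none of its local entities appear in the profile, matching “each city whose local entities are found in a user’s tweets”.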
LOCATION PREDICTION
Worked example: entities in the user profile (with counts), their localness in the knowledge base, and the resulting score per city
San Francisco: score 14.5531
  User profile: San Francisco International Airport (6), San Francisco (4), Nob Hill (3), San Francisco Museum of Modern Art (1), Beach Blanket Babylon (2), San Francisco Municipal Railway (4), Golden Gate Park (1), San Francisco Bay Area (1), SF Weekly (1)
  Knowledge base: Nob Hill 0.48214, SF Weekly 0.1875, Golden Gate Park 0.16783, San Francisco International Airport 0.06818, …
Oakland, CA: score 10.7584
  User profile: Fox Oakland Theatre (2), Berkley (1), Green Day (1), Oakland (9), San Francisco Bay Area (1)
  Knowledge base: Fox Oakland Theatre 0.09375, SF Bay Area 0.12972, Green Day 0.02066, …
Detroit, MI: score 8.0600
  User profile: The White Stripes (1), Detroit Metropolitan Wayne County Airport (1), Detroit Historical Museum (1), Detroit Red Wings (4), General Motors (1)
  Knowledge base: Detroit Historical Museum 0.4838, General Motors 0.05538, Detroit Red Wings 0.0232, …
Palo Alto, CA: score 6.9175
  User profile: Palo Alto (6), SAP AG (8), Facebook (3), PARC (company) (2), Dell (1), Google (1)
  Knowledge base: PARC (company) 0.03726, Google 0.04678, Facebook 0.05810, …
Location prediction (ranked): San Francisco; Oakland, CA; Detroit, MI; Palo Alto, CA
IMPLEMENTATION
Knowledge base
• All cities in the United States with a population above 5,000, as published in the 2012 census estimates
• 4,661 cities and 500,714 local entities
Baseline
• Considers all local entities to be equally local to the city
• Predicts location based only on the frequency of entities
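Under the baseline every local entity is weighted equally (effectively a localness of 1), so a city’s score reduces to the summed frequency of its matched local entities. A sketch under that reading; the function name is hypothetical, the counts are loosely borrowed from the worked example’s profile, and the knowledge-base sets are simplified and hypothetical.

```python
def baseline_top_k(
    kb_entities: dict[str, set[str]], profile: dict[str, float], k: int = 3
) -> list[tuple[str, float]]:
    """Baseline: every local entity counts as equally local (weight 1), so a city's
    score is the summed frequency of its local entities in the user's profile."""
    scores = {}
    for city, entities in kb_entities.items():
        matched = [e for e in entities if e in profile]
        if matched:
            scores[city] = sum(profile[e] for e in matched)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


kb_entities = {
    "San Francisco": {"San Francisco", "Nob Hill", "Golden Gate Park", "San Francisco Bay Area"},
    "Oakland": {"Oakland", "Fox Oakland Theatre", "San Francisco Bay Area"},
}
profile = {"San Francisco": 4.0, "Nob Hill": 3.0, "San Francisco Bay Area": 1.0,
           "Oakland": 9.0, "Fox Oakland Theatre": 2.0}
print(baseline_top_k(kb_entities, profile))  # [('Oakland', 12.0), ('San Francisco', 8.0)]
```

In this toy input raw frequency alone ranks Oakland first; the localness measures exist to reweight such counts.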
EVALUATION
Test Dataset
• Published by Cheng, Caverlee, and Lee, 2010
• Contains 5,119 active users from the continental United States, with approximately 1,000 tweets per user
• Each user’s location is listed as a latitude and longitude pair
EVALUATION
Evaluation Metrics
• Error Distance: the distance between the actual location of the user and the estimated location
  $ErrorDist(u) = distance(loc_{act}(u), loc_{est}(u))$
• Average Error Distance: the average error distance of all users in the test dataset
  $AED(U) = \frac{\sum_{u \in U} ErrorDist(u)}{|U|}$
• Accuracy: the percentage of users predicted within 100 miles of their actual location
  $ACC(U) = \frac{|\{u \mid u \in U \wedge ErrorDist(u) \le 100\}|}{|U|}$
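The slides do not name the distance function. The sketch below assumes great-circle (haversine) distance with a mean Earth radius of 3,958.8 miles, and reports ACC as a fraction (multiply by 100 for the percentages used in the tables). The function names and the sample coordinates are hypothetical.

```python
import math

# Assumption: mean Earth radius in miles; the slides do not state which distance formula was used.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (latitude, longitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the asin argument slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def evaluate(
    pairs: list[tuple[tuple[float, float], tuple[float, float]]], threshold: float = 100.0
) -> tuple[float, float]:
    """pairs holds (actual, estimated) coordinates per user; returns (AED in miles, ACC as a fraction)."""
    errors = [haversine_miles(*actual, *estimated) for actual, estimated in pairs]
    aed = sum(errors) / len(errors)
    acc = sum(1 for d in errors if d <= threshold) / len(errors)
    return aed, acc


# Hypothetical users: one predicted exactly, one predicted two degrees of latitude (about 138 miles) away.
users = [((37.77, -122.42), (37.77, -122.42)), ((37.77, -122.42), (39.77, -122.42))]
aed, acc = evaluate(users)
print(round(aed, 1), acc)  # AED is about 69.1 miles, ACC = 0.5
```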
EVALUATION
Location Prediction Results
Localness Measure   ACC (%)   AED (miles)   ACC@2   ACC@3   ACC@5
Baseline            25.21     632.56        38.01   42.78   47.95
PMI                 38.48     599.40        49.85   56.06   64.15
BC                  47.91     478.14        57.39   62.18   66.98
Jaccard Index       53.21     433.62        67.41   73.56   78.84
Tversky Index       54.48     429.00        68.72   74.68   79.99
EVALUATION
Localness measure: PMI
• PMI is not normalized and is therefore sensitive to how often local entities occur in the Wikipedia corpus
• E.g., the PMI scores of the local entities of Glen Rock, New Jersey are higher than those of San Francisco
EVALUATION
Localness measure: BC
• Does a good job of assigning low scores to common entities, e.g., community college, National Weather Service, startup company
• Fails for entities that have some relevance to the city but no distinguishing factor, e.g., IBM with respect to Endicott, New York
LOCALNESS MEASURE OF ENTITIES
EVALUATION
Localness measure: Jaccard Index
• Underperforms for local entities whose pages contain fewer entities than the city’s page
• E.g., Eureka Valley and California with respect to San Francisco
EVALUATION
[Figure: overlap of the linked entities of California and of Eureka Valley with those of San Francisco; values shown: 0.03005 and 0.07092]
EVALUATION
Localness measure: Tversky Index
• Best-performing localness measure
• Overcomes the disadvantage of the Jaccard Index
• For example, it assigns a higher localness to Eureka Valley (0.7096) than to California (0.1270) with respect to San Francisco
EVALUATION
[Figure: Top-k Accuracy]
[Figure: Top-k Average Error Distance]
Distribution of users
[Figure: distribution of all users in the dataset; distribution of accurately predicted users]
EVALUATION
Comparison with Existing Approaches
Method                               ACC (%)   AED (miles)
Cheng, Caverlee, and Lee, 2010       51.00     535.56
Chang, Lee, Eltaher, and Lee, 2012   49.9      509.3
Wikipedia-based approach             54.48     429.00
EVALUATION
[Figure: Impact of Local Entities]
EVALUATION
Top 100 Cities
• 2,172 users in the dataset are from the 100 most populous cities of the United States
• 60% of these users are predicted within 100 miles of their actual location
• 54% of these users are predicted exactly at the city level
CONCLUSION
• Presented an approach based on crowdsourced knowledge that predicts the location of a user without requiring geo-tagged tweets as a training dataset
• Introduced the concept of Local Entities and preprocessed the Wikipedia hyperlink graph to extract the local entities of each city
• Investigated relatedness measures to establish the degree of association between a local entity and a city
• Evaluated the proposed approach on a benchmark dataset published by Cheng et al.: for 5,119 users, it predicts the location of 55% of users within 100 miles, with an average error distance of 429 miles
FUTURE WORK
• Compute a confidence score for each prediction based on the top-k cities and the count of local entities in the tweets
• Investigate other localness measures to score local entities
• Consider the semantic types and categories of local entities, and weight their contributions by type
• Explore other knowledge bases such as Wikitravel and GeoNames
ACKNOWLEDGEMENTS
Amit P. Sheth, Krishnaprasad Thirunarayan, Derek Doran
THANK YOU!

Codes and Conventions of Artists' WebsitesCodes and Conventions of Artists' Websites
Codes and Conventions of Artists' WebsitesLukeNash7
 
Website research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazineWebsite research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazinesamuelcoulson30
 
MODERN PODCASTING ,CREATING DREAMS TODAY.
MODERN PODCASTING ,CREATING DREAMS TODAY.MODERN PODCASTING ,CREATING DREAMS TODAY.
MODERN PODCASTING ,CREATING DREAMS TODAY.AFFFILIATE
 
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779Night 7k Call Girls Noida Sector 121 Call Me: 8448380779
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779Delhi Call girls
 
Social media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketingSocial media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketingSheikhSaifAli1
 
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service 👖
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service  👖CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service  👖
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service 👖anilsa9823
 
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...baharayali
 
Interpreting the brief for the media IDY
Interpreting the brief for the media IDYInterpreting the brief for the media IDY
Interpreting the brief for the media IDYgalaxypingy
 

Recently uploaded (20)

Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call MeCall^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
Call^ Girls Delhi Independent girls Chanakyapuri 9711199012 Call Me
 
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
Top Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash PaymentTop Call Girls In Telibagh ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment
Top Call Girls In Telibagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
 
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
GREAT OPORTUNITY Russian Call Girls Kirti Nagar 9711199012 Independent Escort...
 
Night 7k Call Girls Pari Chowk Escorts Call Me: 8448380779
Night 7k Call Girls Pari Chowk Escorts Call Me: 8448380779Night 7k Call Girls Pari Chowk Escorts Call Me: 8448380779
Night 7k Call Girls Pari Chowk Escorts Call Me: 8448380779
 
Factors-on-Authenticity-and-Validity-of-Evidences-and-Information.pptx
Factors-on-Authenticity-and-Validity-of-Evidences-and-Information.pptxFactors-on-Authenticity-and-Validity-of-Evidences-and-Information.pptx
Factors-on-Authenticity-and-Validity-of-Evidences-and-Information.pptx
 
O9654467111 Call Girls In Dwarka Women Seeking Men
O9654467111 Call Girls In Dwarka Women Seeking MenO9654467111 Call Girls In Dwarka Women Seeking Men
O9654467111 Call Girls In Dwarka Women Seeking Men
 
Spotify AI DJ Deck - The Agency at University of Florida
Spotify AI DJ Deck - The Agency at University of FloridaSpotify AI DJ Deck - The Agency at University of Florida
Spotify AI DJ Deck - The Agency at University of Florida
 
Top Call Girls In Charbagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
Top Call Girls In Charbagh ( Lucknow  ) 🔝 8923113531 🔝  Cash PaymentTop Call Girls In Charbagh ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment
Top Call Girls In Charbagh ( Lucknow ) 🔝 8923113531 🔝 Cash Payment
 
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptxDickinsonSlides teeeeeeeeeeessssssssssst.pptx
DickinsonSlides teeeeeeeeeeessssssssssst.pptx
 
SELECTING A SOCIAL MEDIA MARKETING COMPANY
SELECTING A SOCIAL MEDIA MARKETING COMPANYSELECTING A SOCIAL MEDIA MARKETING COMPANY
SELECTING A SOCIAL MEDIA MARKETING COMPANY
 
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts ServiceCall Girls In Patel Nagar Delhi 9654467111 Escorts Service
Call Girls In Patel Nagar Delhi 9654467111 Escorts Service
 
Improve Your Brand in Waco with a Professional Social Media Marketing Company
Improve Your Brand in Waco with a Professional Social Media Marketing CompanyImprove Your Brand in Waco with a Professional Social Media Marketing Company
Improve Your Brand in Waco with a Professional Social Media Marketing Company
 
Codes and Conventions of Artists' Websites
Codes and Conventions of Artists' WebsitesCodes and Conventions of Artists' Websites
Codes and Conventions of Artists' Websites
 
Website research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazineWebsite research Powerpoint for Bauer magazine
Website research Powerpoint for Bauer magazine
 
MODERN PODCASTING ,CREATING DREAMS TODAY.
MODERN PODCASTING ,CREATING DREAMS TODAY.MODERN PODCASTING ,CREATING DREAMS TODAY.
MODERN PODCASTING ,CREATING DREAMS TODAY.
 
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779Night 7k Call Girls Noida Sector 121 Call Me: 8448380779
Night 7k Call Girls Noida Sector 121 Call Me: 8448380779
 
Social media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketingSocial media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketing
 
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service 👖
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service  👖CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service  👖
CALL ON ➥8923113531 🔝Call Girls Takrohi Lucknow best Female service 👖
 
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
Top Astrologer, Kala ilam specialist in USA and Bangali Amil baba in Saudi Ar...
 
Interpreting the brief for the media IDY
Interpreting the brief for the media IDYInterpreting the brief for the media IDY
Interpreting the brief for the media IDY
 

Location prediction

  • 1. Knowledge Enabled Location Prediction of Twitter Users. Master’s Thesis, Revathy Krishnamurthy. Committee: Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan, Derek Doran. Collaborator: Pavan Kapanipathi.
  • 2. Background knowledge can improve a machine’s ability to interpret text (example: “BUCKEYE STATE”).
  • 4. Geographic footprint of a Twitter user
  • 5. WHY IS LOCATION IMPORTANT? News recommender systems, e.g., “Recommended for you”: “Beavercreek preschool to open in 2015” by Sharon D. Boykin (“A $5.1 million preschool in Beavercreek city Schools district will help accommodate a growing of student population and reduce overcrowding, according to school officials.”) and “Ohio’s health exchange to include more competition” by Randy Tucker (“It was just a year ago that the insurance industry fretted over potential loses from the new insurance market created by Affordable Care Act.”). Other applications: targeted advertising, opinion analysis, disaster response, location-based services.
  • 6. LOCATION PUBLISHED BY USER: geo-tagged tweets and profile information.
  • 7. LOCATION PUBLISHED BY USER: geo-tagged tweets and profile information. • Fewer than 4% of tweets contain geospatial tags. • The location field in the profile is either empty or contains invalid information such as “Justin Bieber’s heart”.
  • 8. INFERRING LOCATION OF A TWITTER USER. Network-based: friends, followers, and followees. Content-based: tweets such as “Just drove around Golden Gate Park two times trying to get in” and “Cleveland Browns confuse me. When I give up on them, they actually show up to play.”
  • 9. NETWORK BASED APPROACHES: friends, followers, and followees. These approaches depend on the friends and followers of a user whose locations are known.
  • 10. CONTENT BASED APPROACHES. Example tweets: “Just drove around Golden Gate Park two times trying to get in” and “Cleveland Browns confuse me. When I give up on them, they actually show up to play.” • Supervised approaches: • Probabilistic models (Cheng, Caverlee, and Lee, 2010) • Cascading topic models (Eisenstein, O’Connor, Smith, and Xing, 2010) • Gaussian mixture model (Chang, Lee, Eltaher, and Lee, 2012) • Language models (Doran, Gokhale, and Dagnino, 2014) • Ensemble of statistical and heuristic classifiers (Mahmud, Nichols, and Drews, 2014). The geographic location of a user influences the contents of their tweets.
  • 11. APPROACHES TO LOCATE A TWITTER USER: content-based approach. Reference: Cheng, Caverlee, and Lee, 2010
  • 12. APPROACHES TO LOCATE A TWITTER USER: content-based approach. Reference: Cheng, Caverlee, and Lee, 2010
  • 13. PROBLEM STATEMENT: Predict the location of a Twitter user based on their tweets by exploiting Wikipedia to create a location-specific knowledge base.
  • 14. CONTRIBUTIONS • A knowledge-enabled approach to predict the location of Twitter users from the contents of their tweets, without using any training dataset of geo-tagged tweets • Creation of a location-specific knowledge base extracted from Wikipedia, introducing the concept of Local Entities • Evaluation of the approach on a publicly available dataset, with 55% accuracy and an Average Error Distance of 429 miles
  • 15. KNOWLEDGE-BASE ENABLED APPROACH. Local entities of San Francisco: Golden Gate Bridge, San Francisco 49ers, San Francisco Chronicle, … Entity counts in a user’s tweets: Golden Gate Bridge 4, San Francisco 49ers 2, San Francisco Chronicle 1. Top-k predictions: San Francisco, Oakland, Palo Alto.
  • 16. KNOWLEDGE-BASE ENABLED APPROACH. Knowledge Base Generator: internal links extraction, giving local entities (LocalEntity-1, LocalEntity-2, …, LocalEntity-n) for each city (city-1, city-2, …, city-k), then weighted local entities. User Profile Generator: entity recognition and scoring, giving annotated tweets. Location Prediction: location predictor, giving ranked cities for the user.
  • 17. LOCAL ENTITIES: San Francisco, New York City, Houston
  • 18. WIKIPEDIA • Collaborative encyclopedia • As of 2014, English Wikipedia has 4.6 million articles, 18 billion page views, and 500 million unique visitors per month. • Category structure: used for document clustering, tweet classification, personalization systems, etc. At Kno.e.sis it is used in applications such as Doozer (Thomas, Mehra, Brooks, and Sheth, 2008), BLOOMS (Jain, Hitzler, Sheth, Verma, and Yeh, 2010), and Hierarchical Interest Graph (Kapanipathi, Jain, Venkataramani, and Sheth, 2014). • Link structure: used for word sense disambiguation, semantic relatedness between terms, etc.
  • 19. LINK STRUCTURE OF WIKIPEDIA
  • 20. LINK STRUCTURE OF WIKIPEDIA
  • 21. LINK STRUCTURE OF WIKIPEDIA: “In general, links should be created to relevant connections to the subject of another article that will help readers understand the article more fully. This can include people, events, and topics that already have an article or that clearly deserve one, so long as the link is relevant to the article in question.” Source: http://en.wikipedia.org/wiki/Help:Link#Wikilinks
  • 22. LOCAL ENTITIES • We treat the internal links on a city’s Wikipedia page as the Local Entities of that city (figure: Local Entities of San Francisco). • Although a city’s page does not link to itself, we also include the city itself as a local entity.
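The definition of local entities on slide 22 can be sketched as a link extractor over a city's raw wikitext. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the page is available as wikitext using the standard `[[Target|label]]` link syntax, and the function `extract_local_entities` and the list of skipped namespaces are hypothetical. Links produced by templates are not visible in raw wikitext, so a fuller version could read Wikipedia's link tables instead.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]]; group 1 is the target title.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Hypothetical filter: links into these namespaces are not articles, so they are not entities.
NON_ARTICLE_PREFIXES = ("file:", "image:", "category:", "template:", "help:", "wikipedia:")


def extract_local_entities(city_title: str, wikitext: str) -> set[str]:
    """Collect the article titles linked from a city's wikitext, plus the city itself."""
    entities = {city_title}  # a page never links to itself, so add the city explicitly
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip().replace("_", " ")
        if not target or target.lower().startswith(NON_ARTICLE_PREFIXES):
            continue
        # MediaWiki treats the first letter of a title as case-insensitive.
        entities.add(target[0].upper() + target[1:])
    return entities


page = "The [[Golden Gate Bridge]] and the [[San Francisco 49ers|49ers]] ... [[Category:Cities in California]]"
print(sorted(extract_local_entities("San Francisco", page)))
# ['Golden Gate Bridge', 'San Francisco', 'San Francisco 49ers']
```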
  • 23. LOCAL ENTITIES: San Francisco, California has 717 local entities; Fairborn, Ohio has 110.
  • 24. ARE ALL ENTITIES EQUALLY LOCAL?
  • 25. ARE ALL ENTITIES EQUALLY LOCAL? Example entities: San Francisco Chronicle, San Francisco Examiner, and SF Weekly versus MSNBC, CNN, BBC, and Al Jazeera America.
  • 26. LOCALNESS MEASURE OF ENTITIES: Association-based Measure • Pointwise Mutual Information (PMI) is a standard measure of association between two variables. • The assumption is that the more local an entity is to a city, the higher the statistical dependence between them. • Computed as: $PMI(c, e) = \log_2 \frac{P(c, e)}{P(c)\,P(e)}$
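The slides give the PMI formula but not how $P(c)$, $P(e)$ and $P(c, e)$ are estimated; the editor's notes only say the measure uses occurrences in a large corpus. The sketch below assumes document frequencies: the fraction of `n_docs` documents that mention the city, the entity, or both. The function name `pmi` and the counts in the example are hypothetical.

```python
import math


def pmi(n_city_entity: int, n_city: int, n_entity: int, n_docs: int) -> float:
    """PMI(c, e) = log2(P(c, e) / (P(c) * P(e))), with probabilities estimated
    as document frequencies out of n_docs (an assumed estimation scheme)."""
    if n_city_entity == 0:
        return float("-inf")  # log2(0): the city and entity never co-occur
    p_ce = n_city_entity / n_docs
    p_c = n_city / n_docs
    p_e = n_entity / n_docs
    return math.log2(p_ce / (p_c * p_e))


# Hypothetical counts in a 1,000-document corpus:
print(pmi(10, 20, 50, 1000))  # co-occur 10x more often than chance: log2(10) ≈ 3.32
print(pmi(2, 20, 100, 1000))  # ≈ 0: co-occurrence exactly at chance level
```

Because PMI is unnormalized, an entity that appears in only a few documents can score very high as soon as it co-occurs with the city, which is consistent with the sensitivity to occurrence counts noted on slide 46.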
  • 27. LOCALNESS MEASURE OF ENTITIES: Graph-based Measure. Example with the entities Boston, Boston Red Sox, and American League: “The Boston Red Sox, a founding member of the American League of Major League Baseball in 1901..” and “The Boston Red Sox are an American professional baseball team based in Boston, Massachusetts ... They are members of American League (AL).”
  • 28. LOCALNESS MEASURE OF ENTITIES: directed graph of the local entities of Boston
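  • 29. LOCALNESS MEASURE OF ENTITIES: Graph-based Measure • Betweenness Centrality (BC) measures the importance of a node relative to the rest of the nodes in the graph. • A high BC score indicates that a vertex lies on a considerable fraction of the shortest paths connecting other vertices. • Computed in the directed graph of the city’s local entities as: $C_B(c, e) = \sum_{e_i \neq e \neq e_j} \frac{\sigma_{e_i e_j}(e)}{\sigma_{e_i e_j}}$, where $\sigma_{e_i e_j}$ is the number of shortest paths from $e_i$ to $e_j$ and $\sigma_{e_i e_j}(e)$ is the number of those paths that pass through $e$.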
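The slides do not say how BC was computed. A standard choice for unweighted graphs is Brandes' algorithm (2001), sketched below for a directed graph given as adjacency lists (one BFS per source, then dependency accumulation). The thesis's reported values (e.g., 0.004540 for Boston Red Sox) may or may not be normalized, so normalization is an option here; the scale $1/((n-1)(n-2))$ is the usual one for directed graphs. The toy graph in the example is hypothetical.

```python
from collections import deque


def betweenness_centrality(graph: dict[str, list[str]], normalized: bool = False) -> dict[str, float]:
    """Brandes' algorithm: C_B(e) = sum over s != e != t of sigma_st(e) / sigma_st."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    bc = {v: 0.0 for v in nodes}

    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest vertices first.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]

    n = len(nodes)
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))  # ordered pairs (s, t) that exclude the vertex itself
        bc = {v: score * scale for v, score in bc.items()}
    return bc


# Hypothetical link graph among a few local entities of Boston:
links = {
    "Boston": ["Boston Red Sox"],
    "Boston Red Sox": ["American League", "Boston"],
    "Fenway Park": ["Boston Red Sox"],
}
print(betweenness_centrality(links))  # Boston Red Sox: 3.0; all other vertices: 0.0
```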
  • 30. LOCALNESS MEASURE OF ENTITIES: directed graph of the local entities of Boston. Betweenness centrality: Boston Red Sox 0.004540; American League 0.000046.
  • 31. LOCALNESS MEASURE OF ENTITIES: Semantic Overlap Measure. Example link sets from two Wikipedia pages, in three groups: (1) Alcatraz Island, Treasure Island, Alameda Island, Financial District, Market Street, Fisherman’s Wharf, San Francisco 49ers, Cow Hollow, Silicon Valley, South Beach, …; (2) Suspension Bridge, Hyde Street Pier, Irving Morrow, Angelo Rossi, Art Deco, Charles Alton Ellis, Bethlehem Steel, Half Way to Hell Club, International Orange, …; (3) San Francisco Bay, Golden Gate, San Francisco Chronicle, U.S. Route 101, Marin County, Sausalito, Bay Area, …
  • 32. LOCALNESS MEASURE OF ENTITIES: Semantic Overlap Measure • Measures the relatedness between concepts, with the intuition that related concepts are connected to similar entities. • Jaccard Index: the overlap between two sets, $jaccard(c, e) = \frac{|O(c) \cap O(e)|}{|O(c) \cup O(e)|}$, where $O(x)$ denotes the set of entities linked from the Wikipedia page of $x$.
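A minimal sketch of the Jaccard localness score, assuming $O(c)$ and $O(e)$ are already available as sets of linked entity titles. The function name and the link sets in the example are hypothetical; the example is chosen to show how a small page whose links almost all appear on a large city page still receives a low score.

```python
def jaccard(city_links: set[str], entity_links: set[str]) -> float:
    """jaccard(c, e) = |O(c) ∩ O(e)| / |O(c) ∪ O(e)|, where O(x) is the set of
    entities linked from the Wikipedia page of x."""
    union = city_links | entity_links
    if not union:
        return 0.0  # two pages without links: define the overlap as 0
    return len(city_links & entity_links) / len(union)


# Hypothetical link sets: a large city page and a small neighborhood page.
city = {f"entity_{i}" for i in range(100)}
neighborhood = {"entity_0", "entity_1", "entity_2", "entity_3", "Castro Street"}
print(jaccard(city, neighborhood))  # 4 / 101 ≈ 0.0396
```

Because the union contains the whole city page, the score is dominated by the city's size; this matches the weakness reported on slide 49 for local entities that have fewer entities than the city.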
  • 33. LOCALNESS MEASURE OF ENTITIES: Semantic Overlap Measure • Tversky Index: an asymmetric similarity measure between two sets, $ti(c, e) = \frac{|O(c) \cap O(e)|}{|O(c) \cap O(e)| + \alpha\,|O(c) - O(e)| + \beta\,|O(e) - O(c)|}$ • We choose $\alpha = 0$ and $\beta = 1$. • That is, the local entity is penalized for every entity on its page that is not found on the page of the city.
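The Tversky index with the slide's parameters can be sketched as follows, again assuming $O(c)$ and $O(e)$ are sets of linked entity titles. The function name and the example link sets are hypothetical. With $\alpha = 0$ and $\beta = 1$ the denominator becomes $|O(e)|$, so the score is the fraction of the entity page's links that also appear on the city page, independent of how large the city page is.

```python
def tversky(city_links: set[str], entity_links: set[str],
            alpha: float = 0.0, beta: float = 1.0) -> float:
    """ti(c, e) = |O(c) ∩ O(e)| / (|O(c) ∩ O(e)| + alpha*|O(c) - O(e)| + beta*|O(e) - O(c)|)."""
    common = len(city_links & entity_links)
    denominator = (common
                   + alpha * len(city_links - entity_links)   # city links missing from the entity page
                   + beta * len(entity_links - city_links))   # entity links missing from the city page
    if denominator == 0:
        return 0.0  # e.g., an entity page with no links when alpha = 0
    return common / denominator


city = {f"entity_{i}" for i in range(100)}
neighborhood = {"entity_0", "entity_1", "entity_2", "entity_3", "Castro Street"}
print(tversky(city, neighborhood))                       # 4 / 5 = 0.8
print(tversky(city, neighborhood, alpha=1.0, beta=1.0))  # 4 / 101, identical to the Jaccard index
```

Setting $\alpha = \beta = 1$ recovers the Jaccard index, which makes the asymmetric choice $\alpha = 0$ easy to compare against it.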
  • 34. KNOWLEDGE-BASE OF LOCAL ENTITIES: local entities of San Francisco (localness measure: Tversky Index)
  • 35. KNOWLEDGE-BASE ENABLED APPROACH. Knowledge Base Generator: internal links extraction, giving local entities (LocalEntity-1, LocalEntity-2, …, LocalEntity-n) for each city (city-1, city-2, …, city-k), then weighted local entities. User Profile Generator: entity recognition and scoring, giving annotated tweets. Location Prediction: location predictor, giving ranked cities for the user.
  • 36. CREATION OF USER PROFILE. Step 1: Entity Linking, e.g., “Just drove around Golden Gate Park trying to get in.” We use Zemanta for entity linking.
  • 37. CREATION OF USER PROFILE. Step 1: Entity Linking, e.g., “Just drove around Golden Gate Park trying to get in.” We use Zemanta for entity linking. Step 2: Entity Scoring, e.g., Golden Gate Bridge 4, San Francisco 49ers 2, San Francisco Chronicle 1. The user profile for user $u$ is defined as $P(u) = \{(e, s) \mid e \in W, s \in \mathbb{R}\}$, where $W$ is the set of Wikipedia entities and $s$ is the score of entity $e$.
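Step 2 can be sketched as counting linked entities across a user's tweets. This assumes entity linking (Zemanta in the thesis) has already produced a list of Wikipedia titles per tweet, and that the score $s$ is the mention count, as in the slide's example table; no Zemanta API is used or implied here. The function `build_user_profile` is hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def build_user_profile(linked_tweets: Iterable[list[str]]) -> dict[str, float]:
    """Build P(u) = {(e, s)}: each Wikipedia entity e linked in u's tweets, with score s.

    Assumes entity linking has already been run on every tweet and that s is
    the entity's mention count.
    """
    counts = Counter()
    for entities in linked_tweets:  # one list of linked entity titles per tweet
        counts.update(entities)
    return {entity: float(n) for entity, n in counts.items()}


tweets = [
    ["Golden Gate Bridge", "San Francisco 49ers"],
    ["Golden Gate Bridge"],
    [],  # a tweet with no linkable entities
]
print(build_user_profile(tweets))
# {'Golden Gate Bridge': 2.0, 'San Francisco 49ers': 1.0}
```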
  • 38. KNOWLEDGE-BASE ENABLED APPROACH. Knowledge Base Generator: internal links extraction, giving local entities (LocalEntity-1, LocalEntity-2, …, LocalEntity-n) for each city (city-1, city-2, …, city-k), then weighted local entities. User Profile Generator: entity recognition and scoring, giving annotated tweets. Location Prediction: location predictor, giving ranked cities for the user.
  • 39. LOCATION PREDICTION • Compute an aggregate score for each city whose local entities are found in a user’s tweets: $locScore(c, u) = \sum_{j=1}^{|I_{cu}|} locl(c, e_j) \times s_{e_j}$, where $I_{cu}$ is the set of local entities of city $c$ found in the tweets of user $u$, $e_j \in I_{cu}$, $locl(c, e_j)$ is the localness score of entity $e_j$ with respect to city $c$, and $s_{e_j}$ is the score of $e_j$ in the user profile. • Rank the cities by $locScore(c, u)$ in descending order to predict the top-k locations of a user.
  • 40. LOCATION PREDICTION. User profile (entity counts): San Francisco International Airport (6), San Francisco (4), Nob Hill (3), San Francisco Museum of Modern Art (1), Beach Blanket Babylon (2), San Francisco Municipal Railway (4), Golden Gate Park (1), San Francisco Bay Area (1), SF Weekly (1), Fox Oakland Theatre (2), Berkley (1), Green Day (1), Oakland (9), San Francisco Bay Area (1), The White Stripes (1), Detroit Metropolitan Wayne County Airport (1), Detroit Historical Museum (1), Detroit Red Wings (4), General Motors (1), Palo Alto (6), SAP AG (8), Facebook (3), PARC (company) (2), Dell (1), Google (1), … Knowledge base (localness scores): San Francisco: Nob Hill 0.48214, SF Weekly 0.1875, Golden Gate Park 0.16783, San Francisco International Airport 0.06818, …; Oakland, CA: Fox Oakland Theatre 0.09375, SF Bay Area 0.12972, Green Day 0.02066, …; Detroit, MI: Detroit Historical Museum 0.4838, General Motors 0.05538, Detroit Red Wings 0.0232, …; Palo Alto, CA: PARC (company) 0.03726, Google 0.04678, Facebook 0.05810.
  • 41. LOCATION PREDICTION. Aggregate scores (knowledge base as on slide 40): San Francisco 14.5531, from San Francisco International Airport (6), San Francisco (4), Nob Hill (3), San Francisco Museum of Modern Art (1), Beach Blanket Babylon (2), San Francisco Municipal Railway (4), Golden Gate Park (1), San Francisco Bay Area (1), SF Weekly (1); Oakland, CA 10.7584, from Fox Oakland Theatre (2), Berkley (1), Green Day (1), Oakland (9), San Francisco Bay Area (1); Detroit, MI 8.0600, from The White Stripes (1), Detroit Metropolitan Wayne County Airport (1), Detroit Historical Museum (1), Detroit Red Wings (4), General Motors (1); Palo Alto, CA 6.9175, from Palo Alto (6), SAP AG (8), Facebook (3), PARC (company) (2), Dell (1), Google (1).
  • 42. IMPLEMENTATION • Knowledge base: all cities of the United States with a population above 5,000 according to the 2012 census estimates, giving 4,661 cities and 500,714 local entities. • Baseline: considers all local entities to be equally local to the city, so location prediction is based only on the frequency of entities.
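The baseline can be sketched as the same ranking with every localness score fixed at 1, so a city's score is simply the total number of mentions of its local entities. The function `baseline_rank` and the data are hypothetical; CNN is deliberately listed as a local entity of both cities to show how a non-local entity inflates every city's score equally.

```python
from collections import Counter


def baseline_rank(entity_counts: dict[str, int],
                  city_entities: dict[str, set[str]],
                  k: int = 3) -> list[tuple[str, int]]:
    """Baseline: every local entity counts as equally local (localness 1), so a
    city's score is the total number of mentions of its local entities."""
    scores = Counter()
    for city, entities in city_entities.items():
        total = sum(entity_counts.get(e, 0) for e in entities)
        if total:  # skip cities with no local entity in the tweets
            scores[city] = total
    return scores.most_common(k)


# Hypothetical data: CNN is treated as a local entity of both cities.
counts = {"Golden Gate Bridge": 4, "San Francisco 49ers": 2, "CNN": 3, "Oakland": 1}
cities = {
    "San Francisco": {"Golden Gate Bridge", "San Francisco 49ers", "CNN"},
    "Oakland": {"Oakland", "CNN"},
}
print(baseline_rank(counts, cities))  # [('San Francisco', 9), ('Oakland', 4)]
```

CNN's three mentions add the same amount to every city that links to it; this is the kind of non-local signal the localness measures are intended to down-weight.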
  • 43. EVALUATION: Test Dataset • Published by Cheng, Caverlee, and Lee, 2010. • Contains 5,119 active users from the continental United States, with approximately 1,000 tweets per user. • Each user’s location is listed as latitude and longitude.
  • 44. EVALUATION: Evaluation Metrics • Error Distance: $ErrorDist(u) = distance(loc_{act}(u), loc_{est}(u))$, the distance between the actual location of the user and the estimated location. • Average Error Distance: $AED(U) = \frac{\sum_{u \in U} ErrorDist(u)}{|U|}$, the average error distance over all users in the test dataset. • Accuracy: $ACC(U) = \frac{|\{u \mid u \in U \wedge ErrorDist(u) \le 100\}|}{|U|}$, the percentage of users predicted within 100 miles of their actual location.
  • 45. EVALUATION: Location Prediction Results
| Localness Measure | ACC (%) | AED (miles) | ACC@2 | ACC@3 | ACC@5 |
|---|---|---|---|---|---|
| Baseline | 25.21 | 632.56 | 38.01 | 42.78 | 47.95 |
| PMI | 38.48 | 599.40 | 49.85 | 56.06 | 64.15 |
| BC | 47.91 | 478.14 | 57.39 | 62.18 | 66.98 |
| Jaccard Index | 53.21 | 433.62 | 67.41 | 73.56 | 78.84 |
| Tversky Index | 54.48 | 429.00 | 68.72 | 74.68 | 79.99 |
  • 46. EVALUATION: PMI (same results table as slide 45) • PMI is not normalized and is therefore sensitive to how often local entities occur in the Wikipedia corpus. • E.g., the PMI scores of the local entities of Glen Rock, New Jersey are higher than those of San Francisco.
  • 47. EVALUATION (same results table as slide 45) • Does a good job of assigning low scores to common entities, e.g., community college, National Weather Service, start-up company. • Fails for entities that have some relevance to the city but no distinguishing factor, e.g., IBM with respect to Endicott, New York.
  • 48. LOCALNESS MEASURE OF ENTITIES
  • 49. EVALUATION: Jaccard Index (same results table as slide 45) • Underperforms for local entities that have fewer entities than the city. • E.g., Eureka Valley and California with respect to San Francisco.
  • 51. EVALUATION: Tversky Index (same results table as slide 45) • Best-performing localness measure. • Overcomes the disadvantage of the Jaccard Index: for example, we are able to assign higher localness to Eureka Valley (0.7096) than to California (0.1270) with respect to San Francisco.
  • 53. EVALUATION: Top-k Average Error Distance
  • 54. Distribution of users: all users in the dataset; accurately predicted users.
  • 55. EVALUATION: Comparison with Existing Approaches
| Method | ACC (%) | AED (miles) |
|---|---|---|
| Cheng, Caverlee, and Lee, 2010 | 51.00 | 535.56 |
| Chang, Lee, Eltaher, and Lee, 2012 | 49.9 | 509.3 |
| Wikipedia-based approach | 54.48 | 429.00 |
  • 56. EVALUATION: Impact of Local Entities
  • 57. EVALUATION: Top 100 Cities • 2,172 users in the dataset are from the 100 most populous cities of the United States. • 60% of these users are predicted within 100 miles of their actual location. • 54% are predicted exactly at the city level.
  • 58. CONCLUSION • Presented an approach based on crowdsourced knowledge that does not require geo-tagged tweets as a training dataset to predict the location of a user. • Introduced the concept of Local Entities and preprocessed the Wikipedia hyperlink graph to extract local entities for each city. • Investigated relatedness measures to establish the degree of association between a local entity and a city. • Evaluated the proposed approach against a benchmark dataset published by Cheng et al.: for 5,119 users, we predict the location of 55% of users within 100 miles, with an average error distance of 429 miles.
  • 59. FUTURE WORK • Compute a confidence score for each prediction based on the top-k cities and the count of local entities in the tweets • Investigate other localness measures for scoring local entities • Consider the semantic types and categories of local entities, and weight their contribution by type • Explore other knowledge bases such as Wikitravel and GeoNames
  • 60. ACKNOWLEDGEMENTS: Amit P. Sheth, Krishnaprasad Thirunarayan, Derek Doran. THANK YOU!

Editor's Notes

  1. “Buckeye State” – Nickname of Ohio
  2. Location provides “context”
  3. Users can publish geographic information through cellphone or profile
  4. Cheng et al. found that 21% of users listed a location as granular as city and state in their profile. There was therefore a need to automatically infer the location of a user.
  5. Network based approaches are based on the network of a Twitter user
  6. The general idea behind these approaches is to determine the probability distribution of words across regions.
  7. Cheng et al. proposed a probabilistic framework to model the spatial distribution of words. They proposed the concept of local words, which are words that have a compact geographic scope.
  8. Weaknesses: a large training dataset is required; collecting geo-tagged tweets is time-intensive; and the underlying semantics of tweets are not exploited.
  9. My thesis addresses weaknesses of existing approaches by using a knowledge-based approach to extract location specific concepts from Wikipedia
  10. Brief description of the approach
  11. Brief description of the approach
  12. Local Entities: Entities that have a high relatedness to a city and can discriminate between geographic locations
  13. Our work is based on the link structure of Wikipedia
  14. Links only to topically relevant entities
  15. Previous research has concluded that location names provide important clues to the location of a user.
  16. Number of local entities for each city varies.
  17. Association-based measure: computes relatedness from the occurrences of the city and the entity in a large corpus.
  18. Construct a directed graph of local entities for each city
  19. Construct a graph of local entities for each city. Number of nodes: 474; number of edges: 5,921.
  20. Based on the idea that the greater the overlap between the concepts found in the Wikipedia pages of a city and an entity, the higher the degree of localness of the entity.
  21. Local Entities of San Francisco represented in a tag cloud weighted based on Tversky Index
  22. The next module is the user profile generator. We create a semantic profile of each user consisting of Wikipedia entities found in their tweets
  23. Brief description of the approach
  24. For predicting the location of a user we compute an aggregate score for each city whose local entities are found in a user’s tweets.
  25. Cheng, Caverlee, and Lee, 2010
  26. Introduction to Twitter and location of a Twitter user