Twitter User Geolocation Using a Unified Text and
Network Prediction Model
Afshin Rahimi, Trevor Cohn and Timothy Baldwin
Department of Computing and Information Systems, The University of Melbourne
OVERVIEW
Task: Where does @ShvwnK live?
Input: user, concatenated tweet text, mention-list
Output: latitude/longitude
(known for training users, predicted for test users)
Datasets: 3 Twitter geolocation datasets (#users in parentheses)
GeoText (9.5K), Twitter-US (450K) and Twitter-World (1.4M).
TEXT-BASED MODEL
Logistic regression with l1 regularisation
over k-d tree discretisation of latitude/longitude.
[Figures: top features of NYC; use of “upstate” in the U.S.]
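The discretisation step can be sketched as follows. This is a minimal illustration rather than the authors' code: the names `kdtree_cells` and `cell_centre`, the `bucket_size` parameter, the median split along alternating axes, and the use of the cell median as the predicted location are all assumptions. Each leaf cell would serve as one class label for the classifier, so densely populated regions end up with smaller cells.

```python
def kdtree_cells(points, bucket_size, depth=0):
    """Recursively split (lat, lon) points into leaf cells of at most bucket_size points."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(points) <= bucket_size:
        return [points]
    axis = depth % 2  # 0: split on latitude, 1: split on longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return (kdtree_cells(ordered[:mid], bucket_size, depth + 1)
            + kdtree_cells(ordered[mid:], bucket_size, depth + 1))


def cell_centre(cell):
    """Median latitude and longitude of a cell (assumed here as the cell's predicted location)."""
    lats = sorted(p[0] for p in cell)
    lons = sorted(p[1] for p in cell)
    return lats[len(lats) // 2], lons[len(lons) // 2]


# Hypothetical training coordinates, for illustration only.
train = [(40.7, -74.0), (40.8, -73.9), (40.6, -74.1), (34.0, -118.2),
         (34.1, -118.3), (41.9, -87.6), (41.8, -87.7), (37.8, -122.4)]
cells = kdtree_cells(train, bucket_size=2)
```

A classifier trained on the cell index of each training user then predicts a cell for a test user, and the cell centre stands in for that user's coordinates.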
NETWORK-BASED MODEL
Label propagation in a collapsed network:
• Build the graph using @-mentions.
• Use training nodes as seed (labelled samples).
• Infer the test labels by Modified Adsorption (Talukdar
and Crammer, 2009).
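$$\operatorname*{argmin}_{\hat{Y}} \; c(\hat{Y}) = \sum_{l} \Big[ \mu_1 \underbrace{(Y_l - \hat{Y}_l)^{T} S \, (Y_l - \hat{Y}_l)}_{\text{match seed}} + \mu_2 \underbrace{\hat{Y}_l^{T} L \, \hat{Y}_l}_{\text{smooth labels}} \Big]$$
[Figure: new label estimate; values shown: 0.7, 0.5, 0.01.]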
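A simplified propagation loop for the two-term objective above (seed match plus label smoothness) might look like the sketch below. It is not Modified Adsorption itself, which has further components that are omitted here. The assumptions are: S is a 0/1 indicator of seed nodes, L is the graph Laplacian of the weighted mention graph, labels are score vectors over discretised regions, and the function name `propagate` and its parameters are hypothetical. Each pass applies the Jacobi update obtained by setting the gradient of the objective to zero for one node at a time.

```python
def propagate(edges, seeds, n_labels, mu1=1.0, mu2=1.0, iters=50):
    """Jacobi iterations for: mu1 * seed-match term + mu2 * Laplacian smoothness term."""
    neighbours = {}
    for u, v, w in edges:  # undirected weighted edges
        neighbours.setdefault(u, []).append((v, w))
        neighbours.setdefault(v, []).append((u, w))
    for node in seeds:  # keep isolated seeds in the graph
        neighbours.setdefault(node, [])
    uniform = [1.0 / n_labels] * n_labels
    scores = {n: list(seeds.get(n, uniform)) for n in neighbours}
    for _ in range(iters):
        updated = {}
        for node, nbrs in neighbours.items():
            seed_weight = mu1 if node in seeds else 0.0
            degree = sum(w for _, w in nbrs)
            denom = seed_weight + mu2 * degree
            if denom == 0.0:  # nothing pulls on this node; keep its scores
                updated[node] = scores[node]
                continue
            new = []
            for l in range(n_labels):
                total = mu2 * sum(w * scores[nb][l] for nb, w in nbrs)
                if node in seeds:
                    total += mu1 * seeds[node][l]
                new.append(total / denom)
            updated[node] = new
        scores = updated
    return scores


# Toy graph: unlabelled user "x" mentions seed "a" (weight 2) and seed "b" (weight 1).
edges = [("a", "x", 2.0), ("x", "b", 1.0)]
seeds = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
scores = propagate(edges, seeds, n_labels=2)
```

In this toy graph the unlabelled node ends up closer to the label of the seed it is more strongly connected to.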
FROM @-MENTION TO COLLAPSED NETWORK
[Figure: the @-mention network and the collapsed network with text dongle nodes. Legend: labelled nodes, unlabelled nodes, mentioned nodes, text dongle nodes, celebrity.]
UNIFIED MODEL: NETWORK & TEXT
• For connected users, network-based models are
more accurate.
• For disconnected users (about 20% of the nodes),
text-based models are more accurate.
• Solution: Utilise both text and network!
• For each test node, attach a text dongle node
carrying text-based predictions.
• Add the text dongle nodes to seed nodes (like
training nodes).
• Use Modified Adsorption to infer the labels.
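The dongle construction in the steps above can be sketched as a small graph transformation. The function name `add_text_dongles`, the `("dongle", user)` node identifier and the `weight` parameter are hypothetical; the poster does not state what edge weight the dongle edges carry, so a constant is assumed here.

```python
def add_text_dongles(edges, seeds, text_predictions, weight=1.0):
    """Attach one seed 'dongle' node per test user, carrying the text-based prediction.

    edges: list of (u, v, w) undirected weighted edges
    seeds: dict mapping training node -> label score vector
    text_predictions: dict mapping test user -> label score vector from the text model
    """
    new_edges = list(edges)
    new_seeds = dict(seeds)
    for user, distribution in text_predictions.items():
        dongle = ("dongle", user)  # a node connected only to this test user
        new_edges.append((user, dongle, weight))
        new_seeds[dongle] = list(distribution)  # treated like a training node
    return new_edges, new_seeds


# Toy example: "t1" is connected to a training user, "t2" has no mentions at all.
edges = [("train1", "t1", 1.0)]
seeds = {"train1": [1.0, 0.0]}
text_predictions = {"t1": [0.6, 0.4], "t2": [0.1, 0.9]}
unified_edges, unified_seeds = add_text_dongles(edges, seeds, text_predictions)
```

Because every test user now has at least one labelled neighbour, a disconnected user such as "t2" falls back to its text-based prediction during label inference, while a connected user combines both sources.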
“CELEBRITIES” DON’T GEOLOCATE
• “Celebrities” (highly mentioned users) are
connected from everywhere.
• They connect lots of people.
• Solution: Remove users with more than T mentions.
• Results in sparser graphs (tractable inference)
and more accurate geolocation.
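Celebrity removal can be illustrated with a short filter over the mention list. This is a sketch under assumptions: the function name `remove_celebrities` is hypothetical, and "more than T mentions" is interpreted here as being mentioned by more than T distinct users; counting raw mention occurrences instead would be an equally plausible reading.

```python
from collections import defaultdict


def remove_celebrities(mentions, threshold):
    """Drop every user mentioned by more than `threshold` distinct users.

    mentions: list of (author, mentioned_user) pairs
    Returns the remaining pairs and the set of removed users.
    """
    mentioned_by = defaultdict(set)
    for author, target in mentions:
        mentioned_by[target].add(author)
    celebrities = {user for user, authors in mentioned_by.items()
                   if len(authors) > threshold}
    kept = [(a, t) for a, t in mentions
            if a not in celebrities and t not in celebrities]
    return kept, celebrities


# Toy mention list: "celeb" is mentioned by three different users.
mentions = [("a", "celeb"), ("b", "celeb"), ("c", "celeb"), ("a", "b"), ("c", "d")]
kept, celebrities = remove_celebrities(mentions, threshold=2)
```

Lowering the threshold removes more hub nodes, which shrinks the edge set that label inference has to process.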
TUNING T (TWITTER-US)
[Figure: mean error (in km, axis from 700 to 860) and graph size (# edges, axis from 10^5 to 10^9) plotted against the celebrity threshold T (# of mentions) for T = 2, 5, 15, 50, 500, 5k.]
Decreasing T results in: sparser graph, lower mean error.
RESULTS
State-of-the-art results over all three datasets!
[Figure: mean error (km, axis from 600 to 1600) on GeoText, Twitter-US and Twitter-World, ordered from smaller to larger dataset. Systems compared:]
• Network-based Model (this work)
• Unified Model (this work)
• Network-based: Rahimi et al. (NAACL 2015)
• Text-based: Rahimi et al. (NAACL 2015)
• Text-based: Wing and Baldridge (EMNLP 2014)
• Text-based: Cha et al. (ICWSM 2015)

ACL2015 Poster: Twitter User Geolocation Using a Unified Text and Network Prediction Model
