ACL2015 Poster: Twitter User Geolocation Using a Unified Text and Network Prediction Model

Twitter User Geolocation Using a Unified Text and Network Prediction Model
Afshin Rahimi, Trevor Cohn and Timothy Baldwin
Department of Computing and Information Systems, The University of Melbourne
OVERVIEW
Task: Where does @ShvwnK live?
Input: user, concatenated tweet text, mention-list
Output: latitude/longitude
(known for training users, predicted for test users)
Datasets: 3 Twitter geolocation datasets (number of users in parentheses)
GeoText (9.5K), Twitter-US (450K) and Twitter-World (1.4M).
TEXT-BASED MODEL
Logistic regression with L1 regularisation
over a k-d tree discretisation of latitude/longitude.
[Figures: top features of NYC; use of “upstate” in the U.S.]
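The discretisation step can be illustrated with a minimal sketch. It assumes the k-d tree splits at the median and alternates between latitude and longitude, and that each resulting cell is represented by the median of its training points; the poster does not give these details, and the names `kd_tree_cells`, `cell_centre` and `max_cell_size` as well as the toy coordinates are hypothetical. The index of a cell would then serve as the class label for the classifier, which is not shown here.

```python
from statistics import median


def kd_tree_cells(points, max_cell_size, depth=0):
    """Recursively split (lat, lon) points at the median, alternating axes.

    Returns a list of cells; each cell is a list of points. Splitting stops
    once a cell holds at most max_cell_size points.
    """
    if max_cell_size < 1:
        raise ValueError("max_cell_size must be at least 1")
    if len(points) <= max_cell_size:
        return [points]
    axis = depth % 2  # 0 = latitude, 1 = longitude (assumed split order)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (kd_tree_cells(ordered[:mid], max_cell_size, depth + 1)
            + kd_tree_cells(ordered[mid:], max_cell_size, depth + 1))


def cell_centre(cell):
    """Represent a cell by the per-axis median of its points (an assumption)."""
    return (median(p[0] for p in cell), median(p[1] for p in cell))


# Toy training coordinates (illustrative values, not taken from the datasets).
train_points = [
    (40.7, -74.0), (40.8, -73.9), (42.7, -73.8), (43.0, -76.1),
    (34.1, -118.2), (34.0, -118.4), (37.8, -122.4), (38.6, -121.5),
]
cells = kd_tree_cells(train_points, max_cell_size=2)
centres = [cell_centre(cell) for cell in cells]
```

Dense areas end up with many small cells and sparse areas with few large ones, which is the usual motivation for an adaptive grid over a uniform one.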
NETWORK-BASED MODEL
Label propagation in a collapsed network:
• Build the graph using @-mentions.
• Use training nodes as seed (labelled samples).
• Infer the test labels by Modified Adsorption (Talukdar and Crammer, 2009).
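\[
\operatorname*{argmin}_{\hat{Y}} \; c(\hat{Y}), \qquad
c(\hat{Y}) = \sum_{l} \Big[
  \underbrace{\mu_1 (Y_l - \hat{Y}_l)^{T} S (Y_l - \hat{Y}_l)}_{\text{Match seed}}
  + \underbrace{\mu_2 \hat{Y}_l^{T} L \hat{Y}_l}_{\text{Smooth labels}}
\Big]
\]
[Figure: label propagation example with the values 0.7, 0.5 and 0.01 and the annotation "new label estimate"]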
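A rough way to see what minimising this kind of objective does is the sketch below. It is not the full Modified Adsorption algorithm; it only handles the two terms shown on the poster, for one label at a time, and it assumes that L is the unnormalised graph Laplacian D − W and that S is a diagonal 0/1 indicator of seed nodes. Under those assumptions, setting the gradient to zero gives, for each node i with degree d_i, the fixed point Ŷ_i = (µ1 s_i Y_i + µ2 Σ_j w_ij Ŷ_j) / (µ1 s_i + µ2 d_i), which the code iterates Jacobi-style. The function name `propagate` and the toy graph are hypothetical.

```python
def propagate(weights, seeds, mu1=1.0, mu2=1.0, iterations=100):
    """Iterate the fixed-point update for a seed-matching plus smoothness objective.

    weights: dict node -> dict neighbour -> edge weight (symmetric; every
             neighbour must also appear as a key).
    seeds:   dict node -> known score for a single label.
    Returns a dict node -> estimated score.
    """
    scores = {node: seeds.get(node, 0.0) for node in weights}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in weights.items():
            is_seed = 1.0 if node in seeds else 0.0
            neighbour_sum = sum(w * scores[n] for n, w in nbrs.items())
            numer = mu1 * is_seed * seeds.get(node, 0.0) + mu2 * neighbour_sum
            denom = mu1 * is_seed + mu2 * sum(nbrs.values())
            # Isolated, unlabelled nodes have denom == 0 and keep their score.
            updated[node] = numer / denom if denom > 0 else scores[node]
        scores = updated
    return scores


# Toy chain a - u - b: "a" is a seed with score 1, "b" a seed with score 0.
graph = {"a": {"u": 1.0}, "u": {"a": 1.0, "b": 1.0}, "b": {"u": 1.0}}
result = propagate(graph, seeds={"a": 1.0, "b": 0.0})
```

With µ1 = µ2 the seeds are only matched softly, so in this toy chain the seed scores are pulled towards their neighbour while the unlabelled node settles halfway between them.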
FROM @-MENTION TO COLLAPSED NETWORK
[Figure: two panels, "@-mention Network" and "Collapsed Network + Text Dongle Nodes". Legend: labelled nodes, unlabelled nodes, mentioned nodes, text dongle nodes, celebrity.]
UNIFIED MODEL: NETWORK & TEXT
• For connected users, network-based models are more accurate.
• For disconnected users (about 20% of the nodes), text-based models are more accurate.
• Solution: Utilise both text and network!
• For each test node, attach a text dongle node carrying text-based predictions.
• Add the text dongle nodes to seed nodes (like training nodes).
• Use Modified Adsorption to infer the labels.
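The dongle-node construction in the list above can be sketched as a small graph transformation. This is a sketch under assumptions: the graph is a dict-of-dicts adjacency map, each dongle node is named `("text", user)`, and the dongle edge weight is 1.0; none of these choices are stated on the poster, and `attach_text_dongles` is a hypothetical helper.

```python
def attach_text_dongles(weights, seeds, text_predictions, dongle_weight=1.0):
    """Return copies of the graph and seed set extended with text dongle nodes.

    weights:          dict node -> dict neighbour -> edge weight.
    seeds:            dict node -> label (training users).
    text_predictions: dict test user -> label predicted by the text model.
    """
    new_weights = {node: dict(nbrs) for node, nbrs in weights.items()}
    new_seeds = dict(seeds)
    for user, prediction in text_predictions.items():
        dongle = ("text", user)  # hypothetical naming scheme for dongle nodes
        new_weights.setdefault(user, {})[dongle] = dongle_weight
        new_weights[dongle] = {user: dongle_weight}
        new_seeds[dongle] = prediction  # the dongle acts like a training node
    return new_weights, new_seeds


# A test user with no @-mention edges still receives label information.
graph, seed_labels = attach_text_dongles(
    weights={"u": {}},
    seeds={},
    text_predictions={"u": "cell_3"},
)
```

Because the dongle is an ordinary seed node, the propagation step needs no special case: a disconnected user simply inherits the text-based prediction, while a well-connected user's label is still shaped mainly by its neighbours.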
“CELEBRITIES” DON’T GEOLOCATE
• “Celebrities” (highly mentioned users) are
connected from everywhere.
• They connect lots of people.
• Solution: Remove users with more than T mentions.
• Results in sparser graphs (tractable inference)
and more accurate geolocation.
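The celebrity filter and the collapsed network can be sketched together. The sketch assumes that "more than T mentions" counts distinct mentioning users, and that collapsing means replacing each mentioned handle that is not itself a training or test user with direct edges between the users who mention it; both are interpretations rather than details given on the poster. `collapsed_graph` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def collapsed_graph(mentions, dataset_users, threshold):
    """Build undirected edges between dataset users from @-mentions.

    mentions:      dict user -> set of handles that user mentions.
    dataset_users: set of training and test users.
    threshold:     handles mentioned by more than this many distinct users
                   are treated as celebrities and removed.
    Returns a set of frozenset({u, v}) edges.
    """
    mentioned_by = defaultdict(set)
    for user, handles in mentions.items():
        for handle in handles:
            if handle != user:
                mentioned_by[handle].add(user)

    celebrities = {h for h, fans in mentioned_by.items() if len(fans) > threshold}

    edges = set()
    for handle, fans in mentioned_by.items():
        if handle in celebrities:
            continue  # drop the celebrity and every edge through it
        fans = sorted(fans - celebrities)
        if handle in dataset_users:
            edges.update(frozenset((fan, handle)) for fan in fans)
        else:
            # Collapse: connect users who mention the same outside handle.
            edges.update(frozenset(pair) for pair in combinations(fans, 2))
    return edges


toy_mentions = {"a": {"star", "b"}, "b": {"star", "x"}, "c": {"star", "x"}}
sparse = collapsed_graph(toy_mentions, {"a", "b", "c"}, threshold=2)
dense = collapsed_graph(toy_mentions, {"a", "b", "c"}, threshold=5)
```

In the toy data, lowering the threshold from 5 to 2 removes `star` and with it the edge between `a` and `c`, which shows on a small scale how a lower T gives a sparser graph.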
TUNING T (TWITTER-US)
[Figure: mean error (in km; axis from 700 to 860) and graph size (number of edges; axis from 10^5 to 10^9) against the celebrity threshold T (number of mentions; ticks at 2, 5, 15, 50, 500 and 5k).]
Decreasing T results in: sparser graph, lower mean error.
RESULTS
State-of-the-art results over all three datasets!
[Figure: mean error (km; axis from 600 to 1600) on GeoText, Twitter-US and Twitter-World, ordered from smaller to larger dataset. Systems compared:]
• Network-based Model (This work)
• Unified Model (This work)
• Network-based: Rahimi et al. (NAACL2015)
• Text-based: Rahimi et al. (NAACL2015)
• Text-based: Wing and Baldridge (EMNLP2014)
• Text-based: Cha et al. (ICWSM2015)
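The mean error reported here is a distance in kilometres between predicted and true user locations. The poster does not say how the distance is computed; the sketch below assumes great-circle distance via the haversine formula with a mean Earth radius of 6371 km. The names `haversine_km` and `mean_error_km` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def mean_error_km(predicted, actual):
    """Average distance between predicted and true coordinates."""
    distances = [haversine_km(p, t) for p, t in zip(predicted, actual)]
    return sum(distances) / len(distances)


# A quarter of the equator is about 10,007.5 km under this radius.
quarter_equator = haversine_km((0.0, 0.0), (0.0, 90.0))
toy_mean = mean_error_km([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 90.0)])
```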
