Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

  • 1.
    Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams
    Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡
    † University of Tsukuba ‡ IBM Research - Tokyo
    14/11/05 CIKM 2014 - Yuto Yamaguchi
  • 2.
    Tweets that help us
    Example tweets: "Shaking!!!", "Thunder"
    We can infer your home location immediately
  • 3.
    Social and Location
    • Lots of social media users
    • Frequent updates
    • User home locations
  • 4.
    Location-based Applications
    • Event Detection
    • Location-based Marketing
    • Epidemics Analysis
  • 5.
    Lack of home locations
    Most users do not disclose their home locations
    • 74% of Twitter users [Cheng+, 10]
    • 94% of Facebook users [Backstrom+, 10]
  • 6.
    Our Objective
    To infer home locations of social media users
  • 7.
    Focus & Contributions
    [Figure: stream of posts along a time axis]
    Our Focus
    Social contents are not static, but like a stream
    Our Contributions
    1. Online & Incremental Inference
    2. Exploiting Spatiotemporal features
  • 8.
    Contribution 1
    ONLINE & INCREMENTAL INFERENCE
  • 9.
    Existing methods: Batch inference
    [Diagram: Batch Input → Inference → Results]
    Existing methods perform batch inference just once, after "enough data" is stored
    → Can't update the results :(
    → What is "enough"? :(
    → When will it be enough? :(
  • 10.
    Our method: Online & incremental inference method
    [Diagram: Social Stream → Inference → Results]
    The online & incremental method performs location inference every time a new post arrives
    → Can keep the results up to date :)
  • 11.
    Contribution 2
    EXPLOITING SPATIOTEMPORAL FEATURES
  • 12.
    Local words
    Local words: strongly correlated with a specific location
    • Home location known: "Earthquake!", "Steelers!"
    • Home location unknown: "Steelers!" → Infer Pittsburgh?
  • 13.
    Existing methods: Only static features
    • Home location known: "Earthquake!", "Thunder!" (several users post "Thunder!")
    • Home location unknown: "Thunder!"
    "Thunderbolt" is not a local word statically
    → Can't utilize this word :( Can't infer
  • 14.
    Our method: Spatiotemporal correlation
    • Home location known: "Earthquake!", "Thunder!"
    • Home location unknown: "Thunder!"
    "Thunderbolt" can be a local word temporally, in a specific time period
    → Our method can utilize this word :) Can infer
  • 15.
    PROPOSED METHOD
    OLIM: Online Location Inference Method
  • 16.
    The Algorithm
    Preprocessing:
      divideMap()                      (Slide 17)
      calcPopulationDistribution()
    Main:
      for post p from SocialStream
        user u <- getUser(p)
        if u is location-known
          updateLocalWords(p)          (Slide 20)
        else
          updateUserLocation(u,p)      (Slide 24)
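The main loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Post` type, the `known_home` lookup, and the two callback signatures are assumptions introduced here, and the preprocessing steps (`divideMap`, `calcPopulationDistribution`) are assumed to have run beforehand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Post:
    """Hypothetical post record: who posted and which words it contains."""
    user_id: str
    words: List[str]


def run_olim(
    stream: Iterable[Post],
    known_home: Dict[str, int],
    update_local_words: Callable[[Post, int], None],
    update_user_location: Callable[[str, Post], None],
) -> None:
    """Route each incoming post to one of the two update steps.

    known_home maps a user id to a region index for location-known users
    (an assumed representation); all other users are location-unknown.
    """
    for post in stream:
        home = known_home.get(post.user_id)
        if home is not None:
            # Location-known user: the post refines the local-word statistics.
            update_local_words(post, home)
        else:
            # Location-unknown user: the post refines that user's estimate.
            update_user_location(post.user_id, post)
```

Because each post is handled once as it arrives, the loop never needs to store or revisit the whole stream.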
  • 17.
    divideMap
    Quadtree decomposition
    Each region is treated as a categorical location: L = {l1, l2, …, lK}
    Location inference is reduced to a classification problem
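A quadtree decomposition of this kind might look like the sketch below. The splitting rule (split a cell while it holds more than `max_points` points, up to `max_depth` levels) and all names are assumptions for illustration; the slide does not give the stopping criterion the authors use.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def divide_map(points: List[Point], box: Box, max_points: int,
               max_depth: int = 8) -> List[Box]:
    """Recursively split box into four quadrants until each leaf is small.

    Returns the leaf boxes; each leaf plays the role of one categorical
    location l_k.
    """
    min_x, min_y, max_x, max_y = box
    if len(points) <= max_points or max_depth == 0:
        return [box]

    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2
    quadrants: List[Box] = [
        (min_x, min_y, mid_x, mid_y),  # index 0: lower-left
        (mid_x, min_y, max_x, mid_y),  # index 1: lower-right
        (min_x, mid_y, mid_x, max_y),  # index 2: upper-left
        (mid_x, mid_y, max_x, max_y),  # index 3: upper-right
    ]
    # Assign every point to exactly one quadrant by comparing to the midpoint.
    buckets: List[List[Point]] = [[], [], [], []]
    for x, y in points:
        idx = (1 if x >= mid_x else 0) + (2 if y >= mid_y else 0)
        buckets[idx].append((x, y))

    regions: List[Box] = []
    for quadrant, bucket in zip(quadrants, buckets):
        regions.extend(divide_map(bucket, quadrant, max_points, max_depth - 1))
    return regions
```

Dense areas end up covered by many small regions and sparse areas by a few large ones, which keeps the number of users per class roughly balanced.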
  • 18.
    Population distribution
    [Figure: categorical distribution over l1, l2, …, lK]
    What fraction of location-known users live in each location
    Used for local word extraction
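Computing this categorical distribution is a matter of counting. A minimal sketch, assuming each location-known user has already been mapped to a region index in `0..K-1` (the function and parameter names are illustrative):

```python
from collections import Counter
from typing import List, Sequence


def population_distribution(home_regions: Sequence[int],
                            num_regions: int) -> List[float]:
    """Fraction of location-known users whose home falls in each region."""
    if not home_regions:
        raise ValueError("need at least one location-known user")
    counts = Counter(home_regions)
    total = len(home_regions)
    return [counts[k] / total for k in range(num_regions)]
```

For example, `population_distribution([0, 0, 1, 3], 4)` gives `[0.5, 0.25, 0.0, 0.25]`.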
  • 19.
    The Algorithm
    Preprocessing:
      divideMap()                      (Slide 17)
      calcPopulationDistribution()
    Main:
      for post p from SocialStream
        user u <- getUser(p)
        if u is location-known
          updateLocalWords(p)          (Slide 20)
        else
          updateUserLocation(u,p)      (Slide 24)
  • 20.
    updateLocalWords: Sliding window and word distribution
    Sliding window with length N (e.g., N = 5)
    [Figure: word distribution over l1, l2, …, lK]
    Where was the word posted from?
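One way to maintain such a per-word window is sketched below. It assumes the window holds the home regions of the last N location-known users who posted the word; that reading, and the `WordWindow` class itself, are assumptions made for illustration rather than details confirmed by the slide.

```python
from collections import Counter, deque
from typing import Deque, List


class WordWindow:
    """Sliding window over the most recent n regions a word was posted from."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.window: Deque[int] = deque()
        self.counts: Counter = Counter()

    def add(self, region: int) -> None:
        """Record a new occurrence and evict the oldest one if the window is full."""
        self.window.append(region)
        self.counts[region] += 1
        if len(self.window) > self.n:
            oldest = self.window.popleft()
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]

    def distribution(self, num_regions: int) -> List[float]:
        """Word distribution: share of windowed occurrences in each region."""
        total = len(self.window)
        if total == 0:
            raise ValueError("window is empty")
        return [self.counts.get(k, 0) / total for k in range(num_regions)]
```

Because old occurrences fall out of the window, a word such as "Thunder" can look local during a storm and stop looking local afterwards.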
  • 21.
    updateLocalWords: Local Word Intuition
    [Figure: the population distribution compared with two word distributions, each over l1, l2, …, lK; labels: "KL Divergence small", "Local word"]
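For reference, the KL divergence named on this slide is conventionally defined as below. The symbols $q_w$ (word distribution of word $w$) and $p$ (population distribution) are notation assumed for this note, and the direction of the divergence is also an assumption, since the slide does not state it.

```latex
D_{\mathrm{KL}}(q_w \,\|\, p) = \sum_{k=1}^{K} q_w(l_k) \,\log \frac{q_w(l_k)}{p(l_k)}
```

The divergence is zero exactly when $q_w = p$ and positive otherwise, so a word whose posts are spread in proportion to the population carries no location signal.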
  • 22.
    updateLocalWords: Online updating
    Window length N is fixed
    → We can update KL in O(1) every time a new post arrives :)
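The constant-time claim can be illustrated as follows. With a fixed window length N, one occurrence enters and one leaves per update, so at most two terms of the KL sum change. The sketch assumes the divergence is taken from the word distribution to the population distribution, that the window is already full, and that every region appearing in the window has nonzero population probability; the class name and interface are hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class IncrementalKL:
    """Maintains KL(word distribution || population) over a full, fixed-length window."""

    def __init__(self, population: Sequence[float],
                 initial_window: Sequence[int]) -> None:
        self.p = list(population)
        self.n = len(initial_window)  # fixed window length N
        self.window: Deque[int] = deque(initial_window)
        self.counts: List[int] = [0] * len(self.p)
        for region in initial_window:
            self.counts[region] += 1
        # One-off O(K) initialisation; later updates touch at most two terms.
        self.kl = sum(self._term(k) for k in range(len(self.p)))

    def _term(self, k: int) -> float:
        """Contribution of region k to the KL sum (0 when the count is 0)."""
        c = self.counts[k]
        if c == 0:
            return 0.0
        q = c / self.n
        return q * math.log(q / self.p[k])

    def push(self, region: int) -> float:
        """Slide the window by one occurrence and return the updated KL."""
        oldest = self.window.popleft()
        self.window.append(region)
        if oldest == region:
            return self.kl  # counts unchanged, so KL is unchanged
        self.kl -= self._term(oldest) + self._term(region)
        self.counts[oldest] -= 1
        self.counts[region] += 1
        self.kl += self._term(oldest) + self._term(region)
        return self.kl
```

In a long-running stream, floating-point error can accumulate in `self.kl`, so an occasional full recomputation is a reasonable safeguard.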
  • 23.
    The Algorithm
    Preprocessing:
      divideMap()                      (Slide 17)
      calcPopulationDistribution()
    Main:
      for post p from SocialStream
        user u <- getUser(p)
        if u is location-known
          updateLocalWords(p)          (Slide 20)
        else
          updateUserLocation(u,p)      (Slide 24)
  • 24.
    updateUserLocation: user distribution
    [Figure: user distribution of u over l1, l2, …, lK]
    Denotes how likely user u is to live in each location
  • 25.
    updateUserLocation: update
    If user u posts local word w:
    [Figure: the prior user distribution is updated with the word distribution of w to give the posterior, each over l1, l2, …, lK]
    Dirichlet-Multinomial compound for Bayesian updates
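A Dirichlet-multinomial update of this general shape can be sketched as below. It uses the standard conjugate rule (posterior pseudo-counts = prior pseudo-counts + observed counts) and assumes the observed counts are the per-region counts of the local word w; how the authors weight the word distribution is not stated on the slide, so treat that choice and all function names as assumptions.

```python
from typing import List, Sequence


def update_user_location(alpha: Sequence[float],
                         word_counts: Sequence[float]) -> List[float]:
    """Posterior Dirichlet pseudo-counts after the user posts a local word."""
    if len(alpha) != len(word_counts):
        raise ValueError("alpha and word_counts must cover the same regions")
    return [a + c for a, c in zip(alpha, word_counts)]


def user_distribution(alpha: Sequence[float]) -> List[float]:
    """Posterior mean: how likely the user is to live in each region."""
    total = sum(alpha)
    return [a / total for a in alpha]


def infer_home(alpha: Sequence[float]) -> int:
    """Index of the most probable region under the current pseudo-counts."""
    return max(range(len(alpha)), key=lambda k: alpha[k])
```

Starting from a uniform prior `[1, 1, 1]` and observing word counts `[0, 4, 1]` gives pseudo-counts `[1, 5, 2]`, a user distribution of `[0.125, 0.625, 0.25]`, and region 1 as the inferred home. Each update costs time proportional to the number of regions touched and needs no earlier posts to be stored.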
  • 26.
    EXPERIMENTS
    Accuracy & Costs
  • 27.
    Data from Twitter
    • Data size
      – 200K location-known users in Japan
        • Geocode location profiles into coordinates
      – 200 tweets for each user (40M in total)
      – 34M follow edges (for existing methods)
    • 90% for training; 5% for validation; 5% for test
  • 28.
    Inference accuracy
    [Figure: inference accuracy chart; label: "Existing methods"]
  • 29.
    Cost per update
    Feed 40M tweets in the dataset chronologically
    [Figure: cost-per-update chart; labels: "Better", "Variants of ours", "Existing methods"]
  • 30.
    Conclusion
    • Proposed location inference method
      – online & incremental inference
        • Constant time complexity
      – exploiting spatiotemporal correlation
        • Better accuracy