
Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams

Talk at CIKM2014



  1. Online User Location Inference Exploiting Spatiotemporal Correlations in Social Streams. Yuto Yamaguchi†, Toshiyuki Amagasa†, Hiroyuki Kitagawa†, and Yohei Ikawa‡ († University of Tsukuba, ‡ IBM Research - Tokyo). 14/11/05, CIKM 2014.
  2. Tweets that help us: "Shaked !!!", "Thunder". We can infer your home location immediately.
  3. Social and Location • Lots of social media users • Frequent updates • User home locations
  4. Location-based Applications • Event Detection • Location-based Marketing • Epidemics Analysis
  5. Lack of home locations: Most users do not disclose their home locations • 74% of Twitter users [Cheng+, 10] • 94% of Facebook users [Backstrom+, 10]
  6. Our Objective: To infer home locations of social media users
  7. Focus & Contributions. [Figure: posts arriving along a Time axis] Our Focus: Social contents are not static, but like a stream. Our Contributions: 1. Online & Incremental Inference 2. Exploiting Spatiotemporal features
  8. Contribution 1: ONLINE & INCREMENTAL INFERENCE
  9. Existing methods: Batch inference. [Figure: Batch Input → Inference → Results] Existing methods perform batch inference just once, after "enough data" is stored → Can't update the results :( → What is "enough"? :( → When will it be enough? :(
  10. Our method: Online & incremental inference. [Figure: Social Stream → Inference → Results] The online & incremental method performs location inference every time a new post arrives → Can keep the results up to date :)
  11. Contribution 2: EXPLOITING SPATIOTEMPORAL FEATURES
  12. Local words: strongly correlated to a specific location. [Figure: users whose home location is known post "地震だ!" ("Earthquake!") and "Steelers!"; a user whose home location is unknown posts "Steelers!" → Infer Pittsburgh?]
  13. Existing methods: Only static features. [Figure: several users whose home location is known post "地震だ!" ("Earthquake!") and "Thunder!"; a user whose home location is unknown posts "Thunder!"] "Thunderbolt" is not a local word statically → Can't utilize this word :( Can't infer.
  14. Our method: Spatiotemporal correlation. [Figure: a user whose home location is known posts "地震だ!" ("Earthquake!") and "Thunder!"; a user whose home location is unknown posts "Thunder!"] "Thunderbolt" can be a local word temporally, in a specific time period → Our method can utilize this word :) Can infer.
  15. PROPOSED METHOD. OLIM: Online Location Inference Method
  16. The Algorithm
      Preprocessing:
        1. divideMap() (Slide 17)
        2. calcPopulationDistribution()
      Main:
        3. for post p from SocialStream
        4.   user u <- getUser(p)
        5.   if u is location-known
        6.     updateLocalWords(p) (Slide 20)
        7.   else
        8.     updateUserLocation(u, p) (Slide 24)
  17. divideMap: Quadtree decomposition. Each region is treated as a categorical location, L = {l1, l2, …, lK}. Location inference is reduced to a classification problem.
  18. Population distribution: a categorical distribution over l1, l2, …, lK giving what fraction of location-known users live in each location. Used for local word extraction.
  19. The Algorithm (repeated from slide 16)
  20. updateLocalWords: Sliding window and word distribution. Sliding window with length N (e.g., N = 5). Word distribution over l1, l2, …, lK: where was the word posted from?
  21. updateLocalWords: Local Word Intuition. [Figure: the population distribution compared with two word distributions over l1, l2, …, lK; labels: "KL Divergence small", "Local word"]
  22. updateLocalWords: Online updating. Window length N is fixed. We can update KL in O(1) every time a new post arrives :)
  23. The Algorithm (repeated from slide 16)
  24. updateUserLocation: user distribution. The user distribution of u, over l1, l2, …, lK, denotes how likely it is that user u lives in each location.
  25. updateUserLocation: update. If user u posts local word w, the user distribution of u (prior) is updated with the word distribution of w to give the posterior, each over l1, l2, …, lK. Dirichlet-Multinomial compound for Bayesian updates.
  26. EXPERIMENTS: Accuracy & Costs
  27. Data from Twitter • Data size – 200K location-known users in Japan • Geocode location profiles into coordinates – 200 tweets for each user (40M in total) – 34M follow edges (for existing methods) • 90% for training; 5% for validation; 5% for test
  28. Inference accuracy. [Figure: accuracy comparison; label: Existing methods]
  29. Cost per update: Feed 40M tweets in the dataset chronologically. [Figure labels: Better; Variants of ours; Existing methods]
  30. Conclusion • Proposed location inference method – online & incremental inference • Constant time complexity – exploiting spatiotemporal correlation • Better accuracy
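
The divideMap step on slide 17 is described only as a quadtree decomposition. Below is a minimal sketch of that idea, assuming a split rule the slides do not state: a cell is split into four quadrants while it holds more than `max_points` users, up to `max_depth` levels. The function name `quadtree_cells` and both parameters are hypothetical.

```python
def quadtree_cells(points, box, max_points=2, max_depth=8):
    """Return leaf cells (min_x, min_y, max_x, max_y) of a quadtree over `points`.

    Assumption: a cell is split while it contains more than `max_points`
    points. Points are expected to lie in the half-open box
    [min_x, max_x) x [min_y, max_y).
    """
    x0, y0, x1, y1 = box
    if len(points) <= max_points or max_depth == 0:
        return [box]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = (
        (x0, y0, mx, my),  # south-west
        (mx, y0, x1, my),  # south-east
        (x0, my, mx, y1),  # north-west
        (mx, my, x1, y1),  # north-east
    )
    cells = []
    for qx0, qy0, qx1, qy1 in quadrants:
        inside = [(x, y) for x, y in points if qx0 <= x < qx1 and qy0 <= y < qy1]
        cells.extend(
            quadtree_cells(inside, (qx0, qy0, qx1, qy1), max_points, max_depth - 1)
        )
    return cells


# Toy coordinates for location-known users (made up for illustration).
users = [(0.1, 0.1), (0.2, 0.2), (0.15, 0.12), (0.8, 0.8)]
cells = quadtree_cells(users, (0.0, 0.0, 1.0, 1.0), max_points=2)
print(len(cells), "categorical locations")
```

Dense areas end up with small cells and sparse areas with large ones, so each leaf can serve as one categorical location l1, …, lK.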

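Slides 20 to 22 say that each word keeps a fixed-length sliding window and that its KL divergence can be updated in O(1) per post. Below is a sketch of one way this could work, assuming the score is KL(word distribution || population distribution) and that the window stores the locations of the last N posts containing the word. The class `WordWindow` and the helper `kl_from_scratch` are hypothetical names, and the slides do not show how the score is thresholded.

```python
import math
from collections import Counter, deque


class WordWindow:
    """Locations of the last `n` posts containing one word (location-known users)."""

    def __init__(self, population, n):
        self.population = population  # location -> fraction of location-known users
        self.n = n                    # fixed window length N
        self.window = deque()
        self.counts = Counter()
        self.kl = 0.0                 # running KL sum; meaningful once the window is full

    def _term(self, loc):
        # Contribution of one location to KL(word || population).
        c = self.counts[loc]
        if c == 0:
            return 0.0
        q = c / self.n
        return q * math.log(q / self.population[loc])

    def _change(self, loc, delta):
        # Only the term of the touched location changes, so this is O(1).
        self.kl -= self._term(loc)
        self.counts[loc] += delta
        self.kl += self._term(loc)

    def add(self, loc):
        if len(self.window) == self.n:
            self._change(self.window.popleft(), -1)  # oldest post leaves the window
        self.window.append(loc)
        self._change(loc, +1)

    def is_full(self):
        return len(self.window) == self.n


def kl_from_scratch(locations, population):
    """Reference O(K) computation, used only to check the incremental value."""
    n = len(locations)
    counts = Counter(locations)
    return sum((c / n) * math.log((c / n) / population[loc]) for loc, c in counts.items())


population = {"tokyo": 0.5, "osaka": 0.3, "tsukuba": 0.2}
window = WordWindow(population, n=5)
stream = ["tokyo", "osaka", "tokyo"] + ["tsukuba"] * 5
for loc in stream:
    window.add(loc)
print(window.is_full(), window.kl)
```

Each arriving post touches at most two locations (the one leaving the window and the one entering it), which is why the cost does not depend on the number of locations K.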
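Slide 25 names a Dirichlet-Multinomial compound for the Bayesian update of the user distribution. Below is a minimal sketch of a conjugate update in that spirit. It assumes the prior pseudo-counts are proportional to the population distribution and that the per-location counts of the posted local word are added as observations; the slides give neither detail. The helper names `make_prior`, `update_user`, `user_distribution`, and `infer_home` are hypothetical.

```python
def make_prior(population, strength=1.0):
    """Dirichlet pseudo-counts; assumed proportional to the population distribution."""
    return {loc: strength * p for loc, p in population.items()}


def update_user(alpha, word_counts):
    """Conjugate update: add the local word's per-location counts to the pseudo-counts."""
    return {loc: a + word_counts.get(loc, 0) for loc, a in alpha.items()}


def user_distribution(alpha):
    """Posterior mean of the Dirichlet: how likely the user lives in each location."""
    total = sum(alpha.values())
    return {loc: a / total for loc, a in alpha.items()}


def infer_home(alpha):
    """Most probable location under the current user distribution."""
    return max(alpha, key=alpha.get)


population = {"tokyo": 0.5, "osaka": 0.3, "tsukuba": 0.2}
alpha = make_prior(population)
before = infer_home(alpha)

# The user posts a local word whose window currently holds 5 posts from "tsukuba".
alpha = update_user(alpha, {"tsukuba": 5})
after = infer_home(alpha)
print(before, "->", after, user_distribution(alpha))
```

Because the update only adds counts, it can be applied once per incoming post without revisiting earlier posts, which fits the online setting of the talk.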