Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections


Nowadays, large collections of photos are tagged with GPS coordinates. The modelling of such large geo-tagged corpora is an important problem in data mining and information retrieval, and involves the use of geographical information to detect topics with a spatial component. In this paper, we propose a novel geographical topic model which captures dependencies between geographical regions to support the detection of topics with complex, non-Gaussian distributed spatial structures. The model is based on a multi-Dirichlet process (MDP), a novel generalisation of the hierarchical Dirichlet process extended to support multiple base distributions. We therefore call our method the MDP-based geographical topic model (MGTM). We show how to use an MDP to dynamically smooth topic distributions between groups of spatially adjacent documents. In systematic quantitative and qualitative evaluations using independent datasets from prior related work, we show that such a model can exploit the adjacency of regions and leads to a significant improvement in the quality of topics compared to the state of the art in geographical topic modelling.
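
For orientation, the hierarchical Dirichlet process that the abstract builds on can be written as below, following Teh et al. (2006) and using the symbols γ, H, α0, G^0, θ_jn and w that appear in the MGTM plate diagram. The last line is only an illustrative way to express "multiple base distributions" as a weighted mixture of parent measures; the weights π_p and the count P are assumed notation, not taken from the slides, and the paper's exact definition of the MDP may differ.

```latex
% Hierarchical Dirichlet process (Teh et al., 2006)
\begin{align*}
G^0 \mid \gamma, H            &\sim \mathrm{DP}(\gamma, H) \\
G_j \mid \alpha_0, G^0        &\sim \mathrm{DP}(\alpha_0, G^0) \\
\theta_{jn} \mid G_j          &\sim G_j \\
w_{jn} \mid \theta_{jn}       &\sim F(\theta_{jn})
\end{align*}

% Illustrative multi-base variant (assumed notation: \pi_p, P)
\begin{align*}
G_j \mid \alpha_0, \pi, G_1, \dots, G_P
  &\sim \mathrm{DP}\Big(\alpha_0, \sum_{p=1}^{P} \pi_p \, G_p\Big),
  \qquad \sum_{p=1}^{P} \pi_p = 1
\end{align*}
```

Read this way, a document would draw its topics from a base measure that mixes its own cluster with neighbouring clusters, which is one plausible reading of "smoothing topic distributions between groups of spatially adjacent documents".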

Published in: Education

Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections

  1. Institute for Web Science & Technologies, University of Koblenz-Landau, Germany. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. Christoph Carl Kling, Jérôme Kunegis, Sergej Sizov, Steffen Staab
  2. Outline: 1) Motivation 2) Existing approaches 3) Our approach 4) Evaluation
  3. Motivation
  4. Topics in topic modelling: Latent variables that explain the co-occurrence of words in documents.
  5. Geographical topics: Latent variables that explain the co-occurrence of words both in documents and in geographical space.
  6. [Figure: tagged photographs with geo-coordinates. Example tags: fish, rice, seafood, shrimp, lobster, salmon, wine, coffee, pizza, pasta.]
  7. [Figure: the same tagged photographs with topic word lists. Topic words shown: seafood, fish, lobster, shrimp, crab, wine, salmon; wine, pizza, coffee, italian, pasta.]
  8. Existing Approaches
  9. [Figure: tagged photographs annotated with several overlapping topic word lists.] GeoFolk, S. Sizov, 2010
  10. [Figure: tagged photographs annotated with topic word lists.] LGTA, Z. Yin et al., 2011
  11. [Figure: tagged photographs annotated with topic word lists.] A. Ahmed, L. Hong and A. Smola, 2013
  12. Our Approach
  13. Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions. (Image source: wikipedia.org)
  14. Clustering: e.g. mixture of Gaussian/Fisher distributions. [Figure: tagged photographs.]
  15. [Figure: tagged photographs with topic word lists.]
  16. [Figure only; no text recovered.]
  17. Adjacency: Delaunay triangulation, k-NN, …
  18. [Figure: tagged photographs with topic word lists.]
  19. Cluster adjacency; dependencies of document-specific topic distributions; exchange of topic information between clusters.
  20. Exchange of topic information between clusters
  21. Exchange of topic information between clusters
  22. Exchange of topic information between clusters
  23. Exchange of topic information between clusters
  24. MGTM. [Figure: graphical model in plate notation with the symbols γ, H, G^0, G^s_l, G^d_j, α0, A_l, δ_l, θ_jn, w, η.] Legend: L: number of regions; M: number of documents in a cluster; N: number of words in a document; G^0: global topic distribution; G^s: cluster-topic distribution; G^d: document-topic distribution.
  25. Evaluation
  26. Datasets: Activities: 1,931 photos; Landscape: 5,791 photos; Manhattan: 28,922 photos; Car: 34,707 photos; Food: 151,747 photos. Source: LGTA, Z. Yin et al., 2011
  27. Compared models:
      - LGTA: Model with regions
      - Basic model: 3-level Hierarchical Dirichlet Process
      - MGTM: Basic model plus dynamically smoothed adjacent regions
  28. [Figure: word perplexity. Panels: manhattan (100 regions), landscape (200 regions), activities (300 regions), car (500 regions), food (1000 regions).]
  29. User Study. Food dataset (1000 regions), 31 participants. Task: intrusion detection. Measure: precision (avg / median).
      Model        4 topics      6 topics      8 topics
      LGTA         0.67 / 0.64   0.57 / 0.57   0.60 / 0.58
      Basic model  0.45 / 0.57   0.63 / 0.61   0.64 / 0.58
      MGTM         0.79 / 0.80   0.82 / 0.81   0.78 / 0.75
  30. west.uni-koblenz.de: Research → systems → MGTM. Links: west.uni-koblenz.de, liveandgov.eu
  31. Thank you! Questions? Contact: c@c-kling.de
  32. Summary
      • Geographical topics often exhibit a complex spatial distribution
      • The detection of such complex topics can be supported
      • The dynamic smoothing of adjacent regions leads to an evolutionary creation and spread of topics during inference
  33. References
      Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. In: Journal of the American Statistical Association, Vol. 101 (2006), p. 1566-1581.
      Sergej Sizov. GeoFolk: latent spatial semantics in web 2.0 social media. In: WSDM, ACM (2010), p. 281-290.
      Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang. Geographical topic discovery and comparison. In: WWW, ACM (2011), p. 247-256.
      Kevin Robert Canini and Thomas L. Griffiths. A Nonparametric Bayesian Model of Multi-Level Category Learning. In: AAAI, AAAI Press (2011).
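
The slides name Delaunay triangulation and k-NN as ways to define which clusters count as adjacent. The following is a minimal sketch of the k-NN option only, assuming each region is summarised by a planar (x, y) centroid. The function name `knn_adjacency` and the toy coordinates are hypothetical, and plain Euclidean distance is a simplification (real GPS data would call for a spherical distance); it is not the MGTM implementation.

```python
import math


def knn_adjacency(centroids, k):
    """Build a symmetric adjacency map from region centroids.

    centroids: list of (x, y) tuples, one per region (assumed planar).
    k: number of nearest neighbours each region links to.
    Returns a dict: region index -> set of adjacent region indices.
    """
    n = len(centroids)
    adjacency = {i: set() for i in range(n)}
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other regions by Euclidean distance to region i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(xi - centroids[j][0], yi - centroids[j][1]),
        )
        for j in others[:k]:
            # Store the link in both directions so adjacency is symmetric.
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Toy example: four regions on a line, each linked to its single nearest neighbour.
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
adjacency = knn_adjacency(centroids, k=1)
print(adjacency)
```

Because links are stored in both directions, a region can end up with more than k neighbours; the isolated fourth region is still connected to the rest, which is what allows topic information to reach it.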

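
The quantitative evaluation in the slides reports word perplexity. As a reminder of what that measure computes, here is a minimal sketch using the standard definition (the exponential of the negative average log-likelihood per held-out word). It assumes the model already supplies a predictive probability for each held-out word; the function name `word_perplexity` and the example probabilities are hypothetical.

```python
import math


def word_perplexity(word_probabilities):
    """Perplexity of held-out words given their predictive probabilities.

    word_probabilities: iterable of p(w) values, each in (0, 1].
    A probability of zero raises ValueError from math.log.
    """
    probabilities = list(word_probabilities)
    if not probabilities:
        raise ValueError("need at least one word probability")
    log_likelihood = sum(math.log(p) for p in probabilities)
    return math.exp(-log_likelihood / len(probabilities))


# A model that spreads its mass uniformly over four words has perplexity close to 4.
uniform_perplexity = word_perplexity([0.25] * 8)
print(uniform_perplexity)
```

Lower values mean the model assigns higher probability to the held-out words, so a model with lower perplexity on the same data fits it better.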