7. Applications
• Text Mining
  • Identifying similar chapters in a book
• Computer Vision
  • Face Recognition
• Colocation Mining
  • Identifying forest fires
• Music Search
  • Identifying the genre of music based on a segment of the song
14. LDA - Overview
• A generative probabilistic model
• Represented in terms of words, documents, a corpus, and labels (a toy example follows):
  • word - the primary unit of discrete data
  • document - a sequence of words
  • corpus - the collection of all documents
  • label (output) - the class of the document
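To make these terms concrete, here is a minimal sketch of how such a corpus might be represented in Python; the vocabulary, documents, and labels are invented toy data, not from the paper.

```python
# Toy LDA data representation (illustrative only).

# word: each discrete word maps to an index in the vocabulary
vocabulary = {"grass": 0, "cow": 1, "sky": 2, "sheep": 3}

# document: a sequence of words, encoded as vocabulary indices
doc_a = [0, 1, 0, 2]   # "grass cow grass sky"
doc_b = [3, 0, 3, 2]   # "sheep grass sheep sky"

# corpus: the collection of all documents
corpus = [doc_a, doc_b]

# label (output): the class of each document
labels = ["cow", "sheep"]
```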
20. Wait a minute… So how are we going to perform computer vision applications using words and documents?
• Here, words represent visual words, which could consist of:
  • image patches
  • spatial and temporal interest points
  • moving pixels, etc.
• The paper takes image classification in computer vision as its example.
24. Data Preprocessing
• The image is convolved with a bank of filters: 3 Gaussians, 4 Laplacians of Gaussians, and 4 first-order derivatives of Gaussians.
• A grid divides the image into local patches, and each patch is sampled densely for a local descriptor.
• The local descriptors of all patches in the entire image set are clustered using k-means and stored in an auxiliary data structure (let's call it a "Workbook"); a sketch of the pipeline follows.
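A rough sketch of this pipeline in Python, assuming grayscale images as 2-D NumPy arrays; the filter scales, grid step, and Workbook size here are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def filter_bank_responses(image):
    """Convolve the image with 3 Gaussians, 4 Laplacians of Gaussians,
    and 4 first-order derivatives of Gaussians (scales are assumptions)."""
    responses = []
    for s in (1.0, 2.0, 4.0):                 # 3 Gaussians
        responses.append(ndimage.gaussian_filter(image, s))
    for s in (1.0, 2.0, 4.0, 8.0):            # 4 Laplacians of Gaussians
        responses.append(ndimage.gaussian_laplace(image, s))
    for s in (2.0, 4.0):                      # 4 first-order derivatives (x and y)
        responses.append(ndimage.gaussian_filter(image, s, order=(0, 1)))
        responses.append(ndimage.gaussian_filter(image, s, order=(1, 0)))
    return np.stack(responses, axis=-1)       # H x W x 11 response stack

def dense_patch_descriptors(responses, step=8):
    """Sample the grid densely: one 11-dimensional descriptor per grid point."""
    h, w, _ = responses.shape
    return np.array([responses[y, x]
                     for y in range(step // 2, h, step)
                     for x in range(step // 2, w, step)])

def build_workbook(images, n_words=200):
    """Cluster all descriptors from the image set with k-means; the cluster
    centers form the "Workbook" (one visual word per cluster)."""
    descriptors = np.vstack([dense_patch_descriptors(filter_bank_responses(im))
                             for im in images])
    return KMeans(n_clusters=n_words, n_init=10).fit(descriptors)
```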
33. Clustering using LDA
• Framework:
  • M documents (images)
  • Each document j has N_j words
  • w_ji is the observed value of word i in document j
  • All words will be clustered into K topics
  • Each topic k is modeled as a multinomial distribution over the Workbook
  • 𝛼 and β are Dirichlet prior hyperparameters
  • ɸ_k, π_j and z_ji are the hidden variables (a toy sketch follows)
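With these quantities in place, the generative algorithm described on the next slide can be sketched directly; the sizes and hyperparameter values below are illustrative toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, V = 100, 10, 200     # documents (images), topics, Workbook size
alpha, beta = 0.5, 0.1     # Dirichlet hyperparameters (illustrative values)

phi = rng.dirichlet(np.full(V, beta), size=K)    # phi_k ~ Dir(beta), one row per topic
pi = rng.dirichlet(np.full(K, alpha), size=M)    # pi_j ~ Dir(alpha), one row per document

docs, topics = [], []
for j in range(M):
    N_j = int(rng.integers(50, 150))                         # N_j words in document j
    z_j = rng.choice(K, size=N_j, p=pi[j])                   # z_ji ~ Discrete(pi_j)
    w_j = np.array([rng.choice(V, p=phi[k]) for k in z_j])   # w_ji ~ Discrete(phi_{z_ji})
    topics.append(z_j)
    docs.append(w_j)
```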
34. Clustering using LDA (contd)
• Generative algorithm:
  • For each topic k, a multinomial parameter ɸ_k is sampled from the Dirichlet prior: ɸ_k ~ Dir(β)
  • For each document j, a multinomial parameter π_j over the K topics is sampled from the Dirichlet prior: π_j ~ Dir(𝛼)
  • For each word i in document j, a topic label z_ji is sampled from the discrete distribution z_ji ~ Discrete(π_j)
  • The value w_ji of word i in document j is sampled from the discrete distribution of topic z_ji: w_ji ~ Discrete(ɸ_{z_ji})
• z_ji is sampled through a Gibbs sampling procedure (sketched below), where:
  • n^(k)_{-ji,w} is the number of words in the corpus with value w assigned to topic k, excluding word i in document j
  • n^(j)_{-ji,k} is the number of words in document j assigned to topic k, excluding word i in document j
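The slide names only the count statistics; the conditional below is the standard collapsed Gibbs update for LDA built from those counts, sketched under the assumption that they are kept as NumPy arrays.

```python
import numpy as np

def gibbs_sweep(docs, z, n_kw, n_jk, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling over all z_ji.
    n_kw[k, w]: words with value w assigned to topic k (corpus-wide)
    n_jk[j, k]: words in document j assigned to topic k
    n_k[k]:     total words assigned to topic k (row sums of n_kw)"""
    K, V = n_kw.shape
    for j, w_j in enumerate(docs):
        for i, w in enumerate(w_j):
            k_old = z[j][i]
            # remove word (j, i) from the counts: the "-ji" quantities above
            n_kw[k_old, w] -= 1; n_jk[j, k_old] -= 1; n_k[k_old] -= 1
            # p(z_ji = k | rest) ∝ (n_jk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (n_jk[j] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # add the word back under its newly sampled topic
            n_kw[k_new, w] += 1; n_jk[j, k_new] += 1; n_k[k_new] += 1
            z[j][i] = k_new
```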
37. What’s the issue with LDA?
• The spatial and temporal components of the visual words are not considered, so co-occurrence information is not utilized.
• Consider a series of images of animals with grass as the background. Since we assume an image to be a document, and the animal is only a small part of the image, the image would most likely be classified as grass.
43. How can we resolve it?
• Use a grid layout on each image; each region of the grid could be considered a document.
  • But how would you handle a patch that overlaps two regions?
• We could use overlapping regions as documents.
  • But since several overlapping documents could contain the same patch, how would you decide which document it should belong to?
• So we could replace each document (region) with a point, and assign a patch to the document whose point it is closest to (see the sketch below).
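A sketch of that nearest-point assignment, with a hypothetical 4x4 grid of document points over a 128x128 image:

```python
import numpy as np

def assign_patches_to_documents(patch_xy, doc_xy):
    """patch_xy: (P, 2) patch coordinates; doc_xy: (D, 2) document points.
    Returns the index of the nearest document point for each patch."""
    # squared Euclidean distance from every patch to every document point
    d2 = ((patch_xy[:, None, :] - doc_xy[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Hypothetical document points: centers of a 4x4 grid over a 128x128 image
centers = np.array([(16 + 32 * r, 16 + 32 * c)
                    for r in range(4) for c in range(4)], dtype=float)
patches = np.array([[10.0, 12.0], [100.0, 60.0]])
print(assign_patches_to_documents(patches, centers))   # -> [ 0 13]
```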
44. Clustering using Spatial LDA
• Framework:
  • Besides the parameters used in LDA, spatial information is also captured.
  • A hidden variable d_i indicates the document that word i is assigned to.
  • For each document j, g^d_j, x^d_j and y^d_j represent the image index, x coordinate and y coordinate of the document, respectively.
  • For each word i, g_i, x_i and y_i represent the image index, x coordinate and y coordinate of the word, respectively.
• Generative algorithm:
  • For each topic k, a multinomial parameter ɸ_k is sampled from the Dirichlet prior: ɸ_k ~ Dir(β)
  • For each document j, a multinomial parameter π_j over the K topics is sampled from the Dirichlet prior: π_j ~ Dir(𝛼)
  • For each word i, a random variable d_i is sampled from the prior p(d_i | η), indicating the document of word i (the spatial step is sketched below).
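A sketch of the spatial part of this generative process, assuming a uniform prior η over documents and an isotropic Gaussian kernel; σ and the document points are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
doc_xy = np.array([[16.0, 16.0], [16.0, 48.0], [48.0, 16.0], [48.0, 48.0]])
eta = np.full(len(doc_xy), 1.0 / len(doc_xy))   # uniform prior p(d_i | eta)
sigma = 8.0                                     # kernel width (assumption)

d_i = rng.choice(len(doc_xy), p=eta)            # d_i ~ p(d_i | eta)
# The word's location is drawn around its document's point; its image index
# g_i is that of the chosen document.
x_i, y_i = rng.normal(loc=doc_xy[d_i], scale=sigma)   # c_i ~ N(c^d_{d_i}, sigma^2 I)
```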
45. Clustering using Spatial LDA (contd)
• Generative algorithm (continued):
  • The image index and location of word i, c_i = (g_i, x_i, y_i), are chosen from the distribution p(c_i | c^d_{d_i}, 𝝈), for which a Gaussian kernel is chosen.
  • For word i in document d_i, a topic label z_i is sampled from the discrete distribution z_i ~ Discrete(π_{d_i})
  • The value w_i of word i is sampled from the discrete distribution of topic z_i: w_i ~ Discrete(ɸ_{z_i})
• z_i is sampled through a Gibbs sampling procedure, where:
  • n^(k)_{-i,w} is the number of words in the corpus with value w assigned to topic k, excluding word i
  • n^(j)_{-i,k} is the number of words in document j assigned to topic k, excluding word i
• The conditional distribution of d_i combines the spatial kernel with the document-topic counts, as sketched below.
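Since the formula appears in the slides only as a figure, the sketch below gives the general shape of that conditional, combining the Gaussian spatial kernel with a document-topic term built from the counts above; it is a reconstruction under those assumptions, not the paper's exact expression.

```python
import numpy as np

def sample_d_i(word_xy, doc_xy, n_jk, n_j, z_i, alpha, K, sigma, rng):
    """word_xy: (2,) location of word i; doc_xy: (D, 2) document points;
    n_jk[j, k]: words in document j with topic k, excluding word i;
    n_j[j]:     total words in document j, excluding word i."""
    d2 = ((doc_xy - word_xy) ** 2).sum(axis=1)
    spatial = np.exp(-d2 / (2.0 * sigma ** 2))            # p(c_i | c^d_j, sigma)
    topical = (n_jk[:, z_i] + alpha) / (n_j + K * alpha)  # document-topic term
    p = spatial * topical
    return rng.choice(len(doc_xy), p=p / p.sum())
```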
52. What the paper missed
• Comparisons with other standard clustering methods could have been included to highlight the efficiency of the algorithm.
• For the given experimental data, some intuition on the selection of the input parameters 𝛼, β and η could have been provided.
• In the case of moving images, the temporal aspect is ignored. In the future, it could be added as a parameter and the algorithm updated accordingly.
• A few advancements building on the paper have since been made:
  • James Philbin, Josef Sivic and Andrew Zisserman. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. International Journal of Computer Vision, 95(2):138-153, November 2011.
58. References
• Xiaogang Wang and Eric Grimson. Spatial Latent Dirichlet Allocation. Advances in Neural Information Processing Systems 20 (NIPS 2007).
• D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
• Diane J. Hu. Latent Dirichlet Allocation for Text, Images, and Music.