7. Applications
• Text Mining
  • Identifying similar chapters in a book
• Computer Vision
  • Face Recognition
• Colocation Mining
  • Identifying forest fires
• Music Search
  • Identifying the genre of music based on a segment of the song
14. LDA - Overview
• A generative probabilistic model
• Represented in terms of words, documents, a corpus, and labels (a toy example follows):
  • word - the primary unit of discrete data
  • document - a sequence of words
  • corpus - the collection of all documents
  • label (output) - the class of the document
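To make these terms concrete, here is a minimal sketch of how such a corpus might be represented in Python; the vocabulary, documents, and labels are invented toy data, not from the paper.

```python
# Toy LDA data representation (illustrative only).

# word: each discrete word maps to an index in the vocabulary
vocabulary = {"grass": 0, "cow": 1, "sky": 2, "sheep": 3}

# document: a sequence of words, encoded as vocabulary indices
doc_a = [0, 1, 0, 2]   # "grass cow grass sky"
doc_b = [3, 0, 3, 2]   # "sheep grass sheep sky"

# corpus: the collection of all documents
corpus = [doc_a, doc_b]

# label (output): the class of each document
labels = ["cow", "sheep"]
```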
20. Wait a minute… So how are we going to perform computer vision applications using words and documents?
• Here, words represent visual words, which could consist of:
  • image patches
  • spatial and temporal interest points
  • moving pixels, etc.
• The paper takes image classification in computer vision as its example.
24. Data Preprocessing
• The image is convolved with a bank of filters: 3 Gaussians, 4 Laplacians of Gaussians, and 4 first-order derivatives of Gaussians.
• A grid divides the image into local patches, and each patch is sampled densely for a local descriptor.
• The local descriptors of all patches in the entire image set are clustered using k-means and stored in an auxiliary data structure (let's call it a "Workbook"); a sketch of the pipeline follows.
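A rough sketch of this pipeline in Python, assuming grayscale images as 2-D NumPy arrays; the filter scales, grid step, and Workbook size here are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def filter_bank_responses(image):
    """Convolve the image with 3 Gaussians, 4 Laplacians of Gaussians,
    and 4 first-order derivatives of Gaussians (scales are assumptions)."""
    responses = []
    for s in (1.0, 2.0, 4.0):                 # 3 Gaussians
        responses.append(ndimage.gaussian_filter(image, s))
    for s in (1.0, 2.0, 4.0, 8.0):            # 4 Laplacians of Gaussians
        responses.append(ndimage.gaussian_laplace(image, s))
    for s in (2.0, 4.0):                      # 4 first-order derivatives (x and y)
        responses.append(ndimage.gaussian_filter(image, s, order=(0, 1)))
        responses.append(ndimage.gaussian_filter(image, s, order=(1, 0)))
    return np.stack(responses, axis=-1)       # H x W x 11 response stack

def dense_patch_descriptors(responses, step=8):
    """Sample the grid densely: one 11-dimensional descriptor per grid point."""
    h, w, _ = responses.shape
    return np.array([responses[y, x]
                     for y in range(step // 2, h, step)
                     for x in range(step // 2, w, step)])

def build_workbook(images, n_words=200):
    """Cluster all descriptors from the image set with k-means; the cluster
    centers form the "Workbook" (one visual word per cluster)."""
    descriptors = np.vstack([dense_patch_descriptors(filter_bank_responses(im))
                             for im in images])
    return KMeans(n_clusters=n_words, n_init=10).fit(descriptors)
```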
33. Clustering using LDA
• Framework:
  • M documents (images)
  • Each document j has N_j words
  • w_ji is the observed value of word i in document j
  • All words will be clustered into K topics
  • Each topic k is modeled as a multinomial distribution over the Workbook
  • 𝛼 and β are Dirichlet prior hyperparameters
  • ɸ_k, π_j and z_ji are the hidden variables (a toy sketch follows)
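With these quantities in place, the generative algorithm described on the next slide can be sketched directly; the sizes and hyperparameter values below are illustrative toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, V = 100, 10, 200     # documents (images), topics, Workbook size
alpha, beta = 0.5, 0.1     # Dirichlet hyperparameters (illustrative values)

phi = rng.dirichlet(np.full(V, beta), size=K)    # phi_k ~ Dir(beta), one row per topic
pi = rng.dirichlet(np.full(K, alpha), size=M)    # pi_j ~ Dir(alpha), one row per document

docs, topics = [], []
for j in range(M):
    N_j = int(rng.integers(50, 150))                         # N_j words in document j
    z_j = rng.choice(K, size=N_j, p=pi[j])                   # z_ji ~ Discrete(pi_j)
    w_j = np.array([rng.choice(V, p=phi[k]) for k in z_j])   # w_ji ~ Discrete(phi_{z_ji})
    topics.append(z_j)
    docs.append(w_j)
```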
34. Clustering using LDA (contd)
• Generative algorithm:
  • For each topic k, a multinomial parameter ɸ_k is sampled from the Dirichlet prior: ɸ_k ~ Dir(β)
  • For each document j, a multinomial parameter π_j over the K topics is sampled from the Dirichlet prior: π_j ~ Dir(𝛼)
  • For each word i in document j, a topic label z_ji is sampled from the discrete distribution z_ji ~ Discrete(π_j)
  • The value w_ji of word i in document j is sampled from the discrete distribution of topic z_ji: w_ji ~ Discrete(ɸ_{z_ji})
• z_ji is sampled through a Gibbs sampling procedure (sketched below), where:
  • n^(k)_{-ji,w} is the number of words in the corpus with value w assigned to topic k, excluding word i in document j
  • n^(j)_{-ji,k} is the number of words in document j assigned to topic k, excluding word i in document j
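The slide names only the count statistics; the conditional below is the standard collapsed Gibbs update for LDA built from those counts, sketched under the assumption that they are kept as NumPy arrays.

```python
import numpy as np

def gibbs_sweep(docs, z, n_kw, n_jk, n_k, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling over all z_ji.
    n_kw[k, w]: words with value w assigned to topic k (corpus-wide)
    n_jk[j, k]: words in document j assigned to topic k
    n_k[k]:     total words assigned to topic k (row sums of n_kw)"""
    K, V = n_kw.shape
    for j, w_j in enumerate(docs):
        for i, w in enumerate(w_j):
            k_old = z[j][i]
            # remove word (j, i) from the counts: the "-ji" quantities above
            n_kw[k_old, w] -= 1; n_jk[j, k_old] -= 1; n_k[k_old] -= 1
            # p(z_ji = k | rest) ∝ (n_jk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (n_jk[j] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # add the word back under its newly sampled topic
            n_kw[k_new, w] += 1; n_jk[j, k_new] += 1; n_k[k_new] += 1
            z[j][i] = k_new
```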
37. What’s the issue with LDA?
• The spatial and temporal components of the visual words are not considered, so co-occurrence information is not utilized.
• Consider a series of images of animals with grass as the background. Since we assume an image to be a document, and the animal is only a small part of the image, the image would most likely be classified as grass.
43. How can we resolve it?
• Use a grid layout on each image; each region of the grid could be considered a document.
  • But how would you handle a patch that overlaps two regions?
• We could use overlapping regions as documents.
  • But since several overlapping documents could contain the same patch, how would you decide which document it should belong to?
• So we could replace each document (region) with a point, and assign a patch to the document whose point it is closest to (see the sketch below).
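A sketch of that nearest-point assignment, with a hypothetical 4x4 grid of document points over a 128x128 image:

```python
import numpy as np

def assign_patches_to_documents(patch_xy, doc_xy):
    """patch_xy: (P, 2) patch coordinates; doc_xy: (D, 2) document points.
    Returns the index of the nearest document point for each patch."""
    # squared Euclidean distance from every patch to every document point
    d2 = ((patch_xy[:, None, :] - doc_xy[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

# Hypothetical document points: centers of a 4x4 grid over a 128x128 image
centers = np.array([(16 + 32 * r, 16 + 32 * c)
                    for r in range(4) for c in range(4)], dtype=float)
patches = np.array([[10.0, 12.0], [100.0, 60.0]])
print(assign_patches_to_documents(patches, centers))   # -> [ 0 13]
```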
44. Clustering using Spatial LDA
• Framework:
  • Besides the parameters used in LDA, spatial information is also captured.
  • A hidden variable d_i indicates the document that word i is assigned to.
  • For each document j, g^d_j, x^d_j and y^d_j represent the image index, x coordinate and y coordinate of the document, respectively.
  • For each word i, g_i, x_i and y_i represent the image index, x coordinate and y coordinate of the word, respectively.
• Generative algorithm:
  • For each topic k, a multinomial parameter ɸ_k is sampled from the Dirichlet prior: ɸ_k ~ Dir(β)
  • For each document j, a multinomial parameter π_j over the K topics is sampled from the Dirichlet prior: π_j ~ Dir(𝛼)
  • For each word i, a random variable d_i is sampled from the prior p(d_i | η), indicating the document of word i (the spatial step is sketched below).
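A sketch of the spatial part of this generative process, assuming a uniform prior η over documents and an isotropic Gaussian kernel; σ and the document points are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
doc_xy = np.array([[16.0, 16.0], [16.0, 48.0], [48.0, 16.0], [48.0, 48.0]])
eta = np.full(len(doc_xy), 1.0 / len(doc_xy))   # uniform prior p(d_i | eta)
sigma = 8.0                                     # kernel width (assumption)

d_i = rng.choice(len(doc_xy), p=eta)            # d_i ~ p(d_i | eta)
# The word's location is drawn around its document's point; its image index
# g_i is that of the chosen document.
x_i, y_i = rng.normal(loc=doc_xy[d_i], scale=sigma)   # c_i ~ N(c^d_{d_i}, sigma^2 I)
```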
45. Clustering using Spatial LDA (contd)
• Generative algorithm (continued):
  • The image index and location of word i, c_i = (g_i, x_i, y_i), are chosen from the distribution p(c_i | c^d_{d_i}, 𝝈), for which a Gaussian kernel is chosen.
  • For word i in document d_i, a topic label z_i is sampled from the discrete distribution z_i ~ Discrete(π_{d_i})
  • The value w_i of word i is sampled from the discrete distribution of topic z_i: w_i ~ Discrete(ɸ_{z_i})
• z_i is sampled through a Gibbs sampling procedure, where:
  • n^(k)_{-i,w} is the number of words in the corpus with value w assigned to topic k, excluding word i
  • n^(j)_{-i,k} is the number of words in document j assigned to topic k, excluding word i
• The conditional distribution of d_i combines the spatial kernel with the document-topic counts, as sketched below.
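Since the formula appears in the slides only as a figure, the sketch below gives the general shape of that conditional, combining the Gaussian spatial kernel with a document-topic term built from the counts above; it is a reconstruction under those assumptions, not the paper's exact expression.

```python
import numpy as np

def sample_d_i(word_xy, doc_xy, n_jk, n_j, z_i, alpha, K, sigma, rng):
    """word_xy: (2,) location of word i; doc_xy: (D, 2) document points;
    n_jk[j, k]: words in document j with topic k, excluding word i;
    n_j[j]:     total words in document j, excluding word i."""
    d2 = ((doc_xy - word_xy) ** 2).sum(axis=1)
    spatial = np.exp(-d2 / (2.0 * sigma ** 2))            # p(c_i | c^d_j, sigma)
    topical = (n_jk[:, z_i] + alpha) / (n_j + K * alpha)  # document-topic term
    p = spatial * topical
    return rng.choice(len(doc_xy), p=p / p.sum())
```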
52. What the paper missed
• Comparisons with other standard clustering methods could have been included to highlight the efficiency of the algorithm.
• For the given experimental data, some intuition on the selection of the input parameters 𝛼, β and η could have been provided.
• In the case of moving images, the temporal aspect is ignored. In the future, it could be added as a parameter and the algorithm updated accordingly.
• A few advancements building on the paper have since been made:
  • James Philbin, Josef Sivic and Andrew Zisserman. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. International Journal of Computer Vision, 95(2):138-153, November 2011.
58. References
• Xiaogang Wang and Eric Grimson. Spatial Latent Dirichlet Allocation. Advances in Neural Information Processing Systems 20 (NIPS 2007).
• D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
• Diane J. Hu. Latent Dirichlet Allocation for Text, Images, and Music.