20 cv mil_models_for_words

  1. Computer vision: models, learning and inference. Chapter 20: Models for visual words. Please send errata to s.prince@cs.ucl.ac.uk
  2. Visual words
     • Most models treat data as continuous
     • Likelihood based on normal distribution
     • Visual words = discrete representation of image
     • Likelihood based on categorical distribution
     • Useful for difficult tasks such as scene recognition and object recognition
     Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
  3. Motivation: scene recognition
  4. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  5. Computing dictionary of visual words
     1. For every one of the I training images, select a set of J_i spatial locations.
        • Interest points
        • Regular grid
     2. Compute a descriptor at each spatial location in each image.
     3. Cluster all of these descriptor vectors into K groups using a method such as the K-means algorithm.
     4. The means of the K clusters are used as the K prototype vectors in the dictionary.
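The four steps above can be sketched in a few lines. This is only an illustration, assuming the descriptors have already been computed (steps 1 and 2) and using a plain K-means loop for step 3; the function names `build_dictionary` and `nearest_index` and the toy 2D descriptors are hypothetical, not taken from the slides.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(descriptor, prototypes):
    """Index of the prototype closest to the descriptor."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(descriptor, prototypes[k]))


def build_dictionary(descriptors, num_words, num_iters=20, seed=0):
    """Cluster descriptors into num_words groups with a basic K-means loop.

    Returns the cluster means, which act as the prototype vectors.
    """
    rng = random.Random(seed)
    # Initialise prototypes from randomly chosen descriptors (an assumption).
    prototypes = [list(d) for d in rng.sample(descriptors, num_words)]
    for _ in range(num_iters):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest_index(d, prototypes)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old prototype if a cluster empties
                dim = len(members[0])
                prototypes[k] = [sum(m[i] for m in members) / len(members)
                                 for i in range(dim)]
    return prototypes


# Toy data: two well-separated groups of 2D "descriptors".
toy_rng = random.Random(1)
descriptors = ([[toy_rng.gauss(0.0, 0.1), toy_rng.gauss(0.0, 0.1)] for _ in range(50)]
               + [[toy_rng.gauss(5.0, 0.1), toy_rng.gauss(5.0, 0.1)] for _ in range(50)])
dictionary = build_dictionary(descriptors, num_words=2)
```

In practice the descriptors would be high-dimensional (for example, gradient-based descriptors at interest points) and K would be much larger; the loop itself is unchanged.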
  6. Encoding images as visual words
     1. Select a set of J spatial locations in the image using the same method as for the dictionary.
     2. Compute the descriptor at each of the J spatial locations.
     3. Compare each descriptor to the set of K prototype descriptors in the dictionary.
     4. Assign a discrete index to this location that corresponds to the index of the closest word in the dictionary.
     End result: a discrete feature index and an (x, y) position for each location
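A minimal sketch of steps 3 and 4, assuming the descriptors and their positions are already available. The function name `encode_image` and the toy dictionary are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_image(descriptors, positions, dictionary):
    """Map each descriptor to the index of its closest dictionary word.

    Returns one (word_index, x, y) tuple per spatial location.
    """
    encoded = []
    for descriptor, (x, y) in zip(descriptors, positions):
        index = min(range(len(dictionary)),
                    key=lambda k: squared_distance(descriptor, dictionary[k]))
        encoded.append((index, x, y))
    return encoded


# Toy dictionary of K = 3 prototype descriptors (made-up values).
dictionary = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
descriptors = [[0.2, -0.1], [4.8, 5.3], [0.1, 4.7]]
positions = [(10, 20), (30, 40), (50, 60)]
encoded = encode_image(descriptors, positions, dictionary)
# encoded == [(0, 10, 20), (1, 30, 40), (2, 50, 60)]
```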
  7. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  8. Bag of words model
     Key idea:
     • Abandon all spatial information
     • Just represent the image by the relative frequency (histogram) of words from the dictionary
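The histogram representation can be illustrated as follows. This sketch assumes the image has already been encoded as a list of word indices; `word_histogram` is a hypothetical helper name.

```python
from collections import Counter


def word_histogram(word_indices, num_words):
    """Relative frequency of each dictionary word; positions are ignored."""
    if not word_indices:
        raise ValueError("image contains no visual words")
    counts = Counter(word_indices)
    total = len(word_indices)
    return [counts[k] / total for k in range(num_words)]


# Six visual words observed in one image, dictionary of K = 4 words.
hist = word_histogram([0, 2, 2, 1, 2, 0], num_words=4)
# hist is [2/6, 1/6, 3/6, 0.0]
```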
  9. Bag of words
  10. Structure
     Learning (MAP solution):
     Inference:
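The equations on this slide are not preserved in the transcript. As a rough sketch only, the code below assumes one categorical distribution over words per class with a symmetric Dirichlet prior of concentration `alpha`, uses the standard MAP estimate for that pairing, and applies Bayes' rule for inference. The names `learn_map` and `class_posterior`, the class labels, and the toy counts are all hypothetical.

```python
import math


def learn_map(counts_by_class, num_words, alpha=2.0):
    """MAP categorical parameters per class under a Dirichlet(alpha) prior.

    Assumes alpha > 1 so that every estimated probability is positive.
    """
    params = {}
    for label, count_vectors in counts_by_class.items():
        totals = [sum(c[k] for c in count_vectors) + alpha - 1.0
                  for k in range(num_words)]
        norm = sum(totals)
        params[label] = [t / norm for t in totals]
    return params


def class_posterior(counts, params, prior):
    """Posterior over classes given the word counts of a new image."""
    log_scores = {
        label: math.log(prior[label])
        + sum(c * math.log(lam[k]) for k, c in enumerate(counts))
        for label, lam in params.items()
    }
    max_log = max(log_scores.values())  # subtract the max for stability
    unnorm = {label: math.exp(s - max_log) for label, s in log_scores.items()}
    z = sum(unnorm.values())
    return {label: u / z for label, u in unnorm.items()}


# Toy training data: word-count vectors for two scene classes, K = 3 words.
training = {
    "street": [[8, 1, 1], [7, 2, 1]],
    "forest": [[1, 8, 1], [2, 6, 2]],
}
params = learn_map(training, num_words=3)
posterior = class_posterior([6, 1, 1], params, {"street": 0.5, "forest": 0.5})
```

Working in log space avoids underflow when an image contains many words.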
  11. Bag of words for object recognition
  12. Problems with bag of words
  13. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  14. Latent Dirichlet allocation
     • Describes relative frequency of visual words in a single image (no world term)
     • Words not generated independently (connected by hidden variable)
     • Analogy to text documents
       – Each image contains mixture of several topics (parts)
       – Each topic induces a distribution over words
  15. Latent Dirichlet allocation
  16. Latent Dirichlet allocation
     • Generative equations
     • Marginal distribution over features
     • Conjugate priors over parameters
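The equations themselves are missing from the transcript, so the following is only a sketch of the usual LDA generative process as described on slide 14: each image draws its own part (topic) probabilities from a Dirichlet prior, each word draws a part label, and the part label selects a categorical distribution over words. The variable names and hyperparameter values are assumptions.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_lda_image(rng, word_probs, alpha, num_words_in_image):
    """Generate part labels and visual words for one image."""
    num_parts = len(word_probs)
    # Per-image part probabilities, drawn from the Dirichlet prior.
    part_probs = sample_dirichlet(rng, alpha, num_parts)
    parts, words = [], []
    for _ in range(num_words_in_image):
        # Hidden part label for this word.
        part = rng.choices(range(num_parts), weights=part_probs)[0]
        # Visual word drawn from the distribution induced by that part.
        word = rng.choices(range(len(word_probs[part])),
                           weights=word_probs[part])[0]
        parts.append(part)
        words.append(word)
    return part_probs, parts, words


rng = random.Random(0)
num_parts, dict_size = 3, 6
# Per-part word probabilities, themselves drawn from a Dirichlet prior.
word_probs = [sample_dirichlet(rng, 0.5, dict_size) for _ in range(num_parts)]
part_probs, parts, words = sample_lda_image(rng, word_probs, alpha=1.0,
                                            num_words_in_image=20)
```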
  17. Latent Dirichlet allocation
  18. Learning LDA model
     • Part labels p are hidden variables
     • If we knew them, it would be easy to estimate the parameters
     • How about the EM algorithm? Unfortunately, the parts within an image are not independent
  19. Latent Dirichlet allocation
  20. Learning
     Strategy:
     1. Write an expression for the posterior distribution over part labels
     2. Draw samples from the posterior using MCMC
     3. Use the samples to estimate parameters
  21. 1. Posterior over part labels
     • Denominator intractable
     • Can compute the two terms in the numerator in closed form
  22. 2. Draw samples from posterior
     • Gibbs sampling: fix all part labels except one and sample from the conditional distribution
     • This can be computed in closed form
  23. 3. Use samples to estimate parameters
     • Samples substitute for the real part labels in the update equations
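The three-step strategy can be sketched as a collapsed Gibbs sampler. The conditional used below is the standard closed form for LDA with symmetric Dirichlet priors `alpha` (over parts) and `beta` (over words); it is an assumption that this matches the slides' equations, which are not preserved here. For brevity the parameters are estimated from the final sample only, and all names are hypothetical.

```python
import random


def gibbs_lda(images, num_parts, dict_size, alpha=1.0, beta=0.1,
              num_sweeps=50, seed=0):
    """Collapsed Gibbs sampling over part labels, then parameter estimates."""
    rng = random.Random(seed)
    labels = [[rng.randrange(num_parts) for _ in image] for image in images]
    part_word = [[0] * dict_size for _ in range(num_parts)]  # word counts per part
    part_total = [0] * num_parts                              # words per part
    image_part = [[0] * num_parts for _ in images]            # part counts per image

    def update_counts(i, word, part, delta):
        part_word[part][word] += delta
        part_total[part] += delta
        image_part[i][part] += delta

    for i, image in enumerate(images):
        for j, word in enumerate(image):
            update_counts(i, word, labels[i][j], +1)

    for _ in range(num_sweeps):
        for i, image in enumerate(images):
            for j, word in enumerate(image):
                # Fix all other labels: remove this word from the counts.
                update_counts(i, word, labels[i][j], -1)
                # Unnormalised conditional probability of each part label.
                weights = [
                    (part_word[m][word] + beta)
                    / (part_total[m] + dict_size * beta)
                    * (image_part[i][m] + alpha)
                    for m in range(num_parts)
                ]
                labels[i][j] = rng.choices(range(num_parts), weights=weights)[0]
                update_counts(i, word, labels[i][j], +1)

    # Sampled labels stand in for the true part labels in the estimates.
    word_probs = [[(part_word[m][k] + beta) / (part_total[m] + dict_size * beta)
                   for k in range(dict_size)] for m in range(num_parts)]
    part_probs = [[(image_part[i][m] + alpha) / (len(image) + num_parts * alpha)
                   for m in range(num_parts)] for i, image in enumerate(images)]
    return labels, word_probs, part_probs


# Toy corpus: three images encoded as lists of word indices (K = 4).
images = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 1, 2, 3, 1]]
labels, word_probs, part_probs = gibbs_lda(images, num_parts=2, dict_size=4)
```

Averaging the estimates over several well-separated samples, after discarding a burn-in period, would normally be preferable to using the last sample alone.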
  24. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  25. Single author-topic model
  26. Single author-topic model
  27. Learning
     1. Posterior over part labels
        Likelihood same as before, prior becomes
  28. Learning
     2. Draw samples from posterior
     3. Use samples to estimate parameters
  29. Inference
     • Likelihood that the words in this image are due to category n
     • Compute posterior over categories
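A sketch of this inference step, under the assumption that in the single author-topic model each category n has its own part probabilities and that the part label of each word is marginalised out. The equations are not in the transcript, so this form, the function names, and the toy parameters are assumptions.

```python
import math


def log_likelihood(words, part_probs_n, word_probs):
    """Log-likelihood of the image's words under category n.

    Each word's hidden part label is summed out.
    """
    return sum(
        math.log(sum(pi_m * lam_m[word]
                     for pi_m, lam_m in zip(part_probs_n, word_probs)))
        for word in words
    )


def category_posterior(words, part_probs, word_probs, prior):
    """Posterior over categories via Bayes' rule."""
    log_scores = [math.log(prior[n]) + log_likelihood(words, part_probs[n], word_probs)
                  for n in range(len(part_probs))]
    max_log = max(log_scores)  # subtract the max for numerical stability
    unnorm = [math.exp(s - max_log) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy parameters: 2 parts over a 4-word dictionary, 2 categories.
word_probs = [[0.7, 0.2, 0.05, 0.05],   # part 0
              [0.05, 0.05, 0.2, 0.7]]   # part 1
part_probs = [[0.9, 0.1],               # category 0 mostly uses part 0
              [0.1, 0.9]]               # category 1 mostly uses part 1
posterior = category_posterior([0, 0, 1, 0, 3], part_probs, word_probs,
                               prior=[0.5, 0.5])
```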
  30. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  31. Problems with bag of words
  32. Constellation model
  33. Constellation model
  34. Learning
     1. Posterior over part labels
        Prior same as before, likelihood becomes
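The likelihood equation is missing from the transcript. The sketch below assumes that each part in the constellation model generates both a visual word (categorical) and its position (a 2D normal, taken here to have diagonal covariance for simplicity), and shows how position alone can then decide the part label. All names and numbers are hypothetical.

```python
import math


def log_normal_diag(x, mean, var):
    """Log density of a normal distribution with diagonal covariance."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - 0.5 * (xi - mu) ** 2 / v
               for xi, mu, v in zip(x, mean, var))


def log_likelihood(word, position, part):
    """Joint log-likelihood of a word and its position given a part."""
    return (math.log(part["word_probs"][word])
            + log_normal_diag(position, part["mean"], part["var"]))


def part_posterior(word, position, parts, part_prior):
    """Posterior over the part label for a single observed word."""
    log_scores = [math.log(prior) + log_likelihood(word, position, part)
                  for prior, part in zip(part_prior, parts)]
    max_log = max(log_scores)
    unnorm = [math.exp(s - max_log) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two parts with identical word distributions but different locations,
# so only the position distinguishes them.
parts = [
    {"word_probs": [0.5, 0.5], "mean": (10.0, 10.0), "var": (4.0, 4.0)},
    {"word_probs": [0.5, 0.5], "mean": (50.0, 50.0), "var": (4.0, 4.0)},
]
posterior = part_posterior(word=0, position=(12.0, 9.0), parts=parts,
                           part_prior=[0.5, 0.5])
```

A bag of words model would assign both parts equal probability in this example, since it discards the position.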
  35. Learning
     2. Draw samples from posterior
     3. Use samples to estimate parameters
        Part and word probabilities as before
  36. Inference
     • Likelihood that the words in this image are due to category n
     • Compute posterior over categories
  37. Learning
  38. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  39. Problems with bag of words
  40. Scene model
  41. Scene model
  42. Structure
     • Computing visual words
     • Bag of words model
     • Latent Dirichlet allocation
     • Single author-topic model
     • Constellation model
     • Scene model
     • Applications
  43. Video Google
  44. Action recognition
     • Spatio-temporal bag of words model
     • 91.8% classification
  45. Action recognition
