20 cv mil_models_for_words

Published in Technology, Business
Transcript

  • 1. Computer vision: models, learning and inference. Chapter 20: Models for visual words. Please send errata to s.prince@cs.ucl.ac.uk
  • 2. Visual words
      • Most models treat data as continuous
      • Likelihood based on normal distribution
      • Visual words = discrete representation of image
      • Likelihood based on categorical distribution
      • Useful for difficult tasks such as scene recognition and object recognition
    Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
  • 3. Motivation: scene recognition
  • 4. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 5. Computing dictionary of visual words
      1. For every one of the I training images, select a set of J_i spatial locations.
          • Interest points
          • Regular grid
      2. Compute a descriptor at each spatial location in each image.
      3. Cluster all of these descriptor vectors into K groups using a method such as the K-means algorithm.
      4. The means of the K clusters are used as the K prototype vectors in the dictionary.
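Steps 3 and 4 above can be sketched with a plain K-means loop. This is a minimal illustration, not the author's implementation: the function name `build_dictionary`, the random initialisation and the fixed iteration count are all assumptions, and the descriptors are assumed to be already stacked into one N x D array.

```python
import numpy as np

def build_dictionary(descriptors, K, n_iters=20, seed=0):
    """Cluster descriptors (N x D) into K groups; return the K x D cluster means."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: K distinct descriptors chosen at random.
    start = rng.choice(len(descriptors), size=K, replace=False)
    means = descriptors[start].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every descriptor to every prototype.
        d2 = ((descriptors[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = descriptors[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous mean
                means[k] = members.mean(axis=0)
    return means
```

The returned rows play the role of the K prototype vectors. A production system would more likely use a library K-means with better initialisation.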
  • 6. Encoding images as visual words
      1. Select a set of J spatial locations in the image using the same method as for the dictionary.
      2. Compute the descriptor at each of the J spatial locations.
      3. Compare each descriptor to the set of K prototype descriptors in the dictionary.
      4. Assign a discrete index to this location that corresponds to the index of the closest word in the dictionary.
    End result: a discrete feature index plus an x and y position at each location
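Steps 3 and 4 of the encoding amount to a nearest-prototype lookup. The sketch below assumes Euclidean distance and hypothetical argument names (`descriptors`, `positions`, `dictionary`); the slides do not specify the distance measure.

```python
import numpy as np

def encode_visual_words(descriptors, positions, dictionary):
    """Map J descriptors (J x D) to indices of the closest of K prototypes (K x D).

    Returns the (J,) word indices together with the (J, 2) x, y positions,
    which are passed through unchanged.
    """
    # Squared Euclidean distance between each descriptor and each prototype.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return words, positions
```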
  • 7. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 8. Bag of words model
    Key idea:
      • Abandon all spatial information
      • Just represent image by relative frequency (histogram) of words from dictionary
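As a small illustration of the key idea, the word indices of an image can be reduced to a histogram once the positions are discarded. The function name `word_histogram` is an assumption, and the image is assumed to contain at least one word.

```python
import numpy as np

def word_histogram(words, K):
    """Return per-word counts and relative frequencies for one image."""
    # minlength=K keeps a bin for dictionary words that never occur.
    counts = np.bincount(words, minlength=K)
    return counts, counts / counts.sum()
```

For example, the indices `[0, 2, 2, 1, 2]` with K = 4 give counts `[1, 1, 3, 0]`.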
  • 9. Bag of words
  • 10. Structure
    Learning (MAP solution):
    Inference:
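The equations for this slide are not in the transcript, so the following is only a sketch under stated assumptions: each class n has a categorical distribution over the K words, the prior over its parameters is a symmetric Dirichlet with parameter `alpha` (assumed at least 1 so that the mode exists), and inference applies Bayes' rule over classes. The function names are hypothetical.

```python
import numpy as np

def learn_map(counts_per_class, alpha=2.0):
    """MAP categorical parameters from an (N, K) array of word counts per class.

    Assumes a symmetric Dirichlet prior, which adds (alpha - 1) to every count.
    """
    num = counts_per_class + alpha - 1.0
    return num / num.sum(axis=1, keepdims=True)

def class_posterior(counts, lam, prior):
    """Posterior over N classes for one image with (K,) word counts."""
    # Log likelihood of the counts under each class, plus the log prior.
    log_post = np.log(lam) @ counts + np.log(prior)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()
```

Working in the log domain avoids underflow when an image contains many words.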
  • 11. Bag of words for object recognition
  • 12. Problems with bag of words
  • 13. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 14. Latent Dirichlet allocation
      • Describes relative frequency of visual words in a single image (no world term)
      • Words not generated independently (connected by hidden variable)
      • Analogy to text documents
          – Each image contains mixture of several topics (parts)
          – Each topic induces a distribution over words
  • 15. Latent Dirichlet allocation
  • 16. Latent Dirichlet allocation
    Generative equations
    Marginal distribution over features
    Conjugate priors over parameters
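The generative equations themselves did not survive in the transcript. The sketch below follows the usual LDA generative process as an assumption: part probabilities for the image are drawn from a symmetric Dirichlet, each word first draws a part label, then draws a word index from that part's categorical distribution. The names `sample_lda_image`, `lam`, `alpha` and `J` are illustrative.

```python
import numpy as np

def sample_lda_image(lam, alpha, J, rng):
    """Draw J visual words for one image from an LDA model.

    lam   : (M, K) word probabilities for each of the M parts
    alpha : symmetric Dirichlet parameter for the part probabilities
    """
    M, K = lam.shape
    pi = rng.dirichlet(np.full(M, alpha))      # part probabilities for this image
    parts = rng.choice(M, size=J, p=pi)        # hidden part label for each word
    words = np.array([rng.choice(K, p=lam[m]) for m in parts])
    return pi, parts, words
```

A usage example: `lam = rng.dirichlet(np.full(K, beta), size=M)` draws the per-part word distributions from their own Dirichlet prior before images are sampled.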
  • 17. Latent Dirichlet allocation
  • 18. Learning LDA model
      • Part labels p are hidden variables
      • If we knew them, it would be easy to estimate the parameters
      • How about the EM algorithm? Unfortunately, parts within an image are not independent
  • 19. Latent Dirichlet allocation
  • 20. Learning
    Strategy:
      1. Write an expression for posterior distribution over part labels
      2. Draw samples from posterior using MCMC
      3. Use samples to estimate parameters
  • 21. 1. Posterior over part labels
    Denominator intractable
    Can compute two terms in numerator in closed form
  • 22. 2. Draw samples from posterior
    Gibbs sampling: fix all part labels except one and sample from conditional distribution
    This can be computed in closed form
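The closed-form conditional is not in the transcript. The sketch below uses the collapsed Gibbs update that is standard for LDA with symmetric Dirichlet priors, which is an assumption about what the slide shows: with the current word's label removed from the counts, part m is chosen with probability proportional to (count of this word in part m + beta) / (words in part m + K beta) times (count of part m in this image + alpha). All names are hypothetical.

```python
import numpy as np

def gibbs_lda(images, K, M, alpha=1.0, beta=1.0, n_sweeps=50, seed=0):
    """Sample part labels for a list of integer word-index arrays, one per image."""
    rng = np.random.default_rng(seed)
    parts = [rng.integers(M, size=len(f)) for f in images]  # random initial labels
    n_mk = np.zeros((M, K))            # times word k is assigned to part m
    n_im = np.zeros((len(images), M))  # times part m is used in image i
    for i, (f, p) in enumerate(zip(images, parts)):
        np.add.at(n_mk, (p, f), 1)
        np.add.at(n_im[i], p, 1)
    for _ in range(n_sweeps):
        for i, (f, p) in enumerate(zip(images, parts)):
            for j in range(len(f)):
                # Fix every label except this one by removing it from the counts.
                n_mk[p[j], f[j]] -= 1
                n_im[i, p[j]] -= 1
                cond = ((n_mk[:, f[j]] + beta) / (n_mk.sum(axis=1) + K * beta)
                        * (n_im[i] + alpha))
                cond /= cond.sum()
                # Sample the new label and restore the counts.
                p[j] = rng.choice(M, p=cond)
                n_mk[p[j], f[j]] += 1
                n_im[i, p[j]] += 1
    return parts, n_mk, n_im
```

Recomputing `n_mk.sum(axis=1)` for every word keeps the sketch short; a practical sampler would maintain that total incrementally.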
  • 23. 3. Use samples to estimate parameters
    Samples substitute in for real part labels in update equations
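The update equations are missing from the transcript. One common choice, assumed here, is to treat a sample of part labels as if it were observed and form smoothed count ratios under symmetric Dirichlet priors with parameters `alpha` and `beta`. The function name and arguments are illustrative.

```python
import numpy as np

def estimate_parameters(images, parts, K, M, alpha=1.0, beta=1.0):
    """Estimate word probabilities per part and part probabilities per image.

    images, parts : lists of integer arrays (word indices and sampled part labels)
    """
    n_mk = np.zeros((M, K))            # word k assigned to part m
    n_im = np.zeros((len(images), M))  # part m used in image i
    for i, (f, p) in enumerate(zip(images, parts)):
        np.add.at(n_mk, (p, f), 1)
        np.add.at(n_im[i], p, 1)
    # Smoothed relative frequencies; each row sums to one.
    lam = (n_mk + beta) / (n_mk + beta).sum(axis=1, keepdims=True)
    pi = (n_im + alpha) / (n_im + alpha).sum(axis=1, keepdims=True)
    return lam, pi
```

Estimates from several samples could be averaged, although care is needed because part labels may be permuted between independent chains.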
  • 24. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 25. Single author-topic model
  • 26. Single author-topic model
  • 27. Learning
      1. Posterior over part labels
    Likelihood same as before, prior becomes
  • 28. Learning
      2. Draw samples from posterior
      3. Use samples to estimate parameters
  • 29. Inference
    Likelihood that words in this image are due to category n
    Compute posterior over categories
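The inference equations are not in the transcript. The sketch below assumes that each category n has its own part probabilities `pi[n]`, that the parts share word distributions `lam`, and that the hidden part label of every word is marginalised out before Bayes' rule is applied over categories. The names are hypothetical.

```python
import numpy as np

def category_posterior(words, pi, lam, prior):
    """Posterior over N categories for one image.

    words : (J,) visual word indices
    pi    : (N, M) part probabilities for each category
    lam   : (M, K) word probabilities for each part
    prior : (N,) prior over categories
    """
    # Pr(word k | category n) after summing over the M parts: an (N, K) array.
    word_probs = pi @ lam
    log_like = np.log(word_probs[:, words]).sum(axis=1)
    log_post = log_like + np.log(prior)
    log_post -= log_post.max()  # subtract the max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()
```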
  • 30. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 31. Problems with bag of words
  • 32. Constellation model
  • 33. Constellation model
  • 34. Learning
      1. Posterior over part labels
    Prior same as before, likelihood becomes
  • 35. Learning
      2. Draw samples from posterior
      3. Use samples to estimate parameters
    Part and word probabilities as before
  • 36. Inference
    Likelihood that words in this image are due to category n
    Compute posterior over categories
  • 37. Learning
  • 38. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 39. Problems with bag of words
  • 40. Scene model
  • 41. Scene model
  • 42. Structure
      • Computing visual words
      • Bag of words model
      • Latent Dirichlet allocation
      • Single author-topic model
      • Constellation model
      • Scene model
      • Applications
  • 43. Video Google
  • 44. Action recognition
    Spatio-temporal bag of words model
    91.8% classification
  • 45. Action recognition