Lecture 02 internet video search

Transcript

  • 1. Before the break: Color, texture, time, spatial structure; Gauss does it all… but not invariance, which is badly needed.
  • 2. Before the break: The most basic systems. System 1 (Swain & Ballard) matches colors; System 2 (Blobworld) matches texture blobs. We need more invariance.
  • 3. 4. Descriptors
  • 4. Patch descriptors: For 4x4 patches, find the local gradient directions over t. Count the directions per patch: the 128-D SIFT histogram. Lowe IJCV 2004.
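As a rough illustration of the counting step, a minimal NumPy sketch of a 4x4-cell, 8-orientation-bin patch histogram (4 × 4 × 8 = 128 dimensions). The real SIFT descriptor of Lowe additionally uses Gaussian weighting, trilinear interpolation and clipped renormalisation, so this shows only the idea, not the reference implementation.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Crude 128-D descriptor: 4x4 grid of cells, 8 orientation bins per cell."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)           # gradient direction in [0, 2*pi)
    h, w = patch.shape
    hist = np.zeros((4, 4, 8))
    for i in range(4):                               # 4x4 spatial cells
        for j in range(4):
            ys = slice(i * h // 4, (i + 1) * h // 4)
            xs = slice(j * w // 4, (j + 1) * w // 4)
            bins = (ang[ys, xs] / (2 * np.pi) * 8).astype(int) % 8
            np.add.at(hist[i, j], bins.ravel(), mag[ys, xs].ravel())
    d = hist.ravel()                                 # 4 * 4 * 8 = 128 dimensions
    return d / (np.linalg.norm(d) + 1e-12)

descriptor = sift_like_descriptor(np.random.rand(16, 16))  # a 128-D vector
```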
  • 5. Affine patch descriptor: Compute the prominent direction. Start with centrally Gaussian-distributed weights in W. Compute the 2nd-order moment matrix M_k over all directions and adapt the weights to the elliptic shape:
        M_k = [ Σ w_k(x,y) f_x f_x    Σ w_k(x,y) f_x f_y
                Σ w_k(x,y) f_x f_y    Σ w_k(x,y) f_y f_y ]
    Iterate W_{k+1} = M_k W_k until there is no longer any change.
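The update rule on the slide can be written down almost literally. A schematic sketch follows, assuming fx and fy are the gradient images of a patch; it applies the stated iteration W_{k+1} = M_k W_k with an ad hoc normalisation to keep the weights bounded, and is not a full, production-grade affine adaptation.

```python
import numpy as np

def second_moment_matrix(fx, fy, w):
    """2nd-order moment matrix M_k of the gradients under weights w."""
    return np.array([[np.sum(w * fx * fx), np.sum(w * fx * fy)],
                     [np.sum(w * fx * fy), np.sum(w * fy * fy)]])

def affine_adapt(fx, fy, n_iter=20, tol=1e-4):
    """Iterate W_{k+1} = M_k W_k until the shape matrix no longer changes."""
    h, w = fx.shape
    yy, xx = np.mgrid[0:h, 0:w]
    yy = yy - (h - 1) / 2.0
    xx = xx - (w - 1) / 2.0
    W = np.eye(2)                                    # start with circular weights
    scale = (0.25 * min(h, w)) ** 2
    for _ in range(n_iter):
        # elliptic, centrally peaked Gaussian weights induced by the current W
        q = (W[0, 0] * xx * xx + 2 * W[0, 1] * xx * yy + W[1, 1] * yy * yy) / scale
        weights = np.exp(-0.5 * q)
        M = second_moment_matrix(fx, fy, weights)
        W_new = M @ W
        W_new = W_new / np.linalg.norm(W_new)        # keep the scale bounded
        if np.linalg.norm(W_new - W) < tol:
            break
        W = W_new
    return W                                         # describes the elliptic shape
```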
  • 6. Color patch descriptors: invariance properties per descriptor. van de Sande PAMI 2010.
        Descriptor   | Light intensity change | Light intensity shift | Light intensity change and shift | Light color change | Light color change and shift
        SIFT         | + | + | + | - | -
        OpponentSIFT | + | + | + | - | -
        C-SIFT       | + | - | - | - | -
        RGB-SIFT     | + | + | + | + | +
  • 7. Results on PASCAL VOC 2007
  • 8. Results per object category: bar chart of Average Precision (0.0 to 0.9) per PASCAL VOC class, from bottle and pottedplant up to aeroplane and person, comparing OpponentSIFT (L2 norm) with the two-channel I+C (L2 norm) run; MAP indicated.
  • 9. Corner selector: The change energy at x over a small vector (u, v) is E_xy(u, v) ≈ [u v] M [u v]^T, with
        M = [ f_x f_x    f_x f_y
              f_x f_y    f_y f_y ]
    Since M is symmetric, M = R^{-1} [ λ_1 0 ; 0 λ_2 ] R, which gives the directions of the fastest change; the ellipse axes are (λ_max)^{-1/2} and (λ_min)^{-1/2}. For a corner, both eigenvalues should be large. Corner response: det M − k (trace M)², with det M = λ_1 λ_2 = I_x² I_y² − (I_x I_y)² and trace M = λ_1 + λ_2 = I_x² + I_y².
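A compact NumPy sketch of this corner response, assuming a grayscale image as an array: the entries of M are gradient products averaged over a small Gaussian window, and det M − k (trace M)² is large only where both eigenvalues are large.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.0, k=0.04):
    """Harris corner response det(M) - k * trace(M)^2 per pixel."""
    iy, ix = np.gradient(image.astype(float))
    # entries of the second-moment matrix M, smoothed over a neighbourhood
    ixx = gaussian_filter(ix * ix, sigma)
    iyy = gaussian_filter(iy * iy, sigma)
    ixy = gaussian_filter(ix * iy, sigma)
    det_m = ixx * iyy - ixy ** 2         # = lambda1 * lambda2
    trace_m = ixx + iyy                  # = lambda1 + lambda2
    return det_m - k * trace_m ** 2      # large where both eigenvalues are large

corners = harris_response(np.random.rand(64, 64))
```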
  • 10. Directionality of gradients
  • 11. Harris’ stability
  • 12. Blob detector: 2D Laplacian L = σ² ( G_xx(x, y, σ) + G_yy(x, y, σ) ); DoG = G(x, y, kσ) − G(x, y, σ). The Laplacian has a single maximum at the size of the blob; multiply by σ² to normalise across scales.
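A small sketch of both forms, assuming SciPy's Gaussian filters: the σ²-normalised Laplacian and its DoG approximation. Scanning σ and taking the peak of the normalised response at a pixel gives the blob size there.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def normalised_laplacian(image, sigma):
    """sigma^2 * (G_xx + G_yy): the scale-normalised Laplacian of Gaussian."""
    return sigma ** 2 * gaussian_laplace(image.astype(float), sigma)

def dog_response(image, sigma, k=1.6):
    """Difference of Gaussians, a cheap approximation of the normalised Laplacian."""
    img = image.astype(float)
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)

# the blob size at a pixel is the sigma at which the normalised response peaks
image = np.random.rand(64, 64)
sigmas = np.geomspace(1.0, 16.0, num=12)
stack = np.stack([normalised_laplacian(image, s) for s in sigmas])
best_scale = sigmas[np.abs(stack[:, 32, 32]).argmax()]
```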
  • 13. Laplace blob detector
  • 14. Laplace blob detector
  • 15. Laplace blob detector
  • 16. DoG detection + SIFT description. Jepson 2005.
  • 17. System 3: patch detection. System 3 is an app: Stitching. http://www.cloudburstresearch.com/
  • 18. 4. Conclusion: Patch descriptors bring local orderless information. Best combined with color invariance for illumination. Scene-pose-illumination invariance brings meaning. Lee Comm. ACM 2011.
  • 19. 5. Words & Similarity
  • 20. Before words: 1000 patches, 128 features; 1,000,000 images ~ 11.5 days / 100 GByte.
  • 21. Capture the pattern in a patch: Measure the pattern in a patch with abundant features. More is better. Different is better. Normalized is better.
  • 22. Sample many patches: Sample the patches in the image. Dense 256 K words, salient 1 K words. Salience is good. Dense is better. Combined is even better. Salient is memory efficient. Dense is compute efficient.
  • 23. Sample many images: Sample the images in the world: the learning set. Learn all relevant distinctions. Learn all irrelevant variations not covered by the invariance of the features.
  • 24. Form a dictionary of words: Form regions in feature space. Size 4,000 (general) to 400,000 (buildings). Random forest is good and fast; 4 runs, 10 deep, is OK.
  • 25. Count words per image: Retain the word boundaries. Fill the histogram of words per training image.
  • 26. Map the histogram into similarity space: In 4096-D word-count space, 1 point is 1 image. Hard assignment: one patch, one word.
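A minimal sketch of this counting step with hard assignment (one patch, one word), assuming the codebook is already given as an array of word centers; the resulting histogram is the image's point in word-count space.

```python
import numpy as np
from scipy.spatial.distance import cdist

def word_histogram(descriptors, codebook):
    """Hard-assign each patch descriptor to its nearest visual word and count."""
    words = cdist(descriptors, codebook).argmin(axis=1)    # one patch -> one word
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / (hist.sum() + 1e-12)                     # normalised word counts

codebook = np.random.rand(4096, 128)          # e.g. 4096 visual words of 128-D SIFT
descriptors = np.random.rand(1000, 128)       # ~1000 patches sampled from one image
hist = word_histogram(descriptors, codebook)  # one point in 4096-D word-count space
```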
  • 27. Learn histogram similarity: Learn the distinction between the image histograms, sorted per class of images in the learning set. The histogram is V_d = (t_1, t_2, …, t_i, …, t_n)^T, where t_i is the total of occurrences of the visual word i. The number of words in common is the intersection between query and image: S_q = V_q ∩ V_j.
  • 28. Classify an unknown image: Retain the word-count discrimination + support vectors. Go from patch to patch > words > counts > discriminate.
  • 29. System 4: Oxford building search. http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/index.html
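For word-count histograms, the intersection in S_q = V_q ∩ V_j is usually computed as the bin-wise minimum; a tiny sketch with made-up counts:

```python
import numpy as np

def histogram_intersection(v_q, v_j):
    """Number of visual words the query and the database image have in common."""
    return np.minimum(v_q, v_j).sum()

v_q = np.array([3.0, 0.0, 2.0, 5.0])       # word counts of the query
v_j = np.array([1.0, 4.0, 2.0, 0.0])       # word counts of a database image
print(histogram_intersection(v_q, v_j))    # min per bin: 1 + 0 + 2 + 0 = 3
```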
  • 30. Note 1: Soft assignment is better. Soft assignment: assign to multiple clusters, weighted by the distance to the center. Pooled single sigma for all codebook elements. van Gemert, PAMI 2010.
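A sketch of the soft-assignment variant, assuming one pooled sigma for the whole codebook as on the slide: each patch now spreads Gaussian-weighted mass over all words instead of voting for a single one. The details of van Gemert's kernel codebooks differ, so treat this as the idea only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_word_histogram(descriptors, codebook, sigma):
    """Soft assignment: each patch spreads Gaussian-weighted mass over all words."""
    d2 = cdist(descriptors, codebook, 'sqeuclidean')
    w = np.exp(-d2 / (2 * sigma ** 2))       # one pooled sigma for the whole codebook
    w /= w.sum(axis=1, keepdims=True)        # each patch still contributes 1 in total
    hist = w.sum(axis=0)
    return hist / hist.sum()
```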
  • 31. Note 2: SVM similarity is better. The SVM can reconstruct a complex geometry at the boundary, including disjoint subspaces. The distance metric in the kernel is important.
  • 32. Note 2: nonlinear SVMs. How to transform the data such that the samples from the two classes are separable by a linear function (preferably with margin)? Or, equivalently, define a kernel that does this for you straight away. Vapnik, 1995.
  • 33. Note 2: χ²-kernels. Because χ² is meant to discriminate histograms! Zhang, IJCV '07.
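The exponentiated χ² kernel between two word histograms, as a short sketch (γ is a free bandwidth parameter, not something specified on the slide):

```python
import numpy as np

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponentiated chi-squared kernel between two normalised word histograms."""
    chi2 = np.sum((h1 - h2) ** 2 / (h1 + h2 + 1e-12))
    return np.exp(-gamma * chi2)
```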
  • 34. Note 2: … or multiple kernels. Let multiple kernel learning determine the weight of all features.
        Descriptors                    | Norm = L2 | #  | Norm ∈ L | #
        SIFT                           | 0.4902    | 1  | 0.5169   | 4
        OpponentSIFT (baseline)        | 0.4975    | 1  | 0.5203   | 4
        SIFT and OpponentSIFT          | 0.5187    | 2  | 0.5357   | 8
        One channel from C             | 0.5351    | 49 | 0.5405   | 196
        Two channel: I and one from C  | 0.5463    | 49 | 0.5507   | 196
  • 35. Note 3: Speed. For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). Saves a factor of 75 in time! Maji CVPR 2008.
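A sketch of the speed-up idea, assuming an already-trained intersection-kernel SVM whose dual coefficients coeffs = α_i·y_i and support vectors are given (both names are illustrative): per feature dimension, the decision contribution h_d(s) = Σ_i α_i y_i min(s, x_{i,d}) is piecewise linear, so it can be tabulated at a few uniformly spaced points and evaluated by interpolation instead of touching every support vector.

```python
import numpy as np

def build_tables(support_vectors, coeffs, n_bins=30):
    """Tabulate h_d(s) = sum_i coeffs_i * min(s, x_{i,d}) at uniformly spaced s."""
    s_grid = np.linspace(0.0, support_vectors.max(), n_bins)
    # tables[d, b] = per-dimension decision component evaluated at s_grid[b]
    tables = np.array([
        [np.sum(coeffs * np.minimum(s, support_vectors[:, d])) for s in s_grid]
        for d in range(support_vectors.shape[1])
    ])
    return s_grid, tables

def fast_decision(x, s_grid, tables, bias=0.0):
    """Evaluate the intersection-kernel SVM by interpolating the tables."""
    score = bias
    for d, v in enumerate(x):
        score += np.interp(v, s_grid, tables[d])
    return score

# toy usage with random "support vectors", small sizes to keep it quick
sv = np.random.rand(200, 64)
alpha_y = np.random.randn(200)
s_grid, tables = build_tables(sv, alpha_y)
score = fast_decision(np.random.rand(64), s_grid, tables)
```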
  • 36. Note 4: What is in a word? This is what a word looks like. Gavves 2011; Chum ICCV 2007; Turcot ICCV 2009.
  • 37. Note 4: Where are the synonyms? But not all views of the same detail are close! Gavves 2011.
  • 38. Note 4: Forming a selective dictionary. Build the vocabulary by selecting the minimal set that maximizes the cross entropy: 99% vocabulary reduction, 6% improved recognition. Needs 100 words per concept. Gavves 2011 CVPR.
  • 39. Note 4: Selective dictionary by cross entropy. Examples.
  • 40. Note 5: Deconstruct words. Fisher vectors capture the internal structure of words. Train a Gaussian mixture model, where each codebook element has its own sigma, one per dimension. Store the differences in all descriptor dimensions. The feature vector is #codewords × #descriptor dimensions. Perronnin ECCV 2010.
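A simplified sketch of the mean-gradient part of a Fisher vector, assuming a trained diagonal-covariance GMM (means, sigmas, weights). The full Perronnin formulation also includes the sigma gradients and power/L2 normalisation; this only shows how the vector grows to #codewords × #dimensions.

```python
import numpy as np

def fisher_vector_means(descriptors, means, sigmas, weights):
    """Simplified Fisher vector: posterior-weighted mean differences per component."""
    # normalised differences and log-likelihoods under each diagonal Gaussian
    diff = (descriptors[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    log_p = -0.5 * (diff ** 2 + np.log(2 * np.pi * sigmas[None, :, :] ** 2)).sum(axis=2)
    log_p += np.log(weights)[None, :]
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)          # soft assignment to components
    # accumulate weighted differences: one block of #dims per codeword
    fv = (post[:, :, None] * diff).sum(axis=0)       # shape: (#codewords, #dims)
    fv /= descriptors.shape[0] * np.sqrt(weights)[:, None]
    return fv.ravel()                                # length = #codewords * #dims

# toy usage: 500 descriptors of 128-D against a 16-component GMM
K, D = 16, 128
fv = fisher_vector_means(np.random.rand(500, D),
                         np.random.rand(K, D), np.ones((K, D)), np.full(K, 1.0 / K))
print(fv.shape)   # (16 * 128,) = (2048,)
```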
  • 41. System 5: MediaMill search engine http://www.mediamill.nl
  • 42. 5. Conclusion: Words are the essential step forward. More is better, but costly. Smooth assignment works better than hard, at the cost of less orthogonal methods. Approximate algorithms are mostly sufficient.