# Lecture 02: Internet video search


### Transcript

• 1. Before the break: Color, texture, time, spatial structure, Gauss does it all… but not invariance, which is badly needed.
• 2. Before the break: The most basic systems. System 1 (Swain & Ballard) matches colors; System 2 (Blobworld) matches texture blobs. We need more invariance.
• 3. 4. Descriptors
• 4. Patch descriptors: For 4×4 patches, find the local gradient directions over t. Count the directions per patch: the 128-D SIFT histogram. Lowe IJCV 2004
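
A minimal sketch of extracting such 128-D SIFT descriptors, assuming OpenCV (the `cv2.SIFT_create` factory needs OpenCV ≥ 4.4) and a placeholder input file:

```python
import cv2

# Placeholder input; any grayscale frame will do.
image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
# Each keypoint is described by a 4x4 grid of subregions with 8
# gradient-direction bins each: 4 * 4 * 8 = 128 dimensions.
keypoints, descriptors = sift.detectAndCompute(image, None)
print(descriptors.shape)  # (number_of_keypoints, 128)
```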
• 5. Affine patch descriptor: Compute the prominent direction. Start with central Gaussian distributed weights in $W$. Compute the 2nd-order moments matrix $M_k$ over all directions:

$$M_k = \begin{pmatrix} \sum w_k(x,y)\, f_x f_x & \sum w_k(x,y)\, f_x f_y \\ \sum w_k(x,y)\, f_x f_y & \sum w_k(x,y)\, f_y f_y \end{pmatrix}$$

Adapt the weights to the elliptic shape and iterate, $W_{k+1} = M_k W_k$, until there is no longer change.
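
A minimal sketch of the moments-matrix step in NumPy, on placeholder data; the weight re-shaping of the full iteration is only indicated in comments:

```python
import numpy as np

def moments_matrix(patch, weights):
    """Second-order moments matrix M_k under the current weights w_k."""
    fy, fx = np.gradient(patch.astype(float))  # derivatives f_y, f_x
    m_xx = np.sum(weights * fx * fx)
    m_xy = np.sum(weights * fx * fy)
    m_yy = np.sum(weights * fy * fy)
    return np.array([[m_xx, m_xy], [m_xy, m_yy]])

# Start with central Gaussian distributed weights W.
ys, xs = np.mgrid[-10:11, -10:11]
W = np.exp(-(xs**2 + ys**2) / (2.0 * 5.0**2))
patch = np.random.rand(21, 21)  # stand-in for a real image patch

for _ in range(5):  # iterate until the elliptic shape no longer changes
    M = moments_matrix(patch, W)
    # A full implementation would now adapt W to the ellipse defined by M
    # (the W_{k+1} = M_k W_k step), omitted here.
```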
• 6. Color patch descriptors: Invariance properties per descriptor. van de Sande PAMI 2010

| Descriptor | Light intensity change | Light intensity shift | Light intensity change and shift | Light color change | Light color change and shift |
| --- | --- | --- | --- | --- | --- |
| SIFT | + | + | + | - | - |
| OpponentSIFT | + | + | + | - | - |
| C-SIFT | + | - | - | - | - |
| RGB-SIFT | + | + | + | + | + |
• 7. Results on PASCAL VOC 2007
• 8. Results per object category: bar chart of Average Precision (0.0–0.9) over the 20 PASCAL VOC categories, comparing OpponentSIFT (L2 norm) with the two-channel I+C (L2 norm) descriptor.
• 9. Corner selector: The change energy at $x$ over a small vector $(u, v)$:

$$E(u,v) \approx \begin{pmatrix} u & v \end{pmatrix} M \begin{pmatrix} u \\ v \end{pmatrix}, \qquad M = \begin{pmatrix} f_x f_x & f_x f_y \\ f_x f_y & f_y f_y \end{pmatrix}$$

Since $M$ is symmetric, we have the direction of the fastest change:

$$M = R^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} R$$

(the axes of the local ellipse are $(\lambda_{\max})^{-1/2}$ and $(\lambda_{\min})^{-1/2}$), with

$$\det M = \lambda_1 \lambda_2 = I_x^2 I_y^2 - (I_x I_y)^2, \qquad \operatorname{trace} M = \lambda_1 + \lambda_2 = I_x^2 + I_y^2.$$

For a corner both should be large:

$$R = \det M - k\,(\operatorname{trace} M)^2$$
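
A minimal sketch of the Harris response in NumPy/SciPy, with the customary $k = 0.04$; the Gaussian window stands in for the local weighting:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    """R = det M - k (trace M)^2, large where both eigenvalues are large."""
    fy, fx = np.gradient(img.astype(float))
    # Entries of M, pooled over a local Gaussian window.
    m_xx = gaussian_filter(fx * fx, sigma)
    m_xy = gaussian_filter(fx * fy, sigma)
    m_yy = gaussian_filter(fy * fy, sigma)
    det_m = m_xx * m_yy - m_xy ** 2   # lambda_1 * lambda_2
    trace_m = m_xx + m_yy             # lambda_1 + lambda_2
    return det_m - k * trace_m ** 2
```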
• 11. Harris’ stability
• 12. Blob detector: The 2D Laplacian is $L = \sigma^2 \left( G_{xx}(x,y,\sigma) + G_{yy}(x,y,\sigma) \right)$, approximated by the difference of Gaussians $\mathrm{DoG} = G(x,y,k\sigma) - G(x,y,\sigma)$. The Laplacian has a single max at the size of the blob; multiply by $\sigma^2$ to keep the response comparable across scales.
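
A minimal sketch of the DoG approximation, assuming SciPy; the image and the sigma ladder are placeholders:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog(img, sigma, k=1.6):
    """Difference of Gaussians: G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)

img = np.random.rand(64, 64)  # stand-in for a real image
# Scanning sigma, a blob appears as an extremum in (x, y, sigma) at the
# sigma matching the blob size.
responses = [dog(img, s) for s in (1.0, 1.6, 2.56, 4.1)]
```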
• 13. Laplace blob detector
• 14. Laplace blob detector
• 15. Laplace blob detector
• 16. DoG detection + SIFT description. Jepson 2005
• 17. System 3: patch detection. System 3 is an app: Stitching. http://www.cloudburstresearch.com/
• 18. 4. Conclusion: Patch descriptors bring local orderless information. Best combined with color invariance for illumination. Scene-pose-illumination invariance brings meaning. Lee Comm. ACM 2011
• 19. 5. Words & Similarity
• 20. Before words: 1000 patches, 128 features, 1,000,000 images ≈ 11.5 days / 100 GByte (on the order of $10^6 \times 10^3 \times 128 \approx 10^{11}$ feature values to store and match).
• 21. Capture the pattern in a patch: Measure the pattern in a patch with abundant features. More is better. Different is better. Normalized is better.
• 22. Sample many patches: Sample the patches in the image. Dense 256 K words, salient 1 K words. Salience is good. Dense is better. Combined even better. Salient is memory efficient. Dense is compute efficient.
• 23. Sample many images: Sample the images in the world: the learning set. Learn all relevant distinctions. Learn all irrelevant variations not covered in the invariance of features.
• 24. Form a dictionary of words: Form regions in feature space. Size 4,000 (general) to 400,000 (buildings). Random forest is good and fast; 4 runs, 10 deep is OK.
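
The slide recommends a random forest; as a sketch of the same dictionary-forming step, here is the common k-means baseline on placeholder descriptors:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

descriptors = np.random.rand(100_000, 128)  # stand-in for sampled patches

# 4,000 clusters for a general-purpose dictionary, as on the slide.
codebook = MiniBatchKMeans(n_clusters=4000, batch_size=1024).fit(descriptors)
visual_words = codebook.cluster_centers_  # one row per visual word
```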
• 25. Count words per image: Retain the word boundaries. Fill the histogram of words per training image.
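
Continuing the sketch above, filling the per-image word histogram under hard assignment (`codebook` is the hypothetical model from the previous block):

```python
import numpy as np

image_descriptors = np.random.rand(1000, 128)         # patches of one image
assignments = codebook.predict(image_descriptors)     # one patch, one word
histogram = np.bincount(assignments, minlength=4000)  # word counts per image
```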
• 26. Map histograms into similarity space: In the 4096-D word count space, 1 point is 1 image. Hard assignment: one patch, one word.
• 27. Learn histogram similarity: Learn the histogram distinction between the image histograms, sorted per class of images in the learning set. The histogram is $V_d = (t_1, t_2, \dots, t_i, \dots, t_n)^T$, where $t_i$ is the total of occurrences of visual word $i$. The number of words in common is the intersection between query and image: $S_q = V_q \cap V_j$.
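
A one-function sketch of that intersection similarity between two word-count histograms:

```python
import numpy as np

def intersection_similarity(v_q, v_j):
    """S_q: summed per-word minimum of the query and image histograms."""
    return np.minimum(v_q, v_j).sum()
```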
• 28. Classify the unknown image: Retain the word count discrimination + support vectors. Go from patches > words > counts > discriminate.
• 29. System 4: Oxford building search. http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/index.html
• 30. Note 1: Soft assignment is better Soft assignment: assign to multiple clusters, weighted by distance to center. Pooled single sigma for all codebook elements. van Gemert, PAMI 2010
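
A minimal sketch of such kernel-codebook soft assignment with a single pooled sigma; the sigma value is a placeholder:

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_histogram(descriptors, centers, sigma=100.0):
    """Soft-assign each patch to all words, Gaussian-weighted by distance."""
    d = cdist(descriptors, centers)           # patch-to-center distances
    w = np.exp(-d ** 2 / (2 * sigma ** 2))    # one pooled sigma for all words
    w /= w.sum(axis=1, keepdims=True)         # each patch contributes weight 1
    return w.sum(axis=0)                      # pooled soft counts per word
```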
• 31. Note 2: SVM similarity is better. An SVM can reconstruct a complex geometry at the boundary, including disjoint subspaces. The distance metric in the kernel is important.
• 32. Note 2: nonlinear SVMs. How to transform the data such that the samples from the two classes are separable by a linear function (preferably with margin)? Or, equivalently, define a kernel that does this for you straight away. Vapnik, 1995
• 33. Note 2: χ² kernels. Because χ² is meant to discriminate histograms! Zhang, IJCV '07
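
A minimal sketch of an SVM on a precomputed χ² kernel using scikit-learn's `chi2_kernel`, with toy data standing in for real word histograms:

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

X_train = np.random.rand(50, 4000)     # stand-in word histograms
y_train = np.random.randint(0, 2, 50)  # stand-in class labels

K_train = chi2_kernel(X_train, gamma=1.0)
clf = SVC(kernel="precomputed").fit(K_train, y_train)

# At test time the kernel is evaluated between test and training histograms.
X_test = np.random.rand(5, 4000)
predictions = clf.predict(chi2_kernel(X_test, X_train, gamma=1.0))
```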
• 34. Note 2: … or multiple kernels. Let multiple kernel learning determine the weight of all features.

| Descriptors | Norm = L2 | # | Norm ∈ L | # |
| --- | --- | --- | --- | --- |
| SIFT | 0.4902 | 1 | 0.5169 | 4 |
| OpponentSIFT (baseline) | 0.4975 | 1 | 0.5203 | 4 |
| SIFT and OpponentSIFT | 0.5187 | 2 | 0.5357 | 8 |
| One channel from C | 0.5351 | 49 | 0.5405 | 196 |
| Two channel: I and one from C | 0.5463 | 49 | 0.5507 | 196 |
• 35. Note 3: Speed. For the intersection kernel, $h_i$ is piecewise linear and quite smooth (blue plot). We can approximate it with fewer uniformly spaced segments (red plot). Saves a factor of 75 in time! Maji CVPR 2008
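
A minimal sketch of the tabulate-and-interpolate idea: for the intersection kernel the per-dimension contribution $h_i(s) = \sum_{sv} \alpha_{sv} \min(s, x_{sv,i})$ is piecewise linear, so a small uniform table plus interpolation approximates it (names illustrative):

```python
import numpy as np

def tabulate_h(sv_column, alphas, num_points=25, s_max=1.0):
    """Evaluate h_i exactly at uniformly spaced points s for one dimension."""
    s = np.linspace(0.0, s_max, num_points)
    h = np.array([(alphas * np.minimum(s_k, sv_column)).sum() for s_k in s])
    return s, h

def h_approx(x, s, h):
    """Lookup + linear interpolation instead of a pass over all support vectors."""
    return np.interp(x, s, h)
```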
• 36. Note 4: What is in a word? This is what a word looks like. Gavves 2011, Chum ICCV 2007, Turcot ICCV 2009
• 37. Note 4: Where are the synonyms? But not all views of the same detail are close! Gavves 2011
• 38. Note 4: Forming a selective dictionary. Build the vocabulary by selecting the minimal set that maximizes the cross entropy: 99% vocabulary reduction, 6% improved recognition. Needs 100 words per concept. Gavves 2011 CVPR
• 39. Note 4: Selective dictionary by cross entropy. Examples.
• 40. Note 5: Deconstruct words. Fisher vectors capture the internal structure of words. Train a Gaussian mixture model, where each codebook element has its own sigma, one per dimension. Store differences in all descriptor dimensions. The feature vector is #codewords × #descriptors. Perronnin ECCV 2010
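
A minimal sketch of a mean-only Fisher vector on a diagonal-covariance GMM; the full Perronnin formulation also stores the variance gradients and normalizes the result:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

descriptors = np.random.rand(1000, 128)  # stand-in patch descriptors
gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(descriptors)

q = gmm.predict_proba(descriptors)  # soft assignment to every codeword
N = len(descriptors)
parts = []
for k in range(gmm.n_components):
    # Per-codeword, sigma-normalized differences in all descriptor dimensions.
    diff = (descriptors - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])
    parts.append((q[:, k:k + 1] * diff).sum(axis=0) / (N * np.sqrt(gmm.weights_[k])))

fisher_vector = np.concatenate(parts)  # #codewords x #dimensions = 64 * 128
```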
• 41. System 5: MediaMill search engine http://www.mediamill.nl
• 42. 5. Conclusion: Words are the essential step forward. More is better. Better, but costly. Smooth assignment works better than hard, at the cost of less orthogonal methods. Approximate algorithms are sufficient, mostly.