Internet Video Search. Arnold W.M. Smeulders & Cees Snoek, CWI & UvA
Lecture 01 internet video search

  1. Internet Video Search. Arnold W.M. Smeulders & Cees Snoek, CWI & UvA
  2. Overview: Image and Video Search
     Lecture 1: visual search, the problem; color-spatial-textural-temporal features; measures and invariances
     Lecture 2: descriptors; words and similarity; where and what
     Lecture 3: data and metadata; performance; speed
  3. 1 Visual search, the problem
  4. A brief history of television: from broadcasting to narrowcasting (~1955, ~1985, ~2005) … to thin casting (2008, 2010).
  5. Any other purpose than TV?
     Surveillance: to alert on events
     Forensics: to find evidence / to protect against misuse
     Social media: to sort responses
     Safety: to prevent terrorism
     Agriculture: to sort fruit
     News: to reuse archived footage
     Business: to have efficient access
     eBusiness: to mine consumer data
     Science: to understand visual cognition
     Family: "I have it somewhere on this disk"
  6. How big? The answer from the web: the web is video.
  7. How big? The answer from … as of May 2011.
  8. How big? The answer from the archive.
     Yearly influx: 15,000 hours of video; 1 Pbyte per year.
     Next 6 years: 137,200 hours of video; 22,510 hours of film; 2,900,000 photos.
  9. Crowd-given search: what others say is in the video. We focus on what the digital content says is in the video.
  10. Problem 1: The variation. So many images of one thing: illumination, background, occlusion, viewpoint, … This is the sensory gap.
  11. Problem 2: What defines things?
      [Figure: blocks of bit strings (e.g. 1101011011011) paired with labels: Tree, Suit, Basketball, Machine, US flag, Building, Table, Aircraft, Multimedia Archives, Fire, Language, Dog, Tennis, Mountain]
  12. Problem 3: The many things. This is the model gap.
  13. Problem 4: The story of a video. This is the narrative gap.
  14. Problem 5: No shared intuition.
      Query modes: query-by-keyword, query-by-concept, query-by-examples.
      Example query: "Find shots of people shaking hands".
      [Diagram labels: Query, What sources, Prediction]
      This is the query-context gap.
  15. System 1: histogram matching. Histogram as a summary of color characteristics. Swain and Ballard, IJCV 1991
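The histogram-matching idea of slide 15 can be sketched as follows. This is a minimal illustration, not the authors' system: the helper names `color_histogram` and `histogram_intersection`, the choice of 4 bins per channel, and the 0..255 channel range are assumptions made for this example. The score is the histogram intersection normalized by the model histogram, as commonly attributed to Swain and Ballard.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[int]:
    """Count pixels per quantized (R, G, B) cell."""
    n = bins_per_channel
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 onto a bin index 0..n-1.
        ri, gi, bi = r * n // 256, g * n // 256, b * n // 256
        hist[(ri * n + gi) * n + bi] += 1
    return hist


def histogram_intersection(image_hist: Sequence[int], model_hist: Sequence[int]) -> float:
    """Sum of bin-wise minima, normalized by the model histogram's pixel count."""
    overlap = sum(min(i, m) for i, m in zip(image_hist, model_hist))
    return overlap / sum(model_hist)


if __name__ == "__main__":
    image = [(250, 10, 10)] * 3 + [(10, 10, 250)]  # three reddish pixels, one bluish
    model = [(240, 20, 20)] * 4                    # four reddish pixels
    score = histogram_intersection(color_histogram(image), color_histogram(model))
    print(score)
```

A score of 1.0 means every model pixel finds a counterpart in the same color cell of the image. Because the histogram discards all spatial layout, two very different scenes with similar color statistics will also score high.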
  16. 1 Conclusion. As content grows, so do the applications of image search. They pose deep cognitive and computer science problems. With simple means one gets visually simple results.
  17. 2 Features
  18. Source · reflection. Light source: e(λ). Object: ρ(λ). Result: e(λ) ρ(λ).
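  19. (R,G,B)
      \[
      \begin{pmatrix} R \\ G \\ B \end{pmatrix}
      =
      \begin{pmatrix}
        \int_\lambda e(\lambda)\,\rho(\lambda)\,f_R(\lambda)\,d\lambda \\
        \int_\lambda e(\lambda)\,\rho(\lambda)\,f_G(\lambda)\,d\lambda \\
        \int_\lambda e(\lambda)\,\rho(\lambda)\,f_B(\lambda)\,d\lambda
      \end{pmatrix}
      \]
  20. (r, g, b) in (R,G,B)
      \[
      \begin{pmatrix} r \\ g \\ b \end{pmatrix}
      =
      \begin{pmatrix}
        \dfrac{R}{R+G+B} \\[2ex]
        \dfrac{G}{R+G+B} \\[2ex]
        \dfrac{B}{R+G+B}
      \end{pmatrix}
      \]
      Independent of shadow!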
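The "independent of shadow" remark on slide 20 can be checked numerically. The sketch below assumes that a shadow scales all three channels by the same factor; the function name `normalized_rgb` and the convention of mapping a black pixel to zeros are choices made for this example only.

```python
from typing import Tuple


def normalized_rgb(R: float, G: float, B: float) -> Tuple[float, float, float]:
    """Return (r, g, b) = (R, G, B) / (R + G + B)."""
    total = R + G + B
    if total == 0:
        # Assumption for this sketch: a black pixel maps to zeros.
        return (0.0, 0.0, 0.0)
    return (R / total, G / total, B / total)


if __name__ == "__main__":
    lit = (120.0, 60.0, 20.0)
    shadowed = tuple(0.5 * c for c in lit)  # same surface at half the intensity
    print(normalized_rgb(*lit))
    print(normalized_rgb(*shadowed))
```

Both calls yield the same chromaticity, because the common intensity factor cancels in the ratio. The price is that r + g + b = 1, so one degree of freedom (intensity) is discarded.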
  21. The sensation of spectra
      Hue: dominant wavelength, λ(E_H)
      Saturation: purity of the colour, (E_H - E_W)/E_H
      Intensity: brightness of the colour, E_W
      [Figure labels: E_H, E_W, "white", "green"]
  22. The sensation of spectra: opponent. Human perception combines the (R,G,B) response of the eye in opponent colors:
      \[
      \begin{pmatrix} \text{Luminance} \\ \text{BlueYellow} \\ \text{PurpleGreen} \end{pmatrix}
      =
      \begin{pmatrix}
        R + G + B \\
        \tfrac{1}{2}(R - G) \\
        \tfrac{1}{4}(2B - R - G)
      \end{pmatrix}
      \]
      Maximizes perceived contrast!
  23. Color Gaussian space
      \[
      \begin{pmatrix} E \\ E_\lambda \\ E_{\lambda\lambda} \end{pmatrix}
      =
      \begin{pmatrix}
        0.06 & 0.63 & 0.27 \\
        0.30 & 0.04 & -0.35 \\
        0.34 & -0.60 & 0.17
      \end{pmatrix}
      \begin{pmatrix} R \\ G \\ B \end{pmatrix}
      \]
      Maximizes information content! Geusebroek PAMI 2002
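The linear map of slide 23 is a plain 3×3 matrix product. The sketch below applies it to a single pixel; the names `GAUSSIAN_COLOR_MATRIX` and `rgb_to_gaussian_color` are made up for this example, and the matrix entries are copied from the slide.

```python
from typing import Sequence, Tuple

# Rows give E, E_lambda and E_lambdalambda as weighted sums of (R, G, B).
GAUSSIAN_COLOR_MATRIX = (
    (0.06, 0.63, 0.27),
    (0.30, 0.04, -0.35),
    (0.34, -0.60, 0.17),
)


def rgb_to_gaussian_color(rgb: Sequence[float]) -> Tuple[float, ...]:
    """Return (E, E_lambda, E_lambdalambda) for one (R, G, B) pixel."""
    return tuple(
        sum(weight * channel for weight, channel in zip(row, rgb))
        for row in GAUSSIAN_COLOR_MATRIX
    )


if __name__ == "__main__":
    print(rgb_to_gaussian_color((1.0, 1.0, 1.0)))  # an achromatic (grey) pixel
    print(rgb_to_gaussian_color((1.0, 0.0, 0.0)))  # a pure red pixel
```

For a grey pixel the two spectral-derivative components come out close to zero (the rows of the matrix sum to -0.01 and -0.09), while the intensity component E carries almost all of the signal.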
  24. Color Gaussian space. [Figure panels: (R,G,B)-pdf; (E0,Eλ,Eλλ)-pdf]
  25. Matter body reflectance in (R,G,B)
  26. Taxonomy of diff-image structure: T-junction, Junction, Highlight, Corner. These junctions later bring recognition.
  27. Gabor texture. The 2D Gabor function is:
      \[
      h(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}\, e^{2\pi j (ux + vy)}
      \]
      Tuning parameters: u, v, σ.
      Manjunath and Ma on Gabor for texture in Fourier-space.
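The 2D Gabor function of slide 27 is a Gaussian envelope multiplied by a complex sinusoid. A minimal sketch of sampling it on an integer grid follows; the function names `gabor` and `gabor_kernel` and the `radius` parameter are assumptions of this example, not part of the slides.

```python
import cmath
import math
from typing import List


def gabor(x: float, y: float, u: float, v: float, sigma: float) -> complex:
    """Evaluate h(x, y): Gaussian envelope times a complex plane wave."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
    carrier = cmath.exp(2j * math.pi * (u * x + v * y))
    return envelope * carrier


def gabor_kernel(radius: int, u: float, v: float, sigma: float) -> List[List[complex]]:
    """Sample h on the grid [-radius, radius]^2; rows index y, columns index x."""
    coords = range(-radius, radius + 1)
    return [[gabor(x, y, u, v, sigma) for x in coords] for y in coords]


if __name__ == "__main__":
    kernel = gabor_kernel(radius=4, u=0.1, v=0.0, sigma=2.0)
    print(len(kernel), len(kernel[0]))  # grid size
    print(kernel[4][4])                 # value at the origin
```

The center frequency (u, v) selects the orientation and spatial frequency the filter responds to, and σ sets the size of the neighborhood. A texture descriptor would typically collect response magnitudes over a bank of such kernels.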
  28. Gabor texture. [Figure panels: K-means cluster of RGB; K-means cluster of Gabor opponent] Hoang ECCV 2002
  29. Gabor GIST descriptor
      Calculate Gabor responses locally.
      Create histograms as before.
      Distinguishes things like naturalness, openness, roughness, expansion, and ruggedness.
      Slide credit: James Hays and Alexei Efros. Oliva IJCV 2001
  30. Receptive field in f(x,t). Gaussian equivalent over x and t: zero order, first order in t. Burghouts TIP 2006
  31. Gaussians measure differentials
      Taylor expansion at x.
      For a discretely sampled signal, use the Gaussians.
      The preferred brand of filters:
      separable by dimension;
      rotation symmetric;
      no new maxima;
      fast implementations.
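To illustrate slide 31, the sketch below measures the zeroth- and first-order structure of a sampled 1D signal by convolving it with a sampled Gaussian and with its first derivative. The function names, the 3σ truncation radius, and the renormalization of the truncated kernels are assumptions of this example; it is a direct convolution, not the fast recursive implementation mentioned in the lecture.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernels(sigma: float) -> Tuple[List[float], List[float]]:
    """Return sampled (Gaussian, first-derivative-of-Gaussian) kernels."""
    radius = int(math.ceil(3.0 * sigma))  # assumption: truncate at 3 sigma
    xs = list(range(-radius, radius + 1))
    g = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in xs]
    total = sum(g)
    g0 = [value / total for value in g]  # smoothing kernel sums to 1
    # Analytic derivative of the Gaussian: G'(x) = -x / sigma^2 * G(x).
    d = [-x / (sigma * sigma) * value for x, value in zip(xs, g0)]
    # Renormalize so that a unit-slope ramp yields derivative 1 despite truncation.
    scale = sum(-x * value for x, value in zip(xs, d))
    g1 = [value / scale for value in d]
    return g0, g1


def convolve_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve, keeping only positions where the kernel fits entirely."""
    radius = len(kernel) // 2
    return [
        sum(signal[i - k] * kernel[k + radius] for k in range(-radius, radius + 1))
        for i in range(radius, len(signal) - radius)
    ]


if __name__ == "__main__":
    ramp = [2.0 * i + 1.0 for i in range(30)]  # slope 2 everywhere
    g0, g1 = gaussian_kernels(sigma=1.5)
    print(convolve_valid(ramp, g0)[:3])  # smoothed values
    print(convolve_valid(ramp, g1)[:3])  # derivative estimates
```

For images, separability means the same 1D kernels can be applied along rows and then along columns, which is one reason the slide calls the Gaussian family the preferred brand of filters.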
  32. Receptive fields: overview. All observables up to first order in color, second order in spatial scales, eight frequency bands, and first order in t.
  33. System 2: Blobworld, textured world
      Group blobs based on color and Tamura texture.
      User specifies query blob and features.
      System returns images with similar regions.
      Carson PAMI 2002
  34. 2 Conclusion. Powerful features capture uniqueness. A large set is needed for open-ended search. The Gauss family is the preferred brand of filters. Fast recursive implementation: Geusebroek, Van de Weijer & Smeulders 2002
  35. 3 Measures and invariances
  36. The need for invariance. There are a million appearances to one object. The same part of the same shoe does not have the same appearance in the image. This is the sensory gap. Remove unwanted variance as early as you can.
  37. Invariance: definition. A feature g is invariant under a condition (transform) caused by accidental conditions at the time of recording, iff g, observed on equal objects, is constant: [formula not recovered]
  38. Quiz: scale-invariant detection. What properties are invariant to observation scale?
  39. Color invariance
      \[
      C = m_b(\vec{n}, \vec{s}) \int_\lambda e(\lambda)\, c_b(\lambda)\, f_C(\lambda)\, d\lambda
        + m_s(\vec{n}, \vec{s}, \vec{v}) \int_\lambda e(\lambda)\, c_s(\lambda)\, f_C(\lambda)\, d\lambda
      \]
      Symbol    Meaning                   Dependence
      c_b(λ)    surface albedo            scene & viewpoint invariant
      e(λ)      illumination              scene dependent
      n         object surface normal     object shape variant
      s         illumination direction    scene dependent
      v         viewer's direction        viewpoint variant
      f_C(λ)    sensor sensitivity        scene dependent
  40. Matter body reflectance in E
  41. C is viewpoint invariant
      \[
      c_1(R,G,B) = \arctan\frac{R}{\max\{G,B\}}, \quad
      c_2(R,G,B) = \arctan\frac{G}{\max\{R,B\}}, \quad
      c_3(R,G,B) = \arctan\frac{B}{\max\{R,G\}}
      \]
      [Figure panels: E space; C space] Gevers TIP 2000
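The c1, c2, c3 features of slide 41 are ratios of channels, so a common factor on (R,G,B) cancels. The sketch below assumes that a change in shading or viewing geometry of a matte surface acts as such a common factor; the function name `c1c2c3` and the use of `math.atan2` to avoid division by zero are choices made for this example.

```python
import math
from typing import Tuple


def c1c2c3(R: float, G: float, B: float) -> Tuple[float, float, float]:
    """Return (c1, c2, c3) for non-negative channel values.

    math.atan2(y, x) equals arctan(y / x) for x > 0 and stays defined for x == 0.
    """
    c1 = math.atan2(R, max(G, B))
    c2 = math.atan2(G, max(R, B))
    c3 = math.atan2(B, max(R, G))
    return (c1, c2, c3)


if __name__ == "__main__":
    bright = (200.0, 100.0, 50.0)
    dim = tuple(0.5 * c for c in bright)  # same surface seen with half the shading factor
    print(c1c2c3(*bright))
    print(c1c2c3(*dim))
```

Both pixels map to the same (c1, c2, c3) triple, which is the sense in which the features are insensitive to the geometry terms of the reflection model.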
  42. Hue is viewpoint invariant
      \[
      H = \arctan\frac{\sqrt{3}\,(G - B)}{(R - G) + (R - B)}
      \]
      H is a scalar.
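The hue formula of slide 42 depends only on channel differences and their ratio. The sketch below checks two consequences under simple assumptions: scaling all channels by one factor (intensity or shading) and adding the same amount to all channels (a crude stand-in for a white highlight) both leave H unchanged. The function name `hue` is made up here, and `math.atan2` is used instead of a plain arctangent so that the angle covers the full circle and a zero denominator is handled; that is a design choice of this example, not something the slide specifies.

```python
import math


def hue(R: float, G: float, B: float) -> float:
    """Return H = arctan(sqrt(3) (G - B) / ((R - G) + (R - B))) in radians."""
    numerator = math.sqrt(3.0) * (G - B)
    denominator = (R - G) + (R - B)
    return math.atan2(numerator, denominator)


if __name__ == "__main__":
    base = (200.0, 100.0, 50.0)
    dimmed = tuple(0.5 * c for c in base)        # common scale factor
    highlighted = tuple(c + 30.0 for c in base)  # common additive offset
    print(hue(*base), hue(*dimmed), hue(*highlighted))
```

The three printed values agree, whereas normalized (r, g, b) would change under the additive offset.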
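  43. Differential invariants C', W', M'
      C' is for matte objects and uneven white light:
      \[
      C_\lambda = \frac{E_\lambda}{E}, \quad
      C_{\lambda\lambda} = \frac{E_{\lambda\lambda}}{E}, \quad
      C_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
      \]
      W' is for matte planar objects and even white light:
      \[
      W_x = \frac{E_x}{E}, \quad
      W_{\lambda x} = \frac{E_{\lambda x}}{E}
      \]
      M' is for matte objects and monochromatic light:
      \[
      N_{\lambda x} = \frac{E_{\lambda x} E - E_\lambda E_x}{E^2}
      \]
      Geusebroek PAMI 2002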
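Two short steps may help in reading slide 43. The first is only the quotient rule and shows where the expression for C_{λx} comes from. The second assumes, purely for illustration, that the observed energy factors as E(λ, x) = i(x) m(λ, x), with i a hypothetical spatially varying intensity and m a hypothetical material term; neither symbol appears in the slides.

```latex
% Step 1: C_{\lambda x} is the spatial derivative of C_\lambda (quotient rule).
\[
C_{\lambda x}
  = \frac{\partial}{\partial x}\!\left(\frac{E_\lambda}{E}\right)
  = \frac{E_{\lambda x}\,E - E_\lambda\,E_x}{E^2}
\]

% Step 2: under the assumed factorization E = i(x)\,m(\lambda, x),
% the intensity i(x) does not depend on \lambda, so it cancels in the ratio.
\[
E_\lambda = i(x)\,m_\lambda(\lambda, x)
\quad\Longrightarrow\quad
C_\lambda = \frac{E_\lambda}{E} = \frac{m_\lambda(\lambda, x)}{m(\lambda, x)}
\]
```

Under that assumption C_λ, and hence its spatial derivative C_{λx}, no longer contains the intensity term, which matches the slide's statement that C' suits uneven white light on matte objects.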
  44. Retained discrimination

                shadows  shading  highlights  ill. intensity  ill. color
      E            -        -         -             -             -
      H            +        +         +             +             -
      W & W'       -        +         -             +             -
      C & C'       +        +         -             +             -
      M & M'       +        +         -             +             +
      L            +        +         +             +             -

      Retained from 1000 colors, σ = 3:

      E     990
      H     315
      W'    995
      C'    850
      M'    900

      Geusebroek PAMI 2003
  45. 3 Conclusion. Know your variances and invariants. Good invariant features make algorithms simple.
