Before the break

Color, texture, time, spatial structure, Gauss does it all.




… but not invariance, which is badly needed.
Before the break

Most basic systems:

System 1: Swain & Ballard, matching colors.

System 2: Blobworld, matching texture blobs.




We need more invariance
4. Descriptors
Patch descriptors

For a 4×4 grid of patches, find the local gradient directions at scale t.
Count the directions per patch into 8 orientation bins: the 128D SIFT histogram.




Lowe IJCV 2004
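The 4×4×8 layout can be sketched as below: a toy descriptor of my own, not Lowe's full method, which adds Gaussian weighting, trilinear interpolation, rotation to the dominant orientation, and normalization.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Toy 128-D descriptor: a 4x4 grid of cells, 8 orientation bins each.

    `patch` is a 16x16 grayscale array; each 4x4-pixel cell contributes a
    magnitude-weighted histogram of gradient orientations.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # orientation in [0, 2*pi)
    hist = np.zeros((4, 4, 8))
    for i in range(4):
        for j in range(4):
            m = mag[4*i:4*i+4, 4*j:4*j+4].ravel()
            a = ang[4*i:4*i+4, 4*j:4*j+4].ravel()
            bins = (a / (2 * np.pi) * 8).astype(int) % 8
            for b, w in zip(bins, m):
                hist[i, j, b] += w                # magnitude-weighted count
    return hist.ravel()                           # 4 x 4 x 8 = 128 dimensions

desc = sift_like_descriptor(np.random.rand(16, 16))
```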
Affine patch descriptor
 Compute the prominent direction.
 Start with centrally Gaussian-distributed weights in W.
 Compute the 2nd-order moment matrix Mk over all directions.
 Adapt the weights to the elliptic shape.

$$M_k = \begin{pmatrix} \sum w_k(x,y)\, f_x f_x & \sum w_k(x,y)\, f_y f_x \\ \sum w_k(x,y)\, f_x f_y & \sum w_k(x,y)\, f_y f_y \end{pmatrix}$$

 Iterate until there is no longer any change:

$$W_{k+1} = M_k W_k$$
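The matrix at the heart of this iteration can be computed as follows; a minimal sketch with uniform weights (the slide's Gaussian weighting and the full shape-adaptation loop are omitted):

```python
import numpy as np

def second_moment_matrix(patch, w=None):
    """2x2 second-order moment matrix M_k of a grayscale patch.

    Entries are weighted sums of the gradient products f_x f_x, f_x f_y,
    f_y f_y; `w` holds the per-pixel weights w_k(x, y), uniform if omitted.
    """
    fy, fx = np.gradient(patch.astype(float))
    if w is None:
        w = np.ones_like(patch, dtype=float)
    return np.array([[np.sum(w * fx * fx), np.sum(w * fx * fy)],
                     [np.sum(w * fx * fy), np.sum(w * fy * fy)]])

M = second_moment_matrix(np.random.rand(9, 9))
```

M is symmetric and positive semidefinite; its eigenvectors give the axes of the ellipse used to adapt the weights.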
Color Patch Descriptors




Invariance properties per descriptor:

| Descriptor   | Light intensity change | Light intensity shift | Light intensity change and shift | Light color change | Light color change and shift |
|--------------|------------------------|-----------------------|----------------------------------|--------------------|------------------------------|
| SIFT         | +                      | +                     | +                                | -                  | -                            |
| OpponentSIFT | +                      | +                     | +                                | -                  | -                            |
| C-SIFT       | +                      | -                     | -                                | -                  | -                            |
| RGB-SIFT     | +                      | +                     | +                                | +                  | +                            |

                                             van de Sande PAMI 2010
Results on PASCAL VOC 2007
Results per object category

[Bar chart: average precision (x-axis, 0.0 to 0.9) per object category (bottle, pottedplant, cow, dog, diningtable, sheep, bird, sofa, tvmonitor, cat, chair, bicycle, motorbike, bus, boat, train, car, horse, aeroplane, person), plus MAP, comparing OpponentSIFT (L2 norm) with two-channel I+C (L2 norm).]
Corner selector

The change energy at x over a small vector u:

$$E_{xy}(u, v) \approx [u\ v]\, M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \begin{pmatrix} f_x f_x & f_y f_x \\ f_x f_y & f_y f_y \end{pmatrix}$$

Since M is symmetric, we have

$$M = R^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} R$$

The eigenvector of the largest eigenvalue gives the direction of the fastest change; the ellipse of constant change energy has axes $(\lambda_{\max})^{-1/2}$ and $(\lambda_{\min})^{-1/2}$.

For a corner both eigenvalues should be large:

$$R = \det M - k\, (\operatorname{trace} M)^2$$

$$\det M = \lambda_1 \lambda_2 = I_x^2 I_y^2 - (I_x I_y)^2$$

$$\operatorname{trace} M = \lambda_1 + \lambda_2 = I_x^2 + I_y^2$$
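The corner response can be sketched directly from image gradients; a minimal version of my own, with a simple box window (a Gaussian window is the usual choice) and the conventional k = 0.04, assuming SciPy is available:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def harris_response(img, k=0.04, win=3):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel.

    M is the gradient second-moment matrix pooled over a local window;
    here a box window of size `win` instead of the usual Gaussian.
    """
    fy, fx = np.gradient(img.astype(float))
    Sxx = uniform_filter(fx * fx, win)   # pooled f_x f_x
    Sxy = uniform_filter(fx * fy, win)   # pooled f_x f_y
    Syy = uniform_filter(fy * fy, win)   # pooled f_y f_y
    det = Sxx * Syy - Sxy ** 2           # lambda_1 * lambda_2
    trace = Sxx + Syy                    # lambda_1 + lambda_2
    return det - k * trace ** 2
```

On a step corner the response is positive at the corner, negative along the edges, and zero in flat regions, as the eigenvalue argument above predicts.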
Directionality of gradients
Harris’ stability
Blob detector

2D Laplacian: $L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$

DoG: $\mathrm{DoG} = G(x, y, k\sigma) - G(x, y, \sigma)$

The Laplacian has a single maximum at the size of the blob; multiply by σ² to normalize the response across scales.
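The DoG approximation to the scale-normalized Laplacian can be sketched as below, assuming SciPy is available; the disc and the tested sigmas are illustrative choices of mine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog(img, sigma, k=1.6):
    """Difference of Gaussians: G(x, y, k*sigma) - G(x, y, sigma).

    Approximates the scale-normalized Laplacian up to a constant, so the
    response is extremal when sigma matches the blob size.
    """
    img = img.astype(float)
    return gaussian_filter(img, k * sigma) - gaussian_filter(img, sigma)

# A bright disc of radius 6: the center response peaks near its own scale.
yy, xx = np.mgrid[:64, :64]
disc = ((xx - 32) ** 2 + (yy - 32) ** 2 <= 6 ** 2).astype(float)
responses = {s: abs(dog(disc, s)[32, 32]) for s in (1.0, 4.0, 12.0)}
```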
Laplace blob detector
DoG detection + SIFT description




Jepson 2005
System 3: patch detection

  System 3 is an app: Stitching




                       http://www.cloudburstresearch.com/
4. Conclusion

Patch descriptors bring local orderless information.

Best combined with color invariance for illumination.

Scene-pose-illumination invariance brings meaning.




                                              Lee Comm. ACM 2011
5. Words & Similarity
Before words

1000 patches, 128 features



1,000,000 images ≈ 11.5 days / 100 GByte
Capture the pattern in a patch

Measure the pattern in a patch with abundant features.
More is better. Different is better. Normalized is better.
Sample many patches
Sample the patches in the image.
Dense 256 K words, salient 1 K words. Salience is good.
Dense is better. Combined even better. Salient is memory
efficient. Dense is compute efficient.
Sample many images
Sample the images in the world: the learning set.
Learn all relevant distinctions. Learn all irrelevant variations
not covered in the invariance of features.
Form a dictionary of words

Form regions in feature space.
Size 4,000 (general) to 400,000 (buildings). Random forest is
good and fast, 4 runs 10 deep is OK.
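The slide recommends random forests (4 runs, 10 deep) for speed; the other standard way to form the word regions is k-means, sketched here in a tiny version of my own with a deterministic spread-out initialization:

```python
import numpy as np

def kmeans_codebook(descriptors, k, iters=10):
    """Tiny k-means codebook: cluster descriptors into k visual words."""
    step = max(1, len(descriptors) // k)
    centers = descriptors[::step][:k].astype(float).copy()
    for _ in range(iters):
        # Assign every descriptor to its nearest center.
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
centers, labels = kmeans_codebook(X, 2)
```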
Count words per image

Retain the word boundaries.
Fill the histogram of words per training image.
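Filling the word histogram from the assigned word indices is a one-liner; a minimal sketch:

```python
import numpy as np

def word_histogram(word_ids, vocab_size):
    """Count how often each visual word occurs in one image."""
    return np.bincount(word_ids, minlength=vocab_size)

h = word_histogram(np.array([2, 0, 2, 3]), vocab_size=5)   # -> [1, 0, 2, 1, 0]
```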
Map histogram in similarity space

In 4096-D word-count space, 1 point is 1 image.
Hard assignment: one patch, one word.
Learn histogram similarity

Learn the histogram distinction between the image histograms, sorted per class of images in the learning set.

The histogram is $V_d = (t_1, t_2, \ldots, t_i, \ldots, t_n)^T$, where $t_i$ is the total of occurrences of the visual word $i$.

The number of words in common is the intersection between query and image: $S_q = V_q \cap V_j$.
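Histogram intersection as a similarity can be sketched as the element-wise minimum, summed over all visual words:

```python
import numpy as np

def histogram_intersection(vq, vj):
    """Words in common between a query and an image histogram."""
    return np.minimum(vq, vj).sum()

vq = np.array([3, 0, 2, 1])   # word counts for the query
vj = np.array([1, 4, 2, 0])   # word counts for a database image
s = histogram_intersection(vq, vj)   # min per bin: 1 + 0 + 2 + 0 = 3
```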
Classify unknown image

Retain the word-count discrimination + support vectors.
Go from patches to words to counts to discrimination.
System 4: Oxford building search

http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/index.html
Note 1: Soft assignment is better

 Soft assignment: assign each patch to multiple clusters, weighted by
 distance to the center; a single pooled sigma for all codebook
 elements.




                                     van Gemert, PAMI 2010
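A minimal sketch of such a soft assignment, with a Gaussian of the distance and one pooled sigma (the codebook and sigma here are illustrative, not van Gemert's setup):

```python
import numpy as np

def soft_assign(descriptor, codebook, sigma=1.0):
    """Weight each codebook element by a Gaussian of its distance to the
    descriptor; one pooled sigma for all elements; weights sum to 1."""
    d = np.linalg.norm(codebook - descriptor, axis=1)
    w = np.exp(-0.5 * (d / sigma) ** 2)
    return w / w.sum()

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
w = soft_assign(np.array([0.2, 0.0]), codebook, sigma=0.5)
```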
Note 2: SVM similarity is better

SVM can reconstruct a complex geometry at the boundary
including disjoint subspaces. The distance metric in the kernel
is important.
Vapnik, 1995


Note 2: nonlinear SVMs
 How to transform the data such that the samples from
 the two classes are separable by a linear function
 (preferably with a margin)? Or, equivalently, define a
 kernel that does this for you straight away.
Zhang, IJCV ‘07


Note 2: χ²-kernels




Because χ² is meant to discriminate histograms!
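One common form of such a kernel (an assumption of mine; the slide does not specify which variant) is the exponential χ² kernel:

```python
import numpy as np

def chi2_kernel(x, y, gamma=1.0, eps=1e-10):
    """Exponential chi-square kernel for histograms:
    k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    d = ((x - y) ** 2 / (x + y + eps)).sum()
    return np.exp(-gamma * d)

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
k_ab = chi2_kernel(a, b)
k_aa = chi2_kernel(a, a)
```

The χ² distance normalizes each bin's squared difference by the bin mass, which is why it suits count histograms.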
Note 2: … or multiple kernels

Let multiple kernel learning determine the weight of all features


| Descriptors                   | Norm = L2 | #  | Norm ∈ L | #   |
|-------------------------------|-----------|----|----------|-----|
| SIFT                          | 0.4902    | 1  | 0.5169   | 4   |
| OpponentSIFT (baseline)       | 0.4975    | 1  | 0.5203   | 4   |
| SIFT and OpponentSIFT         | 0.5187    | 2  | 0.5357   | 8   |
| One channel from C            | 0.5351    | 49 | 0.5405   | 196 |
| Two channel: I and one from C | 0.5463    | 49 | 0.5507   | 196 |
Note 3: Speed




For the intersection kernel, $h_i$ is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). This saves a factor of 75 in time!
Maji CVPR 2008
Note 4: What is in a word?




This is what a word looks like


                 Gavves 2011 Chum ICCV 2007
                 Turcot ICCV 2009
Note 4: Where are the synonyms?




But not all views of the same detail
are close!               Gavves 2011
Note 4: Forming a selective dictionary

Build the vocabulary by selecting the minimal set that
maximizes the cross entropy:
      99% vocabulary reduction
      6% improved recognition
Needs 100 words per concept.



Gavves 2011 CVPR
Note 4

Selective
dictionary by
cross entropy.


Examples.
Note 5: Deconstruct words

Fisher vectors capture the internal structure of words.
Train a Gaussian mixture model, where each codebook
element has its own sigma, one per dimension. Store the
differences in all descriptor dimensions. The feature vector
length is #codewords × #descriptor dimensions.




                                          Perronnin ECCV 2010
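The mean part of the Fisher vector can be sketched as below; a simplified version of my own that keeps only the posterior-weighted, sigma-normalized differences to the component means (the full vector also stacks the sigma derivatives and applies power and L2 normalization):

```python
import numpy as np

def fisher_vector_mean_part(descriptors, means, sigmas, priors):
    """First-order Fisher vector sketch for a diagonal-covariance GMM.

    Output length = #codewords x #descriptor dimensions (means only).
    """
    K, D = means.shape
    # Log-likelihood per component, up to a shared constant.
    log_p = np.stack([
        np.log(priors[k])
        - 0.5 * (((descriptors - means[k]) / sigmas[k]) ** 2).sum(1)
        - np.log(sigmas[k]).sum()
        for k in range(K)], axis=1)
    # Responsibilities: softmax over components.
    post = np.exp(log_p - log_p.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)
    # Weighted, sigma-normalized differences, stacked per codeword.
    fv = np.concatenate([
        (post[:, k:k+1] * (descriptors - means[k]) / sigmas[k]).sum(0)
        for k in range(K)])
    return fv / len(descriptors)

X = np.random.default_rng(0).normal(size=(20, 3))
means = np.zeros((2, 3)); means[1] += 2
fv = fisher_vector_mean_part(X, means, np.ones((2, 3)), np.array([0.5, 0.5]))
```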
System 5: MediaMill search engine




                    http://www.mediamill.nl
5. Conclusion

Words are the essential step forward.
More is better, but costly.
Soft assignment works better than hard,
at the cost of less orthogonal methods.
Approximate algorithms are mostly sufficient.

Lecture 02 internet video search

  • 1. Before the break Color, texture, time, spatial structure, Gauss does it all. …. but not invariance, which is badly needed.
  • 2. Before the break Most basic systems: System 1 Swain Ballard match colors System 2 Blob world match texture blobs We need more invariance
  • 4. Patch descriptors For 4x4 patches, find local gradient directions over t. Count the directions per patch, 128D SIFT histogram. Lowe IJCV 2004
  • 5. Affine patch descriptor Compute the prominent direction. Start with central Gaussian distributed weights in W. Compute 2nd order moments matrix Mk over all directions. Adapt weights to elliptic shape.  ∑ wk ( x, y ) f x f x ∑ w ( x, y ) f fx  Mk =  k y  ∑ wk ( x, y ) f x f y  ∑ w ( x, y ) f k y fy   Iterate until there is no longer change. Wk +1 = M k Wk
  • 6. Color Patch Descriptors Invariance properties per descriptor Light Light Light intensity Light color intensity intensity change and Light color change and change shift shift change shift SIFT + + + - - OpponentSIFT + + + - - C-SIFT + - - - - RGB-SIFT + + + + + van de Sande PAMI 2010
  • 7. Results on PASCAL VOC 2007
  • 8. Results per object category OpponentSIFT (L2 norm) MAP Two channel I+C (L2 norm) bottle pottedplant cow dog diningtable sheep bird sofa tvmonitor cat chair bicycle motorbike bus boat train car horse aeroplane person 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Average Precision
  • 9. Corner selector The change energy at x over a small vector u: u   fx fx fy fx  E xy (u , v) ≈ [u v] M  , M =   v  fx fy fy fy   Since M is symmetric, we have direction of the λ 0 −1  1 fastest change M =R  R  0 λ2  For a corner both should be large. (λmax)-1/2 = det M − k ( trace M ) 2 R (λmin)-1/2 det M = λ1λ2 = I x2 I y − ( I x I y ) 2 2 traceM = λ1 + λ2 = I x + I y 2 2
  • 12. Blob detector 2D Laplacian: L σ 2 ( Gxx ( x, y, σ ) + G yy ( x, y, σ ) ) = DoG: = G ( x, y, kσ ) − G ( x, y, σ ) DoG The Laplacian has a single max at the size of the blob, Multiply by σ2
  • 16. DoG detection + SIFT description Jepson 2005
  • 17. System 3: patch detection System 3 is an app: Stitching http://www.cloudburstresearch.com/
  • 18. 4. Conclusion Patch descriptors bring local orderless information. Best combined with color invariance for illumination. Scene-pose-illumination invariance brings meaning. Lee Comm. ACM 2011
• 19. 5. Words & Similarity
• 20. Before words 1,000 patches × 128 features × 1,000,000 images ≈ 11.5 days / 100 GByte
  • 21. Capture the pattern in patch Measure the pattern in a patch with abundant features. More is better. Different is better. Normalized is better.
  • 22. Sample many patches Sample the patches in the image. Dense 256 K words, salient 1 K words. Salience is good. Dense is better. Combined even better. Salient is memory efficient. Dense is compute efficient.
  • 23. Sample many images Sample the images in the world: the learning set. Learn all relevant distinctions. Learn all irrelevant variations not covered in the invariance of features.
• 24. Form a dictionary of words Form regions in feature space. Size 4,000 (general) to 400,000 (buildings). Random forest is good and fast; 4 runs, 10 deep, is OK.
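A minimal sketch of dictionary formation. It uses plain k-means as a stand-in for the random-forest quantizer of the slide (k-means is the classic choice; the forest gives a similar partition of feature space faster):

```python
import numpy as np

def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k regions of feature space; the cluster
    centers are the visual words. (k-means stand-in for the forest.)"""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        dist = ((descriptors[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

# Fake 128-D SIFT descriptors drawn from two well-separated clusters.
rng = np.random.default_rng(1)
descs = np.vstack([rng.normal(0.0, 0.1, (100, 128)),
                   rng.normal(1.0, 0.1, (100, 128))])
codebook = kmeans_codebook(descs, k=2)
```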
  • 25. Count words per image Retain the word boundaries. Fill the histogram of words per training image.
  • 26. Map histogram in similarity space In 4096 D word count space, 1 point is 1 image. Hard assignment: one patch one word.
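Hard assignment, one patch one word, amounts to a nearest-center lookup followed by a count. A minimal sketch with a toy 2-D codebook:

```python
import numpy as np

def word_histogram(descriptors, codebook):
    """Hard assignment: each patch descriptor votes for its single
    nearest visual word; the counts form the image's word histogram."""
    dist = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    words = dist.argmin(1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalize so patch count does not matter

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])   # two toy 2-D words
descs = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.0, 0.2]])
h = word_histogram(descs, codebook)
print(h)   # [0.5 0.5]
```

With a 4096-word codebook this histogram is the single point that represents the image in word-count space.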
• 27. Learn histogram similarity Learn the histogram distinction between the image histograms, sorted per class of images in the learning set. The histogram is V_d = (t_1, t_2, …, t_i, …, t_n)^T, where t_i is the total of occurrences of the visual word i. The number of words in common is the intersection between query and image: S_q = V_q ∩ V_j.
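The intersection of two count vectors is simply the element-wise minimum, summed:

```python
import numpy as np

def histogram_intersection(v_q, v_j):
    """Words in common between query and image: the intersection
    S = sum_i min(t_i of query, t_i of image) over the word counts."""
    return np.minimum(v_q, v_j).sum()

v_q = np.array([3, 0, 2, 1])   # word counts of the query image
v_j = np.array([1, 4, 2, 0])   # word counts of a database image
s = histogram_intersection(v_q, v_j)
print(s)   # 3
```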
  • 28. Classify unknown image Retain the word count discrimination + support vectors Go from patch to patch > words > counts > discriminate
  • 30. Note 1: Soft assignment is better Soft assignment: assign to multiple clusters, weighted by distance to center. Pooled single sigma for all codebook elements. van Gemert, PAMI 2010
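A sketch of the soft variant: instead of one patch one word, each descriptor spreads its vote over the codebook with a Gaussian weight on distance, one pooled sigma for all elements as on the slide:

```python
import numpy as np

def soft_histogram(descriptors, codebook, sigma=1.0):
    """Soft assignment: each descriptor is spread over all words with a
    Gaussian weight on its distance to each center, using one pooled
    sigma for the whole codebook."""
    d2 = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(1, keepdims=True)     # each descriptor contributes weight 1
    return w.sum(0)

codebook = np.array([[0.0, 0.0], [2.0, 0.0]])
descs = np.array([[1.0, 0.0]])       # exactly between the two words
h = soft_histogram(descs, codebook)
print(h)   # [0.5 0.5]
```

A descriptor halfway between two words counts half for each, instead of flipping arbitrarily to one of them.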
• 31. Note 2: SVM similarity is better SVM can reconstruct a complex geometry at the boundary, including disjoint subspaces. The distance metric in the kernel is important.
• 32. Note 2: nonlinear SVMs How to transform the data such that the samples from the two classes are separable by a linear function (preferably with margin)? Or, equivalently, define a kernel that does this for you straight away. Vapnik 1995
• 33. Note 2: χ²-kernels Because χ² is meant to discriminate histograms! Zhang IJCV 2007
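A sketch of the exponentiated χ² kernel commonly used with bag-of-words histograms (the gamma scaling is a free parameter, here left at 1):

```python
import numpy as np

def chi2_kernel(h1, h2, gamma=1.0):
    """Chi-square kernel between two histograms:
    k = exp(-gamma * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i)),
    skipping bins where both histograms are empty."""
    den = h1 + h2
    nz = den > 0
    chi2 = (((h1 - h2) ** 2)[nz] / den[nz]).sum()
    return np.exp(-gamma * chi2)

a = np.array([0.5, 0.5, 0.0])
b = np.array([0.5, 0.25, 0.25])
k_ab = chi2_kernel(a, b)    # in (0, 1); equals 1.0 only for identical inputs
```

The per-bin denominator makes the distance sensitive in sparsely filled bins, which is exactly what discriminating word histograms needs.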
• 34. Note 2: … or multiple kernels Let multiple kernel learning determine the weight of all features. mAP with L2 norm (# kernels) vs. norm ∈ L (# kernels): SIFT 0.4902 (1) / 0.5169 (4); OpponentSIFT (baseline) 0.4975 (1) / 0.5203 (4); SIFT and OpponentSIFT 0.5187 (2) / 0.5357 (8); one channel from C 0.5351 (49) / 0.5405 (196); two channels, I and one from C, 0.5463 (49) / 0.5507 (196).
• 35. Note 3: Speed For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). Saves a factor of 75 in time! Maji CVPR 2008
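A sketch of the idea, with made-up support-vector values and coefficients for one histogram dimension (the real method tabulates every dimension of the trained SVM):

```python
import numpy as np

def exact_hi(s, xs, alphas):
    """Per-dimension part of the intersection-kernel SVM score:
    h(s) = sum over support vectors of alpha_l * min(s, x_l)."""
    return float(sum(a * min(s, x) for a, x in zip(alphas, xs)))

def build_table(xs, alphas, n_segments=30, s_max=1.0):
    """Tabulate h at uniformly spaced points; h is piecewise linear and
    smooth, so linear interpolation in the table replaces the O(#SV)
    sum with an O(1) lookup."""
    grid = np.linspace(0.0, s_max, n_segments)
    return grid, np.array([exact_hi(s, xs, alphas) for s in grid])

rng = np.random.default_rng(0)
xs = rng.random(200)             # support-vector values in one histogram bin
alphas = rng.normal(size=200)    # their (hypothetical) SVM coefficients
grid, table = build_table(xs, alphas)
approx = np.interp(0.37, grid, table)   # fast approximate evaluation
exact = exact_hi(0.37, xs, alphas)
```

Evaluation cost no longer depends on the number of support vectors, which is where the large speedup comes from.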
• 36. Note 4: What is in a word? This is what a word looks like. Gavves 2011; Chum ICCV 2007; Turcot ICCV 2009
  • 37. Note 4: Where are the synonyms? But not all views of the same detail are close! Gavves 2011
• 38. Note 4: Forming a selective dictionary Build the vocabulary by selecting the minimal set that maximizes the cross entropy: 99% vocabulary reduction, 6% improved recognition. Needs 100 words per concept. Gavves CVPR 2011
• 40. Note 5: Deconstruct words Fisher vectors capture the internal structure of words. Train a Gaussian mixture model, where each codebook element has its own sigma, one per dimension. Store the differences in all descriptor dimensions. The feature vector is #codewords × #descriptor dimensions. Perronnin ECCV 2010
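A first-order sketch of the encoding, assuming the GMM (means, per-dimension sigmas, mixture weights) is already trained; the full Fisher vector also stores second-order terms and applies power and L2 normalization:

```python
import numpy as np

def fisher_like_encoding(descriptors, means, sigmas, weights):
    """First-order Fisher-vector sketch: per Gaussian k, accumulate the
    responsibility-weighted, sigma-normalized deviation of descriptors
    from the mean. Output length = #codewords x #descriptor dims."""
    K, D = means.shape
    # responsibilities gamma(k | x) under a diagonal-covariance GMM
    ll = np.stack(
        [-0.5 * (((descriptors - means[k]) / sigmas[k]) ** 2).sum(1)
         - np.log(sigmas[k]).sum() + np.log(weights[k]) for k in range(K)],
        axis=1)
    g = np.exp(ll - ll.max(1, keepdims=True))
    g /= g.sum(1, keepdims=True)
    n = len(descriptors)
    return np.concatenate(
        [(g[:, [k]] * (descriptors - means[k]) / sigmas[k]).sum(0)
         / (n * np.sqrt(weights[k])) for k in range(K)])

rng = np.random.default_rng(0)
descs = rng.normal(0.0, 1.0, (50, 4))          # toy 4-D descriptors
means = np.array([[0.0] * 4, [3.0] * 4])       # 2 codewords
sigmas = np.ones((2, 4))
weights = np.array([0.5, 0.5])
fv = fisher_like_encoding(descs, means, sigmas, weights)   # shape (2*4,)
```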
  • 41. System 5: MediaMill search engine http://www.mediamill.nl
• 42. 5. Conclusion Words are the essential step forward; more is better, but costly. Soft assignment works better than hard, at the cost of less orthogonal methods. Approximate algorithms are mostly sufficient.