ICVSS2011 Selected Presentations

This presentation shares experiences and selected presentations from the International Computer Vision Summer School (ICVSS 2011), attended by Angel Cruz and Andrea Rueda from the Bioingenium Research Group of Universidad Nacional de Colombia.

  1. 1. ICVSS 2011: Selected Presentations. Angel Cruz and Andrea Rueda, BioIngenium Research Group, Universidad Nacional de Colombia. August 25, 2011
  2. 2. Outline: 1 ICVSS 2011; 2 A Trillion Photos - Steven Seitz; 3 Efficient Novel Class Recognition and Search - Lorenzo Torresani; 4 The Life of Structured Learned Dictionaries - Guillermo Sapiro; 5 Image Rearrangement & Video Synopsis - Shmuel Peleg
  4. 4. ICVSS 2011: International Computer Vision Summer School. 15 speakers, from USA, France, UK, Italy, Prague and Israel
  5. 5. ICVSS 2011: International Computer Vision Summer School. [Photo slides 5–6.]
  8. 8. A Trillion Photos. Steve Seitz, University of Washington / Google. Sicily Computer Vision Summer School, July 11, 2011
  9. 9. Facebook: >3 billion photos uploaded each month; ~1 trillion photos taken each year
  10. 10. What do you do with a trillion photos? Digital Shoebox (hard drives, iPhoto, Facebook...)
  11. 11. ?
  12. 12. Comparing images Detect features using SIFT [Lowe, IJCV 2004]
  13. 13. Comparing images. Extraordinarily robust image matching: across viewpoint (~60 degree out-of-plane rotations); varying illumination; real-time implementations
  14. 14. Edges
  15. 15. Scale Invariant Feature Transform. [Figure: angle histogram over 0 to 2π.] Adapted from slide by David Lowe
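The angle histogram on this slide is the core of SIFT's orientation assignment. A minimal sketch of that one step (not from the slides; numpy only, with the 36-bin resolution and magnitude weighting as assumptions):

```python
import numpy as np

def orientation_histogram(patch, n_bins=36):
    """Magnitude-weighted histogram of gradient angles over [0, 2*pi),
    as used in SIFT orientation assignment. `patch` is a 2D gray image."""
    gy, gx = np.gradient(patch.astype(float))
    angle = np.arctan2(gy, gx) % (2 * np.pi)   # gradient direction in [0, 2*pi)
    magnitude = np.hypot(gx, gy)
    hist, _ = np.histogram(angle, bins=n_bins, range=(0.0, 2 * np.pi),
                           weights=magnitude)
    return hist   # the dominant bin gives the keypoint's canonical orientation
```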
  16. 16. NASA Mars Rover images
  17. 17. NASA Mars Rover images with SIFT feature matches. Figure by Noah Snavely
  18. 18. [Photos of Rome landmarks: Coliseum (outside), St. Peters (inside), Il Vittoriano, Trevi Fountain, Forum]
  19. 19. Structure from motion: matched photos → 3D structure
  20. 20. Structure from motion, aka "bundle adjustment" (texts: Zisserman; Faugeras): minimize $f(R, T, P)$ over camera rotations $R$, translations $T$, and 3D points $P$. [Figure: three cameras with poses $(R_i, t_i)$ observing scene points $p_1, \ldots, p_7$.]
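A minimal sketch (not from the talk) of the objective $f(R, T, P)$ being minimized: the sum of squared reprojection errors over all observations. The pinhole model, a shared known intrinsic matrix K, and all names are assumptions; in practice the residuals would be fed to a sparse non-linear least-squares solver such as scipy.optimize.least_squares.

```python
import numpy as np

def reprojection_residuals(rotations, translations, points3d, observations, K):
    """Residuals whose squared sum is f(R, T, P).

    rotations[c], translations[c]: pose (3x3 R, 3-vector t) of camera c
    points3d[p]: 3-vector of scene point p
    observations: list of (c, p, uv) with uv the observed 2D pixel location
    K: 3x3 camera intrinsic matrix (assumed shared and known)
    """
    residuals = []
    for c, p, uv in observations:
        x_cam = rotations[c] @ points3d[p] + translations[c]     # world -> camera
        u, v, depth = K @ x_cam
        residuals.append(np.array([u / depth, v / depth]) - uv)  # projected - observed
    return np.concatenate(residuals)

# bundle adjustment = least-squares minimization of these residuals,
# jointly over all rotations, translations, and 3D points
```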
  21. 21. ?
  22. 22. Reconstructing Rome in a day... from ~1M images, using ~1000 cores. Sameer Agarwal, Noah Snavely, Rick Szeliski, Steve Seitz. http://grail.cs.washington.edu/rome
  23. 23. Rome 150K: Colosseum
  24. 24. Rome: St. Peters
  25. 25. Venice (250K images)
  26. 26. Venice: Canal
  27. 27. Dubrovnik
  28. 28. From Sparse to Dense Sparse output from the SfM system
  29. 29. From Sparse to Dense Furukawa, Curless, Seitz, Szeliski, CVPR 2010
  30. 30. Most of our photos don’t look like this
  31. 31. recognition + alignment
  32. 32. Your Life in 30 Seconds path optimization
  33. 33. Picasa Integration: as "Face Movies" feature in v3.8 – Rahul Garg, Ira Kemelmacher
  34. 34. Conclusion: trillions of photos + computer vision breakthroughs = new ways to see the world
  36. 36. Efficient Novel-Class Recognition and Search. Lorenzo Torresani
  37. 37. Problem statement: novel object-class search. Given: an image database (e.g., 1 million photos) with no text/tags available, plus user-provided images of an object class (the query images may represent a novel class). Want: the database images of this class.
  38. 38. Application: Web-powered visual search in unlabeled personal photos. Goal: find "soccer camp" pictures on my computer. (1) Search the Web for images of "soccer camp"; (2) find images of this visual class on my computer.
  39. 39. Application: product search• Search of aesthetic products
  40. 40. Relation to other tasks: novel class search vs. image retrieval vs. object categorization. Analogies with image retrieval: large databases, efficient indexing, compact representation. Differences: image retrieval uses simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout). [Figure residue: example retrievals from Nister and Stewenius '07, Philbin et al. '07, and Torralba et al. '08 (compact RBM codes with predicted scene labels).]
  41. 41. Relation to other tasks: novel class search vs. image retrieval and object classification. Analogies with image retrieval: large databases, efficient indexing, compact representation; difference: retrieval uses simple notions of visual relevancy (e.g., near-duplicate, same object instance, same spatial layout). Analogy with object classification: recognition of object classes from a few examples; differences: in classification the classes to recognize are defined a priori, training and recognition time is unimportant, and storage of features is not an issue.
  42. 42. Technical requirements of novel-class search: the object classifier must be learned on the fly from few examples; recognition in the database must have low computational cost; image descriptors must be compact to allow storage in memory.
  43. 43. State-of-the-art in object classification. Winning recipe: many features + non-linear classifiers (e.g. [Gehler and Nowozin, CVPR'09]). [Figure: non-linear decision boundary produced by a multiple-kernel combination of features.]
  44. 44. Model evaluation on Caltech256. [Plot: accuracy (%) vs. number of training examples for linear models on individual features: gist, phog, phog2pi, ssim, bow5000.]
  45. 45. Model evaluation on Caltech256. [Same plot, adding a linear combination of the features, which outperforms each individual feature.]
  46. 46. Model evaluation on Caltech256. [Same plot, adding a non-linear combination of features (the LP-β kernel combiner of Gehler and Nowozin), the best performer.]
  47. 47. Multiple kernel combiners. Classification output is obtained by combining many features via non-linear kernels: $h(x) = \sum_{f=1}^{F} \beta_f \sum_{n=1}^{N} k_f(x, x_n)\,\alpha_n + b$ (outer sum over features, inner sum over training examples).
  48. 48. Methods: multiple kernel learning (MKL) [Bach et al., 2004; Sonnenburg et al., 2006; Varma and Ray, 2007]. MKL performs kernel selection during the training phase by learning a non-linear SVM with a linear combination of kernels, $k^*(x, x') = \sum_{f=1}^{F} \beta_f k_f(x, x')$, jointly optimizing the combination weights $\beta$ (with $\beta_f \ge 0$ and $\sum_f \beta_f = 1$, which yields sparse, interpretable coefficients) and the SVM parameters $\alpha \in \mathbb{R}^N$, $b \in \mathbb{R}$.
  49. 49. LP-β: a two-stage approach to MKL [Gehler and Nowozin, 2009]. Classification output of traditional MKL: $h_{MKL}(x) = \sum_{f=1}^{F} \beta_f \sum_{n=1}^{N} k_f(x, x_n)\,\alpha_n + b$. Classification function of LP-β: $h(x) = \sum_{f=1}^{F} \beta_f h_f(x)$ with $h_f(x) = \sum_{n=1}^{N} k_f(x, x_n)\,\alpha_{fn} + b_f$. Two-stage training procedure: 1. train each $h_f(x)$ independently (traditional SVM learning); 2. optimize over $\beta$ (a simple linear program). A code sketch follows below.
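A minimal sketch of the two-stage procedure (assuming scikit-learn and scipy; the exact linear program of Gehler and Nowozin is replaced here by a simplified hinge-loss LP over β ≥ 0, Σβ = 1, so this is an illustration, not the paper's formulation):

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def lp_beta_train(kernels, y):
    """Stage 1: one kernel SVM per feature. Stage 2: mixing weights beta by LP.
    kernels: list of F precomputed N x N kernel matrices; y: labels in {-1,+1}."""
    y = np.asarray(y, dtype=float)
    svms, H = [], []
    for K in kernels:                                 # stage 1: independent SVMs
        clf = SVC(kernel="precomputed").fit(K, y)
        svms.append(clf)
        H.append(clf.decision_function(K))            # h_f(x_n) on training data
    H = np.stack(H, axis=1)                           # N x F matrix of scores
    N, F = H.shape
    # stage 2 (simplified LP): min sum(xi)
    #   s.t. y_n * sum_f beta_f h_f(x_n) >= 1 - xi_n,  beta >= 0,  sum(beta) = 1
    c = np.concatenate([np.zeros(F), np.ones(N)])     # variables: [beta, xi]
    A_ub = np.hstack([-y[:, None] * H, -np.eye(N)])
    res = linprog(c, A_ub=A_ub, b_ub=-np.ones(N),
                  A_eq=np.concatenate([np.ones(F), np.zeros(N)])[None, :],
                  b_eq=[1.0], bounds=[(0, None)] * (F + N))
    return svms, res.x[:F]                            # per-feature SVMs and beta
```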
  50. 50. LP-β for novel-class search? The LP-β classifier $h(x) = \sum_{f=1}^{F} \beta_f \left( \sum_{n=1}^{N} k_f(x, x_n)\,\alpha_{fn} + b_f \right)$ (sum over features and over training examples) is unsuitable for our needs due to: large storage requirements (typically over 20K bytes/image); costly evaluation (requires query-time kernel distance computation for each test image); costly training (1+ minute for O(10) training examples).
  51. 51. Classemes: a compact descriptor for efficient recognition [Torresani et al., 2010]. Key idea: represent each image $x$ in terms of its "closeness" to a set of $C$ basis classes ("classemes"): $\Phi(x) = [\phi_1(x), \ldots, \phi_C(x)]^T$, where $\phi_c(x) = h_{classeme_c}(x) = \sum_{f=1}^{F} \beta_f^c \sum_{n=1}^{N} k_f(x, x_n^c)\,\alpha_n^c + b^c$ is the output of a pre-learned LP-β classifier for the c-th basis class. Query-time learning: train a linear classifier on $\Phi(x)$ from the training examples $\Phi(x_1), \ldots, \Phi(x_N)$ of the novel class, e.g. $g_{duck}(\Phi(x); w^{duck}) = \Phi(x)^T w^{duck} = \sum_{c=1}^{C} w_c^{duck}\,\phi_c(x)$. The LP-β classemes are trained before the creation of the database; only the linear $w$ is trained at query time (sketched below).
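Since the database stores Φ(x) precomputed, query-time learning reduces to fitting one linear model. A minimal sketch (scikit-learn assumed; names and the choice of negatives are mine):

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_novel_class(pos_classemes, neg_classemes):
    """Query-time step: fit w for a novel class from a few classeme vectors.
    pos_classemes: K x C matrix of Phi(x) for the user-provided query examples
    neg_classemes: M x C matrix for generic negatives (any non-class images)"""
    X = np.vstack([pos_classemes, neg_classemes])
    y = np.r_[np.ones(len(pos_classemes)), -np.ones(len(neg_classemes))]
    clf = LinearSVC(C=1.0).fit(X, y)
    return clf.coef_.ravel(), clf.intercept_[0]

# ranking the whole database is then one matrix-vector product:
# scores = database_classemes @ w + b;  top_hits = np.argsort(-scores)[:25]
```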
  52. 52. How this works... [Table from "Efficient Object Category Recognition Using Classemes": for a selection of Caltech 256 categories, the five classemes with the highest LP-β weights in the retrieval experiment.] Highly semantic labels are not required: classeme classifiers are used just to create a useful feature vector, not to assign semantic labels, and the detectors may respond to specific patterns of texture, color, shape, etc. The somewhat peculiar classeme labels reflect the ontology used as a source of base categories. Large-scale recognition benefits from a compact descriptor for each image, for example allowing databases to be stored in memory rather than on disk.
  53. 53. Related work: attribute-based recognition [Lampert et al., CVPR'09, "Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer" (with Nickisch and Harmeling); Farhadi et al., CVPR'09]. Knowledge transfer via high-level attributes (e.g., for "polar bear": black: no, white: yes, brown: no, stripes: no, water: yes, eats fish: yes) allows detecting object classes without any training examples, based on which attribute description a test image fits best. However, the description requires hand-specified attribute-class associations, and the attribute classifiers must be trained with human-labeled examples.
  54. 54. Method overview. 1. Classeme learning: learn classifiers $\phi_{\text{body of water}}(x)$, ..., $\phi_{\text{walking}}(x)$. 2. Using the classemes for recognition and retrieval: from training examples $\Phi(x_1), \ldots, \Phi(x_N)$ of a novel class, learn $g_{duck}(\Phi(x)) = \sum_{c=1}^{C} w_c^{duck}\,\phi_c(x)$.
  55. 55. Classeme learning: choosing the basis classes. Classeme labels desiderata: must be visual concepts; should span the entire space of visual classes. Our selection: concepts defined in the Large Scale Ontology for Multimedia [LSCOM] to be "useful, observable and feasible for automatic detection": 2659 classeme labels, after manual elimination of plurals, near-duplicates, and inappropriate concepts.
  56. 56. Classeme learning: gathering the training data. We downloaded the top 150 images returned by Bing Images for each classeme label. For each of the 2659 classemes, a one-versus-the-rest training set was formed to learn a binary classifier $\phi_{\text{walking}}(x) \rightarrow$ yes/no.
  57. 57. Classeme learning: training the classifiers. Each classeme classifier is an LP-β kernel combiner [Gehler and Nowozin, 2009]: $\phi(x) = \sum_{f=1}^{F} \beta_f \left( \sum_{n=1}^{N} k_f(x, x_n)\,\alpha_{f,n} + b_f \right)$, a linear combination of feature-specific SVMs. We use 13 kernels based on spatial pyramid histograms computed from the following features: color GIST [Oliva and Torralba, 2001]; oriented gradients [Dalal and Triggs, 2005]; self-similarity descriptors [Shechtman and Irani, 2007]; SIFT [Lowe, 2004].
  58. 58. A dimensionality reduction view of classemes: $\Phi$ maps an image $x$, described by GIST, self-similarity, oriented-gradient, and SIFT features, to $[\phi_1(x), \ldots, \phi_{2659}(x)]^T$. Raw features: non-linear kernels are needed for good classification; 23K bytes/image. Classemes: near state-of-the-art accuracy with linear classifiers; can be quantized down to 200 bytes/image with almost no recognition loss.
  59. 59. Experiment 1: multiclass recognition on Caltech256. [Plot: accuracy (%) vs. number of training examples.] Compared: LP-β as in [Gehler and Nowozin, 2009] using 39 kernels; LP-β with our 13 kernels on raw features x; our approach, a linear SVM on the classemes $\Phi(x)$; a linear SVM on binarized classemes, i.e. $(\Phi(x) > 0)$; and a linear SVM on raw x.
  60. 60. Computational cost comparison. [Bar charts:] training time 23 hours (LP-β) vs. 9 minutes (linear SVM on classemes); testing time (ms) is likewise far lower for the classeme SVM.
  61. 61. Accuracy vs. compactness. [Plot: compactness (images per MB, log scale) vs. accuracy (%), with lines linking performance at 15 and 30 training examples. Methods: LPbeta13, Csvm, Cq1svm, Xsvm, nbnn [Boiman et al., 2008], emk [Bo and Sminchisescu, 2008], with descriptor sizes annotated at 188 bytes/image, 2.5K bytes/image, 23K bytes/image, and 128K bytes/image.]
  62. 62. Experiment 2: object class retrieval. [Plot from "Efficient Object Category Recognition Using Classemes", Fig. 4: precision (%) at 25 vs. number of training images, for Csvm, Cq1Rocchio (β=1, γ=0), Cq1Rocchio (β=0.75, γ=0.15), Bowsvm, BowRocchio (β=1, γ=0), BowRocchio (β=0.75, γ=0.15).] Percentage of the top 25 in a 6400-document set which match the query class; random performance is 0.4%. Training Csvm takes 0.6 sec with 5*256 training examples.
  63. 63. Analogies with text retrieval. Classeme representation of an image: presence/absence of visual attributes. Bag-of-words representation of a text document: presence/absence of words.
  64. 64. Related work. Prior work (e.g., [Sivic & Zisserman, 2003; Nister & Stewenius, 2006; Philbin et al., 2007]) has exploited a similar analogy for object-instance retrieval by representing images as bags of visual words: detect interest patches, compute SIFT descriptors [Lowe, 2004], quantize the descriptors into codewords, and represent each image as a sparse histogram of visual-word frequencies. To extend this methodology to object-class retrieval we need: a representation more suited to object-class recognition (e.g. classemes as opposed to bags of visual words); and to train the ranking/retrieval function for every new query class.
  65. 65. Data structures for efficient retrieval. [Figure: an incidence matrix of 10 documents I0–I9 over 8 binary features f0–f7, and the corresponding inverted index listing, for each feature, the documents that contain it.] The inverted index enables efficient calculation of $w^T \Phi$ for all $\Phi$, as $\sum_{i\,:\,\Phi_i \neq 0} w_i \Phi_i$, and is very compact: only one bit per feature entry.
  66. 66. Efficient retrieval via inverted index (slides 66–72 step through an example with w = [1.5, -2, 0, -5, 0, 3, -2, 0] over features f0–f7). Goal: compute the score $w^T \Phi$ for all binary vectors $\Phi$ in the database. For each feature $f_i$ with non-zero weight, walk its inverted list and add $w_i$ to the accumulated score of every image on the list. The cost of scoring is linear in the sum of the lengths of the inverted lists associated with non-zero weights (a code sketch follows).
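A minimal sketch of the structure and the scoring pass just described (plain numpy; names are mine):

```python
import numpy as np
from collections import defaultdict

def build_inverted_index(binary_vectors):
    """For each feature i, the list of image ids whose bit phi_i is set.
    binary_vectors: N x D array of 0/1 entries (one bit per feature entry)."""
    index = defaultdict(list)
    for n, phi in enumerate(binary_vectors):
        for i in np.flatnonzero(phi):
            index[i].append(n)
    return index

def score_all(index, w, n_images):
    """w . Phi for every image, touching only the inverted lists of non-zero
    weights: cost is the sum of those list lengths."""
    scores = np.zeros(n_images)
    for i in np.flatnonzero(w):
        scores[index[i]] += w[i]       # add w_i to every image on list i
    return scores
```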
  73. 73. Improve efficiency via sparse weight vectors. Key idea: force w to contain as many zeros as possible. Learning objective: $E(w) = R(w) + \frac{C}{N} \sum_{n=1}^{N} L(w; \Phi_n, y_n)$, a regularizer plus a loss over classeme vectors $\Phi_n$ with labels $y_n$. L2-SVM: $R(w) = w^T w$, $L(w; \Phi_n, y_n) = \max(0, 1 - y_n(w^T \Phi_n))$. Since $|w_i| \gg w_i^2$ for small $w_i$ and $|w_i| \ll w_i^2$ for large $w_i$, choosing $R(w) = \sum_i |w_i|$ will tend to produce a small number of larger weights and more zero weights. [Figure: the $\ell_2$-ball $w_1^2 + w_2^2 = \text{const}$ vs. the $\ell_1$-ball $|w_1| + |w_2| = \text{const}$.]
  74. 74. Improve efficiency via sparse weight vectors (continued). Same objective $E(w) = R(w) + \frac{C}{N} \sum_{n=1}^{N} L(w; \Phi_n, y_n)$. L2-SVM: $R(w) = w^T w$, $L = \max(0, 1 - y_n(w^T \Phi_n))$. L1-LR: $R(w) = \sum_i |w_i|$, $L = \log(1 + \exp(-y_n w^T \Phi_n))$. FGM (Feature Generating Machine) [Tan et al., 2010]: $R(w) = w^T w$, $L = \max(0, 1 - y_n (w \odot d)^T \Phi_n)$ s.t. $\mathbf{1}^T d \le B$, $d \in \{0,1\}^D$, where $\odot$ is the elementwise product. (A sketch of the L1-LR option follows.)
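A minimal sketch of the L1-LR option (scikit-learn assumed): the ℓ1 penalty zeroes out most weights, directly shortening the inverted lists that have to be scanned; C plays the role of the sparsity knob varied in the curves on the next slide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_sparse_w(Phi, y, C=0.1):
    """L1-regularized logistic regression: R(w) = sum_i |w_i|, logistic loss.
    Phi: N x D classeme vectors; y: labels in {0, 1}. Smaller C -> sparser w."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(Phi, y)
    w = clf.coef_.ravel()
    print(f"non-zero weights: {np.count_nonzero(w)} / {w.size}")
    return w
```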
  75. 75. Performance evaluation on ImageNet (10M images) [Rastegari et al., 2011]. [Plot: precision @ 10 (%) vs. search time per query (seconds), for full inner-product evaluation vs. inverted index, each with L2-SVM and L1-LR.] Performance averaged over 400 object classes used as queries; 10 training examples per query class; the database includes 450 images of the query class and 9.7M images of other classes; Prec@10 of a random classifier is 0.005%. Each curve is obtained by varying sparsity through C in the training objective $E(w) = R(w) + \frac{C}{N} \sum_n L(w; \Phi_n, y_n)$.
  76. 76. Top-k ranking. Do we need to rank the entire database? Users only care about the top-ranked images. Key idea: for each image, iteratively update an upper bound and a lower bound on the score, and gradually prune images that cannot rank in the top-k.
  77. 77. Top-k pruning [Rastegari et al., 2011] (slides 77–82 step through an example with w = [3, -2, 0, -6, 0, 3, -2, 0]). Highest possible score: for the binary vector $\Phi^U$ with $\Phi_i^U = 1$ iff $w_i > 0$, giving the initial upper bound $u^* = w^T \Phi^U$ (6 in this case). Lowest possible score: for $\Phi^L$ with $\Phi_i^L = 1$ iff $w_i < 0$, giving the initial lower bound $l^* = w^T \Phi^L$ (-10 in this case). Every image starts with these bounds; then load one feature $i$ at a time. If $w_i > 0$ (e.g. +3): subtract $w_i$ from the upper bound of each image n with $\phi_{n,i} = 0$, and add $w_i$ to the lower bound of each image with $\phi_{n,i} = 1$. If $w_i < 0$ (e.g. -2 or -6): decrease the upper bound by $|w_i|$ if $\phi_{n,i} = 1$, and increase the lower bound by $|w_i|$ if $\phi_{n,i} = 0$. With k = 4 in the example, images I2 and I9 can be pruned early since they cannot rank in the top-k (sketched in code below).
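A minimal sketch of the whole bound-update loop (numpy; a dense version for clarity, whereas the method streams features from the inverted index; names and the tie handling are mine):

```python
import numpy as np

def topk_candidates(Phi, w, k):
    """Images that can still rank in the top-k under w . Phi scoring.
    Phi: N x D binary matrix; w: weight vector."""
    N = Phi.shape[0]
    upper = np.full(N, w[w > 0].sum())     # best case: all positive bits set
    lower = np.full(N, w[w < 0].sum())     # worst case: all negative bits set
    alive = np.ones(N, dtype=bool)
    for i in np.argsort(-np.abs(w)):       # features in descending |w_i|
        if w[i] == 0:
            break                          # remaining weights are all zero
        bits = Phi[:, i].astype(bool)
        if w[i] > 0:
            upper[~bits] -= w[i]           # bit absent: best case shrinks
            lower[bits] += w[i]            # bit present: worst case grows
        else:
            upper[bits] += w[i]            # bit present: best case shrinks
            lower[~bits] -= w[i]           # bit absent: worst case grows
        if alive.sum() > k:                # prune every image whose upper bound
            thresh = np.sort(lower[alive])[-k]   # cannot reach the k-th best
            alive &= upper >= thresh             # lower bound seen so far
    return np.flatnonzero(alive)
```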
  83. 83. Distribution of weights and pruning rate. [Figure 2 of the ICCV 2011 submission: (a) distribution of absolute weight values for the different classifiers, after sorting the weight magnitudes; TkP runs faster with sparse, highly skewed weight values. (b) Pruning rate of TkP (% of images pruned vs. number of iterations) for the various classification models and different values of k (k = 10, 3000); features are considered in descending order of $|w_i|$.] A smaller value of k allows the method to eliminate more images from consideration at a very early stage.
  84. 84. Performance evaluation on ImageNet (10M images) [Rastegari et al., 2011]. [Plot: precision @ 10 (%) vs. search time per query (seconds), for TkP L1-LR, TkP L2-SVM, inverted index L1-LR, inverted index L2-SVM; k = 10.] Performance averaged over 400 object classes used as queries; 10 training examples per query class; the database includes 450 images of the query class and 9.7M images of other classes; Prec@10 of a random classifier is 0.005%. Each curve is obtained by varying sparsity through C in the training objective.
  85. 85. Alternative search strategy: approximate ranking. Key idea: approximate the score function with a measure that can be computed (more) efficiently (related to approximate NN search: [Shakhnarovich et al., 2006; Grauman and Darrell, 2007; Chum et al., 2008]). Approximate ranking via vector quantization: $w^T \Phi \approx w^T q(\Phi)$, where $q(\cdot)$ is a quantizer returning the cluster centroid nearest to $\Phi$. Problem: to approximate the score well we need a fine quantization, but the dimensionality of our space is D = 2659: too large to enable a fine quantization using k-means clustering.
  86. 86. Product quantization for nearest neighbor search [Jegou et al., 2011]. Split the feature vector $\Phi$ into v subvectors: $\Phi = [\Phi_1 | \Phi_2 | \ldots | \Phi_v]$. The subvectors are quantized separately: $q(\Phi) = [q_1(\Phi_1) | q_2(\Phi_2) | \ldots | q_v(\Phi_v)]$, where each $q_i(\cdot)$ is learned by k-means, with a limited number of centroids, in a space of dimensionality D/v. Example from [Jegou et al., 2011]: a 128-dimensional vector split into 8 subvectors of dimension 16, each quantized with $2^8 = 256$ centroids, yielding a 64-bit quantization index (8 bits per subvector).
  87. 87. Efficient approximate scoring (slides 87–92 step through the construction). $w^T \Phi \approx w^T q(\Phi) = \sum_{j=1}^{v} w_j^T q_j(\Phi_j)$, where $w_j$ is the sub-block of w corresponding to subvector j. 1. Fill a look-up table: for each of the v sub-blocks, precompute and store the inner products $s_{jk} = w_j^T c_{jk}$ between $w_j$ and each of the r centroids $c_{jk}$ of that sub-block's quantizer. 2. Score each quantized vector $q(\Phi)$ in the database using the table: $w^T q(\Phi) = w_1^T q_1(\Phi_1) + w_2^T q_2(\Phi_2) + \ldots + w_v^T q_v(\Phi_v)$, i.e., only v additions per image (sketched in code below).
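A minimal sketch of the full pipeline (scikit-learn's KMeans assumed; D must be divisible by v; all names are mine):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(X, v, r):
    """One k-means codebook of r centroids per sub-block of dimension D/v."""
    return [KMeans(n_clusters=r, n_init=4).fit(block)
            for block in np.split(X, v, axis=1)]

def encode(X, codebooks):
    """Each image becomes v small codes: the nearest centroid id per sub-block."""
    blocks = np.split(X, len(codebooks), axis=1)
    return np.stack([cb.predict(b) for cb, b in zip(codebooks, blocks)], axis=1)

def score(codes, w, codebooks):
    """Approximate w . Phi via the look-up table: v additions per image.
    table[j, k] = w_j . c_jk, the precomputed sub-block inner products."""
    w_blocks = np.split(w, len(codebooks))
    table = np.stack([cb.cluster_centers_ @ wj
                      for cb, wj in zip(codebooks, w_blocks)])
    v = len(codebooks)
    return table[np.arange(v), codes].sum(axis=1)   # codes: N x v
```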
  93. 93. Choice of parameters [Rastegari et al., 2011]. Dimensionality is first reduced with PCA from D = 2659 to D' ≪ D. How do we choose D', v (number of sub-blocks), and r (number of centroids per sub-block)? [Plot: effect of parameter choices on a database of 150K images; precision @ 10 (%) vs. search time per query (seconds), for (v, r) combinations ranging from (16, 2^8) to (256, 2^8) and D' ∈ {128, 256, 512}.]
  94. 94. Performance evaluation on 150K images. [Plot.]