Using classifiers to compute similarities between face images. Prof. Lior Wolf, Tel-Aviv University

Prof. Lior Wolf, Tel-Aviv University

He is a faculty member at the School of Computer Science at Tel-Aviv University. Previously, he was a post-doctoral associate in Prof. Poggio's lab at MIT. He graduated from the Hebrew University, Jerusalem, where he worked under the supervision of Prof. Shashua. He was awarded the 2008 Sackler Career Development Chair, the Colton Excellence Fellowship for new faculty (2006-2008), the Max Shlumiuk award for 2004, and the Rothschild fellowship for 2004. His joint work with Prof. Shashua at ECCV 2000 received the best paper award, and their work at ICCV 2001 received the Marr Prize honorable mention. He was also awarded the best paper award at the post-ICCV workshop on eHeritage 2009. In addition, Lior has held several development, consulting, and advisory positions in computer vision companies, including face.com and Superfish, and is a co-founder of FDNA.

Presentation topic:
Using classifiers to compute similarities between images of faces.

Key points:
The One-Shot-Similarity (OSS) is a framework for classifier-based similarity functions. It is based on the use of background samples and was shown to excel in tasks ranging from face recognition to document analysis. In this talk we will present the framework as well as the following results: (1) when using a version of LDA as the underlying classifier, this score is a Conditionally Positive Definite kernel and may be used within kernel-methods (e.g., SVM), (2) OSS can be efficiently computed, and (3) a metric learning technique that is geared toward improved OSS performance.


  1. Learning visual similarity using classifiers
     Lior Wolf, The Blavatnik School of Computer Science, Tel-Aviv University
     Collaborators: Yaniv Taigman (face.com), Tal Hassner (Open U), Orit Klipper-Gross (Weizmann Inst.), Itay Maoz (Tel-Aviv U)
  2. The Blavatnik School of Computer Science, Tel-Aviv University
     An example of higher education in Israel
  3. A school in the Faculty of Exact Sciences, which also includes Mathematics, Physics, Chemistry, Geophysics and Planetary Sciences
     Originated in the 1970s as part of the School of Math; a separate School since 2000
     39 faculty members, ~1000 undergrads, ~200 MSc students, ~70 PhD students, plus post-docs and other research personnel
  4. School ranking in the world
     TAU/CS ranked #29 in number of citations (Thomson Scientific, for the years 2000-2010) [Technion #33, Weizmann #72, Hebrew U #105]
     TAU/CS ranked #28 in Computer Science by the Shanghai Academic Ranking of World Universities, 2011 [Weizmann #12, Technion #15, Hebrew U #21]
     TAU/CS ranked #14 in the world in CS impact (Scientometrics, Vol. 76, No. 2, 2008)
     12 TAU/CS faculty in positions 1-100 in the "list of most central computer scientists in Theory of Computer Science" (Kuhn and Wattenhofer, SIGACT News, Dec. '07)
  5. Computer vision in search
     Pipeline: raw data (images, video, audio) is preprocessed into information (objects, tags, IDs, context); a query against that information yields search results
  6. The pain: too many images
     Over 1,000,000,000 photos uploaded each month, shared by 200,000,000+ users; tens of billions served per week
     No tags → no photos: "can I see all my photos?", "tagging takes hours, can you do that for me?"
  7. The evolution of perceptual search
     Levels: no vision, low-level vision, mid-level vision, high-level vision (scene understanding)
     Stages: text-based image search, re-ranking by similarity, Gist-based image similarity, catalog-based search, search with basic properties, specialization in face identification
  8. Photo Finder for Facebook
  9. The first mobile app to find 3D items
  10. What makes it so hard?
  11. High-level vision: what is where?
      A happy couple walks in a field. What kind of field? Where? Which season? How old are they? Gender? How attractive? What are they wearing?
      High-level vision: scene understanding
  12. YaC, Moscow, September 19, 2011
      Learning visual similarity using classifiers
      Lior Wolf, The Blavatnik School of Computer Science, Tel-Aviv University
      Collaborators: Yaniv Taigman (face.com), Tal Hassner (Open U), Orit Klipper-Gross (Weizmann Inst.), Itay Maoz (Tel-Aviv U)
  13. The pair-matching problem
      Training:
  14. The pair-matching problem: modeling never-before-seen objects
      A natural setup for image retrieval with no categories
  15. Instances: document analysis, video action recognition, face recognition, video face recognition
  16. The pair-matching problem
      Training:
  17. Labeled Faces in the Wild (LFW)
      13,000 labeled face images collected from the web; 5,749 individuals; 1-150 images per individual
  18. Restricted protocol
      10-fold cross-validation tests on randomly generated splits, each with 300 same pairs and 300 not-same pairs
  19. Pipeline (take 1)*
      Training (note: no use of subject labels): compute Sim(x1, x2) for each same pair and not-same pair, feed the scores to a classifier (e.g., SVM), then threshold
      * "Descriptor Based Methods in the Wild," ECCVw'08
  20. Pipeline (take 1)*
      Training with multiple descriptor similarities: each pair i yields a vector (s_i,1, s_i,2, ..., s_i,n) of n similarity scores, labeled same or not-same, fed to a classifier (e.g., SVM)
      * "Descriptor Based Methods in the Wild," ECCVw'08
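The two pipeline slides above can be sketched in code; the descriptor/similarity interfaces, function names, and the linear-kernel SVM are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.svm import SVC

def pair_features(img1, img2, descriptors, sims):
    # One feature per (descriptor, similarity) combination, as on the slide.
    return np.array([sim(d(img1), d(img2))
                     for d in descriptors for sim in sims])

def train_pair_matcher(same_pairs, diff_pairs, descriptors, sims):
    # Stack the similarity vectors of labeled pairs and fit an SVM;
    # the SVM's decision threshold then separates same from not-same.
    X = np.array([pair_features(a, b, descriptors, sims)
                  for a, b in list(same_pairs) + list(diff_pairs)])
    y = np.array([1] * len(same_pairs) + [0] * len(diff_pairs))
    return SVC(kernel="linear").fit(X, y)
```

At test time, a new pair is mapped through `pair_features` and classified by the trained SVM.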
  21. Some questions
      How to represent the images? Grayscale, edge responses [Brunelli & Poggio '93], C1-Gabor [e.g., Riesenhuber & Poggio '99], SIFT [Lowe '04], LBP [e.g., Ojala, Pietikainen & Harwood '96], ...
      Which similarity to use? L2, correlation, learned metrics [e.g., Bilenko et al. '04, Cristianini et al. '02, Hertz et al. '04, ...], "hand-crafted" metrics [e.g., Belongie et al. '01]
      Later on: how can subject IDs help improve pair-matching performance?
  22. One-Shot Similarity (OSS) score*
      What: a measure of the similarity between two vectors
      Input: the two vectors and a set of "background samples"
      How: use "one-shot learning" (classification with one positive example)
      * "Descriptor Based Methods in the Wild," ECCVw'08; "The One-Shot Similarity Kernel," ICCV'09
  23. Computing the One-Shot Similarity
      Given vectors p and q and a set A of background examples:
      Step a: Model1 = train(p, A)
      Step b: Score1 = classify(q, Model1)
      Step c: Model2 = train(q, A)
      Step d: Score2 = classify(p, Model2)
      One-Shot-Sim = (Score1 + Score2) / 2
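The four steps can be written out directly; using LinearSVC as the underlying classifier here is an illustrative choice (the talk's own variants use LDA):

```python
import numpy as np
from sklearn.svm import LinearSVC

def one_shot_similarity(p, q, A,
                        make_classifier=lambda: LinearSVC(C=1.0, random_state=0)):
    """One-Shot Similarity following the slide's four steps.
    p, q: the two vectors; A: background samples, one per row.
    The classifier choice is illustrative, not the authors' code."""
    def one_shot_score(pos, other):
        # train(pos, A): one positive example against the background set
        X = np.vstack([pos, A])
        y = np.concatenate([[1], np.zeros(len(A))])
        model = make_classifier().fit(X, y)
        # classify(other, model): signed confidence, not a hard label
        return float(model.decision_function(other[None, :])[0])

    score1 = one_shot_score(p, q)   # steps a-b
    score2 = one_shot_score(q, p)   # steps c-d
    return (score1 + score2) / 2.0  # symmetrized OSS
```

Averaging the two directions makes the score symmetric in p and q by construction.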
  24. Euclidean vs. One-Shot visualized
  25. Euclidean vs. One-Shot visualized (continued)
  26. Computing the One-Shot Similarity
      Using LDA as the underlying classifier:
      OSS(x_i, x_j) = 1/2 [ (x_i - mu_A)^T S_W^+ (x_j - (x_i + mu_A)/2) / ||S_W^+ (x_i - mu_A)|| + (x_j - mu_A)^T S_W^+ (x_i - (x_j + mu_A)/2) / ||S_W^+ (x_j - mu_A)|| ]
      where mu_A is the mean of set A, and S_W^+ is the pseudo-inverse of the intra-class covariance matrix.
      * "The One-Shot Similarity Kernel," ICCV'09
  27. Computing the One-Shot Similarity
      Using free-scale LDA (the unnormalized LDA projection) as the underlying classifier:
      OSS(x_i, x_j) = 1/2 [ (x_i - mu_A)^T S_W^+ (x_j - (x_i + mu_A)/2) + (x_j - mu_A)^T S_W^+ (x_i - (x_j + mu_A)/2) ]
      where mu_A is the mean of set A, and S_W^+ is the pseudo-inverse of the intra-class covariance matrix.
      * "The One-Shot Similarity Kernel," ICCV'09
  28. Some properties of the OSS*
      Uses unlabeled training data
      OSS based on free-scale LDA is a conditionally positive definite (CPD) kernel
      May be efficiently computed: S_W^+ is independent of the two vectors compared and is computed only once; repeated comparisons of a vector x_i to different x_j can then be performed in O(n)
      * "The One-Shot Similarity Kernel," ICCV'09
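The efficiency claim follows because S_W^+ and mu_A depend only on the background set, and the per-vector LDA direction can be cached. A minimal caching sketch of the free-scale LDA variant (the class and method names are my own):

```python
import numpy as np

class OSSScorer:
    """Caches background statistics so repeated OSS comparisons are O(n).
    Illustrative sketch of the free-scale LDA variant."""
    def __init__(self, A):
        self.mu = A.mean(axis=0)                          # mean of set A
        # pseudo-inverse of the background covariance, computed once
        self.Sw_pinv = np.linalg.pinv(np.cov(A, rowvar=False))
        self._w = {}                                      # cached directions

    def _direction(self, key, x):
        # LDA direction S_W^+ (x - mu_A): O(n^2), but only once per vector
        if key not in self._w:
            self._w[key] = self.Sw_pinv @ (x - self.mu)
        return self._w[key]

    def score(self, key_i, xi, key_j, xj):
        # With directions cached, each comparison is two O(n) dot products.
        si = self._direction(key_i, xi) @ (xj - (xi + self.mu) / 2.0)
        sj = self._direction(key_j, xj) @ (xi - (xj + self.mu) / 2.0)
        return 0.5 * (si + sj)
```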
  29. Some properties of the OSS (continued)
  30. Some properties of the OSS: OSS based on free-scale LDA is a CPD kernel
  31. Metric learning for OSS*
      Instead of examples x_i, use T x_i for some "optimal" transformation T
      T is obtained by a gradient-descent procedure that optimizes the OSS-based score
      * "One Shot Similarity Metric Learning for Action Recognition," in submission
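A very rough sketch of such a procedure, assuming a generic separation objective and a finite-difference gradient (the paper's actual objective and analytic gradients differ):

```python
import numpy as np

def learn_transform(same_pairs, diff_pairs, score, dim, lr=0.01, steps=25):
    """Adjust T so that score(T x_i, T x_j) is high for same pairs and
    low for not-same pairs. Objective and gradient are assumptions."""
    T = np.eye(dim)

    def objective(T):
        s_same = np.mean([score(T @ a, T @ b) for a, b in same_pairs])
        s_diff = np.mean([score(T @ a, T @ b) for a, b in diff_pairs])
        return s_same - s_diff          # separation to be maximized

    eps = 1e-5
    for _ in range(steps):
        grad = np.zeros_like(T)
        for idx in np.ndindex(*T.shape):   # finite-difference gradient
            Tp = T.copy()
            Tp[idx] += eps
            grad[idx] = (objective(Tp) - objective(T)) / eps
        T = T + lr * grad                  # ascent on the separation
    return T
```

In practice one would use an analytic gradient of the OSS score rather than finite differences; this sketch only shows the optimization loop.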
  32. The unrestricted protocol
      10-fold cross-validation tests on randomly generated splits, each with 300 same pairs and 300 not-same pairs
      Training now includes subject labels
  33. Multiple One-Shots*
      We now have IDs; how do we use them? Compute multiple OSS scores, each time using background examples from a single class
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  34. Multiple One-Shots: ID-based OSS
  35. Multiple One-Shots
      We now have IDs; how do we use them? Compute multiple OSS scores, each time using examples from a single class
      Discrimination based on different sources of variation: subject ID, pose, etc.
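The idea can be sketched as follows; the function name is illustrative, and `oss` stands for any One-Shot Similarity routine of the form `oss(p, q, A)`:

```python
import numpy as np

def multiple_oss(p, q, samples, labels, oss):
    """One OSS score per labeled background class (e.g., per subject ID
    or per pose bin); the stacked scores become the feature vector for
    the final pair classifier."""
    labels = np.asarray(labels)
    scores = []
    for c in np.unique(labels):
        A = samples[labels == c]   # background set drawn from one class only
        scores.append(oss(p, q, A))
    return np.array(scores)
```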
  36. The pose issue: most confident wrong results*
      * "Descriptor Based Methods in the Wild," ECCVw'08
  37. Getting poses
      To compute pose-based OSS, you need sets of images in the same pose:
      detect 7 fiducial points (eyes, mouth, nose), giving 14 x,y coordinates;
      form a 14-D vector of alignment errors (under a similarity transform);
      project onto the first principal component;
      bin into 10 classes
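The last three steps of the recipe above might look like this; the equal-frequency binning is an assumption (the slide only says "bin into 10 classes"), and the input is assumed to already hold the similarity-transform alignment residuals:

```python
import numpy as np

def pose_bins(err_vectors, n_bins=10):
    """Project 14-D alignment-error vectors onto their first principal
    component and quantize into n_bins pose classes. Sketch only."""
    X = err_vectors - err_vectors.mean(axis=0)
    # first principal component via SVD of the centered data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    proj = X @ Vt[0]
    # equal-frequency bins over the 1-D projection (an assumption)
    edges = np.quantile(proj, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(proj, edges)   # class index in [0, n_bins)
```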
  38. Multiple One-Shots: pose-based OSS
  39. Multiple One-Shots: examples
      5 ID-based OSS scores and 5 pose-based OSS scores (identity vs. pose)
  40. Multiple One-Shots: examples (identity vs. pose)
  41. Multiple One-Shots: examples (identity vs. pose)
  42. Pipeline*
      Input image pair → image alignment (commercial alignment software)
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  43. Pipeline*
      Input image pair → image alignment → feature vectors
      Using SIFT [Lowe '04], LBP [Ojala et al. '96, '01, '02], and TPLBP, FPLBP [Wolf et al. '08]
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  44. Pipeline*
      Input image pair → image alignment → feature vectors → PCA + ITML (Information-Theoretic Metric Learning [Davis et al. '07])
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  45. Pipeline*
      Input image pair → image alignment → feature vectors → PCA + ITML → multiple OSS scores (20 subjects, 10 poses)
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  46. Pipeline*
      Input image pair → image alignment → feature vectors → PCA + ITML → multiple OSS scores → SVM classifier → output (same / not-same)
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  47. Pipeline with multiple descriptors*
      Image alignment → (SIFT feature vectors → PCA + ITML → multiple OSS scores) and (LBP feature vectors → PCA + ITML → multiple OSS scores) → SVM classifier → output (same / not-same)
      * "Multiple One-Shots for Utilizing Class Label Information," BMVC'09
  48. Results (LFW accuracy)
      0.7847 ± 0.0051 [WHT'08]
      0.8398 ± 0.0035 [WHT'08 + alignment]
      0.8517 ± 0.0061 [this work, only LBP]
      0.8950 ± 0.0051 [this work, multi-descriptor]
      0.9753 [Kumar et al. '09, human performance]
  49. Pair-matching of sets*
      * "Face Recognition in Unconstrained Videos with Matched Background Similarity," CVPR 2011
  50. Pair-matching of sets
      Training:
  51. Conventional methods
      All-pairs comparison: distances between all frames of the first video and all frames of the second video
      Pose-based methods: comparing the two most frontal faces in each video, or the two faces with the most similar pose
      Algebraic set-to-set methods, such as max correlation, projection, and Procrustes
      Non-algebraic methods, such as PMK and LLC
  52. Matched background (B/G) similarity
      X1 & X2: sets of video frame descriptors; B: background set of faces
      Similarity = MBGS(X1, X2, B):
      B1 = Find_Nearest_Neighbors(X1, B)
      Model1 = train(X1, B1)
      Confidences1 = classify(X2, Model1)
      Sim1 = mean(Confidences1)
  53. Matched background (B/G) similarity (continued)
      B2 = Find_Nearest_Neighbors(X2, B)
      Model2 = train(X2, B2)
      Confidences2 = classify(X1, Model2)
      Sim2 = mean(Confidences2)
      Similarity = (Sim1 + Sim2) / 2
  54. Thank you!
      Software available: http://www.cs.tau.ac.il/~wolf
  55. Why pair matching?
      Basis of effective classification; easy to benchmark; we show that whatever works for pair matching works for identification
      L. Wolf, T. Hassner, and Y. Taigman. Effective Face Recognition by Combining Multiple Descriptors and Learned Background Statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2011.
  56. Papers
      L. Wolf, T. Hassner, and Y. Taigman. Effective Face Recognition by Combining Multiple Descriptors and Learned Background Statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2011.
      L. Wolf, T. Hassner, and Y. Taigman. The One-Shot Similarity Kernel. IEEE International Conference on Computer Vision (ICCV), 2009.
      L. Wolf, Y. Taigman, and T. Hassner. Similarity Scores based on Background Samples. Asian Conference on Computer Vision (ACCV), 2009.
      Y. Taigman, L. Wolf, and T. Hassner. Multiple One-Shots for Utilizing Class Label Information. The British Machine Vision Conference (BMVC), 2009.
      L. Wolf, T. Hassner, and Y. Taigman. Descriptor Based Methods in the Wild. Post-ECCV Workshop on Faces in Real-Life Images: Detection, Alignment, and Recognition, 2008.
  57. Papers (continued)
      L. Wolf, T. Hassner, and I. Maoz. Face Recognition in Unconstrained Videos with Matched Background Similarity. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2011.
      L. Wolf, R. Littman, N. Mayer, T. German, N. Dershowitz, R. Shweka, and Y. Choueka. Identifying Join Candidates in the Cairo Genizah. International Journal of Computer Vision (IJCV), 2011.
      O. Klipper-Gross, T. Hassner, and L. Wolf. One Shot Similarity Metric Learning for Action Recognition. Submitted, 2011.
