Supervised Learning of Semantic Classes for Image Annotation and Retrieval

This is a presentation I gave for ECSE 626 "Statistical Computer Vision" at McGill University. It covers a project inspired by the paper "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (PAMI, 2007), and presents my implementation of the paper and the results I achieved.

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

  1. 1. Supervised Learning of Semantic Classes for Image Annotation and Retrieval. Paper by G. Carneiro, A. B. Chan, P. J. Moreno, and N. Vasconcelos. Presented by: Lukáš Tencer, ECSE 626, 2012
  2. 2. Outline • Introduction • Prior techniques • Supervised OVA Labeling • Unsupervised Labeling • Methodology • Supervised Multiclass Labeling • Semantic Distribution Estimation • Density Estimation • Algorithm • Learning, Annotation, Retrieval • Results • Quantitative • Qualitative • Conclusion
  3. 3. Introduction
    • Task: assign labels to unknown images; retrieve relevant images given labels
    • Supervised learning: learning from labeled training data; the training data consist of pairs $\{(x_i, l_i)\},\; i = 1, \ldots, n$ (multiple-instance learning)
    • Semantic classes: labels representing common concepts (sky, bear, snow, ...)
    • Image annotation and retrieval
      • Annotation: given an image, which labels are present in it
      • Retrieval: given a label, which are the top n matching images
  4. 4. Introduction
    • Datasets:
      • Corel5K – 5000 images, 272 classes
      • Corel30K – 30000 images, 1120 classes
      • MIRFLICKR – 25000 images, 37 classes
      • (PSU) – not available anymore
      • ImageCLEF – the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track: medical image retrieval, photo annotation, plant identification, Wikipedia retrieval, patent image retrieval and classification
  5. 5. Introduction – [example images from Corel 5K, Corel 30K, and MIRFLICKR, with sample labels such as Bear, New Zealand, Urban]
  6. 6. Prior Techniques
    • Supervised OVA (one-vs-all)
      • Binary decision problem, concept present / absent, encoded by a hidden variable $Y_i$
      • Decision rule: declare concept $i$ present in $x$ if $P_{X|Y_i}(x|1)\,P_{Y_i}(1) \ge P_{X|Y_i}(x|0)\,P_{Y_i}(0)$
    • Unsupervised labeling
      • Models the dependency between text labels and image features through a hidden variable $L$
      • Considers only positive examples, i.e. densities for $Y_i = 1$
      • $P_{X,W}(x, w) = \sum_{l=1}^{D} P_{X|L}(x|l)\, P_{W|L}(w|l)\, P_L(l)$
    • [Figure: graphical models linking the hidden variable $L$, words $W$ (e.g. bear, polar, grizzly) and image features $X$]
    • (a sketch of the OVA decision rule follows after this slide)
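For illustration, a minimal sketch of the OVA decision rule above, assuming two hypothetical Gaussian class-conditional densities and a made-up prior; the names and numbers are placeholders, not the models learned in the paper.

```python
# Minimal sketch of the one-vs-all (OVA) decision rule: a concept i is declared
# present in feature vector x when
#   P_{X|Y_i}(x|1) P_{Y_i}(1) >= P_{X|Y_i}(x|0) P_{Y_i}(0).
# The Gaussian densities below are illustrative stand-ins for the learned models.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
dim = 4

# Hypothetical class-conditional densities for "concept present" and "concept absent".
present = multivariate_normal(mean=np.ones(dim), cov=np.eye(dim))
absent = multivariate_normal(mean=np.zeros(dim), cov=2.0 * np.eye(dim))
prior_present = 0.3  # P_{Y_i}(1); the complement is P_{Y_i}(0)

def ova_decision(x):
    """Return True if the concept is judged present in feature vector x."""
    return (present.pdf(x) * prior_present
            >= absent.pdf(x) * (1.0 - prior_present))

x = rng.normal(size=dim)
print("concept present:", ova_decision(x))
```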
  7. 7. Methodology – Supervised Multiclass Labeling (SML)
    • Elements of the semantic vocabulary (W) are explicitly made into the semantic classes (L): the random variable W takes values in {1, ..., T}, and W = i only if x is a sample from concept $w_i$, with class-conditional density $P_{X|W}(x|i)$
    • Annotation and retrieval are then easy to do via Bayes' rule: $P_{W|X}(i|x) = \dfrac{P_{X|W}(x|i)\, P_W(i)}{P_X(x)}$
    • Annotation: $i^*(x) = \arg\max_i P_{W|X}(i|x)$
    • Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(x_j|i)$
    • (a sketch of both steps follows after this slide)
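A minimal sketch of how the rules above turn class-conditional densities into annotation (arg max of the posterior) and retrieval (ranking by likelihood). Toy Gaussians stand in for the learned mixtures; `annotate`, `retrieve`, and all sizes are hypothetical.

```python
# Sketch of SML annotation and retrieval once class-conditional densities
# P_{X|W}(x|i) are available. Each concept density here is a toy Gaussian;
# in the paper they are the hierarchical mixtures estimated on later slides.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
dim, T = 4, 5                      # feature dimension, vocabulary size
concept_densities = [multivariate_normal(mean=rng.normal(size=dim), cov=np.eye(dim))
                     for _ in range(T)]
prior_w = np.full(T, 1.0 / T)      # P_W(i), uniform for this sketch

def annotate(x):
    """Return the concept index i*(x) = argmax_i P_{W|X}(i|x)."""
    posteriors = np.array([d.pdf(x) for d in concept_densities]) * prior_w
    return int(np.argmax(posteriors))   # P_X(x) is a common factor, so it cancels

def retrieve(images, concept):
    """Rank image feature vectors by P_{X|W}(x|concept), best first."""
    scores = np.array([concept_densities[concept].pdf(x) for x in images])
    return np.argsort(-scores)

images = rng.normal(size=(10, dim))
print("label of image 0:", annotate(images[0]))
print("ranking for concept 2:", retrieve(images, 2))
```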
  8. 8. Methodology – Estimation of Semantic Class Distributions
    • Given the $D_i$ training images for concept $i$, estimate $P_{X|W}(x|i)$
    • Assumption: Gaussian mixture distributions
    • How to estimate? Direct estimation, model averaging, naive averaging
    • Model averaging: $P_{X|W}(x|i) = \frac{1}{D_i}\sum_{l=1}^{D_i} P_{X|L,W}(x|l, i)$
    • GMM per image: $P_{X|L,W}(x|l, i) = \sum_k \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
    • Averaged: $P_{X|W}(x|i) = \frac{1}{D_i}\sum_{l=1}^{D_i}\sum_k \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
    • (a sketch of the averaging estimate follows after this slide)
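A sketch of the averaging estimate above, assuming scikit-learn's GaussianMixture for the per-image mixtures; the data, image count, and the helper name `class_density` are hypothetical.

```python
# Sketch of the naive-averaging estimate of a class density: fit a GMM to each
# training image of concept i and average the resulting per-image densities,
#   P_{X|W}(x|i) = (1/D_i) sum_l sum_k pi_{i,l}^k G(x; mu_{i,l}^k, Sigma_{i,l}^k).
# Image counts, component counts, and data are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
D_i, n_feats, dim, K = 3, 200, 4, 8   # images for the class, features/image, dim, components

# Hypothetical per-image feature sets for one concept.
image_features = [rng.normal(loc=j, size=(n_feats, dim)) for j in range(D_i)]
image_gmms = [GaussianMixture(n_components=K, covariance_type="diag",
                              random_state=0).fit(f) for f in image_features]

def class_density(x):
    """Averaged class-conditional density P_{X|W}(x|i) at a single point x."""
    per_image = [np.exp(g.score_samples(x[None, :]))[0] for g in image_gmms]
    return float(np.mean(per_image))

print("p(x | concept) =", class_density(rng.normal(size=dim)))
```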
  9. 9. Methodology – Mixture hierarchies
    • First step: fit a GMM to each image with regular (soft) EM: $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
    • Pipeline: initialization (Euclidean, then Mahalanobis distance) → initial parameter estimate → expectation-maximization, stopped after at most 200 iterations or when the change in likelihood is too small
    • E-step: compute $Q(\theta; \theta^t) = E_{Z|X,\theta^t}\!\left[\log P(X, Z; \theta)\right]$, with $P(x_i, z_j \mid \theta) = \pi_j\, G(x_i; \mu_j, \Sigma_j)$
    • M-step: $\theta^{t+1} = \arg\max_\theta Q(\theta; \theta^t)$
  10. 10. Methodology – Mixture hierarchies for a label
    • Second step: fit a hierarchical GMM (HGMM) per label from the image-level mixtures: $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
    • Pipeline: initialization (Bhattacharyya distance) → initial parameter estimate → expectation-maximization, stopped after at most 200 iterations or when the change in likelihood is too small
    • The E- and M-steps take the same general form as above; the hierarchical updates are given on the next slide
  11. 11. E and M steps for the HGMM
    • Input: image-level components $\{\pi_j^k, \mu_j^k, \Sigma_j^k\},\; j = 1, \ldots, D_i,\; k = 1, \ldots, K$
    • Output: label-level components $\{\pi_c^m, \mu_c^m, \Sigma_c^m\},\; m = 1, \ldots, M$
    • E-step:
      $h_{jk}^{m} = \dfrac{\left[G(\mu_j^k, \mu_c^m, \Sigma_c^m)\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^m)^{-1}\Sigma_j^k\}}\right]^{N} \pi_c^m}{\sum_l \left[G(\mu_j^k, \mu_c^l, \Sigma_c^l)\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^l)^{-1}\Sigma_j^k\}}\right]^{N} \pi_c^l}$
    • M-step:
      $(\pi_c^m)^{new} = \dfrac{\sum_{jk} h_{jk}^{m}}{K D_i}$
      $(\mu_c^m)^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^k, \quad \text{where } w_{jk}^{m} = \dfrac{h_{jk}^{m}\, \pi_j^k}{\sum_{jk} h_{jk}^{m}\, \pi_j^k}$
      $(\Sigma_c^m)^{new} = \sum_{jk} w_{jk}^{m}\left[\Sigma_j^k + (\mu_j^k - \mu_c^m)(\mu_j^k - \mu_c^m)^T\right]$
    • (a sketch of one E/M iteration follows after this slide)
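A sketch of one E/M iteration of the hierarchical update above, under the assumption that N is the number of virtual samples drawn per image-level component; all sizes, the initialization, and the variable names are illustrative, not the paper's implementation.

```python
# Sketch of one E/M iteration of the hierarchical EM update on slide 11:
# child (image-level) Gaussian components {pi_jk, mu_jk, Sigma_jk} are clustered
# into M parent (label-level) components. N is the assumed number of virtual
# samples per child component; all sizes here are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
D, K, M, dim, N = 4, 3, 2, 2, 50   # images, child comps/image, parent comps, dim, virtual samples

# Hypothetical child-level parameters (one row per (image, component) pair).
pi_c = rng.dirichlet(np.ones(K), size=D).reshape(-1)   # child weights, flattened over (j, k)
mu_c = rng.normal(size=(D * K, dim))                   # child means
Sig_c = np.stack([np.eye(dim) * rng.uniform(0.5, 1.5) for _ in range(D * K)])

# Parent-level parameters to be refined (crude initialization for the sketch).
pi_p = np.full(M, 1.0 / M)
mu_p = mu_c[rng.choice(D * K, size=M, replace=False)].copy()
Sig_p = np.stack([np.eye(dim) for _ in range(M)])

# ---- E-step: responsibilities h[jk, m], computed in the log domain ----
log_h = np.zeros((D * K, M))
for m in range(M):
    inv = np.linalg.inv(Sig_p[m])
    for jk in range(D * K):
        log_g = multivariate_normal.logpdf(mu_c[jk], mu_p[m], Sig_p[m])
        log_h[jk, m] = N * (log_g - 0.5 * np.trace(inv @ Sig_c[jk])) + np.log(pi_p[m])
log_h -= log_h.max(axis=1, keepdims=True)
h = np.exp(log_h)
h /= h.sum(axis=1, keepdims=True)

# ---- M-step: updated parent weights, means, and covariances ----
for m in range(M):
    w = h[:, m] * pi_c
    w /= w.sum()
    pi_p[m] = h[:, m].sum() / (D * K)
    mu_p[m] = (w[:, None] * mu_c).sum(axis=0)
    diff = mu_c - mu_p[m]
    Sig_p[m] = sum(w[i] * (Sig_c[i] + np.outer(diff[i], diff[i])) for i in range(D * K))

print("updated parent weights:", pi_p)
```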
  12. 12. Algorithm – learning
    • Training: for each training image I of label w
      • Decompose the image (192 px × 128 px) into 8×8 regions with a sliding window moving every 2 pixels
      • Compute the DCT of each window (8 × 8 × 3 = 192-dimensional feature vector)
      • Fit a mixture of 8 Gaussians to each image using EM: $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
      • Fit a mixture of 64 Gaussians to each label using hierarchical EM: $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
    • (a sketch of the feature extraction step follows after this slide)
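A sketch of the sliding-window DCT feature extraction described above, assuming SciPy's `dctn` and plain RGB channels in place of the paper's colour space; `dct_features` and the random image are hypothetical stand-ins.

```python
# Sketch of the feature extraction on this slide: slide an 8x8 window over the
# image with a 2-pixel stride and take the 2-D DCT of each of the three colour
# channels, giving an 8*8*3 = 192-dimensional vector per window.
import numpy as np
from scipy.fft import dctn

def dct_features(image, win=8, stride=2):
    """Return an (n_windows, win*win*channels) array of DCT coefficients."""
    h, w, c = image.shape
    feats = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win, :]
            coeffs = [dctn(patch[:, :, ch], norm="ortho").ravel() for ch in range(c)]
            feats.append(np.concatenate(coeffs))
    return np.asarray(feats)

image = np.random.default_rng(4).random((128, 192, 3))   # hypothetical 192x128 image
X = dct_features(image)
print(X.shape)   # (n_windows, 192)
```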
  13. 13. Algorithm – annotation, retrieval
    • Annotation: get the n (= 5) best labels for image I
      • Extract the features from the image ((192 × 128 / 2) windows × 192 dimensions)
      • Compute the log-likelihood under each label and keep the best n: $\log P_{X|W}(\mathcal{X}|w) = \sum_{x \in \mathcal{X}} \log P_{X|W}(x|w)$
    • Retrieval: for test images $I_T$ and a label w
      • Annotate the test images and rank them by decreasing score $P_{X|W}(\cdot\,|w)$
    • (a sketch of the annotation scoring follows after this slide)
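A sketch of the annotation scoring above, assuming toy scikit-learn GMMs in place of the 64-component label mixtures; `annotate` and all sizes are hypothetical.

```python
# Sketch of the annotation step: score each label by the sum of log-likelihoods
# of all window features under that label's density,
#   log P(X | w) = sum_x log P_{X|W}(x | w),
# and keep the n best labels. The label densities here are toy GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
dim, n_labels = 16, 6

# Hypothetical per-label mixtures standing in for the 64-component hierarchical GMMs.
label_gmms = [GaussianMixture(n_components=4, random_state=0)
              .fit(rng.normal(loc=i, size=(300, dim))) for i in range(n_labels)]

def annotate(features, n_best=5):
    """Return indices of the n_best labels for one image's window features."""
    scores = np.array([g.score_samples(features).sum() for g in label_gmms])
    return np.argsort(-scores)[:n_best]

test_features = rng.normal(loc=2, size=(500, dim))   # stand-in for one image's DCT vectors
print("top labels:", annotate(test_features))
```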
  14. 14. Results-quantitative
    • Database: Corel 5K, 4000 training images, 1000 test images
    • Precision: $\dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$, Recall: $\dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
    • Per word w: $\text{recall} = \dfrac{w_C}{w_H}, \quad \text{precision} = \dfrac{w_C}{w_{auto}}$, where $w_H$ is the number of images annotated with w by humans, $w_{auto}$ the number annotated automatically, and $w_C$ the number of correctly annotated images
    • (a sketch of these per-word metrics follows after this slide)
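A sketch of the per-word precision and recall defined above; the function name `per_word_metrics` and the toy annotations are made up for illustration.

```python
# Sketch of the per-word precision and recall on this slide:
#   recall = w_C / w_H,  precision = w_C / w_auto,
# where w_H counts images whose human annotation contains the word, w_auto counts
# images where the system predicts it, and w_C counts correct predictions.
def per_word_metrics(ground_truth, predicted, word):
    w_human = sum(word in gt for gt in ground_truth)
    w_auto = sum(word in pr for pr in predicted)
    w_correct = sum(word in gt and word in pr for gt, pr in zip(ground_truth, predicted))
    recall = w_correct / w_human if w_human else 0.0
    precision = w_correct / w_auto if w_auto else 0.0
    return precision, recall

# Illustrative annotations only.
ground_truth = [{"sky", "plane", "jet"}, {"bear", "snow"}, {"water", "sky"}]
predicted = [{"sky", "plane", "clouds"}, {"bear", "texture"}, {"sky", "ocean"}]
print(per_word_metrics(ground_truth, predicted, "sky"))
```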
  15. 15. Results-quantitative – Annotation
    [Bar charts: number of words with non-zero recall, mean recall, mean precision]
                               1     2     3     4     5     6
    Words with recall > 0     140   121   110   125    90   131
    Mean recall per w        0.27  0.25  0.25  0.26  0.23  0.27
    Mean precision per w     0.25  0.24  0.23  0.23  0.20  0.23
  16. 16. Results-quantitative – Retrieval
    [Bar charts: precision for all words, precision for words with recall > 0]
                                       1     2     3     4     5     6
    Mean recall, all w                0.23  0.21  0.20  0.21  0.19  0.24
    Mean recall per w with R > 0      0.45  0.40  0.40  0.41  0.37  0.41
  17. 17. Results-qualitative
  18. 18. Results-qualitative – example annotations (ground-truth labels / top-5 automatic labels):
    • plane, jet, f-14, sky / sky, plane, clouds, smoke, snow
    • coast, waves, water, hills / water, sky, ocean, mountain, clouds
    • polar, bear, bars, cage / bear, snow, texture, sunrise, closeup
    • people, cheese, market, street / people, wall, sand, flower, bird
  19. 19. Results-qualitative
  20. 20. Results-qualitative – example results for the labels: Blooms, Mountain, Pool, Smoke, Woman
  21. 21. Results-qualitative
  22. 22. Conclusions
    • Pros
      • Nice segmentation as a byproduct of annotation
      • Works well for general concepts with many samples
      • Only weakly annotated data is required (multiple-instance learning)
      • Allows a hierarchical representation (easy to add images, fast)
    • Cons
      • Fixed number of labels per image
      • Learning is time-consuming
      • Parameter tuning is time-consuming
      • Weakly represented classes can be associated with the wrong concepts
  23. 23. Resources
    • Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 394–410 (2007).
    • Gudivada, V.N., Raghavan, V.V.: Content-based image retrieval systems. Computer 28, 18–22 (1995).
    • Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In: Sixth International Conference on Computer Vision, pp. 675–682. IEEE (1998).
    • Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71, 593–613 (2009).
    • Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys 40, 1–60 (2008).
  24. 24. lukas.tencer@gmail.com http://tencer.hustej.net @lukastencer accuratelyrandom.blogspot.com facebook.com/lukas.tencer
  25. 25. Google labeling game
