Sparse Kernel Learning for Image Annotation

  1. Sparse Kernel Learning for Image Annotation. Sean Moran and Victor Lavrenko, Institute of Language, Cognition and Computation, School of Informatics, University of Edinburgh. ICMR'14, Glasgow, April 2014.
  2-3. Sparse Kernel Learning for Image Annotation (outline): Overview, SKL-CRM, Evaluation, Conclusion.
  4. Assigning words to pictures. Pipeline: extract multiple feature types (GIST, SIFT, LAB, HAAR) from a training dataset of captioned images ("Tiger, Grass, Whiskers"; "City, Castle, Smoke"; "Tiger, Tree, Leaves"; "Eagle, Sky"). For a testing image, the annotation model scores every word, e.g. P(Tiger | image) = 0.15, P(Grass | image) = 0.12, P(Whiskers | image) = 0.12, P(Leaves | image) = 0.10, P(Tree | image) = 0.10, down to P(Smoke | image) = 0.01; the top 5 words form the annotation, giving a ranked list such as Tiger, Grass, Tree, Leaves, Whiskers. This talk: how best to combine the features? [Figure: pipeline diagram with per-image feature grids X1-X6.]
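
To make the final ranking step concrete, here is a minimal sketch using the hypothetical scores from the slide; the model that actually produces P(w|f) is defined on the later slides.

```python
# Hypothetical word scores P(w|f) for one test image, as on the slide.
word_scores = {
    "tiger": 0.15, "grass": 0.12, "whiskers": 0.12, "leaves": 0.10,
    "tree": 0.10, "sky": 0.08, "waterfall": 0.05, "city": 0.03,
}

# Rank words by probability and keep the five best as the annotation.
annotation = sorted(word_scores, key=word_scores.get, reverse=True)[:5]
print(annotation)  # ['tiger', 'grass', 'whiskers', 'leaves', 'tree']
```
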
  5. Previous work:
     - Topic models: latent Dirichlet allocation (LDA) [Barnard et al. '03], Machine Translation [Duygulu et al. '02]
     - Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. '03], Multiple Bernoulli Relevance Model (MBRM) [Feng et al. '04]
     - Discriminative models: Support Vector Machine (SVM) [Verma and Jawahar '13], Passive-Aggressive Classifier [Grangier '08]
     - Local learning models: Joint Equal Contribution (JEC) [Makadia et al. '08], Tag Propagation (Tagprop) [Guillaumin et al. '09], Two-pass KNN (2PKNN) [Verma et al. '12]
  6. Combining different feature types. Previous work: a linear combination of feature distances in a weighted summation with "default" kernels. The standard kernel assignment is Gaussian for GIST, Laplacian for colour features, χ² for SIFT. [Figure: generalised Gaussian kernels GG(x; p): Laplacian (p = 1), Gaussian (p = 2), near-uniform (p = 15).]
  7. Data-adaptive visual kernels. Our contribution: permit the visual kernels themselves to adapt to the data. Hypothesis: the optimal kernels for GIST, SIFT etc. depend on the image dataset itself; illustrated here on Corel 5K. [Figure: GG(x; p) kernel shapes for p = 1, 2, 15.]
  8. Data-adaptive visual kernels, continued: the same hypothesis illustrated on IAPR TC12.
  9. Sparse Kernel Continuous Relevance Model (SKL-CRM) [section divider; outline: Overview, SKL-CRM, Evaluation, Conclusion].
  10. Continuous Relevance Model (CRM). CRM estimates the joint distribution of image features (f) and words (w) [Lavrenko et al. 2003]:
      P(w, f) = \sum_{J \in T} P(J) \prod_{j=1}^{N} P(w_j \mid J) \prod_{i=1}^{M} P(f_i \mid J)
      where P(J) is a uniform prior over training images J, P(f_i \mid J) is a Gaussian non-parametric kernel density estimate, and P(w_j \mid J) is a multinomial with word smoothing. The marginal probability distribution over individual tags is
      P(w \mid f) = P(w, f) / \sum_{w'} P(w', f)
      and the top (e.g. 5) words with highest P(w \mid f) are used as the annotation.
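
A minimal numpy sketch of this computation, simplified to one feature vector per image rather than the per-region features of the full model; the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def crm_word_posterior(test_feat, train_feats, word_probs, beta=1.0):
    """Sketch of the CRM posterior P(w|f) for one unlabelled image.

    test_feat:   (D,) feature vector of the test image
    train_feats: (T, D) feature vectors of the T training images
    word_probs:  (T, W) smoothed multinomial P(w|J) for each training image
    """
    # P(f|J): Gaussian kernel density placed on each training image.
    sq_dists = np.sum((train_feats - test_feat) ** 2, axis=1)   # (T,)
    p_f_given_J = np.exp(-sq_dists / beta)                      # (T,)

    # P(w, f) = sum_J P(J) P(w|J) P(f|J), with uniform prior P(J) = 1/T.
    p_wf = (p_f_given_J[:, None] * word_probs).mean(axis=0)     # (W,)

    # Marginalise over the vocabulary: P(w|f) = P(w, f) / sum_w' P(w', f).
    return p_wf / p_wf.sum()
```
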
  11. Sparse Kernel Learning CRM (SKL-CRM). Introduce a binary kernel-feature alignment matrix Ψ_{u,v}:
      P(I \mid J) = \prod_{i=1}^{M} \sum_{j=1}^{R} \exp\Big( -\frac{1}{\beta} \sum_{u,v} \Psi_{u,v} \, k_v(f_i^u, f_j^u) \Big)
      where k_v(f_i^u, f_j^u) is the v-th kernel function applied to the u-th feature type, and β is the kernel bandwidth parameter. Goal: learn Ψ_{u,v} by directly maximising annotation F1 score on a held-out validation dataset.
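
A sketch of how the alignment matrix enters the image likelihood above; the data layout (a dict of per-region feature arrays) and all names are assumptions made for illustration.

```python
import numpy as np

def skl_crm_likelihood(test_img, train_img, psi, kernels, beta=1.0):
    """Sketch of P(I|J) under the SKL-CRM formula above.

    test_img / train_img: {feature_type: (n_regions, dims) array}
    psi:     {feature_type: {kernel_name: 0 or 1}} alignment matrix
    kernels: {kernel_name: k_v(x, y) -> distance-like scalar}
    """
    M = next(iter(test_img.values())).shape[0]   # regions in the test image
    R = next(iter(train_img.values())).shape[0]  # regions in training image J
    likelihood = 1.0
    for i in range(M):
        region_sum = 0.0
        for j in range(R):
            # Only the (feature u, kernel v) pairs switched on in psi count.
            d = sum(kernels[v](test_img[u][i], train_img[u][j])
                    for u in psi for v in psi[u] if psi[u][v])
            region_sum += np.exp(-d / beta)
        likelihood *= region_sum
    return likelihood
```
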
  12. Generalised Gaussian Kernel. The shape factor p traces out an infinite family of kernels:
      P(f_i \mid f_j) = \frac{p^{1 - 1/p}}{2 \beta \Gamma(1/p)} \exp\Big( -\frac{1}{p} \frac{|f_i - f_j|^p}{\beta^p} \Big)
      where Γ is the Gamma function and β is the kernel bandwidth parameter.
  13-15. The same kernel plotted for p = 2 (Gaussian), p = 1 (Laplacian) and p = 15 (approaching uniform). [Figure per slide: GG(x; p).]
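
A small sketch of this density, showing how p interpolates between the three kernel shapes plotted on these slides.

```python
import numpy as np
from math import gamma

def generalised_gaussian(x, beta=1.0, p=2.0):
    """Generalised Gaussian density from the slide: p=2 is a Gaussian,
    p=1 a Laplacian, and large p flattens towards a uniform kernel."""
    norm = p ** (1.0 - 1.0 / p) / (2.0 * beta * gamma(1.0 / p))
    return norm * np.exp(-np.abs(x) ** p / (p * beta ** p))

x = np.linspace(-3, 3, 7)
for p in (1.0, 2.0, 15.0):   # the three shapes shown on the slides
    print(p, generalised_gaussian(x, p=p).round(3))
```
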
  16. Multinomial Kernel. A multinomial kernel optimised for count-based features:
      P(f_i \mid f_j) = \frac{(\sum_d f_{i,d})!}{\prod_d f_{i,d}!} \prod_d (p_{j,d})^{f_{i,d}}
      where f_{i,d} is the count for bin d in the unlabelled image i, and f_{j,d} the count for training image j. Jelinek-Mercer smoothing is used to estimate p_{j,d}:
      p_{j,d} = \lambda \frac{f_{j,d}}{\sum_d f_{j,d}} + (1 - \lambda) \frac{\sum_j f_{j,d}}{\sum_{j,d} f_{j,d}}
      We also consider standard χ² and Hellinger kernels.
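
A sketch of both formulas in log space (to avoid factorial overflow), assuming scipy for the log-Gamma function; the names are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def multinomial_log_kernel(f_i, p_j):
    """log P(f_i|f_j): count vector f_i of the unlabelled image scored
    under the smoothed bin distribution p_j of a training image."""
    # log of the multinomial coefficient (sum_d f_d)! / prod_d (f_d!)
    log_coeff = gammaln(f_i.sum() + 1) - gammaln(f_i + 1).sum()
    return log_coeff + (f_i * np.log(p_j)).sum()

def jelinek_mercer(train_counts, j, lam=0.5):
    """Smoothed p_{j,d}: mix image j's own bin frequencies with the
    collection-wide frequencies; train_counts is (n_images, n_bins)."""
    image = train_counts[j] / train_counts[j].sum()
    collection = train_counts.sum(axis=0) / train_counts.sum()
    return lam * image + (1 - lam) * collection
```
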
  17-21. Greedy kernel-feature alignment (worked example over features GIST, SIFT, LAB, HAAR and kernels Laplacian, Gaussian, Uniform). Starting from an all-zero alignment matrix Ψ (iteration 0, F1 0.00), one (feature, kernel) entry is switched on per iteration, keeping whichever flip most improves held-out F1: iteration 1, F1 0.25; iteration 2, F1 0.34; iteration 3, F1 0.38; iteration 4, F1 0.42. [Figure per slide: the evolving Ψ matrix alongside testing- and training-image feature grids.] A code sketch of this search follows below.
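
In the sketch below, `score_f1` stands in for training the model under a candidate Ψ and measuring validation F1; it and the other names are assumptions for illustration, not the authors' code.

```python
def greedy_alignment(features, kernels, score_f1):
    """Greedy forward selection of the binary alignment matrix Psi.

    features: list of feature-type names, e.g. ["GIST", "SIFT", ...]
    kernels:  list of kernel names, e.g. ["Laplacian", "Gaussian", ...]
    score_f1: callable psi -> validation F1 under that alignment
    """
    psi = {u: {v: 0 for v in kernels} for u in features}
    best_f1, improved = 0.0, True
    while improved:
        improved, best_flip = False, None
        for u in features:
            for v in kernels:
                if psi[u][v]:
                    continue
                psi[u][v] = 1                 # try switching this pair on
                f1 = score_f1(psi)
                psi[u][v] = 0                 # undo the trial flip
                if f1 > best_f1:
                    best_f1, best_flip, improved = f1, (u, v), True
        if best_flip:
            u, v = best_flip
            psi[u][v] = 1                     # keep the best flip this round
    return psi, best_f1
```
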
  22. Evaluation [section divider; outline: Overview, SKL-CRM, Evaluation, Conclusion].
  23. Datasets/Features. Standard evaluation datasets:
      - Corel 5K: 5,000 images (landscapes, cities), 260 keywords
      - IAPR TC12: 19,627 images (tourism, sports), 291 keywords
      - ESP Game: 20,768 images (drawings, graphs), 268 keywords
      Standard "Tagprop" feature set [Guillaumin et al. '09]:
      - Bag-of-words histograms: SIFT [Lowe '04] and Hue [van de Weijer & Schmid '06]
      - Global colour histograms: RGB, HSV, LAB
      - Global GIST descriptor [Oliva & Torralba '01]
      All descriptors except GIST are also computed in a 3×1 spatial arrangement [Lazebnik et al. '06].
  24. Evaluation Metrics. Standard evaluation metrics [Guillaumin et al. '09], with a fixed annotation length of 5 keywords (a sketch of computing them follows below):
      - Mean per-word recall (R)
      - Mean per-word precision (P)
      - F1 measure
      - Number of words with recall > 0 (N+)
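
Conventions differ slightly across papers on how to average over words that are never predicted or never relevant, so treat this as one reasonable reading rather than the exact evaluation script.

```python
import numpy as np

def annotation_metrics(predicted, ground_truth, vocab_size):
    """Mean per-word recall/precision, F1 and N+.

    predicted / ground_truth: one set of keyword indices per image.
    """
    R, P, n_plus = [], [], 0
    for w in range(vocab_size):
        relevant  = sum(w in t for t in ground_truth)   # images tagged w
        retrieved = sum(w in p for p in predicted)      # images given w
        correct   = sum(w in p and w in t
                        for p, t in zip(predicted, ground_truth))
        if relevant:
            R.append(correct / relevant)
            n_plus += correct > 0                       # recall > 0 for w
        if retrieved:
            P.append(correct / retrieved)
    recall, precision = np.mean(R), np.mean(P)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1, n_plus
```
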
  25-28. F1 score of CRM model variants. [Bar chart: F1 of CRM, CRM 15 and SKL-CRM on Corel 5K, IAPR TC12 and ESP Game; y-axis 0.00-0.45.] Build-up across the slides: the original CRM with Duygulu et al. features; CRM 15 with the 15 Tagprop features (+71%); SKL-CRM with the 15 Tagprop features (+45%).
  29-33. F1 score of SKL-CRM on Corel 5K as features are greedily added. [Line chart: SKL-CRM validation F1, SKL-CRM test F1 and Tagprop test F1 (y-axis 0.31-0.45) versus feature type, in the order HSV_V3H1, DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1, RGB_V3H1, RGB, DH_V3H1, DH, HH, LAB, DS_V3H1.]
  34. Optimal kernel-feature alignments on Corel 5K. Optimal alignments¹:
      - HSV: Multinomial (λ = 0.99)
      - HSV V3H1: Generalised Gaussian (p = 0.9)
      - Harris Hue (HH V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
      - Harris SIFT (HS): Gaussian
      - HS V3H1: Generalised Gaussian (p = 0.7)
      - Dense SIFT (DS): Laplacian
      Our data-driven kernels are more effective than the standard kernels. No alignment agrees with the literature's default assignment, i.e. Gaussian for GIST, Laplacian for colour histograms, χ² for SIFT.
      ¹ V3H1 denotes descriptors computed in a spatial arrangement.
  35. SKL-CRM results vs. literature (precision and recall). [Grouped bar chart: R and P of MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12; y-axis 0.20-0.50.]
  36. SKL-CRM results vs. literature (N+). [Bar chart: N+ of MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12; y-axis 0-300.]
  37. Conclusion [section divider; outline: Overview, SKL-CRM, Evaluation, Conclusion].
  38. Conclusions and Future Work. Proposed a sparse kernel model for image annotation. Key experimental findings:
      - Default kernel-feature alignment is suboptimal
      - Data-adaptive kernels are superior to standard kernels
      - A sparse set of features is just as effective as a much larger set
      - Greedy forward selection is as effective as gradient ascent
      Future work: superposition of kernels per feature type.
  39. Thank you for your attention. Sean Moran, sean.moran@ed.ac.uk, www.seanjmoran.com
