Sparse Kernel Learning for Image Annotation

  1. Sparse Kernel Learning for Image Annotation
     Sean Moran and Victor Lavrenko
     Institute of Language, Cognition and Computation, School of Informatics, University of Edinburgh
     ICMR'14, Glasgow, April 2014
  2. Sparse Kernel Learning for Image Annotation
     Talk outline: Overview, SKL-CRM, Evaluation, Conclusion
  3. Sparse Kernel Learning for Image Annotation
     Talk outline: Overview, SKL-CRM, Evaluation, Conclusion
  4. Assigning words to pictures
     Feature extraction (GIST, SIFT, LAB, HAAR) on a training dataset of tagged images,
     e.g. "Tiger, Grass, Whiskers", "City, Castle, Smoke", "Tiger, Tree, Leaves", "Eagle, Sky".
     For a testing image the annotation model scores every word, e.g.
     P(Tiger | image) = 0.15, P(Grass | image) = 0.12, P(Whiskers | image) = 0.12,
     P(Leaves | image) = 0.10, P(Tree | image) = 0.10, P(Sky | image) = 0.08,
     P(Waterfall | image) = 0.05, P(City | image) = 0.03, P(Castle | image) = 0.03,
     P(Eagle | image) = 0.02, P(Smoke | image) = 0.01.
     The top 5 words of the ranked list are used as the annotation: Tiger, Grass, Tree, Leaves, Whiskers.
     This talk: how best to combine multiple features?
  5. Previous work
     Topic models: latent Dirichlet allocation (LDA) [Barnard et al. '03], Machine Translation [Duygulu et al. '02]
     Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. '03], Multiple Bernoulli Relevance Model (MBRM) [Feng '04]
     Discriminative models: Support Vector Machine (SVM) [Verma and Jawahar '13], Passive Aggressive Classifier [Grangier '08]
     Local learning models: Joint Equal Contribution (JEC) [Makadia '08], Tag Propagation (Tagprop) [Guillaumin et al. '09], Two-pass KNN (2PKNN) [Verma et al. '12]
  6. Combining different feature types
     Previous work: linear combination of feature distances in a weighted summation with "default" kernels.
     [Kernel plots: GG(x; p) with p = 1 (Laplacian), p = 2 (Gaussian), p = 15 (approximately uniform)]
     Standard kernel assignment: Gaussian for GIST, Laplacian for colour features, χ2 for SIFT.
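A minimal sketch of the prior-work baseline described on this slide: per-feature distances are combined in a weighted summation, with each feature type locked to its conventional "default" kernel. The function names, feature labels and weights below are illustrative assumptions, not code from the paper.

```python
import numpy as np

def default_kernel_distance(x, y, feature_type):
    """Distance conventionally paired with each feature type in prior work:
    squared L2 (Gaussian kernel) for GIST, L1 (Laplacian) for colour
    histograms, chi-squared for SIFT bag-of-words histograms."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if feature_type == "gist":
        return float(np.sum((x - y) ** 2))
    if feature_type in ("rgb", "hsv", "lab"):
        return float(np.sum(np.abs(x - y)))
    return float(0.5 * np.sum((x - y) ** 2 / (x + y + 1e-12)))  # chi-squared

def combine_distances(img_a, img_b, weights):
    """Prior-work baseline: weighted summation of per-feature distances."""
    return sum(w * default_kernel_distance(img_a[f], img_b[f], f)
               for f, w in weights.items())

# Usage sketch: img_a / img_b map feature type -> descriptor vector, e.g.
# combine_distances(img_a, img_b, {"gist": 0.5, "rgb": 0.2, "sift": 0.3})
```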
  7. Data-adaptive visual kernels
     Our contribution: permit the visual kernels themselves to adapt to the data.
     [Kernel plots: GG(x; p) with p = 1 (Laplacian), p = 2 (Gaussian), p = 15 (approximately uniform); Corel 5K]
     Hypothesis: the optimal kernels for GIST, SIFT etc. depend on the image dataset itself.
  8. Data-adaptive visual kernels
     Same contribution and hypothesis, illustrated on IAPR TC12.
  9. Sparse Kernel Continuous Relevance Model (SKL-CRM)
     Overview, SKL-CRM, Evaluation, Conclusion
 10. Continuous Relevance Model (CRM)
     CRM estimates the joint distribution of image features (f) and words (w) [Lavrenko et al. 2003]:
       P(w, f) = \sum_{J \in T} P(J) \prod_{j=1}^{N} P(w_j | J) \prod_{i=1}^{M} P(f_i | J)
     P(J): uniform prior over training images J
     P(f_i | J): Gaussian non-parametric kernel density estimate
     P(w_j | J): multinomial for word smoothing
     Estimate the marginal probability distribution over individual tags:
       P(w | f) = P(w, f) / \sum_{w'} P(w', f)
     The top (e.g. 5) words with the highest P(w | f) are used as the annotation.
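A minimal sketch of CRM-style annotation under the formula above, with a Gaussian kernel density for P(f_i | J) and a uniform prior P(J). The multinomial word smoothing is reduced here to a simple word-presence indicator, and all names and data layouts are illustrative assumptions.

```python
import numpy as np

def gaussian_kde_likelihood(test_regions, train_regions, beta):
    """log P(f | J): Gaussian kernel averaged over the training image's regions,
    multiplied over the test image's regions (accumulated in log-space)."""
    logp = 0.0
    for fi in test_regions:
        d2 = np.sum((train_regions - fi) ** 2, axis=1)
        k = np.exp(-d2 / (2.0 * beta ** 2))
        logp += np.log(np.mean(k) + 1e-300)
    return logp

def crm_annotate(test_regions, training_set, vocabulary, beta=1.0, top_k=5):
    """training_set: list of (region_matrix, set_of_words) pairs.
    Returns the top_k words by P(w | f) ~ sum_J P(f | J) P(w | J), uniform P(J)."""
    log_pf = np.array([gaussian_kde_likelihood(test_regions, regions, beta)
                       for regions, _ in training_set])
    pf = np.exp(log_pf - log_pf.max())          # unnormalised P(f | J)
    scores = {}
    for w in vocabulary:
        pw_given_J = np.array([1.0 if w in words else 0.0
                               for _, words in training_set])
        scores[w] = float(np.sum(pf * pw_given_J))   # sum over training images
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```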
 11. Sparse Kernel Learning CRM (SKL-CRM)
     Introduce a binary kernel-feature alignment matrix Ψ_{u,v}:
       P(I | J) = \prod_{i=1}^{M} \sum_{j=1}^{R} \exp( -\frac{1}{\beta} \sum_{u,v} \Psi_{u,v} \, k_v(f_i^u, f_j^u) )
     k_v(f_i^u, f_j^u): v-th kernel function applied to the u-th feature type
     β: kernel bandwidth parameter
     Goal: learn Ψ_{u,v} by directly maximising the annotation F1 score on a held-out validation dataset.
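A sketch of the image-to-image likelihood above, with the binary matrix Ψ selecting which kernel is active for each feature type. The data layout, `kernel_bank` and dictionary-based Ψ are illustrative assumptions for readability, not the paper's implementation.

```python
import numpy as np

def skl_crm_likelihood(test_feats, train_feats, psi, kernel_bank, beta=1.0):
    """log P(I | J) = sum_i log( sum_j exp(-(1/beta) sum_{u,v} Psi[u,v] k_v(f_i^u, f_j^u)) ).

    test_feats / train_feats: dict feature_type -> (num_regions x dim) array
    psi: dict (feature_type, kernel_name) -> 0 or 1
    kernel_bank: dict kernel_name -> function(x, y) returning a distance-like value
    """
    feature_types = list(test_feats)
    M = next(iter(test_feats.values())).shape[0]   # regions in the test image I
    R = next(iter(train_feats.values())).shape[0]  # regions in the training image J
    log_p = 0.0
    for i in range(M):
        inner = np.zeros(R)
        for j in range(R):
            s = 0.0
            for u in feature_types:
                for v, kernel in kernel_bank.items():
                    if psi.get((u, v), 0):
                        s += kernel(test_feats[u][i], train_feats[u][j])
            inner[j] = np.exp(-s / beta)
        log_p += np.log(inner.sum() + 1e-300)
    return log_p
```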
 12. Generalised Gaussian Kernel
     Shape factor p traces out an infinite family of kernels:
       P(f_i | f_j) = \frac{p^{1 - 1/p}}{2 \beta \Gamma(1/p)} \exp( -\frac{1}{p} \frac{|f_i - f_j|^p}{\beta^p} )
     Γ: Gamma function; β: kernel bandwidth parameter
 13. Generalised Gaussian Kernel
     Same formula; [plot of GG(x; p) for p = 2, the Gaussian kernel]
 14. Generalised Gaussian Kernel
     Same formula; [plot of GG(x; p) for p = 1, the Laplacian kernel]
 15. Generalised Gaussian Kernel
     Same formula; [plot of GG(x; p) for p = 15, approximately uniform]
 16. Multinomial Kernel
     Multinomial kernel optimised for count-based features:
       P(f_i | f_j) = \frac{(\sum_d f_{i,d})!}{\prod_d f_{i,d}!} \prod_d (p_{j,d})^{f_{i,d}}
     f_{i,d}: count for bin d in the unlabelled image i; f_{j,d}: count for the training image j
     Jelinek-Mercer smoothing is used to estimate p_{j,d}:
       p_{j,d} = \lambda \frac{f_{j,d}}{\sum_{d'} f_{j,d'}} + (1 - \lambda) \frac{\sum_{j'} f_{j',d}}{\sum_{j',d'} f_{j',d'}}
     We also consider the standard χ2 and Hellinger kernels.
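A sketch of the multinomial kernel with Jelinek-Mercer smoothing, computed in log-space so the factorials and products stay numerically manageable (`math.lgamma` gives log-factorials). The example counts at the bottom are illustrative assumptions.

```python
import math
import numpy as np

def jelinek_mercer(train_counts, collection_counts, lam=0.99):
    """p_{j,d} = lam * f_{j,d} / sum_d f_{j,d}
               + (1 - lam) * collection count of d / total collection count."""
    train_counts = np.asarray(train_counts, dtype=float)
    collection_counts = np.asarray(collection_counts, dtype=float)
    return (lam * train_counts / train_counts.sum()
            + (1.0 - lam) * collection_counts / collection_counts.sum())

def log_multinomial_kernel(fi, pj):
    """log P(fi | fj) for integer bin counts fi and smoothed probabilities pj."""
    fi = np.asarray(fi, dtype=float)
    log_coef = math.lgamma(fi.sum() + 1) - sum(math.lgamma(c + 1) for c in fi)
    return log_coef + float(np.sum(fi * np.log(pj)))

# Example: one unlabelled histogram against one smoothed training histogram.
fi = [3, 0, 1, 2]                                    # counts for unlabelled image i
pj = jelinek_mercer([1, 1, 0, 4], [10, 8, 5, 20])    # smoothed training image j
print(log_multinomial_kernel(fi, pj))
```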
 17. Greedy kernel-feature alignment
     Feature types: GIST, SIFT, LAB, HAAR. Candidate kernels: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15).
     Ψ is shown with one row per kernel (Laplacian, Gaussian, Uniform) and one column per feature (GIST, SIFT, LAB, HAAR).
     Iteration 0, F1 = 0.0:   Ψ = [0 0 0 0; 0 0 0 0; 0 0 0 0]
 18. Greedy kernel-feature alignment
     Iteration 1, F1 = 0.25:  Ψ = [0 0 0 0; 1 0 0 0; 0 0 0 0]
 19. Greedy kernel-feature alignment
     Iteration 2, F1 = 0.34:  Ψ = [0 0 0 0; 1 0 0 0; 0 0 0 1]
 20. Greedy kernel-feature alignment
     Iteration 3, F1 = 0.38:  Ψ = [0 0 0 0; 1 1 0 0; 0 0 0 1]
 21. Greedy kernel-feature alignment
     Iteration 4, F1 = 0.42:  Ψ = [0 0 1 0; 1 1 0 0; 0 0 0 1]
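A sketch of the greedy forward selection illustrated above: at each iteration, try switching on every currently inactive (feature, kernel) pair, keep the switch that most improves annotation F1 on the held-out validation set, and stop when no switch helps. `validation_f1` is an assumed callback that annotates the validation images under a given Ψ and returns the F1 score.

```python
def greedy_alignment(feature_types, kernel_names, validation_f1):
    """Learn a binary kernel-feature alignment Psi by greedy forward selection.

    validation_f1(psi) -> float: annotation F1 on the held-out validation set
    when only the pairs with psi[(feature, kernel)] == 1 are active.
    """
    psi = {(u, v): 0 for u in feature_types for v in kernel_names}
    best_f1 = validation_f1(psi)             # iteration 0: nothing aligned
    while True:
        best_pair, best_gain = None, 0.0
        for pair, active in psi.items():
            if active:
                continue
            trial = dict(psi)
            trial[pair] = 1                  # try switching this pair on
            gain = validation_f1(trial) - best_f1
            if gain > best_gain:
                best_pair, best_gain = pair, gain
        if best_pair is None:                # no pair improves F1: stop
            break
        psi[best_pair] = 1
        best_f1 += best_gain
    return psi, best_f1
```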
 22. Evaluation
     Overview, SKL-CRM, Evaluation, Conclusion
 23. Datasets/Features
     Standard evaluation datasets:
       Corel 5K: 5,000 images (landscapes, cities), 260 keywords
       IAPR TC12: 19,627 images (tourism, sports), 291 keywords
       ESP Game: 20,768 images (drawings, graphs), 268 keywords
     Standard "Tagprop" feature set [Guillaumin et al. '09]:
       Bag-of-words histograms: SIFT [Lowe '04] and Hue [van de Weijer & Schmid '06]
       Global colour histograms: RGB, HSV, LAB
       Global GIST descriptor [Oliva & Torralba '01]
       All descriptors except GIST are also computed in a 3x1 spatial arrangement [Lazebnik et al. '06]
 24. Evaluation Metrics
     Standard evaluation metrics [Guillaumin et al. '09]:
       Mean per-word Recall (R)
       Mean per-word Precision (P)
       F1 measure
       Number of words with recall > 0 (N+)
     Fixed annotation length of 5 keywords.
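A sketch of the standard metrics listed above: mean per-word recall and precision over the vocabulary, the F1 of those means, and N+ (the number of words recalled at least once). `predictions` and `ground_truth` mapping image ids to keyword sets is an assumed data layout for illustration.

```python
def annotation_metrics(predictions, ground_truth, vocabulary):
    """predictions / ground_truth: dict image_id -> set of keywords.
    Returns (mean per-word recall R, mean per-word precision P, F1, N+)."""
    recalls, precisions, n_plus = [], [], 0
    for w in vocabulary:
        relevant = {i for i, words in ground_truth.items() if w in words}
        retrieved = {i for i, words in predictions.items() if w in words}
        hits = len(relevant & retrieved)
        recalls.append(hits / len(relevant) if relevant else 0.0)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        n_plus += 1 if hits > 0 else 0
    R = sum(recalls) / len(vocabulary)
    P = sum(precisions) / len(vocabulary)
    F1 = 2 * P * R / (P + R) if (P + R) > 0 else 0.0
    return R, P, F1, n_plus
```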
 25. F1 score of CRM model variants
     [Bar chart: F1 (axis 0.00-0.45) of CRM, CRM 15 and SKL-CRM on Corel 5K, IAPR TC12 and ESP Game]
 26. F1 score of CRM model variants
     [Same chart] Original CRM, Duygulu et al. features.
 27. F1 score of CRM model variants
     [Same chart] Original CRM, 15 Tagprop features: +71%.
 28. F1 score of CRM model variants
     [Same chart] SKL-CRM, 15 Tagprop features: +45%.
 29. F1 score of SKL-CRM on Corel 5K
     [Chart: F1 (axis 0.31-0.45) per feature type (HSV_V3H1, DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1,
     RGB_V3H1, RGB, DH_V3H1, DH, HH, LAB, DS_V3H1); series: SKL-CRM (Valid F1), SKL-CRM (Test F1), Tagprop (Test F1)]
 30. F1 score of SKL-CRM on Corel 5K
     [Same chart as slide 29]
 31. F1 score of SKL-CRM on Corel 5K
     [Same chart as slide 29]
 32. F1 score of SKL-CRM on Corel 5K
     [Same chart as slide 29]
 33. F1 score of SKL-CRM on Corel 5K
     [Same chart as slide 29]
 34. Optimal kernel-feature alignments on Corel 5K
     Optimal alignments (V3H1 denotes descriptors computed in a 3x1 spatial arrangement):
       HSV: Multinomial (λ = 0.99)
       HSV V3H1: Generalised Gaussian (p = 0.9)
       Harris Hue (HH V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
       Harris SIFT (HS): Gaussian
       HS V3H1: Generalised Gaussian (p = 0.7)
       Dense SIFT (DS): Laplacian
     Our data-driven kernels are more effective than the standard kernels.
     No alignment agrees with the literature's default assignment, i.e. Gaussian for GIST, Laplacian for colour histograms, χ2 for SIFT.
 35. SKL-CRM Results vs. Literature (Precision & Recall)
     [Grouped bar chart: recall (R) and precision (P), axis 0.20-0.50, for MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12]
 36. SKL-CRM Results vs. Literature (N+)
     [Bar chart: N+ (axis 0-300) for MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12]
 37. Conclusion
     Overview, SKL-CRM, Evaluation, Conclusion
 38. Conclusions and Future Work
     Proposed a sparse kernel model for image annotation.
     Key experimental findings:
       The default kernel-feature alignment is suboptimal.
       Data-adaptive kernels are superior to standard kernels.
       A sparse set of features is just as effective as a much larger set.
       Greedy forward selection is as effective as gradient ascent.
     Future work: superposition of kernels per feature type.
 39. Thank you for your attention
     Sean Moran
     sean.moran@ed.ac.uk
     www.seanjmoran.com
