Sparse Kernel Learning for Image Annotation
Sean Moran and Victor Lavrenko
Institute of Language, Cognition and Computation
School of Informatics
University of Edinburgh
ICMR’14 Glasgow, April 2014
Sparse Kernel Learning for Image Annotation
Overview
SKL-CRM
Evaluation
Conclusion
Assigning words to pictures
Feature Extraction: GIST, SIFT, LAB, HAAR
Training Dataset (image/caption pairs): Tiger, Grass, Whiskers; City, Castle, Smoke; Tiger, Tree, Leaves; Eagle, Sky
The Annotation Model scores each word for the Testing Image, e.g.:
P(Tiger | image) = 0.15, P(Grass | image) = 0.12, P(Whiskers | image) = 0.12,
P(Leaves | image) = 0.10, P(Tree | image) = 0.10, P(Sky | image) = 0.08,
P(Waterfall | image) = 0.05, P(City | image) = 0.03, P(Castle | image) = 0.03,
P(Eagle | image) = 0.02, P(Smoke | image) = 0.01
Top 5 words used as annotation, giving the ranked list: Tiger, Grass, Tree, Leaves, Whiskers
This talk: how best to combine multiple features?
[Diagram: per-image feature vectors X1–X6 flowing from feature extraction into the annotation model]
Previous work
Topic models: latent Dirichlet allocation (LDA) [Barnard et al. '03], Machine Translation [Duygulu et al. '02]
Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. '03], Multiple Bernoulli Relevance Model (MBRM) [Feng et al. '04]
Discriminative models: Support Vector Machine (SVM) [Verma and Jawahar '13], Passive Aggressive Classifier [Grangier '08]
Local learning models: Joint Equal Contribution (JEC) [Makadia et al. '08], Tag Propagation (Tagprop) [Guillaumin et al. '09], Two-pass KNN (2PKNN) [Verma et al. '12]
Combining different feature types
Previous work: linear combination of feature distances in a weighted summation with "default" kernels.
[Plots of GG(x; p): Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]
Standard kernel assignment: Gaussian for GIST, Laplacian for colour features, χ2 for SIFT
Data-adaptive visual kernels
Our contribution: permit the visual kernels themselves to adapt to the data.
[Plots of GG(x; p) on Corel 5K: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]
Hypothesis: the optimal kernels for GIST, SIFT etc. depend on the image dataset itself
Data-adaptive visual kernels
Our contribution: permit the visual kernels themselves to adapt to the data.
[Plots of GG(x; p) on IAPR TC12: Laplacian (p = 1), Gaussian (p = 2), Uniform (p = 15)]
Hypothesis: the optimal kernels for GIST, SIFT etc. depend on the image dataset itself
Sparse Kernel Continuous Relevance Model (SKL-CRM)
Continuous Relevance Model (CRM)
CRM estimates the joint distribution of image features (f) and words (w) [Lavrenko et al. '03]:

P(w, f) = \sum_{J \in T} P(J) \prod_{j=1}^{N} P(w_j \mid J) \prod_{i=1}^{M} P(f_i \mid J)

P(J): uniform prior over training images J
P(f_i \mid J): Gaussian non-parametric kernel density estimate
P(w_j \mid J): multinomial for word smoothing

Estimate the marginal probability distribution over individual tags:

P(w \mid f) = \frac{P(w, f)}{\sum_{w'} P(w', f)}

The top (e.g. 5) words with highest P(w \mid f) are used as the annotation.
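As a minimal sketch of the scoring loop above (illustrative only; the function and argument names are assumptions, not the authors' code), the joint is accumulated over training images and then normalised into P(w | f):

```python
import numpy as np

def crm_annotate(test_feats, train_feats, train_word_probs, vocab, beta=1.0, top_k=5):
    """Minimal CRM-style annotation sketch.

    test_feats:       (M, D) region feature vectors of the test image
    train_feats:      list of (R, D) arrays, one per training image J
    train_word_probs: (|T|, |V|) smoothed multinomial P(w | J)
    """
    prior = 1.0 / len(train_feats)              # uniform prior P(J)
    joint = np.zeros(train_word_probs.shape[1])
    for feats_J, pw_J in zip(train_feats, train_word_probs):
        # Gaussian kernel density estimate of P(f_i | J), product over regions
        log_pf = 0.0
        for f_i in test_feats:
            d2 = np.sum((feats_J - f_i) ** 2, axis=1)
            log_pf += np.log(np.mean(np.exp(-d2 / beta)) + 1e-300)
        joint += prior * pw_J * np.exp(log_pf)  # P(J) P(w|J) P(f|J)
    post = joint / (joint.sum() + 1e-300)       # marginal P(w | f)
    return [vocab[i] for i in np.argsort(-post)[:top_k]]
```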
Sparse Kernel Learning CRM (SKL-CRM)
Introduce a binary kernel-feature alignment matrix Ψ_{u,v}:

P(I \mid J) = \prod_{i=1}^{M} \sum_{j=1}^{R} \exp\left( -\frac{1}{\beta} \sum_{u,v} \Psi_{u,v} \, k_v(f_i^u, f_j^u) \right)

k_v(f_i^u, f_j^u): the v-th kernel function applied to the u-th feature type
β: kernel bandwidth parameter

Goal: learn Ψ_{u,v} by directly maximising annotation F1 score on a held-out validation dataset
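The likelihood on this slide can be sketched directly (an illustrative reading, with an assumed dict-based data layout rather than the authors' implementation):

```python
import numpy as np

def log_p_image_given_train(test_feats, train_feats, psi, kernels, beta=1.0):
    """Sketch of the SKL-CRM likelihood P(I | J) under a given alignment.

    test_feats / train_feats: dict feature-type u -> (M, D_u) / (R, D_u) array
    psi:     dict (u, v) -> 0/1, the binary kernel-feature alignment
    kernels: dict kernel-name v -> function(x, y) -> non-negative value
    """
    M = next(iter(test_feats.values())).shape[0]
    R = next(iter(train_feats.values())).shape[0]
    log_p = 0.0
    for i in range(M):
        s = 0.0
        for j in range(R):
            # sum_{u,v} Psi_{u,v} k_v(f_i^u, f_j^u)
            acc = sum(k(test_feats[u][i], train_feats[u][j])
                      for u in test_feats for v, k in kernels.items()
                      if psi.get((u, v), 0))
            s += np.exp(-acc / beta)
        log_p += np.log(s + 1e-300)
    return log_p
```

Only the (u, v) pairs switched on in Ψ contribute, which is what makes the learnt combination sparse.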
Generalised Gaussian Kernel
Shape factor p: traces out an infinite family of kernels

P(f_i \mid f_j) = \frac{p^{1-1/p}}{2\beta\,\Gamma(1/p)} \exp\left( -\frac{1}{p} \frac{|f_i - f_j|^p}{\beta^p} \right)

Γ: Gamma function
β: kernel bandwidth parameter
[Plots of GG(x; p) for p = 2 (Gaussian), p = 1 (Laplacian) and p = 15 (approaching uniform)]
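The density above is small enough to write out directly; a sketch for scalar features (the function name is ours):

```python
import math

def gen_gaussian(x, y, p=2.0, beta=1.0):
    """Generalised Gaussian kernel value for scalar features.

    p = 2 recovers a Gaussian with sigma = beta, p = 1 a Laplacian with
    scale beta, and large p approaches a uniform (box) kernel.
    """
    norm = p ** (1.0 - 1.0 / p) / (2.0 * beta * math.gamma(1.0 / p))
    return norm * math.exp(-abs(x - y) ** p / (p * beta ** p))
```

With p = 2 and β = 1 this is exactly the standard normal density, and with p = 1 the Laplace density, which is why a single shape parameter can sweep between the "default" kernel choices.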
Multinomial Kernel
Multinomial kernel optimised for count-based features:

P(f_i \mid f_j) = \frac{\left( \sum_d f_{i,d} \right)!}{\prod_d f_{i,d}!} \prod_d (p_{j,d})^{f_{i,d}}

f_{i,d}: count for bin d in the unlabelled image i
f_{j,d}: count for bin d in the training image j

Jelinek-Mercer smoothing is used to estimate p_{j,d}:

p_{j,d} = \lambda \frac{f_{j,d}}{\sum_{d'} f_{j,d'}} + (1 - \lambda) \frac{\sum_j f_{j,d}}{\sum_{j,d} f_{j,d}}

We also consider standard χ2 and Hellinger kernels
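Working in log space avoids overflow in the factorials; a sketch of the smoothed kernel (names are ours, and the collection model is estimated from a list of training count vectors as in the formula above):

```python
import math

def multinomial_log_kernel(f_i, f_j, train_counts, lam=0.99):
    """Log of the multinomial kernel with Jelinek-Mercer smoothing.

    f_i: bin counts of the unlabelled image; f_j: counts of training image j;
    train_counts: list of count vectors over the whole training set, used
    for the collection-level background distribution.
    """
    D = len(f_i)
    total_j = sum(f_j)
    coll = [sum(c[d] for c in train_counts) for d in range(D)]
    coll_total = sum(coll)
    # log multinomial coefficient (sum_d f_{i,d})! / prod_d f_{i,d}!
    log_p = math.lgamma(sum(f_i) + 1) - sum(math.lgamma(c + 1) for c in f_i)
    for d in range(D):
        # JM-smoothed bin probability: mix image model with collection model
        p_jd = lam * f_j[d] / total_j + (1 - lam) * coll[d] / coll_total
        log_p += f_i[d] * math.log(p_jd)
    return log_p
```

The smoothing keeps p_{j,d} non-zero for bins unseen in image j but seen elsewhere in the collection, so a single empty bin cannot zero out the whole kernel value.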
Greedy kernel-feature alignment
Iteration 0 (F1 = 0.00): no kernel has been assigned to any feature type yet.

Ψv,u        GIST  SIFT  LAB  HAAR
Laplacian     0     0    0    0
Gaussian      0     0    0    0
Uniform       0     0    0    0

[Diagram: candidate kernels (Gaussian, Laplacian, Uniform) scoring matches between testing- and training-image feature vectors (X1–X6) for GIST, SIFT, LAB and HAAR]
Greedy kernel-feature alignment
Iteration 1 (F1 = 0.25): the Gaussian kernel is assigned to GIST.

Ψv,u        GIST  SIFT  LAB  HAAR
Laplacian     0     0    0    0
Gaussian      1     0    0    0
Uniform       0     0    0    0

[Diagram: testing- and training-image feature vectors matched under the current alignment]
Greedy kernel-feature alignment
Iteration 2 (F1 = 0.34): the Uniform kernel is additionally assigned to HAAR.

Ψv,u        GIST  SIFT  LAB  HAAR
Laplacian     0     0    0    0
Gaussian      1     0    0    0
Uniform       0     0    0    1

[Diagram: testing- and training-image feature vectors matched under the current alignment]
Greedy kernel-feature alignment
Iteration 3 (F1 = 0.38): the Gaussian kernel is additionally assigned to SIFT.

Ψv,u        GIST  SIFT  LAB  HAAR
Laplacian     0     0    0    0
Gaussian      1     1    0    0
Uniform       0     0    0    1

[Diagram: testing- and training-image feature vectors matched under the current alignment]
Greedy kernel-feature alignment
Iteration 4 (F1 = 0.42): the Laplacian kernel is additionally assigned to LAB.

Ψv,u        GIST  SIFT  LAB  HAAR
Laplacian     0     0    1    0
Gaussian      1     1    0    0
Uniform       0     0    0    1

[Diagram: testing- and training-image feature vectors matched under the current alignment]
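The iterative procedure on these slides can be sketched as a greedy forward-selection loop (illustrative; `f1_of` stands in for training and scoring the annotator on held-out validation data, and the one-kernel-per-feature constraint is our reading of the Ψ matrices shown above):

```python
def greedy_align(feature_types, kernel_names, f1_of):
    """Greedy forward selection of the binary alignment Psi.

    f1_of is any callable mapping a set of (feature, kernel) pairs to an F1
    score.  Each step adds the single pair that most improves F1; a feature
    type gets at most one kernel, and selection stops when no pair helps.
    """
    psi, best_f1 = set(), f1_of(set())
    while True:
        used = {u for u, _ in psi}
        best_pair = None
        for u in feature_types:
            if u in used:
                continue                     # at most one kernel per feature type
            for v in kernel_names:
                f1 = f1_of(psi | {(u, v)})
                if f1 > best_f1:
                    best_f1, best_pair = f1, (u, v)
        if best_pair is None:
            return psi, best_f1              # no further improvement: stop
        psi.add(best_pair)
```

Because selection stops as soon as no pair improves the held-out F1, the resulting Ψ is sparse: feature types that never help are simply never switched on.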
Evaluation
Datasets/Features
Standard evaluation datasets:
Corel 5K: 5,000 images (landscapes, cities), 260 keywords
IAPR TC12: 19,627 images (tourism, sports), 291 keywords
ESP Game: 20,768 images (drawings, graphs), 268 keywords
Standard “Tagprop” feature set [Guillaumin et al. ’09]:
Bag-of-words histograms: SIFT [Lowe ’04] and Hue [van de
Weijer & Schmid ’06]
Global colour histograms: RGB, HSV, LAB
Global GIST descriptor [Oliva & Torralba ’01]
Descriptors, except GIST, are also computed in a 3×1 spatial
arrangement [Lazebnik et al. '06]
Evaluation Metrics
Standard evaluation metrics [Guillaumin et al. ’09]:
Mean per word Recall (R)
Mean per word Precision (P)
F1 Measure
Number of words with recall > 0 (N+)
Fixed annotation length of 5 keywords
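The metrics above can be sketched directly (names are ours; `predicted` and `truth` map image ids to keyword sets, with each predicted set of the fixed length 5):

```python
def annotation_metrics(predicted, truth, vocab):
    """Mean per-word precision (P), recall (R), F1 and N+ for image annotation."""
    recalls, precisions, n_plus = [], [], 0
    for w in vocab:
        relevant = {im for im, tags in truth.items() if w in tags}
        retrieved = {im for im, tags in predicted.items() if w in tags}
        hits = len(relevant & retrieved)
        recalls.append(hits / len(relevant) if relevant else 0.0)
        precisions.append(hits / len(retrieved) if retrieved else 0.0)
        n_plus += hits > 0                       # word recalled at least once
    R = sum(recalls) / len(vocab)
    P = sum(precisions) / len(vocab)
    F1 = 2 * P * R / (P + R) if (P + R) else 0.0
    return R, P, F1, n_plus
```

Averaging per word (rather than per image) means rare keywords count as much as common ones, which is why N+ is reported alongside P and R.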
F1 score of CRM model variants
[Bar chart: F1 (0.00–0.45) on Corel 5K, IAPR TC12 and ESP Game for CRM, CRM 15 and SKL-CRM]
Original CRM uses the Duygulu et al. features; CRM 15 uses the 15 Tagprop features (+71% over the original CRM); SKL-CRM uses the same 15 Tagprop features (+45% over CRM 15).
F1 score of SKL-CRM on Corel 5K
[Line chart: SKL-CRM validation and test F1 (0.31–0.45) as feature types are greedily added in the order HSV_V3H1, DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1, RGB_V3H1, RGB, DH_V3H1, DH, HH, LAB, DS_V3H1, compared against the Tagprop test F1 baseline]
Optimal kernel-feature alignments on Corel 5K
Optimal alignments¹:
HSV: Multinomial (λ = 0.99)
HSV V3H1: Generalised Gaussian (p = 0.9)
Harris Hue (HH V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
Harris SIFT (HS): Gaussian
HS V3H1: Generalised Gaussian (p = 0.7)
Dense SIFT (DS): Laplacian
Our data-driven kernels are more effective than the standard kernels.
No learnt alignment agrees with the literature's default assignment, i.e.
Gaussian for GIST, Laplacian for colour histograms, χ2 for SIFT.
¹V3H1 denotes descriptors computed in a 3×1 spatial arrangement
SKL-CRM Results vs. Literature (Precision & Recall)
[Bar chart: mean per-word Recall (R) and Precision (P), 0.20–0.50, on Corel 5K and IAPR TC12 for MBRM, JEC, Tagprop, GS and SKL-CRM]
SKL-CRM Results vs. Literature (N+)
[Bar chart: N+ (0–300) on Corel 5K and IAPR TC12 for MBRM, JEC, Tagprop, GS and SKL-CRM]
Conclusion
Conclusions and Future Work
Proposed a sparse kernel model for image annotation
Key experimental findings:
The default kernel-feature alignment is suboptimal
Data-adaptive kernels are superior to standard kernels
A sparse set of features is just as effective as a much larger set
Greedy forward selection is as effective as gradient ascent
Future work: superposition of kernels per feature type
Thank you for your attention
Sean Moran
sean.moran@ed.ac.uk
www.seanjmoran.com
Sparse Kernel Learning for Image Annotation

  • 1. Sparse Kernel Learning for Image Annotation Sean Moran and Victor Lavrenko Institute of Language, Cognition and Computation School of Informatics University of Edinburgh ICMR’14 Glasgow, April 2014
  • 2. Sparse Kernel Learning for Image Annotation Overview SKL-CRM Evaluation Conclusion
  • 3. Sparse Kernel Learning for Image Annotation Overview SKL-CRM Evaluation Conclusion
  • 4. Assigning words to pictures Feature Extraction GIST SIFT LAB HAAR Tiger, Grass, Whiskers City, Castle, Smoke Tiger, Tree, Leaves Eagle, Sky Training Dataset P(Tiger | ) = 0.15 P(Grass | ) = 0.12 P(Whiskers| ) = 0.12 Top 5 words as annotation This talk: How best to combine features? Multiple Features Ranked list of words Tiger, Grass, Tree Leaves, Whiskers Annotation Model P(Leaves | ) = 0.10 P(Tree | ) = 0.10 P(Smoke | ) = 0.01 Testing Image P(City | ) = 0.03 P(Waterfall | ) = 0.05 P(Castle | ) = 0.03 P(Eagle | ) = 0.02 P(Sky | ) = 0.08 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X6 X5 X4 X3 X2 X1 X6 X5 X4 X3 X2 X1 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X6 X5 X4 X3 X2 X1 X6 X5 X4 X3 X2 X1 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X6 X5 X4 X3 X2 X1 X6 X5 X4 X3 X2 X1 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X1 X2 X3 X4 X5 X6 X6 X5 X4 X3 X2 X1 X6 X5 X4 X3 X2 X1 X1 X2 X3 X4 X5 X6
  • 5. Previous work Topic models: latent Dirichlet allocation (LDA) [Barnard et al. ’03], Machine Translation [Duygulu et al. ’02] Mixture models: Continuous Relevance Model (CRM) [Lavrenko et al. ’03], Multiple Bernoulli Relevance Model (MBRM) [Feng ’04] Discriminative models: Support Vector Machine (SVM) [Verma and Jahawar ’13], Passive Aggressive Classifier [Grangier ’08] Local learning models: Joint Equal Contribution (JEC) [Makadia’08], Tag Propagation (Tagprop) [Guillaumin et al. ’09], Two-pass KNN (2PKNN) [Verma et al. ’12]
  • 6. Combining different feature types Previous work: linear combination of feature distances in a weighted summation with “default” kernels: Kernels x GG(x;p) p =1 x GG(x;p) p =15 x GG(x;p) p =2 Laplacian UniformGaussian Standard kernel assignment: Gaussian for Gist, Laplacian for colour features, χ2 for SIFT
  • 7. Data-adaptive visual kernels Our contribution: permit the visual kernels themselves to adapt to the data: Kernels x GG(x;p) p =1 x GG(x;p) p =15 x GG(x;p) p =2 Laplacian UniformGaussian Corel 5K Hypothesis: Optimal kernels for GIST, SIFT etc dependent on the image dataset itself
  • 8. Data-adaptive visual kernels Our contribution: permit the visual kernels themselves to adapt to the data: Kernels x GG(x;p) p =1 x GG(x;p) p =15 x GG(x;p) p =2 Laplacian UniformGaussian IAPR TC12 Hypothesis: Optimal kernels for GIST, SIFT etc dependent on the image dataset itself
  • 9. Sparse Kernel Continuous Relevance Model (SKL-CRM) Overview SKL-CRM Evaluation Conclusion
  • 10. Continuous Relevance Model (CRM) CRM estimates joint distribution of image features (f) and words (w)[Lavrenko et al. 2003]: P(w, f) = J∈T P(J) N j=1 P(wj |J) M i=1 P(fi |J) P(J): Uniform prior for training image J P(fi |J): Gaussian non-parametric kernel density estimate P(wi |J): Multinomial for word smoothing Estimate marginal probability distribution over individual tags: P(w|f) = P(w, f) w P(w, f) Top e.g. 5 words with highest P(w|f) used as annotation
  • 11. Sparse Kernel Learning CRM (SKL-CRM) Introduce binary kernel-feature alignment matrix Ψu,v P(I|J) = M i=1 R j=1 exp − 1 β u,v Ψu,v kv (f u i , f u j ) kv (f u i , f u j ): v-th kernel function on the u-th feature type β: kernel bandwidth parameter Goal: learn Ψu,v by directly maximising annotation F1 score on held-out validation dataset
  • 12. Generalised Gaussian Kernel Shape factor p: traces out an infinite family of kernels P(fi |fj ) = p1−1/p 2βΓ(1/p) exp − 1 p |fi − fj |p βp Γ: Gamma function β: kernel bandwidth parameter
  • 13. Generalised Gaussian Kernel Shape factor p: traces out an infinite family of kernels P(fi |fj ) = p1−1/p 2βΓ(1/p) exp − 1 p |fi − fj |p βp x GG(x;p) p =2
  • 14. Generalised Gaussian Kernel Shape factor p: traces out an infinite family of kernels P(fi |fj ) = p1−1/p 2βΓ(1/p) exp − 1 p |fi − fj |p βp x GG(x;p) p =1
  • 15. Generalised Gaussian Kernel Shape factor p: traces out an infinite family of kernels P(fi |fj ) = p1−1/p 2βΓ(1/p) exp − 1 p |fi − fj |p βp x GG(x;p) p =15
  • 16. Multinomial Kernel
A multinomial kernel optimised for count-based features:
    P(f_i | f_j) = ( (Σ_d f_{i,d})! / ∏_d f_{i,d}! ) · ∏_d (p_{j,d})^{f_{i,d}}
f_{i,d}: count for bin d in the unlabelled image i
f_{j,d}: count for bin d in the training image j
Jelinek-Mercer smoothing is used to estimate p_{j,d}:
    p_{j,d} = λ · f_{j,d} / Σ_d f_{j,d} + (1 − λ) · Σ_j f_{j,d} / Σ_{j,d} f_{j,d}
We also consider the standard χ² and Hellinger kernels
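A sketch of the multinomial kernel in log space (to avoid factorial overflow), with Jelinek-Mercer smoothing of the bin counts; the array layout and names are my assumptions, not the paper's code:

```python
import math
import numpy as np

def jelinek_mercer(train_counts, j, lam=0.9):
    """p_{j,d}: mix image j's bin distribution with the background
    distribution pooled over all training images."""
    fj = train_counts[j]
    background = train_counts.sum(axis=0) / train_counts.sum()
    return lam * fj / fj.sum() + (1.0 - lam) * background

def multinomial_log_kernel(fi, pj):
    """log P(f_i|f_j) = log multinomial coefficient + sum_d f_{i,d} log p_{j,d}."""
    n = fi.sum()
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in fi)
    return log_coeff + float(np.sum(fi * np.log(pj)))
```

Smoothing keeps every p_{j,d} strictly positive, so a bin unseen in image j cannot zero out the whole kernel value.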
  • 17-21. Greedy kernel-feature alignment (worked example)
[Figure: the binary alignment matrix Ψ_{v,u} over kernels (Laplacian, Gaussian, Uniform) and feature types (GIST, SIFT, LAB, HAAR), grown one entry per iteration; each active entry applies that kernel to that feature when comparing the testing and training images.]
Iteration 0: no kernel-feature pairs active, F1 0.00
Iteration 1: Gaussian assigned to GIST, F1 0.25
Iteration 2: + Uniform assigned to HAAR, F1 0.34
Iteration 3: + Gaussian assigned to SIFT, F1 0.38
Iteration 4: + Laplacian assigned to LAB, F1 0.42
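The greedy search itself fits in a few lines: repeatedly score every inactive (feature, kernel) pair, keep the one that most improves held-out F1, and stop when nothing improves. The validation-F1 function is assumed to be supplied by the caller (annotating the held-out set under the candidate alignment):

```python
def greedy_alignment(candidates, f1_on_validation, max_iters=10):
    """Greedy forward selection of kernel-feature pairs (a sketch)."""
    active, best_f1 = set(), 0.0
    for _ in range(max_iters):
        # Score every currently inactive pair when added to the active set
        gains = [(f1_on_validation(active | {pair}), pair)
                 for pair in candidates if pair not in active]
        if not gains:
            break
        f1, pair = max(gains)
        if f1 <= best_f1:   # no pair improves validation F1: stop
            break
        active.add(pair)
        best_f1 = f1
    return active, best_f1
```

Each candidate evaluation requires a full annotation pass over the validation set, so the sparsity of the final alignment is also what keeps the search tractable.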
  • 23. Datasets/Features Standard evaluation datasets: Corel 5K: 5,000 images (landscapes, cities), 260 keywords IAPR TC12: 19,627 images (tourism, sports), 291 keywords ESP Game: 20,768 images (drawings, graphs), 268 keywords Standard “Tagprop” feature set [Guillaumin et al. ’09]: Bag-of-words histograms: SIFT [Lowe ’04] and Hue [van de Weijer & Schmid ’06] Global colour histograms: RGB, HSV, LAB Global GIST descriptor [Oliva & Torralba ’01] Descriptors, except GIST, also computed in a 3x1 spatial arrangement [Lazebnik et al. ’06]
  • 24. Evaluation Metrics Standard evaluation metrics [Guillaumin et al. ’09]: Mean per word Recall (R) Mean per word Precision (P) F1 Measure Number of words with recall > 0 (N+) Fixed annotation length of 5 keywords
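These metrics are computed per keyword and then averaged over the vocabulary; a small reference sketch (the dict-of-sets layout is my assumption):

```python
def per_word_metrics(predicted, ground_truth, vocab):
    """Mean per-word precision/recall/F1 and N+ for fixed-length annotations.

    predicted / ground_truth: dicts mapping image id -> set of keywords.
    """
    precisions, recalls, n_plus = [], [], 0
    for w in vocab:
        pred_imgs = {i for i, ws in predicted.items() if w in ws}
        true_imgs = {i for i, ws in ground_truth.items() if w in ws}
        tp = len(pred_imgs & true_imgs)
        precisions.append(tp / len(pred_imgs) if pred_imgs else 0.0)
        recalls.append(tp / len(true_imgs) if true_imgs else 0.0)
        if recalls[-1] > 0:
            n_plus += 1          # word recalled at least once
    P = sum(precisions) / len(vocab)
    R = sum(recalls) / len(vocab)
    F1 = 2 * P * R / (P + R) if P + R else 0.0
    return P, R, F1, n_plus
```

Averaging per word (rather than per image) means rare keywords weigh as much as frequent ones, which is why N+ is reported alongside P and R.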
  • 25-28. F1 score of CRM model variants
[Bar chart: F1 of CRM, CRM 15 and SKL-CRM on Corel 5K, IAPR TC12 and ESP Game.]
Original CRM: Duygulu et al. features
Original CRM 15: Tagprop features, +71%
SKL-CRM: 15 Tagprop features, +45%
  • 29-33. F1 score of SKL-CRM on Corel 5K
[Chart: SKL-CRM validation and test F1 versus the Tagprop test F1, plotted against the feature types selected in order: HSV_V3H1, DS, HS_V3H1, HSV, HS, HH_V3H1, GIST, LAB_V3H1, RGB_V3H1, RGB, DH_V3H1, DH, HH, LAB, DS_V3H1.]
  • 34. Optimal kernel-feature alignments on Corel 5K
Optimal alignments¹:
HSV: Multinomial (λ = 0.99)
HSV V3H1: Generalised Gaussian (p = 0.9)
Harris Hue (HH V3H1): Generalised Gaussian (p = 0.1) ≈ Dirac spike!
Harris SIFT (HS): Gaussian
HS V3H1: Generalised Gaussian (p = 0.7)
Dense SIFT (DS): Laplacian
Our data-driven kernels are more effective than the standard kernels
No learnt alignment agrees with the literature's default assignment, i.e. Gaussian for GIST, Laplacian for colour histograms, χ² for SIFT
¹ V3H1 denotes descriptors computed in a 3×1 spatial arrangement
  • 35. SKL-CRM Results vs. Literature (Precision & Recall)
[Chart: mean per-word recall (R) and precision (P) of MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12.]
  • 36. SKL-CRM Results vs. Literature (N+)
[Chart: number of words with recall > 0 (N+) for MBRM, JEC, Tagprop, GS and SKL-CRM on Corel 5K and IAPR TC12.]
  • 38. Conclusions and Future Work Proposed a sparse kernel model for image annotation Key experimental findings: Default kernel-feature alignment suboptimal Data-adaptive kernels are superior to standard kernels Sparse set of features just as effective as much larger set Greedy forward selection as effective as gradient ascent Future work: superposition of kernels per feature type
  • 39. Thank you for your attention Sean Moran sean.moran@ed.ac.uk www.seanjmoran.com