This is a presentation I gave for ECSE 626 "Statistical Computer Vision" at McGill University. It describes a project inspired by the paper "Supervised Learning of Semantic Classes for Image Annotation and Retrieval" (PAMI 2007), covering my implementation of the paper and the results I achieved.
3. Introduction
• Task
• Assign labels to unknown images
• Retrieve relevant images given labels
• Supervised Learning
• Learning from labeled training data
• Training data consist of pairs
• Multiple instance learning
• Semantic Classes
• labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
• Annotation: Given the image x, what labels are present in the image?
• Retrieval: Given a label, what are the top n matching images?
• Training pairs: {(x_i, w_i)}, i = 1, …, n
4. Introduction
Datasets:
Corel5K – 5000 images, 272 Classes
Corel30K – 30000 images, 1120 Classes
MIRFLICKR – 25000 images, 37 Classes
(PSU) – not available anymore
ImageCLEF – The CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track:
Medical Image retrieval
Photo Annotation
Plant Identification
Wikipedia Retrieval
Patent Image Retrieval and Classification
6. Prior Techniques
Supervised OVA
Binary decision problem, concept present / absent
Hidden variable Yi
Decision rule:
Unsupervised Learning
Modeling dependency between text label and image
features, expressed as hidden variable L
Considering just positive examples, densities for Yi=1
OVA decision rule (declare concept i present if):
P_{X|Yi}(x|1) P_{Yi}(1) ≥ P_{X|Yi}(x|0) P_{Yi}(0)
Unsupervised joint model:
P_{X,W}(x, w) = Σ_{l=1}^{L} P_{X|L}(x|l) P_{W|L}(w|l) P_L(l)
(Graphical model: a hidden variable L generates both the caption words W1, W2, W3 and the image features X; e.g. the hidden concept "bear" links the words "polar" and "grizzly" to the image features.)
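As a rough illustration (not the paper's code), the OVA decision rule above can be sketched in Python. The callables `log_p_pos` / `log_p_neg` and the parameter `prior_pos` are hypothetical stand-ins for the learned class-conditional densities and prior:

```python
import numpy as np

def ova_present(x, log_p_pos, log_p_neg, prior_pos):
    """OVA sketch: declare concept i present iff
    P(x|Y_i=1) P(Y_i=1) >= P(x|Y_i=0) P(Y_i=0),
    compared in the log domain for numerical stability."""
    lhs = log_p_pos(x) + np.log(prior_pos)
    rhs = log_p_neg(x) + np.log(1.0 - prior_pos)
    return lhs >= rhs
```

Each concept gets its own binary detector of this form, which is exactly why OVA needs explicit negative examples, unlike the unsupervised model.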
7. Methodology
Supervised Multiclass Labeling (SML)
Elements of the semantic vocabulary (W) are explicitly mapped to semantic classes (L)!
Random var. W ∈ {1, …, T}: W = i if and only if x is a sample from concept w_i, i.e. from P_{X|W}(x|i)
Annotation and retrieval are then easy to do via Bayes rule:
P_{W|X}(i|x) = P_{X|W}(x|i) P_W(i) / P_X(x)
Annotation: i*(x) = argmax_i P_{W|X}(i|x)
Retrieval: j*(i) = argmax_j P_{X|W}(x_j|i)
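A minimal sketch of the two argmax rules above, assuming the log-likelihoods and log-priors have already been computed (the array shapes are my assumption, not from the paper):

```python
import numpy as np

def annotate(log_lik, log_prior):
    """Annotation: i*(x) = argmax_i P_{W|X}(i|x).
    log_lik[i] = log P_{X|W}(x|i), log_prior[i] = log P_W(i).
    P_X(x) is a common normalizer, so the argmax can ignore it."""
    return int(np.argmax(log_lik + log_prior))

def retrieve(log_lik_images, top_n=5):
    """Retrieval: rank images j by log P_{X|W}(x_j|i), descending."""
    return list(np.argsort(log_lik_images)[::-1][:top_n])
```

Working in the log domain avoids underflow when the densities are products over many image features.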
8. Methodology
Estimation of Semantic Class
Distributions
Given D_i, the training set of images for concept i, estimate P_{X|W}(x|i)
Assumption: Gaussian distributions
How to estimate?
Direct estimation
Model averaging
Naive averaging:
P_{X|W}(x|i) = (1/D_i) Σ_{l=1}^{D_i} P_{X|L,W}(x|l, i)
GMM model (one per image):
P_{X|L,W}(x|l, i) = Σ_k π_{i,l}^k G(x, μ_{i,l}^k, Σ_{i,l}^k)
Averaged:
P_{X|W}(x|i) = (1/D_i) Σ_{l=1}^{D_i} Σ_k π_{i,l}^k G(x, μ_{i,l}^k, Σ_{i,l}^k)
9. Methodology
Mixture hierarchies
First step: get a GMM for each image by regular soft EM
E: compute responsibilities P(z_i = j | x_i, Θ^t) ∝ π_j G(x_i; μ_j, Σ_j) and
Q(Θ; Θ^t) = E_{Z|X,Θ^t}[log P(X, Z; Θ)]
M: Θ^{t+1} = argmax_Θ Q(Θ, Θ^t), which guarantees P(x | Θ^{t+1}) ≥ P(x | Θ^t)
Per-image mixture of 8 Gaussians:
P_{X|W}(x|I) = Σ_{k=1}^{8} π_I^k G(x, μ_I^k, Σ_I^k)
(Flowchart: Initialization (Euclidean distance, then Mahalanobis distance) → initial parameter estimate → Expectation ↔ Maximization; stop after max. 200 iterations or when the change in likelihood is too small.)
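The per-image step above is standard soft EM; a minimal diagonal-covariance sketch (my own simplification, the paper uses full mixtures with its own initialization scheme):

```python
import numpy as np

def em_gmm(X, K=8, iters=200, tol=1e-4, seed=0):
    """Minimal soft EM for a diagonal-covariance GMM (K=8 as in the slides).
    Stops after `iters` iterations or when the likelihood gain drops below `tol`."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)]          # init means from data points
    var = np.full((K, d), X.var(axis=0) + 1e-6)
    pi = np.full(K, 1.0 / K)
    prev = -np.inf
    for _ in range(iters):
        # E-step: responsibilities h[i,k] ∝ pi_k G(x_i; mu_k, var_k)
        logp = (np.log(pi)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * (((X[:, None, :] - mu[None]) ** 2) / var[None]).sum(-1))
        m = logp.max(axis=1, keepdims=True)
        ll = (m.squeeze() + np.log(np.exp(logp - m).sum(axis=1))).sum()
        h = np.exp(logp - m)
        h /= h.sum(axis=1, keepdims=True)
        # M-step: Θ^{t+1} = argmax_Θ Q(Θ, Θ^t)
        Nk = h.sum(axis=0)
        pi = Nk / n
        mu = (h.T @ X) / Nk[:, None]
        var = (h.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
        if ll - prev < tol:
            break
        prev = ll
    return pi, mu, var
```

Diagonal covariances keep each image's 8-component model cheap; the real feature dimension would be 192 rather than the toy sizes used here.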
10. Methodology
Mixture hierarchies for labels
Second step: get a hierarchical GMM (HGMM) for each label
E: compute responsibilities P(z_i = j | x_i, Θ^t) ∝ π_j G(x_i; μ_j, Σ_j) and
Q(Θ; Θ^t) = E_{Z|X,Θ^t}[log P(X, Z; Θ)]
M: Θ^{t+1} = argmax_Θ Q(Θ, Θ^t)
Per-label mixture of 64 Gaussians:
P_{X|W}(x|w) = Σ_{k=1}^{64} π_w^k G(x, μ_w^k, Σ_w^k)
(Flowchart: Initialization (Bhattacharyya distance) → initial parameter estimate → Expectation ↔ Maximization; stop after max. 200 iterations or when the change in likelihood is too small.)
11. E and M step for HGMM
Input: child mixture components {π_j^k, μ_j^k, Σ_j^k}, j = 1, …, D_i, k = 1, …, K (one K-component GMM per image)
Output: parent mixture {π_c^m, μ_c^m, Σ_c^m}, m = 1, …, M
E-step:
h_{jk}^m = [ G(μ_j^k, μ_c^m, Σ_c^m) e^{-½ trace((Σ_c^m)^{-1} Σ_j^k)} ]^{π_j^k N} π_c^m / Σ_l [ G(μ_j^k, μ_c^l, Σ_c^l) e^{-½ trace((Σ_c^l)^{-1} Σ_j^k)} ]^{π_j^k N} π_c^l
M-step:
(π_c^m)^new = Σ_{jk} h_{jk}^m / (D_i K)
(μ_c^m)^new = Σ_{jk} w_{jk}^m μ_j^k, where w_{jk}^m = h_{jk}^m π_j^k / Σ_{jk} h_{jk}^m π_j^k
(Σ_c^m)^new = Σ_{jk} w_{jk}^m [ Σ_j^k + (μ_j^k − μ_c^m)(μ_j^k − μ_c^m)^T ]
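One hierarchical-EM step can be sketched with diagonal covariances (my simplification; the virtual sample size `N` and the flat child indexing are assumptions). Note the key property: the step runs on the child mixture parameters only, never revisiting the raw image features:

```python
import numpy as np

def hgmm_em_step(pi_c, mu_c, var_c, pi_j, mu_j, var_j, N=1000):
    """One hierarchical-EM step, diagonal covariances.
    Children: C = D*K per-image components (pi_j, mu_j, var_j).
    Parents: M label-level components (pi_c, mu_c, var_c)."""
    M, d = mu_c.shape
    C = mu_j.shape[0]
    # E-step: h[c,m] ∝ [G(mu_j; mu_c, var_c) e^{-1/2 tr(var_c^{-1} var_j)}]^{N pi_j} pi_c
    diff2 = (mu_j[:, None, :] - mu_c[None]) ** 2
    log_g = (-0.5 * np.sum(np.log(2 * np.pi * var_c), axis=1)[None]
             - 0.5 * np.sum(diff2 / var_c[None], axis=2))
    log_tr = -0.5 * np.sum(var_j[:, None, :] / var_c[None], axis=2)
    logh = N * pi_j[:, None] * (log_g + log_tr) + np.log(pi_c)[None]
    logh -= logh.max(axis=1, keepdims=True)
    h = np.exp(logh)
    h /= h.sum(axis=1, keepdims=True)
    # M-step
    pi_new = h.sum(axis=0) / C                    # (pi_c^m)^new
    w = h * pi_j[:, None]
    w /= w.sum(axis=0, keepdims=True)             # w_jk^m
    mu_new = w.T @ mu_j                           # (mu_c^m)^new
    var_new = np.stack([
        (w[:, m:m + 1] * (var_j + (mu_j - mu_new[m]) ** 2)).sum(axis=0)
        for m in range(M)])                       # (var_c^m)^new
    return pi_new, mu_new, var_new
```

Each child component behaves like N π_j^k virtual samples drawn from its Gaussian, which is what makes the 64-component label model cheap to fit on top of thousands of 8-component image models.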
12. Algorithm - learning
Training
For each training set of images I for label w:
Decompose each image (192 px × 128 px) into 8×8 regions with a sliding window moved in steps of 2 pixels
Calculate the DCT for each window (8 × 8 × 3) → 192-d feature vector
Calculate a mixture of 8 Gaussians for each image using EM
Calculate a mixture of 64 Gaussians for each label using hierarchical EM (H-EM)
P_{X|W}(x|I) = Σ_{k=1}^{8} π_I^k G(x, μ_I^k, Σ_I^k)
P_{X|W}(x|w) = Σ_{k=1}^{64} π_w^k G(x, μ_w^k, Σ_w^k)
13. Algorithm – annotation, retrieval
Annotation
Get the n (= 5) best labels for image I
Get features from the image ((192 × 128 / 2) × 192)
Get the log-likelihood for each label and choose the best n:
log P_{X|W}(I|w_i) = Σ_x log P_{X|W}(x|w_i)
Retrieval
For test images I_T and label w:
Annotate I_T and rank by decreasing posterior score P_{X|W}(·|w_i)
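The annotation scoring step reduces to a sum of per-feature log-likelihoods followed by a top-n selection; a sketch, assuming a precomputed matrix of log-likelihoods:

```python
import numpy as np

def top_labels(feat_loglik, n=5):
    """feat_loglik[i, x] = log P_{X|W}(x|w_i) for label i and feature x.
    Summing over x gives log P_{X|W}(I|w_i); return the n best labels."""
    scores = feat_loglik.sum(axis=1)
    return list(np.argsort(scores)[::-1][:n])
```

Retrieval reuses the same scores in the other direction: fix a label i and sort the test images by their score under that label's mixture.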
14. Results-quantitative
Database: Corel 5k
Precision: |relevant ∩ retrieved| / |retrieved|
Recall: |relevant ∩ retrieved| / |relevant|
4000 training images, 1000 testing images
Per-word annotation measures:
recall = w_C / w_H, precision = w_C / w_auto
where w_H = number of images annotated with w by humans, w_auto = number of images automatically annotated with w, w_C = number of images correctly annotated with w
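The per-word measures are a direct translation of the counts above (function name is mine):

```python
def annotation_metrics(w_H, w_auto, w_C):
    """w_H: images a human annotated with the word; w_auto: images the
    system annotated with it; w_C: images it annotated correctly."""
    recall = w_C / w_H if w_H else 0.0
    precision = w_C / w_auto if w_auto else 0.0
    return precision, recall
```

These are then averaged over the vocabulary to produce the tables on the next slides.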
15. Results-quantitative
Annotation (words with non-zero recall, mean recall, mean precision):
Run                    1     2     3     4     5     6
w with Recall > 0    140   121   110   125    90   131
Mean Recall per w    0.27  0.25  0.25  0.26  0.23  0.27
Mean Precision per w 0.25  0.24  0.23  0.23  0.20  0.23
16. Results-quantitative
Retrieval:
Run                        1     2     3     4     5     6
Mean Recall, all w       0.23  0.21  0.20  0.21  0.19  0.24
Mean Recall per w, R > 0 0.45  0.40  0.40  0.41  0.37  0.41
22. Conclusions
Pros
Nice segmentation as a byproduct of annotation
Great for general concepts with many samples
Only weakly annotated data is required (multiple-instance learning)
Allows a hierarchical representation (adding images, speed)
Cons
Fixed number of labels per image
Learning is time consuming
Parameter tuning is time consuming
Weakly represented classes can be associated with wrong concepts
23. Resources
Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 394–410 (2007).
Gudivada, V.N., Raghavan, V.V.: Content-based image retrieval systems. Computer 28, 18–22 (1995).
Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In: Sixth International Conference on Computer Vision, pp. 675–682. IEEE (1998).
Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71, 593–613 (2009).
Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys 40, 1–60 (2008).