Supervised Learning of Semantic Classes for Image Annotation and Retrieval
G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos
Presented by: Lukáš Tencer
ECSE 626, 2012
Outline
• Introduction
• Prior techniques
• Supervised OVA Labeling
• Unsupervised Labeling
• Methodology
• Supervised Multiclass Labeling
• Semantic Distribution Estimation
• Density Estimation
• Algorithm
• Learning, Annotation, Retrieval
• Results
• Quantitative
• Qualitative
• Conclusion
Introduction
• Task
• Assign labels to unknown images
• Retrieve relevant images given labels
• Supervised Learning
• Learning from labeled training data
• Training data consist of pairs
• Multiple instance learning
• Semantic Classes
• labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
• Annotation: given an image, which labels are present in it?
• Retrieval: given a label, which are the top n matching images?
Training pairs: $\{(x_i, l_i)\},\ i = 1, \dots, n$
Introduction
 Datasets:
 Corel5K – 5000 images, 272 Classes
 Corel30K – 30000 images, 1120 Classes
 MIRFLICKR – 25000 images, 37 Classes
 (PSU) – not available anymore
 ImageCLEF - The CLEF (Cross Language
Evaluation Forum) Cross Language Image
Retrieval Track
 Medical Image Retrieval
 Photo Annotation
 Plant Identification
 Wikipedia Retrieval
 Patent Image Retrieval and Classification
Introduction
[Example images: "Bear" from Corel 5K, "New Zealand" from Corel 30K, "Urban" from MIRFLICKR]
Prior Techniques
 Supervised OVA (one-vs-all)
 Binary decision problem: concept present / absent
 Hidden variable $Y_i$
 Decision rule: annotate with concept $i$ iff
  $P_{X|Y_i}(x|1)\,P_{Y_i}(1) \ge P_{X|Y_i}(x|0)\,P_{Y_i}(0)$
 Unsupervised Learning
 Models the dependency between text labels and image features through a hidden variable L:
  $P_{X,W}(x,w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$
 Considers just positive examples (densities for $Y_i = 1$)
[Graphical model: hidden variable L generates both the words W (e.g. "bear": polar, grizzly, ...) and the image features X]
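The OVA decision rule can be sketched numerically. A minimal 1-D Python sketch: the class-conditional Gaussians and the prior `p_present` are made-up values for illustration, not from the paper (which models 192-d DCT features):

```python
import numpy as np

def gauss(x, mu, sigma):
    """1-D Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def ova_decide(x, p_present=0.1):
    """Declare concept i present iff
    P(x|Y_i=1) P(Y_i=1) >= P(x|Y_i=0) P(Y_i=0)."""
    lik1 = gauss(x, mu=2.0, sigma=1.0)   # P(x|Y_i=1), assumed model
    lik0 = gauss(x, mu=-2.0, sigma=1.0)  # P(x|Y_i=0), assumed model
    return lik1 * p_present >= lik0 * (1 - p_present)
```

Note that the class prior shifts the decision boundary: a rare concept (small `p_present`) needs stronger likelihood evidence before being declared present.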
Methodology
Supervised Multiclass Labeling (SML)
 Elements of the semantic vocabulary (W) are explicitly mapped to semantic classes (L)!
 Random variable W takes values in $\{1, \dots, T\}$, with $W = i$ if and only if $x$ is a sample from concept $i$
 Annotation and retrieval are then easy, via Bayes rule:
  $P_{W|X}(i|x) = \dfrac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$
 Annotation: $i^*(x) = \arg\max_i P_{W|X}(i|x)$
 Retrieval: $j^*(i) = \arg\max_j P_{X|W}(x_j|i)$
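The annotation/retrieval rules above reduce to two argmax operations once the class-conditional likelihoods are available. A small sketch with a made-up likelihood table (3 labels × 3 images; values purely illustrative):

```python
import numpy as np

# rows: labels i, cols: images j; entry = P_{X|W}(x_j | i), assumed values
lik = np.array([[0.50, 0.01, 0.20],
                [0.10, 0.80, 0.05],
                [0.40, 0.19, 0.75]])
prior = np.array([0.2, 0.3, 0.5])  # P_W(i), assumed

def annotate(j):
    """i*(x_j) = argmax_i P_{W|X}(i|x_j); P_X(x) cancels in the argmax."""
    return int(np.argmax(lik[:, j] * prior))

def retrieve(i):
    """j*(i) = argmax_j P_{X|W}(x_j|i): best-matching image for label i."""
    return int(np.argmax(lik[i]))
```

Because $P_X(x)$ is the same for every label, annotation only needs likelihood times prior, never the full posterior normalization.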
Methodology
Estimation of Semantic Class
Distributions
 Given the training set $D_i$ of images for concept $i$, estimate $P_{X|W}(x|i)$
 Assumption: Gaussian distributions
 How to estimate?
 Direct estimation
 Model averaging
 Naive averaging: $P_{X|W}(x|i) = \dfrac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$
 Per-image GMM model: $P_{X|L,W}(x|l,i) = \sum_k \pi_{i,l}^k\, G(x, \mu_{i,l}^k, \Sigma_{i,l}^k)$
 Averaged: $P_{X|W}(x|i) = \dfrac{1}{D_i} \sum_{l=1}^{D_i} \sum_k \pi_{i,l}^k\, G(x, \mu_{i,l}^k, \Sigma_{i,l}^k)$
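Naive model averaging is just the mean of the per-image mixture densities. A 1-D sketch (per-image GMMs with illustrative parameters; the paper works with 192-d DCT vectors and full mixtures):

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density with variance parameterization."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gmm_pdf(x, weights, mus, variances):
    """Density of one per-image GMM: sum_k pi_k G(x, mu_k, var_k)."""
    return sum(w * gauss(x, m, v) for w, m, v in zip(weights, mus, variances))

def averaged_density(x, image_gmms):
    """Naive averaging: P_{X|W}(x|i) = (1/D_i) sum_l P_{X|L,W}(x|l,i)."""
    return sum(gmm_pdf(x, *g) for g in image_gmms) / len(image_gmms)
```

The drawback motivating mixture hierarchies: the averaged model has (components per image) × (images per class) components, which is expensive to evaluate; the HGMM compresses it to 64 components.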
Methodology
Mixture hierarchies
 First step: fit a GMM to each image with regular soft EM
 E-step: compute the expected complete-data log-likelihood
  $Q(\Theta; \Theta^t) = E_{z|x,\Theta^t}\!\left[\log P(x, z; \Theta)\right]$
 M-step: $\Theta^{t+1} = \arg\max_{\Theta} Q(\Theta; \Theta^t)$, which guarantees $P(x \mid \Theta^{t+1}) \ge P(x \mid \Theta^t)$
 Resulting per-image mixture (8 components):
  $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^k\, G(x, \mu_I^k, \Sigma_I^k)$
[Flowchart: initialization (Euclidean, then Mahalanobis distance) → initial parameter estimate → Expectation ↔ Maximization loop, stopped after max. 200 iterations or when the change in likelihood is too small]
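The per-image step is plain soft EM. A 1-D sketch (the paper fits 8 components to 192-d DCT vectors and uses a distance-based initialization; here a simple quantile initialization stands in):

```python
import numpy as np

def em_gmm(x, K=2, iters=200, tol=1e-6):
    """Soft EM for a 1-D K-component GMM, stopping after `iters`
    iterations or when the log-likelihood change falls below `tol`."""
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # simple deterministic init
    var = np.full(K, np.var(x))
    prev = -np.inf
    for _ in range(iters):
        # E-step: responsibilities h_ik proportional to pi_k N(x_i; mu_k, var_k)
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        ll = np.logaddexp.reduce(logp, axis=1)
        h = np.exp(logp - ll[:, None])
        if ll.sum() - prev < tol:  # change in likelihood too small
            break
        prev = ll.sum()
        # M-step: re-estimate weights, means, variances from responsibilities
        nk = h.sum(0)
        pi = nk / len(x)
        mu = (h * x[:, None]).sum(0) / nk
        var = (h * (x[:, None] - mu) ** 2).sum(0) / nk
    return pi, mu, var
```

The E-step is done in log space (`logaddexp`) so that small Gaussian densities do not underflow.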
Methodology
Mixture hierarchies for labels
 Second step: fit a hierarchical GMM (HGMM) to each label, taking the image-level mixtures as input
 E and M steps maximize the same $Q(\Theta; \Theta^t)$, now over mixture components instead of feature vectors (next slide)
 Resulting per-label mixture (64 components):
  $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^k\, G(x, \mu_w^k, \Sigma_w^k)$
[Flowchart: initialization (Bhattacharyya distance) → initial parameter estimate → Expectation ↔ Maximization loop, stopped after max. 200 iterations or when the change in likelihood is too small]
E and M step for HGMM
 Input: child components $\{\pi_j^k, \mu_j^k, \Sigma_j^k\},\ j = 1, \dots, D_i,\ k = 1, \dots, K$
 Output: parent components $\{\pi_c^m, \mu_c^m, \Sigma_c^m\},\ m = 1, \dots, M$
 E-step:
  $h_{jk}^m = \dfrac{\left[G(\mu_j^k, \mu_c^m, \Sigma_c^m)\, e^{-\frac{1}{2}\operatorname{trace}\{(\Sigma_c^m)^{-1}\Sigma_j^k\}}\right]^{\pi_j^k N} \pi_c^m}{\sum_l \left[G(\mu_j^k, \mu_c^l, \Sigma_c^l)\, e^{-\frac{1}{2}\operatorname{trace}\{(\Sigma_c^l)^{-1}\Sigma_j^k\}}\right]^{\pi_j^k N} \pi_c^l}$
 M-step:
  $(\pi_c^m)^{new} = \dfrac{\sum_{jk} h_{jk}^m}{D_i K}$
  $(\mu_c^m)^{new} = \sum_{jk} w_{jk}^m \mu_j^k$, where $w_{jk}^m = \dfrac{h_{jk}^m \pi_j^k}{\sum_{jk} h_{jk}^m \pi_j^k}$
  $(\Sigma_c^m)^{new} = \sum_{jk} w_{jk}^m \left[\Sigma_j^k + (\mu_j^k - \mu_c^m)(\mu_j^k - \mu_c^m)^T\right]$
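One hierarchical EM iteration can be sketched in numpy. This is a 1-D simplification (scalar variances instead of full covariances, so the trace term reduces to a ratio); `N` is the virtual sample count per child mixture, and all inputs in the test are illustrative:

```python
import numpy as np

def hem_step(pi_c, mu_c, var_c, pi_j, mu_j, var_j, N=100):
    """One E+M step of hierarchical EM (1-D sketch): fit an M-component
    parent mixture to a flat list of child Gaussian components, never
    touching the underlying feature vectors."""
    # E-step: responsibility h[m, jk] of parent component m for child jk:
    # h ∝ [ G(mu_jk; mu_m, var_m) * exp(-0.5 * var_jk / var_m) ]^(pi_jk N) * pi_m
    log_g = (-0.5 * (mu_j[None, :] - mu_c[:, None]) ** 2 / var_c[:, None]
             - 0.5 * np.log(2 * np.pi * var_c[:, None]))
    log_h = ((pi_j * N) * (log_g - 0.5 * var_j[None, :] / var_c[:, None])
             + np.log(pi_c)[:, None])
    h = np.exp(log_h - np.logaddexp.reduce(log_h, axis=0))
    # M-step: parent weights, means, variances from child statistics
    pi_new = h.sum(1) / h.sum()              # = sum_jk h / (D_i * K)
    w = h * pi_j                             # w[m, jk] ∝ h_jk^m * pi_jk
    w = w / w.sum(1, keepdims=True)
    mu_new = (w * mu_j).sum(1)
    var_new = (w * (var_j + (mu_j[None, :] - mu_new[:, None]) ** 2)).sum(1)
    return pi_new, mu_new, var_new
```

The key point is that the E-step compares Gaussian components with each other (means, variances, weights), not individual samples, which is what makes adding new images to a class cheap.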
Algorithm - learning
 Training
 For each label w and each training image I:
 Decompose the image (192 px × 128 px) into 8×8 regions with a sliding window moved every 2 pixels
 Compute the DCT of each window (8·8·3 = 192-d feature vector)
 Fit a mixture of 8 Gaussians to each image using EM
 Fit a mixture of 64 Gaussians to each label using hierarchical EM
 $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^k\, G(x, \mu_I^k, \Sigma_I^k)$
 $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^k\, G(x, \mu_w^k, \Sigma_w^k)$
Algorithm – annotation, retrieval
 Annotation
 Get the n (= 5) best labels for image I
 Extract features from the image ((192·128/2) vectors × 192 dimensions)
 Compute the log-likelihood of each label and choose the best n:
  $\log P_{X|W}(\mathbf{x}|i) = \sum_{x \in \mathbf{x}} \log P_{X|W}(x|i)$
 Retrieval
 For test images $I_T$ and label w: annotate $I_T$ and rank the images by decreasing $P_{X|W}(\mathbf{x}|i)$
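Ranking labels by summed log-likelihood can be sketched directly. A 1-D stand-in for the 64-component class mixtures; the GMM parameters in the test are illustrative:

```python
import numpy as np

def gmm_logpdf(x, pi, mu, var):
    """Log density of a 1-D GMM at points x, computed in log space."""
    logp = (-0.5 * (x[:, None] - mu) ** 2 / var
            - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
    return np.logaddexp.reduce(logp, axis=1)

def best_labels(feats, class_gmms, n=2):
    """Score each label i by sum_x log P_{X|W}(x|i) over the image's
    feature vectors and return the n best label indices."""
    scores = [gmm_logpdf(feats, *g).sum() for g in class_gmms]
    return list(np.argsort(scores)[::-1][:n])
```

Summing log-likelihoods over feature vectors corresponds to treating the window features of one image as independent samples from the class density.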
Results-quantitative
 Database: Corel 5k
 Precision: $\dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$
 Recall: $\dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$
 4000 training images, 1000 test images
 Per-label annotation metrics:
  $\text{recall} = \dfrac{w_C}{w_H}$, $\text{precision} = \dfrac{w_C}{w_{auto}}$
  where $w_C$ = correctly annotated images, $w_H$ = images annotated by humans, $w_{auto}$ = automatically annotated images
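The per-label metrics above amount to set intersections over annotated image sets. A tiny helper (the image IDs in the test are made up):

```python
def precision_recall(auto, human):
    """Per-label precision and recall over annotated image sets:
    precision = |correct| / |auto-annotated|,
    recall    = |correct| / |human-annotated|."""
    correct = len(set(auto) & set(human))
    return correct / len(auto), correct / len(human)
```

With a fixed number of labels per image (as in this method), precision is capped for labels that humans use more often than the fixed budget allows.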
Results-quantitative
Annotation (compared methods 1–6):

                           1     2     3     4     5     6
Words with recall > 0      140   121   110   125   90    131
Mean recall per word       0.27  0.25  0.25  0.26  0.23  0.27
Mean precision per word    0.25  0.24  0.23  0.23  0.20  0.23
Results-quantitative
Retrieval (compared methods 1–6):

                                     1     2     3     4     5     6
Mean precision, all words            0.23  0.21  0.20  0.21  0.19  0.24
Mean precision, words w/ recall > 0  0.45  0.40  0.40  0.41  0.37  0.41
Results-qualitative
plane jet f-14 sky
-----------------------
sky plane clouds
smoke snow
coast waves
water hills
-----------------------
water sky ocean
mountain clouds
polar bear bars
cage
-----------------------
bear snow texture
sunrise closeup
people cheese
market street
-----------------------
people wall sand
flower bird
Results-qualitative
[Example results for: "Blooms", "Mountain", "Pool", "Smoke", "Woman"]
Conclusions
 Pros
 Good segmentation as a byproduct of annotation
 Great for general concepts with many samples
 Only weakly annotated data is required (multiple instance learning)
 Allows hierarchical representation (adding images, speed)
 Cons
 Fixed number of labels per image
 Learning is time-consuming
 Parameter tuning is time-consuming
 Weakly represented classes can be associated with wrong concepts
Resources
 Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 394–410 (2007).
 Gudivada, V.N., Raghavan, V.V.: Content-based image retrieval systems. IEEE Computer 28, 18–22 (1995).
 Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In: Proc. Sixth International Conference on Computer Vision, pp. 675–682. IEEE (1998).
 Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71, 593–613 (2009).
 Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys 40, 1–60 (2008).
lukas.tencer@gmail.com
http://tencer.hustej.net
@lukastencer
accuratelyrandom.blogspot.com
facebook.com/lukas.tencer
Google labeling game