This document proposes using information-theoretic co-clustering to create visual topics from visual words for improved visual category recognition. It discusses limitations of existing bag-of-features models due to semantic scatter from intra-category appearance variation. The approach clusters images and visual words to form visual topics that better capture semantic relationships and reduce dimensionality, improving classification performance compared to bag-of-words models on multiple datasets. Experiments demonstrate increased F1-scores using visual topics compared to visual words alone.
Visual Category Recognition using Information-Theoretic Co-Clustering
1. Visual Category Recognition using
Information-Theoretic Co-clustering
Ashish Gupta and Richard Bowden
University of Surrey
a.gupta@surrey.ac.uk
Feb. 25, 2012
Ashish Gupta and Richard Bowden Co-Clustering for Visual Categorization
2. Contents
Introduction: Visual Categorization
Current practice: Bag-of-Features Model
Problem: Semantic scatter
Idea: Visual Topic ← Visual Words
Approach: Information-Theoretic Co-Clustering
Solution: Bag-of-Topics Model
Experiments: multiple Datasets and Dictionary sizes
Summary
3. Visual Category Recognition
Definition
Detect presence of an instance of a visual
category in an image.
Visual Categorization Pipeline
4. Bag-of-Features
Visual Word
Representative feature vector
(generally centroid) of each
cluster.
Image Histogram
Histogram of assignments
of image feature vectors to
visual words.
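The two definitions above can be sketched in a few lines. This is a minimal illustration, assuming the visual words are already available as an array of cluster centroids and that assignment uses Euclidean distance; the function name `image_histogram` and the toy 2-D data are hypothetical, not taken from the slides.

```python
import numpy as np

def image_histogram(descriptors, visual_words):
    """Count how many descriptors are nearest to each visual word."""
    # Squared Euclidean distance from every descriptor to every visual word.
    diffs = descriptors[:, None, :] - visual_words[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest visual word for each descriptor.
    nearest = np.argmin(sq_dists, axis=1)
    return np.bincount(nearest, minlength=len(visual_words))

# Toy example: two visual words and three descriptors in 2-D
# (real SIFT descriptors are 128-dimensional).
visual_words = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 11.0], [0.0, 2.0]])
hist = image_histogram(descriptors, visual_words)
```

In practice the centroids would come from clustering descriptors pooled over many training images.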
5. Discriminative Dictionary
Basic Intuition
If the normalized image histogram of a test sample is closer to the
trained positive histogram, the image is classified as positive (motorcycle).
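A small sketch of this intuition follows. It assumes L1 normalization, Euclidean distance, and a single reference histogram for each of the positive and negative classes; none of these choices are specified on the slide, and the names `l1_normalize` and `classify` are hypothetical.

```python
import numpy as np

def l1_normalize(hist):
    """Scale a histogram so its bins sum to 1 (left unchanged if empty)."""
    hist = np.asarray(hist, dtype=float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def classify(test_hist, positive_hist, negative_hist):
    """Label the test image by whichever reference histogram is closer."""
    t = l1_normalize(test_hist)
    d_pos = np.linalg.norm(t - l1_normalize(positive_hist))
    d_neg = np.linalg.norm(t - l1_normalize(negative_hist))
    return "positive" if d_pos < d_neg else "negative"

# Toy 3-word dictionary: the test histogram resembles the positive reference.
label = classify([8, 1, 1], [7, 2, 1], [1, 2, 7])
```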
6. Challenges
Several variations in visual category appearance render category
recognition very difficult.
7. Intra-category appearance variation
Small variations across instances of an object part cause the
associated descriptors to scatter in feature space.
8. Semantic Scatter
Small variations across instances of an object part cause the
associated descriptors to scatter in feature space.
Descriptors of different categories are interleaved in feature
space (local patch descriptors have low specificity).
9. Towards Discriminative Dictionary
Increase dictionary size to split up interleaved feature vectors.
Dictionary learning is unsupervised ⇒ discriminative
dictionary not assured.
10. Visual Topic ← Visual Words
Idea
Combine visual words which are semantically related and
create a visual topic.
Dictionary of visual topics should be more robust to
intra-category variation.
11. Visual Word Distribution
Visual word selection is based entirely on the density of feature
vectors, independent of image source.
The distribution across images is also important for selecting visual words.
12. Image-Word Distribution
Word distribution across images ∝ semantic equivalence.
Histogram distribution across words ∝ image similarity.
13. Co-Clustering
Formulate the image-word matrix as a joint probability distribution.
Optimal co-clustering maximizes mutual information between
clustered random variables.
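The objective can be made concrete with a short sketch. It assumes the image-word count matrix is normalized to sum to 1 to form the joint distribution, and it only evaluates the mutual information before and after a given co-clustering; the search for the optimal cluster assignments is not shown. The function names and the toy counts are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal over rows (images)
    py = joint.sum(axis=0, keepdims=True)   # marginal over columns (words)
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (px @ py)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))

def coclustered_joint(joint, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Sum probability mass inside each (image cluster, word cluster) block."""
    R = np.eye(n_row_clusters)[row_labels]  # images x image clusters
    C = np.eye(n_col_clusters)[col_labels]  # words x word clusters
    return R.T @ joint @ C

# Toy image-word count matrix: 4 images (rows) by 4 visual words (columns).
counts = np.array([[4, 3, 0, 1],
                   [5, 4, 1, 0],
                   [0, 1, 6, 5],
                   [1, 0, 4, 5]], dtype=float)
joint = counts / counts.sum()

row_labels = np.array([0, 0, 1, 1])  # image cluster of each image
col_labels = np.array([0, 0, 1, 1])  # word cluster (visual topic) of each word

clustered = coclustered_joint(joint, row_labels, col_labels, 2, 2)
mi_full = mutual_information(joint)
mi_clustered = mutual_information(clustered)
loss = mi_full - mi_clustered  # information lost by co-clustering
```

Clustering can never increase mutual information, so maximizing the mutual information between the clustered variables is equivalent to minimizing `loss`.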
14. Bag-of-Topics ← Bag-of-Words
Image histogram feature vectors in high-dimensional visual words
space are projected to lower dimensional visual topic space.
The distance between feature vectors from the same category is
reduced.
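One way to picture this projection is sketched below. It assumes each visual word belongs to exactly one visual topic and that a topic bin is the sum of its member word bins; the slides do not state the exact projection, so this is illustrative only. The name `words_to_topics` and the toy histograms are hypothetical.

```python
import numpy as np

def words_to_topics(word_hists, word_to_topic, n_topics):
    """Project word histograms to topic histograms by summing member words."""
    word_hists = np.asarray(word_hists, dtype=float)
    topic_hists = np.zeros((word_hists.shape[0], n_topics))
    for word, topic in enumerate(word_to_topic):
        topic_hists[:, topic] += word_hists[:, word]
    return topic_hists

# Two images of the same category whose descriptors fall on different,
# semantically related visual words (words 0 and 1, words 2 and 3).
word_hists = np.array([[3, 0, 1, 0],
                       [0, 3, 0, 1]])
word_to_topic = [0, 0, 1, 1]  # topic index for each of the 4 words

topic_hists = words_to_topics(word_hists, word_to_topic, 2)
dist_words = np.linalg.norm(word_hists[0] - word_hists[1])
dist_topics = np.linalg.norm(topic_hists[0] - topic_hists[1])
```

In this toy case the two images are far apart in word space but coincide in topic space, which is the effect the slide describes.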
15. Experiment
Feature Descriptor : SIFT : Affine invariant local image patch
descriptor
Data set:
Scene 15 : 15 visual categories of natural indoor and outdoor
scenes. Each category has about 200 to 400 images and the
entire dataset has 4485 images.
Pascal VOC2006 : It has 10 visual categories with about 175
to 650 images per category. There are a total of 5304 images.
Pascal VOC2007 : It has 20 visual categories, each with 100
to 2000 images. There are a total of 9963 images.
Pascal VOC2010 : It has 20 visual categories with 300 to 3500
images per category. There are a total of 21738 images.
Classifier : k-NN
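For reference, a k-NN classifier over image histograms can be sketched as follows. The value of k and the distance metric are not given on the slide, so k = 3 and Euclidean distance are assumptions here, and the function name, labels, and toy histograms are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(train_hists, train_labels, test_hist, k=3):
    """Predict a label by majority vote among the k nearest training histograms."""
    train_hists = np.asarray(train_hists, dtype=float)
    test_hist = np.asarray(test_hist, dtype=float)
    dists = np.linalg.norm(train_hists - test_hist, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy normalized histograms over a 2-bin dictionary.
train_hists = [[0.8, 0.2], [0.7, 0.3], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
train_labels = ["motorcycle"] * 3 + ["background"] * 2
pred = knn_predict(train_hists, train_labels, [0.75, 0.25], k=3)
```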
16. Scene-15 Dataset
F1-score across categories for 100, 500, and 1000 visual Topics.
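The F1-score reported in these experiments is the harmonic mean of precision and recall. A minimal sketch for a single category, treating it as a binary positive/negative problem (an assumption about the evaluation protocol, which the slides do not detail; the function name is hypothetical):

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 true positives, 1 false positive, 1 false negative.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```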
17. PASCAL VOC2006 Dataset
F1-score across categories for 100, 500, and 1000 visual Topics.
18. PASCAL VOC2007 Dataset
F1-score across categories for 100, 500, and 1000 visual Topics.
19. Dictionary Size
10,000 words → n Topics. Appropriate number of Topics?
Large dictionary becomes category dependent.
20. Summary
Bag-of-Features is limited: unsupervised clustering.
Significant intra-category appearance variation: semantic scatter.
Visual Topic ← Visual Word.
Co-clustering Image-Word distribution.
Semantic dimensionality reduction.