Improvement of Content-Based Image Retrieval
by Using Clustering and Relevance Feedback
Master Thesis Defense
Bui The Vinh
May 13, 2010
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Introduction
 Key points
 How to represent an image
 How to determine whether two images are similar or not
 Framework
Introduction
 Practical Applications
 Medical diagnosis
 Crime prevention
 Online shopping
 Etc.
 Challenges
 Real-time system
 High accuracy
 Contributions
 Build a complete CBIR system
 Reduce search time by using clustering
 Increase accuracy by applying a support vector machine in relevance feedback
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Feature Extraction Model
[Figure: feature extraction model diagram; node labels F1, F2, F3, and B]
 Basic Image features: COLOR, SHAPE, TEXTURE
Image Representation
 Image representation
 CEDD: Color and Edge Directivity Descriptor (proposed by Chatzichristofis and Boutalis)
 Incorporates color and texture information in a single histogram
 Each image is represented by a high-dimensional real-valued vector
0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 The vector representing an image depends on the feature extraction method used
Similarity Measurement
 Formula
 Calculate the distance between two corresponding vectors
 Tanimoto distance
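The distance computation named above can be sketched as follows. The slide's own formula is not preserved in this text, so this sketch assumes the common real-valued form of the Tanimoto coefficient, T(a, b) = a·b / (|a|² + |b|² − a·b), and defines the distance as 1 − T. The class and method names are hypothetical, and treating two all-zero vectors as identical is an assumption.

```java
final class TanimotoDistance {

    /** Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b); lies in [0, 1] for non-negative histograms. */
    static double coefficient(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = normA + normB - dot;
        // The denominator is zero only when both vectors are all zeros;
        // such vectors are treated as identical here (assumption).
        return denominator == 0.0 ? 1.0 : dot / denominator;
    }

    /** Distance defined here as 1 - T, so identical vectors are at distance 0. */
    static double distance(double[] a, double[] b) {
        return 1.0 - coefficient(a, b);
    }

    public static void main(String[] args) {
        // Two short toy histograms; real CEDD vectors are much longer.
        double[] query = {0, 0, 7, 0, 1, 1};
        double[] candidate = {0, 0, 2, 0, 1, 0};
        System.out.println(distance(query, candidate));
    }
}
```

A smaller distance means the two images are more similar under this measure; other scalings of the same coefficient would rank images identically.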
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Overview of Clustering
 Motivation
 The amount of image data involved is very large
 Find groups of objects such that:
 Objects in the same group are similar to one another
 Objects in different groups are dissimilar
K-means Clustering
 Definition
 K-means is a partitional clustering algorithm that uses iterative relocation to partition a dataset into k clusters.
 Objective
 Given a set of observations (x1, x2, …, xn), cluster them into k sets (k < n), X = {X1, X2, …, Xk}
 Locally minimize the sum of squared distances between the data points and their corresponding cluster centers
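The objective on this slide can be written out as below. The slide's own formula is not preserved in this text, so this is the standard form of the K-means objective; the symbol μ_h for the center of cluster X_h is introduced here for illustration.

```latex
\arg\min_{X} \; \sum_{h=1}^{k} \sum_{x_i \in X_h} \lVert x_i - \mu_h \rVert^{2},
\qquad
\mu_h = \frac{1}{|X_h|} \sum_{x_i \in X_h} x_i
```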
K-means Clustering (2)
 Algorithm
 Initialize k cluster centers randomly, then repeat until convergence:
 Cluster assignment step: assign each data point xi to the cluster Xh whose center is closest to xi
 Center re-estimation step: re-estimate each cluster center as the mean of the points in that cluster
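The two steps above can be sketched in Java as follows. This is a minimal illustration, not the thesis implementation: it assumes squared Euclidean distance, picks k distinct data points as the initial centers, keeps the previous center when a cluster becomes empty, and stops when no assignment changes. The class name, method names, and the seed and maxIterations parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class KMeansSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the point (lowest index wins ties). */
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        for (int h = 1; h < centers.length; h++) {
            if (squaredDistance(point, centers[h]) < squaredDistance(point, centers[best])) {
                best = h;
            }
        }
        return best;
    }

    /** Runs k-means and returns the cluster index assigned to each point. */
    static int[] cluster(double[][] points, int k, long seed, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Initialization: pick k distinct data points at random as the initial centers.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        double[][] centers = new double[k][];
        for (int h = 0; h < k; h++) {
            centers[h] = points[indices.get(h)].clone();
        }

        int dim = points[0].length;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Cluster assignment step: move each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int h = nearestCenter(points[i], centers);
                if (h != assignment[i]) {
                    assignment[i] = h;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point changed cluster
            }
            // Center re-estimation step: each center becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int h = 0; h < k; h++) {
                if (counts[h] == 0) {
                    continue; // empty cluster: keep the previous center
                }
                for (int d = 0; d < dim; d++) {
                    centers[h][d] = sums[h][d] / counts[h];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(cluster(points, 2, 42L, 100)));
    }
}
```

Because the algorithm only finds a local minimum, different seeds can give different clusterings on less well-separated data.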
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Relevance feedback?
 Motivation
 Limitations of searching based only on low-level image features
 Mechanism
 After initial retrieval results are presented, allow the user to provide
feedback on the relevance of one or more of the retrieved images.
 Use this feedback information to reformulate the query.
 Produce new results based on reformulated query.
 Challenges
 Requires real-time processing
 The training data set is small
RF Architecture
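[Figure: relevance feedback architecture. The query image is submitted to the CBIR system, which ranks images from the image database (1. Img1, 2. Img2, 3. Img3, ...). The user marks the ranked images as feedback, query reformulation produces a revised query, and the system returns re-ranked images (1. Img2, 2. Img4, 3. Img5, ...).]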
Support vector machine
 Classification method
 Given a set of training examples, each marked as belonging to one of two
categories
 An SVM training algorithm builds a model that predicts whether a new
example falls into one category or the other.
 Linear Case
 Training data
 A separating hyperplane
 Optimal separating hyperplane (OSH)
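The formulas for the three items in the linear case are not preserved in this text. In the standard textbook formulation, which is assumed here and may differ in notation from the original slide, they read as follows, with w the normal vector and b the offset of the hyperplane:

```latex
% Training data
\{(x_i, y_i)\}_{i=1}^{n}, \qquad x_i \in \mathbb{R}^{d}, \quad y_i \in \{-1, +1\}

% A separating hyperplane
w \cdot x + b = 0, \qquad y_i \,(w \cdot x_i + b) > 0 \quad \text{for all } i

% Optimal separating hyperplane (maximum margin)
\min_{w,\, b} \; \tfrac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, n
```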
Support vector machine (2)
 Linear Case (cont.)
 The classification function
 Non-linear Case
 The classification function
 Kernels
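The classification functions and the kernel are likewise missing from this text. The standard forms are given below as an assumption about what the slide showed; α_i are the learned multipliers, b is the bias, and γ > 0 is the width parameter of the radial basis function kernel.

```latex
% Linear case
f(x) = \operatorname{sgn}\Big( \sum_{i=1}^{n} \alpha_i \, y_i \, (x_i \cdot x) + b \Big)

% Non-linear case: the dot product is replaced by a kernel K
f(x) = \operatorname{sgn}\Big( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \Big)

% Example kernel: radial basis function (RBF)
K(x, z) = \exp\big( -\gamma \, \lVert x - z \rVert^{2} \big)
```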
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Clustering Implementation
 Clustering
 Take the database of feature vectors as input
 Apply the K-means algorithm to cluster the database
 Finding
 Find the cluster that best matches the query image
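The finding step can be sketched as below: pick the cluster whose center is nearest to the query, then rank only the images in that cluster. This is an illustrative sketch under assumptions, not the thesis code: it uses squared Euclidean distance, and the class, method, and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ClusterSearchSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the cluster center closest to the query vector. */
    static int nearestCluster(double[] query, double[][] centers) {
        int best = 0;
        for (int h = 1; h < centers.length; h++) {
            if (squaredDistance(query, centers[h]) < squaredDistance(query, centers[best])) {
                best = h;
            }
        }
        return best;
    }

    /**
     * Ranks only the images in the query's nearest cluster.
     * assignment[i] is the cluster index of image i.
     * Returns image indices ordered from closest to farthest.
     */
    static List<Integer> search(double[] query, double[][] features,
                                int[] assignment, double[][] centers) {
        int target = nearestCluster(query, centers);
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            if (assignment[i] == target) {
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble(i -> squaredDistance(query, features[i])));
        return candidates;
    }

    public static void main(String[] args) {
        double[][] features = {{0, 0}, {1, 0}, {10, 10}, {11, 10}};
        int[] assignment = {0, 0, 1, 1};
        double[][] centers = {{0.5, 0}, {10.5, 10}};
        System.out.println(search(new double[] {10.9, 10}, features, assignment, centers));
    }
}
```

Only the members of one cluster are compared with the query, which is where the reduction in searching time comes from; the trade-off is that relevant images assigned to other clusters are not considered.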
RF Implementation
 Support vector machine classifier
 Suitable when the amount of training data is small
 Can be applied in a real-time system
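One way an SVM classifier could drive re-ranking is sketched below: score every candidate image with the decision value of an already trained RBF-kernel SVM and sort by that score. Training is not shown. The support vectors, the coefficients (α_i times y_i), the bias, and gamma are assumed to come from a classifier trained on the images the user marked as relevant or irrelevant, and all names here are hypothetical rather than taken from the thesis.

```java
import java.util.ArrayList;
import java.util.List;

final class SvmRerankSketch {

    /** RBF kernel K(x, z) = exp(-gamma * ||x - z||^2). */
    static double rbf(double[] x, double[] z, double gamma) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - z[i];
            sum += d * d;
        }
        return Math.exp(-gamma * sum);
    }

    /**
     * Decision value sum_i coeff[i] * K(sv_i, x) + bias of a trained SVM,
     * where coeff[i] = alpha_i * y_i. Positive values indicate "relevant".
     */
    static double decisionValue(double[] x, double[][] supportVectors,
                                double[] coeff, double bias, double gamma) {
        double value = bias;
        for (int i = 0; i < supportVectors.length; i++) {
            value += coeff[i] * rbf(supportVectors[i], x, gamma);
        }
        return value;
    }

    /** Returns image indices ordered from highest to lowest decision value. */
    static List<Integer> rerank(double[][] features, double[][] supportVectors,
                                double[] coeff, double bias, double gamma) {
        double[] scores = new double[features.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            scores[i] = decisionValue(features[i], supportVectors, coeff, bias, gamma);
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }

    public static void main(String[] args) {
        // Toy model: one "relevant" support vector at (0,0), one "irrelevant" at (5,5).
        double[][] supportVectors = {{0, 0}, {5, 5}};
        double[] coeff = {1.0, -1.0};
        double[][] features = {{5, 4}, {0, 1}, {2, 2}};
        System.out.println(rerank(features, supportVectors, coeff, 0.0, 0.5));
    }
}
```

Scoring costs one kernel evaluation per support vector per image, so with a small feedback set the re-ranking stays cheap, which is consistent with the real-time requirement stated on the slide.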
Environment & Parameters
 Environment
 9,918 images of various kinds
 Desktop computer: Intel Core 2 Duo 3.16 GHz, 4 GB RAM, Windows 7 Ultimate
 Sun Java 1.6-u7
 All components of the system are implemented in Java
 Parameters
 K = 7 for the K-means algorithm
 Radial basis function (RBF) kernel for the support vector machine
Clustering Evaluation
 Accuracy
 Clustering does not adversely affect the accuracy
Clustering Evaluation (2)
 Searching time
 Applying clustering significantly reduces the searching time
RF Evaluation
 Accuracy
 Accuracy improves after several feedback iterations
Content
 Introduction
 Image Features & Similarity
 Clustering Algorithm
 Relevance Feedback
 Implementation and Evaluation
 Conclusions and Future Work
Conclusion
 Achievements
 Built a complete content-based image retrieval system
 Clustering the image database with the K-means algorithm significantly improves search performance
 Using a support vector machine in relevance feedback markedly increases accuracy
 Shortcomings
 The low-level feature-based search relies on a feature extraction method developed by other authors
 Future work
 Develop low-level feature-based search methods suited to each image domain