1. Improvement of Content-Based Image Retrieval
by Using Clustering and Relevance Feedback
Master Thesis Defense
Bui The Vinh
May 13, 2010
2. Content
Introduction
Image Features & Similarity
Clustering Algorithm
Relevance Feedback
Implementation and Evaluation
Conclusions and Future Work
3. Introduction
Key points
How to represent an image
How to determine whether two images are similar
[Figure: framework of the CBIR system]
4. Introduction
Practical Applications
Medical diagnosis
Crime prevention
Online shopping
Etc.
Challenges
Real-time performance
High accuracy
Contributions
Build a complete CBIR system
Reduce search time by using clustering
Increase accuracy by applying a support vector machine (SVM) in relevance feedback
6. Feature Extraction Model
[Figure: feature extraction model, showing feature vectors F1, F2, F3]
Basic image features: color, shape, texture
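The slides do not specify which descriptors produce F1, F2 and F3. Purely as an illustration of what a color feature vector can look like, the sketch below computes a normalized, quantized RGB histogram; the function name `color_histogram`, the pixel format, and the default of 4 bins per channel are assumptions rather than the thesis's method.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Iterable[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Hypothetical color feature: a normalized, quantized RGB histogram.

    Each channel is split into `bins_per_channel` equal ranges, giving
    bins_per_channel ** 3 bins. The values sum to 1 for a non-empty image.
    """
    hist = [0.0] * bins_per_channel ** 3
    count = 0
    for r, g, b in pixels:
        # Quantize each channel to a bin index in 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        # Flatten the (ri, gi, bi) bin coordinates into one index.
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        count += 1
    # Normalize so that images of different sizes are comparable.
    return [h / count for h in hist] if count else hist


# Three pure-red pixels and one pure-blue pixel.
feature = color_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
```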
8. Similarity Measurement
Method
Compute the distance between each pair of corresponding feature vectors (F1, F2, F3) of the two images
Distance measure: Tanimoto distance
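A common form of the Tanimoto measure for real-valued vectors $a$ and $b$ is $T(a, b) = \frac{a \cdot b}{\lVert a \rVert^2 + \lVert b \rVert^2 - a \cdot b}$, with distance $1 - T(a, b)$. The sketch below implements it. The slide does not show how the distances for F1, F2 and F3 are combined, so the equally weighted sum in the hypothetical `image_distance` is an assumption.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def tanimoto_distance(a: Vector, b: Vector) -> float:
    """Tanimoto distance 1 - a.b / (|a|^2 + |b|^2 - a.b) for real vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # only possible when both vectors are all zeros
    return 1.0 - dot / denom


def image_distance(feats_a: List[Vector], feats_b: List[Vector],
                   weights: Optional[List[float]] = None) -> float:
    """Hypothetical overall distance: weighted sum of per-feature distances."""
    if weights is None:
        weights = [1.0 / len(feats_a)] * len(feats_a)  # assumption: equal weights
    return sum(w * tanimoto_distance(fa, fb)
               for w, fa, fb in zip(weights, feats_a, feats_b))
```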
10. Overview of Clustering
Motivation
The image database is very large, so comparing a query against every image is slow
Finding groups of objects such that:
Objects in the same group are similar to one another
Objects in a group are different from the objects in other groups
11. K-means Clustering
Definition
K-means is a partition clustering algorithm based on iterative relocation that
partitions a dataset into k clusters.
Objective
Locally minimizes the sum of squared distances between the data points and their corresponding cluster centers
Given a set of observations (x1, x2, …, xn), cluster them into k sets (k < n), X = {X1, X2, …, Xk}
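Written out, this objective is the usual k-means criterion below, where $\mu_i$ (notation introduced here) is the mean of the points in $X_i$. Finding the global minimum is NP-hard in general, which is why the iterative algorithm only guarantees a local minimum.

$$
\underset{X}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in X_i} \lVert x_j - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j
$$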
12. K-means Clustering (2)
Algorithm
Initialize k cluster centers randomly, then repeat until convergence:
Cluster assignment step: assign each data point xi to the cluster Xh whose center is nearest to xi
Center re-estimation step: re-estimate each cluster center as the mean of the points in that cluster
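The two steps can be sketched as follows. Initializing by sampling k distinct data points, using squared Euclidean distance, stopping when no assignment changes, and keeping the previous center of a cluster that becomes empty are all assumptions; the slides only state that the centers are initialized randomly.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(data: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means: returns (centers, labels), labels[i] = cluster of data[i]."""
    rng = random.Random(seed)
    # Initialization: k distinct data points chosen at random become the centers.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Cluster assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda h: squared_distance(x, centers[h]))
                      for x in data]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Center re-estimation step: each center becomes the mean of its points.
        for h in range(k):
            members = [x for x, lab in zip(data, labels) if lab == h]
            if members:  # keep the old center if the cluster is empty
                centers[h] = mean(members)
    return centers, labels
```

In the experiments reported later, k is 7.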
14. Relevance Feedback
Motivation
Searching based only on low-level image features often fails to capture what the user is actually looking for
Mechanism
After the initial retrieval results are presented, allow the user to give feedback on the relevance of one or more of the retrieved images.
Use this feedback to reformulate the query.
Produce new results based on the reformulated query.
Challenges
Requires real-time processing
The training set is small (only the images the user has marked)
16. Support Vector Machine
Classification method
Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts which of the two categories a new example falls into.
Linear Case
Training data: $(x_i, y_i)$, $i = 1, \dots, n$, with $y_i \in \{-1, +1\}$
A separating hyperplane: $w \cdot x + b = 0$
Optimal separating hyperplane (OSH): the separating hyperplane with the maximum margin between the two classes
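For training pairs $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ and a hyperplane $w \cdot x + b = 0$, the distance between the margin hyperplanes $w \cdot x + b = \pm 1$ is $2 / \lVert w \rVert$, so the OSH of linearly separable data is usually obtained from the standard hard-margin problem below. For data that are not linearly separable, the common soft-margin variant adds slack variables $\xi_i \ge 0$ to the constraints and a penalty $C \sum_i \xi_i$ to the objective; the slides do not say which variant is used.

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
$$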
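17. Support Vector Machine (2)
Linear Case (cont.)
The classification function: $f(x) = \operatorname{sign}(w \cdot x + b)$
Non-linear Case
The classification function: $f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right)$
Kernels: $K(x_i, x) = \phi(x_i) \cdot \phi(x)$, where $\phi$ maps the data into a higher-dimensional feature space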
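Once training has produced the support vectors $x_i$ with labels $y_i$, coefficients $\alpha_i$ and bias $b$, the non-linear classification function can be evaluated as in the sketch below, which uses the RBF kernel $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$ chosen in the experiments. The function names and the default $\gamma = 0.5$ are assumptions.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))


def decision_value(x, support_vectors, labels, alphas, b, gamma=0.5) -> float:
    """sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b


def classify(x, support_vectors, labels, alphas, b, gamma=0.5) -> int:
    """f(x) = sign(decision value), mapped to +1 / -1 (ties go to +1)."""
    value = decision_value(x, support_vectors, labels, alphas, b, gamma)
    return 1 if value >= 0 else -1
```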
19. Clustering Implementation
Clustering
Take the database of feature vectors as input
Apply the K-means algorithm to cluster the database
Finding
Find the appropriate cluster for the query image
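A minimal sketch of the finding step, assuming the appropriate cluster is the one whose center is nearest to the query's feature vector and that only that cluster's images are ranked. Squared Euclidean distance picks the cluster; the ranking distance is a parameter (the Tanimoto distance from the similarity slide would be a natural choice). The function `search_in_cluster` and its parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_in_cluster(query: Vector,
                      centers: List[Vector],
                      labels: List[int],
                      database: List[Vector],
                      distance: Callable[[Vector, Vector], float] = squared_distance,
                      top_n: int = 10) -> List[Tuple[int, float]]:
    """Return (image index, distance) pairs for the top_n closest images,
    searching only the cluster whose center is nearest to the query.

    labels[i] is the cluster of database[i], as produced by k-means.
    """
    # Finding step: pick the cluster with the nearest center.
    best = min(range(len(centers)),
               key=lambda h: squared_distance(query, centers[h]))
    # Rank only that cluster's members instead of the whole database.
    scored = [(i, distance(query, database[i]))
              for i, label in enumerate(labels) if label == best]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]
```

Restricting the ranking to one cluster is what cuts the search time: when clusters are roughly balanced, only about 1/K of the database is compared. The cost is that relevant images assigned to other clusters are never considered.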
20. RF Implementation
Support vector machine classifier
Suitable when the number of training examples is small
Fast enough to be applied in a real-time system
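A hedged sketch of one feedback round: the images the user marks as relevant (+1) or irrelevant (-1) form the small training set, an SVM with an RBF kernel is trained on it, and candidate images are re-ranked by the SVM's decision value. To keep the example dependency-free, it trains the SVM by simple dual coordinate ascent with the bias absorbed into the kernel (K + 1), a simplification of the usual formulation with an explicit b. This solver, the function names, and the values of C and gamma are assumptions, not details taken from the thesis.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kernel(x: Vector, z: Vector, gamma: float) -> float:
    """RBF kernel plus a constant 1; the constant plays the role of the bias b."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z))) + 1.0


def train_svm(xs: List[Vector], ys: List[int], C: float = 10.0,
              gamma: float = 0.5, epochs: int = 200) -> List[float]:
    """Dual coefficients alpha of a soft-margin kernel SVM, by coordinate ascent."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Current decision value at x_i.
            g = sum(alpha[j] * ys[j] * K[i][j] for j in range(n))
            # Exact maximization of the dual along coordinate i, clipped to [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - ys[i] * g) / K[i][i]))
    return alpha


def rerank(candidates: List[Vector], xs: List[Vector], ys: List[int],
           alpha: List[float], gamma: float = 0.5) -> List[int]:
    """Candidate indices sorted from most to least likely relevant."""
    def score(x: Vector) -> float:
        return sum(a * y * kernel(xi, x, gamma) for a, y, xi in zip(alpha, ys, xs))
    return sorted(range(len(candidates)),
                  key=lambda i: score(candidates[i]), reverse=True)
```

Each further feedback round would append the newly marked images to `xs` and `ys`, retrain, and re-rank.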
21. Environment & Parameters
Environment
9,918 images of various kinds
Desktop computer: Intel Core 2 Duo 3.16 GHz, 4 GB RAM, Windows 7 Ultimate
Sun Java 1.6-u7
All components of the system are implemented in Java
Parameters
K = 7 for the K-means algorithm
Radial basis function (RBF) kernel for the support vector machine
22. Clustering Evaluation
Accuracy
Clustering does not adversely affect the accuracy
23. Clustering Evaluation (2)
Searching time
Applying clustering significantly reduces the searching time
24. RF Evaluation
Accuracy
Accuracy improves after several feedback iterations
26. Conclusion
Achievements
Successfully built a complete content-based image retrieval system
Search performance is significantly improved by applying the K-means algorithm to cluster the image database
Using a support vector machine in relevance feedback considerably increases accuracy
Shortcomings
The low-level feature-based search method relies on methods developed by other authors
Future work
Develop a low-level feature-based search method suited to each kind of image domain