Btv thesis defense_v1.02-final

Improvement of Content-Based Image Retrieval
by Using Clustering and Relevance Feedback
Master Thesis Defense
Bui The Vinh
May 13, 2010

Contents
- Introduction
- Image Features and Similarity
- Clustering Algorithm
- Relevance Feedback
- Implementation and Evaluation
- Conclusions and Future Work

Introduction
- Key questions
  - How should an image be represented?
  - How can we determine whether two images are similar?
- Framework

Introduction (2)
- Practical applications
  - Medical diagnosis
  - Crime prevention
  - Online shopping
  - etc.
- Challenges
  - Real-time response
  - High accuracy
- Contributions
  - Build a complete CBIR system
  - Reduce search time by clustering the image database
  - Increase accuracy by applying a support vector machine (SVM) in relevance feedback


Feature Extraction Model
[Figure: feature extraction model with feature vectors F1, F2, F3]
- Basic image features: color, shape, texture

Image Representation
- CEDD: Color and Edge Directivity Descriptor (proposed by Chatzichristofis and Boutalis)
  - Incorporates color and texture information in a single histogram
  - Each image is represented by a high-dimensional real vector (144 bins in CEDD)
Example feature vector (144 values, shown as 8 rows of 18):
0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
- The vector representing an image depends on the feature extraction method used

Similarity Measurement
- Approach: compute the distance between the feature vectors of the two images
- Measure: Tanimoto distance
[Figure: distances between feature vectors F1, F2, F3]
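The slide names the Tanimoto distance without spelling it out. As a minimal sketch, assuming the standard Tanimoto coefficient for real-valued vectors, T(x, y) = x·y / (x·x + y·y − x·y), and taking distance = 1 − T(x, y), the measure can be computed in Java (the system's implementation language) as below. The class name `Tanimoto` is hypothetical, and treating two all-zero vectors as identical is an assumption.

```java
/** Tanimoto similarity and distance between two feature vectors (hypothetical helper). */
final class Tanimoto {
    private Tanimoto() {}

    /** T(x, y) = x·y / (x·x + y·y - x·y); x and y must have the same length. */
    static double similarity(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double xy = 0, xx = 0, yy = 0;
        for (int i = 0; i < x.length; i++) {
            xy += x[i] * y[i];
            xx += x[i] * x[i];
            yy += y[i] * y[i];
        }
        // xx + yy - xy >= (xx + yy) / 2, so the denominator is 0 only if both vectors are all zeros.
        double denom = xx + yy - xy;
        return denom == 0 ? 1.0 : xy / denom; // assumption: two zero vectors count as identical
    }

    /** Distance = 1 - T(x, y): 0 for identical vectors, larger for dissimilar ones. */
    static double distance(double[] x, double[] y) {
        return 1.0 - similarity(x, y);
    }

    public static void main(String[] args) {
        double[] a = {0, 7, 2, 1};
        double[] b = {0, 7, 2, 1};
        double[] c = {1, 0, 0, 3};
        System.out.println(distance(a, b)); // 0.0
        System.out.println(distance(a, c)); // 1 - 3/61, about 0.95
    }
}
```

Ranking a database then amounts to sorting images by `distance(query, image)` in ascending order.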


Overview of Clustering
- Motivation
  - The volume of image data is very large, so comparing a query against every image is slow
- Goal: find groups of objects such that
  - objects in the same group are similar to one another
  - objects in different groups are dissimilar

K-means Clustering
- Definition
  - K-means is a partitional clustering algorithm that divides a dataset into k clusters by iterative relocation.
- Objective
  - Given a set of observations (x1, x2, …, xn), partition them into k sets (k < n), X = {X1, X2, …, Xk}, so as to locally minimize the sum of squared distances between the data points and their corresponding cluster centers.
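In the slide's notation, the standard k-means objective reads as follows (μ_i, the mean of the points in X_i, is notation added here):

```latex
\operatorname*{arg\,min}_{X} \sum_{i=1}^{k} \sum_{x \in X_i} \lVert x - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|X_i|} \sum_{x \in X_i} x
```

Neither the assignment step nor the re-estimation step can increase this sum, so the iteration converges, but only to a local minimum that depends on the initial centers.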

K-means Clustering (2)
- Algorithm
  - Initialize the k cluster centers randomly, then repeat until convergence (no point changes cluster):
    - Cluster assignment step: assign each data point xi to the cluster Xj whose center is nearest to xi
    - Center re-estimation step: recompute each cluster center as the mean of the points assigned to it
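The two steps can be sketched in Java as below. This is a minimal illustration, not the thesis implementation: the class and method names are hypothetical, the centers are initialized by sampling k distinct data points with a seeded `java.util.Random`, and squared Euclidean distance is used to match the objective (retrieval itself uses the Tanimoto distance).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Minimal k-means (Lloyd's algorithm) sketch; class and method names are hypothetical. */
final class KMeans {
    final double[][] centers; // one center per cluster
    final int[] assignment;   // cluster index of each data point

    private KMeans(double[][] centers, int[] assignment) {
        this.centers = centers;
        this.assignment = assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    static KMeans fit(double[][] data, int k, int maxIterations, long seed) {
        int n = data.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("need 1 <= k <= number of points");
        }
        int dim = data[0].length;

        // Initialization: k distinct data points, chosen at random, become the first centers.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) indices.add(i);
        Collections.shuffle(indices, new Random(seed));
        double[][] centers = new double[k][];
        for (int j = 0; j < k; j++) centers[j] = data[indices.get(j)].clone();

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Cluster assignment step: move each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (squaredDistance(data[i], centers[j]) < squaredDistance(data[i], centers[best])) {
                        best = j;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point changed cluster

            // Center re-estimation step: each center becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) sums[assignment[i]][d] += data[i][d];
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue; // an empty cluster keeps its previous center
                for (int d = 0; d < dim; d++) centers[j][d] = sums[j][d] / counts[j];
            }
        }
        return new KMeans(centers, assignment);
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        KMeans model = fit(data, 2, 100, 42L);
        System.out.println(Arrays.toString(model.assignment)); // two groups; label order depends on the seed
    }
}
```

For the experiments described later, `fit` would be called once offline on the feature-vector database with k = 7.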


Relevance Feedback
- Motivation
  - Searching on low-level image features alone is limited: similar features do not guarantee images the user considers relevant
- Mechanism
  - After the initial results are presented, the user marks one or more retrieved images as relevant or not relevant
  - The system uses this feedback to reformulate the query
  - It then produces new results from the reformulated query
- Challenges
  - Processing must be real-time
  - The training set (the user's feedback) is small

RF Architecture
[Figure: RF architecture. Query image → CBIR system (images database) → ranked images (1. Img1, 2. Img2, 3. Img3, …) → user feedback on rankings → query reformulation → revised query → re-ranked images (1. Img2, 2. Img4, 3. Img5, …)]

Support Vector Machine
- Classification method
  - Given a set of training examples, each labeled as belonging to one of two categories, an SVM training algorithm builds a model that predicts which category a new example falls into.
- Linear case
  - Training data
  - A separating hyperplane
  - Optimal separating hyperplane (OSH)
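For reference, the standard hard-margin formulation of the linear case, with notation assumed here (l training pairs, weight vector w, bias b):

```latex
\begin{aligned}
&\text{Training data: } \{(x_i, y_i)\}_{i=1}^{l}, \quad x_i \in \mathbb{R}^d,\; y_i \in \{-1, +1\}\\
&\text{Separating hyperplane: } w \cdot x + b = 0\\
&\text{OSH: } \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1, \dots, l
\end{aligned}
```

Minimizing ‖w‖ maximizes the margin 2/‖w‖ between the two classes, and a new example x is classified by f(x) = sign(w · x + b).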

Support Vector Machine (2)
- Linear case (cont.)
  - The classification function
- Non-linear case
  - The classification function
  - Kernels
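In the non-linear case the standard classification function is f(x) = sign(Σ_i α_i y_i K(x_i, x) + b), and the RBF kernel chosen for the experiments is commonly defined as K(x, z) = exp(−γ‖x − z‖²). The sketch below only evaluates an already-trained model: the coefficients α_i y_i, the bias b and γ are assumed to come from an SVM training library, the class name `RbfSvm` is hypothetical, and mapping f(x) = 0 to +1 is an arbitrary convention.

```java
/** Decision function of an already-trained kernel SVM with an RBF kernel (hypothetical helper). */
final class RbfSvm {
    private final double[][] supportVectors; // x_i
    private final double[] alphaY;           // alpha_i * y_i for each support vector
    private final double bias;               // b
    private final double gamma;              // RBF width parameter (assumed to be given)

    RbfSvm(double[][] supportVectors, double[] alphaY, double bias, double gamma) {
        if (supportVectors.length != alphaY.length) {
            throw new IllegalArgumentException("one coefficient per support vector");
        }
        this.supportVectors = supportVectors;
        this.alphaY = alphaY;
        this.bias = bias;
        this.gamma = gamma;
    }

    /** RBF kernel K(x, z) = exp(-gamma * ||x - z||^2). */
    static double rbf(double[] x, double[] z, double gamma) {
        double s = 0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - z[i];
            s += d * d;
        }
        return Math.exp(-gamma * s);
    }

    /** Decision value sum_i alpha_i y_i K(x_i, x) + b. */
    double decisionValue(double[] x) {
        double sum = bias;
        for (int i = 0; i < supportVectors.length; i++) {
            sum += alphaY[i] * rbf(supportVectors[i], x, gamma);
        }
        return sum;
    }

    /** Predicted class: +1 if the decision value is >= 0, otherwise -1. */
    int predict(double[] x) {
        return decisionValue(x) >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Toy model: one "relevant" support vector at the origin, one "irrelevant" at (4, 4).
        RbfSvm svm = new RbfSvm(new double[][] {{0, 0}, {4, 4}}, new double[] {1.0, -1.0}, 0.0, 0.5);
        System.out.println(svm.predict(new double[] {0.5, 0.5})); // 1
        System.out.println(svm.predict(new double[] {3.5, 4.0})); // -1
    }
}
```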


Clustering Implementation
- Clustering
  - Take the database of feature vectors as input
  - Apply the K-means algorithm to cluster the database
- Finding
  - Find the cluster that best matches the query image
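A minimal sketch of the finding step, under two assumptions: the query is sent to the cluster whose center is nearest in squared Euclidean distance (the distance k-means itself minimizes), and only that cluster's images are then ranked by Tanimoto distance. `ClusteredSearch` and its methods are hypothetical names; the Tanimoto computation is repeated inline so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Searches only the cluster nearest to the query (hypothetical helper). */
final class ClusteredSearch {
    private final double[][] features;         // one feature vector per image
    private final double[][] centers;          // cluster centers, e.g. produced by k-means
    private final List<List<Integer>> members; // image indices belonging to each cluster

    ClusteredSearch(double[][] features, double[][] centers, int[] assignment) {
        this.features = features;
        this.centers = centers;
        this.members = new ArrayList<>();
        for (int j = 0; j < centers.length; j++) members.add(new ArrayList<>());
        for (int i = 0; i < assignment.length; i++) members.get(assignment[i]).add(i);
    }

    private static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** 1 - x·y / (x·x + y·y - x·y); two all-zero vectors get distance 0. */
    private static double tanimotoDistance(double[] x, double[] y) {
        double xy = 0, xx = 0, yy = 0;
        for (int i = 0; i < x.length; i++) {
            xy += x[i] * y[i];
            xx += x[i] * x[i];
            yy += y[i] * y[i];
        }
        double denom = xx + yy - xy;
        return denom == 0 ? 0.0 : 1.0 - xy / denom;
    }

    /** Finding step: index of the cluster whose center is closest to the query. */
    int nearestCluster(double[] query) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (squaredDistance(query, centers[j]) < squaredDistance(query, centers[best])) best = j;
        }
        return best;
    }

    /** Ranks only the images in the query's cluster; returns up to topN image indices, best first. */
    List<Integer> search(double[] query, int topN) {
        List<Integer> candidates = new ArrayList<>(members.get(nearestCluster(query)));
        candidates.sort(Comparator.comparingDouble(i -> tanimotoDistance(query, features[i])));
        return candidates.subList(0, Math.min(topN, candidates.size()));
    }

    public static void main(String[] args) {
        double[][] feats = {{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}};
        ClusteredSearch cs = new ClusteredSearch(feats, new double[][] {{0.95, 0.05}, {0.05, 0.95}}, new int[] {0, 0, 1, 1});
        System.out.println(cs.search(new double[] {0, 1}, 5)); // [2, 3]
    }
}
```

The saving comes from the ranking step: with K = 7 and roughly balanced clusters, only about one seventh of the database is compared against the query, at the cost of never returning images assigned to other clusters.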

RF Implementation
- Support vector machine classifier
  - Suitable when the number of training examples is small
  - Can be used in a real-time system
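One common way to use the classifier in each feedback round, sketched here as an assumption rather than the thesis's exact procedure: train an SVM on the images the user marked relevant (+1) and irrelevant (−1), then re-rank the database by the model's decision value, highest first. Training is left to an SVM library; the scorer is passed in as a function, and `FeedbackReranker` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Re-ranks database images by a relevance score such as an SVM decision value (hypothetical helper). */
final class FeedbackReranker {
    private FeedbackReranker() {}

    /**
     * @param features one feature vector per database image
     * @param score    scoring function, e.g. the decision value of an SVM trained on this round's feedback
     * @return image indices ordered from most to least likely relevant
     */
    static List<Integer> rerank(double[][] features, ToDoubleFunction<double[]> score) {
        double[] values = new double[features.length];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            values[i] = score.applyAsDouble(features[i]); // score each image once
            order.add(i);
        }
        // Highest score first; the sort is stable, so ties keep database order.
        order.sort(Comparator.comparingDouble((Integer i) -> values[i]).reversed());
        return order;
    }

    public static void main(String[] args) {
        double[][] features = {{0.0}, {2.0}, {1.0}};
        System.out.println(rerank(features, v -> v[0])); // [1, 2, 0]
    }
}
```

Each feedback round repeats the same pattern: collect labels, retrain, re-rank, and show the new top results.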

Environment & Parameters
- Environment
  - 9,918 images of various kinds
  - Desktop computer: Intel Core 2 Duo 3.16 GHz, 4 GB RAM, Windows 7 Ultimate
  - Sun Java 1.6 Update 7
  - All components of the system are implemented in Java
- Parameters
  - K = 7 for the K-means algorithm
  - Radial basis function (RBF) kernel for the support vector machine

Clustering Evaluation
- Accuracy
  - Clustering does not reduce retrieval accuracy

Clustering Evaluation (2)
- Searching time
  - Applying clustering significantly reduces search time

RF Evaluation
- Accuracy
  - Accuracy improves over several feedback iterations


Conclusion
- Achievements
  - Built a complete content-based image retrieval system
  - Search performance is significantly improved by applying the K-means algorithm to cluster the image database
  - Using a support vector machine in relevance feedback substantially increases accuracy
- Shortcomings
  - The low-level feature-based search method relies on a method developed by other authors
- Future work
  - Develop a low-level feature-based search method suited to each image domain
