CONTENT BASED IMAGE RETRIEVAL
A Project Report
Submitted in partial fulfillment of the requirements for the degree of
Bachelor of Technology
by
Ramashish Baranwal and Ripinder Singh
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY,
GUWAHATI
April 2003
Certificate
This is to certify that the work contained in the thesis titled “Content Based Image Retrieval”
by Ramashish Baranwal and Ripinder Singh has been carried out under my supervision and that
this work has not been submitted elsewhere for a degree.
Dr. P.K. Bora
Associate Professor
ECE Department
Acknowledgement
It gives us great pleasure to express our most sincere feelings of gratitude to our supervisor
Dr. P.K. Bora for his invaluable help and guidance in the course of this project. The valuable
research experience gained by us during this project would not have been possible without his
encouragement and support. We would also like to thank University of California, Berkeley for
granting us the permission to use their images.
Abstract
Retrieving images from a large and varied collection on the basis of visual content is a challenging
and important field of research. In this report we present Imagefinder, a content-based image
retrieval system that incorporates various features of an efficient retrieval system. The system
employs a new information theoretic approach to segmentation through clustering in the feature
space. The problem of segmentation is treated as a problem of maximizing the information about
the image through segmentation. The gain in information is measured on the basis of an evaluation
function derived from information theory. The image is segmented into a small set of image regions
that are coherent in colour and intensity. Each of these homogeneous regions is characterized by a
feature vector comprising its colour and shape attributes. The database is organized using the C-tree,
a variation of the kd-tree which supports efficient retrieval of k-nearest neighbours. This is achieved
by storing the information about siblings of the kd-tree at each node.
An important aspect of the system is that the user is allowed to select a region of interest for
the query. The results are presented in increasing order of the distance between the query features
and those stored in the database, based on the nearest-neighbour criterion. Our system also incorporates a method for
relevance feedback aimed at improving the results for the query. Based on the relevance marked
by the user, the system automatically re-assigns the weights of the feature components to produce
more appropriate results.
Contents
1 Introduction 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 What is Content Based Image Retrieval? . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Literature Survey 4
2.1 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1 Colour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Texture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.1 Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.2 Multi-Dimensionality Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Image Segmentation 8
3.1 Region-Based Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1.1 Region Growing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1.2 Region Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1.3 Region Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Information Theoretic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Information Gain by Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Using Classification Gain for Segmentation . . . . . . . . . . . . . . . . . . . . . . . 13
3.5 Colour Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.5.1 The Segmentation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.5.2 k-means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5.3 Post-processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Feature Extraction and Database Organization 18
4.1 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Shape Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2.1 Invariant Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2.2 Eccentricity and Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.3 Colour Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.4 Database Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.5 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5.1 Binary Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5.2 Multidimensional Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5.3 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.6 C-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.6.1 Building C-tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.6.2 Nearest Neighbour Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.7 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.8 Relevance Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5 Experimental Results 29
5.1 Image Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2 Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2.1 Region-based Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2.2 Information Theoretic Approach . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.2.3 Region vs. Information based approaches . . . . . . . . . . . . . . . . . . . . 30
5.3 Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.4 Relevance Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6 Conclusion 44
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.2 Suggestions for Further Development . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Bibliography 46
List of Figures
1.1 Overview of an Image Retrieval System . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 A typical Content-Based Image Retrieval system. . . . . . . . . . . . . . . . . . . . 5
3.1 The 8-connected neighbourhood of the pixel to be assigned a label . . . . . . . . . . 9
3.2 Region growing algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Region merging algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Region pruning algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5 The image of an elephant and its representation in Luv space . . . . . . . . . . . . . 14
3.6 Information based segmentation algorithm . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1 Nearest point to the query point need not lie in the same cell in which the query
point lies. Point 1 represents the query point whose nearest neighbour point 3 lies
in a different cell. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Algorithm for making connections in a C-tree . . . . . . . . . . . . . . . . . . . . . . 24
4.3 Two nodes (a)separated, (b)intersecting, and (c)sharing a region of finite length in
the horizontal direction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1 Segmentation by region based approach, (a)Original Image, (b)After region growing,
(c)After region merging, (d)Regions after removing small regions, (e)Boundaries of
regions superimposed on original image. . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.2 Segmentation by region based approach, (a)Original Image, (b)After region growing,
(c)After region merging, (d)Regions after removing small regions, (e)Boundaries of
regions superimposed on original image. . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.3 Segmentation results on some randomly selected animal and bird images from the
database. The segmented regions are shown as white boundaries superimposed on
the original image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.4 Segmentation results on some randomly selected images of natural and outdoor
scenes from the database. The segmented regions are shown as white boundaries
superimposed on the original image. . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.5 A query session in Imagefinder. (a)The user is asked to select his object of interest
by pressing the mouse button in the region of object. (b)After the user has selected
his object of interest, top 16 results are returned. . . . . . . . . . . . . . . . . . . . . 33
5.6 Top 16 result for the query of a crane, all 16 out of 16 results are relevant. . . . . . . 35
5.7 Top 16 result for the query of sunflower. 9 out of 16 results are relevant. The
sunflower images form very small, about 0.2% part of the database. . . . . . . . . . . 36
5.8 Top 16 result for the query of fox. 8 out of 16 returned images are relevant. The fox
images form about 1% of the database. . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.9 Top 16 result for the query of sky. 14 out of 16 images have sky as a part of the
image and are therefore relevant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.10 Top 16 result for the query of an eagle, 8 out of 16 images are relevant. The eagle
images form about 1% of the database. . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.11 Top 16 result for the query of an water, 6 out of 16 images are relevant. The water
images form about 1% of the database. . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.12 A typical feedback session in Imagefinder. The user has marked relevant images by
pressing the relevant radio buttons against them. The query based on the feedback
can be performed by pressing Query Again button. . . . . . . . . . . . . . . . . . . . 41
5.13 Top 16 results for the query of sunflower after one feedback iteration. Precision(without
feedback)=56.25%, precision(after one iteration)=68.75%. . . . . . . . . . . . . . . . 42
5.14 Top 16 results for the query of water after one feedback iteration. Precision(without
feedback)= 37.5%, precision(after one iteration)=50%. . . . . . . . . . . . . . . . . . 43
List of Tables
5.1 Precision values for various categories of images present in the database. . . . . . . . 34
5.2 Precision values for various categories of images after one feedback iteration. . . . . 41
Chapter 1
Introduction
1.1 Introduction
With the pervasive use of computers both at work and at home, a large amount of multimedia
information is being generated, with a wide variety of potential applications such as interactive
entertainment, video on demand, video rental services, news distribution, multimedia libraries, etc.
Powerful microcomputers, high-speed networking, high-capacity storage media, improvements
in compression algorithms and recent advances in the fields of audio, video and imaging have made
multimedia systems more viable, both technically and economically. A typical multimedia database,
which is composed of various multimedia objects archived together, must address the issues of
representation, indexing, retrieval and manipulation of the multimedia data.
It is therefore important to develop a retrieval system that can retrieve information effectively
and efficiently from these large databases. Images and video form an integral part of any multimedia
database. A number of applications require images to be retrieved automatically, such
as robotics, remote sensing, fingerprint recognition, automatic surveillance systems and medical
imaging. Retrieval of image data based on pictorial queries is an interesting and
challenging problem which has developed into a major field of research due to the emergence of large
image databases and digital libraries. These databases typically consist of thousands of images,
making it difficult for users to browse through the entire set. Various applications of digital
libraries and image databases have been described in the literature.
1.2 What is Content Based Image Retrieval?
Early database systems employed textual features such as filenames, captions and keywords to
annotate and retrieve images. This method has several limitations. First and most important of
all, the content of an image cannot always be described in words. The perception of an image
is highly subjective, and the same image may mean different things to different people. In addition,
to fully understand the content of an image, the spatial relationships between the various
objects in the image should be expressed. If such a linguistic representation is applied to a
database that has to be employed globally (e.g. on the world wide web), it will be severely limited by
the linguistic barrier. Apart from this, this method requires a human to annotate each and every
Figure 1.1: Overview of an Image Retrieval System
image in the database, which is simply not possible for large databases. It is therefore desirable
to have a retrieval mechanism that depends on the visual content of the image. The extraction
of the visual features used to measure similarity should be automatic for the system to be
scalable to large databases. Human perception of visual content is based on colour,
shape, texture, layout, position, etc. Features based on these attributes can therefore be employed
to build an effective retrieval system. Content Based Image Retrieval (CBIR) is a relatively new
area of research which deals with the retrieval of images that are similar in visual content to some
query image. Many image retrieval systems, both commercial and research, have employed these
criteria with varying degrees of accuracy. Some of them are QBIC [1, 2] from IBM, Virage [3] from
Virage Inc, VisualSEEk [4] from Columbia Univ., PhotoBook [5] from MIT Media Labs, Netra [6]
from UCSB and MARS [7] from Univ. of Illinois.
A traditional image retrieval system first preprocesses an input image to extract features
such as colour, texture and shape. One of the most important parts of this pre-processing is image
segmentation, as the success of the entire system depends heavily on the accuracy achieved during
this stage. Better segmentation leads to better representation and therefore better results. Features
are then extracted from the segmented objects; they provide a compact representation of the
information contained in each object that can be compared with that of other objects to evaluate
similarity. These features are stored along with the images in the database. When a query
image is presented, it is similarly preprocessed to extract its features, which are then matched with
the feature vectors present in the database. A ranked set of images with high matching scores is
presented at the output. The outline of a general CBIR system is shown in figure 1.1.
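The flow just described can be sketched in miniature. The feature extractor and distance function below are illustrative placeholders (toy channel means and Euclidean distance), not the segmentation-based descriptors developed in the later chapters.

```python
# Minimal sketch of the CBIR pipeline: offline indexing, then online query.
# extract_features() is a stand-in; the real system would segment the image
# and compute colour/shape descriptors per region.

def extract_features(image):
    """Toy feature: mean of each colour channel over a list of RGB pixels."""
    n = len(image)
    return tuple(sum(px[c] for px in image) / n for c in range(3))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build_index(images):
    """Offline stage: store a feature vector for every database image."""
    return [(name, extract_features(img)) for name, img in images]

def query(index, query_image, top_k=3):
    """Online stage: rank database images by distance to the query features."""
    q = extract_features(query_image)
    ranked = sorted(index, key=lambda entry: euclidean(entry[1], q))
    return [name for name, _ in ranked[:top_k]]

# Tiny demo "database" of images given as lists of RGB pixels.
db = [
    ("red", [(250, 10, 10), (240, 20, 15)]),
    ("green", [(10, 250, 10), (20, 240, 20)]),
    ("blue", [(10, 10, 250), (15, 25, 240)]),
]
index = build_index(db)
print(query(index, [(245, 15, 12)], top_k=1))  # → ['red']
```

Replacing `extract_features` with richer region descriptors changes nothing in the surrounding pipeline, which is why the feature stage can be developed independently.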
1.3 Problem Definition
Efficient access to digital images has recently become an important issue. In this thesis, we
address the problem of building an efficient and accurate retrieval system that retrieves images
similar in content to a given query image. The retrieval should be largely insensitive to variations
in image scale, rotation and translation. In other words, even if the database contains images
that are similar in content to the query image but differ in orientation, position or size, the
retrieval system should still correctly match the query image with its prototype in
the database. We aim to develop a system with both the accuracy and the speed required for
online applications.
1.4 Motivation
The motivation for content-based image retrieval arises from its applications in numerous fields,
from areas as diverse as assembly-line inspection and robotic navigation to medical image recognition
for the detection of various deformities. The ease with which humans turn the visual content of an
image into meaningful information is not yet well understood. Existing image retrieval systems
employ colour, shape and texture attributes with varying degrees of success. The present
scenario leaves ample scope for the development of newer and more efficient algorithms for retrieval
systems.
1.5 Thesis Outline
The outline of the thesis is as follows. Chapter 2 briefly reviews the relevant literature on content-
based image retrieval methods. We propose the information theoretic approach to segmentation in
Chapter 3. Chapter 4 focuses on feature extraction and organization of data for efficient retrieval.
Experimental results on an image database are presented in Chapter 5. Chapter 6 presents the
conclusions and some ideas for future research.
Chapter 2
Literature Survey
Content-Based Image Retrieval involves three fundamental components: image segmentation, visual
feature extraction and multi-dimensional indexing. Much of the past work has concentrated
on identifying appropriate models for image features such as colour or texture. For each database
image, a feature vector describing its visual features is computed and stored. Given a query
image, its feature vector is evaluated, and the images whose feature vectors are nearest to
that of the query image are retrieved. A general CBIR system is shown in figure 2.1. In the
following sections we briefly review the literature for these three processes in a CBIR system,
highlighting the major contributions, techniques and methods available.
2.1 Segmentation
Segmentation is an important process in Content-Based Image Retrieval, as both the shape features
and the layout features depend on a good segmentation. Segmentation can be defined as the process
of extracting from an image the objects that are then used for feature extraction. Image
segmentation plays an important part in scene analysis and image understanding. Many techniques
have been proposed in the literature; they can be broadly classified into two main categories: region-
based and edge-based. Region-based approaches examine the neighbourhood of a pixel
for similarity. Edge-based approaches try to detect the dominant regions from a gradient
image through edge following and linking. Segmentation techniques include region
growing, watershed analysis, region clustering, etc.
One of the earliest methods of image segmentation for gray-scale images was proposed by
Pavlidis [8]. In this approach, segments are obtained by region growing, and then
edges between the regions are eliminated or modified based on contrast, gradient and the shape of
the boundary. In [9], Hansen and Higgins employed a fast algorithm for watershed analysis along
with relaxation labeling: the image is subdivided into catchment basins, and relaxation labeling
is then used to refine and update the classification. In [10], Belongie et al. proposed an automatic
segmentation algorithm based on clustering in a joint spatial colour-texture space by Expectation-
Maximization, which iteratively models the joint distribution of colour and texture with a mixture
of Gaussians. The user can directly access the regions and specify which aspects of the image are
important to the query.
Figure 2.1: A typical Content-Based Image Retrieval system.
2.2 Feature Extraction
Traditional image retrieval systems use a single visual attribute such as shape, colour or texture
to represent the image, and retrieval is based on the features used to represent that attribute.
Although simple, this approach may lack sufficient discriminatory information
and may be unable to accommodate large changes in scale and orientation. For example, a colour-
based scheme may not be able to distinguish between an apple and a red house. The following
sections discuss the traditional approaches using a single attribute. A review of newer work
integrating these attributes follows later.
2.2.1 Colour
Colour is one of the most widely used visual features in image retrieval. Not only is it more robust to
background complications, it is also independent of image size and orientation. Since colour is
readily available in digital image libraries in the form of RGB, HSI, etc., it has been used extensively
as a feature in CBIR systems. Some studies of colour perception and colour spaces can be
found in [11, 12].
The colour histogram is the most commonly used feature representation. Statistically, it represents
the joint probability of the intensities of the three colour channels. Colour histograms are generally
invariant to translation and rotation, and normalization makes them insensitive to scaling. However,
a colour histogram fails to incorporate the spatial connectivity of the pixels, which can lead to
incorrect retrieval. Besides colour histograms, many approaches such as Colour Moments [13] and
Colour Sets [14] can be found in the literature. Many research results have extended global colour
features to local ones by dividing the image into a number of sub-blocks and extracting colour
features from each of them. Some of these can be found in [15].
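As an illustration, a normalized joint colour histogram can be computed as follows; the choice of 4 bins per channel is an assumption made here for brevity.

```python
# Sketch of a normalized joint colour histogram: 4 bins per channel gives
# a 4x4x4 = 64-bin descriptor. Dividing by the pixel count normalizes the
# histogram, which makes it insensitive to image scale, as noted above.

def colour_histogram(pixels, bins=4):
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index.
        ri, gi, bi = (min(v * bins // 256, bins - 1) for v in (r, g, b))
        hist[(ri * bins + gi) * bins + bi] += 1
    n = len(pixels)
    return [h / n for h in hist]

pixels = [(255, 0, 0), (250, 5, 0), (0, 0, 255), (0, 10, 250)]
h = colour_histogram(pixels)
print(sum(h))  # → 1.0, i.e. the histogram is normalized
```

Because the histogram ignores pixel positions entirely, permuting the pixels of an image leaves it unchanged, which is exactly the loss of spatial connectivity noted above.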
2.2.2 Shape
Although colour seems to be a highly reliable attribute for image retrieval, it cannot provide all the
discrimination demanded by a CBIR system. Incorporating shape features can greatly
enhance the selectivity and improve the performance. Shape is also an important attribute when
binary or gray-scale images have to be dealt with. Shape representations can be broadly divided
into two categories, boundary-based and region-based. Boundary-based techniques employ
information about the object's outer boundary as a feature, while region-based approaches
use the entire region information to form the feature vectors. Boundary-based methods include
polygonal approximation of shape [16] and shape matching using Fourier descriptors [17, 18].
Region-based methods include object matching using invariant moments [19].
The main idea of Fourier descriptors is to use the Fourier-transformed boundary as the shape feature,
which helps to control digitization noise in the image boundary. An improved Fourier descriptor
algorithm that is invariant to noise and geometric transformations is proposed in [20]. The main
idea of invariant moments is to use region-based moments, which are invariant to transformations,
as shape features. In [19], Hu identified seven such moments that are invariant to translation, scaling
and rotation. Many improvements have been suggested to incorporate the effects of digitization
on these moments, such as [21].
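To sketch the idea, the first two of Hu's seven invariants can be computed from normalized central moments; the representation of a binary region as a list of pixel coordinates is our simplification, not a prescription from [19].

```python
# First two Hu invariant moments for a binary region given as a list of
# (x, y) pixel coordinates, via normalized central moments eta_pq.

def central_moment(region, p, q):
    n = len(region)
    xbar = sum(x for x, _ in region) / n
    ybar = sum(y for _, y in region) / n
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in region)

def hu_moments(region):
    mu00 = central_moment(region, 0, 0)  # equals the region area
    def eta(p, q):
        # Normalization by mu00^(1 + (p+q)/2) gives scale invariance.
        return central_moment(region, p, q) / mu00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

square = [(x, y) for x in range(10) for y in range(10)]
big_square = [(x, y) for x in range(20) for y in range(20)]
p1, _ = hu_moments(square)
P1, _ = hu_moments(big_square)
print(abs(p1 - P1) < 0.01)  # → True: scale invariance, up to digitization
```

The small residual difference between the two squares is the digitization effect mentioned above, which refinements such as [21] set out to correct.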
2.2.3 Texture
Texture refers to visual patterns with properties of homogeneity that do not result
from the presence of a single colour or intensity alone. It is an innate property of all surfaces and
provides important information about the structural arrangement of surfaces and their relationship
to the surrounding environment. In the absence of colour and shape, texture can act as a vital
attribute for image classification. Texture analysis is an important and useful area of study in
computer vision and can be divided into two main classes: statistical and structural. Statistical
methods define texture in terms of the spatial distribution of grey values. These include
co-occurrence matrices [22] and autocorrelation methods, which extract the periodicity of repetitive
textural elements. Tamura et al. [23] explored texture representation on the basis of psychological
studies and developed six texture properties: coarseness, contrast, directionality, regularity, line-
likeness and roughness. These are based on computational approximations to the visual textural
features. Structural or geometric methods describe texture in terms of texels, which repeat
according to placement rules that may be deterministic or random. A texel contains several pixels,
and its placement can be periodic, quasi-random or random.
Smith and Chang [24] used statistics (mean and variance) extracted from wavelet sub-bands
as a texture representation. In a more recent paper [25], Ma and Manjunath evaluated texture
image annotation using orthogonal and bi-orthogonal wavelet transforms and the Gabor wavelet
transform. They found that the Gabor transform performed best among the tested candidates,
which matched the results of human vision studies.
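A toy version of sub-band statistics as a texture feature might look as follows; the single-level Haar filters used here are an illustrative assumption, not the exact transforms of [24, 25].

```python
# Single-level Haar decomposition of a 2-D image; the mean and variance
# of each of the four sub-bands form a small texture descriptor, in the
# spirit of the wavelet statistics of Smith and Chang.

def haar_subbands(img):
    # img: 2-D list with even height and width.
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll.append((a + b + c + d) / 4)  # local average
            lh.append((a + b - c - d) / 4)  # vertical variation
            hl.append((a - b + c - d) / 4)  # horizontal variation
            hh.append((a - b - c + d) / 4)  # diagonal variation
    return ll, lh, hl, hh

def stats(band):
    m = sum(band) / len(band)
    return m, sum((x - m) ** 2 for x in band) / len(band)

# Columns alternate 255/0, so all the energy lands in the hl band.
stripes = [[255 if j % 2 == 0 else 0 for j in range(8)] for _ in range(8)]
feature = [s for band in haar_subbands(stripes) for s in stats(band)]
print(feature[4], feature[6])  # → 127.5 0.0 (hl mean large, hh mean zero)
```

A texture with the opposite orientation would concentrate its energy in the `lh` band instead, which is what lets these statistics discriminate directional patterns.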
2.3 Indexing
To make content-based image retrieval truly scalable to large image collections, efficient
multi-dimensional indexing techniques need to be explored. The two main challenges for such
an indexing technique are:
1. High dimensionality.
2. Non-Euclidean similarity measures.
The basic approach to these problems is to first perform dimensionality reduction and then
apply an appropriate multi-dimensional indexing method that supports non-Euclidean similarity
measures.
2.3.1 Dimensionality Reduction
Even though the dimension of a feature vector is very high, the effective or embedded dimension is
much lower. There are two popular approaches to dimensionality reduction: the Karhunen-Loeve
Transform (KLT) and column-wise clustering.
KLT and its variations have been studied by many researchers in fields ranging from face
recognition to fingerprint recognition. A faster implementation of KLT proposed by
Faloutsos and Lin [26] provides dynamic updates of the indexing structure, which is indispensable
for an application in which new images keep being added to the database.
Clustering is another powerful tool for dimensionality reduction, employed in fields such as
Pattern Recognition, Speech Recognition and Information Retrieval. Normally it is used to group
similar objects; this is called row-wise clustering. The same technique can, however, be applied
column-wise to yield dimensionality reduction [27].
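The KLT idea can be sketched by projecting feature vectors onto their top principal component; the power iteration below is a simple stand-in for a full KLT, which would keep several components.

```python
# KLT-style dimensionality reduction: project feature vectors onto the
# top principal component, found here by power iteration on the
# covariance matrix (a stand-in for a full eigen-decomposition).

def top_component(vectors, iters=100):
    dim, n = len(vectors[0]), len(vectors)
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centred = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)]
           for i in range(dim)]
    w = [1.0] * dim
    for _ in range(iters):
        # Repeated multiplication by cov converges to its top eigenvector.
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return mean, w

def project(v, mean, w):
    return sum((v[i] - mean[i]) * w[i] for i in range(len(v)))

# 2-D features lying almost on a line: one dimension captures them well.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
mean, w = top_component(data)
reduced = [project(v, mean, w) for v in data]
print([round(x, 2) for x in reduced])
```

The one-dimensional values preserve the ordering of the original points along their dominant direction, which is why nearest-neighbour search in the reduced space remains meaningful.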
2.3.2 Multi-Dimensionality Indexing
Popular multi-dimensional indexing techniques include the bucketing algorithm, the k-d tree, the
priority k-d tree [28], the quad-tree, and R and R+ trees [29, 30]. Most of these approaches are
based on Euclidean similarity measures, which may not be applicable in image retrieval systems.
Two important techniques for addressing this problem are clustering and neural
networks. Various clustering algorithms supporting incremental clustering have been proposed
in the literature, such as those by Charikar [31] and Rui and Chakrabarti [32].
In [33], Zhang proposed the use of Self Organizing Map (SOM) neural networks as a tool for
constructing the tree indexing structure in image retrieval. The advantages of SOMs are their
learning ability, dynamic clustering and the potential to support arbitrary similarity measures.
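The k-d tree mentioned above can be sketched as follows; the nearest-neighbour search prunes subtrees that cannot contain a closer point. (The C-tree of chapter 4 extends this structure with sibling links; this sketch shows only the plain k-d tree.)

```python
# A k-d tree with nearest-neighbour search. The splitting dimension
# alternates with depth, and each node stores its median point.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, target, best=None):
    if node is None:
        return best
    if best is None or dist2(node["point"], target) < dist2(best, target):
        best = node["point"]
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, best)
    # Descend the far side only if the splitting plane is closer than the
    # best match so far -- the pruning that makes k-d trees fast.
    if diff ** 2 < dist2(best, target):
        best = nearest(node[far], target, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2)))  # → (8, 1)
```

The pruning test is also where the plain k-d tree falls short for retrieval: the true nearest neighbour may lie in a sibling cell, which is the limitation the C-tree's sibling links address.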
2.4 Discussion
We find that segmentation is one of the most important steps in a CBIR system: the accuracy of
the subsequent steps depends heavily on the accuracy achieved during segmentation. Colour alone
may not be a sufficient feature representation, so incorporating shape and texture should improve
the quality of retrieval. The chosen features should also be able to handle changes in orientation
and scale. For a large database, efficient indexing is one of the main aspects to be considered.
Improved performance can be achieved by integrating various features combined with faster
indexing and retrieval methods.
Chapter 3
Image Segmentation
Image segmentation is the first and the most important step in the image recognition and understanding
process. The purpose of segmentation is to divide the image into homogeneous regions.
Generally, the pixels of an object are more homogeneous with each other than with the pixels in the
other parts of the image, so a good segmentation helps to separate the objects from the rest of
the image. Since recognition and understanding are done on the basis of the objects, the
success of this step is crucial to the performance of the entire system, as errors in this step
propagate further. This is why accurate and automated image segmentation plays
such an important role in image processing systems. It is desirable for a colour image segmentation
algorithm to be insensitive to shadows, changes in lighting and surface reflection properties.
Traditionally, much of the segmentation literature is devoted to the segmentation of gray-scale
images. However, the images in our domain are general colour images. Instead of discarding the
colour information, we use it to obtain a better segmentation. In the following sections, we present
two segmentation algorithms. The first follows a region-based approach, while the second is
based on information theoretic clustering.
3.1 Region-Based Segmentation
The underlying assumption behind the region based approach is that the objects in an image
generally consist of similar and connected pixels. The problem then is to efficiently determine the
similar pixels and group them.
For an N × N image, let {F(x, y); x, y = 1, 2, ..., N} be a two-dimensional image pixel array.
For colour images, F(x,y) represents the colour at the pixel (x,y). Assuming the colour information
is represented in the form of three primary colours Red, Green and Blue, the image function can
then be written as F(x, y) = {FR(x, y), FG(x, y), FB(x, y)}. The basic procedure is to examine the
neighbourhood of a pixel and assign it the label of a neighbour that is similar to it. This is done
with a simple raster scan, i.e. scanning the image left to right, top to bottom. The idea
is to examine the neighbouring pixels that have already been assigned labels and to give the new
pixel the label of a similar neighbour. If there is no similar neighbour, a new label is assigned
and the pixel begins a new region. In some cases there may be more than one similar neighbour
with different labels; in such cases the similar labels are marked equivalent and the new pixel is
assigned the lower label. We consider the 8-connected neighbourhood, but only four of the
neighbouring pixels need to be examined, since only these pixels have been assigned labels
previously. This becomes clear from figure 3.1. The central pixel is the pixel to be assigned a
label; at this point, the four shaded neighbours have already been labeled, so only these need to
be considered.
Figure 3.1: The 8-connected neighbourhood of the pixel to be assigned a label
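A simplified sketch of this raster-scan labelling follows. A single grey value per pixel and a fixed threshold stand in for the RGB similarity criteria developed below, and label equivalences are merged with union-find, which is an implementation choice of this sketch rather than a detail taken from the thesis.

```python
# Raster-scan labelling: each pixel examines its four previously visited
# 8-connected neighbours (W, NW, N, NE), takes the lowest similar label,
# and equivalent labels are merged with union-find.

T = 10  # grey-level similarity threshold (an assumption for this sketch)

def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path halving
        a = parent[a]
    return a

def label_regions(img):
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent, next_label = {}, 1
    for y in range(h):
        for x in range(w):
            # Previously scanned neighbours in raster order: W, NW, N, NE.
            neigh = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            similar = [labels[ny][nx] for nx, ny in neigh
                       if 0 <= nx < w and 0 <= ny < h
                       and abs(img[y][x] - img[ny][nx]) < T]
            if not similar:
                parent[next_label] = next_label  # start a new region
                labels[y][x] = next_label
                next_label += 1
            else:
                labels[y][x] = min(similar)      # take the lower label
                for s in similar:                # mark labels equivalent
                    parent[find(parent, s)] = find(parent, labels[y][x])
    # Second pass: resolve every label to its equivalence-class root.
    for y in range(h):
        for x in range(w):
            labels[y][x] = find(parent, labels[y][x])
    return labels

img = [[0, 0, 200], [0, 0, 200], [200, 200, 200]]
L = label_regions(img)
print(L[0][0] != L[0][2] and L[0][2] == L[2][0])  # → True: two regions
```

Note that the L-shaped bright region first receives two labels (its two arms meet only at the bottom row); the equivalence step is what merges them into one region.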
The algorithm is divided into three parts: region growing, region merging and removal of small
regions. We describe these in the following sections.
3.1.1 Region Growing
Starting with the top-left pixel, the algorithm scans the image from left to right and top to bottom,
and for each pixel examines the four neighbouring pixels that have been assigned labels previously
and finds their similarity to the current pixel. The similarity criterion is based on the distance
between the pixels, computed separately for each colour component. For two pixels F1 = {FR1, FG1, FB1}
and F2 = {FR2, FG2, FB2} the distances are given as:
dR(F1, F2) = |FR1 − FR2|,   dG(F1, F2) = |FG1 − FG2|,   dB(F1, F2) = |FB1 − FB2| (3.1)
where di, i = {R, G, B} is the distance for the ith component.
The criterion for similarity between the two pixels is then given as:
di(F1, F2) < Ti, i ∈ {R, G, B} (3.2)
where Ti is the threshold value corresponding to ith colour component.
The threshold values determine the quality of the segmentation: a small value may lead to over-segmentation,
whereas a larger value may cause under-segmentation. Good threshold values vary from image
to image, so we take the threshold to be 30% of the standard deviation of the entire image.
This has the advantage that the threshold values change according to the contrast of the image:
lower for low-contrast and higher for high-contrast images. It also helps to cancel the effect
of locally noisy pixel values. This choice of thresholds is found to be rather conservative,
and therefore another criterion is employed along with the above:
di(µ1, F2) < κσi (3.3)
where µ is the mean value of the region to which the neighbouring pixel belongs and σ is its
standard deviation. The subscript i denotes the ith colour component, i ∈ {R, G, B}.
σ is the average distance of the pixel values from the mean value of the region, so the effect of the
above criterion is to increase the threshold gradually and thereby prevent over-segmentation.
It will not, however, merge two non-homogeneous parts, since the pixel values in that case change
rapidly. The increase in the threshold is controlled by the factor κ. By experimentation we
have found a factor of 1.10 to be most appropriate. This does produce some over-segmentation, but
this is taken care of in the later stages. So the combined criterion is
di(F1, F2) < Ti   or   di(µ1, F2) < 1.1σi (3.4)
The criterion is applied separately on each colour component and should be satisfied by all the
components. Once the distance has been computed the labels are assigned based on the number of
similar neighbours. If there is exactly one similar neighbour, assign its label to the new pixel and
modify the mean and variance of the region corresponding to that label. If there is more than one
similar neighbour, mark their labels as equivalent and assign the lowest of those labels to the new
pixel, thereby merging the corresponding regions. If no similar neighbour is found, assign a new
label to the current pixel. The values of the mean and standard deviation of the region to which
the pixel is added are modified for the addition of current pixel.
The algorithm for the region growing is shown in Figure 3.2. The output of region growing
is normally a heavily over-segmented image, so we apply a region-merging algorithm. Prior to
merging, the image is blurred by low-pass filtering with a 3×3 window, so that
small variations and fine texture are smoothed out. This helps in bringing the properties of
the similar regions closer to one another.
Algorithm 1
1. Starting with the left-most pixel, scan the image from left to right, top to bottom. Assign a
label to the first pixel.
2. For all the four neighbouring pixels evaluate distance based on equation 3.1.
3. Using equation 3.4, find the set of similar neighbours that satisfy the condition.
4. Assign the pixel the same label as that of the similar neighbour. In case of more than one
matching neighbour, mark their labels as equivalent. Update the mean and standard deviation
of the region.
5. If no similar neighbour is found, assign a new label to the pixel.
6. Repeat the entire procedure till the end of the image is reached.
Figure 3.2: Region growing algorithm
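The raster scan of Algorithm 1 can be sketched as follows. This is a simplified illustration, not the exact implementation: it uses a single fixed per-channel threshold `T`, omits the adaptive κσ criterion of eq. (3.4) and the running region statistics, and resolves label equivalences with a union-find.

```python
import numpy as np

def region_grow(img, T):
    """Single-pass raster-scan labelling (simplified sketch of Algorithm 1).

    img: H x W x 3 float array; T: per-channel similarity threshold.
    Returns an H x W integer label map."""
    H, W, _ = img.shape
    labels = np.zeros((H, W), dtype=int)
    parent = {}                                # union-find over labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        parent[max(ra, rb)] = min(ra, rb)      # keep the lower label

    next_label = 1
    for y in range(H):
        for x in range(W):
            # previously labelled 8-neighbours: W, NW, N, NE
            similar = []
            for ny, nx in ((y, x - 1), (y - 1, x - 1), (y - 1, x), (y - 1, x + 1)):
                if ny < 0 or nx < 0 or nx >= W:
                    continue
                if np.all(np.abs(img[y, x] - img[ny, nx]) < T):
                    similar.append(labels[ny, nx])
            if not similar:                    # start a new region
                labels[y, x] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:                              # take the lowest similar label
                lo = min(similar)
                labels[y, x] = lo
                for l in similar:
                    union(lo, l)               # mark equivalent labels
    # second pass: resolve label equivalences
    for y in range(H):
        for x in range(W):
            labels[y, x] = find(labels[y, x])
    return labels
```

For example, an image whose left half is black and right half is white yields exactly two labels under any threshold below the contrast step.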
3.1.2 Region Merging
Once the regions have been generated by the growing process, there is a need for merging the large
number of regions. This is done by merging similar regions based on a similarity criterion. In our
case, two regions R1 and R2 are considered to be similar if
|µ1 − µ2| < κ(σ1 + σ2) (3.5)
where, µi represents the mean value of the region i, σi is its standard deviation and κ is a scale
parameter controlling the extent to which the two regions are merged. Since most of the pixels lie
within a distance of σ from the mean value µ, κ should be greater than unity if the two regions
are to be merged. By experimentation, we have found that κ = 1.2 gives good performance.
The algorithm for region merging is shown in Figure 3.3.
The region-merging step brings down the number of regions considerably (by a factor of 2-10).
Still we are left with a large number of small regions, particularly for images having high texture
content. So we merge every region whose area is less than one percent of the image area into the
closest matching neighbouring region. This is described next.
Algorithm 2
1. Mark all the regions as undecided. While there are undecided regions do
2. Pick one of the undecided regions and mark it as decided and current region Rc.
3. Examine the neighboring regions of Rc and evaluate their similarity with this region based
on the similarity criterion in equation 3.5
4. Merge the current Rc region with all the regions which are similar and modify current region
= merged region.
5. Mark the regions that are similar as decided.
6. Repeat step 3 till no more similar regions are found.
7. Repeat step 2 till there are no more undecided regions.
Figure 3.3: Region merging algorithm
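The merge test of eq. (3.5) and the update of the merged region's statistics can be sketched as below. `merge_stats` is a hypothetical helper of our own using the standard pooled mean/variance formulas; regions are represented only by their per-channel mean, variance and pixel count.

```python
import numpy as np

def similar_regions(mu1, sigma1, mu2, sigma2, kappa=1.2):
    """Similarity test of eq. (3.5), applied per colour component."""
    return bool(np.all(np.abs(mu1 - mu2) < kappa * (sigma1 + sigma2)))

def merge_stats(mu1, var1, n1, mu2, var2, n2):
    """Pooled mean and (population) variance of the merged region."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    var = (n1 * (var1 + (mu1 - mu) ** 2) + n2 * (var2 + (mu2 - mu) ** 2)) / n
    return mu, var, n
```

Merging two regions with values {0, 0} and {2, 2}, for instance, gives the pooled mean 1 and variance 1, matching the statistics of the combined pixel set.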
3.1.3 Region Pruning
The number of very small regions, which do not contribute much to the prominent visual content,
is usually large, so we use a region pruning method. Here, regions smaller than a particular size
are simply merged into the nearest most similar region. The algorithm is shown in Figure 3.4. At
the end of the three algorithms a well-segmented image is obtained, which can now be used to
extract the feature vectors and store them in the database.
Algorithm 3
1. Mark all the regions with area less than one percent of the image area as undecided.
2. Pick up an undecided region and mark it as the current region Rc.
3. Find the closest matching neighboring region Rm and merge Rc with it and mark the merged
region as Rc. If area of Rm is less than one percent of image area mark Rm as decided.
4. Repeat step 3 while the area of Rc is less than one percent of the image area.
5. Repeat step 2 while there are undecided regions.
Figure 3.4: Region pruning algorithm
3.2 Information Theoretic Approach
Apart from the gradient-, histogram- and region-based approaches, there is one other approach to
image segmentation which has gained attention in the recent past. This approach bears similarity
to the approach used for data clustering. In a way, image segmentation can be viewed as the
problem of clustering. The clustering can be supervised, when we have the information about
the number of regions and their characteristics. Examples are domain specific applications like
industrial inspection applications, automated inspection of electronic assemblies with the objective
of determining the presence or absence of specific anomalies such as missing components or broken
circuit paths, etc. On the other hand the clustering will be unsupervised, when we don’t have
any prior information about the image content. Examples include the segmentation of general real
world images.
The information theoretic approach to clustering has a long history, particularly
in the fields of artificial intelligence and machine learning. Well known examples based
on this approach are Quinlan's [34] ID3 for constructing decision trees, Fisher's [35] COBWEB,
Gennari's [36] CLASSIT, etc. Recent works on image segmentation using the information theoretic
approach include [10, 37]. In the following sections, we present an algorithm based on this approach.
Section 3.3 discusses the basis for determining the suitable number of clusters. In section 3.4
we present its application to image segmentation, while section 3.5 presents the segmentation
algorithm.
3.3 Information Gain by Clustering
Information can be defined in several ways. For our purpose, we define information as the ability to
correctly predict the attributes of instances. This definition is intuitive, since the more information
we have about the instances, the better our prediction of their attributes will be. The attributes
in our case are the features derived from the image which represent its content, such as colour, texture, etc.
The basic idea is that partitioning the objects into certain classes leads
to an increase in our information. Membership of a class imposes certain restrictions on the
values of their attributes thereby increasing the ability to predict them. This also corresponds to
the purpose of image segmentation, as by segmenting the image we are trying to find regions which
are homogeneous. Based on the definition of information above, we define classification gain as
the increase in information by partitioning over the information that is available without any such
partitioning. Assuming that the attributes are independent of one another, the expression for the
classification gain can then be written as [35]
Gain(K) = [ Σ_{k=1}^{K} P(Ck) Σ_{i=1}^{I} Σ_j P(Ai = Vij | Ck)² − Σ_{i=1}^{I} Σ_j P(Ai = Vij)² ] / K (3.6)
where K is the number of classes and I is the number of attributes.
As shown by Gluck and Corter [38], the subexpression Σ_i Σ_j P(Ai = Vij | Ck)² is the expected
number of attribute values that can be correctly guessed for an arbitrary member of class Ck.
It assumes that one guesses a value Vij for an attribute Ai with a probability which is equal to
its probability of occurrence i.e. P(Ai = Vij|Ck) and that this guess is correct with the same
probability. The first term in the numerator of (3.6) is therefore a measure of the expected number
of correct guesses given a set of K categories, while the second term represents the expected
number of correct guesses without this knowledge. The division by K lets one compare different
size clusterings and acts as a penalty on the increase in the number of categories.
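For discrete-valued attributes, eq. (3.6) can be computed directly from counts. The sketch below is our own illustration (not from the thesis); instances are given as tuples of attribute values:

```python
from collections import Counter

def classification_gain(instances, labels):
    """Classification gain of eq. (3.6) for discrete-valued attributes.

    instances: list of attribute tuples; labels: class of each instance."""
    N = len(instances)
    K = len(set(labels))
    I = len(instances[0])

    def expected_correct(subset):
        # sum_i sum_j P(A_i = V_ij)^2 over a subset of instances
        total = 0.0
        n = len(subset)
        for i in range(I):
            counts = Counter(inst[i] for inst in subset)
            total += sum((c / n) ** 2 for c in counts.values())
        return total

    gain = 0.0
    for c in set(labels):                          # within-class term
        members = [inst for inst, l in zip(instances, labels) if l == c]
        gain += (len(members) / N) * expected_correct(members)
    gain -= expected_correct(instances)            # baseline term
    return gain / K                                # penalty on many classes
```

For a perfectly separable two-class data set of one binary attribute, the within-class term is 1, the baseline is 0.5, and the gain is (1 − 0.5)/2 = 0.25; a single class gives zero gain.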
3.4 Using Classification Gain for Segmentation
The expression for classification gain in (3.6) assumes that the attributes of the instances take
discrete values. In our case, the instances are image pixels and the attributes are the features
extracted from them for segmentation. These attributes are colour, texture features, etc., so the
values they take are in general continuous. We therefore need to generalize (3.6); in particular,
the two innermost summations Σ_j P(Ai = Vij | Ck)² and Σ_j P(Ai = Vij)² need to be generalized
to the continuous domain.
The summations then change to integrals, and we need to make some assumption about
the distribution of values. Without any prior knowledge, we assume that the values of the attributes in
each class follow a Gaussian distribution. Though the validity of such a simple distributional
assumption can certainly be questioned, the experimental results suggest that it is
approximately correct for general real world images. For the first summation, the distribution is
that of a particular class, while the second summation uses the distribution over the whole image,
which can be viewed as a single class. In either case, the integral becomes
Σ_j P(Ai = Vij)² → ∫_{−∞}^{∞} (1 / (2πσi²)) exp(−((x − µi)/σi)²) dx = (1 / (2√π)) (1/σi) (3.7)
where µ is the mean and σ is the standard deviation.
Since the expression for the gain is used only for comparison, the factor 1/(2√π) can be
discarded. Our expression for the gain thus simplifies to
Figure 3.5: The image of an elephant and its representation in Luv space
Gain(K) = [ Σ_{k=1}^{K} P(Ck) Σ_{i=1}^{I} (1/σik) − Σ_{i=1}^{I} (1/σi) ] / K (3.8)
where I is the number of features, K is the number of classes, σik is the standard deviation
of a given feature in a given class and σi is the standard deviation of that feature over the
entire image. From (3.8) it is evident that maximizing the gain requires maximizing 1/σik,
or minimizing σik, which is equivalent to maximizing intra-class similarity.
The use of (3.8) however introduces a problem. When σ = 0, the value of 1/σ becomes infinite.
To resolve this, we use the notion of acuity as suggested by Gennari [36], a system parameter that
specifies a minimum value for σ. Specifying a minimum value for σ is motivated by the fact that
our perception ability does not have infinite resolution. The limit on σ corresponds to the notion
of a “just noticeable difference” in psychophysics - the lower limit on our perception ability.
3.5 Colour Image Segmentation
3.5.1 The Segmentation Algorithm
We do the segmentation on the basis of colour features. The Luv space is used for its perceptual
uniformity. It also decouples the luminance and colour components, which is important for our
assumption of their independence. The system starts by extracting the feature vector at each pixel.
For an image with N pixels, we get N data points which give its representation in the feature
space. Figure 3.5 shows such a representation. These data points are then clustered by the
k-means algorithm. The number of clusters K into which the data is clustered is varied and the
value of gain resulting from the classification is then calculated for each K by (3.8). We vary K
from 2 to 10. The value of K for which the maximum gain is obtained is taken as optimum. This
gives the partitioning or segmentation of the image in the feature space.
To calculate the classification gain by (3.8), we have yet to specify σmin, the minimum value of
σ. As discussed before, this limit on σ is an indication of the lowest limit of our perception ability.
This provides the clue for determining σmin. Since we are using the colour features, it is possible
to experimentally determine the minimum difference in the values of say luminance that is “just
noticeable”. Let this value difference be denoted as σabs min. Since the Luv space is perceptually
uniform, the value of σabs min for the three components can be taken as same. However, the use of
σabs min as σmin leads to over-segmentation: though the colour within an object is generally uniform,
the variation is often perceptible. Normally, there are only a few dominant colours in an image, so we
set the value of σmin to be a fraction of the σ over the entire image. The value of the fraction
used by us is 0.1. For images having low contrast, this value sometimes goes below σabs min. So an
appropriate estimate for σmin is
σmin = max(σabs min, 0.1 × σ) (3.9)
The classification of the feature vectors gives the segmentation in the feature space. However,
the pixels belonging to the regions in the image must also be spatially connected for the region
to provide a meaningful representation of objects. Therefore, to extract the regions we group the
pixels which are spatially connected and belong to the same cluster in the feature space. This is
done by labelling the pixels in the image by the cluster to which they belong and grouping the
spatially connected pixels having the same label. An algorithmic description of the entire process
is given in Figure 3.6.
Algorithm 4
1. Extract feature vectors from the image.
2. Initialize MaxGain=0, OptimumClusterNumber=-1.
3. For K=2 to 10
Cluster by k-means algorithm.
Calculate gain.
if gain > MaxGain
MaxGain = gain.
OptimumClusterNumber = K.
end
end.
4. Corresponding to the OptimumClusterNumber find cluster centers.
5. Label each pixel by the cluster to which it belongs.
6. Group spatially connected pixels belonging to the same cluster.
Figure 3.6: Information based segmentation algorithm
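The K-selection loop of Figure 3.6 can be sketched in code. This is a minimal sketch under stated assumptions: a naive k-means with farthest-point initialisation stands in for the fast algorithm of [39], the names (`segment_by_gain`, `frac`, `sigma_abs_min`) are our own, and the gain is eq. (3.8) with the acuity clipping of eq. (3.9) applied before inversion.

```python
import numpy as np

def kmeans(X, K, iters=50, rng=0):
    """Naive Lloyd iteration with farthest-point initialisation
    (a simple stand-in for the fast kd-tree k-means of [39])."""
    rng = np.random.default_rng(rng)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < K:                      # pick farthest point as next seed
        d = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    return centers, labels

def segment_by_gain(X, k_max=10, frac=0.1, sigma_abs_min=1.0):
    """Pick K in 2..k_max maximising the gain of eq. (3.8).

    X: N x I matrix of per-pixel feature vectors (e.g. L, u, v values);
    sigma_abs_min plays the role of the 'just noticeable difference'."""
    N, I = X.shape
    sigma_img = X.std(axis=0)
    sigma_min = np.maximum(sigma_abs_min, frac * sigma_img)     # eq. (3.9)
    baseline = np.sum(1.0 / np.maximum(sigma_img, sigma_min))
    best_gain, best_K, best_labels = -np.inf, None, None
    for K in range(2, k_max + 1):
        _, labels = kmeans(X, K)
        within = 0.0
        for k in range(K):
            members = X[labels == k]
            if len(members):
                sk = np.maximum(members.std(axis=0), sigma_min)
                within += (len(members) / N) * np.sum(1.0 / sk)
        gain = (within - baseline) / K                          # eq. (3.8)
        if gain > best_gain:
            best_gain, best_K, best_labels = gain, K, labels
    return best_K, best_labels
```

On feature vectors drawn from two well-separated colour clusters, the division by K makes K = 2 the maximising choice, since finer splits do not reduce the clipped within-class deviations.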
3.5.2 k-means Clustering
The k-means algorithm is a generalization of the Lloyd-Max algorithm to multiple dimensions. The
straightforward extension of the Lloyd-Max algorithm to multiple dimensions is, however, computationally
expensive, particularly for a large number of data points (even for a small image of
300×200 pixels, we get 60,000 data points). Fortunately, some fast implementations of the k-means
algorithm [39, 40] have recently been proposed which attempt to reduce the number of distance
computations by arranging the data points in a suitable data structure and utilising some geomet-
rical constraints. We use the algorithm proposed by Kanungo and others [39].
Given a set of initial center points, the k-means algorithm iteratively updates the center points
based on the minimization of a cost function. The most commonly used cost function is the mean-
square error function. The performance of the algorithm often depends on the choice of the initial
center points. There is no known way of selecting a set of initial center points which provides a
globally optimal clustering. Here we use a simple strategy for choosing a set of initial center points
which is good enough for our purpose. Though there is no proof that this strategy will find the global
optimum, it is definitely much better than choosing random initial points. Experimental results
also suggest that it performs very well without significantly increasing the computational time.
The k-means algorithm proposed in [39] uses a kd-tree to organize the data points. A kd-tree
[28] is a binary tree which represents a hierarchical subdivision of the data set. Each node of the
kd-tree is associated with a closed hyperbox, called a cell. The root’s cell is the bounding box of
the whole data set. If the cell contains one point, then it is declared to be a leaf. Otherwise, it is
split into two hyperrectangles by an axis-orthogonal hyperplane. The points in the cell are then
partitioned to one side or the other of this hyperplane. The resulting subcells are the children of
the original cell, thus leading to a binary tree structure. The kd-tree thus provides a
hierarchical organisation of the data. We use this property of the kd-tree to derive our set of initial
center points.
Let N be the number of nodes at depth d in a kd-tree. Then N and d are related by
N = 2^d (3.10)
Let k be the number of clusters into which the data is to be classified, and let Nd be the set of
nodes at depth ⌈log2 k⌉ + 1. It may be recalled that each node of the kd-tree is associated with a
cell which occupies a part of the data space. Let C be the centroid of the data points contained in
the cell corresponding to a node, and let Nc be the set of centroids associated with the nodes at
depth ⌈log2 k⌉ + 1. We treat this set of centroids as the candidates for our set of initial center
points. Let Nk denote a set of k centroids randomly chosen from this set. Considering the points in
Nk as the cluster centers, we find the mean-square error. This is then repeated for a different Nk.
The set which gives the least mean-square error is taken as the set of initial center points. This
approach is similar in spirit to genetic algorithms.
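A simplified sketch of this initialisation, with a plain recursive median split standing in for the kd-tree of [39] (`trials` is the number of candidate sets Nk evaluated; all function names are our own):

```python
import numpy as np

def cell_centroids(X, depth):
    """Centroids of kd-tree cells at a given depth, using a median split
    on the widest dimension (a simplified stand-in for the tree of [39])."""
    if depth == 0 or len(X) <= 1:
        return [X.mean(axis=0)]
    dim = int(np.argmax(X.max(0) - X.min(0)))      # split the widest dimension
    med = np.median(X[:, dim])
    left, right = X[X[:, dim] <= med], X[X[:, dim] > med]
    if len(left) == 0 or len(right) == 0:          # degenerate split
        return [X.mean(axis=0)]
    return cell_centroids(left, depth - 1) + cell_centroids(right, depth - 1)

def initial_centers(X, k, trials=10, rng=0):
    """Choose the k depth-(ceil(log2 k)+1) cell centroids minimising MSE."""
    rng = np.random.default_rng(rng)
    depth = int(np.ceil(np.log2(k))) + 1
    cands = np.array(cell_centroids(X, depth))     # the candidate set Nc
    best, best_err = None, np.inf
    for _ in range(trials):                        # random candidate sets Nk
        centers = cands[rng.choice(len(cands), size=k, replace=False)]
        err = ((X[:, None, :] - centers[None]) ** 2).sum(-1).min(1).mean()
        if err < best_err:
            best, best_err = centers, err
    return best
```

Because the candidate centroids already follow the spatial layout of the data, the winning set typically places one center per dense cluster.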
3.5.3 Post-processing
The segmentation algorithm gives good results, however sometimes the following problems arise -
• Sometimes, when the image contains a large background with gradual colour variation,
the background gets split into two or more parts. This happens due to the large (though
uniform) spread of the cluster corresponding to the background in the feature space, which causes
it to break into two or more smaller clusters. To circumvent this, we employ edge-based
post-processing. Specifically, if a large part (at least 75%) of the common boundary between
two regions has a low gradient, the two regions are merged. Let T90 be the gradient value such
that at least 90 percent of the pixels in the image have their gradient less than T90. A pixel is
considered to have a low gradient if its gradient value is less than 0.5 × T90.
• Normally, a group of spatially connected pixels belongs to the same cluster; however, due to
noise a pixel may sometimes belong to a different cluster than its neighbouring pixels. This
results in the formation of very small regions (consisting of a few pixels). Such small regions
do not have any importance and they are ignored. Specifically, we ignore a region if its area
is less than one percent of the total image area.
Chapter 4
Feature Extraction and Database
Organization
4.1 Feature Extraction
Feature extraction can be viewed as a mapping of an image to a feature space. Let f represent a
mapping from the image space onto an N-dimensional feature space, x = {x1, x2, ..., xN }, i.e.,
f : F → x,
where N is the number of features used to represent the regions of the image. For two different
regions, feature extraction should produce two feature vectors which are distinct and dissimilar,
while for two similar regions the feature vectors should also be similar. This similarity may be
evaluated based on some distance measure. An efficient matching scheme depends on the amount
of discriminatory information contained in the extracted features. Various representations have
been discussed in the literature: Fourier descriptors, histograms of edge angles, invariant moments,
etc. for shape, and colour variance, colour histograms, etc. for colour. The extracted features should
be invariant to rotation, scale and reflection. The following section discusses the features used to
represent the images in the database.
4.2 Shape Features
Image retrieval based on object shape is considered to be one of the most difficult aspects of content-based
image retrieval because of difficulties in low-level image segmentation and the variety of ways
a given 3D object can be projected into 2D shapes. The features used for shape representation
should be able to provide sufficient discriminatory shape information that is more or less invariant
to various projections.
4.2.1 Invariant Moments
Moment invariants are a set of seven moments that are invariant under scale, reflection and rotation.
The shape of an object can be expressed in terms of these seven invariant moments [19]. For an image,
the central moment of order (p+q) is given by:
µpq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q (4.1)
where x̄ and ȳ represent the means of the x and y co-ordinates of the region respectively and are
given as:
x̄ = (Σ_x Σ_y x) / n,   ȳ = (Σ_x Σ_y y) / n (4.2)
where n is the number of points lying in the region.
The normalized central moments, denoted by ηpq, are defined as:
ηpq = µpq / µ00^γ (4.3)
where
γ = (p + q)/2 + 1 (4.4)
for p + q = 2, 3, ...
A set of moment invariants based on the 2nd and 3rd order moments is given as follows:
M1 = η20 + η02,
M2 = (η20 − η02)² + 4η11²,
M3 = (η30 − 3η12)² + (3η21 − η03)²,
M4 = (η30 + η12)² + (η21 + η03)²,
M5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
  + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²],
M6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03),
M7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²]
  − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]. (4.5)
M1 through M6 are invariant under rotation, scale and reflection. M7 is invariant only in its
absolute magnitude under a reflection.
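Equations (4.1)–(4.5) translate directly into code. Below is a sketch for a binary region mask (our own illustration, not the thesis implementation):

```python
import numpy as np

def hu_moments(mask):
    """The seven moment invariants of eq. (4.5) for a binary region mask."""
    ys, xs = np.nonzero(mask)
    xbar, ybar = xs.mean(), ys.mean()

    def mu(p, q):                      # central moment, eq. (4.1)
        return np.sum((xs - xbar) ** p * (ys - ybar) ** q)

    def eta(p, q):                     # normalized moment, eqs. (4.3)-(4.4)
        return mu(p, q) / mu(0, 0) ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    M1 = n20 + n02
    M2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    M3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    M4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    M5 = ((n30 - 3 * n12) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    M6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03))
    M7 = ((3 * n21 - n03) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([M1, M2, M3, M4, M5, M6, M7])
```

A quick sanity check is that the vector is unchanged when the mask is translated or rotated by 90 degrees, since central moments are translation invariant and M1–M7 are rotation invariant.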
4.2.2 Eccentricity and Compactness
Two more shape features are used for the shape representation. These are eccentricity and
compactness, defined as given in [41].
Eccentricity = Imin / Imax = [µ20 + µ02 − √((µ20 − µ02)² + 4µ11²)] / [µ20 + µ02 + √((µ20 − µ02)² + 4µ11²)] (4.6)
where µpq is the (p+q)-order central moment defined in eq. 4.1, and Imin and Imax represent the short
axis and the long axis respectively. Eccentricity can be interpreted as the ratio of the minor axis to
the major axis of the best-fitting ellipse of the shape.
Compactness = 4πA / P² (4.7)
where, P is the perimeter and A is the area of the polygon describing the shape. Compactness
expresses the extent to which a shape is a circle. A circle’s compactness is 1 and a bar’s compactness
is close to 0.
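Both features are cheap to compute from the region's central moments; a sketch (our own illustration, with the perimeter assumed to be supplied by a separate boundary-tracing step):

```python
import numpy as np

def eccentricity(mask):
    """Imin/Imax ratio of eq. (4.6) from the region's central moments."""
    ys, xs = np.nonzero(mask)
    x0, y0 = xs.mean(), ys.mean()
    mu20 = np.sum((xs - x0) ** 2)
    mu02 = np.sum((ys - y0) ** 2)
    mu11 = np.sum((xs - x0) * (ys - y0))
    root = np.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    return (mu20 + mu02 - root) / (mu20 + mu02 + root)

def compactness(area, perimeter):
    """4*pi*A / P^2, eq. (4.7): 1 for a circle, near 0 for a thin bar."""
    return 4 * np.pi * area / perimeter ** 2
```

As expected from eq. (4.6), a symmetric disk scores near 1 and a one-pixel-wide bar scores 0; an ideal circle of radius r gives compactness exactly 1.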
4.3 Colour Features
Color is one of the most recognizable elements of image content, and is a very important attribute
in extracting information from images. It is relatively robust to background complications and
independent of image size and orientation. We treat colour in the Luv space because of its perceptual
uniformity, and the colour features are given by the mean and the variance of the colour in each
region. That is, the features corresponding to colour are given as:
µi = (Σ_x Σ_y Fi(x, y)) / n,   σi² = (Σ_x Σ_y (Fi(x, y) − µi)²) / n (4.8)
for all (x, y) in the region, where n represents the number of points in that region and i ∈ {L, U, V}.
The six colour features combined with the nine shape features constitute a set of fifteen at-
tributes that form the feature vectors used to setup the database. On the basis of these features it
is possible to perform retrieval through colour, shape or both. The number of feature vectors
that need to be stored for a large image database is quite high, so the query processing
time will largely depend on the size of the database. The need for faster query processing can be
addressed by the use of efficient database organization and indexing. Since the retrieval is based
on the k nearest matches, there is a need for an indexing scheme that supports such queries. In
the following section, we describe database organization.
4.4 Database Organization
Database organization is an important issue in the field of data mining and pattern recognition.
In the absence of any prior organization, a linear search in the whole database is required for
determining the nearest neighbours around a given data point. Particularly for large databases,
this linear search is time-consuming and often prohibitive. To support efficient searching, a number
of ways of organizing the database have been proposed. At the heart of these methods, is a way of
hierarchically partitioning the database into smaller units.
4.5 Trees
4.5.1 Binary Trees
Tree structures have long been used for recursively partitioning a data set. The simplest
of these is the binary tree. As the name suggests, each non-leaf node in the binary tree is split
into two nodes, called its child nodes. The data set at the node is then partitioned
into two non-overlapping parts, each associated with one of the child nodes. If the partitioning is
appropriate, the numbers of data points in the two nodes are approximately equal. The most widely
used criterion for partitioning is based on median sub-division. Specifically, the median value of
the points in the data set at the node is determined, and the points are divided on the basis of
whether they lie to the left or right of the median value. This produces two child nodes, each with
approximately half of the points in the data set.
Let the number of data points in the database be N. Based on the median sub-division, the
depth of the tree formed is then
d = log2 N (4.9)
To search for a point, the value of the point is compared with the median-value at each node.
If the value of the point is less, the search is then carried on the left child, else it is carried on the
right child. This procedure is repeated till a leaf node is reached. The number of comparisons that
is required is therefore proportional to the depth of the tree, which is logarithmic in the number of
data points. The search complexity is therefore logarithmic.
t ∝ O(log N) (4.10)
4.5.2 Multidimensional Trees
The counterpart of the binary tree in multiple dimensions is the kd-tree. The splitting in a kd-tree
is done on the basis of the median value along a particular dimension, generally taken as the
one having the largest range. At each stage, therefore, the data is divided into two parts along the
dimension having the largest range. Each node in the kd-tree is then associated with a hyper-rectangle
which encloses the data points associated with the node. We will refer to this hyper-rectangle
as the cell of the node. The search proceeds in a similar manner: at each non-leaf node,
the search point's value along the dimension at which the node was split is compared. The search
complexity is similarly logarithmic.
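As an illustration of kd-tree search (here using SciPy's `cKDTree`, assumed available; the C-tree introduced in section 4.6 extends such a tree with sibling links for the horizontal search step):

```python
import numpy as np
from scipy.spatial import cKDTree

# Build a kd-tree over, say, 15-dimensional feature vectors and query it
# for the k nearest neighbours of a point.
rng = np.random.default_rng(0)
data = rng.random((1000, 15))
tree = cKDTree(data)

query = rng.random(15)
dist, idx = tree.query(query, k=5)   # distances and indices of 5 nearest
```

The result agrees with a brute-force linear scan, but the tree answers the query by visiting only the cells that can contain a nearer point.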
4.5.3 Limitations
The tree data structures by their very nature allow very fast identification of the cell in which the
data point to be searched (referred to as the query point hereafter) lies. However, the database may
not contain the query point itself; in this case, the data point nearest to the query point is desired.
The cell containing the query point, however, may not contain its nearest neighbour. Figure 4.1
illustrates this. Let us assume that the maximum distance rmax within which the nearest neighbour
lies is known (a priori knowledge of rmax is not required; it is shown later how to estimate it). The
search space then is a hyper-sphere (for searches based on Euclidean distance) of radius rmax which
Figure 4.1: Nearest point to the query point need not lie in the same cell in which the query point
lies. Point 1 represents the query point whose nearest neighbour point 3 lies in a different cell.
may contain or intersect more than one cell, and all the cells within or intersecting this hyper-sphere
must be examined. Further, the search is often required not just for a single nearest point, but for
a certain number of nearest points. This is known as k-nearest neighbour search in the literature.
The movement in the trees can be considered a kind of vertical or top-down movement, which
helps to locate the cell containing the query point very quickly. However, once that cell is located,
what is required is a kind of horizontal movement 1 around the cell for searching the nearest
neighbours. The trees by their very nature support only the vertical movement and not the
horizontal movement. A number of methods have been proposed in the literature to search the
k nearest neighbours around a given point. In doing so, most of the methods attempt to backtrack
in the tree, which is not very efficient. This is because the data structures used by these methods
do not provide support for horizontal movement around a point.
Another limitation of the search based on trees is that they do not allow the weighted search
for points in the multi-dimensional space i.e. the search where the weights on each dimension can
be varied at run time. This is an important problem and arises frequently in a number of fields.
In our case, it arises when searching for similar images from a database based on user feedback.
The user can either tell in advance (while querying) the degree of importance of different features
e.g. “the shape features are more important to me than the colour features for this image” or the
relative importance of the different features can be ascertained from the feedback provided by the
user (this set of images are relevant and this set is irrelevant for my query). The same problem
arises when the data points have different ranges in different dimensions and have to be normalized.
The search problem then essentially translates to a weighted search in which more weight is given
to features which are more important and less to others.
In order to overcome the above limitations, we present a data structure called hereafter the C-tree,
for connected tree. The data structure is based on the kd-tree; however, it differs in an important
way in that it supports horizontal movement around a point very efficiently. There is no restriction
on the number of dimensions. We also show how this horizontal movement allows us to do weighted
nearest neighbour searches efficiently without modifying the tree.
1
Strictly speaking, this term applies only to two-dimensional data. In a broad sense, however, we mean a
breadth-first search around the point, though the purpose is different.
22
4.6 C-tree
C-tree is a data structure very similar to the kd-tree. It differs from the kd-tree in that
each node also maintains information about its connected siblings, in addition to its children. Two
nodes are connected siblings if they are at the same level in the tree and their cells share a
common boundary. We distinguish between two kinds of connected siblings, corner connected
and side connected. Two siblings are side connected if the common boundary between them is a
non-zero hyper-plane of dimension d-1, where d is the dimension of the data points; otherwise
they are corner connected. For searching, only the information about side connected siblings needs
to be maintained, for the following reason. Suppose that we are at a certain node, which we call the
current node, and are looking for its siblings that are enclosed by or intersect the search hyper-sphere.
If the search hyper-sphere encloses or intersects a corner connected sibling,
then there will always be another node lying within or intersecting the hyper-sphere which is
side connected to both the current node and the corner connected node. That side connected
sibling will be visited during the search (all nodes lying within or intersecting the hyper-sphere are
required to be visited). Therefore, the corner connected sibling will always be reached via a side
connected sibling, and keeping its information is not required. To summarize, apart from
the information kept by a kd-tree node, a node in the C-tree also keeps information about its
side connected siblings. This extra information needs to be kept for every node in the tree while
building the tree. After the tree has been built, it can be removed from the
non-leaf nodes, as the non-leaf nodes are used only in the top-down traversal during searching,
which does not require this information.
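As a concrete illustration, a C-tree node can be thought of as a kd-tree node augmented with a list of side connected siblings. The following is a minimal Python sketch; the class and field names are our own, not taken from the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Minimal sketch of a C-tree node (illustrative names). Like a kd-tree
# node it stores its splitting dimension/value, its cell bounds and its
# children; in addition it keeps pointers to its side connected siblings,
# which is what enables horizontal movement during search.
@dataclass(eq=False)
class CTreeNode:
    lo: List[float]                       # minimum cell coordinates
    hi: List[float]                       # maximum cell coordinates
    split_dim: int = -1                   # dimension the cell was split on
    split_val: float = 0.0                # value the cell was split at
    left: Optional["CTreeNode"] = None
    right: Optional["CTreeNode"] = None
    siblings: List["CTreeNode"] = field(default_factory=list)
    points: List[List[float]] = field(default_factory=list)  # leaf data

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None
```

As noted above, the siblings lists of the non-leaf nodes can be cleared once the tree has been built.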
4.6.1 Building C-tree
Organization of the database is done by building the C-tree corresponding to the data points in the
database. The process for building the tree is similar to that for the kd-tree, except for an extra
procedure for making connections between the side connected siblings in the tree. We call this
procedure MakeConnection. The connection making starts at the root node and the connections
are made recursively, i.e. at the time of making the connections for a node, the connections for its
parent have already been made. Each parent is responsible for making the connections of its
child nodes. Suppose that we are at the parent node np. If np is a leaf node, i.e. if it does
not have any children, nothing needs to be done. Otherwise, let its children be nlc and nrc, the left
and right child. nlc and nrc will always be side connected siblings, as they are made by splitting
np along one dimension, so they are connected to each other. The only other possible candidates
for side connected siblings of nlc and nrc are the children of np’s side connected siblings. In case
any side connected sibling of np does not have children, it itself becomes a candidate for nlc’s and
nrc’s side connected siblings. A simple procedure to determine whether two nodes are side
connected siblings is as follows. Let d be the dimension of the data points. Then, along
each dimension, we check whether the two cells just intersect at a point or have a
common region of non-zero length. This can be done very easily. For the ith dimension, let v1i,min and
v1i,max be the minimum and maximum values of the cell corresponding to node 1 along this dimension.
Similarly, let v2i,min and v2i,max be the minimum and maximum values of the cell corresponding to node 2
along this dimension. If v1i,max < v2i,min or v2i,max < v1i,min, then nodes 1 and 2 are not connected
Algorithm - MakeConnection
MakeConnection(Node parent)
    if parent is a leaf node
        return
    end
    Connect parent's left and right son.
    For each connected sibling S of parent
        if S is a leaf node
            if S and parent's left son are side connected
                Connect S and parent's left son
            end
            if S and parent's right son are side connected
                Connect S and parent's right son
            end
        else for each child C of S
            if C and parent's left son are side connected
                Connect C and parent's left son
            end
            if C and parent's right son are side connected
                Connect C and parent's right son
            end
        end
    end
    MakeConnection(parent's left son)
    MakeConnection(parent's right son)
Figure 4.2: Algorithm for making connections in a C-tree
at all. Else if v1i,max = v2i,min or v2i,max = v1i,min, the two cells intersect at a point in the ith dimension;
otherwise they share a common region of finite length along the ith dimension. The procedure is
illustrated in Figure 4.3. Two connected nodes are side connected siblings if they share
a common region of non-zero length in d-1 dimensions. Connections are then made between two
side connected siblings by storing a pointer to each sibling in the other's node. An algorithmic description
of the procedure MakeConnection is shown in Figure 4.2.
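The side-connectedness test described above can be sketched in Python as follows (a hypothetical helper; cells are represented by their minimum and maximum corner coordinates, which is our own representation choice):

```python
def side_connected(lo1, hi1, lo2, hi2):
    # Two cells are not connected at all if, along some dimension, one
    # ends strictly before the other begins. They are side connected
    # siblings if they share a region of non-zero length along d-1
    # dimensions and merely touch along the remaining one.
    d = len(lo1)
    nonzero_overlaps = 0
    for i in range(d):
        if hi1[i] < lo2[i] or hi2[i] < lo1[i]:
            return False          # separated along dimension i
        if hi1[i] == lo2[i] or hi2[i] == lo1[i]:
            continue              # cells touch at a single point
        nonzero_overlaps += 1     # common region of finite length
    return nonzero_overlaps == d - 1
```

For example, in two dimensions the cells [0,1]x[0,1] and [1,2]x[0,1] share an edge and are side connected, while [0,1]x[0,1] and [1,2]x[1,2] only meet at a corner and are not.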
4.6.2 Nearest Neighbour Search
This section explains how the nearest neighbour search can be performed using a C-tree. As
mentioned before, a weighted search is often required. The explanation will therefore be for the weighted
(a) (b) (c)
Figure 4.3: Two nodes (a)separated, (b)intersecting, and (c)sharing a region of finite length in the
horizontal direction.
k-nearest neighbour search. The Euclidean search and the plain nearest neighbour search are just special
cases of this weighted search: in the former the weight along each dimension is the same, while in the
latter k is one. So the same procedure applies.
Let Xq be the data point and k be the number of nearest neighbours to be queried. Let d be
the dimension of data points and w be the weight vector by which dimensions are weighted. That
is,
w = {w1, w2, ..., wd}
The distance between two data points x1 and x2 is then given by

    d12 = sqrt( sum_{i=1}^{d} [wi (x1i − x2i)]^2 )    (4.11)
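As a sketch, equation (4.11) can be computed directly (the function name is our own, not from the thesis):

```python
import math

def weighted_distance(x1, x2, w):
    # Equation (4.11): each coordinate difference is scaled by the
    # weight of its dimension before squaring and summing.
    return math.sqrt(sum((wi * (a - b)) ** 2 for wi, a, b in zip(w, x1, x2)))
```

With unit weights this reduces to the ordinary Euclidean distance, e.g. weighted_distance([0, 0], [3, 4], [1, 1]) gives 5.0.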
The first step in the search is to locate the cell in which Xq lies. This is done by traversing the
tree in a top-down manner. Starting at the root, at each node the value of Xq along the dimension
at which the node was split is compared with the value by which the node was split. If the value
is less than the splitting value, the process is repeated at the left child; otherwise it is repeated
at the right child. This is done until a leaf node is reached. Xq is contained in the cell of this node,
which we call Cellq. In each node we maintain a variable called checked, which is true if the
node has been evaluated and false otherwise. Two lists are maintained. One is a sorted list of the k-nearest
neighbours seen so far, which we call Lsn; the sorting is done by distance from Xq. The
other is a list of nodes whose siblings are to be evaluated, along with the minimum distance of their
cells from Xq. This list, which we call Lr, is also sorted by the minimum distance of the cell from
Xq. Let rmax be the distance of the last element in Lsn. The search space for the weighted search will be
a hyper-ellipse the length of whose axis along a particular dimension will be inversely proportional
to the weight along that dimension. The points on the boundary of this hyper-ellipse will have
their distance equal to rmax. The search procedure need not consider the nodes which lie entirely
outside this hyper-ellipse, since we already have k points whose distance is less than or equal to
rmax. A node will lie entirely outside the hyper-ellipse if its minimum distance from Xq is greater
than rmax. This requires finding the point in the cell which is nearest to the query point. This
can be done very easily as follows. Recalling that the cell is a hyper-rectangle, let the coordinates
of this hyper-rectangle along the ith dimension be bi,min and bi,max. The coordinate of the point
which is nearest to Xq along the ith dimension is bi,min if bi,min is greater than Xqi, bi,max if bi,max is
less than Xqi, and Xqi itself otherwise (so that dimension contributes zero distance). The minimum distance can then be calculated by equation 4.11. The
search algorithm proceeds as follows. After locating Cellq, we initialize Lsn and Lr. This is done
by inserting the first k points in Cellq and its siblings into Lsn and the corresponding nodes into Lr.
The variable checked of the nodes which are inserted is marked true. If the number of siblings of Cellq
is less than k, the initialization continues by inserting points and nodes corresponding to the siblings of
other nodes (in the order in which they appear) in Lr. This gives an initial bound on rmax. Now
the nodes in Lr are examined for their siblings. Specifically, the first node in Lr is taken and, if its
minimum distance is less than rmax, a sibling whose checked is false is examined and its checked is
marked true. If the minimum distance of the cell corresponding to the
sibling is less than rmax, its node is inserted into Lr. If the distance of the point corresponding to
the sibling is less than rmax, the last point is removed from Lsn and this point is inserted into Lsn;
the value of rmax is then updated. In this way, we continuously reduce the space which has to
be examined during the search. If all the siblings of a node have been examined, i.e. their checked
are marked true, it is removed from Lr. The search terminates when the minimum distance of the
first node in Lr becomes greater than rmax. After the algorithm terminates, the points in Lsn are
returned as the k-nearest neighbours.
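The pruning test above, the minimum weighted distance from Xq to a node's cell, can be sketched as follows (an illustrative Python helper under the same corner-coordinate cell representation; the names are our own):

```python
import math

def min_distance_to_cell(xq, lo, hi, w):
    # Along each dimension the nearest point of the hyper-rectangle is
    # lo[i] if lo[i] > xq[i], hi[i] if hi[i] < xq[i], and xq[i] itself
    # otherwise (zero contribution). The contributions are combined as
    # in the weighted distance of equation (4.11).
    s = 0.0
    for i in range(len(xq)):
        if lo[i] > xq[i]:
            diff = lo[i] - xq[i]
        elif hi[i] < xq[i]:
            diff = xq[i] - hi[i]
        else:
            diff = 0.0
        s += (w[i] * diff) ** 2
    return math.sqrt(s)
```

A node whose min_distance_to_cell exceeds rmax lies entirely outside the search hyper-ellipse and can be skipped.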
4.7 Matching
Once the database has been properly organized, a query can be answered by evaluating the similarity
based on some distance measure. The distance measure defines the closeness of two features in the
database. However, we cannot use a simple Euclidean distance, because the different features have
different ranges; to give all components equal importance, some kind of normalization is necessary.
Assuming the features to have a Gaussian distribution, we can compute the mean µi and standard
deviation σi of the ith feature. The normalization of the ith component of a feature vector x can
then be done as follows:

    xi = (xi − µi) / σi    (4.12)
It is easy to show that the probability of a normalized feature value lying in the range [-1, 1]
is approximately 68%. It is also easy to show that the Euclidean distance between normalized feature
vectors corresponds to the Tokuhara distance between the un-normalized feature vectors. The Tokuhara
distance between two vectors x1 and x2 is given by

    d^2(x1, x2) = sum_{i=1}^{N} (x1i − x2i)^2 / σi^2    (4.13)

where σi is the standard deviation of the ith component of the feature vector and N is its dimension.
In our system, instead of normalizing each feature vector, we have used the Tokuhara distance.
This allows us to incrementally increase the database without modifying the existing feature vectors.
At the time of query, the images are ranked in increasing order of the distance from the queried
object feature vector based on the above distance measure.
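The ranking step can be sketched as follows (a minimal Python illustration of equation (4.13); the function names and the (image_id, feature_vector) database layout are our own assumptions):

```python
def tokuhara_distance_sq(x1, x2, sigma):
    # Equation (4.13): each squared component difference is divided by
    # the variance of that component, which is equivalent to Euclidean
    # distance on features normalized as in equation (4.12).
    return sum(((a - b) / s) ** 2 for a, b, s in zip(x1, x2, sigma))

def rank_database(query, db, sigma):
    # Rank (image_id, feature_vector) pairs by increasing distance
    # from the query feature vector.
    return sorted(db, key=lambda item: tokuhara_distance_sq(query, item[1], sigma))
```

Because the per-feature standard deviations are applied at query time, new images can be added to the database without re-normalizing the stored feature vectors, which is the incremental-growth advantage mentioned above.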
4.8 Relevance Feedback
Relevance feedback is a technique used to assess the importance of different features by
learning from the feedback provided by the user. The learning can be done from both positive
and negative feedback. The objective of the learning is to present more relevant results to the
user. The application of this technique to content based image retrieval is recent; an example of a
system using feedback in image retrieval is MARS [7]. Most systems using relevance
feedback use only positive examples for learning. However, our experimentation suggests that
using both positive and negative examples yields considerably higher precision and recall than
using positive examples alone.
The basic method behind learning from feedback is to assign different weights to different
features according to their importance: more important features are given more weight, while less
important features are given less. The features used for similarity matching and retrieval
in our system are colour and shape features. There are six colour features (the mean and standard
deviation of the luminance (L) and chrominance (u, v) components) and nine shape features
(the seven invariant moments, eccentricity and compactness).
Initially, as we have no a priori information about the importance of the different features, equal
weight is assigned to each feature and the results are presented to the user. Our system
Imagefinder presents the top 16 results. The user then marks the images which he considers relevant.
Thus, the results are divided into two sets, relevant and non-relevant. Let SR and SNR denote
the set of relevant and non-relevant images respectively. The aim is to find the set of features
that is consistent within each of the two sets: the weight is increased for features consistent in
the relevant set and decreased for features consistent in the non-relevant set. Consider the relevant
set SR first. The features which are important will be consistent in SR, i.e. they will have similar
values in the images of the relevant set. On the other hand, the features which are not important
will vary across the set. Therefore, the inverse of the standard deviation of a feature represents a
measure of its weight, i.e.
    wi ∝ 1 / σi^R    (4.14)

where σi^R is the standard deviation of the ith feature under consideration, computed from the
relevant images.
Using just the relevant images, however, has one drawback: it cannot recognize features which
are non-discriminatory, i.e. which are similar in both the relevant and non-relevant sets. Ideally, the
weights of these features should be left unchanged. To overcome this drawback, we also use the
standard deviation of the features over the images of the non-relevant set. In SNR, the features which
are consistent (having low standard deviation) are the ones which are not important, since
the results based on these features were not marked relevant by the user. So for the non-relevant results,
the weight of a feature is directly proportional to its standard deviation across SNR, i.e.
    wi ∝ σi^NR    (4.15)

where σi^NR is the standard deviation of the ith feature, computed from the non-relevant images.
Combining (4.14) and (4.15),

    wi ∝ σi^NR / σi^R    (4.16)
Since the weights are used for the calculation of distances, which are used in a relative sense
only, the constant of proportionality can be taken to be one. Therefore,

    wi = σi^NR / σi^R    (4.17)
Using (4.17) has the advantage that the weights of the features which are non-discriminatory
will remain unchanged and close to 1. This is because the standard deviations of a non-discriminatory
feature tend to have similar values over the relevant and non-relevant sets and therefore cancel out.
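The weight update of equation (4.17) can be sketched as follows (an illustrative Python helper; the eps guard against a zero standard deviation is our own safeguard, not part of the thesis):

```python
import statistics

def feedback_weights(relevant, non_relevant, eps=1e-6):
    # relevant and non_relevant are lists of feature vectors marked by
    # the user. For each feature i, w_i = sigma_i^NR / sigma_i^R, so a
    # feature that is consistent in the relevant set but varies in the
    # non-relevant set receives a large weight.
    d = len(relevant[0])
    weights = []
    for i in range(d):
        sigma_r = statistics.pstdev(v[i] for v in relevant)
        sigma_nr = statistics.pstdev(v[i] for v in non_relevant)
        weights.append(sigma_nr / (sigma_r + eps))
    return weights
```

The resulting weights can then be used directly in the weighted k-nearest neighbour search of equation (4.11) without rebuilding the tree.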
Chapter 5
Experimental Results
This chapter presents the results obtained with our system Imagefinder. We first present the results
on image segmentation using both the region based and information theoretic approaches. Then the
query results are presented. Finally, we present the improvement obtained by using feedback.
5.1 Image Database
The database used in our system consists of about 5000 images taken from a number of sources,
mostly the Internet. The major part of the images was provided by the University
of California, Berkeley [42]. The database consists of images of variable size from a
number of categories, such as natural scenes, animals, birds and other outdoor images. For preparing
the database, the images were segmented and feature vectors were extracted from the regions.
The preparation of the database was done offline.
5.2 Segmentation Results
5.2.1 Region-based Approach
This section shows the results of region based segmentation on two real world images. Figure 5.1(a)
is an image of a sunflower. The regions after region growing are shown in Figure 5.1(b), where each
region is shown in a single colour. From the figure it is clearly evident that the outcome of
region growing is a heavily over-segmented image. Figure 5.1(c) shows the regions after region merging. The
number of regions has now decreased drastically, and the shapes of the objects in the image are
clearly evident. However, many small regions remain, which are removed by region pruning. Figure
5.1(d) shows the boundaries of the regions obtained after region pruning. Finally, the boundaries
are shown superimposed on the original image in Figure 5.1(e). Figure 5.2 shows the result on an
apple image.
5.2.2 Information Theoretic Approach
In this section the results of segmentation based on the information theoretic approach are shown on
a variety of real world images. Figure 5.3 shows the results on some randomly selected images of animals and
Figure 5.1: Segmentation by region based approach, (a)Original Image, (b)After region growing,
(c)After region merging, (d)Regions after removing small regions, (e)Boundaries of regions super-
imposed on original image.
Figure 5.2: Segmentation by region based approach, (a)Original Image, (b)After region growing,
(c)After region merging, (d)Regions after removing small regions, (e)Boundaries of regions super-
imposed on original image.
birds, whereas Figure 5.4 shows the results on images of outdoor and natural scenes.
5.2.3 Region vs. Information based approaches
The region based approach gives good results when the regions in an image are homogeneous and
contain little or no texture. This is evident from Figures 5.1 and 5.2: the results are better for the
apple image, which contains almost no texture, than for the sunflower image, which contains some
amount of texture. The information theoretic approach is more robust to texture and small colour
variations in the image. However, for still better results, texture features should themselves be
included in the segmentation process. Even the information theoretic approach will not work on
images of high texture content, such as zebra or leopard images.
Figure 5.3: Segmentation results on some randomly selected animal and bird images from the
database. The segmented regions are shown as white boundaries superimposed on the original
image.
Figure 5.4: Segmentation results on some randomly selected images of natural and outdoor scenes
from the database. The segmented regions are shown as white boundaries superimposed on the
original image.
Figure 5.5: A query session in Imagefinder. (a)The user is asked to select his object of interest
by pressing the mouse button in the region of object. (b)After the user has selected his object of
interest, top 16 results are returned.
5.3 Query Results
This section presents the query results for Imagefinder. To perform a query, a user first selects
an image. Segmentation is then done on the selected image and the segmented image is presented
to the user, who then selects the region of his interest. Currently, a query for only one region
is allowed at a time. After the user has selected his region of interest, the top 16 images from the
database containing the region nearest to the queried region are presented. A typical query session
in Imagefinder is shown in Figure 5.5.
To determine the retrieval effectiveness, we use the precision measure, which is defined as

    precision = (number of relevant images / number of returned images) × 100%    (5.1)
The precision is 100% when all the returned images are relevant. The relevancy of the result is
determined by the judgment from a human user.
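As a quick numerical check, equation (5.1) amounts to the following (a trivial Python helper with names of our own choosing):

```python
def precision(relevant_returned, total_returned):
    # Equation (5.1): fraction of returned images that are relevant,
    # expressed as a percentage.
    return 100.0 * relevant_returned / total_returned
```

For example, 14 relevant images out of 16 returned gives a precision of 87.5%.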
We now present some typical query results. Figure 5.6 shows the results for the query of the
crane image. The results are particularly impressive: all 16 of the 16 returned images are of cranes,
though cranes form only a very small part of the database (less than 1%). This may be attributed to
the fact that the segmentation of the crane can be done very accurately and that both the colour
and shape features are effective.
Figure 5.7 shows the query results for the sunflower image. The sunflower images contain a small
amount of texture; the precision of the results thus shows the robustness of the segmentation to the
presence of small texture content. One more observation can be made from the results: the
different results have different backgrounds. This demonstrates the usefulness of
Category Precision
Yellow flower 91.67%
Sky 93.75%
Red flower 64.52%
Crane 63.64%
Tree 50.78%
Fox 46.87%
Eagle 46.87%
Water 43.75%
Table 5.1: Precision values for various categories of images present in the database.
retrieval at the level of objects, in this case the sunflower. A similar result could not have been
achieved by a system using only global image characteristics.
Figure 5.8 shows the results of the query on the fox image. The fox images are particularly
difficult to segment, since there is very little to distinguish the fox from the background. We
humans are able to distinguish the fox mainly because of our prior knowledge, which is not available
to Imagefinder. The results should therefore not be surprising.
Figure 5.9 shows the query results for sky as the desired object. The precision is particularly
impressive: out of the top 16 results returned, 14 are relevant. The following observation can be made
in particular: the results obtained have varying shape but similar colour. This is of course expected
for sky, which is not associated with any particular shape. One more thing that can be noted is that
the only common feature in the results is the presence of sky; the other objects present vary greatly.
This shows that Imagefinder is looking for the specific object, sky, and does not care about the
presence of other objects, which is a characteristic of object based retrieval. The results for some
other categories, like eagle and water bodies, are shown in Figures 5.10-5.11.
Finally, we summarize the results for the different categories in Table 5.1.
Query Image
Figure 5.6: Top 16 results for the query of a crane; all 16 out of 16 results are relevant.
Query Image
Figure 5.7: Top 16 results for the query of sunflower. 9 out of 16 results are relevant. The sunflower
images form a very small part, about 0.2%, of the database.
Query Image
Figure 5.8: Top 16 results for the query of fox. 8 out of 16 returned images are relevant. The fox
images form about 1% of the database.
Query Image
Figure 5.9: Top 16 results for the query of sky. 14 out of 16 images have sky as a part of the image
and are therefore relevant.
Query Image
Figure 5.10: Top 16 results for the query of an eagle. 8 out of 16 images are relevant. The eagle
images form about 1% of the database.
Query Image
Figure 5.11: Top 16 results for the query of water. 6 out of 16 images are relevant. The water
images form about 1% of the database.
5.4 Relevance Feedback
Figure 5.12: A typical feedback session in Imagefinder. The user has marked relevant images
by pressing the relevant radio buttons against them. The query based on the feedback can be
performed by pressing Query Again button.
This section shows the improvement in results obtained through feedback from the user. To give
feedback in Imagefinder, the user marks a set of images which he considers relevant to the
query and queries again. Imagefinder then returns a new set of results based on the feedback. A sample
session is shown in Figure 5.12.
Category    Without Feedback    After Feedback
Tree        50.78%              60.93%
Fox         46.87%              51.56%
Eagle       46.87%              56.25%
Water       43.75%              52.08%
Table 5.2: Precision values for various categories of images after one feedback iteration.
The first result we show is for sunflower. The results without feedback were shown in Figure 5.7;
Figure 5.13 shows the results after one iteration. The precision without feedback was 56.25%,
whereas after one iteration it increased to 68.75%, an improvement of 12.50%.
The second result is for water bodies, shown in Figure 5.14; the results without feedback
were shown in Figure 5.11. An improvement of 12.5% was observed for this query. A summary of
the feedback results is shown in Table 5.2. The results were obtained by averaging over a number of
queries in each category.
Query Image
Figure 5.13: Top 16 results for the query of sunflower after one feedback iteration. Precision(without
feedback)=56.25%, precision(after one iteration)=68.75%.
Query Image
Figure 5.14: Top 16 results for the query of water after one feedback iteration. Precision(without
feedback)= 37.5%, precision(after one iteration)=50%.
Chapter 6
Conclusion
This chapter summarizes the work done in this thesis. Several directions for further development
are also outlined.
6.1 Summary
In this thesis, the problem of content based image retrieval was discussed. We chose to support
retrieval at the level of objects, which corresponds more naturally to what a user would like to
have in an image retrieval system. A broad range of issues, from image segmentation to relevance
feedback were addressed. We presented a new algorithm for image segmentation based on an
information theoretic approach. The features used for retrieval were based on shape (moment
invariants, eccentricity and compactness) and colour (mean and standard deviation). We also
presented a data structure based on kd-tree, which is specifically suited for efficient searching in a
large database. The learning in our system Imagefinder was incorporated by using both positive
and negative feedback from the user. The system was tested on a database of about 5000 real world
images consisting of several categories.
6.2 Suggestions for Further Development
Content based image retrieval is still in its developmental stages. There are a number of issues which
can be addressed; here we outline a few of them.
• The performance of any system supporting query at the level of objects is severely affected
by segmentation. Traditionally, the use of feedback is limited to assessing the importance of
different features only. The learning by feedback can be extended to include segmentation
also, which we think will provide better results.
• The segmentation process in Imagefinder used only colour features. Apart from colour, texture
is also an important feature which can be used to improve segmentation results. Due to time
constraints, we could not incorporate texture features for assessing similarity. Its use should
lead to better results.
• Currently Imagefinder supports query for one region only, i.e. it does not use the knowledge
of presence of other objects. However, the context is also a very powerful feature for judging
similarity. Further work therefore can be done to incorporate this, i.e. support for query of
multiple regions in the image.
• The learning by feedback in Imagefinder is restricted to one session. This is short term
learning, in which the system forgets everything after the session is over. A long term
user may not like to tell the system the same thing every time he queries for the same
object. The knowledge gained by learning can therefore be saved for future sessions, thereby
allowing the system to remember what it learned previously.
Bibliography
[1] M. Flickner, H. Sawhney, W. Niblack, et al., “Query by image and video content: The QBIC
system,” IEEE Computer, 1995.
[2] W. Niblack and R. Barber, “The QBIC project: Querying images by content using colour,
texture and shape,” in Proc. SPIE Storage and Retrieval for Image and Video Databases, Feb.
1994.
[3] J. R. Bach, C. Fuller, and A. Gupta, “The Virage image search engine: An open framework
for image management,” in Proc. SPIE Storage and Retrieval for Image and Video Databases,
1997.
[4] J. R. Smith and S.-F. Chang, “VisualSEEk: A fully automated content-based image query
system,” in Proc. ACM Multimedia, 1996.
[5] A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based manipulation of image
databases,” International Journal of Computer Vision, 1996.
[6] W. Y. Ma and B. S. Manjunath, “NeTra: A toolbox for navigating large image databases,” in
Proc. IEEE Int. Conf. on Image Processing, 1997.
[7] T. S. Huang, S. Mehrotra, and K. Ramachandran, “Multimedia analysis and retrieval system
(MARS) project,” in Proc. 33rd Annual Clinic on Library Applications of Data Processing -
Digital Image Access and Retrieval, 1996.
[8] T. Pavlidis and Y. T. Liow, “Integrating region growing and edge detection,” IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 225–233, March 1990.
[9] M. Hanson and W. Higgins, “Watershed driven relaxation labeling for image segmentation,”
in Proc. IEEE Intl. Conf. on Image Processing, 1994.
[10] C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld : Image segmentation us-
ing expectation-maximization and its application to image querying,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 24, no. 8, August 2002.
[11] M. Miyahara, “Mathematical transform of (R,G,B) colour data to Munsell (H,V,C) colour data,”
SPIE Visual Communication and Image Processing, vol. 1001, 1988.
[12] J. Wang and R. Acharya, “Colour clustering techniques for colour-content-based image retrieval
from image databases,” in Proc. of IEEE Conf. on Multimedia Computing and Systems, 1997.
[13] M. Stricker and M. Orengo, “Similarity of colour images,” in Proc. of SPIE Storage and
Retrieval for Images and Video Databases, 1995.
[14] J. R. Smith and S.-F. Chang, “Single colour extraction and image query,” in Proc. of IEEE
Int. Conf. on Image Processing, 1995.
[15] T. S. Chua, K.-L. Tan, and B. C. Ooi, “Fast signature-based colour spatial image retrieval
for multimedia database systems,” in Proc. of IEEE Conf. on Multimedia Computing and
Systems, 1997.
[16] R. Schettini, “Multicoloured object recognition and location,” Pattern Recognition Letters, pp.
1089–1097, November 1994.
[17] E. Persoon and K. S. Fu, “Shape discrimination using Fourier descriptors,” IEEE Trans. on
Sys., Man, and Cyb., vol. 6, pp. 661–674, 1984.
[18] C. T. Zahn and R. Z. Roskies, “Fourier descriptors for plane closed curves,” IEEE Trans. on
Computers, 1972.
[19] M. K. Hu, “Visual pattern recognition by moment invariants,” IRE Trans. on Information
Theory, vol. 8, 1962.
[20] Y. Rui, A. She, and T. Huang, “Modified fourier descriptors for shape representation - a
practical approach,” in Proc. of Int. Workshop on Image Databases and Multimedia Search,
1996.
[21] D. kapur, Y. N. Lakshman, and T. Saxena, “Computing invariants using elimination methods,”
in Proc. IEEE International Conference on Image Proc., 1995.
[22] R. M. Haralick, “Statistical and structural approaches to texture,” Proc. IEEE, vol. 67, pp.
786–804, 1979.
[23] H. Tamura, S. Mori, and T. Yamawaki, “Texture feature corresponding to visual perception,”
in IEEE Trans. on Sys. Man. and Cyb, vol. 8, no. 6, 1978.
[24] J. R. Smith and S.-F. Chang, “Transform features for texture classification and discrimination
in large image databases,” in Proc. IEEE Int. Conf. on Image Proc, 1994.
[25] W. Y. Ma and B. S. Manjunath, “A comparison of wavelet transform features for texture
image annotation,” in Proc. IEEE Int. Conf. on Image Proc., 1995.
[26] C. Faloutsos and K.-I. Lin, “FastMap: A fast algorithm for indexing, data-mining and visual-
ization of traditional and multimedia datasets,” in Proc. ACM SIGMOD, 1995, pp. 163–174.
[27] G. Salton and M. J. McGill, Introduction to Modern Information Retrieval. McGraw-Hill Book
Company, 1983.
[28] J. Bentley, “Multidimensional binary search trees used for associative searching,” Communi-
cations ACM, vol. 18, pp. 509–517, 1975.
[29] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in Proc. ACM SIGMOD, 1984.
[30] T. Sellis, N. Roussopoulos, and C. Faloutsos, “The R+-tree: A dynamic index for multi-dimensional objects,” in Proc. VLDB, 1987.
[31] M. Charikar, C. Chekuri, T. Feder, and R. Motwani, “Incremental clustering and dynamic
information retrieval,” in Proc. 29th Annual ACM Symposium on Theory of Computing, 1997,
pp. 625–635.
[32] Y. Rui, K. Chakrabarti, S. Mehrotra, Y. Zhao, and T. S. Huang, Dynamic Clustering for Optimal
Retrieval in High Dimensional Multimedia Databases, TR-MARS-10-97, 1997.
[33] H. J. Zhang and D. Zhong, “A scheme for visual feature based image retrieval,” in Proc. SPIE
Storage and Retrieval for Image and Video Databases, 1995.
[34] J. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, pp. 81–106, 1986.
[35] D. Fisher, “Knowledge acquisition via incremental conceptual clustering,” Machine Learning,
vol. 2, pp. 139–172, 1987.
[36] J. Gennari, P. Langley, and D. Fisher, “Models of incremental concept formation,” Artificial
Intelligence, pp. 11–61, 1989.
[37] E. Gokcay and J. C. Principe, “Information theoretic clustering,” IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, vol. 24, no. 2, February 2002.
[38] M. Gluck and J. Corter, “Information, uncertainty, and the utility of categories,” in Proceedings
of the Seventh Annual Conference of the Cognitive Science Society, 1985, pp. 283–287.
[39] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An
efficient k-means clustering algorithm: analysis and implementation,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.
[40] D. Pelleg and A. Moore, “Accelerating exact k-means algorithms with geometric reasoning,” in
Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining, August 1999, pp.
277–281.
[41] A. Jain, Fundamentals of Digital Image Processing. Prentice-Hall, 1989.
[42] Image source. [Online]. Available: ftp://dlp.cs.berkely.edu

by storing the information about the siblings of each kd-tree node. An important aspect of the system is that the user is allowed to select a region of interest for the query. The results are presented in increasing order of distance from the query's feature vector, based on the nearest-neighbour criterion. The system also incorporates a method for relevance feedback aimed at improving the results of the query: based on the relevance marked by the user, the system automatically re-assigns the weights of the feature components to produce more appropriate results.
Contents

1 Introduction 1
  1.1 Introduction 1
  1.2 What is Content Based Image Retrieval? 1
  1.3 Problem Definition 2
  1.4 Motivation 3
  1.5 Thesis Outline 3
2 Literature Survey 4
  2.1 Segmentation 4
  2.2 Feature Extraction 5
    2.2.1 Colour 5
    2.2.2 Shape 5
    2.2.3 Texture 6
  2.3 Indexing 6
    2.3.1 Dimensionality Reduction 7
    2.3.2 Multi-Dimensionality Indexing 7
  2.4 Discussion 7
3 Image Segmentation 8
  3.1 Region-Based Segmentation 8
    3.1.1 Region Growing 9
    3.1.2 Region Merging 11
    3.1.3 Region Pruning 11
  3.2 Information Theoretic Approach 12
  3.3 Information Gain by Clustering 12
  3.4 Using Classification Gain for Segmentation 13
  3.5 Colour Image Segmentation 14
    3.5.1 The Segmentation Algorithm 14
    3.5.2 k-means Clustering 15
    3.5.3 Post-processing 16
4 Feature Extraction and Database Organization 18
  4.1 Feature Extraction 18
  4.2 Shape Features 18
    4.2.1 Invariant Moments 18
    4.2.2 Eccentricity and Compactness 19
  4.3 Colour Features 20
  4.4 Database Organization 20
  4.5 Trees 21
    4.5.1 Binary Trees 21
    4.5.2 Multidimensional Trees 21
    4.5.3 Limitations 21
  4.6 C-tree 23
    4.6.1 Building C-tree 23
    4.6.2 Nearest Neighbour Search 24
  4.7 Matching 26
  4.8 Relevance Feedback 27
5 Experimental Results 29
  5.1 Image Database 29
  5.2 Segmentation Results 29
    5.2.1 Region-based Approach 29
    5.2.2 Information Theoretic Approach 29
    5.2.3 Region vs. Information based approaches 30
  5.3 Query Results 33
  5.4 Relevance Feedback 41
6 Conclusion 44
  6.1 Summary 44
  6.2 Suggestions for Further Development 44
Bibliography 46
List of Figures

1.1 Overview of an Image Retrieval System 2
2.1 A typical Content-Based Image Retrieval system 5
3.1 The 8-connected neighbourhood of the pixel to be assigned a label 9
3.2 Region growing algorithm 10
3.3 Region merging algorithm 11
3.4 Region pruning algorithm 12
3.5 The image of an elephant and its representation in Luv space 14
3.6 Information based segmentation algorithm 15
4.1 The nearest point to the query point need not lie in the same cell as the query point. Point 1 represents the query point, whose nearest neighbour, point 3, lies in a different cell 22
4.2 Algorithm for making connections in a C-tree 24
4.3 Two nodes (a) separated, (b) intersecting, and (c) sharing a region of finite length in the horizontal direction 25
5.1 Segmentation by the region-based approach: (a) original image, (b) after region growing, (c) after region merging, (d) regions after removing small regions, (e) boundaries of regions superimposed on the original image 30
5.2 Segmentation by the region-based approach: (a) original image, (b) after region growing, (c) after region merging, (d) regions after removing small regions, (e) boundaries of regions superimposed on the original image 30
5.3 Segmentation results on some randomly selected animal and bird images from the database. The segmented regions are shown as white boundaries superimposed on the original image 31
5.4 Segmentation results on some randomly selected images of natural and outdoor scenes from the database. The segmented regions are shown as white boundaries superimposed on the original image 32
5.5 A query session in Imagefinder. (a) The user is asked to select his object of interest by pressing the mouse button in the region of the object. (b) After the user has selected his object of interest, the top 16 results are returned 33
5.6 Top 16 results for the query of a crane; all 16 out of 16 results are relevant 35
5.7 Top 16 results for the query of a sunflower; 9 out of 16 results are relevant. The sunflower images form a very small part, about 0.2%, of the database 36
5.8 Top 16 results for the query of a fox; 8 out of 16 returned images are relevant. The fox images form about 1% of the database 37
5.9 Top 16 results for the query of sky; 14 out of 16 images have sky as a part of the image and are therefore relevant 38
5.10 Top 16 results for the query of an eagle; 8 out of 16 images are relevant. The eagle images form about 1% of the database 39
5.11 Top 16 results for the query of water; 6 out of 16 images are relevant. The water images form about 1% of the database 40
5.12 A typical feedback session in Imagefinder. The user has marked relevant images by pressing the relevant radio buttons against them. The query based on the feedback can be performed by pressing the Query Again button 41
5.13 Top 16 results for the query of sunflower after one feedback iteration. Precision (without feedback) = 56.25%; precision (after one iteration) = 68.75% 42
5.14 Top 16 results for the query of water after one feedback iteration. Precision (without feedback) = 37.5%; precision (after one iteration) = 50% 43
List of Tables

5.1 Precision values for various categories of images present in the database 34
5.2 Precision values for various categories of images after one feedback iteration 41
Chapter 1

Introduction

1.1 Introduction

With the pervasive use of computers both at work and at home, a large amount of multimedia information is being generated, with a wide variety of potential applications such as interactive entertainment, video on demand, video rental services, news distribution and multimedia libraries. Powerful microcomputers, high-speed networking, high-capacity storage media, improvements in compression algorithms and recent advances in the fields of audio, video and imaging have made multimedia systems viable both technically and economically. A typical multimedia database, composed of various multimedia objects archived together, must address the issues of representation, indexing, retrieval and manipulation of the multimedia data. It is therefore important to develop a retrieval system that can retrieve information effectively and efficiently from these large databases.

Images and video form an integral part of any multimedia database. A number of applications require images to be retrieved automatically, such as robotics, remote sensing, fingerprint recognition, automatic surveillance systems and medical imaging. Retrieval of image data based on pictorial queries is an interesting and challenging problem that has developed into a major field of research with the emergence of large image databases and digital libraries. These databases typically consist of thousands of images, making it difficult for users to browse through the entire set. Various applications of digital libraries and image databases have been described in the literature.

1.2 What is Content Based Image Retrieval?

Early database systems employed textual features such as filenames, captions and keywords to annotate and retrieve images. This method has several limitations. First and most important, the content of an image cannot always be described in words. The perception of an image is highly subjective, and the same image may mean different things to different people. Moreover, to fully capture the content of an image, the spatial relationships between the various objects in it should also be expressed. If such a linguistic representation is applied to a database that is to be used globally (e.g. on the World Wide Web), it will be severely limited by the language barrier. Finally, this method requires a human to annotate each and every
[Figure 1.1: Overview of an Image Retrieval System]

image in the database, which is simply not possible for large databases. It is therefore desirable to have a retrieval mechanism that depends on the visual content of the image. The extraction of the visual features used to measure similarity should be automatic for the system to scale to large databases. Human perception of visual content is based on colour, shape, texture, layout, position, etc., so features based on these attributes can be employed to build an effective retrieval system. Content Based Image Retrieval (CBIR) is a relatively new area of research that deals with the retrieval of images similar in visual content to a given query image. Many image retrieval systems, both commercial and research, have employed these criteria with varying degrees of accuracy. Some of them are QBIC [1, 2] from IBM, Virage [3] from Virage Inc., VisualSEEk [4] from Columbia Univ., PhotoBook [5] from MIT Media Labs, Netra [6] from UCSB and MARS [7] from the Univ. of Illinois.

A traditional image retrieval system first preprocesses an input image to extract features such as colour, texture and shape. One of the most important parts of this pre-processing is image segmentation, as the success of the entire system depends heavily on the accuracy achieved at this stage: better segmentation leads to better representation and therefore better results. Features are then extracted from the segmented objects; they provide an efficient representation of the information contained in each object that can be compared with that of other objects to evaluate similarity. These features are stored along with the images in the database. When a query image is presented, it is preprocessed in the same way to extract its features, which are then matched against the feature vectors present in the database. A ranked set of images with high matching scores is presented at the output.
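The matching step just described amounts to a nearest-neighbour search over the stored feature vectors. A minimal sketch of that query loop is shown below; the linear scan stands in for the report's C-tree index, and the image names and feature values are invented purely for illustration.

```python
import math

def euclidean(a, b):
    # Distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_features, database, k=16):
    # database: list of (image_id, feature_vector) pairs computed offline.
    # Returns the k images whose feature vectors are nearest to the query,
    # in increasing order of distance. This is a brute-force scan; the
    # report's C-tree avoids comparing against every stored vector.
    ranked = sorted(database, key=lambda item: euclidean(query_features, item[1]))
    return [image_id for image_id, _ in ranked[:k]]

# Hypothetical two-dimensional feature vectors for three database images.
db = [("img1", [0.2, 0.1]), ("img2", [0.9, 0.8]), ("img3", [0.25, 0.05])]
print(retrieve([0.2, 0.1], db, k=2))  # nearest two: img1, then img3
```

The same ranking logic applies regardless of the feature dimensionality; only the index structure used to accelerate the search changes.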
The outline of a general CBIR system is shown in figure 1.1.

1.3 Problem Definition

Efficient access to digital images has recently become an important issue. In this thesis we address the problem of building an efficient and accurate retrieval system that retrieves images similar in content to a given query image. The retrieval should be largely insensitive to variations in image scale, rotation and translation. In other words, even if the database contains images that are similar in content to the query image but differ in orientation, position or size, the system should still correctly match the query image with its prototype in the database. We aim to develop a system that has both the high accuracy and the speed required for online
applications.

1.4 Motivation

The motivation for content-based image retrieval arises from its applications in numerous fields, as diverse as assembly-line inspection, robotic navigation and medical image recognition for the detection of various deformities. The ease with which humans turn the visual content of an image into meaningful information is not yet well understood. Existing image retrieval systems employ colour, shape and texture attributes with varying degrees of success. The present scenario leaves plenty of scope for the development of newer and more efficient retrieval algorithms.

1.5 Thesis Outline

The outline of the thesis is as follows. Chapter 2 briefly reviews the relevant literature on content-based image retrieval methods. We propose the information theoretic approach to segmentation in chapter 3. Chapter 4 focuses on feature extraction and the organization of data for efficient retrieval. Experimental results on an image database are presented in chapter 5. Chapter 6 presents the conclusions and some ideas for future research.
Chapter 2

Literature Survey

Content-Based Image Retrieval involves three fundamental processes: image segmentation, visual feature extraction and multi-dimensional indexing. Much of the past work has concentrated on identifying appropriate models for image features such as colour or texture. For each database image, a feature vector describing its visual features is computed and stored. Given a query image, its feature vector is evaluated, and the images whose feature vectors are nearest to that of the query image are retrieved. A general CBIR system is shown in figure 2.1. In the following sections we briefly review the literature on these three important processes, highlighting the major contributions, techniques and methods available.

2.1 Segmentation

Segmentation is an important process in content-based image retrieval, as both the shape features and the layout features depend on a good segmentation. Segmentation can be defined as the process of extracting from an image the objects that are then used for feature extraction. Image segmentation plays an important part in scene analysis and image understanding. Many techniques have been proposed in the literature; they can be broadly classified into two main categories: region-based and edge-based. Region-based approaches work by examining the neighbourhood of a pixel for a certain similarity, whereas edge-based approaches try to detect the dominant regions from a gradient image through edge following and linking. Segmentation techniques include region growing, watershed analysis, region clustering, etc. One of the earliest methods of image segmentation for grey-scale images was proposed by Pavlidis [8]. In this approach, segments are obtained by region growing, and the edges between regions are then eliminated or modified based on contrast, gradient and the shape of the boundary.
In [9] Hansen and Higgins employed a fast algorithm for watershed analysis along with relaxation labeling. The image was subdivided into catchment basins, and relaxation labeling was then used to refine and update the classification. In [10] Belongie et al. proposed an automatic segmentation algorithm based on clustering in a joint spatial colour-texture space by Expectation-Maximization. It iteratively models the joint distribution of colour with a mixture of Gaussians. The user can directly access the regions and specify which aspects of the image are important to the query.
Figure 2.1: A typical Content-Based Image Retrieval system.

2.2 Feature Extraction
Traditional image retrieval systems use a single visual attribute such as shape, colour or texture to represent the image, and retrieval is based on the features used to represent that attribute. Although this approach is simple, it may lack sufficient discriminatory information and might not be able to accommodate large scale and orientation changes. For example, a colour-based scheme may not be able to distinguish between an apple and a red house. The following sections discuss the traditional approaches using a single attribute. A review of newer work integrating these attributes is given later.

2.2.1 Colour
Colour is one of the most widely used visual features in image retrieval. Not only is it robust to background complications, it is also independent of image size and orientation. Owing to the ready availability of colour in digital image libraries in the form of RGB, HSI, etc., it has been used extensively as a feature in CBIR systems. Some studies of colour perception and colour spaces can be found in [11, 12]. The colour histogram is the most commonly used feature representation. Statistically, it represents the joint probability of the intensities of the three colour channels. Colour histograms are generally invariant to translation and rotation, and normalization makes them insensitive to scaling. However, a colour histogram fails to incorporate the spatial connectivity information of the pixels, which can lead to incorrect retrieval. Besides colour histograms, many approaches such as Colour Moments [13] and Colour Sets [14] can be found in the literature. Many research results have extended the global colour features to local ones by dividing the image into a number of sub-blocks and extracting colour features from each of them. Some of these can be found in [15].
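As an illustration, a quantized joint colour histogram of the kind described above can be computed in a few lines. This is only a sketch, not the system's actual implementation; the choice of 8 bins per channel is an assumption for the example.

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Normalized joint colour histogram of an RGB image.

    `image` is an (H, W, 3) uint8 array. The histogram is invariant to
    translation and rotation, and normalization (dividing by the pixel
    count) makes it insensitive to scaling, as noted in the text.
    """
    # Quantize each channel into `bins` levels, giving values in 0..bins-1.
    quantized = (image.astype(np.uint32) * bins) // 256
    # Combine the three channel indices into a single joint bin code.
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()
```

Two images can then be compared by any histogram distance, e.g. the L1 distance between the returned vectors.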
2.2.2 Shape
Although colour seems to be a highly reliable attribute for image retrieval, it cannot provide the discrimination that is demanded of a CBIR system. Incorporating shape features can greatly
enhance the selectivity and improve the performance. Shape is also an important attribute when binary or gray-scale images have to be dealt with. Shape representations can be broadly divided into two categories: boundary-based and region-based. The boundary-based techniques employ information about the object's outer boundary as a feature, while the region-based approach uses the entire region information to form the feature vectors. Boundary-based methods include polygonal approximation of shape [16] and shape matching using Fourier descriptors [17, 18]. Region-based methods include object matching using invariant moments [19]. The main idea of Fourier descriptors is to use the Fourier-transformed boundary as the shape feature. It helps to control digitization noise in the image boundary. An improved Fourier descriptor algorithm that is invariant to noise and geometric transformation is proposed in [20]. The main idea of invariant moments is to use region-based moments, which are invariant to transformations, as shape features. In [19], Hu identified seven such moments that are invariant to translation, scaling and rotation. Many improvements have been suggested to incorporate the effects of digitization on these moments, such as [21].

2.2.3 Texture
Texture refers to the visual patterns that have properties of homogeneity that do not result from the presence of only a single colour or intensity. It is an innate property of all surfaces and provides important information about the structural arrangement of surfaces and their relationship to the surrounding environment. In the absence of colour and shape, texture can act as a vital attribute for image classification. Texture analysis is an important and useful area of study in computer vision and can be divided into two main classes: statistical and structural. Statistical methods define texture in terms of the spatial distribution of grey values.
These include the use of co-occurrence matrices [22] and autocorrelation methods (extracting the periodicity of repetitive textural elements). Tamura et al. [23] explored texture representation on the basis of psychological studies and developed six texture properties: coarseness, contrast, directionality, regularity, line-likeness and roughness. These are based on computational approximations to the visual textural features. Structural or geometric methods describe texture in terms of texels, which repeat according to placement rules that may themselves be deterministic or random. A texel contains several pixels, and its placement can be periodic, random or quasi-random. Smith and Chang [24] used statistics (mean and variance) extracted from the wavelet sub-bands as a texture representation. In a more recent paper [25], Ma and Manjunath evaluated texture image annotation by orthogonal and bi-orthogonal wavelet transforms and the Gabor wavelet transform. They found the Gabor transform to perform best among the tested candidates, matching the human vision study results.

2.3 Indexing
To make content-based image retrieval truly scalable to large image collections, efficient multi-dimensional indexing techniques need to be explored. The two main challenges before such an indexing technique are:
1. High dimensionality.
2. Non-Euclidean similarity measures.
The basic idea for solving these problems is to first perform dimensionality reduction and then apply an appropriate multi-dimensional indexing measure supporting non-Euclidean similarity measures.

2.3.1 Dimensionality Reduction
Even though the dimensionality of a feature vector is very high, the effective or embedded dimensionality is much lower. There are two popular ways of dimensionality reduction: the Karhunen-Loeve Transform (KLT) and column-wise clustering. The KLT and its variations have been studied by many researchers in fields ranging from face recognition to fingerprint recognition. Faster implementations of the KLT have been proposed by Faloutsos and Lin [26]. It provides a dynamic update of the indexing structure, which is indispensable for an application that requires newer images to be added to the database. Clustering is another powerful tool for dimensionality reduction, employed in various fields such as pattern recognition, speech recognition and information retrieval. Normally it is used to cluster similar objects to form groups. This type of clustering is called row-wise clustering. However, the same technique can be applied column-wise, yielding dimensionality reduction [27].

2.3.2 Multi-Dimensional Indexing
The existing popular multi-dimensional indexing techniques include the bucketing algorithm, the k-d tree, the priority k-d tree [28], the quad-tree, and R and R+ trees [29, 30]. Most of these approaches are based on Euclidean similarity measures, which may not be applicable in image retrieval systems. Two important techniques employed towards solving this problem are clustering and neural networks. Various clustering algorithms supporting incremental clustering have been proposed in the literature, such as those by Charikar [31] and Rui and Chakrabarti [32]. In [33] Zhang proposed the use of Self Organizing Map (SOM) neural nets as a tool for constructing the tree indexing structure in image retrieval. The advantages of using SOMs are the learning ability, dynamic clustering and the potential to support arbitrary similarity measures.

2.4 Discussion
We find that segmentation is one of the most important steps in a CBIR system, and the accuracy of the subsequent steps depends very much on the accuracy achieved during segmentation. Also, colour alone may not be sufficient for representation as a feature; incorporating shape and texture will therefore improve the quality of retrieval. The features chosen should be such that they can also handle changes in orientation and scale. For a large database, efficient indexing is one of the main aspects to be considered. Improved performance can be achieved by integrating various features combined with faster indexing and retrieval methods.
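The KLT-based dimensionality reduction of section 2.3.1 amounts to projecting feature vectors onto the leading eigenvectors of their covariance matrix. The following is a minimal sketch under that interpretation; the function name and component count are illustrative, not part of any cited implementation.

```python
import numpy as np

def klt_reduce(features, n_components):
    """Project feature vectors onto their top principal directions (KLT/PCA).

    `features` is an (n_samples, n_dims) array; the result is the reduced
    (n_samples, n_components) representation.
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return centered @ eigvecs[:, order]
```

The retained components are ordered by decreasing variance, so the first column of the output carries the most discriminatory spread.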
Chapter 3
Image Segmentation

Image segmentation is the first and most important step in the image recognition and understanding process. The purpose of segmentation is to divide the image into homogeneous regions. Generally, the pixels of an object are homogeneous with respect to each other compared with the pixels in the other parts of the image, so a good segmentation helps in separating the objects from the other parts of the image. Since recognition and understanding are done on the basis of the objects, the success of this step is crucial for a good performance of the entire system, as errors in this step will be propagated further. This is the reason that accurate and automated image segmentation plays an important role in image processing systems. It is desirable for a colour image segmentation algorithm to be insensitive to shadows, changes in lighting and surface reflection properties. Traditionally, much of the segmentation literature is devoted to the segmentation of gray-scale images. However, the images in our domain are general colour images. Instead of discarding the colour information, we use it to get a better segmentation. In the following sections, we present two segmentation algorithms. The first algorithm follows a region-based approach, while the second is based on information theoretic clustering.

3.1 Region-Based Segmentation
The underlying assumption behind the region-based approach is that the objects in an image generally consist of similar and connected pixels. The problem then is to efficiently determine the similar pixels and group them. For an N × N image, let {F(x, y); x, y = 1, 2, ..., N} be a two-dimensional image pixel array. For colour images, F(x, y) represents the colour at the pixel (x, y). Assuming the colour information is represented in the form of the three primary colours red, green and blue, the image function can be written as F(x, y) = {FR(x, y), FG(x, y), FB(x, y)}.
The basic procedure is to examine the neighbourhood of a pixel and assign it the label of a neighbour if it is similar to it. This is done by employing a simple raster scan, i.e., scanning the image left to right, top to bottom. The idea is to examine the neighbouring pixels which have previously been assigned labels and assign the label of a similar neighbour to the new pixel. If there is no similar neighbour, a new label is assigned to the pixel and the pixel begins a new region. In some cases, there may be more than one similar neighbour with different labels; in such cases the similar labels are
marked equivalent and the new pixel is assigned the lower label. We have considered an 8-connected neighbourhood, but only four neighbouring pixels need to be examined, since only these pixels would have been assigned labels previously. This becomes clear from figure 3.1. The central pixel is the pixel that is to be assigned a label. At this point, the four shaded neighbours have been assigned labels, so only these need to be considered.

Figure 3.1: The 8-connected neighbourhood of the pixel to be assigned a label

The algorithm is divided into three parts: region growing, region merging and removing small regions. We describe these in the following sections.

3.1.1 Region Growing
Starting with the top-left pixel, the algorithm scans the image from left to right and top to bottom and, for each pixel, examines the four neighbouring pixels which have been assigned labels previously and finds their similarity to the current pixel. The similarity criterion is based on the per-component distance between the pixels. For two pixels F1 = {FR1, FG1, FB1} and F2 = {FR2, FG2, FB2} the distance is given as:

di(F1, F2) = |Fi1 − Fi2|, i ∈ {R, G, B} (3.1)

where di is the distance for the ith colour component. The criterion for similarity between the two pixels is then given as:

di(F1, F2) < Ti, i ∈ {R, G, B} (3.2)

where Ti is the threshold value corresponding to the ith colour component. The threshold values determine the quality of the segmentation: a small value may lead to over-segmentation whereas a higher value may cause under-segmentation. Good threshold values vary from image to image, so we have taken the threshold to be 30% of the standard deviation of the entire image. This has the advantage that the threshold values change according to the contrast of the image, lower for low-contrast and higher for high-contrast images. It also helps to cancel the effect of locally noisy pixel values.
The choice of the thresholds is found to be rather conservative, and therefore another criterion is employed along with the above, which is as follows:

di(µ1, F2) < κσi (3.3)
where µ1 is the mean value of the region to which the neighbouring pixel belongs and σ is its standard deviation. The subscript i denotes the ith colour component, i ∈ {R, G, B}. σ is the average distance of the pixel values from the mean value of the region, so the effect of the above criterion is to increase the threshold value gradually so as to prevent over-segmentation; however, this will not merge two non-homogeneous parts, as the pixel values in that case change rapidly. This increase in the threshold value is controlled by the factor κ. By experimentation we have found a factor of 1.10 to be most appropriate. This does produce some over-segmentation, but this is taken care of in the later stages. So the combined criterion is

di(F1, F2) < Ti or di(µ1, F2) < 1.1σi (3.4)

The criterion is applied separately on each colour component and should be satisfied by all the components. Once the distances have been computed, the labels are assigned based on the number of similar neighbours. If there is exactly one similar neighbour, its label is assigned to the new pixel and the mean and variance of the region corresponding to this label are modified. If there is more than one similar neighbour, their labels are marked as equivalent and the lower of those labels is assigned to the new pixel, thereby merging the corresponding regions. If no similar neighbour is found, a new label is assigned to the current pixel. The values of the mean and standard deviation of the region to which the pixel is added are updated for the addition of the current pixel. The algorithm for region growing is shown in Figure 3.2. The output of region growing is normally a heavily over-segmented image, so we apply a region-merging algorithm. Prior to merging, the image is blurred by low-pass filtering with a 3x3 window, so that small variations and fine texture are smoothed out. This helps in bringing the properties of similar regions closer to one another.

Algorithm 1
1.
Starting with the left-most pixel, scan the image from left to right, top to bottom. Assign a label to the first pixel.
2. For all four neighbouring pixels, evaluate the distance based on equation 3.1.
3. Using equation 3.4, find the set of similar neighbours that satisfy the condition.
4. Assign the pixel the same label as that of the similar neighbour. In case of more than one matching neighbour, mark their labels as equivalent. Update the value of the region standard deviations.
5. If no similar neighbour is found, assign a new label to the pixel.
6. Repeat the entire procedure till the end of the image is reached.

Figure 3.2: Region growing algorithm
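The raster-scan labelling above can be sketched as follows. This is a simplified illustration, assuming a float RGB image: it applies only the fixed per-channel threshold of (3.2) (30% of each channel's standard deviation) and, for brevity, omits the running-mean criterion (3.4) and the equivalence-merging of step 4.

```python
import numpy as np

def region_growing(image, frac=0.3):
    """Single-pass raster-scan labelling (simplified sketch of Algorithm 1)."""
    h, w, _ = image.shape
    # Per-channel thresholds: 30% of each channel's standard deviation.
    T = frac * image.reshape(-1, 3).std(axis=0)
    labels = np.zeros((h, w), dtype=int)
    next_label = 1
    for y in range(h):
        for x in range(w):
            best = 0
            # The four previously-labelled 8-connected neighbours (figure 3.1).
            for ny, nx in ((y, x - 1), (y - 1, x - 1), (y - 1, x), (y - 1, x + 1)):
                if ny < 0 or nx < 0 or nx >= w:
                    continue
                if np.all(np.abs(image[y, x] - image[ny, nx]) < T):
                    best = labels[ny, nx]
                    break
            if best == 0:           # no similar neighbour: start a new region
                best = next_label
                next_label += 1
            labels[y, x] = best
    return labels
```

On an image split into a dark left half and a bright right half, the sketch assigns one label per half.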
3.1.2 Region Merging
Once the regions have been generated by the growing process, the large number of regions needs to be merged. This is done by merging similar regions based on a similarity criterion. In our case, two regions R1 and R2 are considered to be similar if

|µ1 − µ2| < κ(σ1 + σ2) (3.5)

where µi represents the mean value of region i, σi is its standard deviation and κ is a scale parameter controlling the extent to which the two regions are merged. Since most of the pixels lie within a distance of σ from the mean value µ, κ should be greater than unity if the two regions are to be merged. By experimentation, we have found that κ = 1.2 gives good performance. The algorithm for region merging is shown in Figure 3.3. The region-merging step brings down the number of regions considerably (by a factor of 2-10). Still we are left with a large number of small regions, particularly for images having high texture content. So we merge all the regions whose area is less than one percent of the image area into the closest matching neighbouring region. This is described next.

Algorithm 2
1. Mark all the regions as undecided. While there are undecided regions do
2. Pick one of the undecided regions and mark it as decided and as the current region Rc.
3. Examine the neighbouring regions of Rc and evaluate their similarity with this region based on the similarity criterion in equation 3.5.
4. Merge the current region Rc with all the regions which are similar and set current region = merged region.
5. Mark the regions that are similar as decided.
6. Continue step 3 till no more similar regions are found.
7. Continue step 2 till there are no more undecided regions.

Figure 3.3: Region merging algorithm

3.1.3 Region Pruning
The number of very small regions, which do not contribute much to the prominent visual content, is usually large, so we use a region pruning method.
Here, regions smaller than a particular size are simply merged into the nearest most similar region. The algorithm for this is shown in Figure 3.4. At the end of the three algorithms, a well-segmented image is obtained which can now be used to extract the feature vectors for storage in the database.
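The merging test (3.5) and the update of region statistics when one region absorbs another can be sketched as below, with κ = 1.2 as found above; the helper names are illustrative, not part of the system's code.

```python
import numpy as np

KAPPA = 1.2  # merging scale parameter from the text

def regions_similar(mu1, sigma1, mu2, sigma2, kappa=KAPPA):
    """Criterion (3.5): |mu1 - mu2| < kappa * (sigma1 + sigma2),
    applied per colour component; all components must satisfy it."""
    mu1, mu2 = np.asarray(mu1), np.asarray(mu2)
    sigma1, sigma2 = np.asarray(sigma1), np.asarray(sigma2)
    return bool(np.all(np.abs(mu1 - mu2) < kappa * (sigma1 + sigma2)))

def merged_mean(n1, mu1, n2, mu2):
    """Mean of the union of two regions with pixel counts n1 and n2."""
    return (n1 * np.asarray(mu1) + n2 * np.asarray(mu2)) / (n1 + n2)
```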
Algorithm 3
1. Mark all the regions with area less than one percent of the image area as undecided.
2. Pick up an undecided region and mark it as the current region Rc.
3. Find the closest matching neighbouring region Rm, merge Rc with it and mark the merged region as Rc. If the area of Rm is less than one percent of the image area, mark Rm as decided.
4. Do step 3 till the area of Rc is no longer less than one percent of the image area.
5. Continue step 2 till there are no undecided regions.

Figure 3.4: Region pruning algorithm

3.2 Information Theoretic Approach
Apart from the gradient, histogram and region based approaches, there is another approach to image segmentation which has gained attention in the recent past. This approach bears similarity to the approach used for data clustering; in a way, image segmentation can be viewed as a clustering problem. The clustering is supervised when we have information about the number of regions and their characteristics. Examples are domain-specific applications like industrial inspection, e.g. automated inspection of electronic assemblies with the objective of determining the presence or absence of specific anomalies such as missing components or broken circuit paths. On the other hand, the clustering is unsupervised when we do not have any prior information about the image content. Examples include the segmentation of general real-world images. The information theoretic approach to clustering has a long history, particularly in the fields of artificial intelligence and machine learning. Well known examples based on this approach are Quinlan's [34] ID3 for constructing decision trees, Fisher's [35] COBWEB and Gennari's [36] CLASSIT. Recent work on image segmentation using the information theoretic approach includes [10, 37]. In the following sections, we present an algorithm based on this approach. Section 3.3 discusses the basis for determining the suitable number of clusters.
In section 3.4 we present its application to image segmentation, while section 3.5 presents the segmentation algorithm.

3.3 Information Gain by Clustering
Information can be defined in several ways. For our purpose, we define information as the ability to correctly predict the attributes of instances. This definition is intuitive, since the more information we have about the instances, the better our prediction of their attributes will be. The attributes in our case are the features derived from the image which represent its content, such as colour, texture, etc. The basic idea behind partitioning is that the partitioning of objects into certain classes leads to an increase in our information. Membership of a class imposes certain restrictions on the
values of their attributes, thereby increasing the ability to predict them. This also corresponds to the purpose of image segmentation, as by segmenting the image we are trying to find regions which are homogeneous. Based on the definition of information above, we define the classification gain as the increase in information obtained by partitioning over the information that is available without any such partitioning. Assuming that the attributes are independent of one another, the expression for the classification gain can be written as [35]

Gain(K) = [ Σ_{k=1}^{K} P(Ck) Σ_{i=1}^{I} Σ_j P(Ai = Vij | Ck)^2 − Σ_{i=1}^{I} Σ_j P(Ai = Vij)^2 ] / K (3.6)

where K is the number of classes, I is the number of attributes and the innermost sums run over the values Vij of attribute Ai. As shown by Gluck and Corter [38], the subexpression Σ_i Σ_j P(Ai = Vij | Ck)^2 is the expected number of attribute values that can be correctly guessed for an arbitrary member of class Ck. It assumes that one guesses a value Vij for an attribute Ai with a probability equal to its probability of occurrence, i.e. P(Ai = Vij | Ck), and that this guess is correct with the same probability. The first term in the numerator of (3.6) is therefore a measure of the expected number of correct guesses given a set of K categories, while the second term represents the expected number of correct guesses without this knowledge. The division by K lets one compare clusterings of different sizes and acts as a penalty on the increase in the number of categories.

3.4 Using Classification Gain for Segmentation
The expression for classification gain in (3.6) assumes that the attributes of instances take discrete values. In our case, the instances are image pixels and the attributes are features extracted from them for segmentation. These attributes are colour, texture features, etc., so the values that they take are in general continuous.
We therefore need to generalize (3.6); in particular, the two innermost summations Σ_j P(Ai = Vij | Ck)^2 and Σ_j P(Ai = Vij)^2 need to be generalized to the continuous domain. The summations then change to integrations and we need to make some assumption about the distribution of values. Without any prior knowledge, we assume that the values of the attributes in each class follow a Gaussian distribution. Though the validity of the assumption of such a simple distribution can certainly be questioned, the experimental results suggest that this assumption is approximately correct for general real-world images. For the first summation, the distribution is that of a particular class, while the second summation uses the distribution for the whole image, which can be viewed as a single class. In either case, the integral becomes

Σ_j P(Ai = Vij)^2 → ∫_{−∞}^{∞} [ (1 / (σi √(2π))) exp( −(x − µi)^2 / (2σi^2) ) ]^2 dx = 1 / (2√π) · 1/σi (3.7)

where µi is the mean and σi is the standard deviation. Since the expression for the gain is to be used for comparison only, the factor of 1/(2√π) can be discarded. So our expression for the gain simplifies to
Figure 3.5: The image of an elephant and its representation in Luv space

Gain(K) = [ Σ_{k=1}^{K} P(Ck) Σ_{i=1}^{I} (1/σik) − Σ_{i=1}^{I} (1/σi) ] / K (3.8)

where I is the number of features, K is the number of classes, σik is the standard deviation of a given feature in a given class and σi is the standard deviation of that feature in the entire image. From (3.8) it is evident that maximizing the gain requires maximizing 1/σik, or minimizing σik, which is equivalent to maximizing the intra-class similarity. The use of (3.8) however introduces a problem: when σ = 0, the value of 1/σ becomes infinite. To resolve this, we use the notion of acuity as suggested by Gennari [36], a system parameter that specifies a minimum value for σ. Specifying a minimum value for σ is motivated by the fact that our perception does not have infinite resolution. The limit on σ corresponds to the notion of a "just noticeable difference" in psychophysics - the lower limit on our perception ability.

3.5 Colour Image Segmentation
3.5.1 The Segmentation Algorithm
We do the segmentation on the basis of colour features. The Luv space is used for its perceptual uniformity. It also decouples the luminance and colour components, which is important for our assumption of their independence. The system starts by extracting the feature vector at each pixel. For an image with N pixels, we get N data points which give its representation in the feature space. Figure 3.5 shows such a representation. These data points are then clustered by the k-means algorithm. The number of clusters K into which the data is clustered is varied, and the value of the gain resulting from the classification is calculated for each K by (3.8). We vary K from 2 to 10. The value of K for which the maximum gain is obtained is taken as the optimum. This gives the partitioning, or segmentation, of the image in the feature space. To calculate the classification gain by (3.8), we have yet to specify σmin, the minimum value of σ.
As discussed before, this limit on σ is an indication of the lower limit of our perception ability. This provides the clue for determining σmin. Since we are using colour features, it is possible to experimentally determine the minimum difference in the values of, say, luminance that is "just noticeable". Let this difference be denoted σabs min. Since the Luv space is perceptually uniform, the value of σabs min for the three components can be taken to be the same. However, the use of
σabs min as σmin leads to over-segmentation because, though the colour within an object is generally uniform, the variation is often perceptible. Normally, there are only a few dominant colours in an image, so we set the value of σmin to be a fraction of the σ over the entire image. The value of the fraction used by us is 0.1. For images having low contrast, this value sometimes goes below σabs min. So an appropriate estimate for σmin is

σmin = max(σabs min, 0.1 × σ) (3.9)

The classification of the feature vectors gives the segmentation in the feature space. However, the pixels belonging to the regions in the image must also be spatially connected for the region to provide a meaningful representation of objects. Therefore, to extract the regions we group the pixels which are spatially connected and belong to the same cluster in the feature space. This is done by labelling the pixels in the image by the cluster to which they belong and grouping the spatially connected pixels having the same label. An algorithmic description of the entire process is given in Figure 3.6.

Algorithm 4
1. Extract feature vectors from the image.
2. Initialize MaxGain=0, OptimumClusterNumber=-1.
3. For K=2 to 10
Cluster by the k-means algorithm.
Calculate the gain.
if gain > MaxGain
MaxGain = gain.
OptimumClusterNumber = K.
end
end.
4. Corresponding to the OptimumClusterNumber, find the cluster centers.
5. Label each pixel by the cluster to which it belongs.
6. Group spatially connected pixels belonging to the same cluster.

Figure 3.6: Information based segmentation algorithm

3.5.2 k-means Clustering
The k-means algorithm is a generalization of the Lloyd-Max algorithm to multiple dimensions. The straightforward extension of the Lloyd-Max algorithm to multiple dimensions is, however, computationally expensive, particularly for a large number of data points (even for a small image of dimensions 300x200, we get 60,000 data points).
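The K-selection loop of Algorithm 4 can be sketched as follows. Note that `kmeans` here is plain Lloyd iteration, a stand-in for the fast kd-tree-accelerated variant of [39], and the gain function is passed in as a parameter; both names are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd iterations (stand-in for the kd-tree variant of [39])."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the means.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def best_clustering(X, gain_fn, k_max=10):
    """Algorithm 4 steps 2-3: try K = 2..k_max, keep the K of maximum gain."""
    best_k, best_gain, best_labels = -1, -np.inf, None
    for k in range(2, k_max + 1):
        labels = kmeans(X, k)
        g = gain_fn(labels, X)
        if g > best_gain:
            best_k, best_gain, best_labels = k, g, labels
    return best_k, best_labels
```

In the full system `gain_fn` would be the classification gain of (3.8); any scoring function with the same signature can be substituted.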
Fortunately, some fast implementations of the k-means algorithm [39, 40] have recently been proposed which attempt to reduce the number of distance
computations by arranging the data points in a suitable data structure and utilising some geometrical constraints. We use the algorithm proposed by Kanungo and others [39]. Given a set of initial center points, the k-means algorithm iteratively updates the center points based on the minimization of a cost function. The most commonly used cost function is the mean-square error function. The performance of the algorithm often depends on the choice of the initial center points. There is no known way of selecting a set of initial center points which guarantees a global optimum. Here we use a simple strategy for choosing the set of initial center points which is good enough for our purpose. Though there is no proof that this strategy will provide a global optimum, it is definitely much better than choosing random initial points. Experimental results also suggest that it performs very well without significantly increasing the computational time. The k-means algorithm proposed in [39] uses a kd-tree to organize the data points. A kd-tree [28] is a binary tree which represents a hierarchical subdivision of the data set. Each node of the kd-tree is associated with a closed hyperbox, called a cell. The root's cell is the bounding box of the whole data set. If a cell contains one point, it is declared to be a leaf. Otherwise, it is split into two hyperrectangles by an axis-orthogonal hyperplane. The points in the cell are then partitioned to one side or the other of this hyperplane. The resulting subcells are the children of the original cell, thus leading to a binary tree structure. Thus, the use of the kd-tree provides a hierarchical organisation of the data. We use this property of the kd-tree to derive our set of initial center points. Let N be the number of nodes at depth d in a kd-tree. Then N and d are related by

N = 2^d (3.10)

Let k be the number of clusters into which the data is to be classified. Let Nd be the set of nodes at depth log2 k + 1.
It may be recalled that each node of the kd-tree is associated with a cell which occupies a part of the data space. Let C be the centroid of the data points contained in the cell corresponding to a node, and let Nc be the set of centroids associated with the nodes at depth log2 k + 1. We treat this set of centroids as the candidates for our set of initial center points. Let Nk denote a set of k centroids randomly chosen from this set. Considering the points in Nk as the cluster centers, we find the mean-square error. This is then repeated for a different Nk. The set which gives the least mean-square error is taken as the set of initial center points. This approach is similar in spirit to genetic algorithms.

3.5.3 Post-processing
The segmentation algorithm gives good results; however, the following problems sometimes arise:
• Sometimes when the image contains a large background with gradual colour variation, the background gets split into two or more parts. This happens due to the large (though uniform) spread of the cluster corresponding to the background in the feature space, which causes it to break into two or more smaller clusters. To circumvent this, we employ an edge-based post-processing. Specifically, if a large part (at least 75%) of the common boundary between two regions has low gradient, the two regions are merged. Let T90 be the gradient value such that at least 90 percent of the pixels in the image have their gradient less than T90. A pixel is considered to have a low gradient if its gradient value is less than 0.5 × T90.
  • 26. • Normally, a group of spatially connected pixels belongs to the same cluster; due to noise, however, a pixel may sometimes belong to a different cluster than its neighbouring pixels. This results in the formation of very small regions consisting of a few pixels. Such small regions carry no importance and are ignored. Specifically, we ignore a region if its area is less than one percent of the total image area. 17
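The two post-processing rules above (merging across low-gradient boundaries and pruning tiny regions) can be sketched as follows. This is a simplified illustration with our own function names, operating on precomputed gradient magnitudes and region areas:

```python
def gradient_threshold(gradients, pct=0.90):
    """T90: the gradient value below which `pct` of all pixels fall."""
    g = sorted(gradients)
    return g[min(int(pct * len(g)), len(g) - 1)]

def should_merge(boundary_gradients, t90, frac=0.75, scale=0.5):
    """Merge two regions if at least `frac` of their common boundary
    pixels have gradient below `scale` * T90."""
    low = sum(1 for g in boundary_gradients if g < scale * t90)
    return low >= frac * len(boundary_gradients)

def prune_regions(region_areas, image_area, min_frac=0.01):
    """Keep only regions whose area is at least `min_frac` of the image."""
    return [r for r, a in region_areas.items() if a >= min_frac * image_area]
```

For example, a boundary where 3 of 4 pixels have gradient below 0.5 × T90 meets the 75% criterion and triggers a merge.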
  • 27. Chapter 4 Feature Extraction and Database Organization

4.1 Feature Extraction

Feature extraction can be viewed as a mapping of an image to a feature space. Let f represent a mapping from the image space onto an N-dimensional feature space x = {x1, x2, ..., xN }, i.e., f : F → x, where N is the number of features used to represent the regions of the image. For two different regions, feature extraction should produce two feature vectors which are distinct and dissimilar, while for two similar regions the feature vectors should also be similar. This similarity may be evaluated based on some distance measure. An efficient matching scheme depends on the amount of discriminatory information contained in the extracted features. Various representations have been discussed in the literature, such as Fourier descriptors, histograms of edge angles and invariant moments for shape, and colour variance, colour histograms, etc. for colour. The extracted features should be invariant to rotation, scale and reflection. The following sections discuss the features used to represent the images in the database.

4.2 Shape Features

Image retrieval based on object shape is considered to be one of the most difficult aspects of content-based image retrieval because of the difficulties of low-level image segmentation and the variety of ways a given 3D object can be projected into 2D shapes. The features used for shape representation should provide sufficient discriminatory shape information that is more or less invariant to various projections.

4.2.1 Invariant Moments

Moment invariants are a set of seven moments that are invariant under scale, reflection and rotation. The shape of an object can be expressed in terms of these 7 invariant moments [19]. For an image, 18
  • 28. the central moment of order (p+q) is given by:

µpq = Σx Σy (x − x̄)^p (y − ȳ)^q (4.1)

where x̄ and ȳ represent the means of the x and y co-ordinates of the region respectively and are given as:

x̄ = (Σx Σy x) / n,  ȳ = (Σx Σy y) / n (4.2)

where n is the number of points lying in the region. The normalized central moments, denoted ηpq, are defined as:

ηpq = µpq / µ00^γ (4.3)

where

γ = (p + q)/2 + 1 (4.4)

for p + q = 2, 3, ... A set of moment invariants based on the 2nd and 3rd order moments is given as follows:

M1 = η20 + η02,
M2 = (η20 − η02)² + 4η11²,
M3 = (η30 − 3η12)² + (3η21 − η03)²,
M4 = (η30 + η12)² + (η21 + η03)²,
M5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²],
M6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03),
M7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]. (4.5)

M1 through M6 are invariant under rotation, scale and reflection. M7 is invariant only in its absolute magnitude under a reflection.

4.2.2 Eccentricity and Compactness

Two more features, Eccentricity and Compactness, are used for the shape representation. They are defined, as given by [41], as follows:

Eccentricity = Imin / Imax = [µ20 + µ02 − √((µ20 − µ02)² + 4µ11²)] / [µ20 + µ02 + √((µ20 − µ02)² + 4µ11²)] (4.6)

19
  • 29. where µpq is the (p+q) order central moment defined in eq. (4.1), and Imin and Imax represent the short axis and the long axis respectively. Eccentricity can thus be viewed as the ratio of the minor axis to the major axis of the best-fitting ellipse of the shape.

Compactness = 4πA / P² (4.7)

where P is the perimeter and A is the area of the polygon describing the shape. Compactness expresses the extent to which a shape is a circle: a circle's compactness is 1 and a bar's compactness is close to 0.

4.3 Colour Features

Colour is one of the most recognizable elements of image content, and is a very important attribute in extracting information from images. It is relatively robust to background complications and independent of image size and orientation. We treat colour in the Luv space because of its perceptual uniformity, and the colour features are given by the mean and the variance of the colour in each region. That is, the features corresponding to colour are:

µi = (Σx Σy Fi(x, y)) / n,  σi² = (Σx Σy (Fi(x, y) − µi)²) / n (4.8)

for all (x, y) in the region, where n is the number of points in the region and i ∈ {L, u, v}.

The six colour features combined with the nine shape features constitute a set of fifteen attributes that form the feature vectors used to set up the database. On the basis of these features it is possible to perform retrieval through colour, shape or both. Since the number of feature vectors that need to be stored for a large image database is quite high, the query processing time largely depends on the size of the database. The need for faster query processing can be addressed by efficient database organization and indexing. Since the retrieval returns the k nearest matches, an indexing scheme that supports such retrieval is needed. In the following section, we describe the database organization.
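The shape features of Sections 4.2.1 and 4.2.2 can be computed directly from a region's pixel coordinates. A self-contained sketch (our own function names, plain Python rather than the system's implementation):

```python
import math

def central_moment(points, p, q):
    """mu_pq of eq. (4.1), taken about the region centroid."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    return sum((x - xbar) ** p * (y - ybar) ** q for x, y in points)

def hu_moments(points):
    """The seven invariant moments of eq. (4.5), built from the
    normalized central moments eta_pq = mu_pq / mu_00^((p+q)/2 + 1)."""
    mu00 = central_moment(points, 0, 0)
    def eta(p, q):
        return central_moment(points, p, q) / mu00 ** ((p + q) / 2 + 1)
    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    e30, e03, e21, e12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = e30 + e12, e21 + e03            # recurring sums in M4..M7
    m1 = e20 + e02
    m2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    m3 = (e30 - 3 * e12) ** 2 + (3 * e21 - e03) ** 2
    m4 = a ** 2 + b ** 2
    m5 = ((e30 - 3 * e12) * a * (a ** 2 - 3 * b ** 2) +
          (3 * e21 - e03) * b * (3 * a ** 2 - b ** 2))
    m6 = (e20 - e02) * (a ** 2 - b ** 2) + 4 * e11 * a * b
    m7 = ((3 * e21 - e03) * a * (a ** 2 - 3 * b ** 2) -
          (e30 - 3 * e12) * b * (3 * a ** 2 - b ** 2))
    return [m1, m2, m3, m4, m5, m6, m7]

def eccentricity(points):
    """Eq. (4.6): ratio of minor to major axis of the best-fitting ellipse."""
    mu20, mu02 = central_moment(points, 2, 0), central_moment(points, 0, 2)
    mu11 = central_moment(points, 1, 1)
    root = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    return (mu20 + mu02 - root) / (mu20 + mu02 + root)

def compactness(area, perimeter):
    """Eq. (4.7): 1 for a circle, near 0 for a thin bar."""
    return 4 * math.pi * area / perimeter ** 2
```

Rotating a region's pixel set by 90 degrees leaves all seven moments unchanged, which is a convenient sanity check on the invariance claims.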
4.4 Database Organization

Database organization is an important issue in the fields of data mining and pattern recognition. In the absence of any prior organization, a linear search through the whole database is required to determine the nearest neighbours of a given data point. Particularly for large databases, this linear search is time-consuming and often prohibitive. To support efficient searching, a number of ways of organizing the database have been proposed. At the heart of these methods is a way of hierarchically partitioning the database into smaller units. 20
  • 30. 4.5 Trees

4.5.1 Binary Trees

Tree structures have long been used for recursively partitioning a data set. The simplest of these is the binary tree. As the name suggests, each non-leaf node in a binary tree is split into two nodes, called its children. The data set at the node is partitioned into two non-overlapping parts, each associated with one of the children. If the partitioning is appropriate, the numbers of data points in the two nodes are approximately equal. The most widely used criterion for partitioning is median sub-division. Specifically, the median value of the points in the data set at the node is determined, and the points are divided on the basis of whether they lie to the left or right of the median value. This produces two children, each with approximately half of the points. Let the number of data points in the database be N. Based on median sub-division, the depth of the tree formed is then

d = log2 N (4.9)

To search for a point, its value is compared with the median value at each node. If the value is less, the search continues in the left child, else in the right child. This procedure is repeated till a leaf node is reached. The number of comparisons required is therefore proportional to the depth of the tree, which is logarithmic in the number of data points. The search complexity is therefore logarithmic:

t = O(log N) (4.10)

4.5.2 Multidimensional Trees

The counterpart of the binary tree in multiple dimensions is called the kd-tree. Splitting in a kd-tree is done on the basis of the median value along a particular dimension, generally taken as the one having the largest range. At each stage, therefore, the data is divided into two parts along the dimension of largest range. Each node in the kd-tree is then associated with a hyper-rectangle which encloses the data points associated with the node.
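The construction and point-location steps just described can be sketched in a few lines; a minimal, illustrative version (not the system's implementation):

```python
class KDNode:
    """kd-tree node: median split along the dimension of largest range."""
    def __init__(self, points):
        self.points = points
        self.left = self.right = None
        if len(points) > 1:
            d = len(points[0])
            self.dim = max(range(d), key=lambda i:
                           max(p[i] for p in points) - min(p[i] for p in points))
            pts = sorted(points, key=lambda p: p[self.dim])
            mid = len(pts) // 2
            self.split = pts[mid][self.dim]
            self.left, self.right = KDNode(pts[:mid]), KDNode(pts[mid:])

def locate(node, q):
    """Descend to the leaf cell containing q: one comparison per level,
    hence O(log N) comparisons for a balanced tree."""
    while node.left is not None:
        node = node.left if q[node.dim] < node.split else node.right
    return node
```

Note that `locate` only finds the cell containing the query point; as discussed in Section 4.5.3, the nearest neighbour may lie in a different cell.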
We will refer to this hyper-rectangle as the cell of the node. The search proceeds in a similar manner: at each non-leaf node, the search point's value along the dimension at which the node was split is compared. The search complexity is similarly logarithmic.

4.5.3 Limitations

The tree data structures by their very nature allow very fast identification of the cell in which the data point to be searched (referred to as the query data point hereafter) lies. However, the database may not contain the query data point itself. In this case the data point nearest to the query point is desired, and the cell containing the query point may not contain its nearest neighbour. Figure 4.1 illustrates this. Let us assume that the maximum distance rmax within which the nearest neighbour lies is known (a priori knowledge of rmax is not required; it is shown later how to estimate it). The search space then is a hyper-sphere (for searches based on euclidean distance) of radius rmax which 21
  • 31. Figure 4.1: The nearest point to the query point need not lie in the same cell as the query point. Point 1 represents the query point, whose nearest neighbour, point 3, lies in a different cell.

contains or intersects more than one cell, and all the cells within or intersecting this hyper-sphere should be evaluated. Further, the search is often required not just for a single nearest point, but for a certain number of nearest points. This is known as k-nearest neighbour search in the literature. The movement in a tree can be considered a kind of vertical or top-down movement which locates the cell containing the query data point very fast. However, once the cell containing the query point is located, what is required is a kind of horizontal movement 1 around that cell for searching the nearest neighbours. The trees by their very nature support only the vertical movement and not the horizontal movement. A number of methods have been proposed in the literature to search the k nearest neighbours around a given point. In doing so, most of the methods attempt to backtrack in the tree, which is not very efficient, because the data structures used by these methods do not provide support for the horizontal movement around a point. Another limitation of tree-based search is that it does not allow a weighted search for points in the multi-dimensional space, i.e. a search in which the weights on each dimension can be varied at run time. This is an important problem and arises frequently in a number of fields. In our case, it arises when searching for similar images from a database based on user feedback. The user can either state in advance (while querying) the degree of importance of the different features, e.g.
“the shape features are more important to me than the colour features for this image”, or the relative importance of the different features can be ascertained from the feedback provided by the user (“this set of images is relevant and this set is irrelevant for my query”). The same problem arises when the data points have different ranges in different dimensions and have to be normalized. The search problem then essentially translates to a weighted search in which more weight is given to the features which are more important and less to the others. In order to overcome the above mentioned limitations we present a data structure called hereafter C-tree, for connected tree. The data structure is based on the kd-tree; however, it differs in an important way in that it provides the possibility of horizontal movement around a point very efficiently. There is no restriction on the number of dimensions. We also show how this possibility of horizontal movement allows us to do weighted nearest neighbour search efficiently without modifying the tree.

1 Strictly speaking, this term should be applied only to two-dimensional data. In a broad sense, however, we mean a breadth-first search around the point, though the purpose is different. 22
  • 32. 4.6 C-tree

The C-tree is a data structure very similar to the kd-tree. It differs from the kd-tree in that each node also maintains information about its connected siblings apart from its children. Two nodes are connected siblings if they are at the same level in the tree and their cells share a common boundary. We distinguish between two kinds of connected siblings: corner connected and side connected. Two siblings are side connected if the common boundary between them is a non-zero hyper-plane of dimension d−1, where d is the dimension of the data points; otherwise they are corner connected. For searching, only the information about side connected siblings needs to be maintained, for the following reason. Suppose that we are at a certain node, which we call the current node, and are looking for its siblings enclosed by or intersecting the search hyper-sphere. If the search hyper-sphere encloses or intersects a corner connected sibling, then there will always be another node lying within or intersecting the hyper-sphere which is side connected to both the current node and the corner connected node. That side connected sibling will be visited during the search (all nodes lying within or intersecting the hyper-sphere are required to be visited). Therefore, the corner connected sibling will always be reached via the side connected sibling, and keeping its information is not required. To summarize, apart from the information kept by a kd-tree node, a node in the C-tree also keeps information about its side connected siblings. This extra information needs to be kept for every node while building the tree. After the tree has been built, it can be removed from the non-leaf nodes, as the non-leaf nodes are used only in the top-down traversal during searching, which does not require it.
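Whether two cells are side connected can be decided by the per-dimension interval comparison described in the next subsection: the cells must touch or overlap in every dimension, with a common region of non-zero length in exactly d−1 of them. A sketch (our own names, cells given as per-dimension (min, max) bounds):

```python
def overlaps(cell1, cell2):
    """Per-dimension overlap lengths of two cells, or None if the cells
    are separated along some dimension. A length of 0 means the cells
    just touch at a point in that dimension."""
    out = []
    for (a0, a1), (b0, b1) in zip(cell1, cell2):
        if a1 < b0 or b1 < a0:
            return None          # separated: not connected at all
        out.append(min(a1, b1) - max(a0, b0))
    return out

def side_connected(cell1, cell2):
    """Side connected: touching, with zero-length overlap in exactly one
    dimension (the shared face is then a (d-1)-dimensional hyper-plane)."""
    ov = overlaps(cell1, cell2)
    return ov is not None and ov.count(0) == 1
```

Two cells whose overlap is zero-length in two or more dimensions meet only at a corner and are corner connected, not side connected.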
4.6.1 Building C-tree

The database is organized by building the C-tree corresponding to the data points in the database. The process of building the tree is similar to building a kd-tree, except for an extra procedure for making connections between the side connected siblings in the tree. We call this procedure MakeConnection. Connection making starts at the root node and the connections are made recursively, i.e. at the time of making the connections for a node, the connections for its parent have already been made. Each parent is responsible for making the connections of its child nodes. Suppose that we are at the node np, the parent node. If np is a leaf node, i.e. if it does not have any children, nothing needs to be done. Otherwise, let its children be nlc and nrc, the left and right child. nlc and nrc will always be side connected siblings, as they are made by splitting np along one dimension, so they are connected to each other. The only other possible candidates for side connected siblings of nlc and nrc are the children of np's side connected siblings. If any side connected sibling of np does not have children, it itself becomes a candidate for nlc's and nrc's side connected siblings. A simple procedure is used to determine whether two nodes are side connected siblings. Let d be the dimension of the data points. Along each dimension, we check whether the two cells just intersect at a point or share a common region of non-zero length. This can be done very easily. For the ith dimension, let v1imin and v1imax be the minimum and maximum values of the cell corresponding to node 1 along this dimension, and similarly v2imin and v2imax for node 2. If v1imax < v2imin or v2imax < v1imin, then nodes 1 and 2 are not connected 23
  • 33. at all. Otherwise, if v1imax = v2imin or v2imax = v1imin, the two intersect at a point along the ith dimension; otherwise the two share a common region of finite length along the ith dimension. The procedure is illustrated in Figure 4.3. Two connected nodes are side connected siblings if they share a common region of non-zero length in d−1 dimensions. Connections are then made between two side connected siblings by storing a pointer to the sibling in each node. An algorithmic description of the procedure MakeConnection is shown in Figure 4.2.

Algorithm - MakeConnection

MakeConnection(Node parent)
    if parent is a leaf node
        return
    end
    Connect parent's left and right son
    for each connected sibling S of parent
        if S is a leaf node
            if S and parent's left son are side connected
                Connect S and parent's left son
            end
            if S and parent's right son are side connected
                Connect S and parent's right son
            end
        else
            for each child C of S
                if C and parent's left son are side connected
                    Connect C and parent's left son
                end
                if C and parent's right son are side connected
                    Connect C and parent's right son
                end
            end
        end
    end
    MakeConnection(parent's left son)
    MakeConnection(parent's right son)

Figure 4.2: Algorithm for making connections in a C-tree

4.6.2 Nearest Neighbour Search

This section explains how nearest neighbour search can be performed using a C-tree. As mentioned before, a weighted search is often required. The explanation will therefore be for weighted 24
  • 34. (a) (b) (c) Figure 4.3: Two cells (a) separated, (b) intersecting at a point, and (c) sharing a region of finite length in the horizontal direction.

k-nearest neighbour search. The euclidean search and the single nearest neighbour search are just special cases of this weighted search: in the former the weight along each dimension is the same, while in the latter k is one. So the same procedure applies. Let Xq be the query data point and k the number of nearest neighbours to be retrieved. Let d be the dimension of the data points and w the weight vector by which the dimensions are weighted, that is, w = {w1, w2, ..., wd}. The distance between two data points x1 and x2 is then given by

d12 = √( Σ_{i=1}^{d} [wi(x1i − x2i)]² ) (4.11)

The first step in the search is to locate the cell in which Xq lies. This is done by traversing the tree in a top-down manner: starting at the root, at each node the value of Xq along the dimension at which the node was split is compared with the value by which the node was split. If the value is less than the splitting value, the process is repeated at the left child, else at the right child, until a leaf node is reached. Xq is contained in the cell of this node, which we call Cellq. In each node we maintain a variable called checked; this variable is true if the node has been evaluated, else it is false. Two lists are maintained. One is a sorted list of the k nearest neighbours seen so far, which we call Lsn; it is sorted by distance from Xq. The other, which we call Lr, is a list of nodes whose siblings are to be evaluated, together with the minimum distance of their cells from Xq; it is sorted by this minimum distance. Let rmax be the distance of the last element in Lsn. The search space for weighted search is then a hyper-ellipse, the length of whose axis along a particular dimension is inversely proportional to the weight along that dimension.
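The weighted distance of eq. (4.11) is straightforward to compute; a minimal sketch with our own naming:

```python
import math

def weighted_dist(w, p, q):
    """Weighted Euclidean distance of eq. (4.11): each dimension i is
    scaled by its weight w[i] before the usual Euclidean distance."""
    return math.sqrt(sum((wi * (a - b)) ** 2 for wi, a, b in zip(w, p, q)))
```

With unit weights this reduces to the ordinary Euclidean distance, which is why the unweighted search is a special case.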
The points on the boundary of this hyper-ellipse have distance exactly rmax from Xq. The search procedure need not consider nodes which lie entirely outside this hyper-ellipse, since we already have k points whose distance is less than or equal to rmax. A node lies entirely outside the hyper-ellipse if the minimum distance of its cell from Xq is greater than rmax. Computing this minimum distance requires finding the point in the cell which is nearest to the query point, which can be done very easily as follows. Recalling that the cell is a hyper-rectangle, let the coordinates of this hyper-rectangle along the ith dimension be bimin and bimax. The coordinates of the point 25
  • 35. which is nearest to Xq along the ith dimension: it is bimin if bimin is greater than Xqi, bimax if bimax is less than Xqi, and Xqi itself otherwise (in which case that dimension contributes nothing to the distance). The minimum distance can then be calculated by equation (4.11). The search algorithm proceeds as follows. After locating Cellq, we initialize Lsn and Lr by inserting the first k points found in Cellq and its siblings into Lsn and the corresponding nodes into Lr. The checked variable of the inserted nodes is marked true. If the number of points in Cellq and its siblings is less than k, the initialization continues with the points and nodes corresponding to the siblings of other nodes, in the order in which they appear in Lr. This gives an initial bound on rmax. Now the nodes in Lr are examined for their siblings. Specifically, the first node in Lr is taken and, if its minimum distance is less than rmax, each of its siblings whose checked is false is examined and its checked is marked true. If the minimum distance of the cell corresponding to the sibling is less than rmax, the sibling's node is inserted into Lr. If the distance of the point corresponding to the sibling is less than rmax, the last point is removed from Lsn, the new point is inserted into Lsn, and the value of rmax is updated. In this way, the space which has to be examined is continuously reduced. When all the siblings of a node have been examined, i.e. their checked variables are all true, the node is removed from Lr. The search terminates when the minimum distance of the first node in Lr becomes greater than rmax. After the algorithm terminates, the points in Lsn are returned as the k nearest neighbours.

4.7 Matching

Once the database has been properly organized, a query can be answered by evaluating similarity based on some distance measure. The distance measure defines the closeness of two features in the database. However, we cannot use a simple euclidean distance as the distance measure.
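The nearest-point computation just described yields the minimum distance from the query to a cell, which is what drives the pruning against rmax. A sketch under our own naming, with cells given as per-dimension (min, max) bounds:

```python
import math

def min_dist_to_cell(w, q, cell):
    """Minimum weighted distance from query q to a hyper-rectangular cell.
    The nearest point of the cell has coordinate b_min where q lies below
    the cell, b_max where q lies above it, and q's own coordinate
    otherwise (that dimension then contributes zero). A cell is pruned
    when this distance exceeds r_max, the current k-th nearest distance."""
    nearest = [lo if qi < lo else hi if qi > hi else qi
               for qi, (lo, hi) in zip(q, cell)]
    return math.sqrt(sum((wi * (qi - ni)) ** 2
                         for wi, qi, ni in zip(w, q, nearest)))
```

A query point inside the cell yields distance zero, so the cell containing Cellq is never pruned.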
The reason is that the different features have different ranges; to give all components equal importance, some kind of normalization is necessary. Assuming the features to have a Gaussian distribution, we can compute the mean µi and standard deviation σi of the ith feature. The normalization of the ith component of a feature vector x can then be done as follows:

xi = (xi − µi) / σi (4.12)

It is easy to show that the probability of a normalized feature value lying in the range [−1, 1] is 68%. It is also easy to show that the Euclidean distance between normalized feature vectors corresponds to the Tokuhara distance between the un-normalized feature vectors. The Tokuhara distance between two vectors x1 and x2 is given by

d²(x1, x2) = Σ_{i=1}^{N} (x1i − x2i)² / σi² (4.13)

where σi is the standard deviation of the ith component of the feature vector and N is its dimension. In our system, instead of normalizing each feature vector, we use the Tokuhara distance directly. This allows us to grow the database incrementally without modifying the existing feature vectors. At query time, the images are ranked in increasing order of their distance from the queried object's feature vector under the above distance measure. 26
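Equations (4.12) and (4.13) in code, a minimal sketch with our own names; it also exercises the stated equivalence between the Euclidean distance of normalized vectors and the Tokuhara distance of the raw vectors:

```python
import math

def normalize(x, mu, sigma):
    """Per-component normalization of eq. (4.12)."""
    return [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]

def tokuhara_distance(x1, x2, sigma):
    """Eq. (4.13): variance-normalized distance between raw feature
    vectors; equals the Euclidean distance after eq. (4.12) is applied."""
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x1, x2, sigma)))
```

Because σi enters only through the distance computation, stored feature vectors never need rewriting when the database grows, which is the incremental-update property noted above.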
  • 36. 4.8 Relevance Feedback

Relevance feedback is a technique used to assess the importance of the different features by learning from the feedback provided by the user. The learning can be done from both positive and negative feedback, and its objective is to present more relevant results to the user. The application of this technique to content-based image retrieval is recent; examples of systems using feedback in image retrieval include MARS [7]. Most systems using relevance feedback use only positive examples for learning. However, our experimentation suggests that using both positive and negative examples gives much better precision and recall than using positive examples alone. The basic method behind learning from feedback is to assign different weights to different features according to their importance: more important features are given more weight, less important features less. The features used for similarity matching and retrieval in our system are colour and shape features: six colour features, the mean and standard deviation of the luminance (L) and chrominance (u, v) components, and nine shape features consisting of the seven invariant moments, eccentricity and compactness. Initially, as we have no a priori information about the importance of the different features, equal weight is assigned to each feature and the results are presented to the user. Our system Imagefinder presents the top 16 results. The user then marks the images which he considers relevant, so the results are divided into two sets, relevant and non-relevant. Let SR and SNR denote the sets of relevant and non-relevant images respectively. The aim is to find the features that are consistent within each set: for the former the weight is increased, whereas for the latter it is decreased. Consider the relevant set SR first.
The features which are important will be consistent in SR, i.e. they will have similar values in the images of the relevant set. On the other hand, the features which are not important will vary across the set. Therefore, the inverse of the standard deviation of a feature represents a measure of its weight, i.e.

wi ∝ 1 / σi^R (4.14)

where σi^R is the standard deviation of the ith feature, computed from the relevant images. Using just the relevant images, however, has one drawback: it cannot recognize the features which are non-discriminatory, i.e. which are similar in both the relevant and non-relevant sets. Ideally, the weights of these features should be left unchanged. To overcome this drawback, we also use the standard deviations of the features over the images of the non-relevant set. In SNR, the features which are consistent (having low standard deviation) represent the features which are not important, since the results based on these features were not marked relevant by the user. So for the non-relevant results, the weight of a feature is directly proportional to its standard deviation across SNR, i.e.

wi ∝ σi^NR (4.15)

where i is the feature under consideration and σi^NR is its standard deviation computed from the non-relevant images. 27
  • 37. Combining (4.14) and (4.15),

wi ∝ σi^NR / σi^R (4.16)

Since the weights are used for the calculation of distances, which are used in a relative sense only, the constant of proportionality can be taken to be one. Therefore,

wi = σi^NR / σi^R (4.17)

Using (4.17) has the advantage that the weights of the non-discriminatory features remain unchanged, close to 1: the standard deviations of such features tend to have similar values for the images in both the relevant and non-relevant sets, and therefore cancel out. 28
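The weight update of eq. (4.17) is a one-liner per feature; a minimal sketch with our own names (a real implementation would also guard against a zero σi^R, e.g. with a small epsilon):

```python
import statistics

def feedback_weights(relevant, nonrelevant):
    """w_i = sigma_i^NR / sigma_i^R (eq. 4.17): features consistent in
    the relevant set gain weight, features consistent only in the
    non-relevant set lose it, and equally spread (non-discriminatory)
    features stay near 1. Inputs are lists of feature vectors."""
    return [statistics.pstdev([v[i] for v in nonrelevant]) /
            statistics.pstdev([v[i] for v in relevant])
            for i in range(len(relevant[0]))]
```

For example, a feature that is tightly clustered among relevant images but spread among non-relevant ones gets a weight well above 1, while a feature equally spread in both sets keeps weight 1.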
  • 38. Chapter 5 Experimental Results

This chapter presents the results obtained using our system Imagefinder. We first present the results on image segmentation using both the region based and the information theoretic approaches. Then the query results are presented. Finally, we present the improvement obtained by using feedback.

5.1 Image Database

The database used in our system consists of about 5000 images taken from a number of sources, mostly the Internet. The major part of the images in the database was provided by the University of California, Berkeley [42]. The database consists of variable-size images from a number of categories such as natural scenes, animals, birds and other outdoor images. For preparing the database, the images were segmented and the feature vectors were extracted from the regions. The preparation of the database was done offline.

5.2 Segmentation Results

5.2.1 Region-based Approach

This section shows the results of region based segmentation on two real-world images. Figure 5.1(a) is an image of a sunflower. The regions after region growing are shown in Figure 5.1(b), where each region is shown in a single colour. From the figure it is clearly evident that the outcome of region growing is a heavily over-segmented image. Figure 5.1(c) shows the regions after region merging. The number of regions has now decreased drastically, and the shapes of the objects in the image are clearly evident. However, many small regions remain, and these are removed by region pruning. Figure 5.1(d) shows the boundaries of the regions obtained after region pruning. Finally, the boundaries are shown superimposed on the original image in Figure 5.1(e). Figure 5.2 shows the result on an apple image.

5.2.2 Information Theoretic Approach

In this section the results of segmentation based on the information theoretic approach are shown on a variety of real-world images. Figure 5.3 shows the results on some randomly selected images of animals and 29
  • 39. (a) (b) (c) (d) (e) Figure 5.1: Segmentation by the region based approach: (a) original image, (b) after region growing, (c) after region merging, (d) regions after removing small regions, (e) boundaries of regions superimposed on the original image.

(a) (b) (c) (d) (e) Figure 5.2: Segmentation by the region based approach: (a) original image, (b) after region growing, (c) after region merging, (d) regions after removing small regions, (e) boundaries of regions superimposed on the original image.

birds, whereas Figure 5.4 shows the results on images of outdoor and natural scenes.

5.2.3 Region vs. Information Theoretic Approaches

The region based approach gives good results when the regions in the image are homogeneous and contain little or no texture. This is evident from Figures 5.1 and 5.2: the results are better for the apple image, which contains almost no texture, than for the sunflower image, which contains some amount of texture. The information theoretic approach is more robust to texture and to small colour variations in the image. For still better results, however, texture features should themselves be included in the segmentation process; even the information theoretic approach will not work on images of high texture content, such as zebra or leopard images. 30
  • 40. Figure 5.3: Segmentation results on some randomly selected animal and bird images from the database. The segmented regions are shown as white boundaries superimposed on the original image. 31
  • 41. Figure 5.4: Segmentation results on some randomly selected images of natural and outdoor scenes from the database. The segmented regions are shown as white boundaries superimposed on the original image. 32
  • 42. (a) (b) Figure 5.5: A query session in Imagefinder. (a) The user is asked to select his object of interest by pressing the mouse button in the region of the object. (b) After the user has selected his object of interest, the top 16 results are returned.

5.3 Query Results

This section presents the query results for Imagefinder. To perform a query, a user first selects an image. Segmentation is then done on the selected image and the segmented image is presented to the user, who then selects the region of his interest. Currently, a query on only one region is allowed at a time. After the user has selected his region of interest, the top 16 images from the database containing the region nearest to the queried region are presented. A typical query session in Imagefinder is shown in Figure 5.5. To determine the retrieval effectiveness, we use the precision measure, which is defined as

precision = (number of relevant images / number of returned images) × 100% (5.1)

The precision is 100% when all the returned images are relevant. The relevancy of a result is determined by the judgment of a human user. We now present some typical query results. Figure 5.6 shows the results for the query on the crane image. The results are particularly impressive: 16 out of 16 images are of cranes, though cranes form only a very small part of the database (less than 1%). This may be attributed to the fact that the segmentation of the crane can be done very accurately and that both the colour and shape features are effective. Figure 5.7 shows the query results for the sunflower image. The sunflower images contain a small amount of texture; the precision of the results shows the robustness of the segmentation to the presence of such texture. One more observation can be made from the results: the different results have different backgrounds. This demonstrates the usefulness of 33
  • 43. Category Precision Yellow flower 91.67% Sky 93.75% Red flower 64.52% Crane 63.64% Tree 50.78% Fox 46.87% Eagle 46.87% Water 43.75% Table 5.1: Precision values for various categories of images present in the database. retrieval at the level of objects, in this case sunflower. Similar thing could not have been achieved by a system using only global image characteristics. Figure 5.8 shows the results of query on the fox image. The fox images are particularly very difficult to segment, since there is very little to distinguish the fox from the background. We humans are able to distinguish the fox mainly because of our prior knowledge which is not available to Imagefinder. The results should therefore not be surprising. Figure 5.9 shows the query results for sky as the desired object. The precision is particularly impressive, out of top 16 results returned 14 are relevant. The following observation can be made in particular, the results obtained have varying shape but similar colour. This is of course true for sky, which is not associated with sky. One more thing that can be noted is that the only common feature in the results is the presence of sky, the other objects present vary greatly. This shows that Imagefinder is looking for a specific object sky and not does not care for the presence of other objects, which is a characteristic of object based retrieval. The results for some other categories like eagle, water bodies are shown in Figure 5.10-5.11. Finally, we summarize the results for different categories in table 5.1. 34
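The precision measure of Equation 5.1 is straightforward to compute. The following sketch (a hypothetical helper, not part of Imagefinder) evaluates it from a list of human relevance judgments over the returned images:

```python
def precision(relevant_flags):
    """Precision over a returned result set, as in Eq. 5.1.

    relevant_flags: one boolean per returned image, True where a
    human judge marked the result relevant.
    Returns the precision as a percentage.
    """
    if not relevant_flags:
        return 0.0
    return 100.0 * sum(relevant_flags) / len(relevant_flags)

# Example: the sky query returned 16 images, 14 judged relevant.
sky = [True] * 14 + [False] * 2
print(precision(sky))  # 87.5
```

For the sky query above this reproduces the 14-out-of-16 figure reported in the text.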
  • 44. Query Image Figure 5.6: Top 16 results for the query of a crane; all 16 results are relevant. 35
  • 45. Query Image Figure 5.7: Top 16 results for the query of a sunflower; 9 out of 16 results are relevant. The sunflower images form a very small part, about 0.2%, of the database. 36
  • 46. Query Image Figure 5.8: Top 16 results for the query of a fox; 8 out of 16 returned images are relevant. The fox images form about 1% of the database. 37
  • 47. Query Image Figure 5.9: Top 16 results for the query of sky; 14 out of 16 images have sky as a part of the image and are therefore relevant. 38
  • 48. Query Image Figure 5.10: Top 16 results for the query of an eagle; 8 out of 16 images are relevant. The eagle images form about 1% of the database. 39
  • 49. Query Image Figure 5.11: Top 16 results for the query of water; 6 out of 16 images are relevant. The water images form about 1% of the database. 40
  • 50. 5.4 Relevance Feedback

Figure 5.12: A typical feedback session in Imagefinder. The user has marked relevant images by pressing the Relevant radio buttons against them. A query based on the feedback can be performed by pressing the Query Again button.

This section shows the improvement in results obtained through feedback from the user. To give feedback in Imagefinder, the user marks the set of returned images he considers relevant to the query and queries again; Imagefinder then returns a new set of results based on the feedback. A sample session is shown in Figure 5.12.

        Category   Precision (without feedback)   Precision (after feedback)
        Tree       50.78%                         60.93%
        Fox        46.87%                         51.56%
        Eagle      46.87%                         56.25%
        Water      43.75%                         52.08%
Table 5.2: Precision values for various categories of images after one feedback iteration.

The first result we show is for the sunflower. The results without feedback were shown in Figure 5.7; Figure 5.13 shows the results after one feedback iteration. The precision without feedback was 56.25%, whereas after one iteration it increased to 68.75%, an improvement of 12.50%. The second result is for water bodies, shown in Figure 5.14; the results without feedback were shown in Figure 5.11. An improvement of 12.5% was observed for this query. A summary of the feedback results is given in Table 5.2. These results were obtained by averaging over a number of queries in each category. 41
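The text does not spell out the exact update rule used for feedback. One common way to combine positive and negative feedback, shown purely as an illustrative sketch, is a Rocchio-style reweighting that moves the query feature vector towards the marked-relevant images and away from the others; the function name, weights, and toy feature vectors below are hypothetical:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style query refinement (sketch; not necessarily the
    rule used by Imagefinder): shift the query feature vector towards
    the centroid of images marked relevant and away from the centroid
    of images marked non-relevant."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

# Toy 2-D colour feature vectors (hypothetical values).
q0 = [0.5, 0.5]
rel = [[0.8, 0.6], [0.6, 0.6]]    # marked relevant
nonrel = [[0.1, 0.1]]             # marked non-relevant
print(rocchio_update(q0, rel, nonrel))  # pulled towards rel, away from nonrel
```

The refined vector is then used for the "Query Again" search, which is why the retrieved set shifts towards the images the user endorsed.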
  • 51. Query Image Figure 5.13: Top 16 results for the query of sunflower after one feedback iteration. Precision (without feedback) = 56.25%; precision (after one iteration) = 68.75%. 42
  • 52. Query Image Figure 5.14: Top 16 results for the query of water after one feedback iteration. Precision (without feedback) = 37.5%; precision (after one iteration) = 50%. 43
  • 53. Chapter 6 Conclusion

This chapter summarizes the work done in this thesis. Several directions for further development are also outlined.

6.1 Summary

In this thesis, the problem of Content Based Image Retrieval was discussed. We chose to support retrieval at the level of objects, which corresponds more naturally to what a user would like to have in an image retrieval system. A broad range of issues, from image segmentation to relevance feedback, were addressed. We presented a new algorithm for image segmentation based on an information theoretic approach. The features used for retrieval were based on shape (moment invariants, eccentricity and compactness) and colour (mean and standard deviation). We also presented a data structure based on the kd-tree, which is specifically suited for efficient searching in a large database. Learning in our system Imagefinder was incorporated by using both positive and negative feedback from the user. The system was tested on a database of about 5000 real-world images spanning several categories.

6.2 Suggestions for Further Development

Content based image retrieval is still in its developmental stages. There are a number of issues that can be addressed; we outline a few of them here.

• The performance of any system supporting queries at the level of objects is severely affected by segmentation. Traditionally, the use of feedback is limited to assessing the importance of the different features. Learning by feedback can be extended to include segmentation as well, which we believe will provide better results.

• The segmentation process in Imagefinder used only colour features. Apart from colour, texture is also an important feature that can be used to improve the segmentation results. Due to time constraints, we could not incorporate texture features for assessing similarity; their use should lead to better results. 44
  • 54. • Currently, Imagefinder supports queries for one region only, i.e. it does not use knowledge of the presence of other objects. However, context is also a very powerful cue for judging similarity. Further work can therefore be done to incorporate it, i.e. to support queries over multiple regions of the image.

• Learning by feedback in Imagefinder is restricted to a single session. This is short-term learning, in which the system forgets everything once the session is over. A long-term user may not like to tell the system the same thing every time he queries for the same object. The knowledge gained by learning could therefore be saved for future sessions, allowing the system to remember what it learned previously. 45
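The efficient k-nearest-neighbour retrieval summarized above rests on a kd-tree. The following is a minimal sketch of plain kd-tree k-NN search with subtree pruning (a textbook kd-tree, not the C-tree variant used by Imagefinder; the point set is a toy example):

```python
import heapq

class Node:
    __slots__ = ("point", "left", "right", "axis")
    def __init__(self, point, left, right, axis):
        self.point, self.left, self.right, self.axis = point, left, right, axis

def build(points, depth=0):
    """Build a kd-tree by median split along cycling coordinate axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid],
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1),
                axis)

def knn(root, query, k):
    """Return the k points nearest to query (squared Euclidean distance),
    pruning any subtree that cannot contain a closer point."""
    heap = []  # max-heap via negated distances: (-dist, point)
    def visit(node):
        if node is None:
            return
        d = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Descend into the far side only if the splitting plane is closer
        # than the current k-th best distance.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)
    visit(root)
    return [p for _, p in sorted(heap, reverse=True)]

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(knn(tree, (6, 3), 2))
```

The same idea scales to the higher-dimensional colour/shape feature vectors used for retrieval, although pruning becomes less effective as the dimensionality grows, which is one motivation for specialized variants such as the C-tree.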
  • 55. Bibliography

[1] M. Flickner, H. Sawhney, W. Niblack, et al., "Query by image and video content: The QBIC system," IEEE Computer, 1995.
[2] W. Niblack and R. Barber, "The QBIC project: Querying images by content using colour, texture and shape," in Proc. SPIE Storage and Retrieval for Image and Video Databases, Feb 1994.
[3] J. R. Bach, C. Fuller, and A. Gupta, "The Virage image search engine: An open framework for image management," in Proc. SPIE Storage and Retrieval for Image and Video Databases, 1997.
[4] J. R. Smith and S.-F. Chang, "VisualSEEk: A fully automated content-based image query system," in Proc. ACM Multimedia, 1996.
[5] A. Pentland, R. W. Picard, and S. Sclaroff, "Photobook: Content-based manipulation of image databases," International Journal of Computer Vision, 1996.
[6] W. Y. Ma and B. S. Manjunath, "NeTra: A toolbox for navigating large image databases," in Proc. IEEE Int. Conf. on Image Processing, 1997.
[7] T. S. Huang, S. Mehrotra, and K. Ramachandran, "Multimedia analysis and retrieval system (MARS) project," in Proc. of the 33rd Annual Clinic on Library Applications of Data Processing - Digital Image Access and Retrieval, 1996.
[8] T. Pavlidis and Y. T. Liow, "Integrating region growing and edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 225–233, March 1990.
[9] M. Hanson and W. Higgins, "Watershed-driven relaxation labeling for image segmentation," in Proc. IEEE Int. Conf. on Image Processing, 1994.
[10] C. Carson, S. Belongie, H. Greenspan, and J. Malik, "Blobworld: Image segmentation using expectation-maximization and its application to image querying," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, August 2002.
[11] M. Miyahara, "Mathematical transform of (R,G,B) colour data to Munsell (H,S,V) colour data," SPIE Visual Communication and Image Processing, vol. 1001, 1988. 46
  • 56. [12] W.-J. Jia Wang and R. Acatya, "Colour clustering techniques for colour-content-based image retrieval from image databases," in Proc. of IEEE Conf. on Multimedia Computing and Systems, 1997.
[13] M. Stricker and M. Orengo, "Similarity of colour images," in Proc. SPIE Storage and Retrieval for Image and Video Databases, 1995.
[14] J. R. Smith and S.-F. Chang, "Single colour extraction and image query," in Proc. IEEE Int. Conf. on Image Processing, 1995.
[15] T. S. Chua, K.-L. Tan, and B. C. Ooi, "Fast signature-based colour-spatial image retrieval for multimedia database systems," in Proc. of IEEE Conf. on Multimedia Computing and Systems, 1997.
[16] R. Schettini, "Multicoloured object recognition and location," Pattern Recognition Letters, pp. 1089–1097, November 1994.
[17] E. Persoon and K. S. Fu, "Shape discrimination using Fourier descriptors," IEEE Trans. on Sys., Man and Cyb., vol. 6, pp. 661–674, 1984.
[18] C. T. Zahn and R. Roskies, "Fourier descriptors for plane closed curves," IEEE Trans. on Computers, 1972.
[19] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. on Information Theory, vol. 8, 1962.
[20] Y. Rui, A. She, and T. Huang, "Modified Fourier descriptors for shape representation - a practical approach," in Proc. of Int. Workshop on Image Databases and Multimedia Search, 1996.
[21] D. Kapur, Y. N. Lakshman, and T. Saxena, "Computing invariants using elimination methods," in Proc. IEEE Int. Conf. on Image Processing, 1995.
[22] R. M. Haralick, "Statistical and structural approaches to texture," Proceedings of the IEEE, pp. 786–804, 1979.
[23] H. Tamura, S. Mori, and T. Yamawaki, "Textural features corresponding to visual perception," IEEE Trans. on Sys., Man and Cyb., vol. 8, no. 6, 1978.
[24] J. R. Smith and S.-F. Chang, "Transform features for texture classification and discrimination in large image databases," in Proc. IEEE Int. Conf. on Image Processing, 1994.
[25] W. Y. Ma and B. S. Manjunath, "A comparison of wavelet transform features for texture image annotation," in Proc. IEEE Int. Conf. on Image Processing, 1995.
[26] C. Faloutsos and K.-I. Lin, "FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets," in Proc. ACM SIGMOD, 1995, pp. 163–174.
[27] G. Salton and C. Buckley, Introduction to Modern Information Retrieval. McGraw-Hill, 1983. 47
  • 57. [28] J. Bentley, "Multidimensional binary search trees used for associative searching," Communications of the ACM, vol. 18, pp. 509–517, 1975.
[29] A. Guttman, "R-trees: A dynamic index structure for spatial searching," in Proc. ACM SIGMOD, 1984.
[30] T. Sellis, N. Roussopoulos, and C. Faloutsos, "The R+-tree: A dynamic index for multi-dimensional objects," in Proc. 12th VLDB, 1987.
[31] M. Charikar, C. Chekuri, T. Feder, and R. Motwani, "Incremental clustering and dynamic information retrieval," in Proc. 29th Annual ACM Symposium on Theory of Computing, 1997, pp. 625–635.
[32] Y. Rui, K. Chakrabarti, S. Mehrotra, Y. Zhao, and T. S. Huang, "Dynamic clustering for optimal retrieval in high dimensional multimedia databases," TR-MARS-10-97, 1997.
[33] H. J. Zhang and D. Zhong, "A scheme for visual feature based image retrieval," in Proc. SPIE Storage and Retrieval for Image and Video Databases, 1995.
[34] J. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, 1986.
[35] D. Fisher, "Knowledge acquisition via incremental conceptual clustering," Machine Learning, vol. 2, pp. 139–172, 1987.
[36] J. Gennari, P. Langley, and D. Fisher, "Models of incremental concept formation," Artificial Intelligence, pp. 11–61, 1989.
[37] E. Gokcay and J. C. Principe, "Information theoretic clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 2, February 2002.
[38] M. Gluck and J. Corter, "Information, uncertainty and the utility of categories," in Proceedings of the Seventh Annual Conference of the Cognitive Science Society, 1985, pp. 283–287.
[39] T. Kanungo, D. M. Mount, and N. S. Netanyahu, "An efficient k-means clustering algorithm: Analysis and implementation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, July 2002.
[40] D. Pelleg and A. Moore, "Accelerating exact k-means algorithms with geometric reasoning," in Proc. ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, August 1999, pp. 277–281.
[41] A. K. Jain, Fundamentals of Digital Image Processing. Prentice-Hall, 1989.
[42] Image source. [Online]. Available: ftp://dlp.cs.berkely.edu 48