The paper was presented at the 2011 IEEE 7th International Conference on Intelligent Computer Communication and Processing (ICCP 2011) on August 26th, 2011 in Cluj-Napoca, Romania, and appears in the conference proceedings.
Publication: http://bit.ly/GDIsRY
Abstract:
OntoGen is a semi-automatic, data-driven ontology editor that focuses on editing topic ontologies. It uses text mining tools to make ontology-related tasks simpler for the user. This focus on building ontologies from textual data is the gap we are trying to bridge. We have successfully extended OntoGen to work with image data, allowing ontology construction and editing on collections of labeled or unlabeled images. Browsing large heterogeneous image collections efficiently is a challenging task, and we feel that semi-automatic ontology construction, as described in this paper, makes this task easier.
2. IMAGE DATA
• Difficult to handle
• High-dimensional representations
• The amount of image data is constantly increasing and there is a rising need for reliable automatic image analysis systems in practical applications
3. IMAGE REPRESENTATION
[Diagram: Extract features (text, color info, SIFT features); Data Mining; Application]
4. SIFT FEATURES
• Rotation, scale and translation invariant orientation gradients located at “interesting” points on an image
• Usually, the SIFT feature space is quantized so that some “representative” vectors are found
• Each feature on an observed image is then assigned to its nearest representative, and this is how the so-called “codebook” histogram is obtained
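The assignment step described above can be sketched in a few lines. This is a minimal illustration only, not the implementation used in the paper: the function names, the toy 2-D vectors and the use of Euclidean distance are assumptions, and the representatives are taken as given rather than learned (standard SIFT descriptors are 128-dimensional).

```python
import math

def nearest_index(vector, representatives):
    """Index of the representative closest to `vector` (Euclidean; an assumption)."""
    return min(
        range(len(representatives)),
        key=lambda i: math.dist(vector, representatives[i]),
    )

def codebook_histogram(descriptors, representatives):
    """Count descriptors per nearest representative, then normalize to sum to 1."""
    counts = [0] * len(representatives)
    for descriptor in descriptors:
        counts[nearest_index(descriptor, representatives)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Toy 2-D stand-ins for descriptors and for an already quantized codebook.
representatives = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 9.5), (0.2, 0.1), (11.0, 10.0)]
histogram = codebook_histogram(descriptors, representatives)
```

Each image ends up with one fixed-length histogram regardless of how many features were detected on it, which is what makes images comparable.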
5. COLOR HISTOGRAMS
• Color information on an image might or might not be of interest for a particular problem, but it usually represents a useful piece of information
• There are several ways to handle this information, but the simplest and fastest one is to divide the color spectrum into “buckets” and calculate the distribution of colors into these buckets, thereby obtaining the color histogram for an image
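As a rough illustration of the bucketing idea, the sketch below builds a normalized histogram over equal-width RGB buckets. The choice of RGB, 8-bit channels, the bucket count and the function name are assumptions made for the example; the slides do not specify them.

```python
def color_histogram(pixels, buckets_per_channel=4):
    """Normalized histogram of 8-bit RGB pixels over equal-width buckets."""
    n = buckets_per_channel
    counts = [0] * (n ** 3)
    for r, g, b in pixels:
        # Map each channel value 0..255 to a bucket index 0..n-1.
        rb, gb, bb = r * n // 256, g * n // 256, b * n // 256
        counts[(rb * n + gb) * n + bb] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Four toy pixels: two reddish, one blue, one black.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 0)]
hist = color_histogram(pixels, buckets_per_channel=2)
```

With two buckets per channel the histogram has 8 entries; both reddish pixels fall into the same bucket.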
6. ONTOGEN
• OntoGen is a tool which allows us to do semi-automatic ontology construction, clustering, classification, as well as data visualization via multidimensional scaling
• This can easily be applied to image data to gain an overview of collections of images
7. IMAGE FEATURE EXTRACTION
• We extract SIFT features and color histograms for each image
• We calculate the distance between images as the weighted sum of distances between the two distributions (SIFT codebook and color data)
• If images have annotations, these can easily be incorporated by adding a third part in the representation for each image
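The weighted combination can be sketched as follows. The slides do not say which distance is used between distributions or how the weights are chosen, so the L1 distance, the default weight of 0.5 and the dictionary layout here are all assumptions for illustration.

```python
def l1_distance(p, q):
    """L1 distance between two equal-length histograms (an assumed choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def image_distance(image_a, image_b, sift_weight=0.5):
    """Weighted sum of the SIFT-codebook distance and the color-histogram distance."""
    color_weight = 1.0 - sift_weight
    return (
        sift_weight * l1_distance(image_a["sift"], image_b["sift"])
        + color_weight * l1_distance(image_a["color"], image_b["color"])
    )

# Two toy images with identical SIFT codebook histograms but opposite colors.
image_a = {"sift": [0.5, 0.5], "color": [1.0, 0.0]}
image_b = {"sift": [0.5, 0.5], "color": [0.0, 1.0]}
distance = image_distance(image_a, image_b)
```

Annotations could be handled in the same way, by adding a third weighted term for the annotation part of the representation.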
8. ONTOGEN ON IMAGE DATA
• On the next few slides we show the usage of OntoGen on one simple data set
• The data was taken from the ImageNet online image collection. The particular subset contains images of various types of flowers, as well as images of fire and images of buildings
13. CREATING AN ONTOLOGY
• We can do k-means clustering to detect groups of similar images
• We can use these groups to create a level in the ontology
• The relevant features are displayed on top of the nodes
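For readers unfamiliar with k-means, a minimal sketch of the algorithm is given below. It is not OntoGen's implementation: it uses Euclidean distance on toy 2-D points, and it initializes the centroids with the first k points, which is a simplification chosen to keep the example deterministic.

```python
import math

def k_means(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (a simplification)."""
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centroids

# Two well-separated toy groups standing in for image feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = k_means(points, k=2)
```

Each resulting group would correspond to one candidate sub-concept on the new ontology level.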
14. SO, LET’S LOOK AT SOME OF THOSE NODES AND THEIR MEDOIDS… PRETTY GOOD…
15. HOWEVER…
• One of the first-level sub-concepts is not good, which can be seen by observing its medoids:
• So, now we can branch it further into more refined sub-concepts to improve the quality
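A medoid is the member of a group with the smallest total distance to all other members, so unlike a centroid it is always an actual image that can be displayed. A minimal sketch, assuming Euclidean distance on toy 2-D points (the function name is hypothetical):

```python
import math

def medoid(members):
    """Member with the smallest total distance to all other members."""
    return min(
        members,
        key=lambda m: sum(math.dist(m, other) for other in members),
    )

# Toy group with one outlier; the medoid stays inside the dense part.
members = [(0, 0), (1, 0), (2, 0), (3, 0), (20, 0)]
representative = medoid(members)
```

When the medoids of a concept look unrelated to each other, that is a hint that the concept mixes several kinds of images.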
16. BEFORE WE DO SO, WE CAN VISUALIZE THE SUB-CONCEPT IN DOCUMENT ATLAS
17. SO…
• This is definite evidence that the concept should be split into at least two different sub-concepts
• Most of the images inside it represent buildings, but there are some that belong to a certain type of flower, as well as some depicting fire
• So, just to be safe, let’s say we want 5 sub-concepts
20. CONCLUSIONS
• What we see is that we can construct an image ontology in a semi-supervised way
• By using k-means clustering based on SIFT+color image representation we can detect candidates for concepts in the ontology and then refine them until we reach good quality
21. ACKNOWLEDGEMENTS
• This work was supported by the bilateral project between Slovenia and Romania “Understanding Human Behavior for Video Surveillance Applications,” the Slovenian Research Agency and the ICT Programme of the EC PlanetData (ICT-NoE-257641).