Automatic Image Annotation Using Color K-Means Clustering

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/221365086
Automatic Image Annotation Using Color K-
Means Clustering
Conference Paper · November 2009
DOI: 10.1007/978-3-642-05036-7_61 · Source: DBLP
CITATION
1
READS
55
2 authors:
Some of the authors of this publication are also working on these related projects:
QUANTITATIVE MODELLING OF MALAY VOWEL SOUNDS View project
Nursuriati Jamil
Universiti Teknologi MARA
60 PUBLICATIONS 150 CITATIONS
SEE PROFILE
Siti 'Aisyah Sa'dan
Universiti Teknologi MARA
3 PUBLICATIONS 1 CITATION
SEE PROFILE
All content following this page was uploaded by Nursuriati Jamil on 01 December 2016.
The user has requested enhancement of the downloaded file. All in-text references underlined in blue
are linked to publications on ResearchGate, letting you access and read them immediately.

H. Badioze Zaman et al. (Eds.): IVIC 2009, LNCS 5857, pp. 645–652, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Automatic Image Annotation Using Color K-Means
Clustering
Nursuriati Jamil and Siti ’Aisyah Sa’adan
Faculty of Computer & Mathematical Sciences, Universiti Teknologi MARA
40450 Shah Alam, Selangor, Malaysia
liza@tmsk.uitm.edu.my, aisyah.sadan@gmail.com
Abstract. Automatic image annotation is a process of modeling a human in as-
signing words to images based on visual observations. It is essential as manual
annotation is time consuming especially for large databases and there is no
standard captioning procedure because it is based on human perception. This
paper discusses implementation of automatic image annotation using K-means
clustering algorithm to annotate the colors with the appropriate words by using
predefined colors. Experiments are conducted to identify the number of cen-
troids, distance measures and initialization mode for the best clustering results.
A prototype of an automatic image annotation is developed and then tested us-
ing thirty-five beach scenery photographs. Results showed that annotating im-
age using evenly-spaced initialization mode and 100 centroids measured using
City-Block distance function managed to achieve a commendable 75% preci-
sion rate.
Keywords: Automatic image annotation, K-means clustering, RGB model, ini-
tialization mode, cluster number.
1 Introduction
Automatic image annotation is defined indirectly as the process by which a computer
system automatically assigns words in the form of captioning to a digital image [14].
Commonly, automatic image annotation is used in image retrieval systems to organize
and locate images of interest from a database. Annotation-based image retrieval is
perceived as better than content-based image retrieval (CBIR) because it allows user
to compose queries freely using their natural language [4]. Furthermore, CBIR system
matches images based on the low-level visual similarities. Thus, it has some limita-
tions due to missing semantic information [8].
Clustering algorithms are commonly used in classifying low-level features of the
images prior to annotation. [1] defined clustering as the process of organizing objects
into groups whose members are similar in some way. A cluster is therefore a collection
of objects that are similar between them and are dissimilar to objects belonging to
other clusters. Several popular clustering algorithms include K-Means, Expectation
Maximization (EM) and Discreet Distribution (D2) clustering [7] algorithms. K-Means
clustering relies on hard assignment of information to a given set of partitions also

646 N. Jamil and S. ’Aisyah Sa’adan
known as cluster centers or the K centroids [13]. At every step of the algorithm, each
data value is assigned to the nearest centroid based on some similarity parameter that is
calculated using distance measurement. Then, the centroids are then recalculated based
on these hard assignments. With each successive pass, a data value can change the
centroid where it belongs to, thus altering the values of the centroid at every pass. K-
Means clustering has been used extensively to facilitate in classifications of low-level
features in image retrieval systems [3][14][11][7][9]. The EM algorithm employed in
[13] [14], on the other hand relies on soft assignment of data given set of centroids.
Every data value is associated with every centroid through system of weight based on
strongly the data value should be associated with the particular centroid. In general, K-
Means clustering works better than EM algorithm and is fairly simple to implement for
image segmentation using color as the feature parameter [13].
In this paper, implementation of K-Means clustering algorithm is experimented to
automatically annotate beach scenery photographs using their RGB color features. The
purpose of the study is to investigate the suitable number of centroids, distance meas-
ures and initialization mode in an attempt to achieve the best clustering performance.
2 K-Means Clustering
The K-Means is a very popular algorithm and one of the best for implementing the
clustering process [12]. It has a time complexity that is dominated by the product of
the number of patterns, the number of centroids, and the number of iterations. For an
image, K-Means clustering may be implemented as follows:
i) Place K points into the space represented by the pixels that are being clustered.
These points represent initial cluster centroids (K), also known as initialization
point.
ii) Assign each pixel to the cluster that has the closest centroid (obtain by measuring
distance).
iii) When all pixels have been assigned, recalculate the positions of the K centroids.
iv) Repeat Steps 2 and 3 until the centroids no longer move.
Factors that may affect performance of K-Means algorithm are the initialization
mode, distance measures and the number of centroids used during clustering process.
Initialization mode is important in order to have accurate RGB representation for the
centroids at the starting point of the clustering. Each pixel in the image is then assign
to its proper cluster based on its similarity by using a distance measure. This will
influence the shape of the clusters, as some elements may be close or further away to
one another according to the distance calculated [10]. Thus, the distance measurement
used is also vital to ensure every pixel is assigned to its centroid precisely. Common
distance functions used in clustering are Euclidean distance, City Block distance,
Minkowski distance and Canberra distance. Number of centroids, K chosen in the
clustering process must also be taken into consideration too. According to [13], the
number of the centroids used in the segmentation has a very large effect on the output.
The more centroids used in the color setup, more possible colors are available to show
up in the output.

Automatic Image Annotation Using Color K-Means Clustering 647
3 Materials and Methods
As mentioned previously, this paper discusses the implementation of an automatic
annotation prototype that will annotate beach photographs using eight predefined
words: sky, sea, beach, cloud, tree, hill, grass and rock. Fig. 1 demonstrates the dia-
gram of the annotation process.
Fig. 1. Automatic annotation processes
3.1 Data Collection
Ten natural beach scenery photographs are downloaded from [5][6] as these images
have been classified into their proper categories for benchmarking purpose. These
images are chosen from a total of 3,360 photograph images to be used as training
images. For testing purposes, thirty-five photograph images are collected randomly
from search engine Yahoo! and Google. The criteria of the test images are that they
are beach scenery photographs and they must have at least one of the eight beach
elements, which are sky, sea, beach, cloud, tree, hill, grass and rock.
3.2 Manual Image Annotation
All thirty-five testing images are manually annotated using visual inspection of three
people. They are given a selected list of words taken from Oxford Fajar dictionary [2]
that describe beach scenery and they manually annotated the test images based on the
given words. Results of these manual annotations are then used as benchmarking of
the proposed prototype.
Beach image
Color extraction
Predefined
colors
Color clustering
Identify init mode,
cluster no, distance
measure
K-Means clustering
Automatic annotation
Captions: SKY,
CLOUD, SEA, ROCK,
TREE, GRASS, HILL,
BEACH
Manual
annotation
Relevance
list
Benchmarking
Training

3.3 Color Feature Extraction
Color features using RGB model of the eight beach elements mentioned earlier are
extracted from the training images. These predefined color features are later used
during the testing phase for annotating the test images. Table 1 shows the RGB aver-
age color values for all the beach elements.
Table 1. Predefined colors of the beach elements
Beach element Average RGB values
Sky 88, 122, 170
Sea 58, 97, 123
Beach 187, 174, 147
Grass 59, 69, 30
Hill 43, 88, 75
Tree 72, 79, 36
Rock 76, 67, 69
Cloud 190, 189, 199
3.4 Color Clustering
Two experiments are conducted to determine the initialization mode, distance func-
tion and number of clusters in an effort to achieve the highest performance of K-
Means algorithm. The first experiment is to discover the best combination of initiali-
zation mode and distance measure. The initialization modes that are tested are evenly-
spaced mode and max-data mode; and the distance measurements that are involved
are Euclidean, City Block and Canberra. Objective of the second experiment is to
identify the appropriate number of centroids (K) to be implemented in automatic an-
notation prototype. These centroids contain the RGB values to be compared later with
predefined color of beach elements. The numbers of centroids to be tested are 8K,
30K, 50K and 100K.
To evaluate the performance of the clustering algorithm, Recall and Precision
measures are computed [14], where numCorrect is the number of correctly retrieved
words from output caption, numRetrieved is the total number of retrieved words
from the caption and numExist is the actual number of retrieved words for the
caption.
Recall =
numCorrect
numRetrieved
(1)
Precision =
numCorrect
numExist
(2)
3.5 Development of Automatic Annotation Prototype
Based on the experiment results, a prototype of an automatic annotation system was
developed using Java programming language. The software development tools used

are BlueJ version 2.1.2 with Java Development Kit of version jdk1.6.0_05, Java Run-
time Environment of version jre1.6.0_07, Java Advance Imaging Development Kit
version jai-1_1_3-lib-windows-i586-jdk and Java Advance Imaging Runtime Envi-
ronment version jai-1_1_3-lib-windows-i586-jre. The prototype is then tested and
evaluated using thirty-five photographs of beach scenery.
4 Results and Discussions
Table 2 shows results of the first experiment to determine the combination of
initialization mode and distance measure in achieving the best performance of clus-
tering. Overall, evenly-spaced initialization mode performed better compared to
max-data mode. It can be also seen that the highest average precision rate of 88% is
accomplished by using evenly-spaced initialization mode and City Block distance
measure. Even though recall rate of this combination is slightly lower than Canberra
measure, we perceived precision rate as a better judgment of clustering
performance.
Table 2. Performance of different combinations of initialization modes and distance measures
Precision Recall
Initialization
Mode Euclidian CityBlock Canberra Euclidian CityBlock Canberra
Evenly-spaced 0.80 0.88 0.83 0.40 0.40 0.46
Max-data 0.82 0.87 0.88 0.34 0.32 0.37
The experiment result of comparing the number of centroids is recorded in Table 3.
From the table, it is shown that 8K and 100K have equal precision rate of 88% in
annotating the images. Therefore, we include the recall rate of 40% in order to
conclude that the highest performance was achieved when the highest number of
centroids of 100 is used. It is also interesting to note that when using 30 and 50 cen-
troids, the precision rates are in fact lower than when utilizing only 8 centroids.
After all the techniques and distance measure are determined, the prototype was
developed and tested with the 35 testing images. Fig. 2 illustrates an output of one of
the annotated image. Recall and precision rates of the tested images are demonstrated
in Table 4 showing and average precision rate of 75% and recall rate of 50%.
Table 3. Performance of different number of centroids
Number of Centroids (K)
Average
8 30 50 100
Precision 0.88 0.87 0.87 0.88
Recall 0.32 0.34 0.38 0.40

Table 4. Recall and Precision Rates of the Prototype
Image Precision Recall
beach001.jpg 0.5 0.25
beach002.jpg 0.67 0.4
beach003.jpg 0.4 0.5
beach004.jpg 1 0.33
beach005.jpg 1 0.4
beach007.jpg 1 0.4
beach008.jpg 1 0.4
beach009.jpg 1 0.5
beach010.jpg 0.57 1
beach011.jpg 1 0.2
beach012.jpg 1 0.4
beach013.jpg 0.5 1
beach015.jpg 1 0.4
beach016.jpg 1 0.2
beach017.jpg 1 0.4
beach019.jpg 0.67 0.5
beach020.jpg 0.5 0.25
beach021.jpg 1 0.67
beach022.jpg 1 0.33
beach023.jpg 0.67 1
beach024.jpg 0.57 1
beach025.jpg 1 0.4
beach026.jpg 0.5 0.25
beach027.jpg 1 0.25
beach028.jpg 1 0.4
beach030.jpg 0.5 1
beach031.jpg 0.67 1
beach032.jpg 0.2 0.33
beach033.jpg 0.33 0.35
beach034.jpg 0.63 1
beach035.jpg 1 0.25
Average 0.75 0.50
Table 5 illustrated the result of the percentage of each beach element correctly re-
trieved. From the table, it is shown that SKY and CLOUD have the highest retrieval
rate at 77% and 70%, respectively. This is due to the fact that SKY has similar color
with CLOUD. In other words, when there is SKY, there is possibility of CLOUD to
be annotated. However, ROCK has 0% of correctly retrieved rate. The main reason of
this is the little occurrence of ROCK in all the tested images.

Fig. 2. An image automatically annotated with 5 words related to beach scenery
Table 5. Percentage of beach elements correctly retrieved
Beach Element
Manual
Annotation
Automatic
Annotation Correctly Retrieved
SKY 35 27 77.14
SEA 28 11 39.29
BEACH 33 13 39.39
GRASS 6 2 33.33
HILL 8 1 12.50
TREE 21 7 33.33
ROCK 4 0 0.00
CLOUD 20 14 70.00
5 Conclusion
From the experimental results, it shows that the prototype is best implemented using
evenly spaced values for initialization mode with City Block distance for distance
measure in K-Means clustering. Even though the training data is very small, due to
lack of free image database, the prototype achieved a commendable precision rate of
75 %. This shows that K-Means algorithm is robust enough to be utilized in clustering
low-level features of an image for annotation purposes. Our study is an initial work of
automatic image annotation. There are several constraints and limitations that should
be overcome with further research.
Future work to improve the accuracy of the system can take many directions. For
example, this prototype needs to be tested with other color model that is more align
with human vision such as HSV color model. More training images should be ac-
quired to increase the accuracy of the feature extractions. Finally, more features
should be extracted from the image to imply more meaning when annotation process
is performed.

References
1. A Tutorial on Clustering Algorithm,
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/
index.html
2. Hawkins, J.M.: Kamus Dwibahasa Oxford Fajar: Melayu Inggeris, 4th edn. Fajar Bakti,
Selangor (2004)
3. Çavuş, Ö., Aksoy, S.: Semantic Scene Classification for Image Annotation and Retrieval.
In: da Vitoria Lobo, N., et al. (eds.) IAPR 2008. LNCS, vol. 5342, pp. 402–410. Springer,
Heidelberg (2008)
4. Inoue, M.: On the Need for Annotation-Based Image Retrieval. In: Workshop of Informa-
tion Retrieval in Context, pp. 44–46 (2004)
5. James Wang Research Group, http://wang.ist.psu.edu/~jwang/test1.zip
6. Jia Li Research Group,
http://www.stat.psu.edu/~jiali/li_photograph.tar
7. Li, J., Wang, J.Z.: Real-Time Computerized Annotation of Pictures. In: ACM Multimedia
Conference, pp. 911–920 (2006)
8. Pan, J.Y., Yang, H.J., Duygulu, P., Faloutsos, C.: Automatic Image Captioning. In: IEEE
International Conference on Multimedia and Expo., pp. 1987–1990 (2004)
9. Sayar, A., Yarman-Vural, F.T.: Image Annotation by Semi-Supervised Constrained by
SIFT Orientation Information. In: 23rd International Symposium on Computer and Infor-
mation Sciences, pp. 1–4 (2008)
10. Similarity Measurements, http://people.revoledu.com/kardi/tutorial/
Similarity/index.html
11. Srikanth, M., Varner, J., Bowden, M., Moldovan, D.: Exploiting Ontologies for Automatic
Image Annotation. In: 28th International ACM SIGIR Conference on Research and Devel-
opment in information Retrieval, pp. 552–558 (2005)
12. Vrahatis, M.N., Boutsinas, B., Alevizos, P., Pavlides, G.: The New k-Windows Algorithm
for Improving the K-Means Clustering Algorithm. J. Complexity 18(1), 375–391 (2002)
13. Vutsinas, C.: Image Segmentation: K-Means and EM Algorithms,
http://www.ces.clemson.edu/~stb/ece847/fall2007/projects/
kmeans_em.doc
14. Wang, L., Liu, L., Khan, L.: Automatic Image Annotation and Retrieval using Subspace
Clustering Algorithm. In: 2nd ACM International Workshop on Multimedia Databases, pp.
100–108 (2004)
15. Li, W., Sun, M.: Automatic Image Annotation Based on WordNet and Hierarchical En-
sembles. In: Gelbukh, A. (ed.) CICLing 2006. LNCS, vol. 3878, pp. 417–428. Springer,
Heidelberg (2006)

Automatic Image Annotation Using Color K-Means Clustering

Recommended

Recommended

More Related Content

Similar to Automatic Image Annotation Using Color K-Means Clustering

Similar to Automatic Image Annotation Using Color K-Means Clustering (20)

More from Jim Webb

More from Jim Webb (20)

Recently uploaded

Recently uploaded (20)

Automatic Image Annotation Using Color K-Means Clustering