International Journal of Graphics and Multimedia (IJGM), ISSN 0976 – 6448 (Print),
ISSN 0976 – 6456 (Online), Volume 4, Issue 1, January - April 2013, pp. 31-40, © IAEME
SCRIPT IDENTIFICATION USING DCT COEFFICIENTS
M. M. Kodabagi¹, Hemavati C. Purad²
¹ Department of Computer Science and Engineering, Basaveshwar Engineering College,
Bagalkot-587102, Karnataka, India
² Department of Computer Science and Engineering, Tontadarya College of Engineering,
Gadag-582101, Karnataka, India
ABSTRACT
Automated systems for understanding low resolution images of display boards are
enabling several new applications such as assistants for the blind, tour guide systems and
location-aware systems. Script identification at word level is an important
pre-processing step in such systems, prior to further image analysis. In this
paper, a new approach for word level script identification of text in low resolution images of
display boards is presented. The proposed methodology uses horizontal run statistics and
texture features to distinguish three scripts, namely Hindi, Kannada and English. The
method computes discrete cosine transform (DCT) based texture features from the input word
image and uses a newly defined threshold based discriminant function to identify the script
class. The methodology is evaluated on 800 low resolution word images of display boards. The
proposed method is robust to variations in font size and style, number
of characters, thickness of and spacing between characters, noise, and other degradations. It
achieves an overall identification accuracy of 85.44% (the mean of the per-script
accuracies), with individual accuracies of 100% for Hindi script, 70.33% for Kannada script
and 86% for English script.
1. INTRODUCTION
In recent years, camera-embedded handheld devices such as smart mobile
phones, tablets and PDAs have come into wide use, and they offer increasingly high
computing and communication capabilities. These devices, with internet access, are
used for a wide variety of purposes such as information seeking, mobile commerce and
other business and enterprise applications. One such application is to understand written text
on display boards in an unknown environment. People who travel to different places in
the world for field work and business find it difficult to understand written text on display
boards, particularly in a foreign environment. This is especially true in multilingual
countries like India. Hence there is a need for a device that helps people understand
display boards by detecting and translating the written matter while providing localized
information.
The written matter on display boards/name boards provides important information for
the needs and safety of people, and may be written in unknown languages. The written matter
can be street names, restaurant names, building names, company names, traffic directions,
warning signs etc. Researchers have focused their attention on development of techniques for
understanding written text on such display boards. There is a spurt of activity in the
development of web based intelligent hand held systems for such applications.
Among the reported works [1-10] on intelligent systems for handheld devices, few
pertain to understanding written text on display boards. Therefore, scope exists for
exploring such possibilities. Text understanding involves several processing steps: text
detection and extraction, preprocessing for line, word and character separation, script
identification, text recognition and language translation. In the Indian context, the written text
on a display board may contain multilingual information. Therefore, the recognition and language
translation tasks require script identification at word level, which makes it one of the
most important processing steps in such systems.
Script identification of text in low resolution images of display boards is a
difficult and challenging problem due to variations in font size, style and spacing
between characters, skew and other degradations. The reported works on script identification
employ a number of different approaches, which are categorized into local and global
methods. The local approaches use connected component analysis to determine
the script of the text. In contrast, the global approaches measure the properties of a region/block
of text and give sufficient characterization of the underlying script. Hence global approaches
such as texture analysis are a good choice for solving such a problem.
Script identification of text in a low resolution image of a display board is an
important step whose output is used by the later processing steps of a display board
understanding system. In this paper, a new approach for word level script identification of
text in low resolution images of display boards is presented. The proposed methodology uses
horizontal run statistics and texture features to distinguish three scripts, namely Hindi,
Kannada and English. The method computes discrete cosine transform (DCT) based texture
features from the input word image and uses a newly defined threshold based discriminant
function to identify the script class. The proposed method is robust to variations in
font size and style, number of characters, thickness of and spacing between characters, noise,
and other degradations. It achieves an overall identification accuracy of
85.44% and individual identification accuracies of 100% for Hindi script, 70.33% for Kannada
script and 86% for English script.
The rest of the paper is organized as follows. A survey of work related to script
identification from images is given in Section 2. The proposed method is presented in
Section 3. The experimental results and discussion are given in Section 4. Section 5
concludes the work and lists future directions.
2. RELATED WORKS
A substantial amount of work has gone into the research related to script identification
from printed document images. Some of the related works are summarized in the following.
Script identification of a low resolution image of a display board is a necessary step
for the other tasks of a display board understanding system. A number of
methods for script identification have been published in recent years and are categorized into
local and global approaches. The local approaches perform connected component analysis
and use statistical features for script identification. A few such methods are summarized
below. An approach for determining the script and language of document images is
proposed in [11]. Initially, the algorithm determines connected components and locates
upward concavities in them. It then classifies the script into two broad
classes: Han-based (Chinese, Japanese and Korean) and Latin-based (English, French,
German and Russian) languages. The Han-based languages are then differentiated using
statistics of the optical densities of connected components, and the Latin-based languages are
identified based on the most frequently occurring word shape characteristics.
An automatic technique for the identification of printed Roman, Chinese, Arabic,
Devanagari and Bangla text lines from a single document image is found in [12]. The method
uses the headline feature to separate Devanagari and Bangla script lines into one group and the
other script lines (English, Chinese and Arabic) into another group. The technique
uses zone-wise features to distinguish Devanagari from Bangla. Further, vertical run
length statistics and water reservoir features are used to classify Chinese, English and Arabic
scripts. Experiments were conducted on 25000 text lines, and identification rates
of 97.32%, 98.65%, 97.53%, 96.02% and 97.12% for English, Chinese, Arabic, Devanagari
and Bangla scripts respectively are reported. However, the approach reports higher error rates
for short text lines containing a word with few characters.
A method for script and language identification of noisy and degraded document
images is presented in [13]. The method identifies the script using a document vectorization
technique that converts each image into a vector of vertical cuts and character extremum
points, which characterizes the shape and frequency of the contained character or word images.
The method is tolerant to variation in text fonts and styles, noise, and various types of
document degradation. For each script or language under study, a script or language template
is first constructed through a training process. Scripts and languages of document images are
then determined according to the distances between the converted document vectors and the
pre-constructed script and language templates. Experimental results show that the
technique is accurate and easy to extend. The authors propose further investigation for
images containing perspective and curvature distortion and skew.
In contrast, the global approaches measure the texture of a region of text to identify
the underlying script. Some of the texture based approaches are detailed below. A method
demonstrating the effectiveness of rotation invariant texture features for automatic script
identification is found in [14]. The method computes features from text blocks using
multi-channel Gabor filters and constructs a representative feature vector for each language.
Then, a Euclidean distance classifier is used for script identification of 6 languages (Chinese,
English, Greek, Russian, Persian, and Malayalam). An average classification accuracy of 96.7% is
reported. The sensitivity of texture analysis to different fonts is also discussed.
A technique that investigates the use of texture analysis for script and language
identification from document images is presented in [15]. The method obtains a uniform
block of text from the document image. Multiple channel Gabor filters and gray level
co-occurrence matrices (GLCMs) are used to extract texture features. Then a K-NN classifier is
used to classify seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian.
The test results showed that the Gabor filters were more accurate than the GLCMs,
producing results which are over 95% accurate.
A texture analysis technique for script identification is described in [16]. The
work evaluates commonly used texture features for the purpose of script
identification and provides a qualitative measure of which features are most appropriate for
this task. The texture features include GLCM, Gabor filter bank energies, and a number of
wavelet energy features. The experimental results show that the wavelet log
co-occurrence features outperform the other techniques, giving the lowest error rate of 1%. The
effectiveness of features extracted from co-occurrence histograms of wavelet decomposed
images with a KNN classifier for script identification of 7 Indian languages is discussed in
[17]. Other recent works on script identification are reported in [18-19].
A review of the literature shows that a few limitations still exist
in the reported script and language identification methods. First, the performance of local
approaches depends upon correct segmentation of connected components. Consequently, they
are very sensitive to segmentation errors resulting from noise and various types of
document degradation. Second, the global techniques need more time to measure the texture
of a region. Nevertheless, these methods are a good choice for the analysis of low resolution
images of display boards. Hence, the use of textural features is further investigated in the
proposed work.
It is also noticed that the global techniques operate on text blocks of predefined size
containing matter in a single script to determine the script and language of the
underlying document. But this is not the case with written text on display boards in the Indian
scenario, as the text may contain multilingual information. Therefore, it is necessary to identify
script and language at word level, which is essential for later processing steps such as text
understanding and language translation. The task of script identification at word level is
difficult and challenging, because distinguishing properties are to be obtained from a small
region containing text of variable size and font. Therefore more research is needed
to model the texture of a small region containing text of variable size and font for better
characterization and classification with reduced computational complexity. Hence, the
current work is undertaken to identify new texture properties using discrete cosine
transform coefficients for script identification of low resolution images of display boards.
The detailed description of the proposed methodology is given in the next section.
3. PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION
The proposed methodology uses DCT based texture features to identify the
script class of low resolution display board images. The methodology comprises three phases:
Preprocessing, Extraction of DCT Energy Features and Script Class Identification. The block
diagram of the proposed model is given in Fig. 1. Each processing
step is described in detail in the following subsections.
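[Figure blocks: test word image of display board; preprocessing for binarization and bounding
box generation; computational strategy for Hindi script identification (satisfied: Hindi);
compute discrete cosine transform energy features; threshold based classification; word image
classified as Kannada/English]
Fig. 1: Block diagram of proposed method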
3.1 Preprocessing
The works reported in the literature preprocess the document image to obtain a uniformly sized
text block, detect and correct skew, and remove uneven spacing between lines, words and
characters, in order to obtain optimal texture features for an improved classification rate.
This is because the presence of noise, skew, uneven spacing and other degradations
significantly affects texture features, leading to higher classification errors. But the
preprocessing task is difficult and computationally expensive, and may not be suitable for
applications that process a small amount of text containing few lines. Hence, in this work, an
attempt is made to evaluate the performance of new texture features extracted directly from
variable sized word images without removal of noise, skew, uneven spacing and other
degradations. The only preprocessing done is to binarize the image and generate a bounding box
around the word.
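The binarization and bounding box step can be illustrated with a minimal sketch. The paper does not state how the image is binarized, so the fixed threshold of 128, the assumption of dark text on a light background, and the function names `binarize`, `bounding_box` and `crop_to_text` are all hypothetical choices made here for illustration only.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1, where 1 marks a dark (text) pixel.

    The threshold value and the dark-text assumption are illustrative, not from the paper.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(binary):
    """Return (top, bottom, left, right), inclusive, of the text pixels, or None if there are none."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def crop_to_text(gray, threshold=128):
    """Binarize the image and crop it to the bounding box of the text pixels."""
    binary = binarize(gray, threshold)
    box = bounding_box(binary)
    if box is None:
        return []
    top, bottom, left, right = box
    return [row[left:right + 1] for row in binary[top:bottom + 1]]
```

The cropped binary image is what the later sketches in this section would take as input.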
3.3 Extraction of DCT Energy Features
In this phase, the two-dimensional discrete cosine transform is applied to the
preprocessed image to obtain a DCT matrix d of size M x N, and energy features E1, E2, and E3 are
computed on chosen regions of the DCT coefficients as given in equations (1) to (3).
E1 = … (1)
E2 = … (2)
E3 = … (3)
where
• d is the DCT matrix of dimension M x N obtained after applying the DCT to the input image.
• Mid1 and Mid2 are the column and row numbers used during computation of energy
feature E3.
Fig. 2 shows the regions chosen to calculate energy features E1, E2 and E3.
Fig. 2. DCT matrix and 3 chosen regions for determining energy features E1, E2 and E3
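As a rough illustration of this phase, the sketch below computes an orthonormal two-dimensional DCT of a small image and a region energy over a rectangular block of coefficients. The exact regions and the exact energy formula behind E1, E2 and E3 are not recoverable from the text, so the mean absolute coefficient value used in `region_energy`, and the row and column ranges passed to it, are assumptions and not the authors' definitions.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(image):
    """2-D DCT of an M x N image: 1-D DCT along every row, then along every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to M x N


def region_energy(d, row_range, col_range):
    """Mean absolute coefficient over rows [r0, r1) and columns [c0, c1).

    This energy definition is an assumption made for illustration.
    """
    r0, r1 = row_range
    c0, c1 = col_range
    values = [abs(d[r][c]) for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values) if values else 0.0
```

For a DCT matrix `d` of size M x N, a call such as `region_energy(d, (0, M // 2), (0, N // 2))` would give the energy of the low-frequency quadrant; which blocks correspond to the three regions of Fig. 2 would have to be taken from the original figure.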
3.4 Script Class Identification
The script identification task consists of 2 processing stages. In stage 1, the test word
image is processed to determine whether it belongs to Hindi script. If it does not, stage 2 uses
threshold based classification to determine whether it belongs to Kannada or English script.
The functionality of both stages is described in the following sections.
3.4.1 Computational strategy for Hindi Script Identification
In this stage, horizontal run statistics of the test word image are used to determine
whether the written word in the display board image belongs to Hindi or to another script.
Initially, the horizontal runs of length greater than 6 are computed for every row of the word
image and stored in a feature vector. The vector records the row number and run length count of
all such runs for all rows. These run length values are thresholded to classify the word image
into one of two classes, w1 and w2, where w1 corresponds to Hindi script and w2 corresponds to
the other scripts. A word image classified into class w2 is further processed in stage 2 to
determine whether it belongs to Kannada or English script.
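The run extraction described above can be sketched as follows. The minimum run length of 7 pixels follows the "greater than 6" condition in the text; the decision rule in `looks_like_hindi` and its `width_fraction` parameter are hypothetical, since the paper does not give the threshold applied to the run lengths. The rule used here is motivated by the headline feature of Devanagari, which tends to produce one long horizontal run across the word.

```python
def horizontal_runs(binary, min_length=7):
    """Return (row_index, run_length) for every horizontal run of 1-pixels of at least min_length."""
    runs = []
    for r, row in enumerate(binary):
        length = 0
        for px in list(row) + [0]:  # the trailing 0 closes a run that reaches the row end
            if px:
                length += 1
            else:
                if length >= min_length:
                    runs.append((r, length))
                length = 0
    return runs


def looks_like_hindi(binary, width_fraction=0.6):
    """Hypothetical class w1 test: some run covers at least width_fraction of the word width."""
    if not binary or not binary[0]:
        return False
    width = len(binary[0])
    return any(length >= width_fraction * width for _, length in horizontal_runs(binary))
```

A word image for which `looks_like_hindi` returns False would fall into class w2 and be passed on to stage 2.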
3.4.2 Threshold based classification
The threshold based classification phase of the proposed model uses a discriminant
function to classify English and Kannada scripts. The discriminant function uses thresholds to
determine the script class. The thresholds are heuristic values, chosen empirically. The
classification rules using the discriminant function are stated below.
Algorithm 3.4.1: Threshold Based Classification
Input: E1, E2 and E3
Output: Script Class: English or Kannada
Begin
    if E1 >= 0.1000 & E1 <= 0.4000 & E2 >= 0.0200 & E2 <= 0.2000
        Print “Script is ENGLISH”
    else if E1 >= 0.0300 & E1 < 0.1000 & E2 >= 0.0100 & E2 <= 0.0850
        Print “Script is KANNADA”
    end
End
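The rules of the algorithm translate directly into a small function. The sketch below uses the threshold values exactly as listed; the function name `classify_script` and the `None` return value for inputs that satisfy neither rule are assumptions, since the algorithm does not say what happens in that case. E3 is listed as an input but does not appear in either rule, so it is omitted here.

```python
def classify_script(e1, e2):
    """Apply the two threshold rules to energy features E1 and E2."""
    if 0.1000 <= e1 <= 0.4000 and 0.0200 <= e2 <= 0.2000:
        return "ENGLISH"
    if 0.0300 <= e1 < 0.1000 and 0.0100 <= e2 <= 0.0850:
        return "KANNADA"
    return None  # neither rule fired; this case is not specified in the source
```

Because the two E1 intervals meet at 0.1000 without overlapping, at most one rule can fire for any input.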
4. EXPERIMENTAL RESULTS AND DISCUSSION
The proposed methodology for script identification has been evaluated on low
resolution word images of display boards with varying font size and style. The experimental
tests were conducted on word images of 3 scripts, Hindi, Kannada and English, and the results
were encouraging. The results of processing several display board word images
dealing with various issues and the overall performance of the system are reported in Section
4.1.
4.1 Script Identification: An experimental analysis dealing with various issues
The effectiveness of the proposed methodology for script identification using DCT
features has been evaluated on 800 low resolution word images of display boards. The
images were captured from display boards of government offices in India. The image
database consists of 300 Kannada, 300 English, and 200 Hindi script word images of varying
resolutions. The images are characterized by a variable number of characters, variable font size
and style, uneven thickness and spacing between characters, minimal information context,
small skew, noise and other degradations.
The proposed methodology has produced good results for low resolution word images
containing text of different size, font, and alignment with varying background. The approach
also identifies the script of text regions with small skew. The proposed method achieves an
overall identification accuracy of 85.44% (the mean of the per-script accuracies) and
individual identification accuracies of 100% for Hindi script, 70.33% for Kannada script and
86% for English script. A closer examination of the results revealed that misclassifications
arise from minimal information context, noise and larger skew, which affect the texture of the
text region and hence the performance of the texture based approach. Correctly classified
images dealing with various issues are described in Table 1, and the overall performance of the
system is reported in Table 2.
TABLE 1: Performance of the system in processing different images dealing with various issues
Input sample image | Image resolution | Description
[image] | 33 x 101 | Script identification of an image having minimal information context.
[image] | 76 x 127 | The effectiveness of the method in processing a degraded Kannada image containing characters of uneven thickness, lighting and spacing between characters, noise, small skew and other degradations.
[image] | 36 x 123 | The robustness of the method in identifying the script of an image containing 7 characters and a degraded background.
[image] | 96 x 351 | The texture of an image having a different font style and large font size is correctly modeled as Kannada script.
[image] | 132 x 451 | The method processes a larger, blurred image with small skew and classifies it as English text.
[image] | 78 x 151 | The robustness of the method in processing a degraded image with English text in an unusual font.
[image] | 92 x 190 | The effectiveness of the method in correctly classifying part of a word.
TABLE 2: Overall System Performance
Number of word images | Classified as Kannada | Classified as English | Classified as Hindi | Accuracy
200 Hindi | - | - | 200 | 100%
300 Kannada | 211 | 89 | - | 70.33%
300 English | - | 258 | - | 86%
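The accuracy figures can be checked with a few lines of arithmetic. The sketch below recomputes the per-script accuracies from the counts in Table 2 and shows that the reported 85.44% corresponds to their unweighted mean; the variable names are illustrative. For comparison it also computes the accuracy pooled over all 800 images, which is a different quantity.

```python
# (correctly classified, total) per script, taken from Table 2
results = {
    "Hindi": (200, 200),
    "Kannada": (211, 300),
    "English": (258, 300),
}

# Per-script accuracy in percent
per_class = {name: 100.0 * correct / total for name, (correct, total) in results.items()}

# Unweighted mean of the three per-script accuracies
mean_accuracy = sum(per_class.values()) / len(per_class)

# Accuracy pooled over all images: 669 correct out of 800
pooled_accuracy = (
    100.0 * sum(c for c, _ in results.values()) / sum(t for _, t in results.values())
)

print(f"mean of per-script accuracies: {mean_accuracy:.2f}%")
print(f"pooled accuracy over all images: {pooled_accuracy:.2f}%")
```

The mean of the per-script accuracies weights the three scripts equally, whereas the pooled figure gives more weight to the two scripts with 300 images each.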
5. CONCLUSION
In this paper, an approach for word level script identification of low resolution images
of display boards employing DCT energy features is proposed. The method identifies the script
of a word image without applying techniques for removal of noise and other degradations. This
aspect makes the method more robust and efficient. The proposed set of texture features
models the texture of a region of text and thus provides sufficient characterization.
The heuristic threshold based classification function is found to be robust and
efficient. Testing the methodology on 800 low
resolution word images containing text of different size, font, and alignment with varying
background has yielded an average classification accuracy of 85.44%. The system is found to
be resilient to the presence of small skew and degradations. This
makes the work suitable for text understanding and translation systems, especially in the
Indian context. The method can be extended to script identification of images belonging to
other scripts, and further investigations can focus on language identification of word images.
REFERENCES
[1] Gregory D. Abowd, Christopher G. Atkeson, Jason Hong, Sue Long, Rob Kooper, and
Mike Pinkerton, 1997, “CyberGuide: A mobile context-aware tour guide”, Wireless
Networks, 3(5): pp.421-433.
[2] Natalia Marmasse and Chris Schmandt, 2000, “Location aware information delivery
with comMotion”, In Proceedings of Conference on Human Factors in Computing
Systems, pp.157-171.
[3] Tollmar K. Yeh T. and Darrell T., 2004, “IDeixis - Image-Based Deixis for Finding
Location-Based Information”, In Proceedings of Conference on Human Factors in
Computing Systems (CHI’04), pp.781-782.
[4] Gillian Leetch, Dr. Eleni Mangina, 2005, “A Multi-Agent System to Stream Multimedia
to Handheld Devices”, Proceedings of the Sixth International Conference on
Computational Intelligence and Multimedia Applications (ICCIMA’05).
[5] Wichian Premchaiswadi, 2009, “A mobile Image search for Tourist Information
System”, Proceedings of 9th international conference on SIGNAL PROCESSING,
COMPUTATIONAL GEOMETRY and ARTIFICIAL VISION, pp.62-67.
[6] Ma Chang-jie, Fang Jin-yun, 2008, “Location Based Mobile Tour Guide Services
Towards Digital Dunhuang”, International Archives of Photogrammetry, Remote
Sensing and Spatial Information Sciences, Vol. XXXVII, Part B4, Beijing.
[7] Shih-Hung Wu, Min-Xiang Li, Ping-che Yang, Tsun Ku, 2010, “Ubiquitous
Wikipedia on Handheld Device for Mobile Learning”, 6th IEEE International
Conference on Wireless, Mobile, and Ubiquitous Technologies in Education, pp. 228-230.
[8] Tom Yeh, Kristen Grauman, and K. Tollmar, 2005, “A picture is worth a thousand
keywords: image-based object search on a mobile platform”, In Proceedings of
Conference on Human Factors in Computing Systems, pp.2025-2028.
[9] Fan X., Xie X., Li Z., Li M. and Ma., 2005, “Photo-to-search: using multimodal queries to
search web from mobile phones”, In Proceedings of 7th ACM SIGMM international
workshop on multimedia information retrieval.
[10] Lim Joo Hwee, Jean Pierre Chevallet and Sihem Nouarah Merah, 2005, “SnapToTell:
Ubiquitous information access from camera”, Mobile human computer interaction with
mobile devices and services, Glasgow, Scotland.
[11] Lu Shijian, Chew Lim Tan, 2008, “Script and Language Identification in Noisy and
Degraded Document Images”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 30(1), January.
[12] Linlin Li, Chew Lim Tan, 2008, “Script identification of camera-based images”, ICPR
2008, 19th International Conference on Pattern Recognition, pp.1-4, 8-11 Dec. 2008.
[13] T.N. Tan, 1998, “Rotation Invariant Texture Features and Their Use in Automatic
Script Identification”, IEEE Trans. Pattern Analysis and Machine Intelligence, 20(7),
pp. 751-756.
[14] G.S. Peake and T.N. Tan, 1997, “Script and Language Identification from Document
Images,” Proc. Eighth British Mach. Vision Conf., vol. 2, pp. 230-233, Sept.
[15] A. Busch, W.W. Boles, and S. Sridharan, 2005, “Texture for Script Identification,”
IEEE Trans. Pattern Analysis and Machine Intelligence, 27(11), pp. 1720-1732.
[16] Hiremath P. S. et al., 2010, “Script identification in a handwritten document image
using texture features”, IEEE 2nd International Advance Computing
Conference,pp.110-114, 2010.
[17] Li Yang; Xuelong Hu; Jun Pan, 2008, "Approaches to image retrieval using fuzzy set
theory", International Conference on Neural Networks and Signal Processing, pp.422-
425, 7-11 June 2008.
[18] S. A. Angadi, M. M. Kodabagi, “Word Level Script Identification of Text in Low
Resolution Images of Display Boards using wavelet features”, Proceedings of
International Conference on Advances in Computing (ICADC 2012), AISC 174, pp.209-220,
Springer India 2012.
[19] M. M. Kodabagi and S. R. Karjol, “Script Identification from Printed Document Images
using Statistical Features”, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 2, 2013, pp. 607 - 622, ISSN Print: 0976 – 6367,
ISSN Online: 0976 – 6375.
[20] P. Prasanth Babu, L.Rangaiah and D.Maruthi Kumar, “Comparison and Improvement
of Image Compression using Dct, Dwt & Huffman Encoding Techniques”, International
Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013,
pp. 54 - 60, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.