Text extraction from different kinds of images: document, caption, and scene text images. The discrete wavelet transform (DWT) was used to extract horizontal, vertical, and diagonal features, and k-means clustering was used to group the features into text and background clusters. For simple images k = 2 sufficed (text and background clusters), while for complex images k = 3 was used (text, complex background, and simple background clusters).
Texture features based text extraction from images using DWT and K-means clustering
1. TEXTURE FEATURES BASED TEXT EXTRACTION AND RECOGNITION SYSTEM
Guided By:-
Dr. Neelu Jain
Associate Professor
PEC University of Technology
Presented By:-
Divya Gera
13207003
ME(Electronics)
4. Text content carries a higher level of semantic information than visual information.
It provides important content for information indexing and retrieval, automatic annotation, and structuring of images.
Text extraction involves detection, localization, binarization, extraction, enhancement, and recognition of the text in a given image.
Text characters are difficult to detect and recognize because of variations in size, font, style, orientation, alignment, and contrast, and because of complex colored or textured backgrounds.
14. Region-based methods use the properties of the color or gray scale in the text region, or their differences from the corresponding properties of the background.
Edge-based methods focus on the high contrast between the text and the background.
CC-based methods use a bottom-up approach, grouping small components into successively larger components until all regions in the image are identified.
Texture-based methods use the fact that text in images has distinct textural properties that distinguish it from the background.
15. Motivation
Text in multimedia documents or images contains important information for visual content understanding and information retrieval.
It is very challenging to design a general-purpose text extraction system because of variations in font size, style, color, orientation, and alignment.
In scene text images, problems of low contrast, blurring, reflection, uneven illumination, shadow, and perspective distortion exist along with these text variations.
Also, less work has been done on multilingual text extraction.
16. Objectives
To design and develop an algorithm for multilingual text extraction based on DWT coefficients, texture features, and the k-means clustering algorithm.
To overcome problems such as uneven illumination and shadow, blurring and scratching, multiple colors, and complex backgrounds in scene text extraction.
To develop a text recognition system for the English language.
To compare the performance of the system with existing systems in terms of parameters such as Detection Rate, Precision Rate, and Recall Rate.
To develop a GUI for the proposed algorithm.
18. Author (year) | Technique used | Images | Parameter | Remarks
Yao et al. (2007) | CC and Support Vector Machine (SVM) | Complex background images | PR=64%, RR=60% | Pixels of each character assumed to have similar color.
Lai et al. (2008) | Edge detection and K-means clustering | Signboard images | - | Efficient for uneven illumination.
Song et al. (2008) | Histogram projection and color-based K-means clustering | Chinese text | PR=77.05%, RR=75.63% | K=3 gives best performance.
19. Author (year) | Technique used | Images | Parameter | Remarks
Dinh et al. (2008) | Edge detection and histogram projection | Signboard texts | - | Low-complexity algorithm.
Fan et al. (2009) | Stroke features and connected components | Caption text images | PR=95.2%, RR=94.5% | Color information is not fully used.
Audithan et al. (2009) | Haar DWT, logical AND operator, dynamic thresholding | Document images | DR=94.8% | Independent of contrast.
Angadi et al. (2010) | Discrete Cosine Transform and texture feature extraction | Natural scene images | DR=96.6% | Inefficient for complex background.
20. Author (year) | Technique used | Images | Parameter | Remarks
Anoual et al. (2010) | Edge detection, texture features, connected component analysis | Complex background images | PR=95%, RR=89% | Robust and effective.
Kumar et al. (2010) | CC analysis | ICDAR scene images | PR=90%, RR=89% | Multilingual text extraction.
Hassanzadeh et al. (2011) | Morphological operator, decision classifier | Logos in document images | PR=95.6%, Accuracy=86.9% | A novel and fast method for logo detection.
Chandrasekaran et al. (2012) | Morphological operations and SVM | ICDAR 2003 dataset | PR=95%, RR=92% | Fails in case of high illumination.
21. Author (year) | Technique used | Images | Parameter | Remarks
Zaravi et al. (2011) | DWT, dynamic thresholding, region of interest (ROI) | Colored book and journal covers | DR=91.20% | Robust to noise.
Zhang et al. (2012) | Edge enhancement and CC | Web and caption text images | DR=92.4% | Insensitive to various types of background noise.
Seeri et al. (2012) | Median filter, Sobel edge detector, connected component labeling, order-statistic filter | Kannada text images | PR=84.21%, RR=83.16%, Accuracy=75.77% | Fails to extract very small characters.
22. Author (year) | Technique used | Images | Parameter | Remarks
Azadboni et al. (2012) | FFT domain filtering, SVM classification, K-means clustering | Scene text images | DR=98.10% | Characters assumed to have uniform colour.
Anupama et al. (2013) | Morphology operators, histogram projection (X and Y histograms) | Handwritten Telugu document images | DR=98.54%, Accuracy=98.29% | Fails in case of touching characters and overlapping lines.
Raj et al. (2014) | CC-based | Natural scene images (Devanagari) | PR=72.8%, RR=74.2% | Fails for small slanted/curved text.
25. Pre-processing
Although the color components may differ within a text region, color does not provide useful information for text extraction.
Also, processing the three components of an RGB image is more expensive than processing a single channel.
The colored input image is therefore converted into a gray scale image.
(Figure: input image and its gray scale version.)
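The conversion can be sketched with standard luminance weights; the slides do not state which formula was used, so the weights below (those of MATLAB's rgb2gray) are an assumption:

```python
import numpy as np

def rgb_to_gray(img):
    """Convert an RGB image (H x W x 3) to gray scale using standard
    luminance weights (assumed; the work's exact formula is not stated)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    return img[..., :3] @ weights

# A tiny 1x2 synthetic image: one white pixel, one black pixel.
img = np.array([[[255, 255, 255], [0, 0, 0]]], dtype=float)
gray = rgb_to_gray(img)
print(gray)  # white maps to ~255, black to 0
```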
26. 2D-DWT
Level-1 2D DWT decomposition is applied to the gray-scale image. It decomposes the image into four sub-bands: one approximation sub-band and three detail sub-bands.
LL sub-band: both horizontal and vertical directions at low frequency.
LH sub-band: horizontal direction at low frequency, vertical direction at high frequency.
HL sub-band: horizontal direction at high frequency, vertical direction at low frequency.
HH sub-band: both horizontal and vertical directions at high frequency.
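The decomposition can be illustrated with a minimal unnormalized Haar variant (half-sums and half-differences instead of the orthonormal 1/sqrt(2) filters); the exact wavelet filter and normalization used in the work are not restated here, so treat this as a sketch:

```python
import numpy as np

def haar_dwt2(img):
    """Level-1 2D Haar DWT (unnormalized variant): returns the LL, LH,
    HL, HH sub-bands, each half the size of the even-sized input."""
    a, b = img[0::2, :], img[1::2, :]        # adjacent row pairs
    lo_r, hi_r = (a + b) / 2, (a - b) / 2    # vertical low / high pass

    def cols(x):                             # horizontal low / high pass
        c, d = x[:, 0::2], x[:, 1::2]
        return (c + d) / 2, (c - d) / 2

    LL, HL = cols(lo_r)   # vertical low:  horizontal low / high
    LH, HH = cols(hi_r)   # vertical high: horizontal low / high
    return LL, LH, HL, HH

# A synthetic 8x8 gray-scale ramp.
img = np.arange(64, dtype=float).reshape(8, 8)
LL, LH, HL, HH = haar_dwt2(img)
print(LL.shape)  # each sub-band is (4, 4)
```

A linear ramp has no diagonal variation, so its HH sub-band is identically zero, which is a quick sanity check on the filter orientation.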
28. Feature extraction through sliding window
A small overlapped sliding window (m×n) is scanned over each high-frequency sub-band.
Zero padding is applied where required.
Text regions have a somewhat irregular texture, so a text area can be treated as a special kind of texture.
Language-independent statistical features, i.e. mean and standard deviation, are calculated for the high-frequency sub-bands.
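A sketch of the feature computation; the 4×4 window and step of 2 are illustrative choices (the slide does not give the actual size or overlap), and zero padding is omitted for brevity:

```python
import numpy as np

def window_features(band, m=4, n=4, step=2):
    """Slide an overlapped m x n window over a detail sub-band and
    collect the per-window mean and standard deviation as features."""
    H, W = band.shape
    feats = []
    for i in range(0, max(H - m, 0) + 1, step):
        for j in range(0, max(W - n, 0) + 1, step):
            win = band[i:i + m, j:j + n]
            feats.append((win.mean(), win.std()))
    return np.array(feats)

band = np.random.default_rng(0).normal(size=(8, 8))
f = window_features(band)
print(f.shape)  # (9, 2): 3x3 window positions, 2 features each
```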
29. K-means clustering algorithm
An unsupervised classification technique.
Divides a set of points into k clusters so that intra-cluster similarity is high and inter-cluster similarity is low.
This is done by minimizing the sum of (Euclidean) distances between the points and the cluster centers.
The image is clustered on the basis of the texture features of the LH, HL, and HH sub-bands.
For complex backgrounds, the algorithm divides the image into k = 3 clusters: simple background, complex background, and text. The cluster with the highest mean and standard deviation values is the text cluster.
For a simple background, the image is divided into 2 clusters: background and text.
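The clustering and text-cluster selection can be sketched with a minimal Lloyd's iteration on synthetic (mean, std) window features; the deterministic initialization and the synthetic data are illustrative stand-ins, not values from the work:

```python
import numpy as np

def kmeans(X, init_centers, iters=20):
    """Minimal Lloyd's k-means (illustrative stand-in for any k-means
    implementation; initial centers are supplied for determinism)."""
    centers = init_centers.astype(float).copy()
    for _ in range(iters):
        # assign each feature vector to its nearest center (Euclidean)
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(len(centers)):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers

# Synthetic (mean, std) window features: text windows score high.
bg = np.full((20, 2), 0.1)
txt = np.full((5, 2), 5.0)
X = np.vstack([bg, txt])
labels, centers = kmeans(X, X[[0, 20]])       # k = 2, simple background
text_cluster = centers.sum(axis=1).argmax()   # highest mean + std
print((labels == text_cluster).sum())         # the 5 "text" windows
```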
30. Morphological filter
The text cluster is mapped onto a mask image by setting the pixels in the text cluster to 1 and the pixels in the background cluster to 0.
A morphological dilation operation is employed to fill the gaps in the text region.
Dilation adds pixels to the boundaries of objects in an image; the number of pixels added depends on the size and shape of the structuring element used to process the image.
In the case of a complex background, some non-text regions may remain in the mask image and need to be filtered out.
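Dilation itself can be sketched directly; the 3×3 structuring element below is an assumption, since the slides do not specify the element used:

```python
import numpy as np

def dilate(mask, se=np.ones((3, 3), bool)):
    """Binary dilation: a pixel becomes 1 if the structuring element
    centred on it overlaps any 1 in the mask."""
    H, W = mask.shape
    h, w = se.shape
    padded = np.pad(mask, ((h // 2, h // 2), (w // 2, w // 2)))
    out = np.zeros_like(mask)
    for i in range(H):
        for j in range(W):
            out[i, j] = (padded[i:i + h, j:j + w] & se).any()
    return out

# Two text pixels with a one-pixel gap between them.
mask = np.zeros((3, 5), dtype=bool)
mask[1, 1] = mask[1, 3] = True
filled = dilate(mask)
print(filled[1, 2])  # the gap between the two strokes is now filled
```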
31. Steps involved in text extraction for simple background image
(Figure: (a) input image, (b) gray scale image, (c) output of 2D DWT, (d) background cluster, (e) text cluster, (f) extracted text image.)
32. Steps involved in text extraction for complex background image
(Figure: (a) input image, (b) gray scale image, (c) output of 2D DWT, (d) simple background cluster.)
35. A template file has been created in which the letters A-Z, a-z, and the digits 0-9 are stored as images.
A horizontal projection profile technique is used to isolate each line of text.
The lines are segmented into words by again scanning the image horizontally and detecting the spaces between words.
Each line is scanned vertically to detect and isolate each character within the line.
Then the correlation of each character in the target image with each template character is computed.
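The matching step can be sketched as normalized correlation against a dictionary of templates; the 3×3 "glyphs" below are toy stand-ins for the real A-Z/a-z/0-9 template images:

```python
import numpy as np

def best_match(glyph, templates):
    """Recognise a segmented character by picking the template with the
    highest normalised correlation coefficient. `templates` maps each
    character to an equal-sized image."""
    def corr(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return (a * b).sum() / denom if denom else 0.0
    return max(templates, key=lambda ch: corr(glyph, templates[ch]))

# Toy 3x3 "templates" for two characters.
T = {'I': np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float),
     'L': np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]], float)}
glyph = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], float)
print(best_match(glyph, T))  # -> I
```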
49. Parameters for performance evaluation of text extraction
Detection Rate (DR) = ratio of text regions correctly detected by the algorithm to the total number of text regions.
Precision Rate (PR) = ratio of correctly detected characters to the sum of correctly detected characters and false positives.
Recall Rate (RR) = ratio of correctly detected characters to the sum of correctly detected characters and false negatives.
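PR and RR are the standard precision and recall; a minimal helper (the tp/fp/fn counts in the example are hypothetical numbers, not results from the work):

```python
def precision_recall(tp, fp, fn):
    """Precision (PR) and recall (RR) from correctly detected
    characters (tp), false positives (fp), and false negatives (fn),
    as defined on the slide."""
    pr = tp / (tp + fp)
    rr = tp / (tp + fn)
    return pr, rr

# e.g. 90 correct detections, 10 false positives, 5 missed characters:
pr, rr = precision_recall(90, 10, 5)
print(round(pr, 3), round(rr, 4))  # 0.9 0.9474
```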
50. Performance analysis of proposed text extraction technique:

Dataset | DR | PR | RR
ICDAR and KAIST | 98.93% | 97.69% | 97.83%
Own dataset | 98.99% | 97.91% | 98.27%

Comparative analysis of different wavelets:

Wavelet | DR | PR | RR
Haar | 0.9979 | 0.9850 | 0.9835
db2 | 0.9942 | 0.9653 | 0.9497
bior1.3 | 0.9891 | 0.9618 | 0.8876
sym3 | 0.9894 | 0.9595 | 0.9131
coif1 | 0.9873 | 0.9364 | 0.9070
51. Comparison of proposed text extraction technique with others:

Authors | Method | DR | PR | RR
Angadi et al., 2010 | Discrete Cosine Transform and texture feature extraction | 96.6% | - | -
Kumar et al., 2010 | Connected component analysis | - | 90% | 89%
Azadboni et al., 2012 | FFT domain filtering, SVM classification, K-means clustering | 98.10% | - | -
Proposed method | DWT and K-means clustering | 98.96% | 97.8% | 98.05%
52. Output of text extraction and recognition system
(Table of sample results with columns Sr. No., input image, extracted text output, and recognized text; sample images not reproduced.)
54. Parameter for text recognition performance evaluation
Accuracy of the text recognition system is defined as the ratio of correctly recognized characters to the total number of characters.

Method | Total characters | Falsely detected characters | Accuracy (%)
Proposed method | 1000 | 49 | 95.10

Accuracy of proposed text recognition system
55. Conclusion
Text is successfully extracted from complex color images, i.e. document text images, scene text images, caption text images, multi-colored text images, newspaper images, book cover images, etc.
The proposed technique is insensitive to font size, style, color, alignment, etc.
It is independent of language.
The proposed method gives promising results in terms of DR (98.96%), PR (97.8%), RR (98.05%), and accuracy (95.1%) compared with other methods.
56. Future scope
Even though the performance of the proposed system is good, there is scope for improvement.
The proposed text extraction technique is inefficient for very low contrast and highly illuminated images.
The proposed system, combined with a text-to-speech converter, could help blind people.
The extracted and recognized text can be translated from one language to another, helping people translate text written on sign boards, street names, etc. into their native language.
57. References
Angadi, S.A., & Kodabagi, M., Text region extraction from low resolution
natural scene images using texture features, IEEE 2nd International Advance
Computing Conference, Patiala, India, Feb 19-20, 2010, pp. 121-128.
Anoual, H., Aboutajdine, D., Ensias, S.E., & Enset, A.J., Features extraction
for text detection and localization, 5th International Symposium on I/V
Communication and Mobile Network, Rabat, Sept. 30- Oct.2, 2010, pp. 1-4.
Anupama, N., Rupa, C., & Reddy, E.S., 2013, Character segmentation for
telugu image document using multiple histogram projections, Global Journal
of Computer Science and Technology, Vol. 13, pp. 11-16.
Audithan, S., & Chandrasekaran, RM., 2009, Document text extraction from
document images using haar discrete wavelet transform, European Journal of
Scientific Research, Vol. 36, pp. 502-512.
Azadboni, M.K., & Behrad, A., Text detection and character extraction in
color images using FFT domain filtering and SVM classification, 6th
International Symposium on Telecommunications, Tehran, Nov. 6-8, 2012, pp.
794-799.
58. Barina, D., 2011, Gabor wavelets in image processing.
Available: http://www.fit.vutbr.cz/research/pubs/index.php?file=%2Fpub%2F9598%2Fprispevepdf&Id=9598.
Chandrasekaran, R., Chandrasekaran, RM., & Natarajan, P., Text
localization and extraction in images using mathematical morphology and
SVM, IEEE International Conference On Advances in Engineering, Science
And Management, Nagapattinam, Tamil Nadu, March 30-31, 2012, pp. 55-
60.
Dinh, T.N., Park, J., & Lee, G.S., Low-complexity text extraction in korean
signboards for mobile applications, IEEE International Conference on
Computer and Information Technology, Sydney, NSW, July 8-11, 2008, pp.
333-337.
Eidheim, O.C., Introduction to mathematical morphology.
Available: https://www.idi.ntnu.no/emner/tdt4265/lectures/lecture3b.pdf
Fan, W., Sun, J., Katsuyama, Y., Hotta, Y., & Naoi, S., Text detection in
images based on grayscale decomposition and stroke extraction, Chinese
Conference on Pattern Recognition, Nanjing, Nov. 4-6, 2009, pp. 1-4.
59. Ghai, D., & Jain, N., 2013, Comparison of various text extraction
techniques for images- a review, International Journal of Graphics & Image
Processing, Vol. 3, pp. 210-218.
Ham, Y.K., Kang, M.S., Chung, H.K., & Park, R.H., 1995, Recognition of
Raised Characters for Automatic Classification of Rubber Tires, Optical
Engineering, Vol. 34, pp. 102–109.
Haritaoglu, I., Scene text extraction and translation for handheld devices,
Proceedings of IEEE Conference on Computer Vision and Pattern
Recognition, Hawaii, 2001, pp. 408–413.
Hassanzadeh, S., & Pourghassem, H., Fast logo detection based on
morphological features in document image, IEEE 7th International
Colloquium on Signal Processing and its Applications, Penang, Mar 4- 6 ,
2011, pp. 283-286.
Jain, A.K., & Zhong, Y., 1996, Page segmentation using texture analysis,
Pattern Recognition, Vol. 29, pp. 743–770.
Jung, K., Kim, K.I., & Jain, A.K., 2004, Text information extraction in
images and video: a survey, Pattern Recognition, Vol. 37, pp. 977 – 997.
60. Kanungo, T., Netanyahu, N.S., & Wu, A.Y., 2002, An efficient k-means
clustering algorithm: analysis and implementation, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 24, pp. 881-892.
Kim, D.S., & Chien, S.I., Automatic car license plate extraction using
modified generalized symmetry transform and image warping, Proceedings
of International Symposium on Industrial Electronics, Pusan, Jun 12-16,
2001, pp. 2022–2027.
Kim, H.K., 1996, Efficient automatic text location method and content
based indexing and structuring of video database, Journal of Visual
Communication and Image Representation, Vol. 7, pp. 336-344.
Kumar, M., Kim, Y.C., & Lee, G.S., Text detection using multilayer
separation in real scene images, 10th IEEE International Conference on
Computer and Information Technology, Bradford, June 29- July1, 2010, pp.
1413-1417.
Lai, A.N., & Lee, G.S., Binarization by local k-means clustering for Korean
text extraction, IEEE Symposium on Signal Processing and Information
Technology, Sarajevo, Dec 16-19, 2008, pp. 117-122.
61. Nathiya, N., & Pradeepa, K., Optical character recognition for scene text
detection, mining and recognition, IEEE International Conference on
Computational Intelligence and Computing Research, Enathi, Dec. 26-28,
2013, pp. 1-4.
Raj, H., & Ghosh, R., Devanagari text extraction from natural scene images,
International Conference on Advances in Computing, Communications and
Informatics (ICACCI), IEEE, New Delhi, India, Sept. 24-27, 2014, pp.
513-517.
Roy, S., Roy, P. P., Shivakumara, P., Louloudis, G., & Tan, C. L., HMM
based multi oriented text recognition in natural scene image, Second IAPR
Asian Conference on Pattern Recognition, Naha, Nov.5-8, 2013, pp. 288-
292.
Seeri, S.V., Giraddi, S., & Prashant B.M., A novel approach for Kannada
text extraction, Proceedings of the International Conference on Pattern
Recognition, Informatics and Medical Engineering, Tamil Nadu, India, Mar
21-23, 2012, pp. 444-448.
Song, Y., Liu, A., Pang, L., Lin, S., Zhang, Y. & Tang, S., A novel image
text extraction method based on k-means clustering, Seventh IEEE/ACIS
International Conference on Computer and Information Science, Portland,
May 14-16, 2008, pp. 185-190.
Su, B., Lu, S., Phan, T.Q., & Tan, C.L., Character extraction in web image
for text recognition, 21st International Conference on Pattern Recognition,
Japan, Nov. 11-15, 2012, pp. 3042-3045.
62. Suen, C.Y., Lam, L., Guillevic, D., Strathy, N.W., Cheriet, M., Said,
J.N., & Fan, R., 1996, Bank check processing system, International Journal of
Imaging Systems and Technology, Vol. 7, pp. 392-403.
Watanabe, Y., Okada, Y., Kim, Y.B., & Takeda, T., Translation camera,
Proceedings of International Conference on Pattern Recognition, Brisbane,
Australia, 1998, pp. 613–617.
Yan, J., & Gao, X., 2014, Detection and recognition of text superimposed in
images based on layered method, Neurocomputing, Vol. 134, pp. 3–14.
Yang, F., Qiu, Q., Bishop, M., & Wu, Q., Tag-assisted sentence
confabulation for intelligent text recognition, IEEE Symposium on
Computational Intelligence for Security and Defence Applications (CISDA),
Ottawa, July 11-13, 2012, pp. 1-7.
Yao, J.L., Wang, Y.Q., Weng, L.B., & Yang, Y.P., Locating text based on
connected component and SVM, International Conference on Wavelet
Analysis and Pattern Recognition, Beijing, Nov 2-4, 2007, pp. 1418 - 1423.
Ye, Q., Huang, Q., Gao, W., & Zhao, D., 2005, Fast and robust text
detection in images and video frames, Image and Vision Computing, Vol. 23,
pp. 565–576.
Zhou, J., Lopresti, D., & Lei, Z., OCR for World Wide Web images,
Proceedings of SPIE on Document Recognition IV, April 3, 1997, pp. 58–66.
63. Zhou, J., Lopresti, D., & Tasdizen, T., Finding text in color images,
Proceedings of SPIE on Document Recognition V, 1998, pp. 130–140.
Zaravi, D., Rostami, H., Malahzaheh, A., & Mortazavi, S.S., 2011, Text
extraction using wavelet thresholding and new projection profile, World
Academy of Science, Engineering and Technology, Vol. 5, pp. 528-531.
Zhang, Y., Wang, C., Xiao, B., & Shi, C., A new text extraction method
incorporating local information, International Conference on Frontiers in
Handwriting Recognition, Bari, Sept. 18-20, 2012, pp. 252-255.