Team Name: PATTERN CODER
Members: Amit Kumar
Contact Address: Room No. 272, Kapili Hostel
Email ID:
Institute: Indian Institute of Technology, Guwahati
An improved algorithm for
locating texts in camera
Table of Contents
2.Text detection algorithm
3.Flow diagram of the algorithm
Text data in images contains useful information. In this paper, we present an approach to
detect text in color images. The proposed approach combines edge detection and connected
component analysis at multiple resolutions. First, we use an image edge detection
algorithm to extract all possible text edge pixels. The edge map is dilated with a
specific structuring element, and the dilation is followed by erosion with another
specific structuring element. Applying some geometrical constraints then gives initial
bounding boxes containing text regions. Next, connected component analysis is performed
on the corresponding binarized image to recover whole text portions. Finally, a
multiresolution approach is used to make the method applicable to a large range of font
sizes.
The retrieval of text information from color images has gained increasing attention in
recent years. Text appearing in images can provide very useful semantic information and
may be a good key to describing the image content. Text detection is used in many
applications, such as road sign detection, map interpretation and engineering drawing
interpretation. Many papers on text detection in images have been published
[2,4,5,6,7]. Text detection methods can generally be classified into two categories:
Bottom-up methods: they segment images into regions and group character regions into
words. Because it is difficult to develop an efficient segmentation algorithm for text in
complex backgrounds, these methods are not robust for detecting text in many camera-based
images.
Top-down methods: they first detect text regions in images using filters and then apply
bottom-up techniques inside the text regions. These methods can process more complex
images than bottom-up approaches. Top-down methods are further divided into two
categories:
Heuristic methods: they use heuristic filters.
Machine learning methods: they use trained filters.
Shortcomings of many current methods include their inability to perform well when text
varies in orientation, size and language, and on low-resolution images, where characters
may touch.
2.Text detection algorithm:
2.1 Conversion of color image to grayscale image:
Colors in an image can be converted to shades of gray by computing the effective
brightness, or luminance, of each color and using this value to create a shade of gray
that matches that brightness.
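The luminance computation can be sketched as a weighted sum of the red, green and blue channels. The document does not state which weights were used; the sketch below assumes the ITU-R BT.601 weights (0.2989, 0.5870, 0.1140), which MATLAB's rgb2gray also uses. The image is assumed to be a list of rows of (r, g, b) tuples with values from 0 to 255, and the function name rgb_to_gray is illustrative only.

```python
def rgb_to_gray(image):
    """Convert rows of (r, g, b) tuples (0-255) to integer gray levels.

    Assumption: BT.601 luma weights; the document does not give the weights.
    """
    wr, wg, wb = 0.2989, 0.5870, 0.1140
    return [[round(wr * r + wg * g + wb * b) for (r, g, b) in row] for row in image]


# White, black, pure red and pure blue pixels.
img = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
print(rgb_to_gray(img))  # [[255, 0], [76, 29]]
```

Green receives the largest weight because the eye is most sensitive to it, so a pure-blue pixel maps to a much darker gray than a pure-red one.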
2.2 Edge detection:
Edge detection is an important pre-processing step of our method. Using edges as the
prominent feature lets us detect characters with different fonts and colors, since every
character must present strong edges, regardless of its font or color, in order to be
readable. We use the Canny edge detector for this purpose. The Canny edge detector takes
a grayscale image as input and returns a bi-level image in which non-zero pixels mark
detected edges. Canny uses Sobel masks to find the edge magnitude of the grayscale image,
and then applies non-maximum suppression and hysteresis thresholding. With these two
post-processing operations the Canny edge detector removes non-maximum pixels while
preserving the connectivity of the contours.
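Hysteresis thresholding is the stage that preserves contour connectivity, so it is worth seeing in isolation. The sketch below is not a full Canny detector: it assumes a gradient-magnitude map (after Sobel filtering and non-maximum suppression) is already available as a list of rows, and the thresholds low and high are hypothetical, since the document does not give values. Pixels at or above high are kept as edges, and pixels at or above low are kept only if they are 8-connected to a kept pixel.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Return a bi-level edge map (1 = edge) using hysteresis thresholding."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels are edges unconditionally and seed the search.
    for y in range(rows):
        for x in range(cols):
            if magnitude[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))
    # Weak pixels survive only if they connect (8-neighbourhood) to a strong one.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and not edges[ny][nx] and magnitude[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


mag = [
    [0, 40, 90, 40, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 40, 0, 0],
]
print(hysteresis(mag, low=30, high=80))
# [[0, 1, 1, 1, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
```

The isolated weak pixel in the bottom row is discarded, while the weak pixels touching the strong one are kept, which is how a contour stays connected without admitting isolated noise.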
2.3 Dilation:
Dilation is one of the two basic operators in mathematical morphology, the other being
erosion. It is typically applied to binary images. Its basic effect on a binary image is
to gradually enlarge the boundaries of regions of foreground pixels (typically white
pixels). Thus areas of foreground pixels grow in size while holes within those regions
become smaller. Here we use a 5x21 cross-shaped structuring element. Dilation by this
structuring element connects the character contours of every text line.
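Binary dilation with a cross-shaped element can be sketched directly. The document does not define the exact shape, so the sketch assumes that a "5x21 cross" means a vertical bar 5 pixels tall and a horizontal bar 21 pixels wide crossing at the centre. The binary image is a list of rows of 0/1 values; the names cross_element and dilate are illustrative.

```python
def cross_element(height, width):
    """Offsets (dy, dx) of a cross: full centre column plus full centre row.

    Assumption: "cross-shaped HxW" = vertical bar of height H and horizontal
    bar of width W crossing at the centre.
    """
    cy, cx = height // 2, width // 2
    offsets = {(dy, 0) for dy in range(-cy, height - cy)}
    offsets |= {(0, dx) for dx in range(-cx, width - cx)}
    return offsets


def dilate(binary, offsets):
    """Binary dilation: each foreground pixel stamps the element around itself."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if binary[y][x]:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        out[ny][nx] = 1
    return out


# Two edge fragments on the same row, 18 pixels apart, in a 5 x 25 image.
edges = [[0] * 25 for _ in range(5)]
edges[2][2] = edges[2][20] = 1
dilated = dilate(edges, cross_element(5, 21))
print(all(dilated[2]))  # True: the gap between the fragments is bridged
```

The wide horizontal arm is what joins neighbouring characters on the same text line, while the short vertical arm limits merging between lines.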
2.4 Erosion:
Erosion is one of the two basic operators in mathematical morphology, the other being
dilation. It is typically applied to binary images. Its basic effect on a binary image is
to erode away the boundaries of regions of foreground pixels (typically white pixels).
Thus areas of foreground pixels shrink in size, and holes within those areas become
larger. Here we use an 11x45 cross-shaped structuring element. This removes noise and
smooths the shape of the candidate text areas: after this erosion, every component with
height less than 11 or width less than 45 is removed.
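The size filtering effect of the 11x45 erosion can be checked with a small sketch. As in the dilation case, the cross is assumed to be a vertical bar 11 pixels tall crossing a horizontal bar 45 pixels wide; pixels outside the image are treated as background, which is a choice made here rather than something the document specifies. A pixel survives only if every pixel covered by the cross is foreground, so a component shorter than 11 or narrower than 45 disappears entirely.

```python
def cross_element(height, width):
    """Offsets (dy, dx) of a cross: full centre column plus full centre row."""
    cy, cx = height // 2, width // 2
    offsets = {(dy, 0) for dy in range(-cy, height - cy)}
    offsets |= {(0, dx) for dx in range(-cx, width - cx)}
    return offsets


def erode(binary, offsets):
    """Binary erosion; out-of-image pixels count as background (assumption)."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[y][x] = int(all(
                0 <= y + dy < rows and 0 <= x + dx < cols and binary[y + dy][x + dx]
                for dy, dx in offsets))
    return out


def block_image(rows, cols, top, height, left, width):
    """Hypothetical test helper: one solid rectangle of foreground pixels."""
    img = [[0] * cols for _ in range(rows)]
    for y in range(top, top + height):
        for x in range(left, left + width):
            img[y][x] = 1
    return img


element = cross_element(11, 45)
short = block_image(20, 80, 2, 10, 5, 60)   # 10 px tall: shorter than 11
tall = block_image(20, 80, 2, 12, 5, 60)    # 12 px tall, 60 px wide
print(sum(map(sum, erode(short, element))))  # 0: the component is removed
print(sum(map(sum, erode(tall, element))))   # 32: a 2 x 16 core survives
```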
2.5 Computation of initial bounding boxes of the candidate text areas:
After the erosion step we compute the bounding box of each 8-connected component of white
pixels, so that each box just encloses one component. These bounding boxes are then
placed on the corresponding color image.
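Computing one bounding box per 8-connected white component can be sketched with a breadth-first search. Boxes are returned as (x, y, width, height), matching the box notation used in the multiresolution step; the binary image is assumed to be a list of rows of 0/1 values, and the function name is illustrative.

```python
from collections import deque


def bounding_boxes(binary):
    """Return (x, y, width, height) for each 8-connected white component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for sy in range(rows):
        for sx in range(cols):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            # Flood-fill the component, tracking its extreme coordinates.
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            min_x = max_x = sx
            min_y = max_y = sy
            while queue:
                y, x = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


img = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(bounding_boxes(img))  # [(0, 0, 2, 2), (3, 2, 1, 2)]
```

The two diagonal pixels in the top-left form a single component because 8-connectivity counts diagonal neighbours.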
2.6 Applying geometrical constraints:
A box is discarded if any of the following geometrical constraints holds:
1) Its height is lower than a threshold (set to 12).
2) Its height is greater than a threshold (set to 48).
3) Its ratio of width to height is lower than a threshold (set to 1.5).
This step reduces the number of bounding boxes.
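The three constraints translate into a single predicate. This sketch assumes boxes are (x, y, width, height) tuples; the thresholds are the values stated above, and a box exactly at a threshold is kept because the constraints only discard heights "lower than" 12, "greater than" 48 and ratios "lower than" 1.5.

```python
def passes_constraints(box, min_h=12, max_h=48, min_ratio=1.5):
    """True if the box survives the three geometrical constraints."""
    _, _, w, h = box
    return min_h <= h <= max_h and w / h >= min_ratio


boxes = [
    (0, 0, 60, 20),     # ratio 3.0: kept
    (10, 5, 25, 20),    # ratio 1.25: too narrow
    (0, 0, 100, 8),     # height 8: too short
    (0, 0, 200, 60),    # height 60: too tall
]
print([b for b in boxes if passes_constraints(b)])  # [(0, 0, 60, 20)]
```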
2.7 Multiresolution analysis:
The whole algorithm described so far is applied in a multiresolution fashion to handle
variability in text size. The sizes of the structuring elements for the morphological
operations (dilation, erosion) and the geometrical constraints restrict the algorithm to
detecting text within a specific range of character sizes (12-48 pixels). To overcome this
limitation we apply the algorithm to the image at different resolutions and finally fuse
the results at the original resolution, so that we obtain a set of bounding boxes on the
color image for each resolution. We use resolutions from 0.1 to 1.5 in steps of 0.1. For
a resolution parameter m, fusing the results at the original resolution means that a
bounding box (x coordinate, y coordinate, width, height) found at that resolution is
resized to (x coordinate/m, y coordinate/m, width/m, height/m). We do this for every
resolution in the range, resizing the bounding boxes and then fusing them on the original
image.
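The scale list and the box mapping can be sketched as follows. The scale factors are rounded to avoid floating-point drift when stepping by 0.1; the function names are illustrative, and the per-scale detection itself (resizing the image and running the steps above) is left out, so the boxes fed to fuse are made-up inputs.

```python
def scale_factors(start=0.1, stop=1.5, step=0.1):
    """Scale factors start, start + step, ..., stop, rounded to avoid float drift."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 10) for i in range(count)]


def to_original(box, m):
    """Map a box (x, y, width, height) found at scale m back to scale 1."""
    x, y, w, h = box
    return (x / m, y / m, w / m, h / m)


def fuse(boxes_per_scale):
    """Collect the boxes from every scale in original-image coordinates."""
    fused = []
    for m, boxes in boxes_per_scale.items():
        fused.extend(to_original(b, m) for b in boxes)
    return fused


factors = scale_factors()
print(len(factors), factors[0], factors[-1])  # 15 0.1 1.5
print(fuse({0.5: [(10, 20, 30, 40)], 1.5: [(15, 30, 45, 60)]}))
# [(20.0, 40.0, 60.0, 80.0), (10.0, 20.0, 30.0, 40.0)]
```

Since the constraints accept heights of 12 to 48 pixels at every scale, a scale factor m covers characters roughly 12/m to 48/m pixels tall in the original image; at m = 0.1, for example, that is 120 to 480 pixels.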
2.8 Selection of final bounding boxes:
We discard a smaller bounding box if it lies inside a bigger one. This drastically reduces
the number of bounding boxes, and the remaining boxes constitute the final regions of
interest. The purpose of this step is to reduce running time: fewer bounding boxes means
fewer objects to process, without missing any significant text regions.
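Removing nested boxes can be sketched by visiting boxes from largest to smallest area and keeping a box only if no already-kept box contains it. Because containment is transitive, a box inside a discarded box is also inside the kept box that caused the discard, so one pass suffices. Boxes are assumed to be (x, y, width, height) tuples, and exact duplicates collapse to a single box.

```python
def contains(outer, inner):
    """True if box inner lies entirely inside box outer."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def remove_nested(boxes):
    """Keep only boxes that are not inside a larger (or identical) kept box."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[2] * b[3], reverse=True):
        if not any(contains(k, box) for k in kept):
            kept.append(box)
    return kept


boxes = [(0, 0, 10, 10), (2, 2, 3, 3), (20, 0, 5, 5), (0, 0, 10, 10)]
print(remove_nested(boxes))  # [(0, 0, 10, 10), (20, 0, 5, 5)]
```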
2.9 Binarization:
We now binarize the grayscale image to obtain the corresponding binary image. We use
Otsu’s method to perform thresholding, that is, the reduction of a gray-level image to a
binary image.
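Otsu's method [3] chooses the gray level that maximizes the between-class variance of the two resulting pixel classes. The sketch below assumes an 8-bit grayscale image stored as a list of rows; pixels at or below the threshold form one class and become black, and pixels above it become white. The constant factor 1/N² of the between-class variance is dropped because it does not change which threshold is best.

```python
def otsu_threshold(gray):
    """Return t maximizing w0 * w1 * (mu0 - mu1)^2 for classes {<= t} and {> t}."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]                # pixels with gray level <= t
        sum0 += t * hist[t]
        w1 = total - w0              # pixels with gray level > t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(gray, t):
    """White (1) above the threshold, black (0) otherwise."""
    return [[int(v > t) for v in row] for row in gray]


gray = [[10, 12, 200], [11, 205, 210]]
t = otsu_threshold(gray)
print(t, binarize(gray, t))  # 12 [[0, 0, 1], [0, 1, 1]]
```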
2.10 Connected component analysis:
Starting from the bounding boxes obtained in the previous step, we perform connected
component analysis to recover whole text regions. When bounding boxes are computed, part
of a character may fall outside its bounding box. To recover the whole character from the
part left inside the box, we look at the corresponding connected components in the
Otsu-binarized image: for every non-background pixel that falls inside a bounding box, we
extract the connected component containing that pixel from the Otsu-binarized image.
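This recovery step can be sketched as region growing: every foreground pixel inside the box is a seed, and each seed is grown into its full 8-connected component over the whole binarized image, ignoring the box boundary. The sketch assumes 8-connectivity (the document states it for the bounding-box step but not here), a binary image stored as a list of rows, and a box given as (x, y, width, height); the function name is illustrative.

```python
from collections import deque


def recover_components(binary, box):
    """Mask of all 8-connected components with at least one pixel inside box."""
    rows, cols = len(binary), len(binary[0])
    x0, y0, w, h = box
    out = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seeds: every foreground pixel that lies inside the bounding box.
    for y in range(max(0, y0), min(rows, y0 + h)):
        for x in range(max(0, x0), min(cols, x0 + w)):
            if binary[y][x]:
                out[y][x] = 1
                queue.append((y, x))
    # Grow each seed into its whole component, even beyond the box.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and binary[ny][nx] and not out[ny][nx]):
                    out[ny][nx] = 1
                    queue.append((ny, nx))
    return out


binary = [
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0, 1],
]
# A 1 x 1 box touching only the left stroke of the "C" shape.
print(recover_components(binary, (1, 1, 1, 1)))
# [[0, 1, 1, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 1, 1, 1, 0, 0]]
```

The whole "C" is recovered from a single seed pixel, while the separate vertical stroke on the right is not, since it never touches the box.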
2.11 Discarding some connected components on the basis of area:
By the area of a connected component we simply mean the number of pixels that constitute
it. The area of the image is taken as width*length, both measured in pixels. We discard a
component if its area is greater than an upper threshold or less than a lower threshold,
where both thresholds are taken as suitable percentages (fractions) of the whole image
area. This refines our areas of interest.
There is, however, a problem caused by binarization. During binarization, some text
portions, such as text against a white background or against a brighter background,
become black and therefore do not take part in further processing. To solve this problem
we invert the binarized image obtained after Otsu’s binarization step.
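The area filter of this step can be sketched as follows. The document only says the thresholds are "suitable" fractions of the image area, so min_frac and max_frac below are hypothetical parameters and the values in the example are made up for illustration. Components are found with 8-connectivity (an assumption), and only those whose pixel count lies between the two thresholds are kept.

```python
from collections import deque


def components(binary):
    """Pixel coordinates of every 8-connected foreground component."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    result = []
    for sy in range(rows):
        for sx in range(cols):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            result.append(pixels)
    return result


def filter_by_area(binary, min_frac, max_frac):
    """Keep components whose area is within [min_frac, max_frac] of the image area."""
    rows, cols = len(binary), len(binary[0])
    image_area = rows * cols
    lo, hi = min_frac * image_area, max_frac * image_area
    out = [[0] * cols for _ in range(rows)]
    for pixels in components(binary):
        if lo <= len(pixels) <= hi:
            for y, x in pixels:
                out[y][x] = 1
    return out


img = [[0] * 10 for _ in range(10)]   # image area: 100 pixels
img[0][0] = 1                         # area 1: too small
for x in range(6):
    img[2][x] = 1                     # area 6: kept
for y in range(5, 9):
    for x in range(10):
        img[y][x] = 1                 # area 40: too large
kept = filter_by_area(img, min_frac=0.02, max_frac=0.30)
print(sum(map(sum, kept)))  # 6
```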
2.12 Inverting the binarized image obtained in step 2.9 and performing steps 2.10 and 2.11
on it:
By inverting the image we simply mean making the white pixels black and the black pixels
white. We then perform the same operations as in steps 2.10 and 2.11 on the inverted
image.
2.13 Adding the images obtained in steps 2.11 and 2.12:
Now we add the images obtained after steps 2.11 and 2.12 to get the final result image. By
adding the images we simply mean a pixel-wise logical OR: a pixel is white in the
resulting image if the corresponding pixel is white in either image, and black if it is
white in neither. The final image is therefore black and white.
In the resulting binary image, text appears as white pixels against a black background.
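Both the inversion of step 2.12 and the "addition" of step 2.13 are simple pixel-wise operations. The sketch assumes binary images stored as lists of rows of 0/1 values with equal sizes; a and b stand for the refined images produced from the original and the inverted binarization, and the function names are illustrative.

```python
def invert(binary):
    """Swap white (1) and black (0) pixels."""
    return [[1 - v for v in row] for row in binary]


def add_images(a, b):
    """Pixel-wise logical OR: white wherever either image is white."""
    return [[int(p or q) for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


print(invert([[1, 0], [0, 1]]))                      # [[0, 1], [1, 0]]
print(add_images([[1, 0], [0, 0]], [[0, 0], [1, 0]]))  # [[1, 0], [1, 0]]
```

Using OR rather than arithmetic addition keeps the result binary even where both images mark the same pixel as text.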
3.Flow diagram of the algorithm:
[Figure: flow diagram of the algorithm. Original image → grayscale image → steps performed
for each resolution → bounding boxes for each resolution, resized to fuse the results at
the original resolution as explained in step 2.7 → grayscale component images a and b →
final image C = a + b.]
We implemented this algorithm in MATLAB 6.1 on the following system:
Operating system: Microsoft Windows XP Professional (5.1, Build 2600)
Processor: Intel(R) Pentium(R) D CPU, 2.80 GHz (2 CPUs)
Memory: 1014 MB RAM
We tested many color images containing different types of text. Our algorithm
successfully detects text locations in these images, for Indian-language scripts as well
as English script. Here we show two example images and their outputs. In the first
example image the text is in Bangla; in the second it is in English. In the output images
the detected text appears as white regions against a black background. Both example
images are natural scene images.
Figure 1. Example image 1
Figure 2. Output image 1
Figure 3. Example image 2
Figure 4. Output image 2
The results contain some false alarms, i.e. white regions that are not actually text.
These can be removed in the text recognition step, because such regions contain no text
and are therefore not recognized.
This algorithm works well on images with good contrast, especially where the text has
good contrast against the background.
This work has been done at the Computer Vision and Pattern Recognition Unit, Indian
Statistical Institute, Kolkata, under the direct supervision of Ujjwal Bhattacharya.
[1] R. Lienhart and F. Stuber, “Automatic text recognition in digital videos,” Technical
Report TR-1995-036, Department for Mathematics and Computer Science, University of
Mannheim.
[2] Y. Du, C.-I Chang, and P. D. Thouin, “Automated system for text detection in
individual video images,” Journal of Electronic Imaging, vol. 12, no. 3, pp. 410-422,
2003.
[3] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans.
Systems, Man, and Cybernetics, vol. 9, pp. 62-66, 1979.
[4] C. Li, X. Ding, and Y. Wu, “Automatic text location in natural scene images,” Proc.
Sixth International Conference on Document Analysis and Recognition, pp. 1069-1073,
Sept. 2001.
[5] K. I. Kim, K. Jung, and J. H. Kim, “Texture-based approach for text detection in
images using support vector machines and continuously adaptive mean shift algorithm,”
IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631-1639, Dec. 2003.
[6] X. Tang, X. Gao, J. Liu, and H. Zhang, “A spatial-temporal approach for video caption
detection and recognition,” IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 961-971.
[7] O. Hori and T. Mita, “A robust video text extraction method for character
recognition,” IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J84-D-II, no. 8,
pp. 1800-1808, Aug. 2001.
[8] Y. Liu, S. Goto, and T. Ikenaga, “A contour-based robust algorithm for text detection
in color images,” IEICE Trans. Inf. & Syst., vol. E89-D, no. 3, March 2006.
[9] M. Anthimopoulos, B. Gatos, and I. Pratikakis, “Multiresolution text detection in
video frames,” Second International Conference on Computer Vision Theory and
Applications (VISAPP), Barcelona, Spain, March 8-11, 2007.