Sample Paper Techscribe

Team Name       : PATTERN CODER
Members         : Amit Kumar
Contact Address : Room No. 272, Kapili Hostel, IIT Guwahati, North Guwahati, Assam-781039
Email id        : amit.k@iitg.ernet.in, amit.k203@gmail.com
Institute       : Indian Institute of Technology, Guwahati

An improved algorithm for locating texts in camera captured images

Table of Contents

1. Introduction
2. Text detection algorithm
3. Flow diagram of the algorithm
4. Experimental results
5. Conclusion
6. References

Abstract:

Text data in images contain useful information. In this paper, we present an approach to detect text in color images. The proposed approach is based on a combination of edge detection and connected component analysis at multiple resolutions. First, we use an image edge detection algorithm to extract all possible text edge pixels. Dilation by a specific structuring element is performed on the edge map, followed by erosion by a second structuring element. Applying some geometrical constraints, we get initial bounding boxes containing text regions. Then connected component analysis is performed on the corresponding binarized image to recover whole text portions. Finally, a multiresolution approach is used to make the method applicable to a large range of font sizes.

1. Introduction:

The retrieval of text information from color images has gained increasing attention in recent years. Text appearing in images can provide very useful semantic information and may be a good key to describe the image content. Text detection is used in many applications, such as road sign detection, map interpretation and the interpretation of engineering drawings. Many papers about text detection from images have been published [2,4,5,6,7].

Text detection methods can generally be classified into two categories:

Bottom-up methods: they segment images into regions and group character regions into words [1]. Because it is difficult to develop an efficient segmentation algorithm for text on a complex background, these methods are not robust for detecting text in many camera based images.

Top-down methods: they first detect text regions in images using filters and then perform bottom-up techniques inside the text regions [2]. These methods are able to process more complex images than bottom-up approaches.

Top-down methods are also divided into two categories:

Heuristic methods: they use heuristic filters.
Machine learning methods: they use trained filters.

Shortcomings of many current methods include their inability to perform well when text orientation, size and language vary, and on low resolution images, where characters may be touching.

2. Text detection algorithm:

2.1 Conversion of color image to grayscale image:

Colors in an image can be converted to shades of gray by calculating the effective brightness (luminance) of each color and using this value to create a shade of gray that matches the desired brightness.

2.2 Edge detection:

Edge detection is an important pre-processing step of our method. Using edges as the prominent feature gives us the opportunity to detect characters with different fonts and colors, since every character must present strong edges, whatever its font or color, in order to be readable. We used the Canny edge detector for this purpose. The Canny edge detector takes a grayscale image as input and returns a bi-level image in which non-zero pixels mark detected edges. Canny uses Sobel masks to find the edge magnitude of the grayscale image and then applies non-maximum suppression and hysteresis thresholding. With these two post-processing operations the Canny edge detector removes non-maximum pixels while preserving the connectivity of the contours.

2.3 Dilation:

Dilation is one of the two basic operators in the area of mathematical morphology, the other being erosion. It is typically applied to binary images. The basic effect of the operator on a binary image is to gradually enlarge the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels grow in size while holes within those regions become smaller. Here, we use a 5x21 cross-shaped structuring element. Dilation by this structuring element is performed to connect the character contours of every text line.

2.4 Erosion:

Erosion is the other basic operator of mathematical morphology. It is also typically applied to binary images. The basic effect of the operator on a binary image is to erode away the boundaries of regions of foreground pixels (i.e. white pixels, typically). Thus areas of foreground pixels shrink in size, and holes within those areas become larger. Here, we use an 11x45 cross-shaped structuring element. Erosion by this element removes noise and smooths the shape of the candidate text areas: every component with height less than 11 or width less than 45 is suppressed.
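
The two morphological steps above can be sketched in a few lines of plain Python. This is only an illustration, not the MATLAB code used for the experiments. It assumes that "5x21" and "11x45" mean rows x columns, that a cross-shaped element is the union of its central row and central column, and that pixels outside the image count as background. The function names are hypothetical, and the demo uses a toy 3x5 cross so that the result is easy to check by hand.

```python
def cross_offsets(height, width):
    """Offsets (dy, dx) of a cross-shaped structuring element centred at (0, 0)."""
    half_h, half_w = height // 2, width // 2
    vertical = {(dy, 0) for dy in range(-half_h, half_h + 1)}
    horizontal = {(0, dx) for dx in range(-half_w, half_w + 1)}
    return vertical | horizontal


def dilate(image, offsets):
    """Binary dilation: stamp the element onto every foreground pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if image[y][x]:
                for dy, dx in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        out[ny][nx] = 1
    return out


def erode(image, offsets):
    """Binary erosion: keep a pixel only if the whole element fits in the foreground.

    Pixels outside the image are treated as background (an assumption).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            out[y][x] = int(all(
                0 <= y + dy < rows and 0 <= x + dx < cols and image[y + dy][x + dx]
                for dy, dx in offsets
            ))
    return out


# Toy edge map: two isolated edge pixels on the same text line.
edges = [[0] * 9 for _ in range(7)]
edges[3][2] = edges[3][6] = 1

small_cross = cross_offsets(3, 5)   # toy size; the text uses 5x21 and 11x45
joined = dilate(edges, small_cross)
print(joined[3])                    # the two pixels are now linked along row 3
```

For the sizes given in the text, `cross_offsets(5, 21)` and `cross_offsets(11, 45)` would be passed to `dilate` and `erode` respectively.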

2.5 Computation of initial bounding boxes of the candidate text areas:

After the erosion step we compute the bounding boxes of the white pixel regions of the image. Each bounding box contains one 8-connected component of white pixels. We then place these bounding boxes on the corresponding color image.

2.6 Applying geometrical constraints:

We discard a box if any of the following geometrical constraints holds:

1) Height is lower than a threshold (set to 12)
2) Height is greater than a threshold (set to 48)
3) Ratio of width to height is lower than a threshold (set to 1.5)

This step reduces the number of bounding boxes.

2.7 Multiresolution analysis:

The size of the structuring elements for the morphological operations (dilation, erosion) and the geometrical constraints limit the algorithm to a specific range of character sizes (12-48 pixels). To overcome this limitation we adopt a multiresolution approach [9]: the whole algorithm described so far is applied to the image at different resolutions, and the results are finally fused at the initial resolution. In this way we get a set of bounding boxes on the color image for each resolution. We took resolutions from 0.1 to 1.5 in steps of 0.1. For a resolution parameter m, fusing results to the original resolution means that a bounding box (x coordinate, y coordinate, width, height) is resized to (x coordinate/m, y coordinate/m, width/m, height/m). We do this for all resolutions in the range, resize the bounding boxes and then fuse them on the original image.
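
The constraints of step 2.6 and the rescaling of step 2.7 can be sketched as follows. This is a minimal illustration under stated assumptions: boxes are (x, y, width, height) tuples, the constraints are checked at the resolution where the box was found, and all function names are hypothetical. The thresholds 12, 48 and 1.5 and the scale range 0.1 to 1.5 come from the text.

```python
def keep_box(box, min_height=12, max_height=48, min_aspect=1.5):
    """Return True if a (x, y, width, height) box passes the constraints of step 2.6."""
    _, _, width, height = box
    if height < min_height or height > max_height:
        return False
    return width / height >= min_aspect


def to_original_resolution(box, m):
    """Map a box found at resolution parameter m back to the original image."""
    x, y, width, height = box
    return (x / m, y / m, width / m, height / m)


def fuse(boxes_per_scale):
    """Filter the boxes found at each scale and collect them at original resolution."""
    fused = []
    for m, boxes in boxes_per_scale.items():
        fused.extend(to_original_resolution(b, m) for b in boxes if keep_box(b))
    return fused


# Resolution parameters 0.1, 0.2, ..., 1.5 (rounded to avoid float drift).
scales = [round(0.1 * k, 1) for k in range(1, 16)]

# Hypothetical detections at two scales.
detections = {
    0.5: [(10, 20, 30, 15), (0, 0, 8, 8)],   # second box is too short
    1.0: [(40, 40, 20, 20)],                 # width/height = 1.0, too square
}
print(fuse(detections))
```

Only the first box survives the constraints, and it is doubled in every coordinate because it was found at m = 0.5.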

2.8 Selection of final bounding boxes:

We discard a smaller bounding box if it lies inside a bigger one. This drastically reduces the number of bounding boxes, and the remaining boxes constitute the final regions of interest. The benefit of this step is running time: there are fewer boxes to process, and no significant text region is lost.

2.9 Binarization:

We now binarize the grayscale image. We used Otsu's method to perform the thresholding, i.e. the reduction of a gray level image to a binary image [3].
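
Both steps above are short enough to sketch in plain Python. The sketch assumes (x, y, width, height) boxes and 8-bit gray levels, and keeps the first copy when two boxes are identical; these choices and the function names are illustrative, not taken from the original implementation. `otsu_threshold` follows the usual formulation of Otsu's method [3]: it picks the gray level that maximizes the between-class variance of the two resulting pixel classes.

```python
def drop_nested(boxes):
    """Remove every box that lies entirely inside another box (step 2.8)."""
    def inside(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return bx <= ax and by <= ay and ax + aw <= bx + bw and ay + ah <= by + bh

    kept = []
    for i, a in enumerate(boxes):
        nested = any(
            i != j and inside(a, b) and (a != b or j < i)  # keep first of duplicates
            for j, b in enumerate(boxes)
        )
        if not nested:
            kept.append(a)
    return kept


def otsu_threshold(pixels):
    """Return the gray level t in 0..255 that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count_low, sum_low = 0, 0
    for t in range(256):
        count_low += hist[t]
        sum_low += t * hist[t]
        count_high = total - count_low
        if count_low == 0 or count_high == 0:
            continue  # one class is empty, variance undefined
        mean_low = sum_low / count_low
        mean_high = (sum_all - sum_low) / count_high
        between = count_low * count_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


boxes = [(0, 0, 100, 40), (10, 5, 20, 10), (200, 0, 50, 20)]
print(drop_nested(boxes))            # the middle box is inside the first one

gray = [10] * 50 + [200] * 50        # toy image: dark half, bright half
t = otsu_threshold(gray)
binary = [1 if p > t else 0 for p in gray]
print(t, sum(binary))
```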

2.10 Connected component analysis:

Starting from the bounding boxes obtained in step 2.8, we perform connected component analysis to recover the whole text regions. When the bounding boxes are computed, some part of a character may fall outside its bounding box. To recover the whole character from the part that lies inside the box, we look at the Otsu binarized image: for every non-background pixel that falls inside the bounding box, we generate the connected component containing that pixel in the binarized image.

2.11 Discarding some connected components on the basis of area:

Here the area of a connected component simply means the number of pixels that constitute it. The area of the image is taken as width*length, both in pixels. We discard a component if its area is greater than an upper threshold or less than a lower threshold. Both thresholds are taken as suitable fractions of the whole image area. This way we refine our areas of interest.

There remains a problem caused by binarization. Some text portions, such as those against a white or more intense background, become black during binarization and take no part in the further processing. To recover them, we invert the binarized image obtained after Otsu's binarization step.
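
Steps 2.10 and 2.11 can be sketched with a breadth-first flood fill. The sketch assumes 8-connectivity (the text states it only for step 2.5), (x, y, width, height) boxes, and a binary image stored as a list of rows. The area fractions `min_frac` and `max_frac` are placeholders, since no threshold values are given in the text, and the function names are hypothetical.

```python
from collections import deque


def component_from_seed(binary, seed, visited):
    """Collect the 8-connected foreground component containing seed (row, col)."""
    rows, cols = len(binary), len(binary[0])
    queue = deque([seed])
    visited.add(seed)
    component = []
    while queue:
        y, x = queue.popleft()
        component.append((y, x))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and binary[ny][nx] and (ny, nx) not in visited):
                    visited.add((ny, nx))
                    queue.append((ny, nx))
    return component


def recover_text(binary, box, min_frac, max_frac):
    """Grow components from foreground pixels inside box; keep those whose area fits.

    The component may extend beyond the box, which is how character parts
    cut off by the box are recovered.
    """
    x0, y0, width, height = box
    rows, cols = len(binary), len(binary[0])
    image_area = rows * cols
    visited, kept = set(), []
    for y in range(max(0, y0), min(rows, y0 + height)):
        for x in range(max(0, x0), min(cols, x0 + width)):
            if binary[y][x] and (y, x) not in visited:
                comp = component_from_seed(binary, (y, x), visited)
                if min_frac * image_area <= len(comp) <= max_frac * image_area:
                    kept.append(comp)
    return kept


# Toy binarized image: a stroke on row 1 that sticks out of the box,
# and an isolated pixel far away from it.
binary = [[0] * 8 for _ in range(6)]
for col in range(1, 6):
    binary[1][col] = 1
binary[4][7] = 1

box = (1, 0, 3, 3)                      # covers columns 1..3, rows 0..2
kept = recover_text(binary, box, min_frac=0.05, max_frac=0.5)
print([sorted(c) for c in kept])
```

The recovered component includes the two stroke pixels outside the box, while the isolated pixel is never visited because no seed inside the box reaches it.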

2.12 Inverting the binarized image obtained after step 2.9 and performing steps 2.10 and 2.11 on it:

By inverting the image we simply mean making the white pixels black and the black pixels white. Then we perform the operations of steps 2.10 and 2.11 on the inverted image.

2.13 Adding the images obtained in steps 2.11 and 2.12:

Now we add the images obtained after steps 2.11 and 2.12 to get the final result image. By adding the images we simply mean that a pixel is white in the resulting image if the corresponding pixel is white in either of the two images, and black if it is white in neither. The final image is therefore black and white, with text in white pixels against a black background.
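
In code, the inversion of step 2.12 and the "addition" of step 2.13 are a pixelwise NOT and a pixelwise OR. A minimal sketch, assuming binary images stored as lists of rows with 1 for white and 0 for black (the function names are hypothetical):

```python
def invert(binary):
    """Swap white (1) and black (0) pixels."""
    return [[1 - p for p in row] for row in binary]


def add_images(a, b):
    """Pixelwise OR: white wherever either input image is white."""
    return [[1 if pa or pb else 0 for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


text_from_binary = [[0, 1], [0, 0]]     # components kept after step 2.11
text_from_inverted = [[1, 0], [0, 0]]   # components kept after step 2.12
final = add_images(text_from_binary, text_from_inverted)
print(final)
```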

3. Flow diagram of the algorithm:

Original image -> Gray scale image -> Canny edge detection -> Dilation -> Erosion -> Bounding boxes selection -> Geometrical constraint

These steps are performed for each resolution value (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5). After this we get bounding boxes for each resolution. We then resize them in order to fuse the results to the original resolution, as explained in step 2.7.

Binarize gray scale image -> Connected component analysis (a)
Invert binarized image -> Connected component analysis (b)
Add two images: C = a + b, where C is the final image

4. Experimental results:

We implemented this algorithm in MATLAB 6.1 under Microsoft Windows XP Professional (5.1, Build 2600).
Processor: Intel(R) Pentium(R) D CPU 2.80 GHz (2 CPUs)
Memory: 1014MB RAM

We tested many color images containing different types of text. Our algorithm successfully detects text locations in these images, in Indian language scripts as well as in English. Here we show two example images and their outputs. In the first example image the text is in Bangla; in the second it is in English. Both are natural scene images. In the output images the detected text appears in white against a black background.

Figure 1. Example image 1

Figure 2. Output image 1

Figure 3. Example image 2

Figure 4. Output image 2

5. Conclusion:

In the results obtained we can see false alarms, i.e. white regions which are not actually text. These can be removed in the text recognition step: since these regions contain no text, nothing is recognized in them. The algorithm works well on images with good contrast, especially where the text has good contrast against the background.

Acknowledgement

This work has been done at the Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata under direct supervision of Ujjwal Bhattacharya.

6. References:

[1] R. Lienhart and F. Stuber, "Automatic text recognition in digital videos," Technical Report TR-1995-036, Department for Mathematics and Computer Science, University of Mannheim.
[2] Y. Du, C.-I Chang, and P. D. Thouin, "Automated system for text detection in individual video images," Journal of Electronic Imaging, vol. 12, no. 3, pp. 410-422, 2003.
[3] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Systems, Man, and Cybernetics, vol. 9, pp. 62-66, 1979.
[4] C. Li, X. Ding, and Y. Wu, "Automatic text location in natural scene images," Proc. Sixth International Conference on Document Analysis and Recognition, pp. 1069-1073, Sept. 2001.
[5] K. In Kim, K. Jung, and J. Hyung, "Texture-based approach for text detection in images using support vector machines and continuously adaptive mean shift algorithm," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1631-1639, Dec. 2003.
[6] X. Tang, X. Gao, J. Liu, and H. Zhang, "A spatial-temporal approach for video caption detection and recognition," IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 961-971, July 2002.
[7] O. Hori and T. Mita, "A robust video text extraction method for character recognition," IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J84-D-II, no. 8, pp. 1800-1808, Aug. 2001.
[8] Y. Liu, S. Goto, and T. Ikenaga, "A contour-based robust algorithm for text detection in color images," IEICE Trans. Inf. & Syst., vol. E89-D, no. 3, March 2006.
[9] M. Anthimopoulos, M. Gatos, and I. Pratikakis, "Multiresolution text detection in video frames," Second International Conference on Computer Vision Theory and Applications (VISAPP), Barcelona, Spain, March 8-11, 2007.
