Hindi Scene Text Recognition
Guide: Dr. Gaurav Harit
Surya Yadav, Vikas Yadav, Vikas Goyal
Objective
Create a system that detects and recognizes characters from natural scene images containing Devanagari text.
Motivation
 Hindi is the most spoken language in India and the third most spoken language in the world.
 Most websites in Devanagari use images to represent text. There is a need to index such images based on the text in them so that they can be searched easily.
 Tourists often face language problems in India, so there is demand for an automated system that understands natural scene images and provides translated information.
 Scene text such as shop names, company names, traffic information, road signs and other natural scene board displays is important to recognize and process.
Steps:
Natural scene image → Text block detection → Word and character segmentation → Feature detection and classification → Error correction → Output
Text Block Detection
Steps:
Image → Grayscale image → Canny edge map → Morphological closing → Connected component region extraction → Verification of uniform thickness → Use of script-specific rules → Use of similarity measures to find text regions missed in previous steps
(Figures: input image and its grayscale version)
Canny Edge Map
We compute the Canny edge map of the gray image to obtain the connected components.
Distance Transform of a binary image
Each pixel in the image is set to a value equal to its distance from the nearest background pixel.
Computation of Stroke Thickness
 For each pixel with a non-zero value in the distance-transformed image, if the pixel is a local maximum in the 3×3 window centered at it, we store its value in a list.
 We compute the mean and variance of the values in the list.
 If the mean is greater than twice the standard deviation, we decide that the thickness of the underlying stroke is nearly uniform, select the sub-image as a candidate text region, and draw its bounding box. A sketch of this test follows.
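A minimal sketch of the uniform-thickness test (assuming OpenCV and NumPy; the function name and the binary component-mask input are our own):

```python
import cv2
import numpy as np

def has_uniform_stroke(component_mask):
    """Uniform stroke-thickness test on a binary (0/255) component mask."""
    # Each foreground pixel gets its distance to the nearest background pixel.
    dist = cv2.distanceTransform(component_mask, cv2.DIST_L2, 3)

    # Local maxima of the distance transform in a 3x3 window lie on the
    # stroke skeleton and approximate half the local stroke thickness.
    dilated = cv2.dilate(dist, np.ones((3, 3), np.uint8))
    maxima = dist[(dist > 0) & (dist >= dilated)]
    if maxima.size == 0:
        return False

    # Nearly uniform thickness: mean greater than twice the std deviation.
    return maxima.mean() > 2.0 * maxima.std()
```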
Conditions based on geometry
Each region selected in the previous step is first tested against the following rules (see the sketch after this list):
1. The aspect ratio of the text region should lie between 0.1 and 10.
2. Neither the height nor the width of a candidate text region can be larger than half the corresponding dimension of the input image.
3. The height of a candidate text region should be greater than 10 pixels.
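These rules can be expressed as a small predicate (a sketch; the (x, y, w, h) box format and the function name are assumptions):

```python
def passes_geometry_rules(box, image_shape):
    """Apply rules 1-3 above to a candidate text region given as (x, y, w, h)."""
    x, y, w, h = box
    img_h, img_w = image_shape[:2]
    aspect = w / float(h)
    return (0.1 <= aspect <= 10.0       # rule 1: aspect ratio in [0.1, 10]
            and w <= img_w / 2.0        # rule 2: at most half the image size
            and h <= img_h / 2.0
            and h > 10)                 # rule 3: height above 10 pixels
```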
Overlapping problem
 Many bounding boxes overlapped with each other.
 The overlap between two bounding boxes of adjacent text regions should not be greater than 30% of either box.
 To resolve this, we merge each pair of bounding boxes whose intersection area is greater than a threshold value, as sketched below.
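One way to implement the merge (a sketch; reading the 30% rule as a threshold on the intersection relative to the smaller box):

```python
def intersection_area(a, b):
    """Intersection area of two (x, y, w, h) boxes."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def merge_boxes(boxes, overlap_ratio=0.3):
    """Merge pairs of boxes until no intersection exceeds the threshold."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                smaller = min(a[2] * a[3], b[2] * b[3])
                if intersection_area(a, b) > overlap_ratio * smaller:
                    x1 = min(a[0], b[0]); y1 = min(a[1], b[1])
                    x2 = max(a[0] + a[2], b[0] + b[2])
                    y2 = max(a[1] + a[3], b[1] + b[3])
                    boxes[i] = (x1, y1, x2 - x1, y2 - y1)  # union box
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```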
(Figure: result after applying the geometry conditions and resolving the overlaps)
Sobel Filtering
 We now use the Sobel edge detection algorithm to detect possible horizontal and possible vertical lines; the two directional filters are shown below.
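For reference, the two directional filters (assuming OpenCV; `gray` is the grayscale candidate region):

```python
import cv2

# Derivative along y responds to horizontal edges such as the headline;
# derivative along x responds to vertical strokes.
sobel_horizontal = cv2.Sobel(gray, cv2.CV_64F, dx=0, dy=1, ksize=3)
sobel_vertical = cv2.Sobel(gray, cv2.CV_64F, dx=1, dy=0, ksize=3)
```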
Detection of headlines
 For each region above, we compute the probabilistic Hough transform of the horizontally Sobel-filtered image to obtain the characteristic horizontal headline of Devanagari text.
 A necessary condition for selecting a candidate headline is that it lies in the upper half of the bounding box.
Detection of vertical lines
 The final decision on the existence of a headline among the possible horizontal lines is based on the computation of vertical Hough lines.
 We compute the vertical lines by again applying the Hough transform, with a lower threshold value since they are not as prominent as the horizontal lines.
 If the majority of vertical lines lie below a horizontal line, that horizontal line is treated as the headline. A combined sketch of both steps follows.
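A sketch of the headline decision (assuming `horiz_edges` and `vert_edges` are binarized 8-bit outputs of the two Sobel filters; all Hough thresholds are illustrative):

```python
import cv2
import numpy as np

def find_headline(horiz_edges, vert_edges, box_height):
    """Pick the headline among near-horizontal Hough lines: it must lie in
    the upper half of the box and have a majority of vertical lines below it."""
    h_lines = cv2.HoughLinesP(horiz_edges, 1, np.pi / 180, threshold=50,
                              minLineLength=30, maxLineGap=5)
    # Lower threshold for vertical lines, which are less prominent.
    v_lines = cv2.HoughLinesP(vert_edges, 1, np.pi / 180, threshold=20,
                              minLineLength=10, maxLineGap=5)
    if h_lines is None or v_lines is None:
        return None
    v_tops = [min(y1, y2) for x1, y1, x2, y2 in v_lines[:, 0]]
    for x1, y1, x2, y2 in h_lines[:, 0]:
        y = (y1 + y2) / 2.0
        if y < box_height / 2.0:                       # upper half only
            below = sum(1 for t in v_tops if t >= y)   # strokes hanging below
            if below > len(v_tops) / 2.0:              # majority condition
                return (x1, y1, x2, y2)
    return None
```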
(Figures: detected horizontal and vertical lines; output image)
Character Segmentation
(Next proposed step)
 Applying the Sobel filter in only one direction, the vertical direction, removes the headline from the candidate region.
 After removing the headline in each bounding box, we segment the word based on vertical histogram analysis, as sketched below.
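A sketch of the vertical-histogram segmentation (assuming a binary region with the headline already removed; the gap threshold is illustrative):

```python
import numpy as np

def segment_columns(binary_no_headline, gap_thresh=0):
    """Split the region at columns whose foreground pixel count falls to
    (near) zero; returns a list of (start, end) column ranges."""
    col_hist = (binary_no_headline > 0).sum(axis=0)    # vertical histogram
    segments, start = [], None
    for x, count in enumerate(col_hist):
        if count > gap_thresh and start is None:
            start = x                                   # a segment begins
        elif count <= gap_thresh and start is not None:
            segments.append((start, x))                 # the segment ends
            start = None
    if start is not None:
        segments.append((start, len(col_hist)))
    return segments
```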
Next Step: Phase II
 After headline removal we perform character segmentation on the selected image.
 After character segmentation we obtain the individual characters of the Devanagari script.
 For each character we then perform character recognition.
Segmentation
Guide: Dr. Gaurav Harit
Vikas Yadav, Vikas Goyal, Surya Yadav
Previous Work
 So far we have been able to obtain bounding boxes around words.
Segmentation
Image → RGB normalization where needed → Otsu's thresholding on non-normalized pixels / k-means clustering on normalized pixels → Combine the clusters from both methods → Text and background separation → Convert text to black and background to white → Obtain thinned image → Obtain skew angle by detecting a near-horizontal line in the upper half of the image → Obtain skew-corrected image → Headline detection → Character segmentation from the upper and middle-lower zones → Baseline detection → Character segmentation from the middle and lower zones
Text and Background Detection
 Converting the image into a binary image with a popular global or local thresholding method cannot segment the text from the background properly.
 Therefore, we apply a combination of Otsu's thresholding and unsupervised k-means clustering to cluster the different colour regions in an image.
 Scene image text is often affected by varying lightness. To handle this, we normalize the RGB values of the image before applying k-means clustering, but we do not normalize pixels whose RGB values are near gray.
 For each pixel we check

(max(R, G, B) − min(R, G, B)) / max(R, G, B) > 0.2

The threshold value 0.2 is selected to filter out pixels whose RGB values are near gray.
 For the set of pixels not satisfying the above criterion, we convert the RGB values to gray and perform Otsu's thresholding.
 For the set of pixels satisfying the criterion, RGB normalization is carried out to remove the lightness effect while keeping the colour information intact.
 We perform k-means clustering on the normalized set to obtain text and background separately.
 We combine the clusters from Otsu's thresholding and k-means clustering to obtain the final text and background clusters. A sketch of the whole procedure follows.
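A sketch of the separation procedure (assuming OpenCV with a BGR input image; merging the two results into a single label map is our simplification):

```python
import cv2
import numpy as np

def separate_text_background(img_bgr):
    """Otsu's thresholding on near-gray pixels plus 2-means clustering on
    RGB-normalized chromatic pixels, as described above."""
    img = img_bgr.astype(np.float32)
    mx, mn = img.max(axis=2), img.min(axis=2)
    # Chromatic pixels: (max - min) / max > 0.2; the rest are near gray.
    chromatic = (mx - mn) / np.maximum(mx, 1e-6) > 0.2

    labels = np.zeros(img.shape[:2], np.uint8)

    # Near-gray pixels: convert to gray and apply Otsu's thresholding.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, otsu = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    labels[~chromatic] = otsu[~chromatic]

    # Chromatic pixels: normalize RGB to remove lightness, then k-means (k=2).
    if chromatic.any():
        norm = img / np.maximum(img.sum(axis=2, keepdims=True), 1e-6)
        samples = norm[chromatic].astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        _, km, _ = cv2.kmeans(samples, 2, None, criteria, 3,
                              cv2.KMEANS_RANDOM_CENTERS)
        labels[chromatic] = km.ravel()

    return labels  # two clusters per branch; which one is text is decided later
```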
Skew Correction
 Apply a thinning algorithm to the text region to obtain the skeleton image.
 Use the Hough transform to obtain all line segments in the upper half of the image with slopes less than 65°.
 If the longest such segment is longer than an empirically selected threshold value, it is taken as the headline.
 If this headline is not parallel to the x-axis, the skew is corrected by rotating the word image. A sketch follows the figure.
(i) Skeleton image obtained for detecting the headline for skew correction
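A sketch of this skew-correction step (assuming the opencv-contrib function `cv2.ximgproc.thinning` for skeletonization; Hough thresholds are illustrative):

```python
import cv2
import numpy as np

def correct_skew(binary_word):
    """Find the longest near-horizontal segment in the upper half of the
    skeleton image and rotate the word so that it becomes horizontal."""
    skeleton = cv2.ximgproc.thinning(binary_word)
    upper = skeleton[: skeleton.shape[0] // 2]
    lines = cv2.HoughLinesP(upper, 1, np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return binary_word
    # Keep segments with slope below 65 degrees, then pick the longest.
    candidates = [l[0] for l in lines if abs(np.degrees(
        np.arctan2(l[0][3] - l[0][1], l[0][2] - l[0][0]))) < 65]
    if not candidates:
        return binary_word
    x1, y1, x2, y2 = max(candidates,
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
    h, w = binary_word.shape
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(binary_word, M, (w, h))
```

(In practice the longest segment would also be checked against the empirical length threshold before being accepted as the headline.)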
Headline Detection
 In order to segment the characters we need to detect the thick headline.
 Compute the projection profile as the row-wise sum of gray values for each row in the upper half of the word image.
 Scan the normalized projection profiles of successive rows upward, starting from the spine, and stop when the value drops below a pre-defined threshold. This row of the word image is taken as the upper boundary of the headline.
 Similarly, scan the projection profile values downward from the spine; the row at which the value drops below the same threshold is taken as the lower boundary of the headline. A sketch of the scan follows.
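A sketch of the boundary scan (assuming a binarized word image, taking the spine as the densest row in the upper half, and using an illustrative threshold):

```python
import numpy as np

def headline_boundaries(binary_word, thresh=0.5):
    """Return the (top, bottom) rows of the thick headline by scanning the
    normalized row projection profile away from the spine row."""
    profile = (binary_word > 0).sum(axis=1).astype(float)
    profile /= max(profile.max(), 1.0)                # normalize to [0, 1]
    spine = int(np.argmax(profile[: binary_word.shape[0] // 2]))

    top = spine
    while top > 0 and profile[top - 1] >= thresh:     # scan upward
        top -= 1
    bottom = spine
    while bottom < len(profile) - 1 and profile[bottom + 1] >= thresh:
        bottom += 1                                   # scan downward
    return top, bottom
```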
Character Segmentation
 Use the region growing method to extract the individual characters or their parts from the binarized, skew-corrected word image B.
 Locate the lowest and leftmost black pixel in B and take it as the seed point for the region growing module.
 The current segment is extracted using the standard region growing approach based on the 8-neighborhood. Region growing stops when it either
(i) reaches the upper or lower boundary of the thick headline,
or (ii) reaches a white pixel.
 The extraction of the current segment continues until no pixel satisfying the above is left to visit. A sketch follows.
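A sketch of the region growing step (assuming text pixels are the nonzero foreground of the binarized image; `headline_top` and `headline_bottom` come from the headline detection above):

```python
from collections import deque

def grow_segment(img, seed, headline_top, headline_bottom):
    """8-neighborhood region growing from a seed, stopping at background
    (white) pixels and at the thick headline band."""
    h, w = img.shape
    visited = {seed}
    queue = deque([seed])
    segment = []
    while queue:
        y, x = queue.popleft()
        segment.append((y, x))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and (ny, nx) not in visited
                        and img[ny, nx] > 0                      # text pixel
                        and not headline_top <= ny <= headline_bottom):
                    visited.add((ny, nx))
                    queue.append((ny, nx))
    return segment
```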
Appending local headline
 Append part of the headline to the extracted segment as follows.
 The top-left and top-right pixels of the segment lie on the lower boundary of the headline; the portion of the thick headline just above these two pixels is appended to the segment before its extraction.
 Repeat until no black pixel is left.
Baseline Detection
 To the baseline detection module we feed all the segments of the middle-lower zone which hang either from the headline or from at most 0.2 times the height of the middle-lower region below it.
 Find the height h_i of each segment and normalize it to h_i', where 0 < h_i' < 10. Now find

h_min = min{ h_i' | h_i' > 6.0 }

Next we find

h* = max_i { h_i' | h_i' > h_min and h_i' < floor(h_min) + 1 }
 The horizontal line through the bottommost pixel of the segment with normalized height h* is the baseline. A sketch of this selection follows.
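A sketch of the h* selection (the exact normalization is unspecified; scaling heights by the maximum so that values fall in (0, 10] is our assumption, and falling back to h_min when the band is empty is our reading of the rule):

```python
import math

def baseline_height(heights):
    """Pick the normalized height h* that defines the baseline segment."""
    normalized = [10.0 * h / max(heights) for h in heights]  # h' in (0, 10]
    tall = [h for h in normalized if h > 6.0]
    if not tall:
        return None
    h_min = min(tall)                                  # h_min = min{h' > 6.0}
    band = [h for h in normalized
            if h > h_min and h < math.floor(h_min) + 1]
    return max(band) if band else h_min                # h* as defined above
```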
(i) Input image
(ii) Image obtained after applying k-means clustering, Otsu's thresholding and skew correction
(iii) Segments obtained after character segmentation
References
 P. Banik, U. Bhattacharya, S. K. Parui. Segmentation of Bangla Words in Scene Images.
 U. Bhattacharya, S. K. Parui, S. Mondal. Devanagari and Bangla Text Extraction from Natural Scene Images. Proc. Int. Conf. on Document Analysis and Recognition, pages 171-175, 2009.
