Hindi Scene Text Recognition
Guide: Dr. Gaurav Harit
Surya Yadav, Vikas Yadav, Vikas Goyal
Objective
Create a system that detects and recognizes characters from natural scene images containing Devanagari text.
Motivation
 Hindi is the most spoken language in India and the third most spoken language in the world.
 Most websites in Devanagari use images to represent text. There is a need to index such images based on the text in them so that they can be searched easily.
 Tourists often face language problems in India, so there is demand for an automated system that understands natural scene images and provides translated information.
 Scene text such as shop names, company names, traffic information, road signs and other natural scene board displays is important to recognize and process.
Steps:
Natural scene image → Text block detection → Word and character segmentation → Feature detection and classification → Error correction → Output
Text Block Detection
Steps:
Image → Grayscale image → Canny edge map → Morphological closing → Connected component region extraction → Verification of uniform thickness → Use of script-specific rules → Use of similarity measures to find text regions missed in previous steps
(Figures: input image and its grayscale version)
Canny Edge Map
We compute the Canny edge map of the gray image to obtain the connected components.
Distance Transform of a binary image
Each pixel in the image is set to a value equal to its distance from the nearest background pixel.
Computation of Stroke Thickness
 For each pixel with a non-zero value in the distance-transformed image, if the pixel is a local maximum in the 3×3 window centered at it, we store its value in a list.
 We compute the mean and variance of the values in the list.
 If the mean is greater than twice the standard deviation, we decide that the thickness of the underlying stroke is nearly uniform, select the sub-image as a candidate text region, and draw its bounding box. A sketch of this test follows.
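A minimal sketch of the uniform-thickness test (assuming OpenCV and NumPy; the function name and the binary component-mask input are our own):

```python
import cv2
import numpy as np

def has_uniform_stroke(component_mask):
    """Uniform stroke-thickness test on a binary (0/255) component mask."""
    # Each foreground pixel gets its distance to the nearest background pixel.
    dist = cv2.distanceTransform(component_mask, cv2.DIST_L2, 3)

    # Local maxima of the distance transform in a 3x3 window lie on the
    # stroke skeleton and approximate half the local stroke thickness.
    dilated = cv2.dilate(dist, np.ones((3, 3), np.uint8))
    maxima = dist[(dist > 0) & (dist >= dilated)]
    if maxima.size == 0:
        return False

    # Nearly uniform thickness: mean greater than twice the std deviation.
    return maxima.mean() > 2.0 * maxima.std()
```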
Conditions based on geometry
Each region selected in the previous step is first tested against the following rules (see the sketch after this list):
1. The aspect ratio of the text region should lie between 0.1 and 10.
2. Neither the height nor the width of a candidate text region can be larger than half the corresponding dimension of the input image.
3. The height of a candidate text region should be greater than 10 pixels.
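These rules can be expressed as a small predicate (a sketch; the (x, y, w, h) box format and the function name are assumptions):

```python
def passes_geometry_rules(box, image_shape):
    """Apply rules 1-3 above to a candidate text region given as (x, y, w, h)."""
    x, y, w, h = box
    img_h, img_w = image_shape[:2]
    aspect = w / float(h)
    return (0.1 <= aspect <= 10.0       # rule 1: aspect ratio in [0.1, 10]
            and w <= img_w / 2.0        # rule 2: at most half the image size
            and h <= img_h / 2.0
            and h > 10)                 # rule 3: height above 10 pixels
```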
Overlapping problem
 Many bounding boxes overlapped with each other.
 The overlap between two bounding boxes of adjacent text regions should not be greater than 30% of either box.
 To resolve this, we merge each pair of bounding boxes whose intersection area is greater than a threshold value, as sketched below.
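One way to implement the merge (a sketch; reading the 30% rule as a threshold on the intersection relative to the smaller box):

```python
def intersection_area(a, b):
    """Intersection area of two (x, y, w, h) boxes."""
    w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def merge_boxes(boxes, overlap_ratio=0.3):
    """Merge pairs of boxes until no intersection exceeds the threshold."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                smaller = min(a[2] * a[3], b[2] * b[3])
                if intersection_area(a, b) > overlap_ratio * smaller:
                    x1 = min(a[0], b[0]); y1 = min(a[1], b[1])
                    x2 = max(a[0] + a[2], b[0] + b[2])
                    y2 = max(a[1] + a[3], b[1] + b[3])
                    boxes[i] = (x1, y1, x2 - x1, y2 - y1)  # union box
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```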
(Figure: result after applying the geometry conditions and resolving the overlaps)
Sobel Filtering
 We now use the Sobel edge detection algorithm to detect possible horizontal and possible vertical lines; the two directional filters are shown below.
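For reference, the two directional filters (assuming OpenCV; `gray` is the grayscale candidate region):

```python
import cv2

# Derivative along y responds to horizontal edges such as the headline;
# derivative along x responds to vertical strokes.
sobel_horizontal = cv2.Sobel(gray, cv2.CV_64F, dx=0, dy=1, ksize=3)
sobel_vertical = cv2.Sobel(gray, cv2.CV_64F, dx=1, dy=0, ksize=3)
```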
Detection of headlines
 For each region above, we compute the probabilistic Hough transform of the horizontally Sobel-filtered image to obtain the characteristic horizontal headline of Devanagari text.
 A necessary condition for selecting a candidate headline is that it lies in the upper half of the bounding box.
Detection of vertical lines
 The final decision on the existence of a headline among the possible horizontal lines is based on the computation of vertical Hough lines.
 We compute the vertical lines by again applying the Hough transform, with a lower threshold value since they are not as prominent as the horizontal lines.
 If the majority of vertical lines lie below a horizontal line, that horizontal line is treated as the headline. A combined sketch of both steps follows.
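A sketch of the headline decision (assuming `horiz_edges` and `vert_edges` are binarized 8-bit outputs of the two Sobel filters; all Hough thresholds are illustrative):

```python
import cv2
import numpy as np

def find_headline(horiz_edges, vert_edges, box_height):
    """Pick the headline among near-horizontal Hough lines: it must lie in
    the upper half of the box and have a majority of vertical lines below it."""
    h_lines = cv2.HoughLinesP(horiz_edges, 1, np.pi / 180, threshold=50,
                              minLineLength=30, maxLineGap=5)
    # Lower threshold for vertical lines, which are less prominent.
    v_lines = cv2.HoughLinesP(vert_edges, 1, np.pi / 180, threshold=20,
                              minLineLength=10, maxLineGap=5)
    if h_lines is None or v_lines is None:
        return None
    v_tops = [min(y1, y2) for x1, y1, x2, y2 in v_lines[:, 0]]
    for x1, y1, x2, y2 in h_lines[:, 0]:
        y = (y1 + y2) / 2.0
        if y < box_height / 2.0:                       # upper half only
            below = sum(1 for t in v_tops if t >= y)   # strokes hanging below
            if below > len(v_tops) / 2.0:              # majority condition
                return (x1, y1, x2, y2)
    return None
```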
(Figures: detected horizontal and vertical lines; output image)
Character Segmentation
(Next proposed step)
 Applying the Sobel filter in only one direction, the vertical direction, removes the headline from the candidate region.
 After removing the headline in each bounding box, we segment the word based on vertical histogram analysis, as sketched below.
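A sketch of the vertical-histogram segmentation (assuming a binary region with the headline already removed; the gap threshold is illustrative):

```python
import numpy as np

def segment_columns(binary_no_headline, gap_thresh=0):
    """Split the region at columns whose foreground pixel count falls to
    (near) zero; returns a list of (start, end) column ranges."""
    col_hist = (binary_no_headline > 0).sum(axis=0)    # vertical histogram
    segments, start = [], None
    for x, count in enumerate(col_hist):
        if count > gap_thresh and start is None:
            start = x                                   # a segment begins
        elif count <= gap_thresh and start is not None:
            segments.append((start, x))                 # the segment ends
            start = None
    if start is not None:
        segments.append((start, len(col_hist)))
    return segments
```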
Next Step: Phase II
 After headline removal we perform character segmentation on the selected image.
 After character segmentation we obtain the individual characters of the Devanagari script.
 For each character we then perform character recognition.
Segmentation
Guide: Dr. Gaurav Harit
Vikas Yadav, Vikas Goyal, Surya Yadav
Previous Work
 So far we have been able to obtain bounding boxes around words.
Segmentation
Image → RGB normalization where needed → Otsu's thresholding on non-normalized pixels / k-means clustering on normalized pixels → Combine the clusters from both methods → Text and background separation → Convert text to black and background to white → Obtain thinned image → Obtain skew angle by detecting a near-horizontal line in the upper half of the image → Obtain skew-corrected image → Headline detection → Character segmentation from the upper and middle-lower zones → Baseline detection → Character segmentation from the middle and lower zones
Text and Background Detection
 Converting the image into a binary image with a popular global or local thresholding method cannot segment the text from the background properly.
 Therefore, we apply a combination of Otsu's thresholding and unsupervised k-means clustering to cluster the different colour regions in an image.
 Scene image text is often affected by varying lightness. To handle this, we normalize the RGB values of the image before applying k-means clustering, but we do not normalize pixels whose RGB values are near gray.
 For each pixel we check

(max(R, G, B) − min(R, G, B)) / max(R, G, B) > 0.2

The threshold value 0.2 is selected to filter out pixels whose RGB values are near gray.
 For the set of pixels not satisfying the above criterion, we convert the RGB values to gray and perform Otsu's thresholding.
 For the set of pixels satisfying the criterion, RGB normalization is carried out to remove the lightness effect while keeping the colour information intact.
 We perform k-means clustering on the normalized set to obtain text and background separately.
 We combine the clusters from Otsu's thresholding and k-means clustering to obtain the final text and background clusters. A sketch of the whole procedure follows.
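A sketch of the separation procedure (assuming OpenCV with a BGR input image; merging the two results into a single label map is our simplification):

```python
import cv2
import numpy as np

def separate_text_background(img_bgr):
    """Otsu's thresholding on near-gray pixels plus 2-means clustering on
    RGB-normalized chromatic pixels, as described above."""
    img = img_bgr.astype(np.float32)
    mx, mn = img.max(axis=2), img.min(axis=2)
    # Chromatic pixels: (max - min) / max > 0.2; the rest are near gray.
    chromatic = (mx - mn) / np.maximum(mx, 1e-6) > 0.2

    labels = np.zeros(img.shape[:2], np.uint8)

    # Near-gray pixels: convert to gray and apply Otsu's thresholding.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, otsu = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    labels[~chromatic] = otsu[~chromatic]

    # Chromatic pixels: normalize RGB to remove lightness, then k-means (k=2).
    if chromatic.any():
        norm = img / np.maximum(img.sum(axis=2, keepdims=True), 1e-6)
        samples = norm[chromatic].astype(np.float32)
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        _, km, _ = cv2.kmeans(samples, 2, None, criteria, 3,
                              cv2.KMEANS_RANDOM_CENTERS)
        labels[chromatic] = km.ravel()

    return labels  # two clusters per branch; which one is text is decided later
```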
Skew Correction
 Apply a thinning algorithm to the text region to obtain the skeleton image.
 Use the Hough transform to obtain all line segments in the upper half of the image with slopes less than 65°.
 If the longest such segment is longer than an empirically selected threshold value, it is taken as the headline.
 If this headline is not parallel to the x-axis, the skew is corrected by rotating the word image. A sketch follows the figure.
(i) Skeleton image obtained for detecting the headline for skew correction
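A sketch of this skew-correction step (assuming the opencv-contrib function `cv2.ximgproc.thinning` for skeletonization; Hough thresholds are illustrative):

```python
import cv2
import numpy as np

def correct_skew(binary_word):
    """Find the longest near-horizontal segment in the upper half of the
    skeleton image and rotate the word so that it becomes horizontal."""
    skeleton = cv2.ximgproc.thinning(binary_word)
    upper = skeleton[: skeleton.shape[0] // 2]
    lines = cv2.HoughLinesP(upper, 1, np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=5)
    if lines is None:
        return binary_word
    # Keep segments with slope below 65 degrees, then pick the longest.
    candidates = [l[0] for l in lines if abs(np.degrees(
        np.arctan2(l[0][3] - l[0][1], l[0][2] - l[0][0]))) < 65]
    if not candidates:
        return binary_word
    x1, y1, x2, y2 = max(candidates,
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
    h, w = binary_word.shape
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(binary_word, M, (w, h))
```

(In practice the longest segment would also be checked against the empirical length threshold before being accepted as the headline.)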
Headline Detection
 In order to segment the characters we need to detect the thick headline.
 Compute the projection profile as the row-wise sum of gray values for each row in the upper half of the word image.
 Scan the normalized projection profiles of successive rows upward, starting from the spine, and stop when the value drops below a pre-defined threshold. This row of the word image is taken as the upper boundary of the headline.
 Similarly, scan the projection profile values downward from the spine; the row at which the value drops below the same threshold is taken as the lower boundary of the headline. A sketch of the scan follows.
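A sketch of the boundary scan (assuming a binarized word image, taking the spine as the densest row in the upper half, and using an illustrative threshold):

```python
import numpy as np

def headline_boundaries(binary_word, thresh=0.5):
    """Return the (top, bottom) rows of the thick headline by scanning the
    normalized row projection profile away from the spine row."""
    profile = (binary_word > 0).sum(axis=1).astype(float)
    profile /= max(profile.max(), 1.0)                # normalize to [0, 1]
    spine = int(np.argmax(profile[: binary_word.shape[0] // 2]))

    top = spine
    while top > 0 and profile[top - 1] >= thresh:     # scan upward
        top -= 1
    bottom = spine
    while bottom < len(profile) - 1 and profile[bottom + 1] >= thresh:
        bottom += 1                                   # scan downward
    return top, bottom
```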
Character Segmentation
 Use the region growing method to extract the individual characters or their parts from the binarized, skew-corrected word image B.
 Locate the lowest and leftmost black pixel in B and take it as the seed point for the region growing module.
 The current segment is extracted using the standard region growing approach based on the 8-neighborhood. Region growing stops when it either
(i) reaches the upper or lower boundary of the thick headline,
or (ii) reaches a white pixel.
 The extraction of the current segment continues until no pixel satisfying the above is left to visit. A sketch follows.
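A sketch of the region growing step (assuming text pixels are the nonzero foreground of the binarized image; `headline_top` and `headline_bottom` come from the headline detection above):

```python
from collections import deque

def grow_segment(img, seed, headline_top, headline_bottom):
    """8-neighborhood region growing from a seed, stopping at background
    (white) pixels and at the thick headline band."""
    h, w = img.shape
    visited = {seed}
    queue = deque([seed])
    segment = []
    while queue:
        y, x = queue.popleft()
        segment.append((y, x))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and (ny, nx) not in visited
                        and img[ny, nx] > 0                      # text pixel
                        and not headline_top <= ny <= headline_bottom):
                    visited.add((ny, nx))
                    queue.append((ny, nx))
    return segment
```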
Appending local headline
 Append part of the headline to the extracted segment as follows.
 The top-left and top-right pixels of the segment lie on the lower boundary of the headline; the portion of the thick headline just above these two pixels is appended to the segment before its extraction.
 Repeat until no black pixel is left.
Baseline Detection
 To the baseline detection module we feed all the segments of the middle-lower zone which hang either from the headline or from at most 0.2 times the height of the middle-lower region below it.
 Find the height h_i of each segment and normalize it to h_i', where 0 < h_i' < 10. Now find

h_min = min{ h_i' | h_i' > 6.0 }

Next we find

h* = max_i { h_i' | h_i' > h_min and h_i' < floor(h_min) + 1 }
 The horizontal line through the bottommost pixel of the segment with normalized height h* is the baseline. A sketch of this selection follows.
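A sketch of the h* selection (the exact normalization is unspecified; scaling heights by the maximum so that values fall in (0, 10] is our assumption, and falling back to h_min when the band is empty is our reading of the rule):

```python
import math

def baseline_height(heights):
    """Pick the normalized height h* that defines the baseline segment."""
    normalized = [10.0 * h / max(heights) for h in heights]  # h' in (0, 10]
    tall = [h for h in normalized if h > 6.0]
    if not tall:
        return None
    h_min = min(tall)                                  # h_min = min{h' > 6.0}
    band = [h for h in normalized
            if h > h_min and h < math.floor(h_min) + 1]
    return max(band) if band else h_min                # h* as defined above
```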
(i) Input image
(ii) Image obtained after applying k-means clustering, Otsu's thresholding and skew correction
(iii) Segments obtained after character segmentation
References
 P. Banik, U. Bhattacharya, S. K. Parui. Segmentation of Bangla Words in Scene Images.
 U. Bhattacharya, S. K. Parui, S. Mondal. Devanagari and Bangla Text Extraction from Natural Scene Images. Proc. Int. Conf. on Document Analysis and Recognition, pages 171-175, 2009.
