PPP-DB-FINAL PPT.pdf

Recognition of Printed Bilingual (Odia and English)
Scripts and Numbers
Conference on
VLSI Design, Signal Processing, Image Processing, Communications & Embedded Systems
VSPICE, 2020
Prangya Paramita Pradhan
Department of Instrumentation &
Electronics Engineering
College of Engineering and Technology
Bhubaneswar
Debashree Brahma
Department of Instrumentation &
Electronics Engineering
College of Engineering and Technology
Bhubaneswar
Department of Instrumentation & Electronics Engineering, CET
Bhubaneswar, VSPICE, 2020

Outline:
• Introduction
• Bilingual script identification system
• Different features of English and Odia scripts
• Proposed method
• Experimental setup
• Conclusion
• Future work

Introduction
• What is bilingual script identification?
• Necessity of bilingual script identification
Figure 1. A typical bilingual document in Roman and Odia scripts

Bilingual script identification system
Skew correction: The detected skew angle is corrected by rotating the
entire document in the opposite direction.
Line segmentation: The skew-corrected document is segmented into lines.
Word segmentation: After the lines are separated, the individual words
in each line are separated.
Classification: Image classification analyses the properties of various
image features and organizes the data into categories.

Figure 2. A typical bilingual script identification system for both English and Odia scripts
Stages shown: input text (scanned image); preprocessing (binarization, skew
detection and correction); line and word segmentation; classification (which
type of word is it: English, Odia or bilingual?); output document image as
characters.

Different features of English and Odia scripts
Odia scripts
• 12 vowels and 38 consonants, with matras and yuktaakshara
• Matras are used in the upper zone and the lower zone
• Scripts are cursive in nature
• The basic characters are at the same level
English scripts
• 26 upper-case and 26 lower-case letters
• Scripts are straight and slanted in nature
• Basic characters are not at the same level

Figure 3. Scripts occupy three different zones
Figure 4. Vertical strokes in English and Odia scripts
Figure 5. English characters are at different levels

Experimental setup
Preprocessing:
• Text binarization
• Skew correction
Step 1: Estimate the skew angle
Step 2: Correct the skew angle
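The slides do not say how the skew angle is estimated. The two steps can be sketched as follows, assuming a projection-profile search over whole-degree candidate angles; this is an illustrative choice, not necessarily the authors' method. The names `estimate_skew` and `deskew` and the parameter `max_angle` are hypothetical. A positive angle here means the text slopes downward to the right in image coordinates.

```python
import math

def on_pixels(image):
    """Return (x, y) coordinates of the ON (text) pixels of a binary image."""
    return [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]

def profile_score(points, angle_deg):
    """Sharpness of the row histogram after rotating the points by -angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    hist = {}
    for x, y in points:
        r = round(y * cos_a - x * sin_a)  # row index after the rotation
        hist[r] = hist.get(r, 0) + 1
    # Text lines that become horizontal pile their pixels into few rows,
    # which makes the sum of squared row counts large.
    return sum(c * c for c in hist.values())

def estimate_skew(image, max_angle=10):
    """Step 1: pick the candidate angle (in degrees) with the sharpest profile."""
    points = on_pixels(image)
    return max(range(-max_angle, max_angle + 1),
               key=lambda a: profile_score(points, a))

def deskew(image, angle_deg):
    """Step 2: rotate the ON pixels by -angle_deg about the top-left corner.

    Forward mapping is used for brevity, so small holes can appear and
    pixels that land outside the frame are dropped.
    """
    h, w = len(image), len(image[0])
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    out = [[0] * w for _ in range(h)]
    for x, y in on_pixels(image):
        nx = round(x * cos_a + y * sin_a)
        ny = round(y * cos_a - x * sin_a)
        if 0 <= nx < w and 0 <= ny < h:
            out[ny][nx] = 1
    return out
```

A finer angle step or an interpolating rotation would be needed for skews such as the -4.7 and 5.3 degrees shown in the slides; the whole-degree grid only keeps the sketch short.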

Figure 7. (a) and (b): document images with skews of -4.7 degrees and 5.3 degrees,
respectively; (c) and (d): the corresponding skew-corrected images

Line segmentation
• Find the first row that contains an ON pixel
• Find the next row that contains only OFF pixels; call it R1
• Find the next row that contains an ON pixel; call it R2
• The spacing between R1 and R2 separates the first line from the
next one
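The row scan above can be sketched as follows, assuming the binarized page is a list of rows holding 0 (OFF) and 1 (ON). The function name `segment_lines` is hypothetical.

```python
def segment_lines(image):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line."""
    lines = []
    start = None  # first row of the text line currently being scanned
    for r, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = r                  # first ON row of a new line
        elif not has_ink and start is not None:
            lines.append((start, r))   # an all-OFF row (R1) closes the line
            start = None
    if start is not None:              # the page ends inside a text line
        lines.append((start, len(image)))
    return lines
```

Each returned pair can then be used to crop one text line for word segmentation.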

Line segmentation:
Figure 8. Experimental output for line segmentation

Word segmentation
• Scan the line image column by column, each column from top to bottom
• Find the distance between adjacent characters
• The largest distances between adjacent characters mark the word
boundaries
Character segmentation
• For each word, scan from left to right and identify the
consecutive OFF and ON pixels
• The transitions between OFF and ON columns segment the word
into characters
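Both scans can be sketched with one column-run helper, assuming a text line is a list of rows holding 0 (OFF) and 1 (ON). The function names and the `min_word_gap` threshold are hypothetical; the slides only say that the largest gaps separate words, so the threshold value has to be chosen per document.

```python
def column_runs(image):
    """Return (start, end) column ranges, end exclusive, that contain ON pixels."""
    width = len(image[0])
    runs, start = [], None
    for c in range(width):
        has_ink = any(row[c] for row in image)
        if has_ink and start is None:
            start = c                  # OFF -> ON transition
        elif not has_ink and start is not None:
            runs.append((start, c))    # ON -> OFF transition
            start = None
    if start is not None:
        runs.append((start, width))
    return runs

def segment_words(line, min_word_gap):
    """Merge runs whose gap is narrower than min_word_gap into one word."""
    words = []
    for start, end in column_runs(line):
        if words and start - words[-1][1] < min_word_gap:
            words[-1] = (words[-1][0], end)
        else:
            words.append((start, end))
    return words

def segment_characters(line, word):
    """Split one word, given as a (start, end) column range, into characters."""
    start, end = word
    crop = [row[start:end] for row in line]
    return [(start + s, start + e) for s, e in column_runs(crop)]
```

Characters that touch each other produce a single run, so this simple scan would treat them as one character.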

Figure 9. A line segmented into words
Figure 10. Character segmentation

Figure 6. Flow chart for word identification
Steps shown: segment the words; identify the matras.
Decision nodes shown: Are matras present? Is the number of vertical strokes
≤ the number of characters? Is the number of vertical strokes > the number
of characters? Are the templates “ra” or “re” matched? Are ‘s’, ‘c’, ‘g’, ‘x’
matched? Is none of the vertical strokes at the beginning of a character,
and are the characters at the same level?
Outcomes shown: English, Odia, bilingual.

Proposed identification method for Odia & English scripts
Step 1: Identify the matras. If a matra is present, the word is Odia or
bilingual. Otherwise the word may be Odia, English or bilingual.
Step 2: Identify the vertical strokes in the word. [The vertical stroke
feature is obtained by identifying the columns with the maximum number
of ON pixels.] If the number of vertical strokes is greater than the number
of characters in the word, it is an English or a mixed English-Odia
word.
Step 3: Identify a mixed English-Odia word by noting a matra at the
word-end or a match for ‘ra’ or ‘re’.
Step 4: If the number of full vertical strokes is less than the number of
characters in the word, and a vertical stroke is present at the beginning of a
character or the basic characters in the word are not at the same level, the
word is decided to be English or bilingual. In such a situation, if there is a
matra or a match for ‘ra’ or ‘re’ at the word-end, the word is decided to be
bilingual.

Step 6: For the words with fewer full vertical strokes in Step 2,
search for the English characters that have no vertical stroke. Template
matching using the correlation method is applied to identify these
letters. If one such character is present, the corresponding word is
decided to be English.
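Correlation-based template matching can be sketched as follows, assuming the glyph and every template have already been scaled to the same size; the slides do not describe that normalization. The function names and the `threshold` value are hypothetical, and the correlation used here is the Pearson coefficient between the two pixel arrays.

```python
import math

def correlation(a, b):
    """Pearson correlation between two equally sized binary images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a blank or completely filled image matches nothing
    return cov / math.sqrt(var_x * var_y)

def matching_templates(glyph, templates, threshold=0.9):
    """Return the names of the templates whose correlation reaches the threshold."""
    return [name for name, template in templates.items()
            if correlation(glyph, template) >= threshold]
```

With templates for letters such as ‘s’, ‘c’, ‘g’ and ‘x’ (the ones named in the flow chart), a non-empty result for any character of a word would mark the word as English under Step 6.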

Table 1. Number of characters in a word and the average number of vertical strokes per word in English scripts
Number of characters in a word      1    2    3     4    5    6    7     8     9    10
Average number of vertical strokes  1    2    6.5   5    7.3  7.4  10.5  10.4  11   14

Table 2. Number of characters in a word and the average number of vertical strokes per word in Odia scripts
Number of characters in a word      1    2    3     4    5    6    7    8    9
Average number of vertical strokes  0.5  1    1.75  2.5  2.9  2.1  4    2    4

Comparing both outputs
Figure 11. Comparison of the vertical strokes for both scripts (panels: Odia script, English script)
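The stroke comparison can be sketched as follows. The slides describe a vertical stroke as a column with the maximum number of ON pixels; the `min_fill` fraction used here to decide that a column is full enough, and both function names, are assumptions. The averages in the two dictionaries are copied from Tables 1 and 2.

```python
def count_vertical_strokes(word, min_fill=0.8):
    """Count groups of adjacent columns that are at least min_fill ON."""
    height, width = len(word), len(word[0])
    strokes, in_stroke = 0, False
    for c in range(width):
        filled = sum(1 for row in word if row[c])
        is_stroke = filled >= min_fill * height
        if is_stroke and not in_stroke:
            strokes += 1   # a new stroke starts at this column
        in_stroke = is_stroke
    return strokes

def more_strokes_than_chars(num_strokes, num_chars):
    """Step 2 test: True points to an English or mixed English-Odia word."""
    return num_strokes > num_chars

# Average vertical strokes per word length, from Tables 1 and 2.
english_avg = {1: 1, 2: 2, 3: 6.5, 4: 5, 5: 7.3, 6: 7.4,
               7: 10.5, 8: 10.4, 9: 11, 10: 14}
odia_avg = {1: 0.5, 2: 1, 3: 1.75, 4: 2.5, 5: 2.9, 6: 2.1, 7: 4, 8: 2, 9: 4}

english_hits = [n for n, s in english_avg.items() if more_strokes_than_chars(s, n)]
odia_hits = [n for n, s in odia_avg.items() if more_strokes_than_chars(s, n)]
```

On the tabulated averages the test fires for every English word length from 3 to 10 and for no Odia word length; the one- and two-character English words tie with the character count, which is where the later template-matching steps would be needed.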

Conclusion and future scope:
The performance of the proposed method may
be studied in more detail, including the case of
varying font sizes.
The method may be extended to numerals
in bilingual documents.
The performance of the proposed method in
ambiguous cases, such as ‘I’ in Roman script and the
Odia punctuation mark ‘-’, is to be improved.


