1. Recognition of Printed Bilingual (Odia and English)
Scripts and Numbers
Conference on
VLSI Design, Signal Processing, Image Processing, Communications & Embedded Systems
VSPICE,2020
Prangya Paramita Pradhan
Department of Instrumentation &
Electronics Engineering
College of Engineering and Technology
Bhubaneswar
Debashree Brahma
Department of Instrumentation &
Electronics Engineering
College of Engineering and Technology
Bhubaneswar
Department of Instrumentation & Electronics Engineering, CET
Bhubaneswar,VSPICE,2020
2. Outline:
Introduction
Bilingual script identification system
Different features of English and Odia scripts
Proposed method
Experimental setup
Conclusion
Future work
References
3. Introduction
What is bilingual script identification?
Necessity of bilingual script identification
Figure 1. A typical bilingual document in Roman and Odia scripts
4. Bilingual script identification system
Skew correction: The detected skew angle is corrected by rotating the entire document in the opposite direction.
Line segmentation: The skew-corrected document is segmented into lines.
Word segmentation: After the lines are separated, the individual words in each line are separated.
Classification: Image classification analyses the properties of various image features and organizes the data into categories.
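The skew-correction step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `rotate_binary` and `correct_skew`, the nearest-neighbour sampling, and the sign convention (positive angles are clockwise on screen, with the y axis pointing down) are all assumptions made for this example.

```python
import math

def rotate_binary(image, angle_deg):
    """Rotate a binary image (list of rows of 0/1) about its centre.

    Assumed convention: positive angles are clockwise on screen
    (x to the right, y downwards). Uses inverse mapping with
    nearest-neighbour sampling; pixels mapped from outside stay OFF.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # Inverse rotation: where did this output pixel come from?
            sx = round(cx + dx * c + dy * s)
            sy = round(cy - dx * s + dy * c)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

def correct_skew(image, skew_deg):
    """Undo a detected skew by rotating in the opposite direction."""
    return rotate_binary(image, -skew_deg)
```

Nearest-neighbour sampling keeps the output strictly binary, which suits the ON/OFF pixel scans used in the later segmentation steps; a production system would more likely rely on an image library's rotation routine.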
5. Figure 2. A typical bilingual script identification system for both English and Odia scripts
Block diagram: Input text (scanned image) → Preprocessing (binarization, skew detection and correction) → Line and word segmentation → Classification (which type of word is it: English, Odia or bilingual?) → Output document image as character
6. Different features of English and Odia scripts
Odia scripts
12 vowels and 38 consonants, with matras and yuktaakshara
Matras are used in the upper zone and the lower zone
Scripts are cursive in nature
The basic characters are at the same level
English scripts
26 upper-case and 26 lower-case letters
Scripts are straight and slanted in nature
Basic characters are not at the same level
7. Figure 3. Scripts occupy three different zones
Figure 4. Vertical strokes in English and Odia scripts
Figure 5. English script characters are at different levels
8. Experimental setup
Preprocessing:
Text binarization
Skew correction
Step 1: Estimate the skew angle
Step 2: Correct the skew angle
9. Figure 7. (a) and (b) Document images with skews of -4.7 degrees and 5.3 degrees, respectively; (c) and (d) the corresponding skew-corrected images
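10. Line segmentation
Scan the rows from the top and find the first row that contains an ON pixel; this row starts the first line.
Continue to the next row that contains only OFF pixels; call it R1.
Continue to the next row that contains an ON pixel; call it R2.
The spacing between R1 and R2 separates the first line from the next one.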
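The row scan described above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code; the function name `segment_lines` and the list-of-rows image representation (1 = ON, 0 = OFF) are assumptions.

```python
def segment_lines(image):
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    A line starts at the first row containing an ON pixel and ends at
    the next row that is entirely OFF (R1 in the slide's notation).
    """
    lines = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first ON row of a new line
        elif not has_ink and start is not None:
            lines.append((start, y))       # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(image)))
    return lines

page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
line_ranges = segment_lines(page)
```

Each returned range can be used to slice the page into one image per text line before word segmentation.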
12. Word segmentation
Scan each line image column by column, from top to bottom.
Find the distance between adjacent characters.
The maximum distance between two adjacent characters marks a word boundary, so the words can be separated.
Character segmentation
For each word, scan the columns from left to right and identify consecutive OFF and ON pixels.
An OFF-to-ON transition marks the start of a character and an ON-to-OFF transition marks its end, which segments the word into characters.
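Word and character segmentation both rest on a column scan, so one sketch can cover them. The names `column_runs` and `group_into_words` and the `min_word_gap` parameter are hypothetical; the slide speaks of the maximum distance between characters, whereas this example uses an explicit gap threshold for simplicity.

```python
def column_runs(line):
    """Return (left, right) column ranges that contain ON pixels,
    right exclusive. Each run is treated as one character."""
    width = len(line[0])
    runs = []
    start = None
    for x in range(width):
        has_ink = any(row[x] for row in line)
        if has_ink and start is None:
            start = x                      # OFF-to-ON transition
        elif not has_ink and start is not None:
            runs.append((start, x))        # ON-to-OFF transition
            start = None
    if start is not None:
        runs.append((start, width))
    return runs

def group_into_words(char_runs, min_word_gap):
    """Group character runs into words: a gap of at least
    min_word_gap blank columns starts a new word."""
    words = []
    for run in char_runs:
        if words and run[0] - words[-1][-1][1] < min_word_gap:
            words[-1].append(run)
        else:
            words.append([run])
    return words

line = [[1, 0, 1, 0, 0, 0, 1, 1, 0, 1]]
chars = column_runs(line)
words = group_into_words(chars, min_word_gap=3)
```

Connected or overlapping glyphs would need a more careful split than a blank-column test, so this should be read as the simplest case only.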
13. Figure 9. A line segmented into words
Figure 10. Character segmentation
14. Figure 6. Flow chart for word identification
Start: Segment the words (English, Odia or bilingual?), then identify the matras
Decision: Are matras present? (Odia or bilingual; otherwise English or bilingual)
Decision: Is the number of vertical strokes ≤ the number of characters?
Decision: Is the number of vertical strokes > the number of characters?
Decision: Are the templates ‘ra’ or ‘re’ matched?
Decision: Are ‘s’, ‘c’, ‘g’, ‘x’ matched?
Decision: Is none of the vertical strokes at the beginning of the character, and are the characters at the same level?
Outcomes: English, Odia, bilingual
15. Proposed identification method for Odia and English scripts
Step 1: Identify the matras. If a matra is present, the word is Odia or bilingual. Otherwise, the word may be Odia, English or bilingual.
Step 2: Identify the vertical strokes in the word. (The vertical-stroke feature is obtained by identifying the columns with the maximum number of ON pixels.) If the number of vertical strokes is greater than the number of characters in the word, it is an English or a mixed English-Odia word.
Step 3: Identify a mixed English-Odia word by noting a matra at the word-end or a match for ‘ra’ or ‘re’.
Step 4: If the number of full vertical strokes is less than the number of characters in the word, check whether a vertical stroke is present at the beginning of a character or whether the basic characters in the word are not at the same level. If either holds, the word is decided to be English or bilingual. In that case, if there is a matra or a match for ‘ra’ or ‘re’ at the word-end, the word is decided to be bilingual.
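One possible reading of Steps 1 to 4 is sketched below. It is an interpretation rather than the authors' implementation: the names `count_vertical_strokes` and `classify_word`, the `min_fraction` threshold used to decide that a column is a full vertical stroke, and the way the matra and ‘ra’/‘re’ cues are merged into a single Odia cue are all assumptions. The remaining feature flags are taken as given inputs.

```python
def count_vertical_strokes(word, min_fraction=0.8):
    """Count runs of adjacent columns whose ON-pixel count reaches
    min_fraction of the word height (assumed threshold)."""
    height, width = len(word), len(word[0])
    strokes = 0
    in_stroke = False
    for x in range(width):
        on = sum(word[y][x] for y in range(height))
        is_stroke_col = on >= min_fraction * height
        if is_stroke_col and not in_stroke:
            strokes += 1               # a new stroke begins here
        in_stroke = is_stroke_col
    return strokes

def classify_word(has_matra, n_strokes, n_chars, ra_re_at_end,
                  stroke_at_start, same_level):
    """Return 'English', 'Odia' or 'bilingual' for one word."""
    # Steps 1 and 3: a matra or a 'ra'/'re' match points to Odia content.
    odia_cue = has_matra or ra_re_at_end
    if n_strokes > n_chars:
        # Step 2: more strokes than characters points to English content.
        english_cue = True
    else:
        # Step 4: stroke at a character start, or uneven character levels.
        english_cue = stroke_at_start or not same_level
    if english_cue:
        return "bilingual" if odia_cue else "English"
    return "Odia"

word = [
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
]
n = count_vertical_strokes(word)
```

Merging adjacent stroke columns into one run keeps a thick stem from being counted several times, which matters when the count is compared with the number of characters.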
16. Step 6: For the words found in Step 2 to have fewer full vertical strokes, search for English characters that have no vertical stroke (‘s’, ‘c’, ‘g’ and ‘x’ in the flow chart of Figure 6). Template matching using the correlation method is applied to identify these letters. If one such character is present, the corresponding word is decided to be English.
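Template matching by correlation can be illustrated with a plain correlation coefficient between a segmented character and a stored template of the same size. The slides do not give the exact formula or acceptance threshold, so the Pearson coefficient, the function names `correlation` and `matches_any`, and the threshold of 0.9 are assumptions for this sketch.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-sized images
    given as lists of rows. Returns 0.0 if either image is constant."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def matches_any(glyph, templates, threshold=0.9):
    """True if the glyph correlates strongly with any template.
    `templates` maps a label such as 's' or 'c' to a template image."""
    return any(correlation(glyph, t) >= threshold
               for t in templates.values())

glyph = [[1, 0], [0, 1]]
inverse = [[0, 1], [1, 0]]
same = correlation(glyph, glyph)
opposite = correlation(glyph, inverse)
```

In practice the segmented character would first be resized to the template dimensions, a step omitted here.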
17. Table 1. Average number of vertical strokes versus number of characters in a word, English script
Number of characters in a word:       1     2     3     4     5     6     7     8     9     10
Average number of vertical strokes:   1     2     6.5   5     7.3   7.4   10.5  10.4  11    14
18. Table 2. Average number of vertical strokes versus number of characters in a word, Odia script
Number of characters in a word:       1     2     3     4     5     6     7     8     9
Average number of vertical strokes:   0.5   1     1.75  2.5   2.9   2.1   4     2     4
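To relate Tables 1 and 2 to the stroke-count rule, the averages can be divided by the word length. The values below are copied from the two tables; the helper name `strokes_per_char` is hypothetical and the calculation is only an illustration, not part of the original analysis.

```python
# Average vertical strokes per word length, copied from Tables 1 and 2.
english = {1: 1, 2: 2, 3: 6.5, 4: 5, 5: 7.3, 6: 7.4,
           7: 10.5, 8: 10.4, 9: 11, 10: 14}
odia = {1: 0.5, 2: 1, 3: 1.75, 4: 2.5, 5: 2.9, 6: 2.1,
        7: 4, 8: 2, 9: 4}

def strokes_per_char(table):
    """Average number of vertical strokes per character, by word length."""
    return {n: strokes / n for n, strokes in table.items()}

english_ratio = strokes_per_char(english)
odia_ratio = strokes_per_char(odia)

# Smallest English ratio and largest Odia ratio across all word lengths.
min_english = min(english_ratio.values())
max_odia = max(odia_ratio.values())
```

With these figures every English entry has at least one stroke per character and every Odia entry has fewer than one. The one- and two-character English words sit exactly at one stroke per character, so a strict "greater than" comparison would not separate them on the averages alone.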
19. Comparing both outputs
Figure 11. Comparison of the vertical strokes for both scripts (panels: Odia script, English script)
20. Conclusion and future scope:
The performance of the proposed method may be studied in more detail, including the case of varying font sizes.
The method may be extended to numerals appearing with bilingual scripts.
The performance of the proposed method in ambiguous cases, such as ‘I’ in Roman script and the Odia punctuation mark ‘-’, is to be improved.
27. Flow chart for word identification (variant of Figure 6, with the same decision nodes: matra presence, number of vertical strokes versus number of characters, ‘ra’ or ‘re’ template match, ‘s’, ‘c’, ‘g’, ‘x’ template match, stroke position and character level; outcomes: English, Odia or bilingual)