character recognition: Scope and challenges
1. 03/17/14 Devnagari Character Recognition 1of 62
by
Vikas J. Dongre
Lecturer Electronics,
Government Polytechnic Gondia
Contents
Introduction
Scope
Features Of Devnagari Script
Image Preprocessing
Feature Extraction
Character Classification
Post processing
Character Recognition challenges
Current research results
OCR (Optical Character Recognition)
Character recognition is a part of pattern or object
recognition with a special focus on natural language
processing (NLP).
“…a system that provides a full alphanumeric recognition of
printed or handwritten characters at electronic speed by
simply scanning the document.”
Documents are scanned with a scanner, and the recognition
engine of the OCR system then interprets the images,
turning images of handwritten or printed characters into
ASCII data (machine-readable characters).
Some applications
•Postal address reading
•Check reading
•Census data collection and processing
•Image document reading
•Digitizing old books in editable form
•Extended research:
• Text-to-speech conversion (e-book reading)
• Access to computers for visually impaired users in
their native Indian languages
International Scenario (Source: IBM)
[Chart: Internet Users by Language. Categories: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other]
International Scenario (Source: IBM)
[Chart: Internet Users: Growth. Categories: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other]
Main Research Themes
Online character Recognition
Printed Text Recognition
Handwriting Recognition
Language Recognition
Graphics Document Recognition
Document Understanding
Tables and Forms Processing
Document Engineering
Introduction to
Devnagari Character Recognition
Devnagari Optical Character Recognition (DOCR) is more
complicated than OCR for English.
Various soft computing tools used in other types of
pattern recognition and image processing can be applied
to DOCR.
Features Of Devnagari Script
Devnagari is the most widely used script in India.
Hindi, an official language of India, is written in the
Devnagari script.
It is also used for writing the Marathi, Konkani, Sanskrit
and Nepali languages.
Moreover, Hindi is the third most widely spoken language
in the world.
The alphabet set is quite large.
It has 11 vowels and 33 consonants as basic
characters.
Compound characters can be formed by joining
characters in various ways.
Characters have a horizontal line at the upper part,
known as the Shirorekha or headline.
Vowels and Corresponding Modifiers
Consonants
Half Form of Consonants with Vertical Bar
Examples of Combination of Half-Consonant and
Consonant
Examples of Special Combination of Half-Consonant
and Consonant.
Special Symbols
Character recognition Process
[Block diagram with the following blocks: image digitization using scanner; image preprocessing; feature extraction and normalization; character classifier; character segmentation; storing characters in a text file]
Slant Correction
• The dominant slope of the word is taken as the angle at which the
slope-corrected word gives the minimum entropy of its vertical projection
histogram. The vertical projection histogram is calculated for a range
of angles ±R. In our case R = 60, which seems to cover all writing styles.
The slope of the word, $\alpha_m$, is found from:
$$\alpha_m = \arg\min_{a \in \pm R} H, \qquad H = -\sum_{i=1}^{N} p_i \log p_i$$
where $p_i$ is the normalized value of histogram bin $i$ and $N$ is the number of bins.
• The character is then corrected by using:
$$x' = x - y \tan(\alpha_m), \qquad y' = y$$
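The entropy search described on this slide can be sketched in Python with NumPy. This is a minimal illustration, not the author's implementation: the function names, the 1-degree step, the nearest-pixel rounding and the convention that y is measured upward from the bottom row are all assumptions.

```python
import numpy as np

def shear(img, angle_deg):
    """Apply x' = x - y*tan(angle), y' = y to a binary image (1 = ink).
    Assumption: y is the height of a pixel above the bottom row."""
    h, w = img.shape
    t = np.tan(np.radians(angle_deg))
    pad = int(np.ceil(abs(t) * h))           # room for pixels shifted sideways
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    ys, xs = np.nonzero(img)
    y_up = (h - 1) - ys                      # height above the bottom row
    new_x = np.round(xs - y_up * t).astype(int) + pad
    out[ys, new_x] = 1
    return out

def projection_entropy(img):
    """Entropy H = -sum(p_i * log p_i) of the vertical projection histogram."""
    hist = img.sum(axis=0).astype(float)
    total = hist.sum()
    if total == 0:
        return 0.0
    p = hist[hist > 0] / total               # empty bins contribute nothing
    return float(-(p * np.log(p)).sum())

def estimate_slant(img, R=60, step=1):
    """Return the angle in [-R, R] whose sheared image has minimum entropy."""
    angles = np.arange(-R, R + step, step)
    entropies = [projection_entropy(shear(img, a)) for a in angles]
    return float(angles[int(np.argmin(entropies))])

def deslant(img, R=60):
    """Estimate the slant and return the corrected image."""
    return shear(img, estimate_slant(img, R))
```

A finer angle step or interpolation instead of rounding would give smoother results on real scans; the sketch keeps both simple on purpose.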
Feature Extraction
A set of features is extracted for each class that helps
distinguish it from other classes while remaining
invariant to characteristic differences within the class.
Various methods are:
Global Transformation and Series Expansion
Statistical Features
Geometrical and Topological Features
Global Transformation and Series Expansion
Fourier Transforms
Gabor Transform
Wavelets
Moments
Karhunen-Loeve (KL) Expansion
Statistical Features
Zoning
Crossings and Distances
Projections
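As an illustration of two of the statistical features listed here, the sketch below computes zoning densities and projection profiles for a binary character image. The 4 x 4 grid and the function names are assumptions chosen for the example, not values from the presentation.

```python
import numpy as np

def zoning_features(img, rows=4, cols=4):
    """Split a binary character image into rows x cols zones and
    return the fraction of ink pixels in each zone (row-major order)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    feats = []
    for i in range(rows):
        for j in range(cols):
            zone = img[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            feats.append(zone.mean() if zone.size else 0.0)
    return np.array(feats)

def projection_features(img):
    """Return (horizontal, vertical) projections: ink count per row and per column."""
    img = np.asarray(img, dtype=float)
    return img.sum(axis=1), img.sum(axis=0)
```

The zone densities form a fixed-length feature vector regardless of image size, which is what makes zoning convenient as classifier input.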
Geometrical and Topological Features
Extracting and Counting Topological
Structures
Measuring and Approximating the
Geometrical Properties
Coding
Graphs and Trees
Character Classification
OCR systems extensively use the methodologies of
pattern recognition, which assigns an unknown sample
to a predefined class. Various methods are:
Template Matching.
Statistical Techniques.
Neural Networks.
Support Vector Machine (SVM) algorithms.
Combination classifier.
Character Classification…
Template Matching
This is the simplest way of character recognition. The
recognition rate of this method is very sensitive to noise
and image deformation. Various methods are:
Euclidean Distance
Mahalanobis, Jaccard or Yule similarity measures
K-Nearest Neighbor measurements
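Template matching with a Euclidean distance can be sketched as follows. The sketch assumes all images are binary arrays of the same size and that `templates` is a dictionary of one stored image per class; both are illustrative choices, not details from the slides.

```python
import numpy as np

def match_template(sample, templates):
    """Return (label, distance) of the template closest to the sample
    in Euclidean distance. `templates` maps label -> image array."""
    x = np.asarray(sample, dtype=float).ravel()
    best_label, best_dist = None, float("inf")
    for label, tmpl in templates.items():
        d = np.linalg.norm(x - np.asarray(tmpl, dtype=float).ravel())
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist

# Two toy 3x3 templates and a noisy sample (one extra ink pixel).
templates = {
    "vertical": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "horizontal": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
sample = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
label, dist = match_template(sample, templates)
```

Because every pixel difference counts equally, a small shift or stray stroke changes the distance directly, which is the noise sensitivity the slide mentions.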
Character Classification…
Statistical Techniques
Likelihood or Bayes classifier
Clustering Analysis
Hidden Markov Modeling (HMM)
Fuzzy Set Reasoning
Quadratic classifier
Character Classification…
Neural Networks
Multilayer perceptron (MLP)
Kohonen's Self Organizing Map (SOM)
Back Propagation algorithm
Support Vector Machine (SVM) algorithms
Character Classification…
Combination Classifier
ANN and HMM
K-Means and SVM
MLP and SVM
MLP and minimum edit
SVM and ANN
fuzzy neural network
NN, fuzzy logic and genetic algorithm
Post processing
Save the recognized text in a text file.
Refine the OCR output using spell check,
grammar check and comparison with other
knowledge sources.
Use the output in other applications such as
standard word processors.
Word Segmentation
[Figures on two slides: a Devnagari word, the segmented word, and the individual Devnagari symbols]
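The slides show word segmentation only as pictures. One common way to obtain such a split, sketched below, is to blank out the headline (Shirorekha) and cut the word wherever the vertical projection of the remaining ink is empty. This is an assumed approach for illustration, not necessarily the one used in the presentation; the 0.9 threshold and the function name are hypothetical.

```python
import numpy as np

def segment_word(word_img, headline_ratio=0.9):
    """Split a binary word image (1 = ink) into column ranges, one per symbol.
    Assumption: the Shirorekha consists of the rows with the most ink."""
    img = np.asarray(word_img).astype(int)
    row_ink = img.sum(axis=1)
    # Rows whose ink count is close to the maximum are treated as headline.
    headline = row_ink >= headline_ratio * row_ink.max()
    body = img.copy()
    body[headline, :] = 0                    # remove the headline
    col_ink = body.sum(axis=0)
    segments, start = [], None
    for x, ink in enumerate(col_ink):
        if ink > 0 and start is None:
            start = x                        # a symbol begins
        elif ink == 0 and start is not None:
            segments.append((start, x))      # a symbol ends at an empty column
            start = None
    if start is not None:
        segments.append((start, img.shape[1]))
    return segments
```

Each returned pair is a half-open column range `(start, end)`. Touching or overlapping symbols, as in compound letters, would need more than empty-column cuts.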
Some observations
Experiments with degraded text images show that
the chief source of error is at the level of
segmentation of characters.
A similar situation exists for recognition of
handwritten texts.
Error rates are at acceptable levels for the other
stages i.e. line segmentation, word segmentation,
character recognition etc.
Character classification
Input characters: 54
Recognized characters: Correct = 42, Incorrect = 9
Not recognized: 3
Accuracy = 77.8 %
Features used: Filled Area, Euler Number, Perimeter, Convex Area
Classifier used: Absolute difference
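The reported accuracy follows from the counts on this slide. As a quick check (the error and rejection rates are derived here from the same counts and are not stated in the original):

```latex
\[
\text{Accuracy} = \frac{42}{54} \approx 77.8\%, \qquad
\text{Error rate} = \frac{9}{54} \approx 16.7\%, \qquad
\text{Rejection rate} = \frac{3}{54} \approx 5.6\%
\]
```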
Research Publications
Vikas J. Dongre, Vijay H. Mankar, “A Review of Research on
Devnagari Character Recognition”, International Journal of
Computer Applications (0975-8887), Volume 12, No. 2, pp.
8-15, November 2010.
Devnagari Character recognition challenges -1
•Devnagari is a two-dimensional script, as consonants are
modified in many ways to form a meaningful letter.
•The same is true for its recognition.
•The recognizer has to identify all the modifiers present in a
letter.
•The generated ISCII or Unicode codes are then combined
properly to display the digitized document.
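As a small illustration of combining codes, the sketch below builds a conjunct and a consonant-plus-modifier letter from Unicode code points. The helper names are hypothetical; the code points are the standard Devanagari ones.

```python
import unicodedata

VIRAMA = "\u094D"   # DEVANAGARI SIGN VIRAMA (halant), joins consonants

def conjunct(*consonants):
    """Join consonants with the virama to encode a compound letter."""
    return VIRAMA.join(consonants)

def with_matra(base, matra):
    """Append a vowel sign (matra) to a consonant or conjunct."""
    return base + matra

KA, SSA, SIGN_I = "\u0915", "\u0937", "\u093F"

ksha = conjunct(KA, SSA)        # KA + VIRAMA + SSA, rendered as one glyph
ki = with_matra(KA, SIGN_I)     # KA + VOWEL SIGN I

for ch in ksha + SIGN_I:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The vowel sign I is stored after the consonant even though it is drawn to its left, so the recognizer has to map the two-dimensional layout on the page back to this logical order.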
Devnagari Character recognition challenges -2
•Compound letter segmentation.
•Upper and lower modifier segmentation.
•Left and right modifier segmentation
•Separating anuswara (.) and full stop from noise.
•Understanding punctuation marks in the document.
•Unconnected compound letters in handwritten documents.
•Connected simple letters in handwritten documents.
Devnagari Character recognition challenges -3
•India is a multilingual country. More than one language is
frequently used in a document.
•Recognition of more than one language at a time is a
great challenge.
•Language recognition has to be done first, by looking
into the properties of the script.
•English–Hindi language discrimination is moderately
simple compared to Marathi–Hindi, since both Marathi
and Hindi are written in Devnagari.
•Various bank forms use three languages (Marathi as the
state language, Hindi as the official Union language and
English as the international language). This makes the
work still more challenging.
Image Document recognition
Message on glass door with
complex background
Document recognition on
mobile phone
Conclusion
Developments in character recognition will boost word
processing and image understanding.
Devnagari character recognition will help readers
listen to Indian literature using computers, PDAs or
e-book readers.
It will help in language translation, which is a complex
problem in a multilingual country like India, where each
state has its own language.
Many modern, innovative applications will evolve, which
is the need of the hour in this information age.
This will help in information processing to a large
extent.
Acknowledgement
Friend, Philosopher and “GUIDE”
Dr. V.H. Mankar
for his consistent help
and encouragement