03/17/14 Devnagari Character Recognition 1of 62
by
Vikas J. Dongre
Lecturer Electronics,
Government Polytechnic Gondia
Character Recognition: Scope and Challenges

Useful for research on character recognition in general, and on Devnagari character recognition in particular.

Published in: Education
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
8,362
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
192
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide
Source (slide notes): Internet World Stats – Usage and Population Statistics, http://www.internetworldstats.com/stats7.htm, ©Copyright 2005, Miniwatts International, Ltd. All rights reserved.

2. Contents
• Introduction
• Scope
• Features of Devnagari Script
• Image Preprocessing
• Feature Extraction
• Character Classification
• Post-processing
• Character Recognition Challenges
• Current Research Results
3. OCR (Optical Character Recognition)
• Character recognition is a part of pattern or object recognition, with a special focus on natural language processing (NLP).
• OCR is “…a system that provides a full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the document.”
• Documents are scanned with a scanner; the recognition engine of the OCR system then interprets the images and converts the images of handwritten or printed characters into ASCII data (machine-readable characters).
4. Some Applications
• Postal address reading
• Check reading
• Census data collection and processing
• Image document reading
• Digitizing old books in editable form
• Extended research:
  • Text-to-speech conversion (e-book reading)
  • Enabling visually impaired users to access computers in their native Indian languages
5. Postal Address Recognition
10. Prime Commitments
11. International Scenario (Source: IBM): Internet Users by Language
[Chart; categories: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other]
12. International Scenario (Source: IBM): Internet Users: Growth
[Chart; categories: English, Chinese, Japanese, Spanish, German, French, Korean, Italian, Portuguese, Dutch, Other]
13. Main Research Themes
• Online Character Recognition
• Printed Text Recognition
• Handwriting Recognition
• Language Recognition
• Graphics Document Recognition
• Document Understanding
• Tables and Forms Processing
• Document Engineering
14. Introduction to Devnagari Character Recognition
• Devnagari optical character recognition (DOCR) is more complicated than OCR for English.
• Various soft computing tools used in other types of pattern recognition and image processing can be applied to DOCR.
15. Features of Devnagari Script
• Devnagari is the most popular script in India.
• Hindi, an official language of India, is written in the Devnagari script.
• It is also used for writing the Marathi, Konkani, Sanskrit and Nepali languages.
• Moreover, Hindi is the third most popular language in the world.
• The alphabet set is quite large: it has 11 vowels and 33 consonants as basic characters.
• Compound characters can be formed by joining characters in various ways.
• Characters have a horizontal line along the top, known as the Shirorekha or headline.
16. [Figure panels: Vowels and Corresponding Modifiers; Consonants; Half Form of Consonants with Vertical Bar]
17. [Figure panels: Examples of Combination of Half-Consonant and Consonant; Examples of Special Combination of Half-Consonant and Consonant; Special Symbols]
18. Character Recognition Process
Image digitization using scanner → Image preprocessing → Character segmentation → Feature extraction & normalization → Character classifier → Storing characters in text file
19. Image Preprocessing
• Thresholding & Binarization
• Noise Reduction
• Segmentation
• Skew Detection and Correction
• Size Normalization
• Thinning
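The slide lists thresholding and binarization as the first preprocessing step but does not name a method. As one possible choice (an assumption, not necessarily the author's), the sketch below uses Otsu's global threshold, which picks the gray level that maximizes the between-class variance of ink and paper pixels. Images are plain Python lists of rows of 0-255 values, dark ink is mapped to 1, and `otsu_threshold` and `binarize` are hypothetical helper names.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold t for a grayscale image (rows of ints 0-255).

    Pixels <= t form one class and pixels > t the other; t maximizes the
    between-class variance w0 * w1 * (mu0 - mu1) ** 2.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of the values <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue            # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray):
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

A single global threshold can struggle on unevenly lit or degraded pages; a locally adaptive threshold may suit those better.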
20. Preprocessed Images: (a) original, (b) segmented, (c) Shirorekha removed, (d) thinned, (e) image edging
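One simple way to locate the Shirorekha before removing it, sketched here under stated assumptions, is the horizontal projection profile. The headline is taken to be the densest row in the upper half of a binary word image, extended to adjacent rows of similar density to cover a thick headline. The `ratio` value (0.8) and the function names are illustrative choices, not values from the slides.

```python
def horizontal_profile(img):
    """Number of ink pixels (1s) in each row of a binary image."""
    return [sum(row) for row in img]


def find_shirorekha(img, ratio=0.8):
    """Row indices assumed to form the headline (Shirorekha) of a word image.

    Assumptions: the headline is the densest row in the upper half of the
    image, and adjacent rows with at least `ratio` times that ink count
    belong to the same (thick) headline. `img` must be non-empty.
    """
    profile = horizontal_profile(img)
    upper = profile[: max(1, len(profile) // 2)]
    peak = max(range(len(upper)), key=upper.__getitem__)
    if profile[peak] == 0:
        return []                        # blank image: no headline
    limit = ratio * profile[peak]
    top = peak
    while top > 0 and profile[top - 1] >= limit:
        top -= 1
    bottom = peak
    while bottom + 1 < len(profile) and profile[bottom + 1] >= limit:
        bottom += 1
    return list(range(top, bottom + 1))


def remove_shirorekha(img, ratio=0.8):
    """Copy of the image with the headline rows set to background (0)."""
    rows = set(find_shirorekha(img, ratio))
    return [[0] * len(row) if r in rows else list(row) for r, row in enumerate(img)]
```

Once the headline rows are cleared, the characters of a word usually no longer touch, so blank columns can separate them.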
21. Slant Correction
• The dominant slope of the word is the angle whose slope-corrected word gives the minimum entropy of the vertical projection histogram. The vertical projection histogram is calculated for a range of angles ±R; in our case R = 60°, which seems to cover all writing styles. The slope of the word, α_m, is found from:

$$\alpha_m = \arg\min_{\alpha \in [-R,\,R]} H_\alpha, \qquad H = -\sum_{i=1}^{N} p_i \log p_i$$

where p_i is the fraction of ink pixels falling in bin i of the vertical projection histogram and N is the number of bins.
• The character is then corrected using:

$$x' = x - y\tan(\alpha_m), \qquad y' = y$$
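The minimum-entropy search above can be sketched as follows, assuming a binary image given as rows of 0/1 and angles tried in 1° steps (the step size is an assumption). Each ink pixel is sheared by x' = x - y tan α, with y measured upward from the bottom row; the sheared columns are rounded to integer bins, and the angle whose vertical projection histogram has the lowest entropy H is kept as α_m.

```python
import math
from collections import Counter


def ink_points(img):
    """(x, y) of every ink pixel; y is measured upward from the bottom row."""
    h = len(img)
    return [(x, h - 1 - r) for r, row in enumerate(img) for x, v in enumerate(row) if v]


def sheared_columns(points, alpha_deg):
    """Column of each ink pixel after the shear x' = x - y*tan(alpha), rounded."""
    t = math.tan(math.radians(alpha_deg))
    return [round(x - y * t) for x, y in points]


def projection_entropy(columns):
    """H = -sum_i p_i log p_i over the bins of the vertical projection histogram."""
    n = len(columns)
    return -sum((c / n) * math.log(c / n) for c in Counter(columns).values())


def estimate_slant(img, R=60, step=1):
    """alpha_m in degrees: the angle in [-R, R] whose sheared word has minimum H.

    Rounding to integer columns makes several neighboring angles tie; ties
    are broken toward the smallest |alpha| (the least correction).
    """
    points = ink_points(img)
    return min(range(-R, R + 1, step),
               key=lambda a: (round(projection_entropy(sheared_columns(points, a)), 12), abs(a)))


def correct_slant(img, R=60, step=1):
    """Shear the image by alpha_m (x' = x - y*tan(alpha_m), y' = y)."""
    alpha = estimate_slant(img, R, step)
    points = ink_points(img)
    cols = sheared_columns(points, alpha)
    h = len(img)
    if not cols:
        return alpha, [list(row) for row in img]   # nothing to correct
    shift = -min(cols)                              # keep all columns non-negative
    out = [[0] * (max(cols) + shift + 1) for _ in range(h)]
    for (x, y), c in zip(points, cols):
        out[h - 1 - y][c + shift] = 1
    return alpha, out
```

Because columns are rounded, several adjacent angles can give the same entropy; breaking ties toward the smallest correction is a design choice of this sketch, not part of the method on the slide.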
22. Skew Correction
23. Feature Extraction
• A set of features is extracted for each class that helps distinguish it from other classes while remaining invariant to characteristic differences within the class.
• Various methods are:
  • Global Transformation and Series Expansion
  • Statistical Features
  • Geometrical and Topological Features
24. Global Transformation and Series Expansion
• Fourier Transforms
• Gabor Transform
• Wavelets
• Moments
• Karhunen-Loève (KL) Expansion
Statistical Features
• Zoning
• Crossings and Distances
• Projections
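Two of the statistical features listed here, projections and crossings, can be computed directly from a binary character image. In the sketch below (hypothetical helper names; images are lists of 0/1 rows), projections are ink counts per row and per column, and a crossing is counted each time a scan line passes from background into ink.

```python
def projection_features(img):
    """Horizontal and vertical projection profiles (ink counts per row/column)."""
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols


def crossings(line):
    """Number of background-to-ink transitions along one row or column."""
    count, prev = 0, 0
    for v in line:
        if v and not prev:
            count += 1
        prev = v
    return count


def crossing_features(img):
    """Crossing counts for every row and every column of a binary image."""
    return [crossings(r) for r in img], [crossings(c) for c in zip(*img)]
```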
25. Geometrical and Topological Features
• Extracting and Counting Topological Structures
• Measuring and Approximating the Geometrical Properties
• Coding
• Graphs and Trees
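A classic topological structure count is the Euler number: the number of connected objects minus the number of holes. The sketch below uses Gray's bit-quad counting with 8-connectivity, a technique chosen here for illustration and not named on the slide; it only inspects 2×2 windows, so no connected-component labelling is needed. The function name is hypothetical.

```python
def euler_number(img):
    """Euler number (objects minus holes) of a binary image, 8-connectivity.

    Gray's bit-quad formula: E = (n(Q1) - n(Q3) - 2 n(QD)) / 4, where Q1 and
    Q3 are 2x2 windows with exactly one and three ink pixels, and QD are
    windows holding two diagonally opposite ink pixels.
    """
    h, w = len(img), len(img[0]) if img else 0
    # Pad with a one-pixel background border so edge pixels are counted.
    pad = [[0] * (w + 2)] + [[0] + list(r) + [0] for r in img] + [[0] * (w + 2)]
    q1 = q3 = qd = 0
    for r in range(h + 1):
        for c in range(w + 1):
            a, b = pad[r][c], pad[r][c + 1]
            d, e = pad[r + 1][c], pad[r + 1][c + 1]
            n = a + b + d + e
            if n == 1:
                q1 += 1
            elif n == 3:
                q3 += 1
            elif n == 2 and a == e:   # sum 2 with a == e means a diagonal pair
                qd += 1
    return (q1 - q3 - 2 * qd) // 4
```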
26. Zoning
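Zoning divides the normalized character image into a grid and uses a statistic of each cell as a feature. The sketch assumes ink density (fraction of ink pixels) as the per-cell statistic and a 3×3 grid by default; both are illustrative choices, since the slide shows only a figure.

```python
def zoning_features(img, zones_y=3, zones_x=3):
    """Ink density of each cell in a zones_y x zones_x grid, row by row.

    Cell boundaries use integer division, so images whose size is not a
    multiple of the grid get slightly uneven cells.
    """
    h, w = len(img), len(img[0])
    features = []
    for zy in range(zones_y):
        r0, r1 = zy * h // zones_y, (zy + 1) * h // zones_y
        for zx in range(zones_x):
            c0, c1 = zx * w // zones_x, (zx + 1) * w // zones_x
            cells = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells) if cells else 0.0)
    return features
```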
27. Structural Features
30. Character Classification
OCR systems extensively use the methodologies of pattern recognition, which assign an unknown sample to a predefined class. Various methods are:
• Template Matching
• Statistical Techniques
• Neural Networks
• Support Vector Machine (SVM) algorithms
• Combination classifiers
31. Character Classification…: Template Matching
This is the simplest way of character recognition, but its recognition rate is very sensitive to noise and image deformation. Various methods are:
• Euclidean distance
• Mahalanobis distance, or Jaccard or Yule similarity measures
• K-nearest neighbor measurements
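Euclidean distance and k-nearest-neighbor matching from the list above can be combined in a few lines. This sketch assumes each character has already been reduced to a numeric feature vector and that stored templates are (vector, label) pairs; `knn_classify` is a hypothetical name, and with `k=1` it reduces to plain nearest-template matching.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(sample, templates, k=1):
    """Label `sample` by majority vote among its k nearest templates.

    `templates` is a list of (feature_vector, label) pairs. On a tied vote,
    Counter.most_common keeps first-seen order, so the nearer label wins.
    """
    nearest = sorted(templates, key=lambda t: euclidean(sample, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```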
32. Character Classification…: Statistical Techniques
• Likelihood or Bayes classifier
• Clustering Analysis
• Hidden Markov Modeling (HMM)
• Fuzzy Set Reasoning
• Quadratic classifier
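As one concrete instance of a likelihood or Bayes classifier, the sketch below fits a Gaussian naive Bayes model: each feature is assumed independent and normally distributed within a class, and the predicted class maximizes log prior plus log likelihood. The independence assumption, the small variance floor `eps`, and the function names are simplifications chosen for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_bayes(samples, eps=1e-6):
    """Estimate per-class priors, feature means and variances.

    `samples` is a list of (feature_vector, label) pairs; `eps` keeps every
    variance strictly positive.
    """
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    n = len(samples)
    model = {}
    for label, xs in by_class.items():
        dims = list(zip(*xs))
        means = [sum(d) / len(d) for d in dims]
        variances = [sum((v - m) ** 2 for v in d) / len(d) + eps
                     for d, m in zip(dims, means)]
        model[label] = (len(xs) / n, means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                      for v, m, s in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)
```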
33. Character Classification…: Neural Networks
• Multilayer perceptron (MLP)
• Kohonen's Self-Organizing Map (SOM)
• Back Propagation algorithm
• Support Vector Machine (SVM) algorithms
34. Character Classification…: Combination Classifier
• ANN and HMM
• K-Means and SVM
• MLP and SVM
• MLP and minimum edit distance
• SVM and ANN
• Fuzzy neural network
• NN, fuzzy logic and genetic algorithm
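The slide does not say how the member classifiers' outputs are merged. A simple scheme, assumed here purely for illustration, is weighted majority voting over the labels that each classifier predicts for the same character; `combine_by_vote` and its `weights` parameter are hypothetical.

```python
from collections import Counter


def combine_by_vote(predictions, weights=None):
    """Combine labels predicted by several classifiers for one sample.

    `predictions` holds one label per classifier; optional `weights` (for
    example each classifier's validation accuracy) scale the votes.
    Ties go to the label that appears first in `predictions`.
    """
    weights = weights or [1.0] * len(predictions)
    scores = Counter()
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=lambda lbl: scores[lbl])
```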
35. Post-processing
• Save the output in a text file.
• Refine the OCR output using spell check, grammar check and comparison with other knowledge sources.
• Use the output in other applications through standard word processors.
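Dictionary-based refinement of OCR output can be sketched with the Levenshtein edit distance: a recognized word missing from a lexicon is replaced by the closest lexicon entry when the distance is small enough. The lexicon, the `max_distance` threshold of 1, and the function names are assumptions. Python compares Devnagari strings by Unicode code point, so a single wrong consonant or vowel sign costs one edit.

```python
def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def correct_word(word, lexicon, max_distance=1):
    """Replace an OCR word with the closest lexicon entry if it is close enough."""
    if word in lexicon:
        return word
    best = min(lexicon, key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) <= max_distance else word
```

For example, with the lexicon ["भारत", "भाषा", "नमस्ते"], the misrecognized word "भारन" (wrong final consonant) is one edit away from "भारत" and is corrected to it.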
36. Some Research Results: Scanned Document (input image)
37. Paragraph Segmentation
38. Segmented Paragraph
39. Segmented Paragraph
40. Line Segmentation [figure label: Zero pixel zone]
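Line segmentation using the zero pixel zone can be sketched with the horizontal projection profile: rows with no ink separate consecutive text lines. The sketch assumes a clean, deskewed binary page (rows of 0/1); the `min_height` noise filter and the function name are added assumptions.

```python
def segment_lines(img, min_height=1):
    """Split a binary page into text lines at zero-pixel rows.

    Returns (start_row, end_row) pairs with end exclusive. Runs of ink rows
    shorter than `min_height` are discarded as noise.
    """
    profile = [sum(row) for row in img]
    lines, start = [], None
    for r, count in enumerate(profile + [0]):   # sentinel closes a final line
        if count and start is None:
            start = r
        elif not count and start is not None:
            if r - start >= min_height:
                lines.append((start, r))
            start = None
    return lines
```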
41. Line Segmentation
42. Word Segmentation [figure labels: Devnagari word; individual Devnagari symbols; segmented word]
43. Word Segmentation [figure labels: Devnagari word; individual Devnagari symbols; segmented word]
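Word segmentation can be sketched the same way with the vertical projection of a single text line: columns with no ink are gaps, and only gaps of at least `min_gap` columns (an assumed value that would need tuning to the text size) end a word. Because the Shirorekha joins the characters within a Devnagari word, such gaps normally occur between words rather than inside them.

```python
def segment_words(line_img, min_gap=2):
    """Split one binary text line into words at runs of empty columns.

    Returns (start_col, end_col) pairs with end exclusive. Gaps narrower
    than `min_gap` columns are treated as spacing inside a word.
    """
    profile = [sum(col) for col in zip(*line_img)]
    words, start, gap = [], None, 0
    for c, count in enumerate(profile):
        if count:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The gap began at column c - gap + 1, which ends the word.
                words.append((start, c - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, len(profile) - gap))   # drop any trailing gap
    return words
```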
44. Some Observations
• Experiments with degraded text images show that the chief source of error is the segmentation of characters.
• A similar situation exists for the recognition of handwritten text.
• Error rates are at acceptable levels for the other stages, i.e. line segmentation, word segmentation and character recognition.
45. Character Classification [figure panels: input characters; recognized characters]
• Input characters: 54
• Correct: 42
• Incorrect: 9
• Not recognized: 3
• Accuracy: 77.8%
• Features used: filled area, Euler number, perimeter, convex area
• Classifier used: absolute difference
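The reported accuracy follows from 42 correct out of 54 input characters: 42 / 54 ≈ 0.778, i.e. 77.8%. The slide does not define the "absolute difference" classifier; one reading, assumed here, is nearest-reference matching using the sum of absolute differences between the four feature values (filled area, Euler number, perimeter, convex area) of the input character and those of each stored reference character. The function names and the reference values a caller supplies are hypothetical, not data from the experiment.

```python
def absolute_difference(a, b):
    """Sum of absolute differences (L1 distance) between feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(features, references):
    """Return the reference label whose feature vector is closest in L1 distance.

    `features` and each value in `references` (a dict label -> vector) are
    (filled_area, euler_number, perimeter, convex_area) tuples.
    """
    return min(references, key=lambda label: absolute_difference(features, references[label]))


def accuracy(correct, total):
    """Recognition accuracy in percent."""
    return 100.0 * correct / total
```

Because filled area and convex area are much larger numbers than the Euler number, raw L1 distances are dominated by the area features; scaling each feature (for example to unit range) would be a natural refinement.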
46. Research Publications
• Vikas J. Dongre, Vijay H. Mankar, “A Review of Research on Devnagari Character Recognition”, International Journal of Computer Applications (0975-8887), Volume 12, No. 2, pp. 8-15, November 2010.
47. Complexity in Indic Writing
48. Devnagari Character Recognition Challenges (1)
• Devnagari is a two-dimensional script, as consonants are modified in many ways to form a meaningful letter.
• The same is true of its recognition: the recognizer has to identify all the modifiers present in a letter.
• The generated ISCII or Unicode codes are then combined properly to display the digitized document.
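One concrete part of combining the recognized components properly: Unicode stores text in logical order, so the vowel sign I (ि, U+093F), although drawn to the left of its consonant, must be stored after that consonant. The sketch below assumes a recognizer that reports component labels in left-to-right visual order; the labels and the small mapping table are hypothetical. It handles only a single consonant following the pre-base sign (a conjunct would need the sign after the whole cluster) and targets Unicode only, not ISCII.

```python
# Hypothetical recognizer labels mapped to Unicode code points.
COMPONENTS = {
    "ka": "\u0915",        # DEVANAGARI LETTER KA
    "ma": "\u092E",        # DEVANAGARI LETTER MA
    "sign_aa": "\u093E",   # DEVANAGARI VOWEL SIGN AA (drawn right of consonant)
    "sign_i": "\u093F",    # DEVANAGARI VOWEL SIGN I (drawn left of consonant)
    "anusvara": "\u0902",  # DEVANAGARI SIGN ANUSVARA (dot above the headline)
}
CONSONANTS = {"ka", "ma"}
PRE_BASE = {"sign_i"}      # signs drawn before the consonant they follow in Unicode


def to_unicode(visual_labels):
    """Convert component labels in visual order into a Unicode string."""
    out, pending = [], []
    for label in visual_labels:
        char = COMPONENTS[label]
        if label in PRE_BASE:
            pending.append(char)      # hold until its consonant is seen
        else:
            out.append(char)
            if label in CONSONANTS:
                out.extend(pending)   # sign goes right after its consonant
                pending = []
    out.extend(pending)               # unmatched pre-base signs, if any
    return "".join(out)
```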
49. Devnagari Character Recognition Challenges (2)
• Compound letter segmentation
• Upper and lower modifier segmentation
• Left and right modifier segmentation
• Separating the anusvara (dot) and full stop from noise
• Understanding punctuation marks in the document
• Unconnected compound letters in handwritten documents
• Connected simple letters in handwritten documents
50. Devnagari Character Recognition Challenges (3)
• India is a multilingual country, and more than one language is frequently used in a single document.
• Recognizing more than one language at a time is a great challenge.
• Language identification must first be done by examining the properties of the script.
• English–Hindi discrimination is moderately simple compared to Marathi–Hindi, since Marathi and Hindi are both written in Devnagari.
• Various bank forms use three languages (Marathi, the state language; Hindi, the official language of the Union; and English, the international language), which makes this work still more challenging.
51. Multilingual Character Recognition
52. Examples of Multi-oriented Documents
53. Two-column Documents with Image
54. Image Document Recognition
55. Image Document Recognition: video caption text recognition; cargo container code recognition
56. Image Document Recognition: poster capturing; license plate reading
57. Image Document Recognition: whiteboard reading; road sign recognition
58. Image Document Recognition: message on glass door with complex background; document recognition on mobile phone
59. International Journals Related to Character Recognition
60. Conclusion
Development in character recognition will boost word processing and image understanding. Devnagari character recognition will help readers listen to Indian literature using computers, PDAs or e-book readers. It will help in language translation, which is a complex problem in a multilingual country like India, where each state has its own language. Many innovative modern applications, which are the need of the hour in this information age, will evolve, and this will help information processing to a large extent.
61. Contact: dongrevj@yahoo.co.in, Mob: 9370668979
62. Acknowledgement
Friend, philosopher and “guide” Dr. V.H. Mankar, for his consistent help and encouragement.