A SEMINAR
PRESENTATION
ON
IMPROVEMENTS IN OCR ALGORITHM
Seminar Guide :
Mr. Deepak Agarwal
Seminar Coordinator :
Mr. Abhishek Gupta
Presented By:
Rahul Taneja
PCE/CE/13/146
IV Year CS B
CSE Department, PCE
CONTENTS
1. Introduction
2. Applications
3. Comparison of a Few Similar Technologies/Algorithms/Approaches
4. Working Architecture/Component/Methods
5. Limitations
6. Future Scope
7. Conclusion
8. References
9. Queries
1. Introduction
• Optical character recognition (OCR) is the mechanical or electronic conversion
of images of typed, handwritten or printed text into machine-encoded text.
• It is a common method of digitising printed texts so that they can be
electronically edited, searched and stored more compactly.
2. Applications
• Data entry from documents, e.g. cheques, passports, invoices, bank statements and receipts
• Automatic number plate recognition
• Speech synthesis (reading recognised text aloud)
• Textual versions of printed documents, e.g. book scanning
• Defeating CAPTCHA anti-bot systems, though these are specifically
designed to prevent OCR
• Assistive technology for blind and visually impaired users
3. Comparison of a Few Similar Technologies/Algorithms/Approaches
• Intelligent Character Recognition (ICR) is an extension of OCR
(optical character recognition).
• ICR targets handwritten characters that are written separately,
one character at a time.
• OCR is designed for machine-printed characters, while ICR focuses
on hand-printed characters.
• To improve recognition accuracy, it is common in both technologies
to use metadata, for example:
• regular expressions
• name/city and other dictionaries
• database lookups
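As a rough illustration of how such metadata can be applied to recognised text, the sketch below validates a numeric field against a regular expression and checks a city name against a small dictionary. The six-digit field format, the look-alike table and the city list are assumptions made for this example, not part of any particular OCR or ICR product.

```python
import re

# Hypothetical table of letters that are easily confused with digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
)

PIN_CODE = re.compile(r"\d{6}")               # assumed format: six-digit postal code
KNOWN_CITIES = {"Jaipur", "Delhi", "Mumbai"}  # assumed city dictionary


def fix_numeric_field(raw, pattern=PIN_CODE):
    """Return a value matching the pattern, trying look-alike swaps; else None."""
    if pattern.fullmatch(raw):
        return raw
    candidate = raw.translate(DIGIT_LOOKALIKES)
    return candidate if pattern.fullmatch(candidate) else None


def is_known_city(raw, cities=KNOWN_CITIES):
    """Dictionary check: accept the field only if it names a known city."""
    return raw.strip().title() in cities


print(fix_numeric_field("3O2O17"))  # letter O read in place of zero
print(is_known_city("jaipur"))
```

A field that fails both the pattern and the look-alike repair returns `None`, which a real pipeline could treat as a signal to send the field for manual checking.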
4. Working Architecture/Component/Methods
• Broadly speaking, there are two ways to recognise a character:
1. by recognizing characters in their entirety (pattern recognition), or
2. by detecting the individual lines and strokes that characters are made from
(feature detection) and identifying them that way
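The two approaches can be contrasted with a toy sketch. The 3-column glyph bitmaps, the three-letter template set and the choice of "full-length stroke" features below are all assumptions chosen to keep the example small; practical OCR systems use far richer templates and features.

```python
# Toy glyphs: '#' is ink, '.' is background (assumed 5x3 grid).
TEMPLATES = {
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
}


def match_template(glyph, templates=TEMPLATES):
    """Pattern recognition: pick the template sharing the most pixels."""
    def score(template):
        return sum(
            a == b
            for row_g, row_t in zip(glyph, template)
            for a, b in zip(row_g, row_t)
        )
    return max(templates, key=lambda name: score(templates[name]))


def stroke_features(glyph):
    """Feature detection: which full-length strokes does the glyph contain?"""
    rows = [all(c == "#" for c in row) for row in glyph]
    cols = [all(row[i] == "#" for row in glyph) for i in range(len(glyph[0]))]
    # (top bar, bottom bar, left stem, right stem, middle bar, middle stem)
    return (rows[0], rows[-1], cols[0], cols[-1], any(rows[1:-1]), any(cols[1:-1]))


FEATURES = {stroke_features(g): name for name, g in TEMPLATES.items()}


def match_features(glyph):
    """Look up the character whose stroke features equal the glyph's."""
    return FEATURES.get(stroke_features(glyph))


noisy_l = ["#..", "#..", "#.#", "#..", "###"]              # one stray pixel
tall_l = ["#..", "#..", "#..", "#..", "#..", "#..", "###"]  # taller than the templates
print(match_template(noisy_l), match_features(tall_l))
```

In this sketch, the pixel-by-pixel comparison tolerates a stray pixel but presumes the glyph is on the same grid as the templates, whereas the stroke features do not depend on the glyph's height.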
5. Limitations
• Recognition of typewritten text is still not 100% accurate even where
clear imaging is available. Character-by-character OCR accuracy for
commercial OCR software varies from 81% to 99%.
• Recognition of cursive handwriting and of printed text in other scripts
is still the subject of active research.
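One common way to arrive at a character-by-character accuracy figure is to compare the OCR output with a hand-checked reference text using edit distance. The sketch below assumes that definition (accuracy = 1 - edit distance / reference length); the slides do not state how the quoted percentages were measured, and the sample strings are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of reference characters recovered, under the assumed definition."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1 - edit_distance(reference, ocr_output) / len(reference))


# One wrong character out of eleven: accuracy is 10/11, about 91%.
print(round(character_accuracy("recognition", "recogniti0n"), 3))
```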
6. Future Scope
• Accuracy can be brought closer to 100% by:
1. Human review (Manual Correction)
In this, human intervention is required because a word may not be available in the data
dictionary.
2. Data Dictionary Authentication (Automatic Correction)
In this, the algorithm looks up an unrecognized pattern or feature in a data
dictionary, which then supplies the most probable correct pattern or feature.
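The two correction routes can be combined in a small sketch: look up each recognised word in a dictionary, accept the closest entry when one is similar enough, and otherwise flag the word for human review. The word list and the 0.8 similarity cutoff are assumptions for illustration; the matching uses `difflib.get_close_matches` from the Python standard library rather than any method described in the slides.

```python
import difflib

# Assumed data dictionary; a real one would be far larger.
DICTIONARY = ["optical", "character", "recognition", "accuracy"]


def correct_word(word, dictionary=DICTIONARY, cutoff=0.8):
    """Return (word, needs_review) after an automatic dictionary lookup."""
    if word in dictionary:
        return word, False
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    if matches:
        return matches[0], False  # automatic correction
    return word, True             # no close entry: leave for manual correction


for token in ["0ptical", "charactcr", "recogniti0n", "xyzzy"]:
    print(token, "->", correct_word(token))
```

Raising the cutoff sends more words to human review, while lowering it risks replacing a correctly recognised but unlisted word with a wrong dictionary entry.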
7. Conclusion
• We can therefore conclude that:
1. OCR works well for printed text and fairly well for handwritten text
2. Cursive handwritten text still needs further research
3. Accuracy is not 100% even when the input is of the best possible quality
4. Accuracy can be improved by implementing
1. Human review
2. Data dictionary authentication
8. References
• Research papers:
1. OCR accuracy improvement by Jyoti Goyal, CLDU, Sirsa.
2. Automatic Number plate recognition by Mohd. Arif, University of
Karachi.
• www.Wikipedia.com
• www.explainthatstuff.com
9. Queries
