Optical Character Recognition (OCR)
Supervisor: Er. Bishnu Hari Poudel
Team Members:
Binay Thapa Magar (070/611)
Neeraj Neupane (070/622)
Prem Kumari Pun (070/627)
Sumit Gautam (070/642)
Overview
● A technology that converts non-digital documents into editable text files
● Recognition of printed or written text characters by a computer
Architecture of OCR
Objectives
● Convert paper documents into an editable text format
● Speed up character recognition in document processing
● Embed OCR into different application systems
● Create an OCR system that recognizes text in a single font and a single text size
Methodologies Used in OCR
1. Grayscaling
2. Binarization
3. Noise Removal
4. Image Sharpening
5. Line - Word - Character Segmentation
6. Feature Extraction
7. Recognition
Method 1 : Grayscaling
● The original color image is converted to a grayscale image
● In the grayscale image, every pixel has equal Red, Green and Blue intensities
● Algorithm used : Luminosity Method
Figure: Normal Image and Grayscale Image
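The luminosity method named on this slide can be sketched as a weighted sum of the three color channels. The slide does not state the weights, so the ITU-R BT.601 values (0.299, 0.587, 0.114) used here are an assumption, and the function names are hypothetical.

```python
# Minimal sketch of luminosity-based grayscaling.
# Assumption: BT.601 weights; the slides do not say which weights were used.
WEIGHTS = (0.299, 0.587, 0.114)  # red, green, blue


def luminosity(pixel):
    """Map one (r, g, b) pixel with 0-255 channels to a gray level."""
    r, g, b = pixel
    return round(WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b)


def to_grayscale(image):
    """Convert a 2D list of (r, g, b) tuples to a 2D list of gray levels."""
    return [[luminosity(pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    sample = [[(255, 255, 255), (0, 0, 0)], [(0, 255, 0), (255, 0, 0)]]
    print(to_grayscale(sample))
```

Green gets the largest weight because the eye is most sensitive to it, which is why this method usually looks more natural than a plain average of the three channels.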
Method 2 : Binarization
● Converts the grayscale image into a black-and-white representation, i.e., the intensity information is reduced to two values, 0 and 1.
● Algorithm used : Otsu’s Algorithm
Figure: Grayscale Image and Binarized Image
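Otsu's algorithm picks the threshold that maximizes the variance between the dark and bright pixel classes. The following is a minimal sketch under stated assumptions: 8-bit gray levels, and a hypothetical convention that pixels above the threshold map to 1 and all others to 0 (the slides do not say which value represents ink).

```python
# Minimal sketch of Otsu thresholding on a 2D list of 0-255 gray levels.
def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]            # pixels with value <= t
        sum_dark += t * hist[t]
        w_bright = total - w_dark    # pixels with value > t
        if w_dark == 0:
            continue
        if w_bright == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_bright = (sum_all - sum_dark) / w_bright
        between = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(gray, threshold):
    """Assumed convention: 1 for pixels above the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    gray = [[10, 12, 200], [11, 210, 205]]
    t = otsu_threshold(gray)
    print(t, binarize(gray, t))
```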
Method 3 : Noise Removal
● Noise: random fluctuations in brightness, color and intensity that usually make the image more difficult to process properly
● Salt-and-pepper noise is a common type of noise in images
● It appears as dark pixels in bright regions and bright pixels in dark regions
Figure: Example of Salt-Pepper Noise
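The slides do not name the noise-removal algorithm. As an illustration only, a 3x3 median filter is a common choice for salt-and-pepper noise, because an isolated extreme pixel never becomes the median of its neighborhood. The function name and the border handling (using only the neighbors that exist) are assumptions.

```python
# Illustrative sketch: 3x3 median filter on a 2D list of gray levels.
# Assumption: the project may have used a different filter.
def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighborhood."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            window = sorted(
                image[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
            )
            # Upper median when the window is cut off at a border.
            result[y][x] = window[len(window) // 2]
    return result


if __name__ == "__main__":
    noisy = [[200] * 5 for _ in range(5)]
    noisy[2][2] = 0  # a single "pepper" pixel in a bright region
    print(median_filter(noisy)[2][2])
```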
Method 4 : Image Sharpening
● Algorithm used : Erosion and Dilation
● Erosion shrinks the foreground regions of the image, getting rid of small extrusions
● Dilation expands the foreground regions, filling in small intrusions
Figure: Dilation and Erosion
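Erosion and dilation can be sketched on a binary image with a 3x3 square structuring element. The assumptions here are that 1 marks foreground (ink), that pixels outside the image are ignored, and that the helper names are hypothetical; the slides do not specify the structuring element.

```python
# Minimal sketch of binary erosion and dilation with a 3x3 square element.
def _window(image, y, x):
    """Values of the 3x3 neighborhood of (y, x) that lie inside the image."""
    height, width = len(image), len(image[0])
    return [
        image[yy][xx]
        for yy in range(y - 1, y + 2)
        for xx in range(x - 1, x + 2)
        if 0 <= yy < height and 0 <= xx < width
    ]


def erode(image):
    """Keep a pixel only if every neighbor in the window is foreground."""
    return [
        [1 if all(_window(image, y, x)) else 0 for x in range(len(image[0]))]
        for y in range(len(image))
    ]


def dilate(image):
    """Set a pixel if any neighbor in the window is foreground."""
    return [
        [1 if any(_window(image, y, x)) else 0 for x in range(len(image[0]))]
        for y in range(len(image))
    ]


if __name__ == "__main__":
    image = [
        [1, 0, 0, 0, 0],  # isolated speck in the corner
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in dilate(erode(image)):
        print(row)
```

Applying erosion followed by dilation (an opening) removes the isolated speck while restoring the 3x3 block to its original size.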
Method 5 : Line - Word - Character Segmentation
● Separates the different parts of the image, i.e., isolates each line, then each word from the others, and then the letters of each word.
● Algorithm used : Histogram and Mathematical Computation
Figure: Segmentation
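One common reading of the histogram approach is a projection profile: count the ink pixels per row to find lines, then per column inside each line to find characters, and treat wide gaps as word boundaries. The sketch below assumes a binary image where 1 is ink; the function names and the `min_gap` parameter are hypothetical, not taken from the project.

```python
# Sketch of projection-profile segmentation on a binary image (1 = ink).
def runs(profile):
    """Return (start, end) spans of consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(image):
    """Text lines are runs of rows that contain ink."""
    return runs([sum(row) for row in image])


def segment_chars(image, line):
    """Characters are runs of columns with ink inside one text line."""
    top, bottom = line
    return runs([sum(column) for column in zip(*image[top:bottom])])


def group_words(chars, min_gap=2):
    """Start a new word when the gap to the previous character >= min_gap."""
    words = []
    for span in chars:
        if words and span[0] - words[-1][-1][1] < min_gap:
            words[-1].append(span)
        else:
            words.append([span])
    return words


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 1, 1, 0, 0, 0, 1],
        [1, 0, 1, 1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0, 0],
    ]
    lines = segment_lines(page)
    print(lines, group_words(segment_chars(page, lines[0])))
```

This simple profile works when characters do not touch or overlap, which fits the single-font, single-size scope stated in the objectives.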
Method 6 : Feature Extraction
● Finds a set of features that defines the shape of the underlying character as precisely and uniquely as possible.
● Algorithm used : Zoning Method
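In the zoning method, the character image is divided into a grid of zones and one value is computed per zone. The sketch below uses the ink density of each zone as the feature, which is one common variant; the grid size, the density feature, and the function name are assumptions rather than details from the slides.

```python
# Sketch of zoning feature extraction on a binary glyph (1 = ink).
def zoning_features(glyph, zones=2):
    """Return the ink density of each cell in a zones x zones grid,
    listed row by row from the top-left zone."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones):
        y0, y1 = zy * height // zones, (zy + 1) * height // zones
        for zx in range(zones):
            x0, x1 = zx * width // zones, (zx + 1) * width // zones
            area = (y1 - y0) * (x1 - x0)
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(ink / area if area else 0.0)
    return features


if __name__ == "__main__":
    glyph = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    print(zoning_features(glyph))
```

The resulting list is a fixed-length feature vector, so glyphs of the same size can be compared directly by distance.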
Method 7 : Classification
● Uses the extracted features to identify the character according to preset rules.
● Determines the region of feature space in which the unknown pattern falls.
● Algorithm used : K-Nearest Neighbor Classification Method
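K-nearest neighbor classification labels an unknown feature vector with the majority label among its k closest training vectors. The sketch below assumes Euclidean distance and a tiny made-up training set of two-element feature vectors; the labels, the data, and the function name are purely illustrative.

```python
# Sketch of k-nearest-neighbor classification over feature vectors.
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Return the majority label among the k training items closest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training data: two features per character.
    training = [
        ([1.0, 0.0], "I"),
        ([0.9, 0.1], "I"),
        ([0.0, 1.0], "O"),
        ([0.1, 0.9], "O"),
        ([0.2, 1.0], "O"),
    ]
    print(knn_classify([0.95, 0.05], training, k=3))
```

An odd k reduces the chance of tied votes when there are two classes; with more classes, ties are still possible and are broken here by whichever label was counted first.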
Use Case Diagram of OCR System
Sequence Diagram of OCR System
Screenshots of OCR System
Conclusion and Future Scopes
● Automatic data entry is an attractive and lucrative technology
● Electronic data entry saves time and space
● OCR can be used for research and development purposes
● Recognition of handwritten documents is left as future work
Some References
1. https://github.com/tesseract-ocr/tesseract
2. https://www.youtube.com/watch?v=uU1LHp-wFDQ
3. https://www.scribd.com/doc/87514317/Project-Report-OCR
4. https://www.iiti.ac.in/people/~tanimad/JavaTheCompleteReference.pdf
5. https://www.robotix.in/tutorial/imageprocessing/noise.html
6. http://ksuweb.kennesaw.edu/~plaval/research/mean/noise/ocr.htm
7. http://www.felixniklas.com/imageprocessing/binarization
8. http://www.felixniklas.com/imageprocessing/segmentation
9. https://www.tutorialspoint.com/dip/grayscale_to_rgb_conversion.htm
10. Digital Image Processing Using MATLAB by R.C. Gonzalez
THANK YOU
