OCR Presentation (Optical Character Recognition)

Optical Character Recognition (OCR)
Supervisor : Er. Bishnu Hari Poudel
Team Members:
Binay Thapa Magar (070/611)
Neeraj Neupane (070/622)
Prem Kumari Pun (070/627)
Sumit Gautam (070/642)

Overview
● A technology that converts non-digital documents into editable text files
● Recognition of printed or written text characters by a computer

Architecture of OCR
[Figure: OCR architecture diagram]

Objectives
● Converting paper documents into an editable text format
● Speeding up character recognition in document processing
● Embedding OCR into different application systems
● Creating an OCR system that recognizes text in a single font and a single text size

Methodologies Used in OCR
1. Grayscaling
2. Binarization
3. Noise Removal
4. Image Sharpening
5. Line - Word - Character Segmentation
6. Feature Extraction
7. Classification (Recognition)

Method 1 : Grayscaling
● A normal (color) image is converted to a grayscale image
● In the grayscale image, every pixel has equal red, green and blue intensities
● Algorithm used : Luminosity Method
[Figure: normal image and its grayscale version]
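
The luminosity method can be sketched as a weighted sum of the three color channels. The slide does not list the weights, so the ITU-R BT.601 values (0.299, 0.587, 0.114) are assumed here; the function name and the sample image are illustrative only.

```python
# Luminosity weights (ITU-R BT.601). Assumption: the slides do not state them.
R_WEIGHT, G_WEIGHT, B_WEIGHT = 0.299, 0.587, 0.114


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples (0-255) into rows of gray levels (0-255)."""
    return [
        [round(R_WEIGHT * r + G_WEIGHT * g + B_WEIGHT * b) for (r, g, b) in row]
        for row in image
    ]


# Hypothetical 2x2 test image: red, green, blue and white pixels.
rgb = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_grayscale(rgb)
print(gray)
```

Green contributes most to the result because the eye is most sensitive to it, which is why pure green comes out brighter than pure red or blue.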

Method 2 : Binarization
● Converts the image into a black-and-white representation, i.e. the intensity information is reduced to two values, 0 and 1.
● Algorithm used : Otsu’s Algorithm
[Figure: grayscale image and its binarized version]
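
A minimal sketch of Otsu's algorithm follows: it tries every threshold from 0 to 255 and keeps the one that maximizes the between-class variance of the two resulting pixel groups. The function names, the sample image, and the convention that bright pixels map to 1 are assumptions made for illustration.

```python
def otsu_threshold(gray):
    """Return the threshold (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # intensity sum of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Hypothetical grayscale image with a dark group and a bright group.
gray = [
    [10, 12, 200],
    [11, 210, 205],
]
t = otsu_threshold(gray)
binary = binarize(gray, t)
print(t, binary)
```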

Method 3 : Noise Removal
● Noise: random fluctuations in brightness, color and intensity that usually make the image more difficult to process properly
● Salt-and-pepper noise is a common type of noise in images
● It shows up as dark pixels in bright regions and bright pixels in dark regions
[Figure: example of salt-and-pepper noise]
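
The slide does not name the filter used for noise removal. A 3x3 median filter is a common choice for salt-and-pepper noise, so the sketch below assumes it; the function name and the test image are hypothetical.

```python
from statistics import median_low


def median_filter(image):
    """Apply a 3x3 median filter; border pixels use only the neighbours that exist."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns one of the original pixel values.
            out[y][x] = median_low(window)
    return out


# Hypothetical bright 5x5 image with one "pepper" pixel (0) and one "salt" pixel (255).
noisy = [[200] * 5 for _ in range(5)]
noisy[2][2] = 0
noisy[0][0] = 255
clean = median_filter(noisy)
print(clean)
```

Because an isolated outlier never becomes the median of its neighbourhood, both noisy pixels are replaced by the surrounding value while the rest of the image is unchanged.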

Method 4 : Image Sharpening
● Algorithm used : Erosion and Dilation Algorithm
● Erosion shrinks a region by getting rid of small protrusions
● Dilation expands a region and tends to fill in small intrusions (holes and gaps)
[Figure: dilation and erosion examples]
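
Erosion and dilation on a binary image can be sketched with a 3x3 neighbourhood. The structuring element size, the border handling (pixels outside the image are ignored), and all names below are assumptions; the slides do not specify them.

```python
def _neighbourhood(img, y, x):
    """Return the values in the 3x3 window around (y, x), clipped at the borders."""
    height, width = len(img), len(img[0])
    return [
        img[j][i]
        for j in range(max(0, y - 1), min(height, y + 2))
        for i in range(max(0, x - 1), min(width, x + 2))
    ]


def erode(img):
    """A pixel stays 1 only if every pixel in its neighbourhood is 1."""
    return [
        [int(all(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


def dilate(img):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return [
        [int(any(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


# Hypothetical 7x7 image: a 3x3 block of ones with a one-pixel hole in the middle.
rows = ["0000000", "0000000", "0011100", "0010100", "0011100", "0000000", "0000000"]
holed = [[int(c) for c in row] for row in rows]
closed = erode(dilate(holed))   # dilation then erosion fills the hole

# Hypothetical 7x7 image containing a single isolated speck.
speck = [[0] * 7 for _ in range(7)]
speck[3][3] = 1
opened = dilate(erode(speck))   # erosion then dilation removes the speck
```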

Method 5 : Line - Word - Character Segmentation
● Separates the different parts of the image: isolates lines, then one word from another, and then the letters of each word.
● Algorithm used : Histogram and Mathematical Computation
[Figure: segmentation example]
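
One way to read "histogram" here is as a projection profile: count the ink pixels in each row to find text lines, then in each column of a line to find characters, and group characters into words by gap width. The sketch below assumes ink pixels are 1 and background is 0; the function names and the `min_gap` parameter are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(binary):
    """Find text lines from the row histogram (ink pixels per row)."""
    return runs([sum(row) for row in binary])


def segment_characters(binary, line):
    """Find characters in one line from the column histogram of that line."""
    top, bottom = line
    width = len(binary[0])
    return runs([sum(binary[y][x] for y in range(top, bottom)) for x in range(width)])


def group_words(char_spans, min_gap):
    """Start a new word whenever the gap to the previous character is >= min_gap."""
    words = []
    for span in char_spans:
        if words and span[0] - words[-1][-1][1] < min_gap:
            words[-1].append(span)
        else:
            words.append([span])
    return words


# Hypothetical binary page: two text lines, the first with three characters.
rows = ["0000000000", "1101100011", "1101100011", "0000000000", "0110000000"]
page = [[int(c) for c in row] for row in rows]
lines = segment_lines(page)
chars = segment_characters(page, lines[0])
words = group_words(chars, min_gap=2)
print(lines, chars, words)
```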

Method 6 : Feature Extraction
● Finding a set of features that defines the shape of the underlying character as precisely and uniquely as possible.
● Algorithm used : Zoning Method
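
A minimal sketch of zoning, assuming the common variant in which the character image is divided into a grid and the ink density of each zone becomes one feature. The grid size, the function name, and the sample glyph are illustrative assumptions.

```python
def zoning_features(glyph, zones_y=2, zones_x=2):
    """Split a binary glyph into zones_y x zones_x zones; return ink density per zone.

    Assumes the glyph has at least zones_y rows and zones_x columns.
    """
    height, width = len(glyph), len(glyph[0])
    features = []
    for zy in range(zones_y):
        y0, y1 = zy * height // zones_y, (zy + 1) * height // zones_y
        for zx in range(zones_x):
            x0, x1 = zx * width // zones_x, (zx + 1) * width // zones_x
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            features.append(ink / area)
    return features


# Hypothetical 4x4 glyph shaped like the letter "L".
rows = ["1000", "1000", "1000", "1111"]
glyph = [[int(c) for c in row] for row in rows]
features = zoning_features(glyph)
print(features)
```

The resulting vector has one entry per zone, read left to right and top to bottom, so glyphs of different pixel sizes still produce feature vectors of the same length.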

Method 7 : Classification
● Uses the extracted features to identify the character according to preset rules.
● Determines the region of feature space in which the unknown pattern falls.
● Algorithm used : K-Nearest Neighbor Classification Method
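
K-nearest-neighbor classification can be sketched as follows: measure the distance from the unknown feature vector to every labelled training vector and take a majority vote among the k closest. The Euclidean distance, the value of k, and the training data below are assumptions for illustration, not values from the project.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Return the majority label among the k training vectors closest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set of 4-value feature vectors for two characters.
training = [
    ([0.5, 0.0, 0.75, 0.5], "L"),
    ([0.5, 0.1, 0.7, 0.5], "L"),
    ([0.6, 0.0, 0.75, 0.4], "L"),
    ([0.5, 0.5, 0.5, 0.5], "O"),
    ([0.6, 0.5, 0.5, 0.6], "O"),
    ([0.5, 0.6, 0.5, 0.5], "O"),
]
first = knn_classify([0.5, 0.05, 0.7, 0.5], training)
second = knn_classify([0.55, 0.5, 0.5, 0.5], training)
print(first, second)
```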

Use Case Diagram of OCR System

Sequence Diagram of OCR System

Conclusion and Future Scopes
● Automatic data entry is an attractive and lucrative technology
● Electronic data entry saves time and space
● OCR can be used for research and development purposes
● Recognition of handwritten documents is a future scope

