A Survey on Optical Character Recognition System
In this paper, the author surveys methods for extracting text from images using an Optical Character Recognition (OCR) system. OCR comes in two forms: online recognition, which captures text while a person is writing it and is comparatively easy to recognize, and offline recognition, which extracts text from existing images.
Extracting text from images involves the following modules:
1) Image Acquisition: lets the user upload an image to the application.
2) Preprocessing: filters or cleans the image to improve recognition.
3) Feature Extraction: extracts the features from the image that are used to read characters.
4) Classification & Segmentation: divides the text into individual character segments and then classifies each segment by comparing it with trained characters.
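In this project Tesseract performs these modules internally, and the document does not describe how. As a toy illustration of what segmentation, feature extraction and classification mean (not Tesseract's actual algorithm), the sketch below assumes one line of text already binarized into a grid of 0 (background) and 1 (ink). It segments characters at blank columns, uses the raw pixels as features, and classifies each segment by nearest-template matching against hypothetical "trained" glyphs.

```python
# Toy OCR pipeline: segmentation, feature extraction, classification.
# Assumption: the input is one line of text, already binarized
# (0 = background, 1 = ink). Templates are hypothetical "trained" glyphs.
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]


def segment_columns(img: Grid) -> List[Grid]:
    """Split a one-line binary image into character glyphs at blank columns."""
    width = len(img[0])
    has_ink = [any(row[c] for row in img) for c in range(width)]
    glyphs: List[Grid] = []
    start: Optional[int] = None
    for c, ink in enumerate(has_ink + [False]):  # sentinel closes the last glyph
        if ink and start is None:
            start = c
        elif not ink and start is not None:
            glyphs.append([row[start:c] for row in img])
            start = None
    return glyphs


def extract_features(glyph: Grid) -> Tuple[int, ...]:
    """Deliberately simple feature vector: the glyph's pixels, row by row."""
    return tuple(p for row in glyph for p in row)


def classify(glyph: Grid, templates: Dict[str, Grid]) -> str:
    """Label of the same-shaped template with the fewest differing pixels."""
    shape = (len(glyph), len(glyph[0]))
    feats = extract_features(glyph)
    best_label, best_dist = "?", None
    for label, tmpl in templates.items():
        if (len(tmpl), len(tmpl[0])) != shape:
            continue  # a real system would normalize glyph size first
        dist = sum(a != b for a, b in zip(feats, extract_features(tmpl)))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def recognize(img: Grid, templates: Dict[str, Grid]) -> str:
    """Segment the line, then classify each glyph."""
    return "".join(classify(g, templates) for g in segment_columns(img))


if __name__ == "__main__":
    T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
    L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
    line = [T[r] + [0] + L[r] for r in range(3)]  # "T", blank column, "L"
    print(recognize(line, {"T": T, "L": L}))  # prints "TL"
```

Raw-pixel features only work when glyphs have identical size and position; practical systems normalize each segment and use richer features, which is part of what Tesseract handles for you.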
This project uses the Tesseract OCR engine with Python to implement the concepts above.
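The project's own source is not shown here, so the following is only a minimal sketch of how Python code commonly calls Tesseract through pytesseract and Pillow. The helper names extract_text and clean_ocr_text are hypothetical; pytesseract.image_to_string and Image.open are the real library calls.

```python
# Minimal sketch of calling Tesseract from Python via pytesseract.
# Assumptions: pytesseract and Pillow are installed and the tesseract
# executable is reachable (see the setup steps). Helper names are hypothetical.
import re


def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on one image file and return the cleaned text."""
    # Imported here so clean_ocr_text can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return clean_ocr_text(raw)


def clean_ocr_text(raw: str) -> str:
    """Drop form feeds, strip trailing spaces, collapse runs of blank lines."""
    raw = raw.replace("\f", "")  # Tesseract's text output usually ends pages with \f
    lines = [line.rstrip() for line in raw.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one empty line
    return text.strip()
```

Usage, with a hypothetical file name: print(extract_text("images/sample.png")).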
To run the project, follow these steps:
First, install Python.
Then install ‘tesseract-ocr-setup-3.02.02’ (the installer is in the code folder) on the C: drive, and set the path to the Tesseract executable with the following environment variable under the System tab:
Variable name: tesseract
Variable value: C:\Tesseract-OCR\tesseract.exe
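pytesseract (installed in the next step) does not read a variable named tesseract on its own: it runs whatever `tesseract` resolves to on PATH unless pytesseract.pytesseract.tesseract_cmd is set. The sketch below assumes the application reads the variable defined above; the lookup order and the helper names resolve_tesseract_cmd and configure_pytesseract are hypothetical.

```python
# Sketch: point pytesseract at the Tesseract executable configured above.
# Assumption: the application reads the "tesseract" environment variable;
# the original code is not shown, so this lookup order is illustrative.
import os
import shutil
from typing import Optional

DEFAULT_WINDOWS_PATH = r"C:\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd() -> Optional[str]:
    """Return the first existing Tesseract path: env var, PATH, then default."""
    candidates = [
        os.environ.get("tesseract"),  # variable created in the step above
        shutil.which("tesseract"),    # tesseract already on PATH
        DEFAULT_WINDOWS_PATH,         # install location used in this guide
    ]
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None


def configure_pytesseract() -> str:
    """Set pytesseract's executable path, raising if Tesseract is missing."""
    import pytesseract

    cmd = resolve_tesseract_cmd()
    if cmd is None:
        raise FileNotFoundError("Tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling configure_pytesseract() once at startup, before any OCR call, is enough; pytesseract keeps the setting for the rest of the process.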
Install the following packages with pip:
pip install pytesseract
pip install opencv-python
pip install Pillow
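To confirm the three packages installed correctly, a short check like the one below can help. Note that two of them are imported under names different from their pip names (opencv-python as cv2, Pillow as PIL); the mapping and the helper name check_packages are part of this sketch, not the project.

```python
# Quick post-install check for the packages installed above.
# Import names differ from pip names for opencv-python (cv2) and Pillow (PIL).
import importlib.util
from typing import Dict

PIP_TO_MODULE = {
    "pytesseract": "pytesseract",
    "opencv-python": "cv2",
    "Pillow": "PIL",
}


def check_packages() -> Dict[str, bool]:
    """Map each pip package name to whether its module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in PIP_TO_MODULE.items()
    }


missing = [name for name, ok in check_packages().items() if not ok]
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All Python dependencies are installed.")
```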
After everything is installed, run the code by double-clicking the ‘run.bat’ file to open the screen below.
In the screen above, click the ‘Image Acquisition/Upload Image’ button to upload an image containing text.
In the screen above, I select a text image and click the ‘Open’ button to upload it.
After uploading the image, click the ‘Preprocessing’ button to clean it, as shown in the screen below.
In the screen above, two images are shown: the original image on the left and the preprocessed image on the right. The preprocessed image looks slightly cleaner than the original. Now click ‘Feature Extraction & Image Segmentation’ to extract text from the cleaned image.
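The document does not say which filters the ‘Preprocessing’ button applies. One common cleaning step before OCR is binarization with Otsu's method, which in OpenCV is a call such as cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU). As an assumption-labelled sketch, the pure-Python code below computes that kind of threshold on a flat list of 8-bit gray values so the idea can be read without OpenCV.

```python
# Sketch of one common preprocessing step: binarization with Otsu's method.
# Assumption: the source does not say which filters the application uses.
# With OpenCV this is roughly:
#   gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#   _, clean = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1  # pixels must be integers in 0..255
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]  # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Pixels above the Otsu threshold become 255 (paper), the rest 0 (ink)."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

Otsu's method picks the threshold automatically from the image histogram, which suits scanned pages whose brightness varies from image to image better than a fixed cut-off does.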
In the screen above, the extracted text appears in the text area. You can upload any other text image and read it the same way.
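Besides a plain string, pytesseract can return word-level results with a confidence score per word via image_to_data(..., output_type=pytesseract.Output.DICT). The sketch below filters out low-confidence words; the 60 cut-off and the helper names confident_words and ocr_words are illustrative assumptions, not part of the project.

```python
# Sketch: read word-level OCR results with Tesseract's per-word confidence.
# Assumptions: pytesseract is installed and configured; 60 is an arbitrary
# example cut-off; helper names are hypothetical.
from typing import Dict, List, Tuple


def confident_words(data: Dict[str, list], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs whose confidence is at least min_conf."""
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # older pytesseract versions return strings
        # Non-word rows (page, block, line) have empty text and conf -1.
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words


def ocr_words(image_path: str, min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Run Tesseract on an image and keep only confidently recognized words."""
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    return confident_words(data, min_conf)
```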
Note: no OCR software is 100% accurate; expect this application to recognize roughly 80 to 90% of the text in an image.
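To check the accuracy on your own images, you can compare the OCR output with a hand-typed reference. The sketch below uses one common definition, character accuracy = 1 - (Levenshtein distance / reference length); this is an assumption, since the source does not say how its 80 to 90% figure was measured.

```python
# Sketch: character accuracy of OCR output against a hand-typed reference.
# Assumption: accuracy = 1 - edit_distance / len(reference), floored at 0.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, reference: str) -> float:
    """Fraction of reference characters recognized correctly."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1.0 - edit_distance(ocr_text, reference) / len(reference))
```

For example, char_accuracy("He1lo World", "Hello World") is 10/11, about 0.91, because one of the 11 reference characters was misread.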
Also, the text in the image should be reasonably large so that the application can recognize it. You can test the application with the sample images provided in the images folder.
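If the text in an image is too small, enlarging the image before recognition often helps. The sketch below assumes you can estimate the text height in pixels; the 30-pixel target and the 4x cap are illustrative values, not documented Tesseract requirements, and upscale_factor and enlarge are hypothetical helpers built on cv2.imread and cv2.resize.

```python
# Sketch: enlarge an image whose text is small before running OCR.
# Assumptions: the text height in pixels is known or estimated; the 30 px
# target and 4x cap are illustrative, not Tesseract requirements.


def upscale_factor(text_height_px: float, target_px: float = 30.0,
                   max_factor: float = 4.0) -> float:
    """Scale factor that brings text up to target_px tall, never shrinking."""
    if text_height_px <= 0:
        raise ValueError("text height must be positive")
    return min(max_factor, max(1.0, target_px / text_height_px))


def enlarge(image_path: str, text_height_px: float):
    """Load an image with OpenCV and upscale it by upscale_factor."""
    import cv2

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    f = upscale_factor(text_height_px)
    if f == 1.0:
        return img
    # Cubic interpolation keeps enlarged character edges smoother than nearest-neighbour.
    return cv2.resize(img, None, fx=f, fy=f, interpolation=cv2.INTER_CUBIC)
```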