See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/360620085
OPTICAL CHARACTER RECOGNITION
Article · May 2022
1 author:
C K Gomathy
Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya University
All content following this page was uploaded by C K Gomathy on 16 May 2022.

OPTICAL CHARACTER RECOGNITION
Dr. V. Geetha1, Ch V V Sudheer2, A V Saikumar3, Dr C K Gomathy4
1, 4 Assistant Professor, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
2, 3 Student, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
ABSTRACT:
This project concerns optical character recognition (OCR). OCR reads printed text optically,
character by character, converts each character into machine-readable code, stores the text
in system memory, and saves it as a document. In short, OCR transforms printed text into
editable text, and it is a helpful and popular approach in a variety of applications. Text
preparation and segmentation techniques can influence OCR accuracy. Because the text in an
image varies in size, style, and orientation and may sit on an intricate background,
retrieving it can be challenging. OCR is commonly used to recognize text in scanned
documents and images, converting a physical paper document or an image into an accessible
electronic version with text.
Keywords: OCR, Tesseract, OpenCV, Python.
I. INTRODUCTION
Text recognition is one of the most prominent applications of computer vision and is
used by several multinational tech companies such as Apple and Google. Apple recently
announced the "Live Text" feature in iOS 15. This functionality is similar to how Google
Lens works on Android phones and in the Google Search and Photos apps on iOS. The basic
procedure is that a person points the camera at an image or at text on a signboard or a
sheet of paper, and Live Text recognizes the text present in the image, be it a contact
number or an email ID. These features rely on a technology called OCR (Optical Character
Recognition). For decades, OCR was the sole means of transforming printouts into
computer-processable data, and it is still the preferred method for turning paper
invoices into extractable data that can be linked into financial systems, for example.
JOURNAL OF ENGINEERING, COMPUTING & ARCHITECTURE
Volume 12, Issue 3, MARCH - 2022
ISSN NO:1934-7197
http://www.journaleca.com/ Page No: 211

However, electronic document submission
now provides organizations with a
significantly improved approach to areas
like invoicing and sales processing,
lowering costs and allowing employees to
focus on higher-value activities.
II. PROBLEM STATEMENT
The problem is for software systems to recognize characters when information is scanned
from paper documents. A large number of newspapers and books on different subjects exist
only in printed form. When such documents are scanned, they are stored in the computer
system as images such as JPEG or GIF files. The text in these images cannot be edited or
searched by the user, so reusing the information means reading the documents line by line
and word by word, which is very difficult. There is now a huge demand for storing the
information in these paper documents on computer storage and later editing or reusing it
through search.
III. LITERATURE SURVEY
Among first-generation OCR systems, the first to be commercialized was the IBM 1418,
which was designed to read a special IBM font, 407. The recognition method was template
matching, which compares the character image with a library of prototype images for each
character of each font. Next-generation machines were able to recognize regular
machine-printed and hand-printed characters, although the character set was limited to
numerals and a few letters and symbols. Such machines appeared from the mid-1960s to the
early 1970s. For the third generation of OCR systems, the challenges were documents of
poor quality and large printed and handwritten character sets. Low cost and high
performance were also important objectives. Commercial OCR systems with such capabilities
appeared during the decade 1975 to 1985. The fourth generation can be characterized by
the OCR of complex documents that intermix text, graphics, tables, and mathematical
symbols, as well as unconstrained handwritten characters, color documents, and
low-quality noisy documents. Commercial products on the market include postal address
readers and reading aids for the blind.
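The template matching described in this section can be illustrated with a minimal sketch. It is a toy example, not the IBM 1418's actual method: the 3x3 glyph bitmaps, the names `TEMPLATES`, `match_score`, and `recognize`, and the pixel-agreement score are all assumptions made for illustration.

```python
# Toy template-matching classifier. The 3x3 glyph bitmaps are invented for
# illustration and are not taken from any real OCR font.
TEMPLATES = {
    "I": ("010",
          "010",
          "010"),
    "L": ("100",
          "100",
          "111"),
    "T": ("111",
          "010",
          "010"),
}


def match_score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the label of the template that agrees with the glyph on most pixels."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))


# An "L" with one pixel lost in scanning still matches the "L" template best.
noisy_l = ("100",
           "100",
           "110")
print(recognize(noisy_l))  # prints "L"
```

Because every font needs its own prototype library, this approach scales poorly to varied typefaces, which is consistent with the restricted fonts and character sets of the early machines.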
IV. EXISTING SYSTEM
Today there is a growing demand among users to convert printed documents into digital
form in order to keep their data secure. Manually transcribing the text in an image
without software is time-consuming. The basic OCR system was therefore invented to
convert the data on paper and in images into computer-processable documents.
V. PROPOSED METHOD
OCR recognizes the text in scanned documents and images and converts it into an
accessible electronic document. We demonstrate this with a real-time example using a
webcam, so that the characters in the captured images can be recognized. OCR also
allows us to search the document for the words found within it.
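The paper does not list its implementation, so the following is only a sketch of how such a webcam pipeline might look with the tools named in the keywords (OpenCV and Tesseract, here through the `pytesseract` wrapper). The function names, camera index 0, and the Otsu binarization step are assumptions, not the authors' code.

```python
import re


def capture_frame(camera_index=0):
    """Grab one frame from a webcam with OpenCV (camera index 0 is an assumption)."""
    import cv2  # imported lazily so find_word() below works without OpenCV

    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read a frame from the webcam")
    return frame


def frame_to_text(frame):
    """Convert a BGR frame to grayscale, binarize it, and run Tesseract on it."""
    import cv2
    import pytesseract  # the Tesseract engine itself must be installed separately

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary)


def find_word(text, word):
    """Return the 1-based line numbers on which `word` occurs (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)]


# Usage (needs a webcam, OpenCV, pytesseract, and the Tesseract engine):
#   text = frame_to_text(capture_frame())
#   print(find_word(text, "invoice"))

# The search step can be tried on its own with text standing in for OCR output.
sample = "Invoice No. 42\nTotal due\ninvoice date"
print(find_word(sample, "invoice"))  # prints [1, 3]
```

Once the recognized text is available as a string, searching is an ordinary text operation, which is what makes a scanned document searchable.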
VI. ARCHITECTURE
Optical Character Recognition, or OCR, is
a technology that enables us to convert
different types of documents, such as
scanned paper documents, PDF files or
images captured by a digital camera or
phone into editable and searchable data.
Fig 1: OCR Process
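The figure itself is not reproduced here. As an illustration of the text preparation and segmentation stages that the abstract says influence accuracy, the sketch below binarizes a tiny grayscale image and splits it into character spans using a vertical projection profile. The fixed threshold of 128, the function names, and the sample pixel values are assumptions, and the authors' own pipeline may differ.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold of 128 is an assumption; dark pixels count as ink.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def segment_columns(binary):
    """Split a binarized text line into (start, end) column spans, end exclusive.

    A column that contains no ink is treated as a gap between characters.
    """
    width = len(binary[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            spans.append((start, x))       # a character ends at the gap
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans


# Two dark strokes on a white background, separated by blank columns.
line = [
    [255,   0, 255, 255,   0,   0, 255],
    [255,   0, 255, 255,   0, 255, 255],
    [255,   0, 255, 255,   0,   0, 255],
]
print(segment_columns(binarize(line)))  # prints [(1, 2), (4, 6)]
```

Each span can then be cropped and passed to a recognizer. Touching or broken characters defeat this simple gap rule, which is one reason segmentation quality affects OCR accuracy.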
VII. CONCLUSION
Optical character recognition is a necessary first step for all applications that take
an image as input. Recognition of printed text gives good results: almost all the data
read was correct. Only a few recognized fields contained mistakes, and those fields were
unreadable or had been damaged during the scanning process. Our evaluation shows that
LBP with SVM gives the best results, with an accuracy of 96.5%. Our survey has shown
that data manually rewritten from the form by an experienced user contains fewer
mistakes than the data recognized by the OCR system.
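The paper names LBP (local binary patterns) with an SVM but does not describe the feature extraction. The sketch below shows only the basic 3x3 LBP operator and the histogram that would typically be fed to a classifier; SVM training is omitted. The neighbour ordering, the `>=` comparison, and the function names are assumed conventions, not the authors' implementation.

```python
def lbp_code(image, y, x):
    """Return the 8-bit LBP code of the pixel at (y, x) from its 3x3 neighbourhood.

    Neighbours are visited clockwise from the top-left; that ordering and the
    `>=` comparison are conventions assumed here.
    """
    center = image[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dy, dx in offsets:
        # Append one bit per neighbour: 1 if it is at least as bright as the center.
        code = (code << 1) | (1 if image[y + dy][x + dx] >= center else 0)
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels, used as a feature vector."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            hist[lbp_code(image, y, x)] += 1
    return hist


# Bright top row, dark elsewhere: only the three top neighbours set their bits.
patch = [
    [9, 9, 9],
    [1, 5, 1],
    [1, 1, 1],
]
print(format(lbp_code(patch, 1, 1), "08b"))  # prints 11100000
```

In a full system, the 256-bin histogram of each character image would serve as the input vector for the SVM.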