OPTICAL CHARACTER RECOGNITION
Article · May 2022
C K Gomathy
Sri Chandrasekharendra Saraswathi Viswa Mahavidyalaya University
All content following this page was uploaded by C K Gomathy on 16 May 2022.
OPTICAL CHARACTER RECOGNITION
Dr. V. Geetha1, Ch V V Sudheer2, A V Saikumar3, Dr C K Gomathy4
1, 4 Assistant Professor, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
2, 3 Student, Dept. of CSE, SCSVMV (Deemed to be University), Kancheepuram, TamilNadu, India
ABSTRACT:
This project is about optical character recognition (OCR), a technology that
recognizes text within a digital image. An OCR system scans printed text optically,
character by character, converts the characters into machine-readable code, stores
the text in system memory, and converts it into a document. OCR is used to transform
printed text into editable text and is a helpful and popular approach in a variety of
applications, most commonly for recognizing text in scanned documents and images.
Text preprocessing and segmentation techniques can influence OCR accuracy. Because
text in an image varies in size, style, and orientation and may sit on an intricate
background, retrieving it can be challenging. OCR can be used to convert a physical
paper document or an image into an accessible electronic version with text.
Keywords: OCR, Tesseract, OpenCV, Python.
I. INTRODUCTION
Text recognition is one of the most prominent applications of computer vision
and is used by several multinational technology companies such as Apple and
Google. Apple recently announced the "Live Text" feature in iOS 15. This
functionality is similar to how Google Lens works on Android phones and in the
Google Search and Photos apps on iOS. The basic procedure is that a person
points the camera at an image or at text on a signboard or a sheet of paper,
and the feature recognizes the text present in the image, be it a contact
number or an email ID. These features are built on a technology called OCR
(Optical Character Recognition). For decades, OCR was the sole means to
transform printouts into computer-processable data, and it is still the
preferred method for turning paper invoices into extractable data that can be
linked into financial systems, for example.
JOURNAL OF ENGINEERING, COMPUTING & ARCHITECTURE
Volume 12, Issue 3, MARCH - 2022
ISSN NO:1934-7197
http://www.journaleca.com/ Page No: 211
However, electronic document submission
now provides organizations with a
significantly improved approach to areas
like invoicing and sales processing,
lowering costs and allowing employees to
focus on higher-value activities.
II. PROBLEM STATEMENT
The problem is for software systems to recognize characters when information
is scanned from paper documents. A large number of newspapers and books on
different subjects exist only in printed form. Whenever such documents are
scanned, they are stored in the computer system as images such as JPEG or GIF
files. The text in these images cannot be edited or searched by the user, so
reusing the information means reading the documents and searching their
contents line by line and word by word, which is very difficult. There is
therefore a huge demand for storing the information available in paper
documents on a computer storage disk and later editing or reusing it through
a search process.
III. LITERATURE SURVEY
The first commercialized OCR system of the first generation was the IBM 1418,
which was designed to read a special IBM font, 407. The recognition method
was template matching, which compares the character image with a library of
prototype images for each character of each font. Second-generation machines
were able to recognize regular machine-printed and hand-printed characters,
although the character set was limited to numerals and a few letters and
symbols. Such machines appeared from the mid-1960s to the early 1970s. For
the third generation of OCR systems, the challenges were documents of poor
quality and large printed and handwritten character sets. Low cost and high
performance were also important objectives. Commercial OCR systems with such
capabilities appeared during the decade from 1975 to 1985. The fourth
generation can be characterized by the OCR of complex documents that intermix
text, graphics, tables, and mathematical symbols, as well as unconstrained
handwritten characters, color documents, and low-quality noisy documents.
Among the commercial products, postal address readers and reading aids for
the blind are available on the market.
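The template matching used by the first-generation machines can be illustrated with a minimal sketch. It assumes tiny 5x5 bitmaps and a simple pixel-agreement score; the prototype library `TEMPLATES` and the function names are hypothetical and are not taken from the IBM 1418 or from this project.

```python
# Hypothetical 5x5 prototype bitmaps: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ["..#..",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def match_score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            same += (g == t)
    return same / total


def recognize(glyph, templates=TEMPLATES):
    """Return the label of the best-matching template and its score."""
    best = max(templates, key=lambda label: match_score(glyph, templates[label]))
    return best, match_score(glyph, templates[best])


# A 'T' with one stray ink pixel in the fourth row.
noisy_t = ["#####",
           "..#..",
           "..#..",
           ".##..",
           "..#.."]
label, score = recognize(noisy_t)
```

Because every character of every font needs its own prototype, this approach only works well for a fixed, known typeface, which is consistent with the early machines being tied to a special font.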
IV. EXISTING SYSTEM
Today there is a growing demand among users to convert printed documents into
electronic form to keep their data secure. Copying the text in an image by
hand, without software, is a time-consuming process. The basic OCR system was
therefore invented to convert the data available on paper and in images into
computer-processable documents.
V. PROPOSED METHOD
The proposed system uses OCR to recognize the text in scanned documents and
images and convert it into an accessible electronic document. We demonstrate
it with a real-time example using a webcam, so that the characters in the
captured images can be recognized. OCR also allows us to search the text for
words found within the document.
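Searching recognized text by word can be sketched as follows. This is only an illustration that assumes the OCR step has already produced one string per page; the sample pages and the helper names `build_word_index` and `search` are hypothetical and not part of the proposed system.

```python
import re
from collections import defaultdict


def build_word_index(pages):
    """Map each lowercase word to the set of page numbers it occurs on."""
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical OCR output, one string per scanned page.
recognized_pages = [
    "Invoice number 1042 issued to Acme Traders",
    "Payment due within 30 days of the invoice date",
    "Thank you for your business",
]
index = build_word_index(recognized_pages)
```

With this index, `search(index, "invoice date")` returns only the pages on which both words appear, which is the kind of lookup that a scanned image alone cannot support.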
VI. ARCHITECTURE
Optical Character Recognition, or OCR, is
a technology that enables us to convert
different types of documents, such as
scanned paper documents, PDF files or
images captured by a digital camera or
phone into editable and searchable data.
Fig 1: OCR Process
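The figure is not reproduced in this text, so the following is only a sketch of two stages that commonly precede recognition in an OCR process, binarization and character segmentation, under the assumption that the input is a single grayscale text line stored as rows of 0-255 integers. The fixed threshold, the sample data, and the function names are illustrative assumptions, not the architecture used in this project.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 ints) into 1 = ink, 0 = background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def segment_columns(binary):
    """Split a binarized text line into (start, end) column ranges.

    Each range covers one run of adjacent columns that contain ink;
    `end` is exclusive.
    """
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                      # a new character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends the character
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments


# Hypothetical 3x8 grayscale strip holding two dark shapes on a white page.
gray_line = [
    [255,  10,  10, 255, 255,  20, 255, 255],
    [255,  10, 255, 255, 255,  20,  20, 255],
    [255,  10,  10, 255, 255,  20, 255, 255],
]
binary_line = binarize(gray_line)
character_boxes = segment_columns(binary_line)
```

Each range in `character_boxes` could then be cropped and passed to a recognizer. Splitting on blank columns fails when characters touch, which is one way segmentation quality affects accuracy.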
VII. CONCLUSION
Optical character recognition is a necessary first step for all applications
that take an image as input. Recognition of printed text gives good results:
almost all the data read was correct. Only a few recognized fields contained
mistakes, and those fields had been unreadable or damaged during the scanning
process. Our evaluation shows that LBP with SVM gives the best results, with
an accuracy of 96.5%. Our survey has also shown that data manually rewritten
from the form by an experienced user contains fewer mistakes than the data
recognized by the OCR system.
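The paper does not describe how its LBP features were computed, so the following is only a sketch of the common basic definition of a local binary pattern (LBP) over the 8 neighbours of a pixel, with a histogram of codes as the feature vector. The neighbour ordering, the sample patch, and the function names are assumptions, and the SVM classifier that would consume the histogram is omitted.

```python
def lbp_code(image, y, x):
    """8-neighbour local binary pattern code of the pixel at (y, x).

    Each neighbour contributes one bit: 1 if it is at least as bright as
    the centre pixel, else 0. Neighbours are visited clockwise starting
    from the top-left one, which supplies the lowest bit.
    """
    center = image[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if image[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, y, x)] += 1
    return histogram


# Hypothetical 3x3 grayscale patch; only the top and right neighbours
# are brighter than the centre value of 100.
patch = [
    [90, 200,  90],
    [90, 100, 200],
    [90,  90,  90],
]
code = lbp_code(patch, 1, 1)
features = lbp_histogram(patch)
```

In a setup of this kind, the histogram of each character image would serve as the feature vector given to the classifier.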