2. When we want to learn about
something, four questions
first come to mind:
•What is it?
•What is it used for?
•How does it work?
•Why is it needed?
3. In the same way,
when we want to learn
about OCR, the same
questions come to mind:
•What is OCR?
•Where is OCR commonly used?
•How does OCR work?
•Why is OCR needed?
4. Question 1: What is OCR?
• OCR stands for "Optical Character Recognition." OCR is a technology that
recognizes text within a digital image. It is commonly used to recognize text
in scanned documents, but it serves many other purposes as well.
• OCR (optical character recognition) is the use of technology to distinguish
printed or handwritten text characters inside digital images of physical
documents, such as a scanned paper document. The basic process of OCR
involves examining the text of a document and translating the characters
into code that can be used for data processing. OCR is sometimes also
referred to as text recognition.
• OCR systems are made up of a combination of hardware and software that is
used to convert physical documents into machine-readable text. Hardware,
such as an optical scanner or specialized circuit board, is used to copy or read
text, while software typically handles the advanced processing. Software can
also take advantage of artificial intelligence (AI) to implement more
advanced methods of intelligent character recognition (ICR), like identifying
languages or styles of handwriting.
6. Question 2: Where is OCR commonly used?
•Probably the best-known use case for
OCR is converting printed paper documents
into machine-readable text documents. Once
a scanned paper document has gone through
OCR processing, the text of the document
can be edited with word processors like
Microsoft Word or Google Docs.
7. Using OCR to convert a printed document to computer text
8. Question 3: How does OCR work?
• OCR is the process of turning a picture of text into text itself—in
other words, producing something like a TXT or DOC file from a
scanned JPG of a printed or handwritten page.
• OCR software programs all work a little differently, depending on
the developer and their intended purpose, but they still follow
several common principles.
• The software typically has a preprocessing phase that attempts to
make the text in the document clearer and easier to read. No
scanner is perfect, so even with most modern commercial scanners
there are bound to be imperfections in the scanned image.
Preprocessing addresses these by cleaning up the image and
isolating the characters from everything else, making sure the
lines of text are properly aligned and the pixels are smoothed out.
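As a rough illustration of the clean-up step, the Python sketch below turns a grayscale image into ink/background pixels with a fixed threshold and then removes isolated specks. The function names, the threshold of 128, and the list-of-lists image format are assumptions made for this example only; actual OCR programs generally use more elaborate clean-up and alignment methods that are not shown here.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map each pixel to 1 (ink) if darker than the threshold, else 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def remove_isolated_pixels(binary: Image) -> Image:
    """Clear ink pixels that have no ink pixel among their 8 neighbours."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]  # copy so the input is not modified
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                cleaned[y][x] = 0  # treat a lone dark pixel as scanner noise
    return cleaned


# Hypothetical 3x4 scan: a vertical stroke plus one dark speck on the right.
scan = [
    [250, 30, 240, 245],
    [248, 25, 242, 20],
    [251, 28, 239, 247],
]
clean = remove_isolated_pixels(binarize(scan))
```

In this toy scan the vertical stroke survives because its pixels touch each other, while the lone dark pixel in the last column is discarded as noise.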
9. Continued…
• The next common step is for the software to isolate each individual
character, distinguishing the pixels that form characters from the spaces
between them. This allows the program to process each individual
character, as well as recognize that a grouping of characters makes up a
word.
• The next stage is the trickiest, and commonly the one that separates
different OCR programs. Once the OCR program knows what constitutes a
character that it must recognize, it's time to figure out what the
character is, so it can assign the corresponding metadata to it. Simple OCR
software checks the characters against common fonts from a library to
see if they match, so that the data can be assigned. However, for text that
doesn't match any recognizable fonts in a library, such as uncommon
fonts or handwritten text, more sophisticated techniques are required.
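The two steps above, isolating characters and then checking them against a font library, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular OCR product works: the 3x5 `TEMPLATES` "library", the `word_gap` of 3 columns, the 0.8 match score, and all function names are hypothetical. Characters are split wherever a column contains no ink, and each one is assigned the template that agrees with it on the most pixels.

```python
from typing import Dict, List, Optional, Tuple

Glyph = List[str]  # rows of '#' (ink) and '.' (background)

# Hypothetical 3x5 font library; a real one would cover many fonts and sizes.
TEMPLATES: Dict[str, Glyph] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def segment_columns(line: Glyph) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(line[0]) if line else 0
    has_ink = [any(row[x] == "#" for row in line) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which two equally sized glyphs agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0  # different sizes: treated as no match in this sketch
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total if total else 0.0


def recognize(glyph: Glyph, min_score: float = 0.8) -> Optional[str]:
    """Return the best-matching template character, or None if none is close."""
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = similarity(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= min_score else None


def read_line(line: Glyph, word_gap: int = 3) -> str:
    """Segment a line image, recognize each character, add spaces at wide gaps."""
    text, previous_end = "", None
    for start, end in segment_columns(line):
        if previous_end is not None and start - previous_end >= word_gap:
            text += " "                    # a wide gap separates two words
        text += recognize([row[start:end] for row in line]) or "?"
        previous_end = end
    return text


LINE = [
    "#...###....###",
    "#....#......#.",
    "#....#......#.",
    "#....#......#.",
    "###.###.....#.",
]
print(read_line(LINE))  # LI T
```

A character that matches no template well enough comes back as "?", which is where the more sophisticated techniques mentioned above would take over.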
11. Question 4: Why is OCR needed?
• OCR makes ordinary scanned documents text-searchable, so that their content can be
searched. ... It is important that the OCR software integrated with your document
management system (DMS) is reliable, as inconsistent search results would defeat the very
purpose of installing a DMS in your organization.