Biniam Asnake
Girma Aweke
Presentation Outline
• Introduction
• Global Research Works
1) English OCR
2) Arabic OCR
3) Indian (Devanagari) OCR
• Local Research Works
1) Worku (1997)
2) Ermias (1998)
3) Dereje (1999)
4) Million (2000)
5) Nigussie (2000)
6) Yaregal (2002)
7) Million and Jawahar (2007)
8) Teshome (2009)
9) Abay (2010)
10) Yaregal and Bigun (2011)
• Conclusion
• Recommendation
Definition
“Optical character recognition (OCR) systems
take scanned images of paper documents as
input, and automatically convert them into
digital format for computer-aided data
processing”.
Architecture of an OCR system
[Figure: OCR system architecture. Labels: Binarization, Filling, Thinning, Normalization, Skew Correction; recognition by (1) template matching and correlation or (2) feature-based methods; (I) grouping and (II) error detection and correction.]
Benefits of using OCR
• Improve accuracy of data entry
• Increase efficiency in data storage, retrieval
and processing
• Identify the specific form within a particular
application
• Read texts and produce a synthesized voice
translation for visually impaired users.
Applications of OCR
• Data entry
• Text entry
• Process automation
• Other applications
– Aid for visually impaired people
– Automatic plate number readers
– Automatic cartography
– Signature verification and identification
Issues in Developing OCR
• Noise in Input Image
• Skew in Input Image
• Images embedded with Text in Input Image
Introduction to Global OCR system
• Modern OCR technology began in 1951 with M. Sheppard's invention of GISMO, a robot reader-writer.
• Today, OCR systems are less expensive, faster, and more reliable.
• Current research areas of OCR:
– Handwriting recognition
– Form reading
• However, research is particularly intensive on the reliable recognition of handwritten cursive script.
--- Introduction cont’d
• Hundreds of OCR systems have been developed since the 1950s
and many are commercially available today.
• Commercial OCR systems fall into two groups:
1. Task-specific readers handle only specific document types, such as bank checks, letter mail, or credit card slips.
2. General-purpose page readers are designed to handle a broader range of documents, such as business letters, technical writings and newspapers.
• Task-specific readers: address readers, form readers, check readers, bill processing systems, airline ticket readers, passport readers
• General-purpose page readers
Handwritten English Character Recognition Using Neural
Network
By Anita Pal & Dayashankar Singh (2010)
Identified Problem
Difficulty of recognizing handwritten characters, which vary from one person to another.
Methods, Techniques and Algorithms
• Scan character: the sample handwritten character is acquired by scanning and converted into 1024 (32 x 32) binary pixels.
• Skeletonization: applied to the binary pixel image to remove extra pixels and reduce broad strokes to thin lines.
• Normalization: a 30 x 30 standard size is used.
• Recognition algorithms:
– Feature extraction: boundary detection feature extraction technique
– Classification of characters: a neural network is used.
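The size normalization step can be illustrated with a small sketch. The slides do not say which resampling method the authors used; nearest-neighbour sampling and the function name `normalize_size` are assumptions made here for illustration.

```python
def normalize_size(image, height, width):
    """Rescale a binary image (list of rows of 0/1) with nearest-neighbour sampling.

    Assumption: the original work only states the target size (30 x 30);
    the sampling rule used here is illustrative.
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


# A 32 x 32 scan reduced to the 30 x 30 standard size.
scan = [[1 if 8 <= r < 24 and 8 <= c < 24 else 0 for c in range(32)] for r in range(32)]
standard = normalize_size(scan, 30, 30)
```

Every character then reaches the classifier on the same grid, whatever size it was written at.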
… Handwritten English Character Recognition Using Neural Network
By Anita Pal & Dayashankar Singh (2010)
Performance Registered
• The system was tested using Fourier descriptors with the back-propagation technique and achieved a recognition accuracy of 94%.
• A sample of 250 was used for training and testing.
Further Research Directions
• Improve the recognition accuracy by using new feature extraction techniques or algorithms.
Recognition of on-line Arabic handwritten characters
using structural features
Ahmad T. Al-Taani and Saeed Al-Haj (2010)
Identified Problem
Instances of the same character written by different persons are not identical; they can vary in both size and shape.
Scope and limitation of the study
The proposed system works only on isolated Arabic letters.
Methods, Techniques and Algorithms
• Digitization: hand-held and digital tablet used as an input device
• Feature extraction: structural features are extracted from the character.
• Recognition: decision trees are used to classify the characters based on the features extracted from the input character.
… Recognition of on-line Arabic handwritten characters using structural features
Ahmad T. Al-Taani and Saeed Al-Haj (2010)
Performance Registered
According to the experiment, the following performance was achieved:
• A recognition rate of about 75.3% for all letters (including letters containing sharp edges)
• An average performance of up to 85.3% when letters with sharp edges are excluded
Further Research Directions
• Letters containing sharp edges are not recognized by the current system, so they remain open for further research.
Segmentation of Printed Text in Devanagari Script and Gurmukhi
Script
By Vijay Kumar and Pankaj K. Sengar (2010)
Identified Problem
Segmentation errors arise from touching characters, which the classifier cannot properly handle.
Methods, Techniques and Algorithms
• Image categories and preprocessing: the researchers categorized the scanned input into binary-level, pseudo-color and true-color images in order to standardize it.
• Segmentation: stage-by-stage segmentation methods are used.
… Segmentation of Printed Text in Devanagari Script and Gurmukhi Script
By Vijay Kumar and Pankaj K. Sengar (2010)
Performance Registered
The proposed system achieved the following performance at different levels of text:
Level            Devanagari script   Gurmukhi script
Line             100%                100%
Word             ~100%               99%
Character        99%
Top character    97%                 ------
Further Research Directions
Introduction to local research
Our review of local research from various sources shows that much of the effort has gone into applying OCR to the Amharic language. These studies focus mainly on:
• the recognition of machine-printed, typewritten and handwritten Amharic documents, and
• the recognition of computer printouts on real-life documents such as books, magazines and newspapers.
The main objective of this presentation is therefore to review those works against specific criteria.
Application of OCR techniques to Amharic text
Worku Alemu (1997)
Identified Problem
A huge amount of printed information is available in Amharic. However, some of it is irreplaceable, and access and modification are difficult.
Methods, Techniques and Algorithms
Preprocessing tasks
• Digitization: a flatbed scanner (HP ScanJet IIc) at the standard resolution of 300 dpi is used to digitize the printed text (document image).
• Segmentation: the stage-by-stage segmentation algorithm is used to segment lines and characters.
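The slides do not spell out the stage-by-stage segmentation algorithm. One common reading, assumed here, is a two-stage projection-profile split: first find text lines from row sums, then find characters inside each line from column sums. The helper names `runs` and `segment` are illustrative.

```python
def runs(profile):
    """Return (start, end) index pairs for consecutive non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image):
    """Stage 1: split into text lines. Stage 2: split each line into characters.

    image is a list of rows of 0/1 (1 = ink). Returns, per line, a list of
    (top, bottom, left, right) boxes with exclusive bottom/right bounds.
    """
    row_profile = [sum(row) for row in image]
    result = []
    for top, bottom in runs(row_profile):
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        result.append([(top, bottom, left, right) for left, right in runs(col_profile)])
    return result


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
]
boxes = segment(page)
```

A split of this kind fails on touching characters, because they leave no empty column between them.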
… Application of OCR techniques to Amharic text
Worku Alemu (1997)
Methods, Techniques and Algorithms
Recognition algorithms: a pattern classification task that maps each character image onto its symbolic identity. Two algorithms are used for character recognition:
• polygonal approximation and relaxation
• topological features: a tree classification scheme built from the topological features of a character
Turbo C++ for Windows was used to code the selected algorithms.
Performance Registered
Out of four test cases, he selected one main test case (a laser printout of Amharic text in normal typestyle, WashRa font, 12-point size) and achieved a good recognition accuracy of 98.87%.
… Application of OCR techniques to Amharic text
Worku Alemu (1997)
Further Research Directions
• Application of pre-processing and post-processing techniques to detect and correct errors
• Segmentation of picture and text regions
• Recognition of text in forms, tables and pictures
• Recognition of characters printed in any color on paper of any color
• Recognition of handwritten Amharic text
• Recognition of formatted Amharic text
Recognition of formatted Amharic text using OCR techniques
Ermias Abebe (1998)
Identified Problem
The researcher improves the algorithms adopted by Worku (developed for a single font at a fixed size) so that they respond to font size and typestyle. The aim was to enable the algorithms to recognize characters printed in different sizes and formats.
Scope and Limitation of the study
• The researcher's attempt was limited to the problems of size, underline and italics.
• The method was tested with only 231 basic Amharic characters.
• Only the thinning, normalization and underline removal techniques of segmentation are used.
Methods, Techniques and Algorithms
1. Preprocessing tasks
• Digitization: a flatbed scanner at 300 dots per inch resolution is used.
• Thinning algorithm (suggested by Zhang and Suen): used to convert the digital image into a unit-width image (skeleton).
• Underline detection and removal: removes underlines from segmented text lines.
• Normalization: the process of making all characters in the image the same size.
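The thinning step can be sketched in plain Python. This follows the published Zhang-Suen two-sub-iteration rules; the function name and the list-of-lists image representation are assumptions, not details from the thesis.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1, 1 = ink) to a skeleton."""
    width = len(image[0])
    # Pad with a one-pixel background border so every ink pixel has 8 neighbours.
    img = [[0] * (width + 2)] + [[0] + list(r) + [0] for r in image] + [[0] * (width + 2)]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if not img[r][c]:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Pixels are cleared together after the scan, as the algorithm requires.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]


# A three-pixel-thick horizontal stroke on a 5 x 11 canvas.
bar = [[0] * 11] + [[0] + [1] * 9 + [0] for _ in range(3)] + [[0] * 11]
skeleton = zhang_suen_thin(bar)
```

The stroke collapses to a one-pixel line, which is the unit-width form the later feature extraction works on.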
… Recognition of formatted Amharic text using OCR techniques
Ermias Abebe (1998)
Methods, Techniques and Algorithms
Recognition algorithms
The researcher used two recognition algorithms to work with formatted text, selected because of their accessibility and their supposed relevance to the problem at hand:
1. A generalized character recognition algorithm: a graphical approach
2. Symbol recognition without prior segmentation
Performance Registered
Adding preprocessing algorithms to Worku's work yields better performance.
Further Research Directions
• Skew detection and correction
• Recognition of typewritten texts and texts written on poor-quality paper
• Other algorithms that improve the segmentation and recognition of characters irrespective of size and style should be implemented and tested on Amharic script.
Optical Character Recognition of Typewritten Amharic Character
Dereje Teferi (1999)
Identified Problem
The conventional way of converting text into electronic format is typing on a keyboard, which is tedious and, given the volume of documents, impossible. Moreover, typing an Amharic character on a computer needs two keystrokes on average, and sometimes worse. To tackle this problem, an OCR application capable of recognizing poor-quality typewritten Amharic characters is needed.
Scope and Limitation of the study
• The study focuses on developing a segmentation algorithm that segments Amharic characters from a document and forwards the result for recognition.
• It was confined to the 231 characters of a mechanical typewriter.
Methods, Techniques and Algorithms
Preprocessing stages
• Feature extraction/detection: the features used to identify and recognize each character are extracted from contour/line analysis.
… Optical Character Recognition of Typewritten Amharic Character
Dereje Teferi (1999)
Methods, Techniques and Algorithms
• Segmentation
– Recursive segmentation: an approach that merges segmentation and recognition recursively.
– Stage-by-stage segmentation: the researcher applied a modified stage-by-stage segmentation algorithm. A threshold value for the width of the characters is set through experiment.
• Image restoration: the process by which a degraded image is repaired so that better recognition performance is achieved.
– Mathematical morphology: used by the researcher to remove salt-and-pepper noise.
– Binary morphological filter: used to remove subtractive and additive noise.
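A minimal sketch of how binary morphology removes such noise, assuming a 3 x 3 square structuring element (the thesis's actual element and filter sequence are not given in the slides). Opening removes isolated ink specks (additive noise); closing fills small holes inside strokes (subtractive noise).

```python
def _morph(image, keep):
    """Apply a 3x3 neighbourhood test to every pixel; out-of-bounds counts as background."""
    rows, cols = len(image), len(image[0])

    def window(r, c):
        return [image[i][j] if 0 <= i < rows and 0 <= j < cols else 0
                for i in (r - 1, r, r + 1) for j in (c - 1, c, c + 1)]

    return [[1 if keep(window(r, c)) else 0 for c in range(cols)] for r in range(rows)]


def erode(image):
    return _morph(image, all)   # ink survives only if the whole window is ink


def dilate(image):
    return _morph(image, any)   # ink spreads to any pixel touching ink


def opening(image):
    return dilate(erode(image))  # removes specks smaller than the window


def closing(image):
    return erode(dilate(image))  # fills holes smaller than the window


# 7 x 7 test images: a solid block plus a speck, and a block with a pinhole.
speckled = [[1 if (1 <= r <= 3 and 1 <= c <= 3) or (r, c) == (5, 5) else 0
             for c in range(7)] for r in range(7)]
holed = [[1 if 1 <= r <= 5 and 1 <= c <= 5 and (r, c) != (3, 3) else 0
          for c in range(7)] for r in range(7)]
cleaned = opening(speckled)
filled = closing(holed)
```

A 3 x 3 opening also erases strokes thinner than three pixels, so in practice the element has to be smaller than the stroke width of the scanned text.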
Performance Registered
The researcher reported that the OCR system achieved a recognition accuracy of 53.47% for documents written with a mechanical typewriter.
… Optical Character Recognition of Typewritten Amharic Character
Dereje Teferi (1999)
Further Research Directions
• Integrate a normalization technique into the present Amharic OCR development so that the typewritten recognition system becomes size independent.
• Skew detection and correction algorithms should be developed.
• A form detection and removal algorithm should be developed to detect and extract text from tables, forms, etc.
• Algorithms that recognize text written in any color on any background should be developed.
• Recognition algorithms that are not very sensitive to the features of the characters should be developed.
• Algorithms for detecting formats such as indentation and bulleting, and restoring them after recognition, should be developed.
Handwritten Amharic text Recognition applied to The processing of Bank checks
By Nigussie Tadesse (2000)
This was the first reported attempt to apply handwritten Amharic text recognition to the processing of bank checks.
Identified Problem
The majority of CBE's clients are Ethiopians, so most requests are made in Amharic, and a large share of checks is filled in by hand using Amharic characters. The bank used a semi-automated process in which a check filled in by someone else is keyed into the database using a keyboard. This activity is very slow, costly and error prone.
Scope and limitation of the study
• The research was limited to the recognition of handwritten legal amounts.
• It does not cover the cents (fraction); it is confined to the birr part of the legal amount.
…..
Handwritten Amharic text Recognition applied to The processing of Bank checks
By Nigussie Tadesse (2000)
Methods, Techniques and Algorithms
Preprocessing tasks
The researcher used the following algorithms:
1. Underline removal: underlines are removed using stage-by-stage segmentation (adopted from Ermias (1998)) and connected component analysis.
2. Slant normalization: the chain code method was implemented to normalize the slant.
3. Recognition
• Size normalization: each character image is normalized to a fixed size.
• Training: the EasyNN neural network development tool was used to construct and experiment with different neural network architectures. It helps to create, control, train, validate and query multilayer networks with the back-propagation algorithm.
… Handwritten Amharic text Recognition applied to The processing of Bank checks
By Nigussie Tadesse (2000)
Performance Registered
The classification accuracy of the networks was tested using 38 characters from the test data set. The first neural network, with 256 input, 7 hidden and 8 output nodes and trained with 135 sample characters, correctly classified 2 of the 38 test characters. The second network, with 256 input, 20 hidden and 8 output nodes and trained with 498 sample characters, correctly classified 16 out of 38.
Case   Network architecture (MLNN)   Training samples   Test samples   Correctly classified
1      256-7-8                       135                38             2
2      256-20-8                      498                38             16
Future Research Directions
• Apply further preprocessing so that the intra-class variability between characters is minimized.
• Train a neural network with sufficient data and consider other feature sets for training.
• This was the first attempt to apply OCR to handwritten Amharic bank checks, and the research did not validate all check-reading activities, so the area is open for future research.
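The counts reported for the two networks can be turned into accuracy percentages. A minimal sketch (the function name `accuracy` is ours; the counts are the ones given for the 38-character test set):

```python
def accuracy(correct, total):
    """Classification accuracy as a percentage."""
    return 100.0 * correct / total


# (correctly classified, test characters) for each network architecture.
results = {"256-7-8": (2, 38), "256-20-8": (16, 38)}

for architecture, (correct, total) in results.items():
    print(f"{architecture}: {accuracy(correct, total):.2f}%")
# prints "256-7-8: 5.26%" and "256-20-8: 42.11%"
```

Even the larger network stays below 50%, which is consistent with the recommendation to train with more data.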
A Generalized Approach to Optical Character
Recognition (OCR) of Amharic texts
Million Meshesha (2000)
Identified Problem
To generalize the previously adopted recognition algorithm so that it is insensitive to different font types
Scope and Limitations
• Only three commonly used font types are covered: Agafari, WashRa and Visual Geez
… A Generalized Approach to Optical Character Recognition (OCR) of Amharic texts (Million Meshesha)
Methods, Techniques and Algorithms
Digitization: flatbed HP ScanJet scanner at 300 dpi resolution
Binarization: threshold value of 112 intensity level
Thinning: hybrid of parallel and Zhang-Suen thinning algorithms
Segmentation: step-by-step segmentation with some modification
Feature Extraction and Detection:
1. Topological features were extracted
2. A database was developed using a binary tree
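Fixed-threshold binarization at intensity level 112 can be sketched as follows. Two assumptions are made here that the slides do not state: the scan is 8-bit greyscale (0 = black, 255 = white), and pixels darker than the threshold are treated as ink.

```python
THRESHOLD = 112  # intensity level reported for this work


def binarize(gray, threshold=THRESHOLD):
    """Map a greyscale image (rows of 0..255) to 0/1, with 1 marking ink.

    Assumption: ink is darker than paper, so values below the threshold become 1.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


scan = [[250, 240, 30, 245],
        [248, 25, 20, 111],
        [112, 243, 35, 251]]
ink = binarize(scan)
```

A single global value like this works when illumination and paper tone are uniform; it is the simplest of the binarization choices reviewed in this deck.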
… A Generalized Approach to Optical Character Recognition (OCR) of Amharic texts (Million Meshesha)
Performance Registered
• 49.38% for WashRa
• 26.04% for Agafari_Addis Zemen
• 15.75% for Visual Geez
Future Research Directions
• An algorithm for form detection and removal
• Recognition of characters written in any color on paper of any color
• Development of post-processing techniques such as a spell checker, thesaurus, grammar, etc.
• Normalization of input image patterns
• Standardization of font types and their representation
• A mechanism should be designed to flag unrecognized and suspicious characters
Optical Character Recognition of Amharic Text: An
Integrated Approach
Yaregal Assabie (2002)
Identified Problem
To come up with a versatile algorithm that is independent of the font size and other quantitative parameters of Amharic characters
Scope and Limitations
There was no previously well-formed algorithm
… Optical Character Recognition of Amharic Text: An Integrated Approach (Yaregal Assabie)
Methods, Techniques and Algorithms
Digitization: Scanjet Pro scanner
Segmentation: step-by-step segmentation with some modification
Training Patterns with Neural Network: BrainMaker, with NetMaker used to convert text files
Feature Extraction and Detection: an improved primitive extraction algorithm was developed
… Optical Character Recognition of Amharic Text: An Integrated Approach (Yaregal Assabie)
Performance Registered
Font size   Font included in the training set   Font not included
8           65.02%                              62.87%
12          74.68%                              81.07%
14          73.18%                              70.04%
Future Research Directions
• Detailed image preprocessing techniques need to be developed.
• An algorithm to detect forms, graphs and tables
• Recognition of documents written in any color on any background color
• Development of algorithms for primitive extraction, identification, and relationship/connection handling
• Character image databases representing different font types, styles and sizes should be developed for research purposes.
• Post-processing techniques such as spell checking, grammar and semantic analysis need to be incorporated.
• Standardization of the representation of Ethiopic characters
Optical Character Recognition of Amharic Documents
Million Meshesha and C. V. Jawahar (2007)
Identified Problem
Challenges in building an OCR for African scripts:
• Degradation of documents
• Printing variations
• Large number of characters in the script
• Visual similarity of most characters in the script
• Language-related issues
… Optical Character Recognition of Amharic Documents (Million Meshesha and C. V. Jawahar)
Methods, Techniques and Algorithms
Digitization: flatbed HP7670 Scanjet scanner (300 dpi)
Binarization: performed
Skew: corrected within a range of ± 20%
Segmentation: using horizontal and vertical projections
Normalization: standard size of 20 x 20
Feature Extraction and Detection:
1. Principal Component Analysis (PCA)
2. Linear Discriminant Analysis (LDA)
Classification: Support Vector Machine (SVM) based decision directed acyclic graph (DDAG) classifier
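How a DDAG turns pairwise classifiers into a multi-class decision can be sketched as below. The pairwise test used here is a hypothetical nearest-mean rule standing in for the trained SVMs, and the class labels and feature values are invented for illustration; only the elimination scheme reflects the DDAG idea.

```python
def ddag_classify(x, classes, decide):
    """Evaluate a decision DAG: each pairwise test eliminates one class
    until a single candidate remains (N - 1 tests for N classes)."""
    remaining = list(classes)
    while len(remaining) > 1:
        first, last = remaining[0], remaining[-1]
        winner = decide(first, last, x)
        if winner == first:
            remaining.pop()       # 'last' is eliminated
        else:
            remaining.pop(0)      # 'first' is eliminated
    return remaining[0]


# Hypothetical class means in a 2-D feature space (e.g. after PCA/LDA).
MEANS = {"ha": (0.0, 0.0), "le": (4.0, 0.0), "me": (0.0, 4.0)}


def nearest_mean(a, b, x):
    """Stand-in for a trained pairwise SVM: pick the class with the nearer mean."""
    def dist(label):
        return sum((xi - mi) ** 2 for xi, mi in zip(x, MEANS[label]))
    return a if dist(a) <= dist(b) else b


label = ddag_classify((3.5, 0.2), MEANS, nearest_mean)
```

With a script of several hundred characters, evaluating N - 1 pairwise tests per sample instead of all N(N - 1)/2 is the practical appeal of this structure.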
… Optical Character Recognition of Amharic Documents (Million Meshesha and C. V. Jawahar)
Performance Registered
• An average accuracy of 96.95% was obtained where paper and printing quality were reasonably good.
• An average of around 90% was obtained on degraded documents.
Future Research Directions
• The feasibility of designing a data-driven OCR
• Indexing and retrieval from degraded document images using only image properties (without explicit recognition)
• Recognition of other indigenous African scripts
• An approach toward an intelligent OCR that can learn from its mistakes and improve its performance over time
Recognition of Amharic Braille
Teshome Alemu (2009)
Identified Problem
The large number of visually impaired people need an Amharic Braille-to-print document recognizer.
Scope and Limitations
The system handles only single-sided Braille documents.
… Recognition of Amharic Braille (Teshome Alemu)
Methods, Techniques and Algorithms
Digitization: flatbed scanner at 200 dpi
Segmentation: mesh grid
Feature Extraction and Detection:
1. Modified region-based approach
2. Content analysis based on defined rules
Classification: neural network classifier
Performance Registered
92.5% accuracy with all training and test sets
Amharic CR System for Printed Real-Life Documents
Abay Teshager (2010)
Identified Problem
Applying robust preprocessing techniques to detect and remove various noise types and to simplify feature extraction
Scope and Limitations
• Slant correction for Amharic characters was not considered.
… Amharic CR System for Printed Real-Life Documents
Abay Teshager
Methods, Techniques and Algorithms
Digitization: flatbed BENQ scanner (300 dpi)
Noise Detection and Removal: adaptive filtering method (MATLAB Image Processing Toolbox)
Binarization: Otsu global image thresholding
Normalization: linear interpolation technique (20 x 20 size)
Segmentation: stage-by-stage segmentation algorithm
Thinning: hit-and-miss morphological analysis using the bwmorph() function (in MATLAB)
Skew and slant correction: using Microsoft Paint Rotation Toolbox
Representation: a binary representation of the characters is fed to the neural network procedure (coded in MATLAB)
Recognition: an Artificial Neural Network (ANN) is created, trained and refined
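Otsu's global thresholding picks the grey level that best separates the histogram into two classes. The thesis used MATLAB; the plain-Python version below is only a sketch of the same idea, assuming an 8-bit greyscale input, with a function name of our choosing.

```python
def otsu_threshold(gray):
    """Return the 8-bit level t that maximises the between-class variance,
    where class 0 holds pixels <= t and class 1 holds pixels > t."""
    hist = [0] * 256
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of class 0 so far
    sum0 = 0    # intensity sum of class 0 so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Dark ink (around 20) on light paper (around 200).
page = [[200, 200, 20, 200],
        [200, 20, 20, 200],
        [200, 200, 20, 200]]
t = otsu_threshold(page)
binary = [[1 if px <= t else 0 for px in row] for row in page]
```

Unlike a fixed threshold, the level adapts to each scan, which matters for real-life documents of varying paper tone.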
… Amharic CR System for Printed Real-Life Documents
Abay Teshager
Performance Registered
• 96.87% recognition rate for test sets drawn from the training sets
• 11.40% recognition rate for new test sets
Future Research Directions
• Invariant shape feature extraction techniques should be developed.
• Advanced noise detection and removal algorithms should be applied to highly degraded Amharic document images.
• A better segmentation algorithm that tolerates space between connected characters
• Implementation of word segmentation
• Additional processing algorithms for skew detection and slant correction should be considered.
Offline handwritten Amharic word recognition
Yaregal Assabie, Josef Bigun (2011)
Identified Problem
For the Hidden Markov Model (HMM), in compact notation, three problems must be addressed:
• evaluation,
• decoding, and
• training.
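Of the three HMM problems, evaluation asks for the probability of an observation sequence given a model, and it is usually solved with the forward algorithm. The sketch below uses a made-up two-state, two-symbol model; none of the numbers come from the paper.

```python
def forward(obs, start, trans, emit):
    """HMM evaluation problem: P(obs | model) via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for symbol in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
                 for s in states]
    return sum(alpha)


# Illustrative model (assumed values, not taken from the reviewed work).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]

likelihood = forward([0, 1, 0], start, trans, emit)
```

In word recognition, a score of this kind is computed for each candidate word model and the highest-scoring model is chosen.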
… Offline handwritten Amharic word recognition
Yaregal Assabie, Josef Bigun
Methods, Techniques and Algorithms
Digitization: a total of 307 pages were collected and scanned at a resolution of 300 dpi
Image processing and feature extraction: Gaussian filters and derivatives of Gaussians
Text line detection and word segmentation: direction field image
Normalization: a symmetric Gaussian window of 5 x 5 pixels for noisy characters and 3 x 3 pixels for texts with small characters
Recognition of unconstrained handwritten Amharic words: feature-level and HMM-level concatenation of characters
… Offline handwritten Amharic word recognition
Yaregal Assabie, Josef Bigun
Performance Registered
[Recognition result for feature-level concatenation method]
[Recognition result for HMM-level concatenation method]
Future Research Directions
The recognition result can be further improved by employing language models in HMMs.
Conclusion
• Abroad, hundreds of OCR systems have been
developed since the 1950s and many are
commercially available today.
– Address Readers, Form Readers, Check Readers,
Airline Ticket Readers, Passport Readers, …
• Ethiopic handwriting recognition in general, and Amharic word recognition in particular, is one of the least investigated problems, and no commercial OCR system is available.
• No OCR system is developed for local languages
in Ethiopia
Recommendations
• Perform character recognition directly on grey-level images.
• A combined character recognition model is the only solution to practical problems.
• Promising techniques in this area deal with the recognition of entire words instead of individual characters.
• Future research should focus on linguistic and contextual information for further improvements.
• Recognition of cursive script, that is, handwritten connected or calligraphic characters.
… Recommendations
1) OCR system for other local languages
2) Bi-lingual (Multi-lingual) OCR system
3) Automatic plate number recognition system
4) Integration of OCR and a speech synthesizer, especially for visually impaired persons
5) Commercial OCR systems for
– Passport readers
– Bill processing systems
– Airline ticket readers
– Address readers, …
Can machines read human writing with the same fluency as humans?
Not yet!
Optical Character Recognition (OCR) based Retrieval

More Related Content

What's hot

Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Vidyut Singhania
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionRahul Mallik
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionDurjoy Saha
 
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESA STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESijcsitcejournal
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR RecognitionBharat Kalia
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) Systemiosrjce
 
Presentation on OCR
Presentation on OCRPresentation on OCR
Presentation on OCRxsconfused
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionNaiyan Noor
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHarshana Madusanka Jayamaha
 
offline character recognition for handwritten gujarati text
offline character recognition for handwritten gujarati textoffline character recognition for handwritten gujarati text
offline character recognition for handwritten gujarati textBhumika Patel
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting RecognitionBindu Karki
 
Optical Character Reader - Project Report BTech
Optical Character Reader - Project Report BTechOptical Character Reader - Project Report BTech
Optical Character Reader - Project Report BTechKushagraChadha1
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Divya Gera
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine LearningRitwikSaurabh1
 
Optical Character Recognition Using Python
Optical Character Recognition Using PythonOptical Character Recognition Using Python
Optical Character Recognition Using PythonYogeshIJTSRD
 

What's hot (20)

Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUESA STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
A STUDY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
 
OCR Text Extraction
OCR Text ExtractionOCR Text Extraction
OCR Text Extraction
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR Recognition
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) System
 
Ocr abstract
Ocr abstractOcr abstract
Ocr abstract
 
Presentation on OCR
Presentation on OCRPresentation on OCR
Presentation on OCR
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
ocr
ocrocr
ocr
 
Handwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer VersionHandwriting Recognition Using Deep Learning and Computer Version
Handwriting Recognition Using Deep Learning and Computer Version
 
Basics of-optical-character-recognition
Basics of-optical-character-recognitionBasics of-optical-character-recognition
Basics of-optical-character-recognition
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural network
 
offline character recognition for handwritten gujarati text
offline character recognition for handwritten gujarati textoffline character recognition for handwritten gujarati text
offline character recognition for handwritten gujarati text
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting Recognition
 
Optical Character Reader - Project Report BTech
Optical Character Reader - Project Report BTechOptical Character Reader - Project Report BTech
Optical Character Reader - Project Report BTech
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
Character Recognition using Machine Learning
Character Recognition using Machine LearningCharacter Recognition using Machine Learning
Character Recognition using Machine Learning
 
Optical Character Recognition Using Python
Optical Character Recognition Using PythonOptical Character Recognition Using Python
Optical Character Recognition Using Python
 

Optical Character Recognition (OCR) based Retrieval

  • 2. Presentation Outline
      • Introduction
      • Global Research Works
        1) English OCR
        2) Arabic OCR
        3) Indian (Devanagari) OCR
      • Local Research Works
        1) Worku (1997)
        2) Ermias (1998)
        3) Dereje (1999)
        4) Million (2000)
        5) Nigussie (2000)
        6) Yaregal (2002)
        7) Million and Jawahar (2007)
        8) Teshome (2009)
        9) Abay (2010)
        10) Yaregal and Bigun (2011)
      • Conclusion
      • Recommendation
  • 3. Definition “Optical character recognition (OCR) systems take scanned images of paper documents as input, and automatically convert them into digital format for computer-aided data processing”.
  • 4. Architecture of an OCR system (diagram labels)
      • Binarization, Filling, Thinning, Normalization, Skew Correction
      • 1. Template matching and correlation 2. Feature based
      • I. Grouping II. Error detection & correction
  • 5. Benefits of using OCR
      • Improve accuracy of data entry
      • Increase efficiency in data storage, retrieval and processing
      • Identify the specific form within a particular application
      • Read texts and produce a synthesized voice translation for visually impaired users
  • 6. Applications of OCR
      • Data entry
      • Text entry
      • Process automation
      • Other applications
        – Aid for visually impaired people
        – Automatic plate number readers
        – Automatic cartography
        – Signature verification and identification
  • 7. Issues in Developing OCR
      • Noise in Input Image
      • Skew in Input Image
      • Images embedded with Text in Input Image
  • 8. Introduction to Global OCR systems
      • Modern OCR technology began in 1951 with M. Sheppard's invention of GISMO, a robot reader-writer.
      • Today, OCR systems are less expensive, faster, and more reliable.
      • Current research areas of OCR: handwriting recognition and form reading.
      • Research is particularly intensive on the reliable recognition of handwritten cursive script.
  • 9. Introduction (cont'd)
      • Hundreds of OCR systems have been developed since the 1950s, and many are commercially available today.
      • Commercial OCR systems fall into two groups:
        1. Task-specific readers handle only specific document types, such as bank checks, letter mail, or credit card slips. Examples: address readers, form readers, check readers, bill processing systems, airline ticket readers, passport readers.
        2. General-purpose page readers are designed to handle a broader range of documents, such as business letters, technical writings and newspapers.
  • 10. Handwritten English Character Recognition Using Neural Network, by Anita Pal & Dayashankar Singh (2010)
      Identified Problem
        – The difficulty of recognizing handwritten characters that vary from one person to another.
      Methods, Techniques and Algorithms
        – Character scanning: the sample handwritten character is acquired by scanning and converted into 1024 (32×32) binary pixels.
        – Skeletonization: applied to the binary pixel image to remove extra pixels and reduce broad strokes to thin lines.
        – Normalization: a 30×30 standard was used.
        – Recognition algorithms:
            Feature extraction: boundary detection feature extraction technique.
            Classification of characters: a neural network is used.
  • 11. Handwritten English Character Recognition Using Neural Network, by Anita Pal & Dayashankar Singh (2010) (cont'd)
      Performance Registered
        – The system was tested using Fourier descriptors with the back-propagation technique and achieved a recognition accuracy of 94%.
        – A sample of 250 was used for training and testing.
      Further Research Directions
        – Improve the recognition accuracy by implementing new feature extraction techniques or algorithms.
  • 12. Recognition of on-line Arabic handwritten characters using structural features, by Ahmad T. Al-Taani and Saeed Al-Haj (2010)
      Identified Problem
        – Instances of the same character written by different persons are not identical; they vary in both size and shape.
      Scope and Limitation of the Study
        – The proposed system works only on isolated Arabic letters.
      Methods, Techniques and Algorithms
        – Digitization: hand-held and digital tablet as the input device.
        – Feature extraction: structural features are used to describe the character.
        – Recognition: decision trees classify the characters based on the features extracted from the input character.
  • 13. Recognition of on-line Arabic handwritten characters using structural features, by Ahmad T. Al-Taani and Saeed Al-Haj (2010) (cont'd)
      Performance Registered
        – A recognition rate of about 75.3% for all letters (including letters containing sharp edges).
        – An average performance of 85.3% when letters with sharp edges are excluded.
      Further Research Directions
        – Letters containing sharp edges are not recognized by the current system, so this remains open for further research.
  • 14. Segmentation of Printed Text in Devanagari Script and Gurmukhi Script, by Vijay Kumar and Pankaj K. Sengar (2010)
      Identified Problem
        – Segmentation errors arise from touching characters, which the classifier cannot properly handle.
      Methods, Techniques and Algorithms
        – Image categories and preprocessing: scanned input images were categorized as binary-level, pseudo-color and true-color images in order to standardize the input.
        – Segmentation: stage-by-stage segmentation methods were used.
  • 15. Segmentation of Printed Text in Devanagari Script and Gurmukhi Script, by Vijay Kumar and Pankaj K. Sengar (2010) (cont'd)
      Performance Registered
        The proposed system showed the following performance at different text levels:
          Level            Devanagari script   Gurmukhi script
          Line             100%                100%
          Word             ~100%               99%
          Character        99%
          Top character    97%                 ------
      Further Research Directions
  • 16. Introduction to local research
      • Our review of local research from various sources shows that most of the effort has gone into applying OCR to the Amharic language.
      • The main focus areas of these studies are the recognition of machine-printed, typewritten and handwritten Amharic documents, and the recognition of computer printouts of real-life documents such as books, magazines and newspapers.
      • The main objective of this presentation is therefore to review those works against specific criteria.
  • 17. Application of OCR techniques to the Amharic text, Worku Alemu (1997)
      Identified Problem
        – A huge amount of printed information is available in Amharic. However, some of it is irreplaceable, and access and modification are difficult.
      Methods, Techniques and Algorithms
        Preprocessing tasks
        – Digitization: a flatbed scanner (HP ScanJet IIc) at standard resolution (300 dpi) was used to digitize the printed text (document image).
        – Segmentation: the stage-by-stage segmentation algorithm was used to segment lines and characters.
  • 18. Application of OCR techniques to the Amharic text, Worku Alemu (1997) (cont'd)
      Methods, Techniques and Algorithms
        Recognition algorithms: a pattern classification task that maps each character image onto its symbolic identity.
        – Two algorithms were used for character recognition: polygonal approximations and relaxation, and topological features.
        – He used a tree classification scheme built from the topological features of a character.
        – The selected algorithms were coded in Turbo C++ for Windows.
      Performance Registered
        – Of four test cases, he selected one main test case and achieved a recognition accuracy of 98.87% (laser printout of Amharic text in normal typestyle, WashRa font, 12-point font size).
  • 19. Application of OCR techniques to the Amharic text, Worku Alemu (1997) (cont'd)
      Further Research Directions
        – Application of pre-processing and post-processing techniques to detect and correct errors
        – Segmentation of picture and text regions
        – Recognition of text in forms, tables and pictures
        – Recognition of characters printed in any color on any color of paper
        – Recognition of handwritten Amharic text
        – Recognition of formatted Amharic text
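The stage-by-stage segmentation named on the Worku (1997) slides, and reused in several later works, first isolates text lines and then isolates characters inside each line. The slides do not say how each stage was implemented, so the sketch below assumes the common projection-profile approach: blank rows separate lines and blank columns separate characters. The names `runs`, `segment` and `page` are illustrative only.

```python
def runs(profile):
    """Return (start, end) index pairs for each run of non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(binary):
    """Stage 1: split the page into lines. Stage 2: split each line into characters.

    `binary` is a list of rows with 1 = ink and 0 = background (an assumption).
    Returns one list per text line of (top, bottom, left, right) boxes.
    """
    row_profile = [sum(row) for row in binary]
    lines = []
    for top, bottom in runs(row_profile):
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]
        lines.append([(top, bottom, left, right)
                      for left, right in runs(col_profile)])
    return lines


page = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
boxes = segment(page)
print(boxes)
```

A pure projection split like this cannot separate touching characters, which is the kind of segmentation error the Devanagari and Gurmukhi study reports.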
  • 20. Recognition of formatted Amharic text using OCR techniques, Ermias Abebe (1998)
      Identified Problem
        – The researcher improves the algorithms adopted by Worku (developed for a single font and a fixed font size) with respect to font size and typestyle. The aim was to enable the algorithms to recognize characters printed in different sizes and formats.
      Scope and Limitation of the Study
        • The attempt was limited to the problems of size, underline and italics.
        • The method was tested with only 231 basic Amharic characters.
        • Only thinning, normalization and underline removal techniques were used.
      Methods, Techniques and Algorithms
        1. Preprocessing tasks
        – Digitization: a flatbed scanner at 300 dots per inch resolution was used.
        – Thinning algorithm (suggested by Zhang and Suen): converts the digital image into a unit-width image (skeleton).
        – Underline detection and removal: removes underlines from the segmented text line.
        – Normalization: makes all characters in the image the same size.
  • 21. Recognition of formatted Amharic text using OCR techniques, Ermias Abebe (1998) (cont'd)
      Methods, Techniques and Algorithms
        Recognition algorithms
        – The researcher used two recognition algorithms to work with formatted text. He selected them because of their accessibility and their supposed relevance to the problem at hand:
          1. A generalized character recognition algorithm: a graphical approach
          2. Symbol recognition without prior segmentation
      Performance Registered
        – Adding preprocessing algorithms to Worku's work yields better performance.
      Further Research Directions
        – Skew detection and correction
        – Recognition of typewritten texts and texts written on poor-quality paper
        – Other algorithms that improve the segmentation and recognition of characters irrespective of size and style should be implemented and tested on Amharic script
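The thinning step credited to Zhang and Suen on the Ermias (1998) slides can be sketched as below. This is a minimal, unoptimized pure-Python version of the standard two-subiteration algorithm, not the researcher's code. It assumes a binary image with 1 = ink, 0 = background and at least a one-pixel background border; `zhang_suen` and `bar` are illustrative names.

```python
def zhang_suen(image):
    """Thin a binary image (1 = ink) towards a unit-width skeleton."""
    img = [row[:] for row in image]        # work on a copy
    height, width = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above (y-1, x)
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):                # the two sub-iterations
            to_clear = []
            for y in range(1, height - 1):
                for x in range(1, width - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    ink = sum(p)           # B(P1): number of ink neighbours
                    # A(P1): number of 0 -> 1 transitions around the pixel
                    transitions = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                                      for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        edge_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        edge_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= ink <= 6 and transitions == 1 and edge_ok:
                        to_clear.append((y, x))
            # Pixels are cleared only after the whole pass (parallel update)
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return img


# A 3-pixel-thick horizontal stroke with a background border
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen(bar)
for row in skeleton:
    print("".join("#" if px else "." for px in row))
```

Collecting the pixels to delete before clearing them matters: the algorithm decides every deletion from the image as it stood at the start of the sub-iteration.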
  • 22. Optical Character Recognition of Typewritten Amharic Character, Dereje Teferi (1999)
      Identified Problem
        – The conventional way of converting text into electronic format is typing on a keyboard, which is tedious and impossible in view of the magnitude of the documents. Moreover, typing Amharic characters on a computer needs two keystrokes on average, or worse.
        – To tackle this problem, an OCR application capable of recognizing poor-quality typewritten Amharic characters is needed.
      Scope and Limitation of the Study
        • The study focuses on developing a segmentation algorithm that segments Amharic characters from a document and forwards the result for recognition.
        • It was confined to the 231 characters of a mechanical typewriter.
      Methods, Techniques and Algorithms
        Preprocessing stages
        – Feature extraction/detection: the features used for identifying and recognizing each character are extracted from contour/line analysis.
  • 23. Optical Character Recognition of Typewritten Amharic Character, Dereje Teferi (1999) (cont'd)
      Methods, Techniques and Algorithms
        Segmentation
        – Recursive segmentation: an approach that merges segmentation and recognition recursively.
        – Stage-by-stage segmentation: the researcher applied a modified stage-by-stage segmentation algorithm. A threshold value for the character width was set through experiment.
        Image restoration: the process of repairing a degraded image so that better recognition performance is achieved.
        – Mathematical morphology: used by the researcher to remove salt-and-pepper noise.
        – Binary morphological filter: used to remove subtractive and additive noise.
      Performance Registered
        – The researcher reported that the OCR system produced a recognition accuracy of 53.47% for documents written with a mechanical typewriter.
  • 24. Optical Character Recognition of Typewritten Amharic Character, Dereje Teferi (1999) (cont'd)
      Further Research Directions
        – Integrate a normalization technique into the present Amharic OCR so that the system becomes size independent (a size-independent Amharic typewritten recognition system).
        – Develop skew detection and correction algorithms.
        – Develop a form detection and removal algorithm to detect and extract text from tables, forms, etc.
        – Develop algorithms that recognize text written in any color on any background.
        – Develop recognition methods that are not very sensitive to the features of the characters.
        – Develop algorithms that detect formats such as indentation and bulleting and restore them after recognition.
  • 25. Handwritten Amharic text Recognition applied to the processing of Bank checks, by Nigussie Tadesse (2000)
      This was the first reported attempt to study the application of handwritten Amharic text recognition to the processing of bank checks.
      Identified Problem
        – The majority of CBE's clients are Ethiopians, so most requests are made in Amharic. As a result, a large share of checks is filled in by hand using Amharic characters.
        – The bank uses a semi-automated process in which a check filled in by someone else is keyed into the database by keyboard. This activity is very slow, costly, and error prone.
      Scope and Limitation of the Study
        – The research was limited to the recognition of handwritten legal amounts.
        – It does not include the cents (fraction); it is confined to the birr part of the legal amounts.
  • 26. Handwritten Amharic text Recognition applied to the processing of Bank checks, by Nigussie Tadesse (2000) (cont'd)
      Methods, Techniques and Algorithms
        Preprocessing tasks: the researcher used the following algorithms.
        1. Underline removal: underlines were removed using stage-by-stage segmentation (adopted from Ermias (1998)) and connected component analysis.
        2. Slant normalization: the chain code method was implemented to normalize the slant.
        3. Recognition
           • Size normalization: each character image is normalized to a fixed size.
           • Training: the EasyNN neural network development tool was used to construct and experiment with different neural network architectures. It helps to create, control, train, validate and query different multilayer networks with the back-propagation algorithm.
  • 27. Handwritten Amharic text Recognition applied to the processing of Bank checks, by Nigussie Tadesse (2000) (cont'd)
      Performance Registered
        – The classification accuracy of the networks was tested using 38 characters from the test data set.
        – The first neural network, with 256 input, 7 hidden and 8 output nodes, was trained with 135 sample characters and correctly classified 2 characters of the test set.
        – The second network, with 256 input, 20 hidden and 8 output nodes, was trained with 498 sample characters and correctly classified 16 out of 38.
          Case   Network architecture (MLNN)   Training samples   Test samples   Correctly classified
          1      256-7-8                       135                38             2
          2      256-20-8                      498                38             16
      Future Research Directions
        – Apply further preprocessing so that the intra-class variability between characters is minimized.
        – Train a neural network with sufficient data and consider other feature sets for training.
        – This was a first attempt, and the research did not validate all check-reading activities, so the area remains open for future research.
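The slide reports raw counts rather than rates. As a small worked example, the counts above convert to percentages as follows; the helper name `accuracy_percent` and the dictionary layout are illustrative only.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified test characters, in percent."""
    return 100.0 * correct / total


# (correctly classified, test characters) as reported on the slide
results = {"256-7-8": (2, 38), "256-20-8": (16, 38)}

for architecture, (correct, total) in results.items():
    rate = accuracy_percent(correct, total)
    print(f"{architecture}: {correct}/{total} = {rate:.2f}%")
```

That is roughly 5.26% for the smaller network and 42.11% for the larger one, which is consistent with the slide's call for more training data.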
  • 28. A Generalized Approach to Optical Character Recognition (OCR) of Amharic Texts Million Meshesha (2000) Identified Problem To generalize the previously adopted recognition algorithm so that it is insensitive to different font types Scope and Limitations • Only three commonly used font types are covered, namely Agafari, WashRa and Visual Geez
  • 29. … A Generalized Approach to Optical Character Recognition (OCR) of Amharic Texts (Million Meshesha) Methods, Techniques and Algorithms Digitization HP ScanJet flatbed scanner at 300 dpi resolution Binarization threshold value of 112 intensity level Thinning hybrid of parallel and Zhang-Suen thinning algorithms Segmentation a step-by-step segmentation with some modification Feature Extraction and Detection 1. Topological features were extracted 2. A database was developed using a binary tree
  • 30. … A Generalized Approach to Optical Character Recognition (OCR) of Amharic Texts (Million Meshesha) Performance Registered 49.38% for WashRa, 26.04% for Agafari_Addis Zemen, 15.75% for Visual Geez. Future Research Directions • An algorithm for form detection and removal. • Recognition of characters written on any paper color using any color type. • Development of post-processing techniques such as a spell checker, thesaurus, grammar, etc. • Normalization of input image patterns. • Standardization of font types and their representation. • A mechanism should be designed to flag unrecognized and suspicious characters.
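The binarization step in this work is described only as a fixed threshold at intensity level 112. A minimal sketch of fixed-threshold binarization follows; the polarity (pixels darker than the threshold count as ink) and the list-of-rows image layout are assumptions made for illustration, since the slides do not state them.

```python
THRESHOLD = 112  # intensity level reported on the slide


def binarize(gray, threshold=THRESHOLD):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    Assumption: dark text on light paper, so pixels strictly below the
    threshold are treated as ink.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


sample = [
    [250, 240, 30, 245],
    [248, 20, 25, 111],
    [112, 251, 15, 249],
]
for row in binarize(sample):
    print(row)
# [0, 0, 1, 0]
# [0, 1, 1, 1]
# [0, 0, 1, 0]
```

A single global threshold like this is sensitive to paper color and scan brightness, which is in line with the future directions listed above on handling any paper and ink color.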
  • 31. Optical Character Recognition of Amharic Text: An Integrated Approach Yaregal Assabie (2002) Identified Problem to come up with a versatile algorithm that is independent of the font size and other quantitative parameters of Amharic characters Scope and Limitations there is no previously well-formed algorithm
  • 32. … Optical Character Recognition of Amharic Text: An Integrated Approach (Yaregal Assabie) Methods, Techniques and Algorithms Digitization Scanjet Pro scanner Segmentation a step-by-step segmentation with some modification Training Patterns with Neural Network BrainMaker and NetMaker to convert text files Feature Extraction and Detection an improved primitive extraction algorithm was developed
  • 33. … Optical Character Recognition of Amharic Text: An Integrated Approach (Yaregal Assabie) Performance Registered (font size: included in the training set, not included) 8: 65.02%, 62.87% 12: 74.68%, 81.07% 14: 73.18%, 70.04% Future Research Directions • Detailed image preprocessing techniques need to be developed. • An algorithm to detect forms, graphs and tables. • Recognition of documents written in any color on any background color. • Development of algorithms for primitive extraction, identification, and relationship/connection handling. • Character image databases that represent different font types, styles, and sizes should be developed for research purposes. • Post-processing techniques such as spell checking, grammar and semantic analysis need to be incorporated. • Standardization of the representation of Ethiopic characters.
  • 34. Optical Character Recognition of Amharic Documents Million Meshesha and C. V. Jawahar (2007) Identified Problem Challenges in building an OCR for African scripts • Degradation of documents • Printing variations • Large number of characters in the script • Visual similarity of most characters in the script • Language related issues.
  • 35. … Optical Character Recognition of Amharic Documents (Million Meshesha and C. V. Jawahar) Methods, Techniques and Algorithms Digitization flat-bed HP7670 Scanjet scanner (300 dpi) Binarization was done. Skew was corrected within a range of ± 20%. Segmentation using different horizontal and vertical projections Normalization standard size of 20 x 20 Feature Extraction and Detection 1. Principal Component Analysis (PCA) 2. Linear Discriminant Analysis (LDA) Classification Support Vector Machine (SVM)-based decision directed acyclic graph (DDAG) classifier
  • 36. … Optical Character Recognition of Amharic Documents (Million Meshesha and C. V. Jawahar) Performance Registered • On average, 96.95% accuracy is obtained when paper and printing qualities are reasonably good. • On average, around 90% is obtained on degraded documents. Future Research Directions • The feasibility of designing a data-driven OCR. • Indexing and retrieval from degraded document images using only image properties (without explicit recognition). • Recognition of other indigenous African scripts. • Developing an intelligent OCR that can learn from its mistakes and improve its performance over time.
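Segmentation with horizontal and vertical projections, as named in the methods slide above, can be sketched as follows. This is an illustrative pure-Python version under simple assumptions (binary image with 1 = ink, lines and characters separated by fully blank rows or columns); the function names are hypothetical and the authors' actual procedure may differ.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs, end exclusive, where the profile is non-zero."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # a blank row/column ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def horizontal_profile(binary):
    """Ink count per row; gaps in it separate text lines."""
    return [sum(row) for row in binary]


def vertical_profile(binary):
    """Ink count per column; gaps in it separate characters within a line."""
    return [sum(col) for col in zip(*binary)]


def segment(binary):
    """Split a binary page into lines, then each line into character boxes.

    Returns (top, bottom, left, right) tuples with exclusive bottom/right.
    """
    boxes = []
    for top, bottom in runs_of_ink(horizontal_profile(binary)):
        line = binary[top:bottom]
        for left, right in runs_of_ink(vertical_profile(line)):
            boxes.append((top, bottom, left, right))
    return boxes


page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
]
print(segment(page))
# [(1, 3, 0, 2), (1, 3, 4, 5), (4, 5, 1, 3)]
```

Each box could then be cropped and rescaled to the 20 x 20 standard size mentioned on the slide before feature extraction. Touching or overlapping characters break the blank-column assumption, which is one reason degraded documents are harder.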
  • 37. Recognition of Amharic Braille Teshome Alemu (2009) Identified Problem An Amharic Braille-to-print document recognizer is needed for the large number of visually impaired people Scope and Limitations The system is only used for single-sided Braille documents.
  • 38. … Recognition of Amharic Braille (Teshome Alemu) Methods , Techniques and Algorithms Digitization flat-bed scanner with 200 dpi Segmentation Mesh-grid Feature Extraction and Detection 1. Modified region based approach 2. Content analysis based on rules defined Classification Neural network classifier Performance Registered with all training and test set, 92.5% accuracy
  • 39. Amharic CR System for Printed Real-Life Documents Abay Teshager (2010) Identified Problem applying robust preprocessing techniques in detecting and removing various noise types and simplifying the extraction of features Scope and Limitations •Slant corrections for Amharic characters were not considered.
  • 40. … Amharic CR System for Printed Real-Life Documents (Abay Teshager) Methods, Techniques and Algorithms Digitization flat-bed BENQ scanner (300 dpi) Noise Detection and Removal adaptive filtering method (MATLAB Image Processing Toolbox) Binarization Otsu global image thresholding Normalization linear interpolation technique (20 x 20 size) Segmentation stage-by-stage segmentation algorithm Thinning hit-and-miss morphological analysis using the bwmorph() function (in MATLAB) Skew and slant correction using Microsoft Paint Rotation Toolbox Representation the binary representation of characters is fed to the neural network procedure (coded in MATLAB) Recognition an Artificial Neural Network (ANN) is created, trained and refined
  • 41. … Amharic CR System for Printed Real-Life Documents (Abay Teshager) Performance Registered A recognition rate of 96.87% is observed for test sets drawn from the training sets, and 11.40% for new test sets. Future Research Directions • Invariant shape feature extraction techniques should be developed. • Apply advanced noise detection and removal algorithms for highly degraded Amharic document images. • A better segmentation algorithm that tolerates space between connected characters. • Implementation of word segmentation. • Additional processing algorithms for skew detection and slant correction should be considered.
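The methods slide above names Otsu global image thresholding for binarization. The following pure-Python sketch shows the idea of the method (pick the threshold that maximizes between-class variance of the intensity histogram). It is an illustration only, not the MATLAB code used in the thesis, and the function name and flat pixel list are assumptions.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with intensity <= t form one class, the rest form the other.
    `pixels` is a flat iterable of integer intensities in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of intensities in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


# Toy bimodal image: dark ink around 10-20, light paper around 200-210.
pixels = [10] * 40 + [20] * 10 + [200] * 30 + [210] * 20
t = otsu_threshold(pixels)
ink = [p for p in pixels if p <= t]
print(t, len(ink))
```

Unlike a fixed threshold, this value adapts to each image's histogram, but it remains a single global cut and can still fail on unevenly lit or heavily degraded pages.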
  • 42. Offline handwritten Amharic word recognition Yaregal Assabie, Josef Bigun (2011) Identified Problem For the Hidden Markov Model (HMM) with the compact notation: the evaluation, decoding and training problems.
  • 43. … Offline handwritten Amharic word recognition Yaregal Assabie, Josef Bigun Methods, Techniques and Algorithms Digitization a total of 307 pages were collected and scanned at a resolution of 300 dpi Image Processing and Feature Extraction Gaussian filters and derivatives of Gaussians Text Line Detection and Word Segmentation direction field image Normalization a symmetric Gaussian window of 5 x 5 pixels for noisy characters and 3 x 3 pixels for texts with small characters Recognition of unconstrained handwritten Amharic words feature-level and HMM-level concatenation of characters
  • 44. … Offline handwritten Amharic word recognition Yaregal Assabie, Josef Bigun Performance Registered [Table: Recognition result for feature-level concatenation method] [Table: Recognition result for HMM-level concatenation method] Future Research Directions The recognition result can be further improved by employing language models in HMMs.
  • 45. Conclusion • Abroad, hundreds of OCR systems have been developed since the 1950s and many are commercially available today. – Address Readers, Form Readers, Check Readers, Airline Ticket Readers, Passport Readers, … • Ethiopic handwriting recognition in general, and Amharic word recognition in particular, is one of the least investigated problems, and no commercial OCR system is available. • No OCR system has been developed for local languages in Ethiopia.
  • 46. Recommendations • Perform character recognition directly on grey-level images. • A combined character recognition model is the only solution to practical problems. • Promising techniques in this area deal with the recognition of entire words instead of individual characters. • Future research should focus on linguistic and contextual information for further improvements. • Recognition of cursive script, that is, handwritten connected or calligraphic characters.
  • 47. … Recommendations 1) OCR systems for other local languages 2) Bi-lingual (multi-lingual) OCR system 3) Automatic plate number recognition system 4) Integration of OCR and speech synthesizer, especially for visually impaired persons 5) Commercial OCR systems for passport readers, bill processing systems, airline ticket readers, address readers, …
  • 48. Can machines read human writing with the same fluency as humans? Not yet!

Editor's Notes

  1. It was the first attempt at research in the area of OCR.