OPTICAL CHARACTER RECOGNITION
FOR
URDU HANDWRITING
Presented by:
SAAD USMAN
USMAN Ali
YASIR HAYAT
INTRODUCTION:
■ Optical character recognition (OCR) is the process of converting scanned images of
machine-printed or handwritten text (numerals, letters and symbols) into a
computer-processable format, i.e. machine-encoded text.
OCR
■ It is widely used as a form of information entry from printed paper data records,
whether passport documents, invoices, bank statements, computerized receipts,
business cards, mail, printouts of static-data, or any suitable documentation.
■ It is a common method of digitizing printed texts so that they can be electronically
edited, searched, stored more compactly, displayed on-line, and used in machine
processes
■ OCR is a field of research in pattern recognition, artificial intelligence and computer
vision.
EARLIER WORK
■ Early versions needed to be trained with images of each character, and worked on one
font at a time.
■ Advanced systems that achieve a high degree of recognition accuracy for most
fonts are now common, and they support a variety of digital image file formats
as input.
■ Some systems are capable of reproducing formatted output that closely approximates
the original page including images, columns, and other non-textual components.
Steps in OCR:
1. Image Acquisition:
Digital image acquisition is the creation of photographic images, such as of a physical scene or of the
interior structure of an object.
2. Preprocessing:
Preprocessing consists of text area extraction, text line extraction, baseline detection, component
segmentation, character segmentation, and primary and secondary stroke extraction. It also involves noise
reduction, document decomposition, etc.
3. Segmentation:
Segmentation is the process of dividing an image into regions, each region containing a single object
or a group of objects of the same type.
4. Feature Extraction:
Appropriate features are selected and computed. Structural information that is closely related to the writing,
such as dots, loops and branches, is extracted.
5. Classification:
Classification is the process of identifying each character and assigning to it the correct character class.
6. Recognition:
Recognition is the last step needed to produce the desired output: the extracted features of a character are
matched against the stored features to recognize the character.
Overview of Urdu:
■ Urdu is derived from a mixture of Arabic, Turkish, Farsi and Hindi, with a 58-character set
defined by the National Language Authority of Pakistan.
■ Each letter has multiple forms depending on its position in the word.
■ There are four forms: isolated, initial, medial and final.
■ Urdu characters can be divided into two groups, separators and non-separators.
■ The separators, or non-joiners, can take only the isolated and final shapes. In contrast, the non-separators,
or joiners, can take all four shapes.
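The joining rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the presented systems: the function name `positional_forms` and the exact set of non-joiner letters are assumptions made for the example.

```python
# Non-joiner (separator) letters. This particular set is an assumption for
# illustration: alef, dal, ddal, thal, reh, rreh, zain, jeh, waw, yeh barree.
NON_JOINERS = set("\u0627\u062f\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06d2")

def positional_forms(word):
    """Return the positional form name of each letter in `word`."""
    forms = []
    for i, letter in enumerate(word):
        # A letter connects to the previous one only if that letter is a joiner.
        joins_prev = i > 0 and word[i - 1] not in NON_JOINERS
        # A letter connects to the next one only if it is itself a joiner.
        joins_next = i < len(word) - 1 and letter not in NON_JOINERS
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

# kaf, teh, alef, beh (the word "kitab")
print(positional_forms("\u06a9\u062a\u0627\u0628"))
```

Because a non-joiner never connects to the following letter, it can only come out as "isolated" or "final", which matches the rule stated above.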
Components and variation in writing:
RECOGNITION OF OFFLINE
HANDWRITTEN ISOLATED URDU
CHARACTER
System Methodology used:
■ Data set:
■ The proposed method is applied to 36,800 handwritten Urdu characters. For each of the 46
characters, 200 image samples were used for training and 600 for testing.
■ Methodology:
■ Moment Invariants (MI) are used to evaluate seven distributed parameters for each
handwritten isolated Urdu character.
■ First, the system checks whether the character consists of a single component or more than one component.
■ If it is a single component, the image is normalized to 60 × 60 and divided into 3 horizontal zones for
feature extraction.
■ Seven MI features are computed from each zone and seven from the whole image, giving 28 MI
features in total.
■ An SVM is used for classification.
■ The character is put into the appropriate class as a single-stroke component character.
■ A secondary component is normalized to 22 × 22 and divided into 2 horizontal zones.
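The zoned moment features described above (7 for the whole image plus 7 for each of 3 zones, 28 in total) can be sketched as follows. This is a plain-Python illustration using the standard seven Hu moment invariants. The function names, the representation of the image as a list of 0/1 rows, and the reading of "horizontal zones" as stacked horizontal bands are assumptions, and the SVM stage is not shown.

```python
def hu_moments(img):
    """Seven Hu moment invariants of a binary image (list of rows, 1 = ink)."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = float(len(pts))
    if m00 == 0:
        return [0.0] * 7  # empty zone: no ink, so no shape information
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Central moment of order (p, q), normalised for scale.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

def zoned_mi_features(img, zones=3):
    """7 moments of the whole image followed by 7 per horizontal band."""
    h = len(img)
    feats = hu_moments(img)
    for z in range(zones):
        feats += hu_moments(img[z * h // zones:(z + 1) * h // zones])
    return feats
```

For a 60 × 60 input with `zones=3` the result has 28 values; calling it with `zones=2` on a 22 × 22 secondary component would give 21 values under the same assumptions.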
Results:
Achieved overall accuracy up to 93.59% for
all offline handwritten isolated Urdu characters
OPTICAL CHARACTER RECOGNITION SYSTEM
FOR URDU. ONLINE AND OFFLINE OCR
IRRESPECTIVE OF FONTS.
■ Urdu is one of the languages that contain the features, properties, scripts and writing
styles of two languages, Arabic and Persian.
■ Urdu script is a blend of Naskh (an Arabic style) and Talique (a Persian style), called Nastalique script.
■ It is cursive in nature, which makes it more difficult for conventional algorithms to work on.
■ Online character recognition is used for handwriting recognition, for example with
digital pens.
■ Offline character recognition can handle both handwritten and printed material. It is
mostly used for printed papers, books, etc.
■ We use a segmentation-free approach in which only ligatures are segmented.
Proposed system:
Text Line Extraction
Text Area Extraction:
Compound Component Extraction:
■ A whole compound ligature, together with its primary and secondary strokes, is called a compound
component.
■ Each connected component is inspected, and compound components are formed according to the following rules:
1. If the area of a connected component lies over another connected component, then both are parts of one
compound component.
2. If the area of a connected component lies under another connected component, then both are parts of one
compound component.
3. If more than 50% of the area of a connected component lies over or under another connected component, then it is
part of the same compound component.
4. If the area of a connected component does not lie over or under any other connected component, then the
component itself is a compound component.
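One way to read the rules above is as a grouping of connected components by horizontal overlap. The sketch below is only an interpretation: it assumes each component is reduced to the horizontal extent `(x_min, x_max)` of its bounding box, and that "more than 50%" is measured against the width of the narrower component. The function name `group_compound_components` is hypothetical.

```python
def group_compound_components(boxes, min_overlap=0.5):
    """Group components whose horizontal extents (x_min, x_max) overlap enough."""
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (a0, a1) in enumerate(boxes):
        for j in range(i + 1, len(boxes)):
            b0, b1 = boxes[j]
            overlap = min(a1, b1) - max(a0, b0)
            smaller = min(a1 - a0, b1 - b0)
            # One component sits over or under the other by more than min_overlap.
            if smaller > 0 and overlap / smaller > min_overlap:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# A dot (10..20) above a long stroke (0..40) joins it; the other two stand alone.
print(group_compound_components([(0, 40), (10, 20), (60, 90), (38, 45)]))
```

A component that overlaps nothing ends up in a group of its own, which corresponds to rule 4.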
Base Line Detection:
■ The baseline is the horizontal line (pixel row) that contains the maximum number of black pixels.
■ Connected components that lie on the baseline are primary strokes; the others are
secondary strokes.
■ Average horizontal lines are drawn at 50% and 35% of the height of the compound component.
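Baseline detection as described above amounts to a horizontal projection: count the black pixels in each row and take the row with the largest count. A minimal sketch follows, assuming a binary image given as a list of rows with 1 for black. The helper `average_lines` assumes the 35% and 50% positions are measured downward from the top of the component, which the slides do not state.

```python
def detect_baseline(img):
    """Index of the row with the most black pixels (1 = black)."""
    row_counts = [sum(row) for row in img]
    return row_counts.index(max(row_counts))

def average_lines(top, bottom, fractions=(0.35, 0.50)):
    """Rows at the given fractions of the component height, measured from `top`."""
    height = bottom - top
    return [top + round(height * f) for f in fractions]

img = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],  # densest row, so this is the baseline
    [0, 1, 0, 0],
]
print(detect_baseline(img), average_lines(10, 110))
```

If several rows tie for the maximum, this sketch returns the topmost one.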
Stroke Identification:
■ Primary and secondary strokes are identified on the basis of the baseline and the average
horizontal lines according to the following rules:
1. Strokes that lie on the baseline and one of the average horizontal lines are primary strokes.
2. Strokes that lie on both average horizontal lines are primary strokes.
3. Strokes that do not lie on any line are secondary strokes.
4. Strokes that lie on only one average horizontal line are secondary strokes.
5. If one stroke lies on the baseline and the other stroke does not lie on any line, then the baseline stroke
is primary and the other is secondary.
■ For handwriting, we developed a stroke identification improvement algorithm.
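The five rules can be written as a small decision function. This is a sketch under assumptions: the three flags say whether a stroke touches the baseline and each of the two average horizontal lines, and rule 5 is generalised so that any stroke touching only the baseline counts as primary. The improvement algorithm for handwriting is not covered.

```python
def classify_stroke(on_baseline, on_line_a, on_line_b):
    """Label a stroke 'primary' or 'secondary' from the lines it touches."""
    if on_line_a and on_line_b:                   # rule 2
        return "primary"
    if on_baseline and (on_line_a or on_line_b):  # rule 1
        return "primary"
    if on_baseline:                               # rule 5, generalised (assumption)
        return "primary"
    return "secondary"                            # rules 3 and 4

print(classify_stroke(True, True, False))    # body of a ligature
print(classify_stroke(False, False, False))  # a dot away from every line
```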
Features Extraction:
■ We compute five features for each single character or ligature. First the image is resized to 64x64
pixels; then the features are extracted.
1. 8x64 pixels: an 8x64 pixel window moves from right to left and computes the ratio between white
and black pixels.
2. 64x8 pixels: a 64x8 pixel window moves from top to bottom.
3. 8x8 pixels: an 8x8 pixel window moves from top right to bottom left.
4. Square shape: the square shape method reads 2x2 pixel blocks from top right to bottom left, and if
any black pixel is found, the whole 2x2 block is converted to black pixels.
5. Hu invariant moments are calculated for characters and ligatures.
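The first three window features can be sketched as below. Several details are assumptions: "8x64" is read as 8 pixels wide and 64 high, the image is a list of 64 rows with 1 for black, and the share of black pixels in the window is used in place of the white-to-black ratio so that an all-white window does not cause a division by zero.

```python
def black_fraction(img, x0, y0, w, h):
    """Share of black pixels (1 = black) in the w-by-h window at (x0, y0)."""
    black = sum(img[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w))
    return black / (w * h)

def window_features(img, size=64, step=8):
    # Vertical strips, 8 wide and 64 high, visited right to left.
    cols = [black_fraction(img, x, 0, step, size)
            for x in range(size - step, -1, -step)]
    # Horizontal strips, 64 wide and 8 high, visited top to bottom.
    rows = [black_fraction(img, 0, y, size, step)
            for y in range(0, size, step)]
    # 8x8 blocks, band by band from the top, right to left within each band.
    blocks = [black_fraction(img, x, y, step, step)
              for y in range(0, size, step)
              for x in range(size - step, -1, -step)]
    return cols, rows, blocks
```

With these defaults the three lists have 8, 8 and 64 entries respectively.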
Recognition:
■ The extracted features are matched against the stored features to recognize the character.
■ We use the K-Nearest Neighbors (KNN) algorithm for feature matching.
■ In KNN we apply Euclidean distance with 10 nearest neighbors.
■ Each of the five features is matched independently by KNN, and the result returned most often by the
independent matches is the final recognition result.
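The matching step can be sketched as one KNN vote per feature set followed by a majority vote across the feature sets. The code is a minimal pure-Python illustration. The names `knn_predict` and `recognise` and the data layout are assumptions, and ties are broken by whichever label was counted first.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=10):
    """Majority label among the k training vectors closest to `query`."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def recognise(feature_sets, labels, queries, k=10):
    """feature_sets[f][i] is feature vector f of training sample i;
    queries[f] is feature vector f of the unknown character."""
    results = [knn_predict(fs, labels, q, k) for fs, q in zip(feature_sets, queries)]
    # The label returned most often by the independent matches wins.
    return Counter(results).most_common(1)[0][0]
```

With five feature sets, `recognise` takes five lists of training vectors and five query vectors and returns a single label.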
Results and conclusion:
The proposed OCR system was developed in MATLAB and Microsoft C#.NET.
■ The system gives 97.09% accuracy in extracting text lines.
■ An accuracy of 98.86% was found in primary and secondary stroke extraction.
■ Recognition gives an accuracy of 97.12%.
The proposed system covers both online and offline Urdu OCR.
CLASSIFICATION OF URDU LIGATURES USING
CONVOLUTIONAL
NEURAL NETWORKS – A NOVEL APPROACH
■ Ligature: word or a sub-word that is a combination of (one
to eight) connected characters.
■ Why ligatures?
■ Why neural networks?
■ Data set: 55,000 Urdu ligatures extracted
from scanned pages of a famous Urdu book (‘Zawiya’).
■ Training: trained on a dataset of 38,000 ligatures in 552 classes, whereas
previous work used approximately 10,000 ligatures.
■ Accuracy: beat the state of the art (93.59%).
Preprocessing:
■ Each image is binarized (global thresholding) and resized (55x55).
■ The ligature is then copied to the top left corner of the standard 55x55 image.
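A minimal sketch of this preprocessing, assuming a grayscale image given as rows of 0 to 255 values, a fixed global threshold of 128, and dark ink on a light background. The sketch does not rescale: a ligature larger than the canvas is simply cropped, which is a simplification of the step described above.

```python
def preprocess(gray, size=55, threshold=128):
    """Binarise `gray` and paste it at the top left of a size x size canvas."""
    canvas = [[0] * size for _ in range(size)]
    for y, row in enumerate(gray[:size]):         # rows beyond the canvas are cropped
        for x, value in enumerate(row[:size]):    # so are columns beyond the canvas
            canvas[y][x] = 1 if value < threshold else 0  # dark pixel becomes ink
    return canvas

small = [[0, 255, 100],
         [200, 50, 255]]
out = preprocess(small)
print(len(out), len(out[0]), out[0][:3], out[1][:3])
```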
Proposed Architecture:
■ We use six stacked convolutional layers,
with a pooling layer after every two convolutional layers.
■ This arrangement yields the maximum accuracy.
■ Finally, a single fully connected layer computes
the class scores.
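To make the layer arrangement concrete, the sketch below walks a 55x55 input through six convolutions, with pooling after every second one, and a final fully connected layer over 552 classes. Only the layer order, the input size and the class count come from the slides. The 3x3 kernels with padding 1, the 2x2 pooling and the channel counts are assumptions chosen for illustration.

```python
def cnn_shapes(size=55, channels=(32, 32, 64, 64, 128, 128), n_classes=552):
    """List the output shape of every layer as (name, channels, height, width)."""
    shapes = []
    for i, ch in enumerate(channels, start=1):
        # 3x3 convolution, stride 1, padding 1: spatial size is unchanged.
        shapes.append((f"conv{i}", ch, size, size))
        if i % 2 == 0:
            size //= 2  # 2x2 pooling with stride 2, rounding down
            shapes.append((f"pool{i // 2}", ch, size, size))
    fc_inputs = channels[-1] * size * size
    shapes.append(("fc", n_classes, fc_inputs))  # (name, outputs, inputs)
    return shapes

for layer in cnn_shapes():
    print(layer)
```

Under these assumptions the spatial size shrinks from 55 to 27, 13 and finally 6, so the fully connected layer sees 128 × 6 × 6 = 4608 inputs.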
PERFORMANCE COMPARISON OF DIFFERENT CNN
ARCHITECTURES (ACCURACY)
Improvement:
■ More training data for higher accuracy.
■ The results show that the deeper the network and
the smaller the kernel size, the better the recognition rates.
■ Addition of special characters and numerals, recognized as special ligatures.
SEGMENTATION BASED URDU
NASTALIQUE OCR
Classification:
■ Some letters belong to multiple classes because they contain isolated and
final forms different from initial and medial forms.
Methodology
Jang ching Algo
End Point:
■ The skeletonized image is segmented after determining the ending point of the
ligature.
■ In Nastalique it is very difficult to determine the exact starting point of a
ligature, so instead we start from the ending point of the ligature, which is
more deterministic.
■ Sixty segments were extracted from all shapes.
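The slides do not say how the ending point is located. A common approach, shown here purely as an assumption, is to treat any skeleton pixel with exactly one 8-connected neighbour as an end point, and then to take the leftmost such pixel as the end of the ligature, since the script runs right to left.

```python
def skeleton_end_points(skel):
    """(x, y) of skeleton pixels (1 = skeleton) with exactly one 8-neighbour."""
    h, w = len(skel), len(skel[0])
    ends = []
    for y in range(h):
        for x in range(w):
            if not skel[y][x]:
                continue
            neighbours = sum(
                skel[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 1:
                ends.append((x, y))
    return ends

def ligature_end_point(skel):
    # Assumption: the leftmost end point is where a right-to-left ligature ends.
    return min(skeleton_end_points(skel))

skel = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
]
print(skeleton_end_points(skel), ligature_end_point(skel))
```

A closed loop has no end points, so `ligature_end_point` would raise `ValueError` on such a skeleton; a real system would need a fallback for that case.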
Segmentation of the Ligatures:
Results:
■ A total of 1,692 ligatures, formed from the six base forms.
■ Achieved an accuracy of 92.73%.
■ The Urdu words were written in the Noori Nastalique font at font size 36.
Problems:
■ Some letters were not recognized correctly, possibly due to scanning or binarization.
Future/Improvements:
■ Increase the number of training samples to reduce the
chance of errors.
■ Address the problem on the previous slide by giving the original segment as
HMM input instead of the skeletonized segment.
■ Extend the system to cover the full set of Urdu letters and different
font sizes.
Reference:
■ S. Sardar and A. Wahab, "Optical character recognition system for Urdu," 2010 International
Conference on Information and Emerging Technologies, Karachi, 2010, pp. 1-5.
■ N. Javed, S. Shabbir, I. Siddiqi and K. Khurshid, "Classification of Urdu Ligatures Using Convolutional
Neural Networks - A Novel Approach," 2017 International Conference on Frontiers of Information
Technology (FIT), Islamabad, 2017, pp. 93-97.
■ Pathan, Imran & Ramteke, Rakesh. (2012). Recognition of Offline Handwritten Isolated Urdu
Character. Advances in Computational Research. 4. 117-121.
■ Javed S.T., Hussain S. (2013) Segmentation Based Urdu Nastalique OCR. In: Ruiz-Shulcloper J., Sanniti
di Baja G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications.
CIARP 2013. Lecture Notes in Computer Science, vol 8259. Springer, Berlin, Heidelberg
■ Z. Ahmad, J. K. Orakzai and I. Shamsher, "Urdu compound Character Recognition using feed forward
neural networks," 2009 2nd IEEE International Conference on Computer Science and Information
Technology, Beijing, 2009, pp. 457-462.
■ Naz, Saeeda, Arif Iqbal Umar, Riaz Ahmad, Saad Bin Ahmed, Syed Hamad Shirazi, Imran Siddiqi and
Muhammad Imran Razzak. “Offline cursive Urdu-Nastaliq script recognition using multidimensional
recurrent neural networks.” Neurocomputing 177 (2016): 228-241.

 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 

Optical Character Recognition for Urdu Handwriting and Ligatures using Deep Learning

  • 1. OPTICAL CHARACTER RECOGNITION FOR URDU HANDWRITING Presented by: SAAD USMAN USMAN Ali YASIR HAYAT
  • 2. INTRODUCTION: ■ Optical character recognition is the process of converting scanned images of machine-printed or handwritten text (numerals, letters and symbols) into a computer-processable format or machine-encoded text.
  • 3. OCR ■ It is widely used as a form of information entry from printed paper data records, whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printouts of static data, or any suitable documentation. ■ It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes. ■ OCR is a field of research in pattern recognition, artificial intelligence and computer vision.
  • 4. EARLIER WORK ■ Early versions needed to be trained with images of each character, and worked on one font at a time. ■ Advanced systems capable of producing a high degree of recognition accuracy for most fonts are now common, with support for a variety of digital image file format inputs. ■ Some systems are capable of reproducing formatted output that closely approximates the original page, including images, columns, and other non-textual components.
  • 6. 1. Image Acquisition: Digital image acquisition is the creation of photographic images, such as of a physical scene or of the interior structure of an object. 2. Preprocessing: Preprocessing consists of text area extraction, text line extraction, baseline detection, component segmentation, character segmentation, and primary and secondary stroke extraction. It also involves noise reduction, document decomposition, etc. 3. Segmentation: Segmentation is the process of dividing an image into regions, each region containing a single object or a group of objects of the same type.
  • 7. 4. Feature Extraction: Appropriate features are selected, and structural information closely related to writing, such as dots, loops and branches, is computed. 5. Classification: Classification is the process of identifying each character and assigning it the correct character class. 6. Recognition: Recognition is the last step toward the desired output: the extracted features of a character are matched against the stored features to recognize the character.
  • 8. Overview of Urdu: ■ Urdu is derived from a mixture of Arabic, Turkish, Farsi and Hindi, with a 58-character set defined by the National Language Authority of Pakistan. ■ Each letter has multiple forms depending on its position in the word. ■ There are four forms: isolated, initial, medial and final. ■ Urdu characters can be divided into two groups: separators and non-separators. ■ Separators (non-joiners) can take only the isolated and final shapes. By contrast, non-separators (joiners) can take all four shapes.
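The separator rule can be sketched in a few lines of Python. This is only an illustration: the `SEPARATORS` set below is a small subset of the Urdu non-joiners chosen for the example, and the function name `positional_forms` is hypothetical.

```python
# Illustrative subset of Urdu separators (non-joiners); NOT the full set.
SEPARATORS = {"ا", "د", "ذ", "ر", "ز", "و"}


def positional_forms(word, separators=SEPARATORS):
    """Return the positional form of each letter in a word.

    A letter connects to the next one only if it is a joiner, so a
    separator can only ever end up "isolated" or "final".
    """
    forms = []
    for i, letter in enumerate(word):
        joined_to_previous = i > 0 and word[i - 1] not in separators
        joins_to_next = i < len(word) - 1 and letter not in separators
        if joined_to_previous and joins_to_next:
            forms.append("medial")
        elif joined_to_previous:
            forms.append("final")
        elif joins_to_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


print(positional_forms("کتاب"))
# ['initial', 'medial', 'final', 'isolated']
```

In the example word the third letter is a separator, so it takes the final form and forces the last letter to stand isolated.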
  • 9.
  • 11. RECOGNITION OF OFFLINE HANDWRITTEN ISOLATED URDU CHARACTER
  • 12. System Methodology used: ■ Data set: ■ The proposed method is applied to 36,800 handwritten Urdu characters. For each of the 46 characters, 200 image samples were used for training and 600 for testing. ■ Methodology: ■ Moment Invariants (MI) are used to evaluate seven distributed parameters for each handwritten isolated Urdu character.
  • 13. ■ The system first checks whether the character consists of a single component or more than one component. ■ If it is a single component, the image is normalized to 60 × 60 and divided into 3 horizontal zones for feature extraction. ■ From each zone 7 MI features and from the whole image 7 MI features are computed, for a total of 28 MI features. ■ An SVM is used for classification. ■ The character is put into the appropriate class as a single-stroke component character. ■ A secondary component is normalized to 22 × 22 and divided into 2 horizontal zones.
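A minimal sketch of the 28-value feature vector, assuming the moment invariants are the seven Hu invariants, the image is already normalized to a 60 × 60 binary grid (1 = ink), and the three zones are equal horizontal bands. The function names are hypothetical, and the SVM step is not shown.

```python
def hu_moments(img):
    """Seven Hu moment invariants of a binary image given as rows of 0/1."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pts)
    if m00 == 0:
        return [0.0] * 7  # empty zone: no ink, so no shape information
    cx = sum(x for x, _ in pts) / m00
    cy = sum(y for _, y in pts) / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pts)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


def zoned_features(img, zones=3):
    """7 invariants per horizontal band plus 7 for the whole image."""
    band = len(img) // zones
    feats = []
    for z in range(zones):
        feats += hu_moments(img[z * band:(z + 1) * band])
    return feats + hu_moments(img)


# Toy 60 x 60 "character": a filled rectangle.
image = [[1 if 10 <= x < 50 and 20 <= y < 40 else 0 for x in range(60)]
         for y in range(60)]
print(len(zoned_features(image)))  # 28
```

With 3 zones this yields 3 × 7 + 7 = 28 values, matching the count on the slide. The same function with `zones=2` would cover the 22 × 22 secondary component.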
  • 14.
  • 15. Results: An overall accuracy of up to 93.59% was achieved for all offline handwritten isolated Urdu characters.
  • 16. OPTICAL CHARACTER RECOGNITION SYSTEM FOR URDU. ONLINE AND OFFLINE OCR IRRESPECTIVE OF FONTS.
  • 17. ■ Urdu is one of the languages that carry the features, properties, scripts and writing styles of two languages, Arabic and Persian. ■ Urdu script is a blend of Naskh (the Arabic style) and Talique (the Persian style), called the Nastalique script. ■ It is cursive in nature, which makes it more difficult for conventional algorithms to work on it. ■ Online character recognition is a process used for handwriting recognition, for example with digital pens. ■ Offline character recognition can handle both handwritten and printed material. It is mostly used for printed papers, books, etc. ■ We have used a segmentation-free approach in which only ligatures are segmented.
  • 21. Compound Component Extraction: ■ The whole compound ligature, along with its primary and secondary strokes, is called a compound component. ■ Each connected component is inspected and grouped according to the following rules: 1. If the area of a connected component resides over another connected component, then both are parts of one compound component. 2. If the area of a connected component resides under another connected component, then both are parts of one compound component. 3. If more than 50% of the area of a connected component resides over or under another connected component, then it is a part of one compound component. 4. If the area of a connected component does not reside over or under another connected component, then the component itself is a compound component.
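The grouping rules can be sketched as follows. The slide does not say how "resides over or under" is measured, so this sketch assumes it means horizontal overlap of bounding boxes, taken as a fraction of the narrower component's width. The box format `(x_start, x_end)` and both function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of box a's width that sits directly above or below box b."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    return max(0, shared) / (a[1] - a[0])


def group_components(boxes, threshold=0.5):
    """Group connected components into compound components (union-find)."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            overlap = max(overlap_fraction(boxes[i], boxes[j]),
                          overlap_fraction(boxes[j], boxes[i]))
            if overlap > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Horizontal extents (x_start, x_end) of four connected components.
boxes = [(0, 40), (10, 16), (50, 80), (38, 44)]
print(group_components(boxes))  # [[0, 1], [2], [3]]
```

Component 1 (a small mark) lies fully within the span of component 0 and is merged with it. Component 3 overlaps component 0 by only a third of its width, so under rule 4 it stays a compound component on its own.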
  • 22. Base Line Detection: ■ The base line is the horizontal line (pixel row) where the maximum number of black pixels is present. ■ Connected components that lie on the base line are primary strokes and the others are secondary strokes. ■ Average horizontal lines are drawn at 50% and 35% of the height of the compound component.
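A minimal sketch of base line detection on a binary image (rows of 0/1, 1 = black). The slide does not state where the 50% and 35% heights are measured from. This sketch assumes they are measured from the top of the compound component, and the function names are hypothetical.

```python
def detect_baseline(img):
    """Index of the row containing the most black pixels."""
    counts = [sum(row) for row in img]
    return counts.index(max(counts))


def average_lines(top, height, fractions=(0.50, 0.35)):
    """Rows of the average horizontal lines, measured from the top (assumed)."""
    return [top + round(height * f) for f in fractions]


image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],  # densest row
    [0, 1, 1, 0],
]
print(detect_baseline(image))   # 2
print(average_lines(0, 40))     # [20, 14]
```

If several rows tie for the maximum, `index` returns the topmost one.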
  • 23. Stroke Identification: ■ Primary and secondary strokes are identified on the basis of the base line and the average horizontal lines according to the following rules: 1. Strokes that lie on the base line and one of the average horizontal lines are primary strokes. 2. Strokes that lie on both average horizontal lines are primary strokes. 3. Strokes that do not lie on any line are secondary strokes. 4. Strokes that lie on only one average horizontal line are secondary strokes. 5. If one stroke lies on the base line and the other stroke does not lie on any line, then the base line stroke is primary and the other is secondary. ■ For handwriting we have developed a stroke identification improvement algorithm.
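The five rules can be written as a small decision function. In this sketch a stroke "lies on" a line when its vertical extent covers that row. Rule 5 is simplified to "a stroke touching only the base line is primary". Both choices are assumptions, the function names are hypothetical, and the handwriting improvement algorithm is not shown.

```python
def lies_on(top, bottom, line):
    """True if the stroke's rows [top, bottom] include the given line."""
    return top <= line <= bottom


def classify_stroke(top, bottom, baseline, avg_lines):
    """Classify one stroke as 'primary' or 'secondary'."""
    on_base = lies_on(top, bottom, baseline)
    avg_hits = sum(lies_on(top, bottom, y) for y in avg_lines)
    if on_base and avg_hits >= 1:   # rule 1
        return "primary"
    if avg_hits == 2:               # rule 2
        return "primary"
    if on_base:                     # rule 5 (simplified, see lead-in)
        return "primary"
    return "secondary"              # rules 3 and 4


baseline, avg_lines = 30, (20, 14)
print(classify_stroke(10, 32, baseline, avg_lines))  # primary
print(classify_stroke(2, 6, baseline, avg_lines))    # secondary
print(classify_stroke(18, 22, baseline, avg_lines))  # secondary
```

Row indices grow downward here, so `top <= bottom`. The second stroke touches no line (rule 3) and the third touches only one average line (rule 4).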
  • 24. Features Extraction: ■ We computed five features for each single character or ligature. First the image is resized to 64x64 pixels, and then the features are extracted. 1. 8x64 pixels: an 8x64 pixel window moves from right to left and computes the ratio between white and black pixels. 2. 64x8 pixels: a 64x8 pixel window moves from top to bottom. 3. 8x8 pixels: an 8x8 pixel window moves from top right to bottom left. 4. Square shape: the square shape method reads 2x2 pixel blocks from top right to bottom left, and if any black pixel is found then the whole 2x2 block is converted to black pixels. 5. Hu invariant moments are calculated for characters and ligatures.
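As an illustration of the first feature, the sketch below slides an 8-pixel-wide, full-height window from right to left over a 64x64 binary image (1 = black). It assumes non-overlapping steps and records the fraction of black pixels per window rather than the white-to-black ratio, which is a deliberate substitution to avoid dividing by zero on empty windows. The function name is hypothetical.

```python
def column_window_features(img, width=8):
    """Black-pixel fraction for each full-height window, right to left."""
    size = len(img[0])
    feats = []
    for right in range(size, 0, -width):
        left = right - width
        black = sum(sum(row[left:right]) for row in img)
        feats.append(black / (width * len(img)))
    return feats


# Toy 64 x 64 image whose rightmost 8 columns are entirely black.
image = [[1 if x >= 56 else 0 for x in range(64)] for _ in range(64)]
features = column_window_features(image)
print(len(features), features[0], features[1])  # 8 1.0 0.0
```

The 64x8 feature follows the same pattern with rows and columns swapped.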
  • 25. Recognition: ■ The extracted features are then matched with the stored features to recognize the character. ■ We have used the K-Nearest Neighbors (KNN) algorithm for feature matching. ■ In KNN we have applied Euclidean distance with 10 nearest neighbors. ■ Each of the five features is matched independently by KNN, and the result returned most often among the independent results is the final recognition result.
  • 26. Results and conclusion: The proposed OCR system was developed in MATLAB and Microsoft C#.NET. ■ The system gives 97.09% accuracy in extracting text lines. ■ An accuracy of 98.86% was found in primary and secondary stroke extraction. ■ Recognition gives an accuracy of 97.12%. A system was proposed for both online and offline Urdu OCR.
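The matching step can be sketched with the standard library alone. This assumes each feature type has its own store of `(vector, label)` pairs. The names `knn_predict` and `recognize`, the labels, and the toy data are hypothetical, and ties are broken by first occurrence.

```python
import math
from collections import Counter


def knn_predict(query, samples, k=10):
    """Majority label among the k stored vectors closest to query."""
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def recognize(queries, stores, k=10):
    """Run KNN once per feature type, then take the most frequent answer."""
    votes = [knn_predict(q, s, k) for q, s in zip(queries, stores)]
    return Counter(votes).most_common(1)[0][0]


# Toy store reused for three feature types (the slide uses five).
store = [((0, 0), "alif"), ((0, 1), "alif"), ((1, 0), "alif"),
         ((5, 5), "bay"), ((5, 6), "bay"), ((6, 5), "bay")]
queries = [(0.2, 0.1), (0.5, 0.5), (5.5, 5.5)]
print(recognize(queries, [store] * 3, k=3))  # alif
```

Two of the three feature-level predictions agree on the same label, so that label wins the final vote.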
  • 27. CLASSIFICATION OF URDU LIGATURES USING CONVOLUTIONAL NEURAL NETWORKS – A NOVEL APPROACH
  • 28. ■ Ligature: a word or sub-word that is a combination of one to eight connected characters. ■ Why Ligature? ■ Why Neural Network?
  • 29. ■ Data set: 55,000 Urdu ligatures extracted from scanned pages of a famous Urdu book (‘Zawiya’). ■ Training: Trained on a dataset of 38,000 ligatures in 552 classes, whereas previous work used approx. 10,000 ligatures. ■ Accuracy: Beat the state of the art (93.59%)
  • 30. Preprocessing: ■ Each image is binarized (global thresholding) and resized (55x55). ■ The ligature is then copied to the top-left corner of the standard 55x55 image.
  • 31. Proposed Architecture: ■ We have used six convolutional layers stacked on each other, with a pooling layer after every two convolutional layers. ■ This arrangement yields maximum accuracy. ■ Finally, there is a single fully connected layer which computes the class scores.
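A minimal sketch of the two preprocessing steps on a grayscale image stored as rows of 0 to 255 values. The fixed threshold of 128 is an assumption, since the slide only says "global thresholding". Resampling is not shown, so a ligature larger than the canvas is simply cropped. Both function names are hypothetical.

```python
def binarize(gray, threshold=128):
    """Global thresholding: dark pixels become 1 (ink), light pixels 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def place_top_left(binary, size=55):
    """Copy the ligature into the top-left corner of a size x size canvas."""
    canvas = [[0] * size for _ in range(size)]
    for y, row in enumerate(binary[:size]):
        for x, value in enumerate(row[:size]):
            canvas[y][x] = value
    return canvas


gray = [[10, 200, 30],
        [250, 20, 240]]
canvas = place_top_left(binarize(gray))
print(len(canvas), len(canvas[0]))   # 55 55
print(canvas[0][:3], canvas[1][:3])  # [1, 0, 1] [0, 1, 0]
```

Anchoring every ligature at the same corner gives the network a consistent origin regardless of the ligature's original size.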
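The layer arrangement can be traced with plain shape arithmetic. The slide gives neither kernel sizes nor channel counts, so this sketch assumes 3x3 convolutions with padding 1 (which keep the spatial size), 2x2 pooling with stride 2, and hypothetical channel counts. Only the six-plus-one layer layout, the 55x55 input and the 552 classes come from the slides.

```python
def layer_plan(input_size=55, channels=(32, 32, 64, 64, 128, 128),
               num_classes=552):
    """List (name, channels, spatial size) per layer and the FC input width."""
    plan, size = [], input_size
    for i, ch in enumerate(channels, start=1):
        plan.append((f"conv{i}", ch, size))  # assumed 3x3, padding 1
        if i % 2 == 0:                       # pool after every two convs
            size //= 2                       # assumed 2x2 pooling, stride 2
            plan.append((f"pool{i // 2}", ch, size))
    plan.append(("fc", num_classes, 1))      # single fully connected layer
    return plan, channels[-1] * size * size


plan, fc_inputs = layer_plan()
for name, ch, size in plan:
    print(f"{name:6s} channels={ch:4d} size={size}x{size}")
print("fc inputs:", fc_inputs)  # fc inputs: 4608
```

Under these assumptions the feature maps shrink from 55 to 27, 13 and finally 6 pixels per side, so the fully connected layer maps 128 × 6 × 6 = 4608 values to the 552 class scores.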
  • 32. PERFORMANCE COMPARISON OF DIFFERENT CNN ARCHITECTURES (ACCURACY)
  • 33. Improvement: ■ Higher accuracy requires more training data. ■ The results show that the deeper the network and the smaller the kernel size, the better the recognition rates. ■ Addition of special characters and numerals, and their recognition as special ligatures.
  • 35. Classification: ■ Some letters belong to multiple classes because they contain isolated and final forms different from initial and medial forms.
  • 38. End Point: ■ The skeletonized image is segmented after determining the ending point of the ligature. ■ In Nastalique it is very difficult to determine the exact starting point of the ligature, so instead we start with the ending point of the ligature, which is more deterministic.
  • 39. ■ Sixty segments were extracted from all shapes.
  • 40. Segmentation of the Ligatures:
  • 41. Results: ■ A total of 1692 ligatures, formed from the six base forms. ■ An accuracy of 92.73% was obtained. ■ The Urdu words were written in the Noori Nastalique font at font size 36.
  • 42. Problems: ■ Some letters were not recognized correctly, possibly due to scanning or binarization.
  • 43. Future/Improvements: ■ Increasing the number of training samples so that there is less chance of error. ■ Addressing the problem on the previous slide by giving the original segment as HMM input instead of the skeletonized segment. ■ Can be extended to cover the full set of Urdu letters and different font sizes.
  • 44. Reference: ■ S. Sardar and A. Wahab, "Optical character recognition system for Urdu," 2010 International Conference on Information and Emerging Technologies, Karachi, 2010, pp. 1-5. ■ N. Javed, S. Shabbir, I. Siddiqi and K. Khurshid, "Classification of Urdu Ligatures Using Convolutional Neural Networks - A Novel Approach," 2017 International Conference on Frontiers of Information Technology (FIT), Islamabad, 2017, pp. 93-97. ■ Pathan, Imran & Ramteke, Rakesh. (2012). Recognition of Offline Handwritten Isolated Urdu Character. Advances in Computational Research. 4. 117-121. ■ Javed S.T., Hussain S. (2013) Segmentation Based Urdu Nastalique OCR. In: Ruiz-Shulcloper J., Sanniti di Baja G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8259. Springer, Berlin, Heidelberg. ■ Z. Ahmad, J. K. Orakzai and I. Shamsher, "Urdu compound Character Recognition using feed forward neural networks," 2009 2nd IEEE International Conference on Computer Science and Information Technology, Beijing, 2009, pp. 457-462. ■ Naz, Saeeda, Arif Iqbal Umar, Riaz Ahmad, Saad Bin Ahmed, Syed Hamad Shirazi, Imran Siddiqi and Muhammad Imran Razzak. "Offline cursive Urdu-Nastaliq script recognition using multidimensional recurrent neural networks." Neurocomputing 177 (2016): 228-241.