Offline Character Recognition for Handwritten Gujarati Text
1. A Presentation on Mid Sem Review (2740001)
GUIDED BY:
Prof. Hinaxi M. Patel
Department of CSE & IT
SVMIT-Bharuch
PREPARED BY:
Bhumika B. Patel
(160450723006)
ME-IT 4th Sem
SVMIT-Bharuch
2. Outline
Introduction to OCR
Objective
Literature Survey
Research Gap
Problem Definition
Proposed Approach
Work Flow
Convolutional Neural Network
Implementation Results
Conclusion
Work Plan
References
8-Mar-18 160450723006 2
3. Optical Character Recognition (OCR): What Is It?
OCR converts a scanned image into machine-readable text.
Fig. Sample Handwritten Data [2]
4. Types of OCR [23]
1. Handwritten Character Recognition (HCR): HCR is more difficult to implement than printed character recognition because human handwriting styles and habits vary widely.
2. Printed Character Recognition (PCR): In PCR, the images to be processed contain text in standard fonts such as Times New Roman, Arial, and Courier.
5. Comparison between Online and Offline Handwritten Character Recognition
S.No. | Comparison | Online Characters | Offline Characters
1 | Number of pen strokes available | Yes | No
2 | Raw data requirement | Samples per second (e.g., 100) | Dots per inch (e.g., 300)
3 | Way of writing | Digital pen on a Liquid Crystal Display (LCD) | Paper document
4 | Recognition rate | Higher | Lower
5 | Accuracy | Higher | Lower
10. Objective
Recognize handwritten Gujarati characters from scanned images.
Improve the recognition accuracy for each character.
11. Literature Survey
Sr. No. Title
1 Deep Learning Based Large Scale Handwritten Devanagari Character Recognition
2 From Machine Generated To Handwritten Character Recognition; A Deep Learning Approach
3 Classification of Offline Gujarati Handwritten Characters
4 Text-Based Image Segmentation Methodology
5 Image Normalization and Preprocessing for Gujarati Character Recognition
6 Features Fusion based Approach for Handwritten Gujarati Character Recognition
7 Handwritten Recognition Using SVM, KNN and Neural Network
8 Gujarati handwritten numeral optical character reorganization through neural network
9 Gujarati Character Recognition using Adaptive Neuro Fuzzy Classifier
10 Performance Scrutiny of Thinning Algorithms on Printed Gujarati Characters and Handwritten Numerals
11 Stroke Identification in Gujarati Text using Directional Feature
12 Support vector machine for identification of handwritten Gujarati alphabets using hybrid feature space
12. Literature Survey (continued)
Author & Publication/Year | Objective | Method | Dataset | Critical Comment
Acharya, S., Pant, A. K., & Gyawali, P. K. (IEEE/2015) [4] | Recognize Devanagari characters and numerals using a deep convolutional neural network | MATLAB; Deep CNN | - | Some similar-looking characters are not recognized.
K. Peymani and M. Soryani (IEEE/2017) [5] | Recognize Farsi characters, from machine-generated to handwritten, using deep convolutional neural network layers | MATLAB; Deep CNN | - | The feature extraction layers do not extract a fixed set of features; instead, they adapt and learn during training.
13. Literature Survey (continued)
Author & Publication/Year | Objective | Method | Dataset | Critical Comment
Swital J. Macwan, Archana N. Vyas (IEEE/2015) [7] | Recognize Gujarati characters from scanned images of handwritten characters; different methods applied to different languages | MATLAB; transform domain (DWT, DCT, DFT); spatial domain: geometric method (gradient features), structural method (Freeman chain code), statistical method (Zernike moments) | ક, ખ, ઘ, ઞ | SVM was applied; it copes with datasets that have many attributes and can handle a large number of classes.
Gupta Mehul et al. (Elsevier/2014) [8] | Apply segmentation techniques to printed text images | MATLAB; pixel counting, histogram, smearing, stochastic, and water flow approaches | "eighteen" (an English word) | The drawback of the histogram algorithm compared with the pixel counting approach is the increased computation and the resulting space complexity.
14. Literature Survey (continued)
Author & Publication/Year | Objective | Method | Dataset | Critical Comment
Jayashree Rajesh Prasad (IJCSN/2014) [9] | Character normalization, whose goal is to reduce within-class variation in character shapes in order to facilitate feature extraction and improve classification accuracy | MATLAB; image normalization | Gujarati characters ક, ખ, ચ, જ, ટ | Achieved an 86.6% recognition rate for isolated characters.
Ankit Sharma et al. (NU/2016) [10] | Extract character features using a feature fusion technique | MATLAB; Naïve Bayes classifier, SVM | ક | Improved machine learning techniques based on the Bayes classifier and support vector machines; an increase in the number of classes can make the problem more difficult.
15. Literature Survey (continued)
Author & Publication/Year | Objective | Method | Dataset | Critical Comment
Norhidayu binti et al. (IEEE/2016) [11] | Reduce the number of features, after ranking them, while achieving the same or better results, and reduce the processing time for classification | MATLAB; SVM, k-NN, neural network | 0, 1, 2, 5, 8, 9 | Different random weight initializations can lead to different validation accuracy.
Apurva A. Desai (Elsevier/2010) [13] | Recognize handwritten Gujarati numerals using a neural network | MATLAB; feed-forward back-propagation neural network | પ | Improved feature extraction and preprocessing techniques are likely required to improve the performance of this prototype.
16. Literature Survey (continued)
Author & Publication/Year | Objective | Method | Dataset | Critical Comment
Jayashree Rajesh Prasad, Uday V. Kulkarni (IEEE/2014) [14] | Evaluate the performance of efficient classifiers for handwritten Gujarati characters using adaptive neuro-fuzzy classifiers | MATLAB; adaptive neuro-fuzzy classifier (ANFC) | ક, ખ, ચ, છ | Implemented an ANFC to handle a large dataset of handwritten Gujarati characters.
S. B. Suthar, R. S. Goradia, B. N. Dalwadi, and S. M. Patel (Springer/2018) [16] | Apply thinning to binary Gujarati characters, which reduces binary images to lines one pixel wide | MATLAB; Hilditch sequential thinning algorithm | 3, બ, ભ | Within the same domain, it gives different results for different patterns.
17. Literature Survey (continued)
Author & Publication/Year | Objective | Method & Tools | Dataset | Critical Comment
Mahendra B. Mendapara, Mukesh M. Goswami (IEEE/2014) [17] | Separate strokes from the thinned binary image of text and extract directional features from the separated strokes | MATLAB; k-nearest neighbor method | ૧ | Stroke information is difficult to obtain in the absence of temporal information about how the character was formed.
Apurva A. Desai (Springer/2015) [18] | Classify handwritten Gujarati alphabets | MATLAB; support vector machine (SVM) | પ | Not suited to scanned printed text; used only for handwritten script.
18. Research Gap
Most existing work recognizes characters using support vector machine, neural network, and k-nearest neighbor methods.
Connected (joint) characters are more complex to identify.
19. Problem Definition
To recognize characters and vowels combined with characters, and to improve character recognition accuracy using deep learning, specifically a Convolutional Neural Network.
20. Proposed Approach
Fig. Proposed Work: Dataset → Pre-processing → Training & Testing Set Separation → Convolutional Neural Network → Output Image
• Dataset: collect handwritten data (ક-જ્ઞ), (કા-ક:); scan the dataset; crop each character manually and label each character.
• Pre-processing: resize each image to 30×30 pixels and convert it to grayscale; invert the pixel intensity; add 2 pixels of padding on all sides.
• Training & testing set separation: randomly split the dataset into a training set (80%) and a testing set (20%).
• Convolutional Neural Network: similar-looking characters are difficult to identify; recognize vowels; try to find improvements with different configurations of CNN layers.
21. Description of Proposed Approach
Step 1: Collect handwritten data from different writers.
Step 2: Scan the dataset.
Step 3: Crop each character manually and label it.
Step 4: Resize each image to 30×30 pixels and convert it to grayscale.
Step 5: Invert the pixel intensity and add 2 pixels of padding on all sides.
Step 6: Randomly split the dataset into training and testing sets.
Step 7: Configure the CNN and run the CNN trainer.
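Step 6 can be sketched in Python as below. This is a hedged illustration, not the MATLAB code used in this work: the helper name `split_dataset`, the fixed seed, and the placeholder sample names are assumptions; the 80/20 ratio comes from the proposed-approach figure and the 1360-image count from the Conclusion.

```python
import random


def split_dataset(samples, train_fraction=0.8, seed=42):
    """Randomly split (image, label) pairs into training and testing lists.

    Hypothetical helper; the fixed seed only makes the split reproducible.
    """
    shuffled = list(samples)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# 1360 placeholder samples: 34 characters x 40 images each.
samples = [(f"img_{i}", i % 34) for i in range(1360)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 1088 272
```

A plain random split does not guarantee that every character gets exactly 32 training images; splitting each character's images separately (a stratified split) would.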
22. Classification Techniques of Offline Handwritten Character Recognition
Template matching is a simple character recognition technique that depends on matching stored templates against the character or word to be recognized. The matching operation measures the similarity between two vectors: an input image is compared with a set of already stored templates. The recognition rate of template matching is sensitive to noise and image deformation [23].
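The matching operation can be sketched as follows. Cosine similarity between flattened images is an assumed choice of similarity measure, and the 3×3 templates and labels are made-up toy data.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_template(image_vector, templates):
    """Return the label of the stored template most similar to the input."""
    return max(templates,
               key=lambda label: cosine_similarity(image_vector, templates[label]))


# Toy 3x3 binary templates, flattened row by row (hypothetical data).
templates = {
    "vertical_bar":   [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "horizontal_bar": [0, 0, 0, 1, 1, 1, 0, 0, 0],
}
noisy_input = [0, 1, 0, 0, 1, 0, 1, 1, 0]  # vertical bar plus one noisy pixel
print(match_template(noisy_input, templates))  # vertical_bar
```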
A neural network is composed of interconnected nodes joined by weighted links. It learns by example through training, that is, exposure to a set of input-output data (patterns), during which the training algorithm adjusts the link weights [23].
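To make "the training algorithm adjusts the link weights" concrete, here is a single neuron trained with the classic perceptron learning rule. It is a deliberately tiny stand-in (learning logical AND of two inputs), not the multi-layer networks surveyed above; the learning rate and epoch count are arbitrary choices.

```python
def train_perceptron(data, epochs=10, lr=0.1):
    """Learn weights for one threshold neuron from (inputs, target) pairs."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in data:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            output = 1 if activation > 0 else 0
            error = target - output
            # Perceptron rule: nudge each link weight in proportion to its input.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
print([predict(w, b, x) for x, _ in and_data])  # [0, 0, 0, 1]
```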
The k-nearest neighbor (k-NN) classifier is a nonparametric statistical method for classification: it classifies an object based on the closest training examples in the feature space [23].
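A minimal k-NN classifier can be sketched as below. The Euclidean distance, k = 3, and the two-dimensional toy feature vectors are assumptions for illustration; real character features would be much longer vectors.

```python
import math
from collections import Counter


def knn_classify(query, training_set, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    training_set: list of (feature_vector, label) pairs.
    """
    nearest = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two classes.
training_set = [
    ([1.0, 1.0], "A"), ([1.2, 0.8], "A"), ([0.9, 1.1], "A"),
    ([5.0, 5.0], "B"), ([5.2, 4.9], "B"), ([4.8, 5.1], "B"),
]
print(knn_classify([1.1, 1.0], training_set))  # A
print(knn_classify([4.9, 5.0], training_set))  # B
```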
Support vector machines (SVMs) applied to text classification provide high accuracy but poor recall. One way to customize SVMs to improve recall is to adjust the threshold associated with the SVM. SVMs have achieved excellent recognition results in various pattern recognition applications [23].
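The threshold adjustment can be illustrated without training an SVM. Assume some already-trained linear SVM produces decision values f(x) = w·x + b and predicts "positive" when f(x) > t; the scores and labels below are made-up numbers, not results from any cited work.

```python
def recall(scores, labels, threshold):
    """Share of true positives (label 1) whose score exceeds `threshold`."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s > threshold for s in positives) / len(positives)


def precision(scores, labels, threshold):
    """Share of predicted positives (score > threshold) that are truly positive."""
    predicted = [y for s, y in zip(scores, labels) if s > threshold]
    return sum(predicted) / len(predicted) if predicted else 0.0


# Hypothetical SVM decision values f(x) and true labels.
scores = [2.1, 0.7, -0.2, -0.4, 1.5, -1.3, -0.1, -2.0]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for t in (0.0, -0.5):  # 0.0 is the usual SVM decision threshold
    print(f"threshold={t}: recall={recall(scores, labels, t):.2f}, "
          f"precision={precision(scores, labels, t):.2f}")
# threshold=0.0: recall=0.50, precision=0.67
# threshold=-0.5: recall=1.00, precision=0.67
```

In this toy case, lowering the threshold raises recall while precision happens to stay the same; on real data it can also cost some precision.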
A Hidden Markov Model (HMM) is a finite set of states, each associated with a probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In each state, an outcome or observation can be generated according to the associated probability distribution. The probability of the observations is calculated for each candidate character, and these probabilities are then compared to obtain a final ranked list of the best candidate characters [23].
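As a sketch of how per-character probabilities could be computed and ranked, the forward algorithm below scores an observation sequence under one HMM per candidate character. The two-state models, the binary observation alphabet (think of it as a hypothetical quantized stroke feature), and all probabilities are invented for illustration.

```python
def forward_probability(observations, start, trans, emit):
    """P(observations | HMM), computed with the forward algorithm.

    start[i]    -- probability of starting in state i
    trans[i][j] -- transition probability from state i to state j
    emit[i][o]  -- probability that state i generates observation o
    """
    n = len(start)
    alpha = [start[i] * emit[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs]
                 for j in range(n)]
    return sum(alpha)


def rank_characters(observations, models):
    """Score each candidate character's HMM and return labels best-first."""
    scores = {ch: forward_probability(observations, *params)
              for ch, params in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Two toy left-to-right models: (start, trans, emit) per candidate character.
models = {
    "char_A": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "char_B": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(rank_characters([0, 0, 1, 1], models))  # ['char_A', 'char_B']
```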
23. Convolutional Neural Network (CNN)
CNNs are used for image classification, image segmentation, object detection in images, etc.
A CNN comprises three types of layers: convolutional layers, pooling layers, and fully connected layers. A simplified CNN architecture is illustrated in the figure.
Fig. A simple CNN Architecture
24. CNN Layers
The basic functions of the CNN layers are as follows:
1. The input layer holds the pixel values of the image.
2. The convolutional layer determines the output of neurons that are connected to local regions of the input, by computing the scalar product between their weights and the region of the input volume they are connected to. The rectified linear unit (ReLU) then applies an elementwise activation function, max(0, x), to the output produced by the previous layer.
3. The pooling layer performs downsampling along the spatial dimensions of its input, further reducing the number of parameters within that activation.
4. The fully connected layers perform the same duties as in standard ANNs and attempt to produce class scores from the activations, which are used for classification. ReLU may also be used between these layers to improve performance.
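The four layer types can be sketched in plain Python on a tiny input. This is illustrative only: the 6×6 image, the 3×3 vertical-line kernel, and the fully connected weights are hand-picked assumptions, whereas a real CNN learns its kernels and weights during training and would take the 34×34 preprocessed images as input.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution with stride 1 (cross-correlation, as most deep
    learning libraries implement it): each output is the scalar product of
    the kernel weights with the local input region it covers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Elementwise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: each size x size block becomes its maximum."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


def fully_connected(features, weights, biases):
    """One class score per row of weights: weighted sum of all features plus bias."""
    return [sum(w * x for w, x in zip(w_row, features)) + b
            for w_row, b in zip(weights, biases)]


# Input layer: a 6x6 image with a vertical stroke in column 2.
image = [[1 if c == 2 else 0 for c in range(6)] for _ in range(6)]
kernel = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]  # responds to vertical lines

feature_map = relu(conv2d(image, kernel))          # 4x4
pooled = max_pool(feature_map)                     # 2x2
features = [v for row in pooled for v in row]      # flatten to 4 values
scores = fully_connected(features,
                         weights=[[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]],
                         biases=[0.0, 0.0])
print(scores)  # [6.0, 0.0]
```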
26. Implementation
DP-1 Work
• Literature survey
• Defined the proposed scheme
• Identified methods used for character recognition
• Analysis of methods
• Survey paper
MSR Work
• Created the dataset
• Applied preprocessing to the entire dataset
• Separated training and testing datasets
• Applied the CNN method to each character
• Measured accuracy
DP-2 Work
• Remaining vowels combined with characters will be implemented.
• Try to find improvements, if possible, with another configuration.
• Estimate and align characters vertically to improve accuracy.
• Work Flow
27. Detailed Procedure
Character: “ક”
Phase 1: Scan the handwritten document, crop each character manually, and label each character.
Start of Algorithm 1
Step 1: Collect the dataset from different writers.
Step 2: Scan the documents.
Step 3: Crop each character manually.
Step 4: Label each character.
End of Algorithm 1
28. Phase 2: Preprocessing steps applied to each character.
Start of Algorithm 2
Step 1: Resize the image to 30×30 pixels.
Step 2: Convert the image to grayscale.
Step 3: Invert the pixel intensity.
Step 4: Add 2 pixels of padding on all sides.
End of Algorithm 2
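Algorithm 2 can be sketched in plain Python as below. Assumptions not stated in the original: the scanned crop is a grid of 8-bit (R, G, B) tuples with dark ink on white paper, resizing uses nearest-neighbour sampling, grayscale uses the common ITU-R BT.601 luminance weights, and padding uses 0, which matches the background once intensities are inverted. The function names are hypothetical, not the MATLAB routines used in this work.

```python
def resize_nearest(image, out_h=30, out_w=30):
    """Nearest-neighbour resize of a 2-D grid of pixels (any pixel type)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def to_grayscale(rgb_image):
    """BT.601 luminance (0.299 R + 0.587 G + 0.114 B), rounded, in integer math."""
    return [[(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
            for row in rgb_image]


def invert(gray_image):
    """Make ink bright and paper dark: v -> 255 - v for 8-bit pixels."""
    return [[255 - v for v in row] for row in gray_image]


def pad(gray_image, width=2, value=0):
    """Add `width` pixels of `value` on all four sides."""
    full_width = len(gray_image[0]) + 2 * width
    return ([[value] * full_width for _ in range(width)]
            + [[value] * width + row + [value] * width for row in gray_image]
            + [[value] * full_width for _ in range(width)])


def preprocess(rgb_image):
    """Resize -> grayscale -> invert -> pad, giving a 34 x 34 image."""
    return pad(invert(to_grayscale(resize_nearest(rgb_image))))


# Hypothetical 45 x 60 scanned crop: white paper with a dark horizontal stroke.
scan = [[(255, 255, 255)] * 60 for _ in range(45)]
for r in range(20, 25):
    scan[r] = [(0, 0, 0)] * 60
out = preprocess(scan)
print(len(out), len(out[0]))  # 34 34
```

The output size follows from the steps: 30 pixels plus 2 pixels of padding on each side gives 30 + 2 × 2 = 34.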
29. Phase 3: Configure the network layers and train on the dataset.
Phase 4: Training process
30. Phase 5: Calculate accuracy
Phase 6: Select one character and recognize it as a specific character.
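Phase 5 can be illustrated with small helpers that compute overall and per-character accuracy on the test set. The labels below are made-up examples, not results from this work, and the helper names are hypothetical.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Percentage of test images whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def per_class_accuracy(predicted, actual):
    """Accuracy (%) for each true character label separately."""
    totals, hits = Counter(), Counter()
    for p, a in zip(predicted, actual):
        totals[a] += 1
        hits[a] += (p == a)
    return {label: 100.0 * hits[label] / totals[label] for label in totals}


# Hypothetical predictions for five test images of "ક" and "સ".
actual = ["ક", "ક", "સ", "સ", "ક"]
predicted = ["ક", "સ", "સ", "સ", "ક"]
print(accuracy(predicted, actual))  # 80.0
print(per_class_accuracy(predicted, actual))
```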
31. Character: “સ”
Phase 1: Scan the handwritten document, crop each character manually, and label each character.
Phase 2: Preprocessing steps applied to each character.
32. Phase 3: Configure the network layers and train on the dataset.
Phase 4: Training process
33. Phase 5: Calculate accuracy
Phase 6: Select one character and recognize it as a specific character.
39. Conclusion
We created a new dataset of handwritten characters written by different people with different writing styles. It consists of 1360 images of 34 Gujarati characters. Vowels combined with characters were also explored and implemented. In future work, we will recognize the remaining vowels, collect more data, and try to find improvements with different CNN layer configurations.
40. Work Plan
Year | Time Duration | Work
2017 | May – June | Theoretical Study
2017 | July – August | Literature Survey
2017 | September – October | Problem Statement
2017 | November | Proposed Approach
2017–2018 | December – February | Implementation
2018 | March | Testing
2018 | April | Improvement & Thesis Documentation
41. References
[1] http://shodhganga.inflibnet.ac.in/bitstream/10603/123978/10/10_chapter%204.pdf (visited on 13 November 2017).
[2] K. S. Siddharth, M. Jangid, R. Dhir, and R. Rani, “Handwritten Gurmukhi Character Recognition Using Statistical and Background Directional Distribution Features,” vol. 3, no. 6, pp. 2332–2345, 2011.
[3] M. V. Beigi, “Handwritten Character Recognition Using BP NN, LAMSTAR NN and SVM,” 2015.
[4] S. Acharya, A. K. Pant, and P. K. Gyawali, “Deep Learning Based Large Scale Handwritten Devanagari Character Recognition,” 2015.
[5] K. Peymani and M. Soryani, “From machine generated to handwritten character recognition; a deep learning approach,” 2017 3rd Int. Conf. Pattern Recognit. Image Anal. (IPRIA), pp. 243–247, 2017.
[6] S. G. Trivedi and A. Nandurbarkar, “Offline Handwritten Character Recognition for Gujarati Language,” pp. 136–139, May 2017.
[7] S. J. Macwan, “Classification of Offline Gujarati Handwritten Characters,” pp. 1535–1541, 2015.
[8] G. Mehul, P. Ankita, D. Namrata, G. Rahul, and S. Sheth, “Text-Based Image Segmentation Methodology,” Procedia Technol., vol. 14, pp. 465–472, 2014.
[9] J. R. Prasad, “Image Normalization and Preprocessing for Gujarati Character Recognition,” vol. 3, no. 5, pp. 334–339, 2014.
[10] A. Sharma, P. Thakkar, D. M. Adhyaru, and T. H. Zaveri, “Features Fusion based Approach for Handwritten Gujarati Character Recognition,” vol. 5, 2016.
[11] N. Nur, B. Amir, and A. Hamid, “Handwritten Recognition Using SVM, KNN and Neural Network.”
42. References (continued)
[12] D. Berchmans, “Optical Character Recognition: An Overview and an Insight,” pp. 1361–1365, 2014.
[13] A. A. Desai, “Gujarati handwritten numeral optical character reorganization through neural network,” Pattern Recognit., vol. 43, no. 7, pp. 2582–2589, 2010.
[14] J. R. Prasad and U. V. Kulkarni, “Gujrati Character Recognition Using Adaptive Neuro Fuzzy Classifier,” 2014 Int. Conf. Electron. Syst. Signal Process. Comput. Technol., pp. 402–407, 2014.
[15] N. Mehta, “A Review of Handwritten Character Recognition,” vol. 165, no. 4, pp. 37–40, 2017.
[16] S. B. Suthar, R. S. Goradia, B. N. Dalwadi, and S. M. Patel, “Performance Scrutiny of Thinning Algorithms on Printed Gujarati Characters and Handwritten Numerals,” pp. 261–269, 2018.
[17] M. B. Mendapara, “Stroke Identification in Gujarati Text using Directional Feature,” IEEE, 2014.
[18] A. A. Desai, “Support vector machine for identification of handwritten Gujarati alphabets using hybrid feature space,” 2015.
[19] N. R. Soora and P. S. Deshpande, “Review of Feature Extraction Techniques for Character Recognition,” vol. 2063, July 2017.
[20] J. Rajesh and P. Uday, “Gujrati character recognition using weighted k-NN and Mean χ² distance measure,” no. 123, 2013.
[21] A. Marial, “Feature Extraction of Optical Character Recognition: Survey,” vol. 12, no. 7, pp. 1129–1137, 2017.
[22] “A Study of Different Methodologies Helpful in the Identification of Offline Handwritten Script,” vol. 9359, no. 6, pp. 307–310, 2017.
[23] G. Mehul, P. Ankita, D. Namrata, G. Rahul, and S. Sheth, “Text-Based Image Segmentation Methodology,” Procedia Technol., vol. 14, pp. 465–472, 2014.
[24] N. Nur, B. Amir, and A. Hamid, “Handwritten Recognition Using SVM, KNN and Neural Network.”
[25] V. Patel and A. Pandya, “A Survey on Gujarati Handwritten OCR using Morphological Analysis,” vol. 2, no. 2, pp. 2395–1990, 2016.
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in Neural Information Processing Systems, pp. 1–9, 2012.