Bangladesh Army University of Science and Technology
(BAUST)
Department of Computer Science and Engineering
Assignment #1, Winter 2023 Level-4 Term-I
Course Code: CSE 4131 Course Title: Artificial Neural Networks and Fuzzy
Systems
Submission Date: CO Number: CO2 Full Marks: 15
ID: 200101103 Name: Khondoker Abu Naim
Bangla Handwritten Digit Recognition
1. Introduction
Bangla handwritten digit recognition is a classical problem in the field of computer vision. The
system has many practical applications, such as OCR, postal code recognition, license plate
recognition, and bank check recognition, so recognizing Bangla digits in documents is becoming
more important. There are 10 unique Bangla digits, so the recognition task is to classify an input
into one of 10 classes. The critical difficulty of handwritten digit recognition is that every person
has their own writing style, so the same digit can look very different from writer to writer. Our
contribution addresses a more challenging task: achieving robust performance and high accuracy
on the large, unbiased, unprocessed, and highly augmented “bangla-digit” dataset. The dataset is a
combination of ten class datasets that were gathered from different sources and at different times,
containing blurring, noise, rotation, translation, shear, zooming, height/width shift, brightness,
contrast, occlusions, and superimposition. We have not processed every kind of augmentation in
this dataset; we have mainly processed the blurred and noisy images. The processed images are
then classified by a deep convolutional neural network (CNN).
2. Literature Review
2.1 Method 1
Proposed Method:
The purpose of OCR is to recognize characters in images of text documents and map them to
computer-readable character codes that can be used for further text processing. A typical
workflow for recognizing characters from document images is shown in FIG. It includes the
following steps:
1) Preprocessing: The input image goes through a series of preprocessing steps. The purpose of
preprocessing is to allow the OCR engine to work with greater accuracy. This can be achieved
through the following operations.
a) Binarization: The document image is thresholded to convert the grayscale image to a
binary image. Image thresholding can be global or local (adaptive). Global image
thresholding uses only one threshold for the entire image, whereas local (adaptive)
thresholding uses different thresholds for different image segments according to local
information.
b) Noise Reduction: Noise reduction improves image quality. Two common approaches are
used for noise reduction: 1) image filtering, such as the Wiener filter, Gaussian filter, and
median filter, and 2) morphological operations such as erosion and dilation.
c) Normalization: Normalizing inter-user and intra-user variability due to character size or
choice of font family such as boldface is always a good idea. Common normalization steps
include stroke width normalization or thinning, and normalization of aspect ratio and size of
the image.
d) Skew correction: Skew correction methods are employed in order to align the image
document. Major approaches for skew detection include correlation, projection profiles, and
Hough transform.
e) Slant removal: The slant of handwritten text is user dependent. The slant elimination method
is used to reduce variability due to different typefaces and normalize all characters to a
canonical form.
2) Segmentation: The purpose of image segmentation in OCR systems is to extract isolated
characters from document images. The segmentation step includes the following operations: text line
detection, word extraction, and character segmentation. Segmentation is usually performed in a
top-down manner: line segmentation is performed first, then word segmentation, then character
segmentation.
3) Feature Extraction: In the feature extraction step, the segmented characters are transformed into a
set of features called feature vectors. Each character is represented by its feature vector. Feature
extraction provides dimensionality reduction, extracting the relevant information from character
images to facilitate better separation and identification of different characters in feature space.
4) Classification: Classification schemes provide decision rules for identifying characters based on
feature vectors. This task can be accomplished with machine learning approaches such as
Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Hidden Markov Models (HMM),
Support Vector Machines (SVM), and standard classifiers.
5) Post-processing: Dictionary-based approaches and context can be used to improve recognition
rates, for example to correct spelling errors and select valid words.
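The difference between global and local (adaptive) thresholding in step 1a can be sketched with NumPy. This is only an illustration, not the reviewed paper's implementation: the function names, the global threshold of 128, and the 3x3 local-mean rule are all assumptions chosen for the example.

```python
import numpy as np

def global_threshold(gray, t=128):
    """Binarize with a single threshold for the whole image (t is an assumed value)."""
    return (gray > t).astype(np.uint8)

def local_mean_threshold(gray, block=3):
    """Binarize each pixel against the mean of its block x block neighbourhood."""
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    h, w = gray.shape
    local_sum = np.zeros((h, w), dtype=np.float64)
    # Sum the shifted copies of the image to get each pixel's neighbourhood sum.
    for dy in range(block):
        for dx in range(block):
            local_sum += padded[dy:dy + h, dx:dx + w]
    local_mean = local_sum / (block * block)
    return (gray > local_mean).astype(np.uint8)

# Unevenly lit test image: dark left half, bright right half, one stroke pixel in each.
gray = np.full((5, 6), 10, dtype=np.uint8)
gray[:, 3:] = 150
gray[2, 1] = 60     # stroke in the dark half
gray[2, 4] = 220    # stroke in the bright half

global_result = global_threshold(gray)
local_result = local_mean_threshold(gray)
```

With one global threshold the stroke in the dark half is lost, while the local rule keeps both strokes because each pixel is compared only with its own surroundings.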
Result: The recognition accuracy for different feature sets depends on the dimension of the zones.
For the original 32×32 image (without zoning), the recognition accuracy was rather poor at 78.5%.
Applying zoning to the character image improves the recognition accuracy significantly: 16×16
zoning gives an accuracy of 86.5%, and 8×8 zoning gives the best accuracy of 94.0%. If the
dimensionality of the zone is reduced further, the accuracy drops; for example, 4×4 zoning gives an
accuracy of 89.2%. This result reflects the fact that while zoning can help reduce feature
dimensionality, as discussed in Section III, excessive feature reduction can reduce recognition
accuracy. For the best-performing feature set (8×8 zones), the recognition accuracy was also
calculated for each digit separately. This shows that digits such as 4 and 8 have very high accuracy,
while other digits such as 3 and 5 have relatively poor recognition accuracy.
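A minimal sketch of zoning on a 32×32 image follows. It assumes that "8×8 zoning" means an 8×8 grid of zones and that each zone contributes its mean pixel intensity as one feature; the text above does not state either detail, so `zoning_features` is a hypothetical helper rather than the paper's method.

```python
import numpy as np

def zoning_features(image, grid):
    """Mean intensity of each cell in a grid x grid partition of the image."""
    h, w = image.shape
    if h % grid or w % grid:
        raise ValueError("image size must be divisible by the grid size")
    zone_h, zone_w = h // grid, w // grid
    # Axis 0/2 index the zone, axis 1/3 index the pixel inside the zone.
    zones = image.reshape(grid, zone_h, grid, zone_w)
    return zones.mean(axis=(1, 3)).ravel()

image = np.zeros((32, 32))
image[:4, :4] = 1.0   # a small blob in the top-left corner

for grid in (16, 8, 4):
    features = zoning_features(image, grid)
    print(f"{grid}x{grid} zoning gives {features.size} features")
```

Under these assumptions the feature vector shrinks from 1024 raw pixels to 256, 64, and 16 values, which illustrates why very coarse zoning can discard too much information.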
Limitation: For Bangla numeral recognition, the method demonstrates an excellent result with 94%
overall accuracy. This result is very promising, and is likely to improve if pre-processing techniques
such as normalization, skew correction, and slant removal are applied. Further improvement may be
achieved with features specific to the Bangla digits and with variants of the sparse representation
classifier (SRC) such as regularized SRC and kernel SRC. Comparison with other conventional
classifiers should be considered in future as a continuation of this work. The results should also be
verified on other standard handwritten character databases such as the ISI database of handwritten
Bangla numerals.
2.2 Method 2
Proposed Method: The CNN model proposed here, called "MathNET", has several phases, as
described below.
Dataset: The model uses 6,000 digit images (0-9) from 'Ekush' [8] together with 26,400 images
covering 44 other classes of mathematical symbols. These 44 handwritten symbols were collected
by 500 students. Image clarity depends on character size. Each image has white text on a black
background with little padding. The images in this dataset have an undistorted size of 28 x 28
pixels, and the edges of the characters appear blurred. The two datasets are then concatenated to
get a final dataset with a total of 32,400 images.
Preparation of dataset: In deep learning, the variety of data inside the dataset is very important.
The images are resized to 28x28 px, the unnecessary black pixels are removed, and the whole
dataset is converted to CSV format for fast computation. Every row of the dataset has 785 columns:
28 x 28 = 784 columns contain the pixel values that represent the image, and the 785th column
stores the label (class) of the digit or symbol.
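The 785-column row layout can be illustrated with the standard library alone. The helper names `image_to_row` and `row_to_image` are hypothetical, and placing the label in the last column follows the description above.

```python
import csv
import io

def image_to_row(pixels, label):
    """Flatten a 28x28 image (a list of 28 rows) and append the class label."""
    if len(pixels) != 28 or any(len(r) != 28 for r in pixels):
        raise ValueError("expected a 28x28 image")
    row = [value for r in pixels for value in r]   # 784 pixel columns
    row.append(label)                              # column 785: the label
    return row

def row_to_image(row):
    """Inverse of image_to_row: rebuild the 28x28 image and return it with its label."""
    pixels = [list(row[i * 28:(i + 1) * 28]) for i in range(28)]
    return pixels, row[784]

blank = [[0] * 28 for _ in range(28)]
blank[3][5] = 255
row = image_to_row(blank, 7)

# Write the row as one CSV line, as it would appear in the dataset file.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
```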
The model uses max-pooling layers and fully connected Dense layers, with Dropout [9] as the
regularization method. The first two convolutional layers have 32 filters each with kernel_size (5,5),
and use the ReLU activation function with padding = `same`. The output of dropout_1 is the input
to layers conv2d_3 and conv2d_4. The max_pooling2d_2 layer takes the output of conv2d_3 and
conv2d_4 and passes its output to the dropout_2 layer (25% dropout). After these 8 operations, the
output goes through the flatten_1 layer and is connected to a dense_1 layer with 256 hidden units.
The MathNET model uses the RMSprop [10] [11] optimizer and sets the learning rate value to 0.
The RMSprop optimizer is similar to gradient descent with momentum. It limits the oscillations
in the vertical direction.
As a result, the learning rate can be increased, and the algorithm takes bigger steps in the
horizontal direction, converging more rapidly.
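The RMSprop update can be sketched in plain Python. This is the standard textbook form of the rule, not MathNET's code; the learning rate, decay factor `rho`, `eps`, and the toy quadratic are assumed values for illustration only.

```python
import math

def rmsprop_step(params, grads, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update: scale each gradient by a running RMS of its history."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = rho * c + (1.0 - rho) * g * g            # running average of g^2
        new_cache.append(c)
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
    return new_params, new_cache

def grad(params):
    """Gradient of the toy loss f(x, y) = 0.5 * (x^2 + 100 * y^2)."""
    x, y = params
    return [x, 100.0 * y]

# First step: the steep y direction has a gradient 100 times larger than x,
# yet both coordinates move by almost the same amount after normalization.
first_params, first_cache = rmsprop_step([1.0, 1.0], grad([1.0, 1.0]), [0.0, 0.0], lr=0.01)

params, cache = [1.0, 1.0], [0.0, 0.0]
for _ in range(2000):
    params, cache = rmsprop_step(params, grad(params), cache, lr=0.01)
```

Dividing by the running root-mean-square damps the direction with large gradients (the "vertical" oscillation) and leaves the step size along flatter directions comparatively larger.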
A CNN model works better when it sees a lot of data during training. This is where data
augmentation comes in: it generates artificial data, which helps avoid overfitting of the model.
Several augmentation methods were chosen: zoom_range set to 0.1, random horizontal shift of
images by 0.1, and random vertical shift of images by 0.1.
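A random shift of up to 0.1 of the image size can be sketched in NumPy as below. This is an assumption-laden illustration: the helper names are hypothetical, uncovered pixels are filled with zeros, and zoom is left out because it needs interpolation.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift an image by (dy, dx) pixels, filling uncovered pixels with zeros."""
    h, w = image.shape
    out = np.zeros_like(image)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def random_shift(image, max_fraction=0.1, rng=None):
    """Shift by a random offset of at most max_fraction of the height and width."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    return shift_image(image, dy, dx)

digit = np.zeros((28, 28))
digit[10, 10] = 1.0
moved = shift_image(digit, 2, -3)                              # down 2, left 3
augmented = random_shift(digit, 0.1, np.random.default_rng(0))
```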
Result:
Limitation: Judging from the errors on the given test set, it can be said that MathNET successfully
recognized 97% of the images in the test data. Fig. 4 shows the top 6 errors. These happened
because of wrongly labeled data in the test set. Some of the errors are also confusing to us and
could equally have been made by a human.
2.3 Method 3
Proposed Method: The digit recognition process is mainly divided into three main parts:
preprocessing, feature extraction, and classification.
Preprocessing: The steps performed before feature extraction are called preprocessing. The purpose
of preprocessing is to improve the image data by suppressing unwanted distortions or enhancing
important image features for further processing. Preprocessing steps include image acquisition,
binarization, denoising, skew detection, segmentation, and scaling.
1) Image Capture: Any device with a camera or scanner can capture images [4]. Images from PDF
files can also be imported into the system. The image is a single digit or a series of digits
collected from license plates, bank checks, zip codes, etc.
2) Image binarization: RGB images are converted to grayscale before binarization. Binarization
is performed with a global threshold determined by Otsu's thresholding method.
3) Denoising: Denoising is performed to reduce the possibility of misclassification due to poor
image quality. Here a median filter is used for noise reduction [6]. This is a commonly used
smoothing method.
4) Skew Detection: Skew is usually caused by the image being placed at an angle when it is
captured. Skew is usually removed by rotating the image by an angle opposite to the estimated
skew.
5) Segmentation: Segmentation includes line segmentation and character segmentation. Line
segmentation is performed by scanning the image horizontally and counting the white pixels in
each row of the original image. Next, digit segmentation is performed by scanning each line
vertically; gaps between digits are detected, and the subimages are saved.
6) Scaling: To compare feature vectors, all digits must be scaled to a fixed size. As the size of
the image increases, more features can be extracted, which increases accuracy, but the memory
requirements and the time taken also increase. In contrast, a smaller image has fewer features,
resulting in lower accuracy. All images are scaled to 32x32 matrices to balance feature size and
processing time.
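The line and digit segmentation by horizontal and vertical scanning can be sketched with projection profiles. The sketch assumes a binary image in which 1 marks ink (the "white" pixels) and 0 marks background; `find_runs` and `segment_digits` are hypothetical names, not taken from the paper.

```python
import numpy as np

def find_runs(profile):
    """Return (start, end) index pairs for consecutive non-zero profile entries."""
    runs, start = [], None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # a gap ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_digits(binary):
    """Split a binary image (1 = ink) into one subimage per digit."""
    digits = []
    for top, bottom in find_runs(binary.sum(axis=1)):     # rows: text lines
        line = binary[top:bottom]
        for left, right in find_runs(line.sum(axis=0)):   # columns: digits
            digits.append(line[:, left:right])
    return digits

page = np.zeros((7, 10), dtype=np.uint8)
page[1:3, 1:3] = 1    # first digit on line 1
page[1:3, 5:8] = 1    # second digit on line 1
page[5:6, 2:4] = 1    # single digit on line 2
digits = segment_digits(page)
```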
Result: The results summarized in Table 3 show that linear SVM works very efficiently for a very
large number of training features. But if the feature size is smaller than the number of
observations, then the RBF or polynomial kernel is preferred because it fits this kind of dataset
properly, resulting in higher accuracy than linear SVM.
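For reference, the three kernel functions being compared can be written out directly in their usual textbook form. The parameter defaults (`degree`, `gamma`, `coef0`) are placeholder assumptions, not the values used in the paper.

```python
import math

def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y), polynomial_kernel(x, y, degree=2), rbf_kernel(x, y, gamma=0.1))
```

The linear kernel compares features directly, while the polynomial and RBF kernels implicitly map them to a higher-dimensional space, which is why they can fit data that is not linearly separable in the original features.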
Limitation: In this paper, the comparative performance of three well-known kernels of the SVM
classification algorithm has been investigated to find the appropriate kernel function for the
sample dataset of Bangla handwritten digits used. Experimental results show that, using HOG
features, handwritten digit recognition reaches at most 97.08% accuracy with the polynomial kernel
function. This performance depends mostly on the preprocessing and feature extraction techniques.
However, the recognition rate can be improved by combining more than one feature extraction
technique.
2.4 Method 4
Proposed Method:
A. Dataset preparation and image preprocessing: In this study we've trained our model with the
recently developed large dataset NumtaDB, which consists of 85000+ images, training initially
with 72040 specimens from the dataset. Before feeding data into the model we've done some image
preprocessing to clean unnecessary features and artifacts as much as possible so the model trains
efficiently. First we converted the images from RGB to grayscale and then reshaped them to
64x64x1 so that all training data has the same dimensions. Then we've applied
Gaussian blur to the image with a standard deviation of 10. After that, the blurred images have been
blended with the grayscale images again using cv2. 5 for blurred image. Preprocessing has been
applied to all train and test images.
B. Image Augmentation: The training data in the provided dataset is cleaner and mostly easy to
read, but the validation and test data contain some of the most challenging cases, meant to
evaluate model performance in very noisy conditions. So we had to artificially generate or
augment our dataset to increase the variation with built-in augmentation and image preprocessing
functions from Keras library initially 0.2, height shift range of 0.2. Later we improved accuracy
by enlarging the main database with more manually generated augmented images for more variation.
Images with salt and pepper noise have been generated using the MATLAB function imnoise().
Blurred, washed-out images with random angles of -35,-30,20,10,20,30,40 degrees have been
generated using cv2. It has applied normalized box filter on image.
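Salt and pepper noise of the kind described here can be approximated in NumPy as follows. This is a rough stand-in for illustration, not MATLAB's imnoise() implementation; the function name and the noise density are assumptions.

```python
import numpy as np

def add_salt_and_pepper(image, density=0.05, rng=None):
    """Set about `density` of the pixels to 0 (pepper) or 255 (salt), half each."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    draws = rng.random(image.shape)
    noisy[draws < density / 2] = 0            # pepper
    noisy[draws > 1 - density / 2] = 255      # salt
    return noisy

clean = np.full((100, 100), 128, dtype=np.uint8)
noisy = add_salt_and_pepper(clean, density=0.1, rng=np.random.default_rng(0))
changed = noisy != clean
print(f"fraction of corrupted pixels: {changed.mean():.3f}")
```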
C. Proposed model for classification: In this method we've experimented with different CNN
models and taken two of our best-performing models (Model A and Model B) for ensembling.
In both models, the first convolutional layer consists of 32 filters with a 5x5 kernel, which
generally extracts low-level features such as vertical and horizontal edges to a greater extent,
followed by a second layer of 32 filters with a 3x3 kernel. After that, Maxpool layers with a 2x2
kernel and a stride of 2 are employed to reduce the features by taking the maximum value, which
greatly cuts computation and overfitting. Similarly, two convolutional layers of 64 filters each
with a 3x3 kernel are added, followed by Maxpool layers with the same configuration as the
previous Maxpool layer. Experimenting with different configurations, we eventually found that
using a slightly wider convolutional layer at the end of Model A offers a slight accuracy boost.
The Rectified Linear Unit (ReLU) activation function is used in every layer, including fully
connected (FC) layers, except the final FC layers in both models. The convolution feature maps
are flattened and connected to an FC layer with 64 neurons. Dropout layers are added before the
final FC layers with value of 0. Finally, FC layers of 10 neurons with the Softmax activation
function are added for classification of the 10 classes, and the models are ensembled by averaging
the final output layers. Same padding is used in all convolutional layers in both models.
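Ensembling by averaging the final softmax outputs can be sketched as below. The probability values are made up for illustration, and `ensemble_predict` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def ensemble_predict(prob_a, prob_b):
    """Average two models' softmax outputs and pick the most likely class."""
    average = (np.asarray(prob_a) + np.asarray(prob_b)) / 2.0
    return average, average.argmax(axis=1)

# One test image, 10 classes. Model A slightly prefers class 1, Model B prefers class 9.
prob_a = np.zeros((1, 10))
prob_a[0, 1], prob_a[0, 9] = 0.55, 0.45
prob_b = np.zeros((1, 10))
prob_b[0, 1], prob_b[0, 9] = 0.30, 0.70

average, labels = ensemble_predict(prob_a, prob_b)
```

Because Model B is more confident than Model A, the averaged distribution favours class 9 even though the two models disagree.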
Result: In this training, 20 percent of 116395 specimens have been used as the validation set for
determining model performance and the other 96116 specimens were used for training. We've also
compared the result with training on the original non-augmented data of size 72000+ to reflect the
performance comparison between our proposed method and other non-augmented machine learning
and feature extraction based approaches. The model has been implemented with python library
namely Keras v2.4. 2 python library and MATLAB image processing toolbox from MATLAB
r2017a have been used for manual augmentation. 50 GHz, RAM: 8.00 GB, Graphics: NVIDIA
GT-940MX, 2GB) and Google Colaboratory [24] Cloud platform with NVIDIA Tesla K-80 GPU
and 12GB RAM support. We've tested with various numbers of iterations and found that a minimum
of 30 and 6 epochs are needed to get the maximum performance in Model A and Model B respectively.
We conclude that our proposed method performs worst in detecting numeral '১', which is
misclassified in 57 cases among 1750 specimens. That means the lowest accuracy, 96.74%. After
that, numeral ‘৯’ has the second lowest detection rate with 97.360% accuracy. Our proposed model
confuses these two numerals, misclassifying '১' as '৯' 26 times and ‘৯’ as ‘১’ 25 times, because
numerals '১' and '৯' can sometimes be a bit confusing even to human eyes, depending on the test
case. Numeral '৪' has the highest detection rate, misclassified in only 25 specimens among 1774
test cases (98.59% accuracy). Some of the test specimens are very confusing even for human
eyes. Among the 17760 specimens, only 570 test cases are misclassified. Most of the misclassified
specimens are heavily augmented, noisy data. The model has 96.788% accuracy.
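The per-digit accuracies quoted above follow directly from the misclassification counts. A small check, using only the counts given in the text (the helper name is hypothetical):

```python
def accuracy_percent(misclassified, total):
    """Accuracy in percent, given the number of misclassified specimens."""
    return 100.0 * (total - misclassified) / total

cases = [
    ("numeral 1", 57, 1750),
    ("numeral 4", 25, 1774),
    ("all test specimens", 570, 17760),
]
for name, wrong, total in cases:
    print(f"{name}: {accuracy_percent(wrong, total):.2f}%")
```

The first two values reproduce the 96.74% and 98.59% stated in the text; the overall count of 570 out of 17760 gives roughly 96.79%, close to the reported 96.788%.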
Limitation: 98.98% accuracy with image augmentation though he had test images which were not
so noisy. Our proposed model outperforms the previous works on clear images, where we have
achieved 99.2, and also achieves very good accuracy, beyond 90%, on noisy, highly augmented
specimens. Before augmentation, the accuracy was very low for tilted, random box noise, and
color-shifted specimens. For tilted images the accuracy has been only 13. After augmentation,
accuracy has jumped to 95. As shown in Table VI, our proposed model outperforms some
well-established models like ResNet-18 and LeNet-5. We also compared our model with another
ensemble technique, in which we've trained Model A with 5-fold cross-validation to get 5 different
models with the same architecture, but our proposed model outperforms it as well.
Result and Analysis
Compare the results of all four methods and write your analysis.
3. Conclusion
References
[1] Khan, Haider Adnan, Abdullah Al Helal, and Khawza I. Ahmed. "Handwritten bangla
digit recognition using sparse representation classifier." 2014 International Conference on
Informatics, Electronics & Vision (ICIEV). IEEE, 2014.
[2] Shuvo, Shifat Nayme, et al. "MathNET: using CNN bangla handwritten digit,
mathematical symbols, and trigonometric function recognition." Soft Computing Techniques
and Applications: Proceeding of the International Conference on Computing and
Communication (IC3 2020). Springer Singapore, 2021.
[3] Rehana, Hasin. "Bangla handwritten digit classification and recognition using SVM
algorithm with HOG features." 2017 3rd International Conference on Electrical Information
and Communication Technology (EICT). IEEE, 2017.
[4] Noor, Rouhan, Kazi Mejbaul Islam, and Md Jakaria Rahimi. "Handwritten bangla
numeral recognition using ensembling of convolutional neural network." 2018 21st
international conference of computer and information technology (ICCIT). IEEE, 2018.

 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 

Recently uploaded (20)

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 

Bangla Handwritten Digit Recognition Methods CNN

Bangladesh Army University of Science and Technology (BAUST)
Department of Computer Science and Engineering
Assignment #1, Winter 2023 Level-4 Term-I
Course Code: CSE 4131 Course Title: Artificial Neural Networks and Fuzzy Systems
Submission Date: CO Number: CO2 Full Marks: 15
ID: 200101103 Name: Khondoker Abu Naim

Bangla Handwritten Digit Recognition

1. Introduction
Bangla handwritten digit recognition is a classical problem in computer vision. It has many practical applications, such as OCR, postal code recognition, license plate recognition, and bank check recognition, and recognizing Bangla digits in documents is becoming increasingly important. There are 10 unique Bangla digits, so the recognition task is to classify an image into one of 10 classes. The critical difficulty in handwritten digit recognition is that every person has a distinct writing style. Our contribution addresses a more challenging task: achieving robust performance and high accuracy on the large, unbiased, unprocessed, and highly augmented “bangla-digit” dataset. The dataset is a combination of ten class datasets gathered from different sources at different times, and it contains blurring, noise, rotation, translation, shear, zooming, height/width shift, brightness, contrast, occlusions, and superimposition. We have not processed every kind of augmentation in this dataset; we have mainly processed blurred and noisy images. The processed images are then classified by a deep convolutional neural network (CNN).

2. Literature Review

2.1 Method 1
Proposed Method:
The purpose of OCR is to recognize and identify characters in images of text documents and map them to computer-readable character codes that can be used for further text processing. A typical workflow for recognizing characters from image documents is shown in FIG.
This includes the following steps:
1) Preprocessing: The input image goes through a series of preprocessing steps. The purpose of preprocessing is to allow the OCR engine to work with greater accuracy. This can be achieved through a series of operations.
a) Binarization: The document image is thresholded to convert the grayscale image to a binary image. Image thresholding can be global or local (adaptive). Global thresholding uses a single threshold for the entire image, whereas local (adaptive) thresholding uses different thresholds for different image segments according to local information.
b) Noise reduction: Noise reduction improves image quality. Two approaches are commonly used: 1) image filtering, such as the Wiener filter, Gaussian filter, and median filter, and 2) morphological operations, such as erosion and dilation.
c) Normalization: It is good practice to normalize inter-user and intra-user variability caused by character size or by the choice of font style, such as boldface. Common normalization steps include stroke width normalization or thinning, and normalization of the aspect ratio and size of the image.
d) Skew correction: Skew correction methods are employed to align the document image. Major approaches to skew detection include correlation, projection profiles, and the Hough transform.
e) Slant removal: The slant of handwritten text is user dependent. Slant removal is used to reduce the variability due to different writing styles and to normalize all characters to a canonical form.
2) Segmentation: The purpose of image segmentation in OCR systems is to extract isolated characters from document images. The segmentation step includes the following operations: text line detection, word extraction, and character segmentation. Segmentation is usually performed in a top-down manner: line segmentation is performed first, then word segmentation, then character segmentation.
3) Feature Extraction: In the feature extraction step, the segmented characters are transformed into sets of features called feature vectors. Each character is represented by its feature vector. Feature extraction provides dimensionality reduction, extracting the relevant information from character images so that different characters are better separated and identified in feature space.
4) Classification: Classification schemes provide decision rules for identifying characters based on feature vectors.
This task can be accomplished by leveraging machine learning approaches such as Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Hidden Markov Models (HMM), Support Vector Machines (SVM), and other standard classifiers.
5) Post-processing: Dictionary-based approaches and context can be used to improve recognition rates, for example by correcting spelling errors and selecting valid words.

Result:
The recognition accuracy for different feature sets depends on the dimension of the zone. For the original 32×32 image (without zoning), the recognition accuracy was rather poor at 78.5%. Applying zoning to the character image improves the recognition accuracy significantly: 16×16 zoning gives a recognition accuracy of 86.5%, and 8×8 zoning gives the best accuracy of 94.0%. If the dimensionality of the zone is reduced further, the accuracy drops; for example, 4×4 zoning gives an accuracy of 89.2%. This result reflects the fact that while zoning helps reduce feature dimensionality, as discussed in Section III of the original paper [1], excessive feature reduction can reduce recognition accuracy. For the best-performing feature set (8×8 zones), the recognition accuracy was also calculated for each digit separately. This shows that digits such as 4 and 8 have very high accuracy, while other digits such as 3 and 5 have relatively poor recognition accuracy.

Limitation:
For Bangla numeral recognition, the method demonstrates an excellent result with 94% overall accuracy. This result is very promising and is likely to improve if preprocessing techniques such as normalization, skew correction, and slant removal are applied. Further improvement may be achieved with features specific to the Bangla digits and with variants of SRC such as regularized SRC and kernel SRC. Comparison with other conventional classifiers should be considered in the future as a continuation of this work. The results should also be verified on other standard handwritten character databases, such as the ISI database of handwritten Bangla numerals.

2.2 Method 2
Proposed Method:
The CNN model proposed here, called "MathNET", has several phases, as described below.

Dataset: The dataset combines 6,000 digit images (0-9) from 'Ekush' [8] with 26,400 images covering 44 other classes of mathematical symbols. These 44 handwritten symbols were collected from 500 students. Image clarity depends on character size. Each image has only a small amount of black background padding, and the text is white. The images in this dataset have an undistorted size of 28 x 28 pixels, and the edges of the images appear blurred. The two datasets are then concatenated to obtain a final dataset with a total of 32,400 images.

Preparation of dataset: In deep learning, the variety of data inside the dataset is very important. The images are resized to 28x28 px, unnecessary black pixels are removed, and the whole dataset is converted to CSV format for faster computation. Every row of the dataset has 785 columns: 28 x 28 = 784 columns contain the pixel values that represent the image, and the 785th column stores the label (class) of the digit or symbol.

The model has max-pooling layers and fully connected (dense) layers, and it uses Dropout [9] for regularization. The first two convolutional layers have 32 filters each with kernel_size (5,5) and use the ReLU activation function with padding = `same`. The output of dropout_1 is the input to layers conv2d_3 and conv2d_4. The max_pooling2d_2 layer takes its input from conv2d_3 and conv2d_4 and passes its output to the dropout_2 layer, which has a 25% dropout rate.
After these 8 operations, the output passes through the flatten_1 layer and is connected to a dense_1 layer with 256 hidden units. The MathNET model uses the RMSprop [10] [11] optimizer and set learning rate value to 0. The RMSprop optimizer is similar to gradient descent with momentum. It restricts the oscillations in the vertical direction, which allows a larger learning rate, so the algorithm can take bigger steps in the horizontal direction and converge faster.

A CNN model works better when it sees a lot of data during training. This is where data augmentation comes in: it generates artificial data and helps avoid overfitting the model. The augmentation methods chosen are: zoom_range set to 0.1, random horizontal shift of images by 0.1, and random vertical shift of images by 0.1.

Result:

Limitation:
Evaluated on the given test set, MathNET successfully recognizes 97% of the test images. Fig. 4 shows the top 6 errors. These errors occur because of wrongly labeled data in the test set. Some of the errors are confusing even to us and could also be made by a human.
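The RMSprop behaviour described for MathNET above can be illustrated with a minimal pure-Python sketch of the standard RMSprop update rule. This is not the MathNET training code: the function name `rmsprop_minimize`, the hyperparameters (`lr`, `rho`, `eps`, `steps`), and the toy quadratic loss are all assumptions chosen for illustration. The loss is steep along `y` and shallow along `x`, which mimics the "vertical oscillation" case: dividing each gradient by a running root-mean-square of its recent values gives both directions steps of similar size.

```python
import math

def rmsprop_minimize(grad_fn, start, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    """Run the standard RMSprop update on a list of parameters.

    All hyperparameter defaults are illustrative assumptions,
    not values taken from the MathNET paper.
    """
    params = list(start)
    avg_sq = [0.0] * len(params)  # running average of squared gradients
    for _ in range(steps):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            # Exponential moving average of the squared gradient.
            avg_sq[i] = rho * avg_sq[i] + (1.0 - rho) * g * g
            # Scale the step by the root of that average, per parameter.
            params[i] -= lr * g / (math.sqrt(avg_sq[i]) + eps)
    return params

def loss(p):
    """Toy loss: shallow along x, 100 times steeper along y (hypothetical)."""
    x, y = p
    return 0.5 * (x * x + 100.0 * y * y)

def grad(p):
    """Analytic gradient of the toy loss."""
    x, y = p
    return [x, 100.0 * y]

start = [5.0, 1.0]
final = rmsprop_minimize(grad, start)
print("initial loss:", loss(start))
print("final loss:", loss(final))
```

Because each parameter's step is normalized by its own gradient history, the steep `y` direction does not force a tiny global learning rate, which is the intuition behind the claim that RMSprop damps oscillations while still making progress along the shallow direction.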
2.3 Method 3
Proposed Method:
The digit recognition process is divided into three main parts: preprocessing, feature extraction, and classification.

Preprocessing: The steps performed before feature extraction are called preprocessing. The purpose of preprocessing is to improve the image data by suppressing unwanted distortions or enhancing image features that are important for further processing. Preprocessing steps include image acquisition, binarization, denoising, skew detection, segmentation, and scaling.
1) Image Capture: Any device with a camera or scanner can capture images [4]. Images from PDF files can also be imported into the system. The image is a single digit or a series of digits collected from license plates, bank checks, zip codes, etc.
2) Image binarization: RGB images are converted to grayscale before binarization. Binarization is performed with a single global threshold obtained using Otsu's threshold method.
3) Denoising: Denoising is performed to reduce the possibility of misclassification due to poor image quality. Here a median filter is used for noise reduction [6]. This is a commonly used smoothing method.
4) Skew Detection: Skew is usually caused by the image being placed at an angle when it is captured. Skew is usually removed by rotating the image by an angle opposite to the estimated skew value.
5) Segmentation: Segmentation includes line splitting and character splitting. Line segmentation is performed by scanning the image horizontally and counting the frequency of white pixels in each row. Next, digit segmentation is performed by scanning each line vertically; gaps between digits are detected, and the subimages are saved.
6) Scaling: To compare feature vectors, all digits must be scaled to a fixed size. As the size of the image increases, more features can be extracted, which increases accuracy, but the memory requirements and the time taken also increase. In contrast, a smaller image has fewer features, resulting in lower accuracy.
All images are scaled to 32x32 matrices to balance the feature size and processing time.

Result:
The results, summarized in Table 3, show that linear SVM works very efficiently when the number of training features is very large. If the feature size is smaller than the number of observations, however, the RBF or polynomial kernel is preferred because these kernels fit this kind of dataset properly, resulting in higher accuracy than linear SVM.

Limitation:
In this paper, the comparative performance of three well-known kernels of the SVM classification algorithm has been investigated to find the appropriate kernel function for the sample dataset of Bangla handwritten digits used. Experimental results show that, using HOG features, handwritten digit recognition reaches at most 97.08% accuracy with the polynomial kernel function. This performance depends mostly on the preprocessing and feature extraction techniques. However, the recognition rate can be improved by combining more than one feature extraction technique.

2.4 Method 4
Proposed Method:
A. Dataset preparation and image preprocessing: In this study, we have trained our model with the recently developed large dataset NumtaDB, which consists of 85000+ images, and initially trained with 72040 specimens from the dataset. Before feeding data into the model, we have done some image preprocessing to remove unnecessary features and artifacts as much as possible so that training is efficient. First, we have converted the images from RGB to grayscale and then reshaped them to 64x64x1 to keep the same volume across all training data. Then we have applied Gaussian blur to the image with a standard deviation of 10. After that, blurred images have been blended with the grayscale images again using cv2. 5 for blurred image. Preprocessing has been applied to all train and test images.

B. Image Augmentation: The training data in the provided dataset is cleaner, and most of it is easily comprehensible, but the validation and test data contain some of the most challenging cases, intended to evaluate model performance under very noisy conditions. So we had to artificially generate or augment our dataset to increase the variation with built-in augmentation and image preprocessing functions from the Keras library initially 0.2, height shift range of 0.2. Later we improved accuracy by enlarging the main database with more manually generated augmented images for more variation. Images with salt and pepper noise have been generated using the MATLAB function imnoise(). Blurred, washed-out images with random angle ranging -35,-30,20,10,20,30,40 degree have been generated using cv2. It applies a normalized box filter to the image.

C. Proposed model for classification: In this method, we have experimented with different CNN models and taken our two best-performing models (Model A and Model B) for ensembling. In both models, the first convolutional layers consist of 32 filters with 5x5 kernel size, which generally extract low-level features such as vertical and horizontal edges to a greater extent, followed by second layers consisting of 32 filters with 3x3 kernel size.
After that, Maxpool layers with 2x2 kernel size and strides of 2 are employed to reduce the features by taking the maximum value, which greatly reduces computation and overfitting. Similarly, two convolutional layers consisting of 64 filters each with 3x3 kernel size are added, followed by Maxpool layers configured like the previous Maxpool layers. Experimenting with different configurations, we eventually found that using a slightly wider convolutional layer at the end of Model A offers a slight accuracy boost in the model. The Rectified Linear Unit (ReLU) activation function is used in every layer, including the fully connected (FC) layers, except the final FC layers in both models. The convolution feature maps are flattened and connected to an FC layer with 64 neurons. Dropout layers are added before the final FC layers with value of 0. Finally, FC layers of 10 neurons are added with the Softmax activation function for classification into 10 classes, and the models are ensembled by averaging the final output layers. The same padding configuration is used in all convolutional layers in both models.

Result:
In this training, 20 percent of 116395 specimens have been used as the validation set for determining model performance, and the other 96116 specimens were used for training. We have also compared the result by training with the original non-augmented data size of 72000+ to compare the performance of our proposed method with other non-augmented machine learning and feature extraction based approaches. The model has been implemented with python library namely Keras v2.4. 2 python library and MATLAB image processing toolbox from MATLAB r2017a have been used for manual augmentation. 50 GHz, RAM: 8.00 GB, Graphics: NVIDIA GT-940MX, 2GB) and Google Colaboratory [24] Cloud platform with NVIDIA Tesla K-80 GPU and 12GB RAM support. We have tested various iteration levels and found that a minimum of 30 and 6 epochs are needed to get the maximum performance in Model A and Model B respectively.
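The output-averaging ensemble described above for Model A and Model B can be sketched in a few lines of plain Python. This is only an illustration of averaging two softmax outputs: the helper names `average_ensemble` and `predict_class` and the two probability vectors are hypothetical, not taken from the paper or from any trained model.

```python
def average_ensemble(probs_a, probs_b):
    """Average two models' class-probability vectors element-wise."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must output the same number of classes")
    return [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]

def predict_class(probs):
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical softmax outputs over the 10 digit classes for one image.
# Model A narrowly prefers class 9; Model B strongly prefers class 1.
model_a = [0.02, 0.42, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.48]
model_b = [0.01, 0.80, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.12]

ensemble = average_ensemble(model_a, model_b)
print("Model A alone predicts:", predict_class(model_a))
print("Ensemble predicts:", predict_class(ensemble))
```

In this made-up case, the confident prediction of one model outweighs the narrow preference of the other, which is the usual motivation for averaging probabilities rather than taking a majority vote between two models.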
We conclude that our proposed method performs worst in detecting numeral '১', which is misclassified in 57 cases among 1750 specimens. That is the lowest accuracy, 96.74%. Numeral '৯' has the second lowest detection rate, with 97.360% accuracy. Our proposed model confuses these two numerals, misclassifying '১' as '৯' 26 times and '৯' as '১' 25 times, because numerals '১' and '৯' can sometimes be confusing even to the human eye, depending on the test case. Numeral '৪' has the highest detection rate, misclassified in only 25 specimens among 1774 test cases (98.59% accuracy). Some of the test specimens are very confusing even for the human eye. Among the 17760 specimens, only 570 test cases are misclassified. Most of the misclassified specimens are heavily augmented, noisy data. The model achieves 96.788% accuracy.

Limitation:
98.98% accuracy with image augmentation though he had test images which were not so noisy. Our proposed model outperforms the previous works on clear images, where we have achieved 99.2, and it also reaches very good accuracy, beyond 90%, on noisy, highly augmented specimens. Before augmenting, the accuracy was very low for tilted, random box noise, and color-shifted specimens. For tilted images the accuracy has been only 13. After augmentation, accuracy has jumped to 95. As shown in Table VI, our proposed model outperforms some well-established models such as ResNet-18 and LeNet-5. We also compare our model with another ensemble technique, in which we have trained Model A with 5-fold cross-validation and obtained 5 different models with the same architecture, but our proposed model outperforms it as well.

Result and Analysis
Font Family: Times New Roman, Font Size: 12, Justified
Compare the results of all four methods and write your analysis.

3. Conclusion
Font Family: Times New Roman, Font Size: 12, Justified

References
[1] Khan, Haider Adnan, Abdullah Al Helal, and Khawza I. Ahmed. "Handwritten bangla digit recognition using sparse representation classifier." 2014 International Conference on Informatics, Electronics & Vision (ICIEV). IEEE, 2014.
[2] Shuvo, Shifat Nayme, et al. "MathNET: using CNN bangla handwritten digit, mathematical symbols, and trigonometric function recognition."
Soft Computing Techniques and Applications: Proceeding of the International Conference on Computing and Communication (IC3 2020). Springer Singapore, 2021.
[3] Rehana, Hasin. "Bangla handwritten digit classification and recognition using SVM algorithm with HOG features." 2017 3rd International Conference on Electrical Information and Communication Technology (EICT). IEEE, 2017.
[4] Noor, Rouhan, Kazi Mejbaul Islam, and Md Jakaria Rahimi. "Handwritten bangla numeral recognition using ensembling of convolutional neural network." 2018 21st international conference of computer and information technology (ICCIT). IEEE, 2018.