Indian Institute of Technology Jodhpur
Computer Science and Engineering
Sixth Semester (2015-2016)
Machine Learning (Building and comparing various machine learning models to recognize handwritten digits)
Team Members: Shrey Maheshwari (ug201314017)
              Ravi Prakash Gupta (ug201310027)
Mentor: Prof. K. R. Chowdhary
Contents
1 Introduction
2 Theory
3 Implementation (Data Structures and Algorithms)
4 Application
5 Result
6 Conclusion
1 Introduction
The data file contains grayscale images of handdrawn digits, from zero through
nine. Each image is 16 pixels in height and 16 pixels in width, for a total
of 256 pixels in total. Each pixel has a single pixelvalue associated with it,
indicating the lightness or darkness of that pixel.Each image is 8bit depth
single channel so this pixelvalue is an integer between 0 and 255, inclusive.
We have modified it in the following way value=1 if pixel value >127 value
=0 otherwise Previously each pixel value was taking 8 bits. But now each
pixel value is taking 1 bit only. So 1 image is taking 256 bits only. The
data set, (train.csv), has 266 columns. The first 256 columns are pixel values
associated and other 10 indicate the label i.e. the digit that was drawn by
the user. We divided our data into 2 sets 1. Training data which comprises
of 80 % of the data. 2.Test data which comprises of 20% of the data.
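A minimal sketch of this loading and preprocessing step (illustrative variable names; the header handling of train.csv may need adjusting to the actual file):

import numpy as np
import pandas as pd

# Load the 266-column training file: 256 pixel columns followed by 10 label
# columns (one per digit), as described above.
data = pd.read_csv("train.csv", header=None).values

pixels = data[:, :256]
labels = data[:, 256:]

# Binarize: a pixel becomes 1 if its 8-bit value is greater than 127, else 0.
pixels = (pixels > 127).astype(np.uint8)

# 80% training / 20% test split.
split = int(0.8 * pixels.shape[0])
X_train, Y_train = pixels[:split], labels[:split]
X_test, Y_test = pixels[split:], labels[split:]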
Figure 1: Data.
Figure 1 shows the data.
The test data set (test.csv) is the same as the training set, except that it does not contain the label columns.
Figure 2: Visualization of data
Classification is the process of assigning new data to a category based on training data from known categories. In this report, we use a number of human-identified digit images split into a training and a test set. A classifier learns on the training images and labels and produces output for the test images. The output is then compared to the test labels to evaluate the classification performance. A good classifier should be able to learn on the training data while maintaining the generalization property needed to be accurate when identifying the test set.
2 Theory
The given problem falls under the category of Supervised Learning. Su-
pervised learning is the machine learning task of inferring a function from
supervised training data. The training data consist of a set of training ex-
amples. In supervised learning, each example is a pair consisting of an input
object (typically a vector) and a desired output value (also called the su-
pervisory signal).Our problem is basically a multiclass classification problem
. To solve this problem we used Logistic Regression. logistic regression is
a regression model where the dependent variable (DV) is categorical. The
logistic function is defined as follows:
\sigma(t) = \frac{e^{t}}{1 + e^{t}} = \frac{1}{1 + e^{-t}}
Figure 3: Logistic Function.
Figure 3 shows the Logistic Function.
The range of the logistic function is (0, 1), so each prediction falls in (0, 1) and can be read as the probability that the output is the digit for which that logistic regression classifier was trained. Our final answer is then the index at which this probability is maximum. We applied the gradient descent algorithm as it is simple to implement. Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent. We used around 500 iterations to reach a saturation state after which the error was not decreasing much. We plotted the error vs. iterations curve to ensure that the error is always decreasing; had that not been the case, we would have reduced our learning rate. We used regularization to ensure that our model does not overfit the training data. Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. In general, a regularization term R(f) is introduced to a general loss function:
\min_{f} \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda R(f)
for a loss function V that describes the cost of predicting f(x) when the label is y, such as the square loss or hinge loss, and a term λ which controls the importance of the regularization term. R(f) is typically a penalty on the complexity of f, such as restrictions for smoothness or bounds on the vector space norm. There are 10 labels, but logistic regression is a binary classifier, so we need to train 10 logistic regression classifiers, one per digit. We then applied the one-vs-all method to choose the final answer.
The one-vs-all strategy involves training a single classifier per class, with the samples of that class as positive samples and all other samples as negatives. This strategy requires the base classifiers to produce a real-valued confidence score for their decisions rather than just a class label; discrete class labels alone can lead to ambiguities, where multiple classes are predicted for a single sample.
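Since the label columns of train.csv already contain one column per digit, they can serve directly as the one-vs-all targets. A minimal sketch (continuing the illustrative names from the loading snippet; train_binary_classifier is a hypothetical helper):

# Y_train has shape (m, 10); column k is 1 for images of digit k and 0 for all
# others, which is exactly the positive/negative split one-vs-all needs.
for k in range(10):
    y_k = Y_train[:, k]  # binary targets for the classifier of digit k
    # theta_k = train_binary_classifier(X_train, y_k)  # hypothetical helper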
3 Implementation (Data Structures and Algorithms)
First we initialized our learning parameters, denoted by θ, to all zeros. Our hypothesis was the sigmoid of Xθ, where X is the training data and θ are the learned parameters. We used the sigmoid function because it gives an output between 0 and 1, which we wanted because this is a multiclass problem that we are converting into 10 binary classification problems. The sigmoid function is
S(t) = \frac{1}{1 + e^{-t}}
Its range is (0, 1).
We defined our cost function, which measures the error between our predictions and the actual values, as
J(\theta) = -\frac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_{\theta}(x^{(i)}) + (1 - y^{(i)}) \log\bigl(1 - h_{\theta}(x^{(i)})\bigr) \right]
It is a convex function, so the problem of converging to a local optimum does not arise. With θ initialized to zeros, the initial value of the error for each of the 10 classifiers was
array([ 0.69314718, 0.69314718, 0.69314718, 0.69314718, 0.69314718,
0.69314718, 0.69314718, 0.69314718, 0.69314718, 0.69314718])
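A minimal sketch of this cost computation (illustrative names; the regularization term is omitted for brevity):

import numpy as np

def cost(theta, X, y):
    # Cross-entropy cost J(theta) for one binary classifier.
    h = 1.0 / (1.0 + np.exp(-X.dot(theta)))  # hypothesis values in (0, 1)
    m = X.shape[0]
    return -(y * np.log(h) + (1 - y) * np.log(1 - h)).sum() / m

# With theta = zeros the hypothesis is 0.5 for every example, so the cost is
# ln(2) ~ 0.69314718 for each of the 10 classifiers, matching the array above.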
We initialized the learning rate with some value; initially a high value was causing problems, so we decreased it until the problem was solved. Finding the best learning rate is a matter of trial and error. Then we implemented the loop in which we updated our learning parameters as
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl(h_{\theta}(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
In every iteration we stored the value of the cost error to ensure that it was decreasing with every iteration. Initially there was a problem: the value of the cost error was not decreasing continuously, it was sometimes increasing in between. We found that the learning rate was too high, which made the updates overshoot. So we decreased the learning rate and set it to the largest value at which the cost error function did not overshoot.
Then we plotted the cost error vs. number of iterations graph to ensure that the function is strictly decreasing and also to see whether the error had saturated. We found that after 200 iterations the error was still not saturated, so we increased the number of iterations. We arrived at the best number of iterations by trial and error and finally decided to keep it at 500.
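A minimal sketch of this training loop (illustrative names; the defaults match the 500 iterations and 0.5 learning rate settled on in this report, and the regularization term is again omitted):

import numpy as np

def train_one_vs_all(X, Y, alpha=0.5, iterations=500):
    # Train one logistic regression classifier per digit with batch gradient descent.
    m, n = X.shape
    Theta = np.zeros((n, Y.shape[1]))  # one column of parameters per digit
    errors = []                        # mean cost per iteration, for plotting
    for _ in range(iterations):
        H = 1.0 / (1.0 + np.exp(-X.dot(Theta)))   # hypotheses, shape (m, 10)
        Theta -= alpha * X.T.dot(H - Y) / m       # gradient descent update
        J = -(Y * np.log(H) + (1 - Y) * np.log(1 - H)).sum(axis=0) / m
        errors.append(J.mean())
    return Theta, errors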
Figure 4: Cost Error Function.
Figure 4 shows the Cost Error Function.
The linear part of our hypothesis is x_1θ_1 + x_2θ_2 + x_3θ_3 + x_4θ_4 + ... We predicted the output on the test data by multiplying the matrix of test data by the learned parameters and applying the sigmoid. We obtained 10 outputs for every fresh example: the output for each test example is a 10x1 vector, and the value at each index indicates the probability of that test example being the digit at that index. So we choose the index with the maximum value as our final answer.
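A minimal sketch of this prediction step (continuing the illustrative names from the training sketch):

import numpy as np

# Probabilities of each test example under each of the 10 classifiers.
probs = 1.0 / (1.0 + np.exp(-X_test.dot(Theta)))  # shape (num_test, 10)

# The predicted digit is the index of the most confident classifier; accuracy
# compares it with the index encoded by the 10 label columns of the test set.
predictions = probs.argmax(axis=1)
accuracy = (predictions == Y_test.argmax(axis=1)).mean()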
In sensitive applications like cheque reading in banks we cannot afford to make a single mistake, so a different approach is needed. If all 10 values of our prediction are less than some threshold (say 0.7), i.e. none of the models is confident about the prediction, we can output "not able to recognize" so that the case is handled manually. We then tried different combinations of learning rate and number of iterations to achieve the best accuracy on the test data.
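A minimal sketch of this reject option (the 0.7 threshold is the one suggested above; -1 is a hypothetical marker for "not able to recognize"):

import numpy as np

# Reject an example when no classifier reaches the confidence threshold, so it
# can be handled manually instead of being misread.
threshold = 0.7
confident = probs.max(axis=1) >= threshold
predictions = np.where(confident, probs.argmax(axis=1), -1)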
We finally set the number of iterations to 500 and the learning rate to 0.5. Data structures used: arrays, matrices, and lists. We used matrices to store the training data, test data, and learned parameters. The matrix was the best data structure to use because we need to take the transpose of the data and sometimes add or remove rows and columns. It also made matrix multiplication very easy, and we saved a lot of time by using vectorization instead of loops, which was only possible with the matrix data structure in the numpy module. We used the Python list data structure to store the value of the cost error in every iteration, since it is easy to append values to a list.
We used arrays (single-dimension matrices) to plot the graphs and to hold some other useful information. We used the Matplotlib package to plot the graphs: we provided values for the x-axis, plotted the cost error values at those x-axis values, and joined them. We obtained a decreasing curve which saturated after some value of x.
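A minimal sketch of this plot (assuming the errors list returned by the training sketch above):

import matplotlib.pyplot as plt

# One cost value was recorded per iteration; a well-chosen learning rate gives
# a strictly decreasing curve that flattens out (saturates) near the end.
plt.plot(range(len(errors)), errors)
plt.xlabel("Iteration")
plt.ylabel("Cost error")
plt.title("Cost error vs. number of iterations")
plt.show()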
The sigmoid helper we used is:

import numpy as np

def sigmoid(z):
    # Element-wise logistic function 1 / (1 + exp(-z)); works on scalars and arrays.
    a = np.exp(-z)
    a = 1 + a
    a = 1 / a
    return a
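For example, the hypothesis for the whole training set can then be computed in one vectorized call (names as in the earlier sketches):

H = sigmoid(X_train.dot(Theta))  # element-wise over the whole (m, 10) matrix, no loops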
4 Application
Handwritten digit recognition has wide applications. It can be used in banks for reading amounts, although that use is very sensitive because we cannot afford a single mistake, so we should add a feature to our model: if the confidence in the classification is low, it should output "not able to recognize" and that case should be handled manually. It can further be extended to character recognition for various languages. It can be used in post offices for postal code reading, where it would reduce the workload significantly and make the process faster.
The same project can be further extended to read telephone numbers; in that case we first need to separate the individual digits and then recognize them one by one. It can also convert a handwritten document into a digital document that can be edited. This is useful for applications that translate sentences from one language to another: those applications only work on typed input, so a handwritten character recognition system can recognize the characters and then provide the input to the translator application.
5 Result
Figure 5: Table of accuracy obtained.
Figure 5 shows the table of accuracy obtained.
6 Conclusion
In this project we built a supervised learning model to recognize handwritten digits from 16x16, binarized grayscale images. We trained ten one-vs-all logistic regression classifiers with batch gradient descent, using a learning rate of 0.5 and 500 iterations chosen by trial and error, and regularization to limit overfitting on the training data. Plotting the cost error against the number of iterations confirmed that training converged to a saturation point, and the predicted digit was chosen as the class with the highest probability, with a confidence threshold available for sensitive applications such as cheque reading. The accuracies obtained on the held-out test data are summarized in Figure 5. The same approach can be extended to postal code reading, telephone number reading, and handwritten character recognition for other languages.