A STUDY ON
THE USE OF DIFFERENT CLASSIFIERS
FOR MACHINE LEARNING
PROJECT REPORT
Submitted by
AMAN SONI
B.TECH PART-II
in
ELECTRONICS ENGINEERING
Indian Institute of Technology (B.H.U.)
Varanasi – 221005
JUNE 2016
ACKNOWLEDGEMENT
The internship I had at the Indian School of Mines (I.S.M.) Dhanbad was a great chance for learning and professional development. I consider myself very lucky to have been given the opportunity to be a part of it. I am also grateful for the chance to meet so many wonderful people and professionals who guided me through this internship period.
I am using this opportunity to express my deepest gratitude and special thanks to Dr. Haider Banka, who, in spite of being busy with his duties, took time out to hear, guide and keep me on the correct path.
I express my deepest thanks to Prof. Debjani Mitra for taking part in useful decisions, giving necessary advice and guidance, and arranging all facilities to make life easier. I choose this moment to acknowledge her contribution gratefully.
I perceive this opportunity as a big milestone in my career development. I will strive to use the skills and knowledge I have gained in the best possible way, and will continue to improve them in order to attain my desired career objectives.
Sincerely,
AMAN SONI
I.S.M. DHANBAD
21/06/2016
CERTIFICATE
This is to certify that Mr. Aman Soni, S/O Mr. Rakesh Soni, Electronics Engineering undergraduate from Indian Institute of Technology (B.H.U.) Varanasi, successfully completed a four-week internship (from 16th May, 2016 to 10th June, 2016) at Indian School of Mines (I.S.M.) Dhanbad. During the period of his internship programme with us, he was found punctual, hardworking and inquisitive.
We wish him every success in life.
_____________________________________
Prof. Debjani Mitra
H.O.D. Electronics Engineering
I.S.M. Dhanbad
CONTENTS
1. INTRODUCTION
2. SHORT DESCRIPTION OF CLASSIFIERS
I. LINEAR REGRESSION
II. LOGISTIC REGRESSION
III. NEURAL NETWORK
IV. SUPPORT VECTOR MACHINE
3. USING DIFFERENT CLASSIFIERS
I. CLASSIFICATION DATA
II. REGRESSION DATA
4. CONCLUSION
5. REFERENCES
INTRODUCTION
Machine learning has become an indispensable tool. From face recognition on Facebook to movie recommendations on Netflix, it is now used almost everywhere. Since a large number of websites rely on it for various purposes, its accuracy is of prime importance. For instance, machine learning is used to recommend products to a user based on his previous purchases. To increase its sales, a company must recommend products that are relevant to each customer, so that he neither misses out on good products nor is continually annoyed by irrelevant recommendations. Similarly, machine learning is used for speech-to-text conversion in personal assistants such as Siri, offered by Apple, and Cortana, offered by Microsoft. How much a user likes a particular assistant depends on how accurate its speech-to-text conversion is. So in this mini-project, I have studied various classifiers and trained them on several datasets, aiming for maximum accuracy. The datasets include both classification and regression problems. In classification problems, each sample belongs to one of 2 or more classes, whereas in regression the outcome is a numerical value, such as a house price. For each dataset, I trained different classifiers on the training data and then measured their accuracy on a separate test dataset. I then varied the parameters of each classifier and noted their influence on the accuracy. Finally, I compared the classifiers to find out which one gives the maximum accuracy on the given dataset.
Short Description Of Various Classifiers Used:
1. Linear Regression
In linear regression, the outcome is a number that can take on a continuous range of values. Consider housing data, where each training example lists various parameters of a house, such as its size in square feet, together with its price; our task is to learn from this data to predict the price of a new house. The features of a house are denoted by the variable x, the number of training examples by m, and the unknown parameters to be learned by theta (θ). The same notation is followed for the other classifiers as well. Now consider house-price prediction where only one feature (say, size) is given. The hypothesis is then a straight line:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
and the squared error cost function is
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
where y^{(i)} is the price of the i-th training example. Minimizing this cost function over the parameters θ gives the model that best fits the training data. The hypothesis can then be applied to new data to predict the house price.
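The gradient-descent procedure described above can be sketched as follows. This is a minimal pure-Python illustration, not the code used in the project: it assumes a single feature, the 1/(2m) squared error cost, and a hypothetical toy dataset generated from price = 1 + 2 × size; the function names are illustrative.

```python
def predict(theta, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for a single feature."""
    return theta[0] + theta[1] * x

def cost(theta, xs, ys):
    """Squared error cost J(theta) = (1 / 2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Batch gradient descent on J(theta); both parameters are updated simultaneously."""
    m = len(xs)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta

# Hypothetical toy data: scaled house size vs. price, generated from price = 1 + 2 * size.
sizes = [0.0, 0.5, 1.0, 1.5, 2.0]
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
theta = gradient_descent(sizes, prices)
print([round(t, 3) for t in theta])  # close to [1.0, 2.0]
```

The learning rate here is chosen for this toy data only; on unscaled real features a much smaller rate (or feature scaling) would be needed.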
2. Logistic Regression
Logistic regression is used to classify samples into two classes. The hypothesis for logistic regression is defined as:
$$h_\theta(x) = g(\theta^T x)$$
where g is the sigmoid function, defined as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
The cost function in logistic regression is given as:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$
Again, the goal is to minimize the cost function over the unknown parameters θ. The parameters so obtained give the best-fitting model, with the least possible error on the training data. A new sample is then fed into the hypothesis function, which returns a value between 0 and 1 that is interpreted as the probability that the sample belongs to the positive class. For multiclass classification, we use the one-vs-all method: for each class, a separate classifier is trained that treats that class as positive and all the other classes as negative. To classify a new sample, it is fed into the hypothesis of each of these classifiers, and the class whose hypothesis returns the maximum value is chosen.
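A minimal pure-Python sketch of the sigmoid, the logistic cost and one-vs-all prediction follows. It assumes each input vector already begins with a bias feature x0 = 1; the function names and the example parameters are hypothetical, and training (gradient descent) is omitted.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x already includes the bias term x0 = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def cost(theta, X, y):
    """Logistic (cross-entropy) cost J(theta), averaged over the m examples."""
    m = len(X)
    total = 0.0
    for x, label in zip(X, y):
        h = hypothesis(theta, x)
        total += -label * math.log(h) - (1 - label) * math.log(1 - h)
    return total / m

def predict_one_vs_all(all_theta, x):
    """Index of the class whose one-vs-all classifier returns the largest h_theta(x)."""
    scores = [hypothesis(theta, x) for theta in all_theta]
    return max(range(len(scores)), key=lambda k: scores[k])

# Three hypothetical one-vs-all classifiers over a single feature (plus bias).
all_theta = [[1.0, -2.0], [0.0, 0.5], [-1.0, 2.0]]
print(predict_one_vs_all(all_theta, [1.0, 3.0]))  # 2
```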
3. Neural Network Using Backpropagation
Neural networks are used to learn complex non-linear hypotheses that fit the training data more accurately. A typical neural network consists of an input layer, one or more hidden layers and an output layer. The input layer corresponds to the input features. These are then transformed into more complex features, represented by the hidden layers. Finally, the output layer computes the hypothesis. To determine the theta parameters of each layer, the backpropagation algorithm is used: the errors at the output are propagated backwards through the network, layer by layer, in the same way that the hidden-layer features were computed by forward propagation, but in the reverse direction. For multiclass classification, the output layer has as many neurons as there are classes. The target output is thus a vector of zeros and ones, where the index containing the one represents the class of the sample, and the predicted class is the output neuron with the largest value. Thus, unlike logistic regression, there is no need for one-vs-all classification when a neural network is used as the classifier.
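The forward and backward passes described above can be sketched for one hidden layer as follows. This is a minimal pure-Python illustration under stated assumptions (sigmoid units in both layers, the cross-entropy cost from the logistic regression section summed over output units, one-hot targets); all names and the tiny random network are hypothetical. The usage at the end checks the backpropagated gradient against a numerical (finite-difference) gradient, a common way to verify a backpropagation implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    """Forward propagation through one hidden layer.
    Each weight row is [bias, w_1, ..., w_k]; returns the activations of all three layers."""
    a1 = [1.0] + list(x)  # input layer plus bias unit
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in W1]
    a3 = [sigmoid(sum(w * a for w, a in zip(row, a2))) for row in W2]
    return a1, a2, a3

def one_hot(label, num_classes):
    """Target vector of zeros with a single one at the index of the class."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

def cost(W1, W2, X, labels, num_classes):
    """Unregularized cross-entropy cost, summed over output units and averaged over examples."""
    total = 0.0
    for x, label in zip(X, labels):
        _, _, h = forward(W1, W2, x)
        y = one_hot(label, num_classes)
        total += sum(-yk * math.log(hk) - (1 - yk) * math.log(1 - hk)
                     for yk, hk in zip(y, h))
    return total / len(X)

def backprop(W1, W2, X, labels, num_classes):
    """Gradients of cost() with respect to W1 and W2, computed by backpropagation."""
    m = len(X)
    dW1 = [[0.0] * len(row) for row in W1]
    dW2 = [[0.0] * len(row) for row in W2]
    for x, label in zip(X, labels):
        a1, a2, a3 = forward(W1, W2, x)
        y = one_hot(label, num_classes)
        # Output-layer error (sigmoid outputs with cross-entropy cost): a3 - y.
        delta3 = [hk - yk for hk, yk in zip(a3, y)]
        # Hidden-layer error: send delta3 back through W2, skipping the bias unit a2[0].
        delta2 = []
        for j in range(1, len(a2)):
            back = sum(W2[k][j] * delta3[k] for k in range(len(W2)))
            delta2.append(back * a2[j] * (1.0 - a2[j]))
        # Accumulate gradients: error of the next layer times activation of this layer.
        for k in range(len(W2)):
            for j in range(len(a2)):
                dW2[k][j] += delta3[k] * a2[j] / m
        for j in range(len(W1)):
            for i in range(len(a1)):
                dW1[j][i] += delta2[j] * a1[i] / m
    return dW1, dW2

# Gradient check on a tiny hypothetical network: 2 inputs, 4 hidden units, 3 classes.
random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
X = [[0.2, 0.7], [0.9, 0.1], [0.4, 0.4]]
labels = [0, 1, 2]
dW1, dW2 = backprop(W1, W2, X, labels, 3)
eps = 1e-6
W1[1][2] += eps
up = cost(W1, W2, X, labels, 3)
W1[1][2] -= 2 * eps
down = cost(W1, W2, X, labels, 3)
W1[1][2] += eps  # restore the weight
print(abs((up - down) / (2 * eps) - dW1[1][2]))  # should be very close to 0
```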
4. Support Vector Machine
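Support Vector Machine (SVM) is closely related to logistic regression. For instance, we saw that for logistic regression our goal, with regularization added, was:
$$\min_\theta \; \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
where λ is the regularization parameter and n is the number of features. Similarly, for a Support Vector Machine our goal is:
$$\min_\theta \; C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1\left(\theta^T x^{(i)}\right) + \left(1-y^{(i)}\right)\mathrm{cost}_0\left(\theta^T x^{(i)}\right)\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$
where C plays a role similar to the inverse of λ, and cost1 and cost0 are piecewise-linear approximations of the two terms of the logistic cost:
$$\mathrm{cost}_1(z) = \max(0,\, 1 - z), \qquad \mathrm{cost}_0(z) = \max(0,\, 1 + z)$$
Also, when using an SVM we can use various kernels, which are functions that transform the input features before they are used to train the classifier. Examples include the linear kernel, the Gaussian kernel and the polynomial kernel. The effect of these kernels on accuracy was recorded during the project.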
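The SVM cost functions and the kernels mentioned above can be written down directly. This pure-Python sketch uses the common hinge form of cost1 and cost0 and textbook kernel definitions; the parameter names sigma, c and degree are illustrative. Note that LIBSVM (listed in the references) writes the Gaussian (RBF) kernel as exp(-gamma * ||u - v||^2), which matches the version below with gamma = 1 / (2 * sigma^2).

```python
import math

def cost1(z):
    """Cost for a positive example (y = 1): zero once z = theta^T x >= 1."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Cost for a negative example (y = 0): zero once z = theta^T x <= -1."""
    return max(0.0, 1.0 + z)

def linear_kernel(u, v):
    """k(u, v) = u . v, which leaves the original features unchanged."""
    return sum(a * b for a, b in zip(u, v))

def gaussian_kernel(u, v, sigma=1.0):
    """k(u, v) = exp(-||u - v||^2 / (2 * sigma^2)): similarity of u to a landmark v."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(u, v, c=1.0, degree=2):
    """k(u, v) = (u . v + c)^degree."""
    return (linear_kernel(u, v) + c) ** degree

print(cost1(0.5), cost0(0.5))                    # 0.5 1.5
print(gaussian_kernel([1.0, 2.0], [1.0, 2.0]))   # 1.0 for identical points
```

A smaller sigma makes the Gaussian kernel drop off faster with distance, giving a more flexible (and more overfitting-prone) decision boundary.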
Using Different Classifiers
1. Classification Data
For the classification problem, the data consisted of 15 classes of hand gestures, with more than 20 images available for each class. Each image belonging to a particular class represents a training example. To extract features from an image, the image was first resized to 25 X 25. Each pixel of this 25 X 25 matrix is either 1 or 0, corresponding to a white or black pixel. This matrix was converted into a row vector of dimension 1 X 625, so each training example has 625 features, one per pixel. For training the model, the data was split into 225 training examples (15 from each of the 15 classes) and 75 cross validation examples (5 from each of the 15 classes).
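The feature extraction and per-class split described above can be sketched as follows. This is a minimal pure-Python illustration that assumes each image has already been resized to 25 X 25 and binarized (the resizing step needs an image library and is not shown); taking the first 15 and next 5 images of each class is an assumption, since the report does not say which images went into each set, and the function names are hypothetical.

```python
def image_to_features(binary_image):
    """Flatten a 25 x 25 matrix of 0/1 pixels, row by row, into 625 features."""
    return [pixel for row in binary_image for pixel in row]

def split_per_class(samples_by_class, n_train=15, n_val=5):
    """Take n_train samples of each class for training and the next n_val for cross validation."""
    train, val = [], []
    for label, samples in samples_by_class.items():
        train += [(s, label) for s in samples[:n_train]]
        val += [(s, label) for s in samples[n_train:n_train + n_val]]
    return train, val

# Hypothetical stand-in data: 15 classes with 20 blank 25 x 25 images each.
blank = [[0] * 25 for _ in range(25)]
data = {c: [image_to_features(blank)] * 20 for c in range(15)}
train, val = split_per_class(data)
print(len(train), len(val), len(train[0][0]))  # 225 75 625
```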
A. Using Logistic Regression:
Using logistic regression with the one-vs-all method and 150 iterations of gradient descent for each class, the training accuracy obtained was 87.89%. Since the number of features was greater than the number of training examples, increasing the number of training examples slightly increased the accuracy by reducing overfitting. The accuracy also improved with regularization, reaching 89.6% with λ=1, again as a consequence of less overfitting. To speed up the algorithm, the learning rate was initially set to 0.001 and then increased by 5% at a time as the cost function started converging.
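The regularized training described above can be sketched as follows for a single one-vs-all classifier. This is a minimal pure-Python illustration, not the project's code: λ = 1, the initial learning rate 0.001, 150 iterations and the 5% growth come from the report, while growing the rate after every step that lowers the cost, and halving it when a step would raise the cost, are assumptions about how the schedule was applied. The toy data and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def regularized_cost_and_grad(theta, X, y, lam):
    """Regularized logistic cost and its gradient. Each row of X starts with the
    bias feature x0 = 1, and theta[0] is not regularized."""
    m = len(X)
    cost = 0.0
    grad = [0.0] * len(theta)
    for x, label in zip(X, y):
        h = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        h = min(max(h, 1e-12), 1.0 - 1e-12)  # keep log() finite
        cost += -label * math.log(h) - (1 - label) * math.log(1 - h)
        for j, xj in enumerate(x):
            grad[j] += (h - label) * xj
    cost = cost / m + lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [g / m + (lam / m * theta[j] if j > 0 else 0.0) for j, g in enumerate(grad)]
    return cost, grad

def train(X, y, lam=1.0, alpha=0.001, iterations=150):
    """Gradient descent that grows the learning rate by 5% after every step that
    lowers the cost; halving it when a step would raise the cost is an assumed safeguard."""
    theta = [0.0] * len(X[0])
    cost, grad = regularized_cost_and_grad(theta, X, y, lam)
    for _ in range(iterations):
        candidate = [t - alpha * g for t, g in zip(theta, grad)]
        new_cost, new_grad = regularized_cost_and_grad(candidate, X, y, lam)
        if new_cost < cost:
            theta, cost, grad = candidate, new_cost, new_grad
            alpha *= 1.05
        else:
            alpha *= 0.5
    return theta, cost

# Hypothetical one-feature toy problem (bias column included).
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, 0, 0]
theta, final_cost = train(X, y)
print(theta, final_cost)
```

For the 15-class gesture data, one such classifier would be trained per class (one-vs-all) on the 625 pixel features.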
B. Using Support Vector Machine:
Using a Support Vector Machine, the accuracy obtained was 90% with the linear kernel and 82.06% with the Gaussian kernel. This is expected: the number of features here is large compared to the number of training examples, so deriving more complex features from the given ones leads to overfitting. However, by reducing the SVM parameter C, the results with the Gaussian kernel matched those of the linear kernel, since reducing C increases the effective regularization and so avoids overfitting. With the linear kernel, reducing C below a certain value started to decrease the accuracy due to underfitting. In this case, using more training samples did not help, as the model was underfitting. However, with C set to 1, using more training examples slightly increased the accuracy. All these cases also showed a decline in accuracy when the number of features was increased from 625 to 2500 (i.e. the image was resized to 50 X 50). Again this is expected, as the number of features is now very large in comparison to the number of training examples, and the model is very prone to overfitting.
We find that, since the linear kernel is used here, the results of the SVM are almost the same as those of regularized logistic regression. However, when the number of features was reduced to 225 (i.e. by resizing the image to 15 X 15), SVM with the Gaussian kernel proved to be a better alternative than logistic regression, as the kernel functions allow the SVM to learn more complex features.
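To make the role of C concrete, the SVM objective with the linear kernel can be minimized directly. This pure-Python sketch uses plain subgradient descent on the primal objective C * sum(y * cost1(z) + (1 - y) * cost0(z)) + (1/2) * sum(theta_j^2); it is a swapped-in illustrative technique, not the solver LIBSVM uses, and the learning rate, epoch count and toy data are hypothetical. A larger C puts more weight on the training errors, while a smaller C makes the regularization term dominate.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on C * sum(y*cost1(z) + (1-y)*cost0(z)) + 0.5 * sum(theta_j^2, j >= 1),
    where z = theta^T x with a bias feature x0 = 1 prepended and y is 0 or 1."""
    n = len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            x = [1.0] + list(xi)
            z = sum(t * v for t, v in zip(theta, x))
            if yi == 1 and z < 1:      # cost1 is active here; its slope is -1
                for j in range(n + 1):
                    grad[j] -= C * x[j]
            elif yi == 0 and z > -1:   # cost0 is active here; its slope is +1
                for j in range(n + 1):
                    grad[j] += C * x[j]
        for j in range(1, n + 1):      # regularization term (bias not regularized)
            grad[j] += theta[j]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    """Class 1 if theta^T x >= 0, else class 0."""
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    return 1 if z >= 0 else 0

# Hypothetical linearly separable toy data.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, 0, 0, 0]
theta = train_linear_svm(X, y, C=1.0)
print([predict(theta, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```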
C. Using Backpropagation:
Using backpropagation, the accuracy obtained was 84.74% with 1 hidden layer of 100 neurons. When the number of neurons in the hidden layer was reduced to 10, the accuracy dropped below 50%. This was the result of underfitting, as the model could not represent the complex relationship between the input and the output with just 10 hidden neurons. On the other hand, when the number of hidden neurons was increased to 115, the accuracy fell to 82.63%, which shows that the model had started overfitting the data. This can be reduced by introducing a regularization parameter.
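The regularization suggested above is usually added to the neural-network cost as a penalty on the squared weights. A sketch of this standard form, assuming K output units, L layers, s_l units in layer l, and bias weights excluded from the penalty (this formula is not taken from the report):

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_\Theta(x^{(i)})\right)_k + \left(1-y_k^{(i)}\right)\log\left(1-\left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta_{ji}^{(l)}\right)^2$$

With this cost, each backpropagated gradient for a non-bias weight simply gains an extra term (λ/m)Θ, so a larger λ shrinks the weights and counteracts the overfitting seen with 115 hidden neurons.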
Overall, we find that SVM gave the best results among these three classifiers and is also computationally less expensive than backpropagation. With backpropagation, on the other hand, we have to experiment with the number of hidden layers and the number of neurons in each hidden layer to obtain the best possible results. Increasing the number of hidden layers and neurons also makes the computation quite slow and requires more memory.
Using these facts about the various classifiers, one can move in the right direction to improve the accuracy and performance of a classifier, without wasting time gathering more training data or computing more features when they will not contribute to increasing the accuracy.
2. Regression Data:
The regression data is for house price prediction, with 13 features and 506 available samples. The accuracy/performance of a classifier on a regression problem is measured by calculating the squared error on the test dataset. The following classifiers were used to fit the training data:
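The evaluation setup can be sketched as a shuffled split plus an error metric. This pure-Python illustration assumes the "squared error" is the mean squared error over the test set (the report does not say whether a factor of 1/2 was included); the function names and the fixed random seed are hypothetical.

```python
import random

def train_test_split(samples, n_train, seed=0):
    """Shuffle a copy of the samples, then take the first n_train for training and the rest for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def mean_squared_error(predictions, targets):
    """Average of the squared differences between predicted and true prices."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

indices = list(range(506))  # stand-ins for the 506 housing samples
train, test = train_test_split(indices, 350)
print(len(train), len(test))  # 350 156
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 1.3333333333333333
```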
A. Linear Regression:
For training with linear regression, the data was first split into a training set of 350 examples and a test set of 156 examples. The feature values varied widely: some features are binary, taking only the value 0 or 1, while others take relatively large values, such as the property-tax rate. So feature scaling was performed before training: the mean value of each feature was subtracted from it, and the result was divided by the range of the feature (i.e. the difference between its maximum and minimum values). The scaled data was then randomly shuffled, and gradient descent was applied to minimize the squared error cost function. With the learning rate alpha set to 0.001, the squared error on the test set after 3000 iterations was 20.2396.
Next, the number of training samples was increased from 350 to 450, which led to a squared error of 24.7069 on the test set. This result is expected: the number of training samples is already far greater than the number of features, and the model is suffering from underfitting, so more training data does not help.
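The feature scaling step described above (subtract the mean, divide by the range) can be sketched in pure Python as follows. Handling a constant feature by dividing by 1 is an assumption, and so is reusing the training-set mean and range for the test set, which the report does not mention but which is the usual practice; the example columns are hypothetical.

```python
def mean_normalize(columns):
    """Scale each feature column as (x - mean) / (max - min).
    Returns the scaled columns and the (mean, range) used for each one."""
    scaled, stats = [], []
    for col in columns:
        mean = sum(col) / len(col)
        spread = max(col) - min(col)
        if spread == 0:
            spread = 1.0  # constant feature: avoid dividing by zero (assumption)
        scaled.append([(v - mean) / spread for v in col])
        stats.append((mean, spread))
    return scaled, stats

def apply_scaling(col, mean, spread):
    """Scale a new (e.g. test-set) column with statistics taken from the training set."""
    return [(v - mean) / spread for v in col]

binary = [0, 1, 0, 1]       # e.g. a 0/1 indicator feature
tax = [200, 400, 600, 800]  # hypothetical large-valued feature
scaled, stats = mean_normalize([binary, tax])
print(scaled[0])  # [-0.5, 0.5, -0.5, 0.5]
```

After scaling, both features lie roughly in [-0.5, 0.5], so a single learning rate suits all parameters.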
B. Using Support Vector Machine:
Using SVM with the linear kernel and the regularization parameter C set to 1 gave a relatively large squared error of 94.7264 on the test dataset. This is expected, as both the linear kernel and relatively strong regularization (a small C) lead to high bias. However, using the Gaussian kernel with C set to 60000 gave a much lower squared error of only 12.0428 on the test set. This shows that by weakening the regularization and using the Gaussian kernel, we were able to create complex features from the 13 given features, and thus fit a more appropriate model to the training data. It is also intuitive that increasing C beyond a certain value leads to overfitting, as was evident here: when C was set to 6000000, the squared error increased slightly to 15.2026.
So, unlike the previous classification problem, where the number of features was far greater than the number of training samples, here the number of features is very small. We therefore need a more complex hypothesis, for example one with added polynomial feature terms, to fit the training data more closely. As found above, one way to achieve this is to weaken the regularization (increase C), and another is to use a kernel function. Apart from this, more features could be gathered manually.
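Adding polynomial feature terms, as suggested above, can be sketched with the standard library. This illustration adds every degree-2 product x_i * x_j (i <= j) to the original features; degree 2 is an assumption for illustration, and with the 13 housing features it yields 13 + 91 = 104 features. The function name and feature values are hypothetical.

```python
from itertools import combinations_with_replacement

def add_degree2_features(x):
    """Return the original features followed by every product x_i * x_j with i <= j."""
    products = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return list(x) + products

features = add_degree2_features([float(k) for k in range(13)])  # 13 hypothetical feature values
print(len(features))  # 104
print(add_degree2_features([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

Feature scaling should be applied after this expansion, since squared terms grow much faster than the original features.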
C. Using Backpropagation:
As the above classifiers make evident, this problem needs more, and more complex, features. Using a single hidden layer with 13 neurons gave a training error of 23.0929 and a test error of 57.4974, whereas a single hidden layer with 100 neurons gave a training error of 1.5606 and a test error of 43.7274.
CONCLUSION
From the above analysis of different classifiers, it is clear that different datasets require different treatment to optimize the accuracy of the classifier. For example, for training data that suffers from high bias, it would be quite useless to obtain more training samples, and it would waste time gathering or manually creating more data. A better use of time would be to analyze the current features and derive new features from them, or to add polynomial features, so that the hypothesis can fit complex non-linear functions. Conversely, for data suffering from high variance, obtaining more training samples would be a wise choice to overcome overfitting. These decisions help determine the right way to proceed to increase the accuracy of a classifier.
REFERENCES
1. UCI Machine Learning Repository, Housing dataset.
https://archive.ics.uci.edu/ml/datasets/Housing
2. Machine Learning course, Stanford University, offered on the Coursera platform.
https://www.coursera.org/learn/machine-learning
3. Ian H. Witten, Eibe Frank, Mark A. Hall. Data Mining: Practical Machine Learning Tools and Techniques.
4. James A. Freeman, David M. Skapura. Neural Networks: Algorithms, Applications, and Programming Techniques.
5. Chih-Chung Chang, Chih-Jen Lin. LIBSVM: A Library for Support Vector Machines.
https://www.csie.ntu.edu.tw/~cjlin/libsvm

 
Stock Market Prediction Using ANN
Stock Market Prediction Using ANNStock Market Prediction Using ANN
Stock Market Prediction Using ANN
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdf
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
 
How to understand and implement regression analysis
How to understand and implement regression analysisHow to understand and implement regression analysis
How to understand and implement regression analysis
 
Iterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue ProblemsIterative Determinant Method for Solving Eigenvalue Problems
Iterative Determinant Method for Solving Eigenvalue Problems
 
Application_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxApplication_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptx
 
Caravan insurance data mining prediction models
Caravan insurance data mining prediction modelsCaravan insurance data mining prediction models
Caravan insurance data mining prediction models
 

Final Report

CONTENTS
1. INTRODUCTION
2. SHORT DESCRIPTION OF CLASSIFIERS
   I. LINEAR REGRESSION
   II. LOGISTIC REGRESSION
   III. NEURAL NETWORK
   IV. SUPPORT VECTOR MACHINE
3. USING DIFFERENT CLASSIFIERS
   I. CLASSIFICATION DATA
   II. REGRESSION DATA
4. CONCLUSION
5. REFERENCES
INTRODUCTION

Machine learning has become an indispensable tool today. Whether it is face recognition on Facebook or movie recommendations on Netflix, machine learning is now used almost everywhere. Since so many websites rely on it, its accuracy is of prime importance.

For instance, machine learning is useful for recommending products to a user based on his previous purchases. For the company to increase its sales, it must recommend relevant products to each customer. That way he neither misses out on good products nor is annoyed by a steady stream of irrelevant recommendations. Similarly, machine learning is used for speech-to-text conversion in personal assistants such as Apple's Siri and Microsoft's Cortana. A user's liking for such a product depends on how accurate the speech-to-text conversion is.

In this mini-project, I have therefore studied various classifiers and trained them on different datasets to obtain the maximum accuracy. The datasets cover both classification and regression problems. In classification problems, each sample belongs to one of two or more classes. In regression problems, the outcome is a number, as in house-price prediction. For each dataset, I trained different classifiers on the training data and then measured their accuracy on a separate test dataset. I then varied the parameters of each classifier and noted their influence on the accuracy. Finally, I compared the classifiers to find out which one gives the maximum accuracy on the given dataset.
Short Description of Various Classifiers Used

1. Linear Regression

In linear regression, the outcome is a number that can take a continuous range of values. Consider a training set of housing data, where each example gives various parameters of a house, such as its size in square feet. Our task is to predict the price of the house by learning from the given data.

The features of the house are denoted by the variable x, the number of training examples by m, and the unknown parameters to be learned by theta (θ). The same notation is followed for the other classifiers as well.

Now consider house-price prediction where only one feature (say, size) is given. Then the hypothesis is a straight line:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

and the squared error cost function is

$$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where $y^{(i)}$ is the actual price of the i-th training example. Minimizing this cost function over the parameters θ gives the model that best fits the training data. The hypothesis can then be evaluated on new data to predict the house price.
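A minimal sketch of batch gradient descent for the single-feature hypothesis and squared error cost above, in plain Python. The learning rate `alpha`, the iteration count and the toy data are illustrative assumptions, not values from the project. Real housing features would need scaling first, as described in the regression section.

```python
def hypothesis(theta0, theta1, x):
    """h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """Squared error cost J = (1 / 2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_descent(xs, ys, alpha=0.1, iters=2000):
    """Minimize J by batch gradient descent, starting from theta = (0, 0)."""
    theta0 = theta1 = 0.0
    m = len(xs)
    for _ in range(iters):
        errors = [hypothesis(theta0, theta1, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1
```

On the toy data y = 2x + 1 with x = 0, 1, 2, 3, 4, the returned parameters approach (1, 2).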
2. Logistic Regression

Logistic regression is used to classify samples into two classes. The hypothesis for logistic regression is defined as:

$$h_\theta(x) = g(\theta^T x)$$

where g is the sigmoid function, defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The cost function in logistic regression is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

Again, the goal is to minimize the cost function over the unknown parameters θ. The resulting parameters give the model that best fits the training data. A new sample is then fed into the hypothesis, which returns a value between 0 and 1: the estimated probability that the sample belongs to the positive class.

For multiclass classification, we use the one-vs-all method. For each class, a separate classifier is trained to distinguish that class from all the remaining classes. When a new sample is tested, it is fed to the hypothesis of each of these classifiers, and the class whose classifier returns the maximum value is chosen.
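The sketch below shows batch gradient descent on the logistic cost, with an optional λ term, plus one-vs-all prediction. It is a plain-Python illustration under assumed defaults (learning rate 0.5, 1000 iterations), not the code used in the project. Each input vector is assumed to start with the bias feature x0 = 1.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x); x[0] is the bias feature 1."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def train_logistic(X, y, alpha=0.5, iters=1000, lam=0.0):
    """Batch gradient descent on J(theta) plus (lam / 2m) * sum_{j>=1} theta_j^2."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        errors = [h(theta, x) - label for x, label in zip(X, y)]
        grad = [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        for j in range(1, n):              # theta_0 is not regularized
            grad[j] += lam * theta[j] / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def one_vs_all(X, labels, classes, **kwargs):
    """Train one binary classifier per class: samples of that class are 1, all others 0."""
    return {c: train_logistic(X, [1 if lab == c else 0 for lab in labels], **kwargs)
            for c in classes}

def predict(models, x):
    """Return the class whose classifier gives the highest probability."""
    return max(models, key=lambda c: h(models[c], x))
```

For example, `one_vs_all(X, labels, classes, lam=1.0)` would train one regularized classifier per class, mirroring the λ = 1 setting reported later for the gesture data.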
3. Neural Network Using Backpropagation

Neural networks are used to learn complex non-linear hypotheses that fit the training data more accurately. A typical neural network consists of an input layer, one or more hidden layers and an output layer. The input layer corresponds to the input features. These are transformed into more complex features, represented by the hidden layers. Finally, the output layer computes the hypothesis.

To learn the θ parameters of each layer, the backpropagation algorithm is used to compute the gradient of the cost function. The errors at the output are propagated backwards through the network, layer by layer. This mirrors the way the hidden-layer features were computed by forward propagation, but in the reverse direction.

For multiclass classification, the output layer has as many neurons as there are classes. Each training label is represented as a vector of zeros with a single one at the index of the sample's class. A new sample is assigned to the class whose output neuron gives the largest value. Thus, unlike logistic regression, a neural network does not need one-vs-all classification.
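A compact sketch of forward propagation and backpropagation for a network with one sigmoid hidden layer and sigmoid outputs. It trains on the cross-entropy cost, the multi-output form of the logistic cost. It also includes a numerical gradient check, a standard way to confirm that a backpropagation implementation is correct.

The weight layout (bias in column 0), the layer sizes and the XOR toy data are assumptions for illustration. Regularization is omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, W2):
    """Forward propagation. W1 is hidden x (inputs + 1), W2 is outputs x (hidden + 1);
    column 0 of each matrix multiplies the bias unit."""
    a1 = [1.0] + list(x)
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in W1]
    a3 = [sigmoid(sum(w * a for w, a in zip(row, a2))) for row in W2]
    return a1, a2, a3

def cost(X, Y, W1, W2):
    """Cross-entropy cost averaged over examples and summed over output units."""
    total = 0.0
    for x, y in zip(X, Y):
        _, _, h = forward(x, W1, W2)
        total -= sum(yk * math.log(hk) + (1 - yk) * math.log(1 - hk) for hk, yk in zip(h, y))
    return total / len(X)

def backprop(X, Y, W1, W2):
    """Return the gradients of cost() with respect to W1 and W2."""
    m = len(X)
    G1 = [[0.0] * len(row) for row in W1]
    G2 = [[0.0] * len(row) for row in W2]
    for x, y in zip(X, Y):
        a1, a2, a3 = forward(x, W1, W2)
        d3 = [ak - yk for ak, yk in zip(a3, y)]              # output-layer error
        d2 = [sum(W2[k][j] * d3[k] for k in range(len(d3))) * a2[j] * (1 - a2[j])
              for j in range(1, len(a2))]                    # hidden error, bias unit skipped
        for k, row in enumerate(G2):
            for j in range(len(row)):
                row[j] += d3[k] * a2[j] / m
        for j, row in enumerate(G1):
            for i in range(len(row)):
                row[i] += d2[j] * a1[i] / m
    return G1, G2

def numerical_gradient(X, Y, W1, W2, eps=1e-4):
    """Central-difference estimate of the same gradients, used to check backprop."""
    grads = []
    for W in (W1, W2):
        G = [[0.0] * len(row) for row in W]
        for r, row in enumerate(W):
            for c in range(len(row)):
                original = row[c]
                row[c] = original + eps
                plus = cost(X, Y, W1, W2)
                row[c] = original - eps
                minus = cost(X, Y, W1, W2)
                row[c] = original
                G[r][c] = (plus - minus) / (2 * eps)
        grads.append(G)
    return grads

# Toy setup: 2 inputs -> 4 hidden -> 2 outputs, XOR labels as one-hot vectors
random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(2)]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[1, 0], [0, 1], [0, 1], [1, 0]]
```

If `backprop` and `numerical_gradient` agree, the analytic gradients can be handed to gradient descent. The numerical check is too slow to keep in the training loop.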
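4. Support Vector Machine

A support vector machine works in a similar manner to logistic regression. For regularized logistic regression, our goal was:

$$\min_\theta\; -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where λ is the regularization parameter and n is the number of features. Similarly, for a support vector machine, our goal is:

$$\min_\theta\; C\sum_{i=1}^{m}\left[y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + \left(1 - y^{(i)}\right)\mathrm{cost}_0(\theta^T x^{(i)})\right] + \frac{1}{2}\sum_{j=1}^{n}\theta_j^2$$

Here C is a parameter that plays the role of the inverse of λ. The functions cost_1 and cost_0 are piecewise-linear approximations of the two logistic cost terms, -log g(z) and -log(1 - g(z)):

$$\mathrm{cost}_1(z) = \max(0,\, 1 - z), \qquad \mathrm{cost}_0(z) = \max(0,\, 1 + z)$$

In addition, an SVM can be used with various kernels. These are similarity functions that transform the input features into new features before the classifier is trained. Examples include the linear kernel, the Gaussian kernel and the polynomial kernel. The effect of these kernels on the accuracy was recorded during the project.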
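The SVM objective above and a Gaussian kernel feature mapping can be written directly as follows. The Gaussian kernel is assumed to take its usual form, K(x, l) = exp(-||x - l||^2 / (2 * sigma^2)). The value of `sigma` and the choice of landmarks (commonly the training examples themselves) are illustrative assumptions. This only evaluates the objective and the new features; it does not train an SVM.

```python
import math

def cost1(z):
    """Hinge cost used for positive examples (y = 1)."""
    return max(0.0, 1.0 - z)

def cost0(z):
    """Hinge cost used for negative examples (y = 0)."""
    return max(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """C * sum of hinge costs + 0.5 * sum_{j>=1} theta_j^2.
    theta[0] is the bias; each x in X holds only the real features."""
    total = 0.0
    for x, label in zip(X, y):
        z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
        total += label * cost1(z) + (1 - label) * cost0(z)
    return C * total + 0.5 * sum(t * t for t in theta[1:])

def gaussian_kernel(x, l, sigma=1.0):
    """Similarity between x and landmark l: 1 when they coincide, near 0 when far apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def kernel_features(x, landmarks, sigma=1.0):
    """Replace x by its similarities f_i = K(x, l_i) to each landmark."""
    return [gaussian_kernel(x, l, sigma) for l in landmarks]
```

A smaller `sigma` makes the similarity fall off faster, which gives more flexible features and a higher risk of overfitting.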
Using Different Classifiers

1. Classification Data

For the classification problem, the data consisted of 15 classes of hand gestures, with more than 20 sample images available for each class. Each image belonging to a particular class is one training example.

To extract features from an image, the image was first resized to 25 × 25. Each pixel of this 25 × 25 matrix is either 1 or 0, corresponding to a white or black pixel. The matrix was then converted into a row vector of dimension 1 × 625, so each training example has 625 features (one per pixel).

For training the model, the data was split into 225 training examples (15 from each of the 15 classes) and 75 cross-validation examples (5 from each of the 15 classes).

A. Using Logistic Regression:

Using logistic regression with the one-vs-all method and 150 iterations of gradient descent for each class, the training accuracy obtained was 87.89%. Since the number of features was greater than the number of training examples, increasing the number of training examples slightly increased the accuracy by reducing overfitting. The accuracy also improved with regularization, reaching 89.6% with λ = 1, again as a consequence of less overfitting. To speed up the algorithm, the learning rate was initially set to 0.001 and then gradually increased in steps of 5% as the cost function began to converge.
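A sketch of the feature extraction and per-class split described above, with images represented as 2-D lists of 0/1 pixels. The report does not say how the images were resized, so nearest-neighbour sampling is an assumption here. Taking the first 15 images of each class for training and the next 5 for cross-validation is also an assumption.

```python
def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2-D list of 0/1 pixels to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)] for r in range(size)]

def to_feature_vector(image, size=25):
    """Resize to size x size and flatten row by row into a 1 x (size * size) vector."""
    small = resize_nearest(image, size)
    return [pixel for row in small for pixel in row]

def split_per_class(samples_by_class, n_train=15, n_val=5):
    """Take n_train images per class for training and the next n_val for cross-validation."""
    train, val = [], []
    for label, images in samples_by_class.items():
        train += [(img, label) for img in images[:n_train]]
        val += [(img, label) for img in images[n_train:n_train + n_val]]
    return train, val
```

With `size=50` or `size=15`, the same function yields the 2500- and 225-feature variants discussed in the SVM experiments.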
B. Using Support Vector Machine:

Using a support vector machine, the accuracy obtained was 90% with the linear kernel and 82.06% with the Gaussian kernel. This is expected. The number of features here is large compared to the number of training examples, so deriving even more complex features from them, as the Gaussian kernel does, leads to overfitting.

However, by reducing the SVM parameter C, the results with the Gaussian kernel matched those with the linear kernel. Reducing C is equivalent to increasing the regularization parameter, which reduces overfitting. With the linear kernel, reducing C below a certain value started to decrease the accuracy because of underfitting. In that case, using more training samples did not help, as the model was underfitting. However, with C set to 1, using more training examples slightly increased the accuracy.

All of these configurations showed a decline in accuracy when the number of features was increased from 625 to 2500 (i.e. when the image was resized to 50 × 50). This too is expected: the number of features then becomes very large compared to the number of training examples, making the model very prone to overfitting. Since the SVM here uses a linear kernel, its results are almost the same as those of regularized logistic regression. However, when the number of features was reduced to 225 (i.e. by resizing the image to 15 × 15), the SVM with a Gaussian kernel proved to be a better alternative than logistic regression, since the kernel lets the SVM learn more complex features.
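To illustrate how C can be chosen, the sketch below trains a linear SVM by subgradient descent on the hinge-loss objective from the SVM section. It then picks C by cross-validation accuracy. This from-scratch trainer is an assumption made for a self-contained example, not the solver used in the project. The learning rate, iteration count and toy data are arbitrary.

```python
def train_linear_svm(X, y, C, alpha=0.01, iters=2000):
    """Subgradient descent on C * sum(hinge costs) + 0.5 * ||theta[1:]||^2.
    X: list of feature lists, y: labels in {0, 1}. theta[0] is the bias."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(iters):
        grad = [0.0] + theta[1:]          # gradient of the regularization term only
        for x, label in zip(X, y):
            xb = [1.0] + list(x)
            z = sum(t * v for t, v in zip(theta, xb))
            if label == 1 and z < 1:      # cost1(z) = max(0, 1 - z) is active
                grad = [g - C * v for g, v in zip(grad, xb)]
            elif label == 0 and z > -1:   # cost0(z) = max(0, 1 + z) is active
                grad = [g + C * v for g, v in zip(grad, xb)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def predict(theta, x):
    return 1 if theta[0] + sum(t * v for t, v in zip(theta[1:], x)) >= 0 else 0

def accuracy(theta, X, y):
    return sum(predict(theta, x) == label for x, label in zip(X, y)) / len(y)

def choose_C(X_train, y_train, X_val, y_val, candidates):
    """Train one model per C; return (best validation accuracy, C), ties going to larger C."""
    scored = [(accuracy(train_linear_svm(X_train, y_train, C), X_val, y_val), C)
              for C in candidates]
    return max(scored)
```

Small C shrinks θ towards zero (stronger regularization, risk of underfitting). Large C lets the model chase every training point (risk of overfitting). This matches the trade-off observed above.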
C. Using Backpropagation:

Using backpropagation, the accuracy obtained was 84.74% with one hidden layer of 100 neurons. When the number of neurons in the hidden layer was reduced to 10, the accuracy dropped below 50%. This was due to underfitting: with only 10 hidden neurons, the model was unable to represent the complex relationship between the input and the output. On the other hand, when the number of hidden neurons was increased to 115, the accuracy fell to 82.63%. This suggests that the model had started to overfit the data, which can be reduced by introducing a regularization parameter.

Overall, SVM gives the best results among these three classifiers and is also computationally less expensive than backpropagation. With backpropagation, we have to experiment with the number of hidden layers and the number of neurons in each hidden layer to obtain the best possible results. Increasing either makes the computation considerably slower and requires more memory. Knowing these properties of the various classifiers, one can move in the right direction to improve their accuracy and performance. It also avoids wasting time gathering more training data or computing more features when these will not increase the accuracy.
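The memory cost of larger hidden layers can be quantified by counting weights. Assuming 625 inputs, 15 output units (one per class) and one bias unit per non-output layer, a single hidden layer of h neurons needs 626·h + 15·(h + 1) weights:

```python
def count_weights(layer_sizes):
    """Weights of a fully connected network; every layer except the output adds one bias unit."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 625 pixel features in, 15 classes out, one hidden layer of varying size
for hidden in (10, 100, 115):
    print(hidden, count_weights([625, hidden, 15]))
```

This gives 6425 weights for 10 hidden neurons, 64115 for 100 and 73730 for 115. Adding a second hidden layer of 100 neurons would raise the count from 64115 to 74215.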
2. Regression Data:

The regression data is for house-price prediction, with 13 features and 506 available samples. The performance of a classifier on a regression problem is measured by calculating the squared error on the test dataset. The following classifiers were used to fit the training data:

A. Linear Regression:

For training with linear regression, the data was first split into a training set of 350 examples and a test set of 156 examples. The features varied widely in scale. Some are binary, taking only the values 0 or 1, while others have relatively large values, such as the size of the house in square feet. So, feature scaling was performed before training: the mean of each feature was subtracted, and the result was divided by the feature's range (i.e. the difference between its maximum and minimum values). The data was then randomly shuffled, and gradient descent was applied to minimize the squared error cost function.

With the learning rate α set to 0.001, the squared error on the test set after 3000 iterations was 20.2396. The number of training samples was then increased from 350 to 450, which gave a squared error on the test set of 24.7069. This result is consistent with the model underfitting. The number of training samples is already far larger than the number of features, so adding more samples does not help.
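The scaling and shuffling steps can be sketched as below. The function names, the seed and the guard for constant columns are assumptions. Returning the per-feature mean and range makes it possible to apply the same transformation to new samples. Because the statistics come from whatever rows are passed in, scaling only the training split keeps test-set information out of the scaling.

```python
import random

def mean_normalize(X):
    """Scale each column as (x - mean) / (max - min).
    Returns the scaled rows plus the per-column means and ranges."""
    n = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n)]
    ranges = [max(row[j] for row in X) - min(row[j] for row in X) for j in range(n)]
    ranges = [r if r != 0 else 1.0 for r in ranges]   # avoid dividing by zero for constant columns
    scaled = [[(row[j] - means[j]) / ranges[j] for j in range(n)] for row in X]
    return scaled, means, ranges

def shuffle_split(X, y, n_train, seed=0):
    """Shuffle examples (keeping features and targets aligned) and split off n_train for training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    tr, te = idx[:n_train], idx[n_train:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]
```

For the housing data, `shuffle_split(X, y, 350)` would give the 350/156 split described above.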
B. Using Support Vector Machine:

Using an SVM with a linear kernel and C = 1 gave a relatively large squared error of 94.7264 on the test dataset. This is expected, as both the linear kernel and relatively strong regularization (a small C) lead to high bias. However, using a Gaussian kernel with C set to 60000 gave a much better fit, with a squared error on the test set of only 12.0428. This shows that by reducing the regularization and using a Gaussian kernel, we were able to create complex features from the 13 given features. This let us fit a more appropriate model to the training data. As expected, increasing C beyond a certain value leads to overfitting: when C was set to 6000000, the squared error increased slightly to 15.2026.

So, unlike the previous classification problem, where the number of features was far greater than the number of training samples, here the number of features is very small. We therefore need additional feature terms, such as polynomial terms, to fit the training data well. As found above, one way is to reduce the regularization (i.e. increase C), and another is to use a kernel function. Apart from these, we could manually gather more features.
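The report mentions adding polynomial feature terms as an alternative to kernels. Below is a minimal sketch of a degree-2 expansion, assuming all squares and pairwise products are wanted. It turns the 13 housing features into 13 + 91 = 104 features.

```python
from itertools import combinations_with_replacement

def add_quadratic_features(x):
    """Return the original features followed by all products x_i * x_j with i <= j."""
    products = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return list(x) + products
```

The expanded features have very different ranges from the originals, so they would need to be rescaled before gradient descent.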
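C. Using Backpropagation:

As the results above suggest, this dataset needs more features than the 13 given, and a neural network builds such features in its hidden layer. Using a single hidden layer with 13 neurons gave a training error of 23.0929 and a test error of 57.4974. A single hidden layer with 100 neurons gave a training error of 1.5606 and a test error of 43.7274. The larger network fits the training data far better and also lowers the test error. However, the wide gap between its training and test errors suggests that it has started to overfit.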
CONCLUSION

From the above analysis of different classifiers, it is clear that different datasets require different treatment to optimize the accuracy of a classifier. For example, for a training set that suffers from high bias, obtaining more training samples is of little use and wastes the time spent gathering or manually creating more data. A better use of that time would be to analyse the current features and derive new ones from them, or to add polynomial features, so that the hypothesis can fit complex non-linear functions. Conversely, for data suffering from high variance, obtaining more training samples is a wise choice to overcome overfitting. Such decisions help determine the right way to proceed to increase the accuracy of a classifier.
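One common way to tell whether a model suffers from high bias or high variance is a learning curve. You train on growing subsets of the training data and compare the training and validation errors. The sketch below assumes a single-feature linear model fitted in closed form, purely to keep the example short; any trainer could be substituted.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = theta0 + theta1 * x."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    theta1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - theta1 * mx, theta1

def squared_error(theta, xs, ys):
    """Squared error cost (1 / 2m) * sum((h(x) - y)^2)."""
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def learning_curve(x_train, y_train, x_val, y_val, sizes):
    """For each size, fit on the first `size` examples; record (size, J_train, J_val)."""
    curve = []
    for size in sizes:
        theta = fit_line(x_train[:size], y_train[:size])
        curve.append((size,
                      squared_error(theta, x_train[:size], y_train[:size]),
                      squared_error(theta, x_val, y_val)))
    return curve
```

If both errors level off at a similarly high value, the model has high bias and more data will not help. If the validation error stays well above the training error, the model has high variance and more data is likely to help.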