Machine Learning: LIBSVM Support Vector Machines and K-Nearest-Neighbor Approach for Accurate Parameter Selection

Jonathan Rinfret (1, 2)

1 San Jose State University, Mathematics and Statistics Dept., San Jose, CA 95192-0100
2 Gavilan College, STEM Dept., Gilroy, CA 95020
Acknowledgements
Thank you to Dr. Guangliang Chen for mentoring this project and sharing his extensive knowledge of machine learning and SVMs. Thanks also to Mr. Rey Morales, Dr. Hope Jukl, and other Gavilan College staff. Funding for this internship was provided by the Gavilan College STEM Grant.
Objectives

• Implement a KNN approach to finding a value of sigma to use for gamma.
• Compare the value of gamma from KNN to the value of gamma found by Gridsearch.
Introduction

Support Vector Machines are a popular method in machine learning in which a program reads in a set of training data and attempts to predict the labels of a set of testing data by drawing a hyperplane between the classes of data. This study focuses on modifying an existing program called LIBSVM to support a k-nearest-neighbor (KNN) approach for finding a hyperparameter sigma used in the formulation of the hyperplane, and compares the result with a MATLAB implementation of KNN.

The hyperparameter sigma is a measure of the closeness between two or more classes of data. In our data sets, each data point corresponds to a label, or class. Sigma is a numerical measure of the size of the classes of data: if sigma is large, the classes in the dataset are large in size; if sigma is small, the classes are small.
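
For context, in LIBSVM's RBF kernel the two parameters are directly linked: K(x, y) = exp(−γ‖x − y‖²), and a common convention sets γ = 1/(2σ²), so an estimate of σ immediately yields a value for γ. (The poster does not state its exact conversion; this is the usual one.)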
Results

Overall, the k-nearest-neighbor method for selecting the hyperparameter gamma resulted in higher cross-validation accuracy, higher test accuracy, and greater speed than the Gridsearch method. The KNN method is much faster than the Gridsearch method as the value of C increases. Also, after comparing different choices of k, it appears that as k increases, the growth in gamma slows and approaches a limit.
Conclusions

Based on these results, the KNN modifications that have been made are more successful than the standard Gridsearch method.
References

Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

Chen, Guangliang. MATH 285: Classification with Handwritten Digits. 28 Jan. 2016. Web. 3 June 2016. <http://www.math.sjsu.edu/~gchen/Math285S16.html>
[Figures: KNN speed vs. Gridsearch speed; difference in gamma for k values; the different data sets used.]
Methods

The Gridsearch method finds the best values for gamma and C. However, Gridsearch may result in a much longer runtime and may not be as accurate as KNN.
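
As a rough illustration of what Gridsearch does (a sketch, not the poster's actual code; the grid ranges follow the suggestion in the LIBSVM practical guide, and the synthetic data stands in for the real data sets), the following Python snippet searches a (C, γ) grid with 5-fold cross-validation:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data; the poster used real LIBSVM-format data sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

best = (None, None, 0.0)
for C in 2.0 ** np.arange(-5, 16, 2):          # C = 2^-5 ... 2^15
    for gamma in 2.0 ** np.arange(-15, 4, 2):  # gamma = 2^-15 ... 2^3
        acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
        if acc > best[2]:
            best = (C, gamma, acc)
print("best C = %g, gamma = %g, CV accuracy = %.3f" % best)

Each grid point requires a full cross-validated training run, which is why Gridsearch becomes slow as the grid grows.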
w ∙ Φ(x) + b = 0 (separating hyperplane)
w ∙ Φ(x) + b = 1 and w ∙ Φ(x) + b = −1 (margin boundaries)
distance from the hyperplane to each margin boundary: 1 / ||w||
The k-nearest-neighbor method finds the best value for gamma by working from the equations above.
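
A minimal sketch of one plausible version of this selection rule, assuming (since the poster does not give its exact formula) that σ is estimated as the mean distance to each point's k-th nearest neighbor and that γ = 1/(2σ²):

import numpy as np

def knn_gamma(X, k=8):
    # Pairwise Euclidean distances (n x n); the k = 8 default matches
    # the 8th-nearest-neighbor choice described under Methods below.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)   # ignore self-distances
    dist.sort(axis=1)                # ascending along each row
    sigma = dist[:, k - 1].mean()    # mean k-th nearest-neighbor distance
    return 1.0 / (2.0 * sigma ** 2)  # assumed conversion from sigma to gamma

X = np.random.RandomState(0).randn(200, 10)
print("gamma =", knn_gamma(X))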
KNN Method and SVM Differences

The modifications I made to LIBSVM differ from the MATLAB method for KNN. My method used bubble sort to find the 8th nearest neighbor. Bubble sort compares adjacent values and swaps them when they are out of order. While simple, bubble sort is very slow. To increase the speed, I made my sorting algorithm sort only the first 30 values in each row of the distance matrix. While this slightly lowered accuracy, it greatly increased the speed of my experiments, and my accuracy remained comparable to that of the MATLAB method.
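
A reconstruction of that partial-sort idea in Python (the actual change was made inside LIBSVM's C++ code; this sketch only illustrates the technique): each outer pass bubbles the smallest remaining distance into place, and stopping after m passes leaves the m smallest values sorted at the front of the row.

def partial_bubble_sort(row, m=30):
    n = len(row)
    for i in range(min(m, n - 1)):        # only position the first m values
        for j in range(n - 1, i, -1):     # bubble the smallest toward index i
            if row[j] < row[j - 1]:
                row[j], row[j - 1] = row[j - 1], row[j]
    return row

dists = [5.0, 1.2, 3.3, 0.7, 2.8, 4.1]
print(partial_bubble_sort(dists, m=3))    # first 3 entries are the 3 smallest

This costs roughly O(m·n) per row instead of the O(n²) of a full bubble sort, which is where the speedup comes from.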
Most of the datasets had higher test accuracy than validation accuracy; for the vowel data set, however, this was reversed. I presume that data set had higher validation accuracy because it has more classes and its test and training sets are similar in size.