Performance Analysis of Machine Learning Algorithms for Self Localization Systems
Venkat Java Projects
Mobile: +91 9966499110 | www.venkatjavaprojects.com | venkatjavaprojects@gmail.com
In this paper the author uses SVM (Support Vector Machine), Decision Tree Classifier, K-Neighbors Classifier, Naïve Bayes, Random Forest Classifier, Bagging Classifier, AdaBoost Classifier and MLP Classifier.

Each algorithm builds a model from the training dataset, and new data is then applied to the trained model to predict its class. Among all of these algorithms, Random Forest gives the best prediction accuracy.
Support vector machine:
Machine learning involves predicting and classifying data, and to do so we employ various machine learning algorithms according to the dataset. SVM, or Support Vector Machine, is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes. In machine learning, the radial basis function kernel, or RBF kernel, is a popular kernel function used in various kernelized learning algorithms. In particular, it is commonly used in support vector machine classification. As a simple example, for a classification task with only two features, you can think of a hyperplane as a line that linearly separates and classifies a set of data.

Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it. So when new testing data is added, whatever side of the hyperplane it lands on decides the class that we assign to it.

How do we find the right hyperplane? Or, in other words, how do we best segregate the two classes within the data? The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.
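As a minimal sketch of how an RBF-kernel SVM could be trained and evaluated with scikit-learn (using the built-in iris data as a stand-in for the project's own dataset):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# RBF kernel; C controls the softness of the margin.
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(f"SVM accuracy: {100 * accuracy_score(y_test, y_pred):.2f}%")
```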
Naïve Bayes classifier:
It would be difficult and practically impossible to classify a web page, a document, an email or any other lengthy text manually. This is where the Naïve Bayes Classifier machine learning algorithm comes to the rescue. A classifier is a function that assigns a population's elements to one of the available categories. For instance, spam filtering is a popular application of the Naïve Bayes algorithm. The spam filter here is a classifier that assigns a label "Spam" or "Not Spam" to each email.
The Naïve Bayes Classifier is amongst the most popular learning methods grouped by similarities. It works on the well-known Bayes theorem of probability and is used to build machine learning models, particularly for disease prediction and document classification. It performs a simple classification of words based on Bayes' theorem for subjective analysis of content.
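A small Naïve Bayes illustration in the spirit of the spam-filter example above; the tiny corpus here is invented purely for demonstration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy emails and labels, just to show the workflow.
emails = ["win money now", "limited offer win prize",
          "meeting at noon", "project report attached"]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

vec = CountVectorizer()
X = vec.fit_transform(emails)      # bag-of-words counts

clf = MultinomialNB()
clf.fit(X, labels)                 # applies Bayes' theorem over word counts

print(clf.predict(vec.transform(["win a free prize"])))  # likely ['Spam']
```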
Decision tree:
A decision tree is a graphical representation that makes use of a branching methodology to exemplify all possible outcomes of a decision, based on certain conditions. In a decision tree, each internal node represents a test on an attribute, each branch represents the outcome of that test, and each leaf node represents a particular class label, i.e. the decision made after computing all of the attributes. The classification rules are represented by the paths from the root to the leaf nodes.
Types of Decision Trees

Classification Trees: These are considered the default kind of decision trees, used to separate a dataset into different classes based on the response variable. They are generally used when the response variable is categorical in nature.

Regression Trees: When the response or target variable is continuous or numerical, regression trees are used. These are generally used for predictive problems rather than classification.

Decision trees can also be classified into two types based on the type of target variable: continuous variable decision trees and binary variable decision trees. It is the target variable that helps decide what kind of decision tree is required for a particular problem. A minimal classification-tree sketch follows below.
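A minimal sketch of a classification tree with scikit-learn; the analogous DecisionTreeRegressor covers the regression-tree case for continuous targets. Printing the tree shows that each root-to-leaf path is one classification rule:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)

# Each root-to-leaf path in this printout is one classification rule.
print(export_text(tree))
print(f"Tree accuracy: {100 * tree.score(X_test, y_test):.2f}%")
```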
Random forest:
Random Forest is a go-to machine learning algorithm that uses a bagging approach to create a bunch of decision trees, each built from a random subset of the data. A model is trained several times on random samples of the dataset to achieve good prediction performance. In this ensemble learning method, the outputs of all the decision trees in the random forest are combined to make the final prediction, which is derived by polling the results of each decision tree, i.e. going with the prediction that appears most often among the trees.

For instance, if 5 friends predict that you will like restaurant R but only 2 friends predict that you will not, then the final prediction is that you will like restaurant R, as the majority wins.
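A minimal random forest sketch: 100 trees, each trained on a bootstrap sample with a random feature subset, with a majority vote deciding the final class (again using iris as a stand-in dataset):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 100 trees; the forest's prediction is the majority vote of the trees.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

print(f"RF accuracy: {100 * rf.score(X_test, y_test):.2f}%")
```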
K-nearest neighbor:
The k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification.

Both for classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.
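A minimal k-NN sketch; setting weights='distance' gives each neighbor the 1/d weighting described above instead of a plain majority vote:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# weights='distance' weights each neighbor by 1/d.
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn.fit(X_train, y_train)   # "training" just stores the examples (lazy learning)

print(f"k-NN accuracy: {100 * knn.score(X_test, y_test):.2f}%")
```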
Bagging classifier:
A Bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree) by introducing randomization into its construction procedure and then making an ensemble out of it.

Each base classifier is trained in parallel on a training set generated by randomly drawing, with replacement, N examples from the original training dataset, where N is the size of the original training set. The training sets for the base classifiers are independent of each other; many of the original examples may be repeated in a resulting training set while others may be left out.

Bagging reduces overfitting (variance) by averaging or voting. This can lead to an increase in bias, but that increase is compensated for by the reduction in variance.
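A minimal bagging sketch: 10 decision trees, each fit on a bootstrap sample drawn with replacement from the training set, with their votes aggregated into the final prediction:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# bootstrap=True draws each tree's training set with replacement.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                        bootstrap=True, random_state=42)
bag.fit(X_train, y_train)

print(f"Bagging accuracy: {100 * bag.score(X_test, y_test):.2f}%")
```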
AdaBoost:
Adaptive Boosting (AdaBoost) is a machine learning meta-algorithm. It can be used in conjunction with many other types of learning algorithms to improve performance. The outputs of the other learning algorithms ('weak learners') are combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers, though in some problems it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing, the final model can be proven to converge to a strong learner.

Every learning algorithm tends to suit some problem types better than others and typically has many different parameters and configurations to adjust before it achieves optimal performance on a dataset; AdaBoost is often referred to as the best out-of-the-box classifier.[2] When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-growing algorithm, such that later trees tend to focus on harder-to-classify examples.
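A minimal AdaBoost sketch using depth-1 decision trees ("stumps") as the weak learners; each boosting round reweights the training samples toward those previously misclassified:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 50 stumps; each round upweights previously misclassified samples.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, random_state=42)
ada.fit(X_train, y_train)

print(f"AdaBoost accuracy: {100 * ada.score(X_test, y_test):.2f}%")
```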
Multilayer perceptron (MLP):
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). The term MLP is used ambiguously, sometimes loosely to refer to any feedforward ANN, and sometimes strictly to refer to networks composed of multiple layers of perceptrons (with threshold activation). Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when they have a single hidden layer.

An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron; it can distinguish data that is not linearly separable.
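A minimal MLP sketch: one hidden layer of 100 ReLU units trained with backpropagation. Standardizing the inputs first is an added step that usually helps the solver converge:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Input layer -> 100 ReLU hidden units -> output layer, trained by backprop.
scaler = StandardScaler().fit(X_train)
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation='relu',
                    max_iter=1000, random_state=42)
mlp.fit(scaler.transform(X_train), y_train)

print(f"MLP accuracy: {100 * mlp.score(scaler.transform(X_test), y_test):.2f}%")
```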
To implement all of the above algorithms we have used Python and the 'student data' dataset. This dataset is available inside the dataset folder, which contains the test dataset along with a dataset information file.

Python packages and libraries used: numpy, pandas, tkinter, plus the versions listed below (the two columns appear to be the installed version and the latest available version, as reported by pip):
Package          Installed    Latest
PyVISA           1.10.1       1.10.1
PyVISA-py        0.3.1        0.3.1
cycler           0.10.0       0.10.0
imutils          0.5.3        0.5.3
joblib           0.14.1       0.14.1
kiwisolver       1.1.0        1.1.0
matplotlib       3.1.2        3.1.2
nltk             3.4.5        3.4.5
numpy            1.18.1       1.18.1
opencv-python    4.1.2.30     4.1.2.30
pandas           0.25.3       0.25.3
pip              19.0.3       20.0.1
pylab            0.0.2        0.0.2
pyparsing        2.4.6        2.4.6
python-dateutil  2.8.1        2.8.1
pytz             2019.3       2019.3
pyusb            1.0.2        1.0.2
scikit-learn     0.22.1       0.22.1
scipy            1.4.1        1.4.1
seaborn          0.9.0        0.9.0
setuptools       40.8.0       45.1.0
six              1.14.0       1.14.0
sklearn          0.0          0.0
style            1.1.6        1.1.6
styled           0.2.0.post1  0.2.0.post1
scikit-learn components used: classification_report, confusion_matrix, accuracy_score, train_test_split, KFold, cross_val_score, GridSearchCV, DecisionTreeClassifier, KNeighborsClassifier, SVC, naive_bayes, RandomForestClassifier, BaggingClassifier, AdaBoostClassifier, MLPClassifier.
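The GUI walkthrough below follows a standard scikit-learn workflow. As a rough sketch of what happens behind those buttons (the CSV path, the assumption that the class label is the last column, and the choice of GaussianNB for the Naïve Bayes step are all guesses; the actual 'student data' layout is described only in the dataset information file):

```python
import pandas as pd
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('dataset/student_data.csv')   # hypothetical path
X, y = df.iloc[:, :-1], df.iloc[:, -1]         # assumes label in last column
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    'KNN': KNeighborsClassifier(),
    'CART': DecisionTreeClassifier(),
    'SVM': SVC(),
    'NB': GaussianNB(),
    'RF': RandomForestClassifier(),
    'Bagging': BaggingClassifier(),
    'Ada': AdaBoostClassifier(),
    'MLP': MLPClassifier(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Predicted Values on Test Data is {100 * acc:.2f}%")
```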
Screenshots
When we run the code it displays the window below.
Now click on 'upload dataset' to upload the data.
Now click on 'read data' to read the data.
Now click on 'Train_Test_split' to split the data into training and testing sets.
Now click on 'All classifiers' to train and evaluate all the classification models:
KNN Predicted Values on Test Data is 98.00%
CART Predicted Values on Test Data is 97.31%
SVM Predicted Values on Test Data is 98.18%
RF Predicted Values on Test Data is 98.62%
Bagging Predicted Values on Test Data is 97.41%
Ada Predicted Values on Test Data is 87.43%
MLP Predicted Values on Test Data is 98.00%
Now click on 'Model comparison' to display the comparison between the models.
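For the comparison step, a minimal matplotlib sketch along these lines would reproduce the chart from the test accuracies reported above (the plotting style of the actual tool may differ):

```python
import matplotlib.pyplot as plt

# Test accuracies reported by the 'All classifiers' step above.
names = ['KNN', 'CART', 'SVM', 'RF', 'Bagging', 'Ada', 'MLP']
scores = [98.00, 97.31, 98.18, 98.62, 97.41, 87.43, 98.00]

plt.bar(names, scores)
plt.ylabel('Test accuracy (%)')
plt.title('Model comparison')
plt.show()
```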