MuhammadGulraj
BS GIKI,Pakistan
MS UET Peshawar,Pakistan
Machine Learning
Name: Muhammad Gulraj
Muhammad.gulraj@yahoo.com
K–Nearest neighbor (KNN)
In machine learning and pattern recognition, K-nearest neighbor (KNN) is a non-parametric algorithm that can be used for both classification and regression. It is among the simplest algorithms in machine learning, and its output depends on whether it is used for regression or classification.
For classification, KNN assigns an object to a class by majority vote among its neighbors: the object is assigned to the class that has the most members near it.
KNN is an instance-based, or lazy, learning method. It is a simple but powerful algorithm: no training is required, and new training examples can be added easily. However, it is slow and expensive, with a computational complexity of O(md) per query (for m training examples of d dimensions). Its run-time performance can be improved significantly by removing redundant data, computing approximate distances, or pre-sorting the training examples into fast data structures.
KNN can be used in handwritten character classification, content-based image retrieval, intrusion detection and fault detection.
Let’s consider a simple example of an object that has many sea bass and salmon as neighbors. Assume that K = 3, which means that we consider the 3 nearest neighbors of the object. From the image below it is clear that the object has 2 sea bass and 1 salmon as neighbors, so the KNN algorithm will classify the object as a sea bass.
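The majority vote in this example can be sketched in a few lines of Python (the assignment code itself is in Matlab; the neighbor labels below are the hypothetical ones from the figure):

```python
from collections import Counter

# Hypothetical labels of the K = 3 nearest neighbors from the example above
neighbors = ["sea bass", "sea bass", "salmon"]

# Majority vote: the most frequent class among the neighbors wins
label, votes = Counter(neighbors).most_common(1)[0]
print(label, votes)  # sea bass 2
```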
The KNN algorithm is very simple. The training period consists only of storing all instances and their class labels. If feature selection has to be performed, n-fold cross validation should be used on the training set. To classify a new instance X given a training set Y, the following steps are needed.
i. Compute the distance of X from each instance in set Y.
ii. Sort the distances in increasing order and pick the first K elements (the smallest distances).
iii. Find the most frequent class among the K nearest neighbors.
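The three steps above can be sketched as a small Python function (a minimal sketch, not the assignment's Matlab code; the training-set format, a list of (features, label) pairs, is an assumption):

```python
import math
from collections import Counter

def knn_classify(x, train, k=3):
    """Classify point x by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs (hypothetical format).
    """
    # i. Compute the distance of x from each training instance
    dists = [(math.dist(x, xi), yi) for xi, yi in train]
    # ii. Sort distances in increasing order and keep the first k (smallest)
    dists.sort(key=lambda d: d[0])
    nearest = [yi for _, yi in dists[:k]]
    # iii. Return the most frequent class among the k neighbors
    return Counter(nearest).most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.8), "b")]
print(knn_classify((1.1, 1.0), train))  # a
```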
We can implement the KNN algorithm in Matlab for the IRIS dataset. A summary of the script is as follows.
i. Load the iris data in Matlab.
ii. Randomize the data to obtain new test and training sets for each iteration.
iii. For every observation, compute the Euclidean distance.
iv. Compute the K nearest neighbors and store them in an array.
v. Assign the label of the lowest distance.
vi. In case of a tie, randomly assign a class label.
vii. Return the label of the class.
viii. Compute the confusion matrix.
Please find the code knniris.m in the assignment folder. KNN gives very good results when the number of classes and features is small.
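Steps i–viii can be sketched in Python as well (a sketch of the same pipeline, not the knniris.m script itself; the data format and the function name are assumptions):

```python
import random
from collections import Counter

def evaluate_knn(data, k=3, train_frac=0.8, seed=0):
    """Shuffle data, split into train/test sets, classify each test point
    with KNN, and return a confusion matrix as a {(true, predicted): count}
    dictionary. data is a list of (features, label) pairs."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)                               # ii. randomize the data
    cut = int(len(data) * train_frac)
    train, test = data[:cut], data[cut:]
    confusion = Counter()
    for x, true_label in test:
        # iii. squared Euclidean distance to every training observation
        dists = sorted((sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
                       for xi, yi in train)
        votes = Counter(yi for _, yi in dists[:k])  # iv. the K nearest neighbors
        top = votes.most_common()
        tied = [lab for lab, n in top if n == top[0][1]]
        pred = rng.choice(tied)                     # vi. break ties randomly
        confusion[(true_label, pred)] += 1          # viii. build confusion matrix
    return confusion
```

With two well-separated classes this returns a purely diagonal confusion matrix, mirroring the near-perfect result obtained on the iris data.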
Correctly classified       149
Incorrectly classified       1
Mean error               0.008
Relative absolute error   1.9%
Root mean squared error  0.009
Total instances            150

Confusion matrix    a    b    c
setosa             50    0    0
versicolor          0   50    0
virginica           0    0   49
The detailed analysis shows that the KNN classifier makes very few mistakes in a dataset that is
simple, although not linearly separable.
Bayes Classifier
Bayes' theorem (also called Bayes' rule or Bayes' law) results from the mathematical manipulation of conditional probabilities in probability and statistics. The Bayesian classification rule provides a mathematical rule for updating an existing belief when new evidence is found. Mathematically:

P(A|B) = P(B|A) P(A) / P(B)

This rule can be explained by a simple example of a newborn who observes a sunset and wonders whether the sun will rise again tomorrow. The newborn initially assigns equal probabilities (0.5, 0.5) to both outcomes. When the sun rises the next day, the probability of a sunrise increases from 0.5 to 0.66, and the child's belief that the sun will rise again increases. As this process continues, the child's belief that the sun will rise again grows from a fifty percent probability toward a near certainty.
For another example, assume someone told you that he had a nice conversation with a person on a bus. Knowing nothing about the conversation, the probability that the conversation was with a woman is 50%, and the probability that it was with a man is 50%. Now suppose he also told you that the conversational partner had long hair. The probability that the partner is a woman then increases, because most women (75%) have long hair. Similarly, more features or pieces of evidence can update the existing belief and help you decide whether the conversational partner was a man or a woman.
We can apply this rule using the formula above. Suppose A represents the event 'person has cancer' and B represents the event 'person is a smoker'. Suppose the probability of event A is P(A) = 0.1 (meaning that 10 percent of patients entering the clinic have cancer) and the probability that a patient is a smoker is P(B) = 0.5. From previous patient data, we can determine the probability that a cancer patient is a smoker, P(B|A) = 0.8. Using these numbers, the probability that a person has cancer given that they are a smoker is P(A|B) = 0.8 × 0.1 / 0.5 = 0.16, a significant increase over the prior of 0.1. This shows that the probability can change substantially once new evidence is found.
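The arithmetic of this example is a direct application of Bayes' rule, and a few lines of Python confirm it (the variable names are mine):

```python
p_a = 0.1          # P(A): a patient entering the clinic has cancer
p_b = 0.5          # P(B): a patient is a smoker
p_b_given_a = 0.8  # P(B|A): a cancer patient is a smoker

# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.16
```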
The given dataset contains 150 instances, corresponding to three equally represented species of iris plant (setosa, versicolour, virginica).
The execution and results of the Bayes classifier are as follows, and show that a Naïve Bayes classifier makes few mistakes on a dataset that, although simple, is not linearly separable.
K–Means Clustering
K means clustering is a popular method used for vector quantization. It partitions n observations into K clusters, where each observation belongs to the cluster with the nearest mean. Exact K means clustering is an NP-hard problem (computationally difficult), but efficient heuristic algorithms can be used that converge rapidly to a local minimum.
Assume a set of observations x1, x2, x3, ..., xn, where every observation is a d-dimensional vector. The aim of K means clustering is to partition the n observations into k sets (clusters) S = {S1, S2, ..., Sk}, where k <= n, so as to minimize the within-cluster sum of squares:

argmin over S of the sum for i = 1..k of the sum over x in Si of ||x - μi||²

where μi is the mean of the points in Si.
The algorithm of K means clustering is as follows.
i. Specify K (the number of clusters).
ii. Select K points randomly as initial cluster centers.
iii. Assign each object/instance to its closest cluster center.
iv. Compute the centroid (mean) of every cluster and use it as the new cluster center.
v. Reassign all objects/instances to their closest cluster centers.
vi. Iterate steps iv–v until the cluster centers no longer change.
If we use K means clustering on the Iris dataset, it will find the natural grouping of iris specimens based on the features given in the data. To use the K means algorithm we must first specify the number of clusters we want to create. The Matlab implementation of the K means algorithm is present in the assignment folder. The results that we found using the K means algorithm are as follows.
In every iteration, the K means algorithm reassigns points between clusters so as to decrease the total distance between points and their centroids, and then recomputes the centroids of the new clusters. The amount of reassignment shrinks with each iteration until the algorithm reaches a (local) minimum.
The new clusters formed are shown in the following figure.

More Related Content

What's hot

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)nlt2390
 
Improving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docImproving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docbutest
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemAnders Viken
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...IJERA Editor
 
Instance based learning
Instance based learningInstance based learning
Instance based learningswapnac12
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmIRJET Journal
 
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...Parallel Guided Local Search and Some Preliminary Experimental Results for Co...
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...csandit
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithmijsrd.com
 
Critical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network ProjectCritical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network Projectiosrjce
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Editor IJMTER
 
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...IRJET Journal
 
Survey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsSurvey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsIRJET Journal
 
Mncs 16-09-4주-변승규-introduction to the machine learning
Mncs 16-09-4주-변승규-introduction to the machine learningMncs 16-09-4주-변승규-introduction to the machine learning
Mncs 16-09-4주-변승규-introduction to the machine learningSeung-gyu Byeon
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmIJERA Editor
 
Chapter 09 class advanced
Chapter 09 class advancedChapter 09 class advanced
Chapter 09 class advancedHouw Liong The
 

What's hot (20)

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
Thesis (presentation)
Thesis (presentation)Thesis (presentation)
Thesis (presentation)
 
Improving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..docImproving Classifier Accuracy using Unlabeled Data..doc
Improving Classifier Accuracy using Unlabeled Data..doc
 
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering ProblemEnhanced Genetic Algorithm with K-Means for the Clustering Problem
Enhanced Genetic Algorithm with K-Means for the Clustering Problem
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
 
세미나 20170929
세미나 20170929세미나 20170929
세미나 20170929
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
50120140505013
5012014050501350120140505013
50120140505013
 
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...Parallel Guided Local Search and Some Preliminary Experimental Results for Co...
Parallel Guided Local Search and Some Preliminary Experimental Results for Co...
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
Critical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network ProjectCritical Paths Identification on Fuzzy Network Project
Critical Paths Identification on Fuzzy Network Project
 
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
Distribution Similarity based Data Partition and Nearest Neighbor Search on U...
 
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
 
Survey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy AlgorithmsSurvey paper on Big Data Imputation and Privacy Algorithms
Survey paper on Big Data Imputation and Privacy Algorithms
 
Mncs 16-09-4주-변승규-introduction to the machine learning
Mncs 16-09-4주-변승규-introduction to the machine learningMncs 16-09-4주-변승규-introduction to the machine learning
Mncs 16-09-4주-변승규-introduction to the machine learning
 
Nlp text classification
Nlp text classificationNlp text classification
Nlp text classification
 
Optimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering AlgorithmOptimising Data Using K-Means Clustering Algorithm
Optimising Data Using K-Means Clustering Algorithm
 
Chapter 09 class advanced
Chapter 09 class advancedChapter 09 class advanced
Chapter 09 class advanced
 

Viewers also liked

Data mining assignment 6
Data mining assignment 6Data mining assignment 6
Data mining assignment 6BarryK88
 
Acct 311 homework assignment #7
Acct 311 homework assignment #7Acct 311 homework assignment #7
Acct 311 homework assignment #7Renea Barrera
 
Assignment (062)
Assignment (062)Assignment (062)
Assignment (062)jotish647
 
Assignment #3 pattern thinking
Assignment #3   pattern thinkingAssignment #3   pattern thinking
Assignment #3 pattern thinkingScenicProps Design
 
M.Tech: AI and Neural Networks Assignment II
M.Tech:  AI and Neural Networks Assignment IIM.Tech:  AI and Neural Networks Assignment II
M.Tech: AI and Neural Networks Assignment IIVijayananda Mohire
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine LearningLynn Langit
 
MTech - AI_NeuralNetworks_Assignment
MTech - AI_NeuralNetworks_AssignmentMTech - AI_NeuralNetworks_Assignment
MTech - AI_NeuralNetworks_AssignmentVijayananda Mohire
 
Computer security using machine learning
Computer security using machine learningComputer security using machine learning
Computer security using machine learningSandeep Sabnani
 
Online Assignment
Online AssignmentOnline Assignment
Online Assignmentsivapriyags
 
4 ma0 4hr_que_20140520
4 ma0 4hr_que_20140520 4 ma0 4hr_que_20140520
4 ma0 4hr_que_20140520 AnneRostom
 

Viewers also liked (12)

Assignment
Assignment Assignment
Assignment
 
Data mining assignment 6
Data mining assignment 6Data mining assignment 6
Data mining assignment 6
 
Acct 311 homework assignment #7
Acct 311 homework assignment #7Acct 311 homework assignment #7
Acct 311 homework assignment #7
 
Assignment (062)
Assignment (062)Assignment (062)
Assignment (062)
 
Assignment #3 pattern thinking
Assignment #3   pattern thinkingAssignment #3   pattern thinking
Assignment #3 pattern thinking
 
M.Tech: AI and Neural Networks Assignment II
M.Tech:  AI and Neural Networks Assignment IIM.Tech:  AI and Neural Networks Assignment II
M.Tech: AI and Neural Networks Assignment II
 
Practical Machine Learning
Practical Machine LearningPractical Machine Learning
Practical Machine Learning
 
Pattern recognition
Pattern recognitionPattern recognition
Pattern recognition
 
MTech - AI_NeuralNetworks_Assignment
MTech - AI_NeuralNetworks_AssignmentMTech - AI_NeuralNetworks_Assignment
MTech - AI_NeuralNetworks_Assignment
 
Computer security using machine learning
Computer security using machine learningComputer security using machine learning
Computer security using machine learning
 
Online Assignment
Online AssignmentOnline Assignment
Online Assignment
 
4 ma0 4hr_que_20140520
4 ma0 4hr_que_20140520 4 ma0 4hr_que_20140520
4 ma0 4hr_que_20140520
 

Similar to Machine Learning 1

WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestWisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestSheing Jing Ng
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional VerificationSai Kiran Kadam
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...IOSR Journals
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...IRJET Journal
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetIJERA Editor
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSEditor IJCATR
 
Classifiers
ClassifiersClassifiers
ClassifiersAyurdata
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
IRJET- Disease Prediction using Machine Learning
IRJET-  Disease Prediction using Machine LearningIRJET-  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine LearningIRJET Journal
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data萍華 楊
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsChirag Gupta
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...IAEME Publication
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithmsShalitha Suranga
 

Similar to Machine Learning 1 (20)

WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForestWisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
WisconsinBreastCancerDiagnosticClassificationusingKNNandRandomForest
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
 
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...Improving K-NN Internet Traffic Classification Using Clustering and Principle...
Improving K-NN Internet Traffic Classification Using Clustering and Principle...
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
Analysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data SetAnalysis On Classification Techniques In Mammographic Mass Data Set
Analysis On Classification Techniques In Mammographic Mass Data Set
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
Classifiers
ClassifiersClassifiers
Classifiers
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
 
I017235662
I017235662I017235662
I017235662
 
2224d_final
2224d_final2224d_final
2224d_final
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
IRJET- Disease Prediction using Machine Learning
IRJET-  Disease Prediction using Machine LearningIRJET-  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine Learning
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional Experts
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Machine learning algorithms
Machine learning algorithmsMachine learning algorithms
Machine learning algorithms
 

Machine Learning 1

  • 1. MuhammadGulraj BS GIKI,Pakistan MS UET Peshawar,Pakistan 1 Machine Learning Name: Muhammad Gulraj Muhammad.gulraj@yahoo.com
  • 2. MuhammadGulraj BS GIKI,Pakistan MS UET Peshawar,Pakistan 2 K–Nearest neighbor (KNN) In machine learning and pattern recognition K - Nearest neighbor KNN is an algorithm (non- parametric) that can be used for classification and regression. It is among the simplest algorithms in machine learning. The output of the KNN depends on whether the algorithm is used for regression or classification. KNN classify an object using the majority vote/neighbor. The object is assigned to the class which has the most members near to the object.
  • 3. MuhammadGulraj BS GIKI,Pakistan MS UET Peshawar,Pakistan 3 KNN is instance based learning or lazy learning. KNN is simple but powerful algorithm in which no training is required and new training example can be added easily. However it is slow and expensive having computation complexity of O (md). KNN has a slow run time performance but it can be improved significantly by removing redundant data, computing approximate distance or using pre-sort training examples into fast data structures. KNN can be used in handwritten character classification, content based image retrieval, intrusion detection and fault detection. Let’s consider a simple example of an object which have many sea bass and salmon as neighbors. Let’s assume that K = 3 which means that we will consider 3 nearest neighbors of the object. From the below image it is clear that the object have 2 sea bass and 1 salmon, so the KNN algorithm will classify the object as sea bass.
  • 4. MuhammadGulraj BS GIKI,Pakistan MS UET Peshawar,Pakistan 4 KNN Algorithm is very simple. The training period consist of storing all instances and labels (Class labels). If feature selection has to be performed then n-fold cross validation should be used on training set. If we want to test a new instance X given a set Y, the following steps are needed. i. Compute the distance of X from each instance in set Y ii. Sort the distance in increasing order and pick the highest K elements. iii. Find the most repeated class in K nearest neighbor. We can implement the KNN algorithm in matlab for IRIS dataset. Summary of the script is as follows. i. Load iris data in matlab ii. Randomize the data for new iteration for new sets of data and training set iii. For every observation we compute the Euclidean distance iv. We compute the K nearest neighbor and store it in an array v. We assign the label for the lowest distance vi. In case there is a tie we will randomly assign a class label vii. Return the label of the class viii. Find confusion matrix Please find the code knniris.m in the assignment folder. KNN shows very good results with less number of classes and features.
  • 5. MuhammadGulraj BS GIKI,Pakistan MS UET Peshawar,Pakistan 5 Correctly classified 149 Incorrectly classified 1 Mean error 0.008 Relative absolute error 1.9% Root mean squared error 0.009 Total instances 150 Confusion matrix a b c setosa 50 0 0 vericolor 0 50 0 virginica 0 0 49 The detailed analysis shows that the KNN classifier makes very few mistakes in a dataset that is simple, although not linearly separable.
Bayes Classifier

Bayes' theorem (also called Bayes' rule or Bayes' law) is the result of mathematical manipulation of conditional probabilities in probability and statistics. The Bayesian classification rule provides a mathematical rule for updating an existing belief when new evidence is found. Mathematically we can show it as:

P(A|B) = P(B|A) · P(A) / P(B)

This rule can be explained with a simple example of a newborn who observes a sunset and wonders whether the sun will rise again tomorrow. The newborn initially assigns equal probabilities (0.5, 0.5) to both outcomes. When the sun rises the next day, the probability of a sunrise increases from 0.5 to 0.66, and thus the child's belief that the sun will rise again increases. This process continues, and the child's belief that the sun will rise again grows from a fifty percent probability towards a near certainty.

As another example, assume someone tells you that he had a nice conversation with someone on a bus. Knowing nothing else about the conversation, the probability that the conversation was with a woman is 50%, and the probability that it was with a man is 50%. Suppose the person also tells you that the conversation partner had long hair. The probability that the partner is a woman then increases, because most women (75%) have long hair. Similarly, more features or pieces of evidence can update the existing belief and help you decide whether the conversation partner was a man or a woman.
We can apply this rule using the mathematical formula above. Let A represent the event 'person has cancer' and B represent the event 'person is a smoker'. Suppose the probability of event A is P(A) = 0.1 (which means that 10 percent of patients entering the clinic have cancer), and the probability that a patient is a smoker is P(B) = 0.5. Using previous patient data, we have determined the probability that a cancer patient is a smoker, P(B|A) = 0.8. With these numbers, the probability that a person has cancer given that they smoke increases from 0.1 to P(A|B) = (0.8 × 0.1) / 0.5 = 0.16, a significant increase. This shows that after finding new evidence the probability can change significantly.

The given dataset contains 150 instances, corresponding to three equally represented species of iris plant (setosa, versicolour, virginica).
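The cancer/smoker numbers above can be checked directly (a minimal sketch; the probabilities are exactly the ones stated in the text):

```python
# Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.1          # P(cancer): 10% of patients entering the clinic have cancer
p_b = 0.5          # P(smoker): half of the patients smoke
p_b_given_a = 0.8  # P(smoker | cancer), from previous patient data

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 2))  # 0.16: the belief rises from 0.10 to 0.16
```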
The execution and results of the Bayes classifier are as follows, showing that a Naïve Bayes classifier makes few mistakes on a dataset that, although simple, is not linearly separable.
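For illustration, a minimal Gaussian Naïve Bayes classifier can be sketched from scratch (this is a hedged Python toy on one-dimensional made-up data, not the actual script whose results are reported above):

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples):
    """Estimate per-class prior, mean and variance from (feature, label) pairs."""
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    n = len(samples)
    model = {}
    for label, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        model[label] = (len(xs) / n, mean, var)  # prior, mean, variance
    return model

def predict(model, x):
    """Pick the class maximizing prior * Gaussian likelihood; this is Bayes'
    rule with the common evidence term P(x) ignored."""
    def score(label):
        prior, mean, var = model[label]
        likelihood = math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        return prior * likelihood
    return max(model, key=score)

# Toy data: two classes with clearly separated feature values
data = [(1.0, "a"), (1.2, "a"), (0.9, "a"), (5.0, "b"), (5.3, "b"), (4.8, "b")]
model = fit_gaussian_nb(data)
print(predict(model, 1.1), predict(model, 5.1))  # a b
```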
K–Means Clustering

K-means clustering is a popular method used for vector quantization. K-means clustering partitions n observations into K clusters, where each observation belongs to the cluster with the nearest mean. K-means clustering is an NP-hard problem (computationally difficult), but efficient heuristic algorithms exist that converge rapidly to a local minimum.

Assume a set of observations x1, x2, …, xn, where every observation is a vector of d dimensions. The aim of K-means clustering is to partition the n observations into k sets (clusters) S = {S1, S2, …, Sk}, with k ≤ n, so as to minimize the within-cluster sum of squares:

argmin over S of  Σi Σ(x in Si) ||x − μi||²

where μi is the mean of the points in Si.
The K-means clustering algorithm is as follows.

i. Specify K (the number of clusters).
ii. Select K points randomly as cluster centers.
iii. Assign each object/instance to the closest cluster center.
iv. Find the centroid (mean) of every cluster and use it as the new cluster center.
v. Reassign all objects/instances to the closest cluster center.
vi. Iterate until the cluster centers no longer change.
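The loop above can be sketched in Python (the original implementation is in Matlab; this is a one-dimensional illustration with fixed initial centers rather than random ones, so the run is deterministic):

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm on 1-D points. `centers` holds the initial cluster
    centers (step ii; fixed here instead of random, for reproducibility)."""
    for _ in range(max_iter):
        # Steps iii/v: assign each point to the closest center
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Step iv: recompute each center as the mean of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step vi: stop when the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
centers, clusters = kmeans(data, centers=[0.0, 6.0])
print([round(c, 3) for c in centers])  # [1.0, 5.0]
```

Each pass reassigns points and moves the centers toward the cluster means, which is exactly the behavior described for the iris run below.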
If we use K-means clustering on the iris data set, it will find natural groupings among the iris specimens based on the features given in the data. To use the K-means algorithm we must first specify the number of clusters we want to create. The Matlab implementation of the K-means algorithm is present in the assignment folder. The results that we found using the K-means algorithm are as follows.
In every iteration, the K-means algorithm reassigns points between clusters to decrease the distance between each point and its centroid, and then recomputes the centroids of the new clusters. The amount of reassignment decreases with each iteration until the algorithm reaches a (local) minimum.
The new clusters formed are shown in the following figure.

References
1. http://documentation.statsoft.com/STATISTICAHelp.aspx?path=MachineLearning/MachineLearning/NaiveBayes/NaiveBayesClassifierExample1Classification
2. http://www.mathworks.com/help/stats/examples/classification.html
3. Andrew Ng (2013), Machine Learning (online course), Stanford University, https://class.coursera.org/ml-004/class
4. Duda and Hart, Pattern Classification (2001-2002), Wiley, New York.
5. Ying Cui and Zhong Jin, Facial feature points (2012), http://www.jprr.org/index.php/jprr
6. Ioannis Dimou and Michalis Zervakis, On the analogy of classifier ensembles, http://www.jprr.org/index.php/jprr