SUPERVISED LEARNING ALGORITHM: KNN, NAÏVE BAYES AND LINEAR DISCRIMINANT ANALYSIS
CHAPTER 11
“Introduction to Data Science: Practical Approach with R and Python”
B. Uma Maheswari and R. Sujatha
Copyright © 2021 Wiley India Pvt. Ltd. All rights reserved.
LEARNING OBJECTIVES
Understand the concept of KNN
Learn to build models using KNN algorithm
Understand the concept of Naïve Bayes algorithm
Learn the types of Naïve Bayes classifiers
Understand the concept of Linear Discriminant analysis
Implement KNN, Naïve Bayes and Linear Discriminant analysis using R
and Python programming
K NEAREST NEIGHBOURS
KNN is a supervised, instance-based learning algorithm: a new object is assigned to the class that occurs most frequently among its nearest neighbours.
It can be applied to both regression and classification.
The classification is based on the similarity or dissimilarity between objects.
This similarity is measured with a distance metric, most commonly Euclidean distance.
The next step is to determine the number of neighbours with which the comparison is made; this is the ‘K’ value.
A common rule of thumb is to set K to the square root of the number of records in the dataset; an odd K helps avoid ties.
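Below is a minimal Python sketch of this idea using scikit-learn; the iris dataset and the train/test split are illustrative stand-ins, not the chapter's own example.

```python
# KNN sketch with scikit-learn; iris is only a stand-in dataset.
from math import sqrt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Rule of thumb from the text: start with K near the square root
# of the number of records; an odd K helps avoid tied votes.
k = int(sqrt(len(X_train)))
if k % 2 == 0:
    k += 1

knn = KNeighborsClassifier(n_neighbors=k)  # Euclidean distance by default
knn.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, knn.predict(X_test)))
```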
NAÏVE BAYES ALGORITHM
Naïve Bayes is a supervised learning technique used for classification.
It is one of the fastest classification algorithms and often gives accurate predictions, especially on large datasets.
The technique can be used for multi-class classification (when the response variable is not binary and has more than 2 classes).
The algorithm is based on the Bayes theorem, or in more general terms on probabilities.
It assumes that the features used are independent of each other, and it is therefore called ‘Naïve’.
NAÏVE BAYES FORMULA
P(a/b) = [P(b/a) × P(a)] / P(b)
Where
P(a) is the probability of occurrence of ‘a’
P(b) is the probability of occurrence of ‘b’
P(a/b) is the probability of occurrence of ‘a’ on the condition
that ‘b’ happens
P(b/a) is the probability of occurrence of ‘b’ on the condition
that ‘a’ happens
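As a quick worked example with made-up numbers: if P(a) = 0.4, P(b) = 0.5 and P(b/a) = 0.75, then P(a/b) = (0.75 × 0.4) / 0.5 = 0.6.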
IN MACHINE LEARNING
‘a’ is the response variable or the dependent variable.
‘bi’ are the predictor variables or the independent variables, b1, b2, b3, ..., bn.
P(a) is the probability of the response variable (also known as the prior probability).
P(bi) is the probability of the predictor variables (also known as the evidence).
P(a/bi) is the conditional probability of the response variable given the predictor variables (the posterior, which is the prediction).
P(bi/a) is the conditional probability of the predictor variables given the response variable (the likelihood, estimated from the training data).
The important point to note is that the algorithm assumes the predictor variables are independent of each other.
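Because of this independence assumption the likelihood factorizes, and the classifier takes its standard form: P(a/b1, b2, ..., bn) is proportional to P(a) × P(b1/a) × P(b2/a) × ... × P(bn/a). The class ‘a’ with the largest value of this product is the prediction.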
STEPS IN NAÏVE BAYES
Step 1: Load the dataset
Step 2: Understand the data
Step 3: Split the data into training and testing data
Step 4: Build the Naïve Bayes model
Step 5: Evaluate model performance on the training data
Step 6: Evaluate model performance on the test data
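A minimal Python sketch of these six steps using scikit-learn's GaussianNB is shown below; the iris dataset again stands in for whatever data the chapter uses.

```python
# Naive Bayes following the six steps; iris is a stand-in dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Steps 1-2: load and inspect the data
X, y = load_iris(return_X_y=True)
print("Records:", X.shape[0], "Variables:", X.shape[1])

# Step 3: split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Step 4: fit the Naive Bayes model
nb = GaussianNB().fit(X_train, y_train)

# Step 5: model performance on the training data
print("Train accuracy:", accuracy_score(y_train, nb.predict(X_train)))

# Step 6: model performance on the test data
print("Test accuracy:", accuracy_score(y_test, nb.predict(X_test)))
print(confusion_matrix(y_test, nb.predict(X_test)))
```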
LINEAR DISCRIMINANT ANALYSIS
Linear Discriminant Analysis (LDA) is one of the oldest classification algorithms, developed by R. A. Fisher in 1936.
The algorithm is used to classify objects into two or more groups.
It estimates the relationship between a dependent variable and one or more independent variables.
The dependent variable is categorical, and the independent variables are metric or continuous in nature.
The algorithm identifies which independent variables discriminate between the groups in the dependent variable.
LDA assumes that the independent variables are normally distributed, with group-specific means and equal variances across groups.
When the dependent or response variable has two categories, the method is called two-group discriminant analysis.
If there are more than two categories, it is called multiple discriminant analysis.
The objective of LDA is to understand the differences between groups and to predict whether a new object belongs to a particular group, based on a set of independent variables.
APPLICATIONS
Predicting the success or failure of new products
Accepting or rejecting an applicant for admission
Predicting the credit risk category of a loan applicant
Classifying patients into categories such as low risk, medium risk and high risk
Predicting whether a firm will be successful or not
Predicting whether a university student will be a high, average or low performer
MULTIPLE DISCRIMINANT ANALYSIS
When there are more than two groups, LDA produces more than one discriminant function (LD1, LD2, ...); LD1 captures the largest share of the between-group separation, LD2 the next largest, and so on.
Mahalanobis method
Mahalanobis distance is a multivariate distance metric used for measuring the distance between a point and a distribution. It is a measure of divergence or distance between groups in terms of multiple characteristics (or variables). It is a weighted distance that accounts for the positioning and spread of the data points. It finds a plane that separates the two groups, taking the correlation between the variables into account as well. The Mahalanobis method is mainly used when discriminant analysis is used for classification.
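A short NumPy/SciPy sketch of the distance itself (not the full discriminant procedure); the group data here is randomly generated purely for illustration.

```python
# Mahalanobis distance from a point to a distribution; data is made up.
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
group = rng.normal(size=(100, 3))   # 100 observations, 3 variables
point = np.array([1.0, 2.0, 0.5])

mean = group.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(group, rowvar=False))  # inverse covariance

# Unlike Euclidean distance, this weights each direction by the
# spread and correlation of the variables.
print("Mahalanobis distance:", mahalanobis(point, mean, cov_inv))
```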
Fisher’s method
Fisher’s method optimizes the weights such that the ratio of between-group variation to within-group variation is maximized. The method gives a line that joins the centroids of the groups. It is mainly used when discriminant analysis is used for differentiation or profiling of the observations.
Coefficients
Coefficients are important because it might not otherwise be easy to see how the variables contribute to the differentiation. Discriminant analysis provides three types of coefficients.
Unstandardized coefficients
The unstandardized coefficients are the coefficients of the discriminant functions. They are the multipliers, or weights, applied to the raw values of the variables when computing the discriminant scores used to classify objects into the groups of the dependent variable.
Standardized coefficients
Standardized coefficients are the weights applied to the standardized variables. Because the variables are then on a common scale, these coefficients are used to examine the relative importance of the independent variables in separating the categories of the dependent variable.
Structured coefficients
Structured coefficients (also called structure correlations) are the correlations between each independent variable and the discriminant scores. They help to understand which variables each discriminant function relies on to differentiate the groups.
Eigen values
An eigenvalue denotes the ratio of between-group variation to within-group variation and is based on Fisher’s method. One eigenvalue is calculated for every discriminant function. Larger eigenvalues indicate greater discriminating power of the model.
Wilks’ lambda
Wilks’ lambda is a function of the eigenvalue and is tested for its significance. It is the ratio of the within-group sum of squares to the total sum of squares, and its value ranges between 0 and 1. A value near 0 indicates that the group means differ and the model discriminates well; a value near 1 indicates that the groups are poorly separated.
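The two quantities are directly linked: for a single discriminant function, Wilks’ lambda = 1 / (1 + eigenvalue), and with several functions it is the product of these terms over all functions, so large eigenvalues correspond to a lambda near 0.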
STEPS IN LINEAR DISCRIMINANT ANALYSIS
Step 1: Load the dataset
Step 2: Understand the data
Step 3: Draw a scatter plot
Step 4: Split the data into training and testing data
Step 5: Perform Linear Discriminant Analysis
Step 6: Predict using the training data
Step 7: Plot a histogram of the discriminant scores
Step 8: Plot a biplot
Step 9: Compute model performance measures
Step 10: Evaluate model performance on the test data
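A minimal Python sketch of these ten steps with scikit-learn and matplotlib follows; the iris dataset and the plotting choices are illustrative, not the chapter's own example.

```python
# LDA following the ten steps; iris is a stand-in dataset.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix

# Steps 1-2: load and inspect the data
X, y = load_iris(return_X_y=True)

# Step 3: scatter plot of two variables, coloured by group
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.xlabel("variable 1"); plt.ylabel("variable 2")

# Step 4: split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Step 5: perform Linear Discriminant Analysis
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Step 6: predict using the training data
train_pred = lda.predict(X_train)

# Step 7: histogram of the first discriminant score (LD1)
scores = lda.transform(X_train)
plt.figure(); plt.hist(scores[:, 0]); plt.title("LD1 scores")

# Step 8: biplot-style scatter of LD1 against LD2
plt.figure(); plt.scatter(scores[:, 0], scores[:, 1], c=y_train)
plt.xlabel("LD1"); plt.ylabel("LD2")

# Steps 9-10: performance on training and test data
print("Train accuracy:", accuracy_score(y_train, train_pred))
print("Test accuracy:", accuracy_score(y_test, lda.predict(X_test)))
print(confusion_matrix(y_test, lda.predict(X_test)))
plt.show()
```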
