 The Classification algorithm is a Supervised Learning technique used to
identify the category of new observations on the basis of training data. In
Classification, a program learns from the given dataset or observations and
then classifies new observations into one of a number of classes or groups,
such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can
also be called targets, labels, or categories.
 Unlike regression, the output variable of Classification is a category, not a
continuous value: "Green or Blue", "fruit or animal", etc. Since the
Classification algorithm is a Supervised learning technique, it takes labeled
input data, which means each input comes with the corresponding output.
 In a classification algorithm, a discrete output (y) is mapped to the input
variable (x):
y = f(x), where y = categorical output
 The best example of an ML classification algorithm is an Email Spam
Detector.
 The main goal of a Classification algorithm is to identify the category of a
given dataset, and these algorithms are mainly used to predict the output for
categorical data.
 Classification algorithms can be better understood using the below diagram,
in which there are two classes, class A and class B. These classes have
features that are similar to each other and dissimilar to those of the other
class.
 The algorithm which implements classification on a dataset is known as
a classifier. There are two types of classification:
 Binary Classifier: If the classification problem has only two possible
outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or
DOG, etc.
 Multi-class Classifier: If a classification problem has more than two
outcomes, it is called a Multi-class Classifier.
Examples: classification of types of crops, classification of types of music.
Learners in Classification Problems:
In classification problems, there are two types of learners:
 Lazy Learners: A lazy learner first stores the training dataset and waits until
it receives the test dataset. In the lazy learner's case, classification is done on
the basis of the most related data stored in the training dataset. It takes less
time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning.
 Eager Learners: Eager learners develop a classification model from the
training dataset before receiving a test dataset. Opposite to lazy learners,
eager learners take more time in training but less time in prediction, as
sketched below.
Example: Decision Trees, Naïve Bayes, ANN.
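The difference shows up directly in code. Below is a minimal sketch contrasting the two kinds of learners; scikit-learn and the synthetic dataset are illustrative assumptions, not part of the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy dataset: 200 labelled points with 4 features each.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Lazy learner: fit() essentially just stores the training data;
# the real work (the nearest-neighbour search) happens at predict() time.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Eager learner: fit() builds the whole model (the tree) up front,
# so predict() is only a fast walk from the root to a leaf.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("K-NN accuracy:", knn.score(X_test, y_test))
print("Tree accuracy:", tree.score(X_test, y_test))
```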
Types of ML Classification Algorithms:
Classification algorithms can be further divided into two main categories:
 Linear Models
• Logistic Regression
• Support Vector Machines
 Non-linear Models
• K-Nearest Neighbors
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
What is Logistic Regression?
 Logistic regression is a supervised ML classification algorithm used to
assign observations to a discrete set of classes. Unlike linear regression,
which outputs continuous numeric values (such as the salary of a new
employee or the price of a house), logistic regression transforms its output into
a probability value, which can then be mapped to two or more discrete classes.
 For example,
 i) From a given training dataset of students, it learns to decide whether a
new student's data yields the result pass or fail.
 ii) From the symptoms of a patient, whether he/she is infected by corona virus
or not.
 iii) Whether a particular animal is a cat, dog, rat, etc.
Types of Logistic Regression:
 On the basis of the categories, Logistic Regression can be classified into
three types:
 Binomial Logistic regression: the response variable has two values,
such as 0 and 1, pass and fail, or true and false.
 Multinomial Logistic regression: there can be 3 or more possible
unordered types of the dependent variable, such as "cat", "dog", or "rat".
 Ordinal Logistic regression: there can be 3 or more possible
ordered types of the dependent variable, such as "low", "medium", or "high".
 Let's first try to understand why logistic, and why not linear?
 Let 'x' be some feature and 'y' be the output, which can be either 0 or 1 in
binary classification.
 The probability that the output is 1 given its input x can be represented
as P(y=1|x).
 If we predict this probability via linear regression, we can write it as:
P(X) = b0 + b1 * X
where P(X) = P(y=1|x)
 A linear regression model can generate the predicted probability as any
number ranging from -∞ to +∞, whereas the probability of an outcome can
only lie in the range 0 < P(x) < 1. Also, linear regression is considerably
affected by outliers.
 To avoid this problem, the log-odds function, or logit function, is used.
 Logistic regression can therefore be expressed in terms of the logit function as:
log(P(x) / (1 - P(x))) = b0 + b1 * X
 where the left-hand side is called the logit or log-odds function, and
P(x)/(1 - P(x)) is called the odds.
 The odds signify the ratio of the probability of success [P(x)] to the probability
of failure [1 - P(x)]. Therefore, in Logistic Regression, a linear combination of
inputs is mapped to the log(odds) of the output being equal to 1.
 If we take the inverse of the above function, we get:
P(x) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
 In a more simplified form, the above equation becomes:
P(x) = 1 / (1 + e^-(b0 + b1*X))
 This is known as the Sigmoid function, since it gives an S-shaped curve. It
always gives a probability value in the range 0 < P(x) < 1, as shown in the
following figure.
 In order to map this to a discrete class (true/false, cat/dog), we select a
threshold value, or tipping point, above which we classify values into class 1
and below which we classify values into class 0:
P(x) ≥ 0.5, class = 1
P(x) < 0.5, class = 0
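As a small illustration, here is the sigmoid-plus-threshold rule written out in NumPy; the coefficient values b0 and b1 are invented for the example, not learned from data:

```python
import numpy as np

def sigmoid(z):
    """Map the linear combination b0 + b1*X from (-inf, +inf) into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, b0, b1, threshold=0.5):
    """Return (class, probability): class 1 if P(x) >= threshold, else class 0."""
    p = sigmoid(b0 + b1 * x)
    return int(p >= threshold), p

# Illustrative (not learned) coefficients:
label, p = predict_class(x=2.0, b0=-1.0, b1=0.8)
print(label, round(p, 3))  # sigmoid(0.6) ≈ 0.646, so class 1
```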
Support Vector Machine (SVM):
 SVM is a Supervised Learning algorithm that can be used for Classification
as well as Regression problems, but it is mostly used for Classification in
Machine Learning.
 The goal of the SVM algorithm is to create the best hyperplane or decision
boundary that can separate n-dimensional space into classes, so that we can
easily put a new data point in the correct category in the future.
 SVM chooses the extreme points/vectors, called Support Vectors, that help in
creating the hyperplane. Consider the following diagram, in which there are
two different categories that are classified using a decision boundary or
hyperplane:
Hyperplane and Support Vectors in the SVM algorithm:
 Hyperplane: There can be multiple lines/decision boundaries to separate the
classes in n-dimensional space, but we need to find the best decision
boundary that helps to classify the data points. This best boundary is known
as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features present
in the dataset: if there are 2 features (as shown in the image), the
hyperplane will be a straight line, and if there are 3 features, the
hyperplane will be a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, which means
the maximum distance between the data points of the two classes.
 Support Vectors:
The data points or vectors that are closest to the hyperplane and which
affect the position of the hyperplane are termed Support Vectors. Since
these vectors support the hyperplane, they are called support vectors.
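As an illustrative sketch (the library and the toy clusters are assumptions), the snippet below fits a linear SVM on two well-separated groups and reads back the support vectors and hyperplane parameters described above:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters standing in for class A and class B.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
class_b = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear").fit(X, y)

# The support vectors are the extreme points closest to the hyperplane.
print("support vectors per class:", clf.n_support_)
print("hyperplane w:", clf.coef_, " b:", clf.intercept_)
print("new point (1.5, 1.0) ->", clf.predict([[1.5, 1.0]]))
```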
 Suppose there is a strange cat that has some features of dogs; if we want a
model that can accurately identify whether it is a cat or a dog, we can use
SVM. We first train our model with many features of cats and dogs so that it
can learn from them, and then test it with this strange animal. Since the
support vectors create a decision boundary between these two classes (cat
and dog) using the extreme cases (the support vectors), the model will
compare the animal against the extreme cases of cat and dog. On the basis
of these extreme cases, it will classify the animal as a cat.
 The simple SVM algorithm can be used to find a decision boundary for linearly
separable data. However, in the case of non-linearly separable data, such as
that in the following figure, a straight line or flat hyperplane cannot be used
as a decision boundary.
 In the case of non-linearly separable data, the simple SVM algorithm cannot be
used. Rather, a modified version of SVM, called Kernel SVM, is used.
 A kernel is a mathematical function that transforms an input data space into
the required form. The kernel takes a low-dimensional input space and
transforms it into a higher-dimensional space. In other words, it converts a
non-separable problem into a separable problem by allowing a nonlinear
boundary shape, i.e., by adding more dimensions, as shown in the following
figure. It is most useful in non-linear separation problems. The kernel trick,
or kernel SVM, helps us to build a more accurate classifier.
 Kernel SVM algorithms use different types of kernel functions, for
example linear, nonlinear, polynomial, radial basis function (RBF),
and sigmoid. Kernel functions exist for sequence data, graphs, text,
and images, as well as for vectors. The most used kernel function is
RBF, because it has a localized and finite response along the entire x-axis.
A kernel function returns the inner product between two points in a
suitable feature space, thus defining a notion of similarity with little
computational cost even in very high-dimensional spaces.
 As seen in the above figure, the kernel adds levels based on the distance
of the nonlinear data points from the origin. Points at the origin are on the
lowest level; as we move away from the origin, we are climbing the hill
(moving from the centre of the plane towards the margins), so the level of
the points becomes higher. Now we can easily separate the two classes.
These transformations are called kernels. Popular kernels are: Linear
Kernel, Polynomial Kernel, Gaussian Kernel, Radial Basis Function (RBF)
Kernel, Laplace RBF Kernel, Sigmoid Kernel, etc.
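A short sketch of the kernel trick in practice, assuming scikit-learn and its make_circles toy dataset: a linear kernel fails on concentric circles, while the RBF kernel separates them by implicitly lifting the points into a higher-dimensional space.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear.score(X, y))  # poor: no separating line exists
print("RBF kernel accuracy:", rbf.score(X, y))        # near 1.0 after the implicit lift
```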
 Naive Bayes classifiers are a collection of classification algorithms based
on Bayes' Theorem. It is not a single algorithm but a family of algorithms
that all share a common principle, i.e. every pair of features being
classified is independent of each other.
 A Naive Bayes classifier is a probabilistic machine learning model that is
used for classification tasks. The crux of the classifier is Bayes'
theorem, which is as follows:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
 P(A|B) is the Posterior probability: the probability of hypothesis A given the
observed event B.
 P(B|A) is the Likelihood: the probability of the evidence B given that
hypothesis A is true.
 P(A) is the Prior probability: the probability of the hypothesis before
observing the evidence.
 P(B) is the Marginal probability: the probability of the evidence.
Types of Naïve Bayes Model:
There are three types of Naive Bayes model, which are given below:
 Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means that if predictors take continuous values instead of
discrete ones, the model assumes these values are sampled from a Gaussian
distribution.
 Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification
problems, i.e., deciding which category a particular document belongs to, such
as Sports, Politics, Education, etc.
 Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier,
but the predictor variables are independent Boolean variables, such as whether
a particular word is present or not in a document. This model is also well known
for document classification tasks. A minimal sketch of the three variants follows.
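The sketch below assumes scikit-learn's implementations of the three variants; the tiny feature matrices are invented purely to show which kind of input each model expects.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # two classes, two tiny examples each

# Gaussian: continuous features assumed to be normally distributed.
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]])
print(GaussianNB().fit(X_cont, y).predict([[6.0, 3.0]]))

# Multinomial: count features, e.g. word counts in documents.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 2], [0, 3, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[1, 0, 0]]))

# Bernoulli: binary features, e.g. whether a word is present in a document.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 1]]))
```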
Working of Naïve Bayes' Classifier:
 Suppose we have a dataset of weather conditions and a corresponding
target variable "Play". Using this dataset, we need to decide whether we
should play or not on a particular day according to the weather conditions.
To solve this problem, we need to follow the below steps:
 Convert the given dataset into frequency tables.
 Generate a likelihood table by finding the probabilities of the given features.
 Now, use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:
Outlook Play
Rainy Yes
Sunny Yes
Overcast Yes
Overcast Yes
Sunny No
Rainy Yes
Sunny Yes
Overcast Yes
Rainy No
Sunny No
Sunny Yes
Rainy No
Overcast Yes
Overcast Yes
 Frequency table for the Weather Conditions:
Weather Yes No
Overcast 5 0
Rainy 2 2
Sunny 3 2
Total 10 4
 Likelihood table for the weather conditions:
Weather No Yes P(Weather)
Overcast 0 5 5/14 = 0.35
Rainy 2 2 4/14 = 0.29
Sunny 2 3 5/14 = 0.35
All 4/14 = 0.29 10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41
As we can see from the above calculation,
P(Yes|Sunny) > P(No|Sunny)
 Hence, on a sunny day, the player can play the game.
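The calculation can be checked with a few lines of Python; with exact fractions the posteriors come out to 0.60 and 0.40 (the 0.41 above comes from rounding the intermediate values to two decimals):

```python
# The same counts from the frequency table, plugged into Bayes' theorem.
p_sunny_given_yes = 3 / 10  # 3 of the 10 "Yes" days are Sunny
p_yes = 10 / 14             # 10 "Yes" days out of 14
p_sunny = 5 / 14            # 5 Sunny days out of 14
p_sunny_given_no = 2 / 4    # 2 of the 4 "No" days are Sunny
p_no = 4 / 14               # 4 "No" days out of 14

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny

print(round(p_yes_given_sunny, 2))  # 0.6
print(round(p_no_given_sunny, 2))   # 0.4
```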
Advantages of Naïve Bayes Classifier:
 Naïve Bayes is one of the fastest and easiest ML algorithms for predicting
the class of a dataset.
 It can be used for Binary as well as Multi-class Classification.
 It performs well in Multi-class predictions compared to many other
algorithms.
 It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
 Naive Bayes assumes that all features are independent or unrelated, so it
cannot learn the relationships between features.
Applications of Naïve Bayes Classifier:
 It is used for Credit Scoring.
 It is used in medical data classification.
 It can be used for real-time predictions because the Naïve Bayes Classifier
is an eager learner.
 It is used in text classification, such as spam filtering and sentiment
analysis.
 K-Nearest Neighbour (K-NN) is one of the simplest Supervised Machine
Learning algorithms. It compares the features (attributes) of a new data
point with those of the available objects and puts the new point into the
category of the K most similar objects; hence the name K-NN.
 This algorithm can be used for Regression as well as for Classification, but
it is mostly used for classification problems.
 It is a non-parametric algorithm, meaning it does not make any assumptions
about the underlying data. It is also called a lazy learner algorithm because it
does not learn from the training set immediately; instead it stores the dataset,
and at the time of classification it performs an action on the dataset.
 In other words, at the training phase the K-NN algorithm just stores the
dataset, and when it gets new data, it classifies that data into the category
that is most similar to the new data.
 For example, suppose we have a new animal that looks similar to both a cat
and a dog, and we want to know whether it is a cat or a dog. For this
identification, we can use the KNN algorithm, as it works on a similarity
principle. Our KNN model will simply find the features of the new data point
that are most similar to the cats and dogs in the available dataset, and based
on the most similar features it will put it in either the cat or the dog category.
The intuition of KNN can be made clearer through the following
implementation steps:
 Step 1: Identify the problem type as either classification or regression.
 Step 2: Fix a value for K, the number of neighbours to compare against,
which can be any number greater than zero.
 Step 3: Find the K objects in the existing dataset whose feature values are
closest to those of the unknown/uncategorized object, based on a distance
measure (Euclidean distance, Manhattan distance, etc.).
 Step 4: Find the solution in either of the following ways:
1. If classification, assign the uncategorized object to the class to which the
maximum number of neighbours belongs,
or
2. If regression, find the average value of all the closest neighbours and assign
it as the value for the unknown object.
 For step 3, the most used distance measure is the Euclidean distance, which
is given as follows:
 By the Euclidean distance formula, the distance between two points
P1(x1, y1) and P2(x2, y2) is:
d(P1, P2) = √((x2 − x1)² + (y2 − y1)²)
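A one-function sketch of this distance in Python, generalized to any number of coordinates:

```python
import math

def euclidean_distance(p1, p2):
    """Distance between points, e.g. P1(x1, y1) and P2(x2, y2):
    sqrt((x2 - x1)**2 + (y2 - y1)**2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

print(euclidean_distance((1, 2), (4, 6)))  # 5.0 (a 3-4-5 right triangle)
```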
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have
a new data point x1; in which of these categories will this data point lie? To
solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular data point.
Consider the below diagram:
How does K-NN work?
The working of K-NN can be explained on the basis of the below algorithm:
 Step-1: Select the number K of neighbours.
 Step-2: Calculate the Euclidean distance from the new data point to the
existing data points.
 Step-3: Take the K nearest neighbours as per the calculated Euclidean
distance.
 Step-4: Among these K neighbours, count the number of data points in
each category.
 Step-5: Assign the new data point to the category for which the number of
neighbours is maximum.
 Step-6: Our model is ready.
 Suppose we have a new data point and we need to put it in the required
category. Consider the below image:
First, we choose the number of neighbours: we will choose K = 5. Next, we
calculate the Euclidean distance between the new point and the existing
data points, using the formula given above.
By calculating the Euclidean distance, we get the nearest neighbours: three
nearest neighbours in category A and two nearest neighbours in category B.
Consider the below image:
As we can see, 3 of the 5 nearest neighbours are from category A, hence
this new data point must belong to category A.
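The steps above fit in a short from-scratch sketch; the toy points and labels are invented for illustration:

```python
import math
from collections import Counter

def knn_classify(new_point, dataset, k=5):
    # Steps 2-3: distance to every stored point, keep the k nearest.
    neighbours = sorted(
        dataset,
        key=lambda item: math.dist(new_point, item[0]),
    )[:k]
    # Steps 4-5: count categories among the k neighbours, pick the majority.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

data = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
        ((6, 6), "B"), ((7, 7), "B"), ((6, 7), "B")]
print(knn_classify((2, 2), data))  # "A": 3 of the 5 nearest are class A
```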
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the
KNN algorithm:
 There is no particular way to determine the best value for K, so we need to
try several values to find the best among them. The most preferred value for
K is 5.
 A very low value for K, such as K = 1 or K = 2, can be noisy and expose the
model to the effects of outliers.
 Large values for K are good, but too large a value may cause the model to
include points from other categories and blur the class boundaries.
Advantages of KNN Algorithm:
 It is simple to implement.
 It is robust to noisy training data.
 It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
 We always need to determine the value of K, which may be complex at
times.
 The computation cost is high because of calculating the distances between
the new data point and all the training samples.
 Decision Tree is a Supervised learning technique that can be used for both
Classification and Regression problems, but it is mostly preferred for solving
classification problems.
 It is a tree-structured classifier, where internal nodes represent the features
of a dataset, branches represent the decision rules, and each leaf node
represents the outcome.
 In a decision tree, there are two kinds of nodes: the Decision Node and the
Leaf Node. Decision nodes are used to make decisions and have multiple
branches, whereas leaf nodes are the outputs of those decisions and do not
contain any further branches.
 The decisions or tests are performed on the basis of the features of the given
dataset.
 It is a graphical representation for getting all the possible solutions to
a problem/decision based on given conditions.
 It is called a decision tree because, similar to a tree, it starts with the root
node, which expands into further branches and constructs a tree-like
structure.
 In order to build a tree, we use the CART algorithm, which stands for the
Classification and Regression Tree algorithm.
 A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
 The below diagram explains the general structure of a decision tree:
Why use Decision Trees?
Below are the two main reasons for using a decision tree:
1. Decision trees usually mimic human thinking while making a decision, so
they are easy to understand.
2. The logic behind a decision tree can be easily understood because it
shows a tree-like structure.
Decision Tree Terminologies
 Root Node: The root node is where the decision tree starts. It represents
the entire dataset, which further gets divided into two or more homogeneous
sets.
 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split
further after a leaf node.
 Splitting: Splitting is the process of dividing a decision node (or the root
node) into sub-nodes according to the given conditions.
 Branch/Subtree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing unwanted branches from the
tree.
 Parent/Child node: The root node of the tree is called the parent node, and
the other nodes are called the child nodes.
 How does the Decision Tree algorithm work?
In a decision tree, to predict the class of a given record, the algorithm starts
from the root node of the tree. It compares the value of the root attribute
with the record's attribute value and, based on the comparison, follows the
corresponding branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the
other sub-nodes and moves further. It continues the process until it reaches
a leaf node of the tree. The complete process can be better understood
using the below algorithm:
 Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
 Step-3: Divide S into subsets that contain the possible values of the best
attribute.
 Step-4: Generate the decision tree node which contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where you cannot further classify the nodes; call the final node a leaf node.
A sketch of this procedure in code follows.
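The snippet below builds a small tree with scikit-learn (a library assumption; the slides describe the algorithm, not an implementation) on the classic iris dataset and prints its decision and leaf nodes:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" selects splits by information gain;
# criterion="gini" (the default) uses the Gini index instead.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                              random_state=0).fit(X, y)

# Each internal node is a decision node (a test on one feature);
# each leaf carries the predicted class.
print(export_text(tree))
```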
Attribute Selection Measures:
 While implementing a decision tree, the main issue that arises is how to
select the best attribute for the root node and for the sub-nodes. To solve
such problems there is a technique called an Attribute Selection Measure,
or ASM. With this measure, we can easily select the best attribute for the
nodes of the tree. There are two popular ASM techniques:
1. Information Gain:
 Information gain is the measurement of the change in entropy after the
segmentation of a dataset based on an attribute.
 It calculates how much information a feature provides about a class.
 According to the value of information gain, we split the node and build the
decision tree.
 A decision tree algorithm always tries to maximize the value of information
gain, and the node/attribute having the highest information gain is split first.
It can be calculated using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It
specifies the randomness in the data. Entropy can be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where,
S = the set of samples
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
 The Gini index is a measure of impurity or purity used while creating a
decision tree in the CART (Classification and Regression Tree) algorithm.
 An attribute with a low Gini index should be preferred over one with a high
Gini index.
 The CART algorithm creates only binary splits, and it uses the Gini index to
create them.
 The Gini index can be calculated using the below formula:
Gini Index = 1 − ∑j Pj²
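Both measures are easy to compute directly; the sketch below uses an illustrative 9-yes/5-no parent node and a hypothetical binary split:

```python
import math

def entropy(p_yes, p_no):
    """Entropy(S) = -P(yes)*log2(P(yes)) - P(no)*log2(P(no))."""
    return -sum(p * math.log2(p) for p in (p_yes, p_no) if p > 0)

def gini(p_yes, p_no):
    """Gini Index = 1 - sum_j Pj^2."""
    return 1 - (p_yes ** 2 + p_no ** 2)

# Parent node: 9 yes / 5 no, split into (6 yes, 2 no) and (3 yes, 3 no).
parent = entropy(9/14, 5/14)
left, right = entropy(6/8, 2/8), entropy(3/6, 3/6)
info_gain = parent - (8/14 * left + 6/14 * right)

print(round(parent, 3))            # 0.940
print(round(info_gain, 3))         # 0.048
print(round(gini(9/14, 5/14), 3))  # 0.459
```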
Pruning: Getting an Optimal Decision Tree
 Pruning is the process of deleting unnecessary nodes from a tree in order
to get the optimal decision tree.
 A too-large tree increases the risk of overfitting, while a small tree may not
capture all the important features of the dataset. A technique that decreases
the size of the learning tree without reducing accuracy is therefore known as
pruning. There are mainly two types of tree pruning technology used:
1. Cost Complexity Pruning and 2. Reduced Error Pruning.
Advantages of the Decision Tree
 It is simple to understand, as it follows the same process a human follows
while making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps in thinking about all the possible outcomes for a problem.
 There is less requirement for data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
 A decision tree may contain many layers, which makes it complex.
 It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
 With more class labels, the computational complexity of the decision tree
may increase.
 Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble
learning, which is the process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.
 As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that
dataset." Instead of relying on one decision tree, the random forest takes
the prediction from each tree and, based on the majority vote of the
predictions, predicts the final output.
 A greater number of trees in the forest leads to higher accuracy and
prevents the problem of overfitting.
 The below diagram explains the working of the Random Forest algorithm:
 Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct
output while others may not. But together, all the trees predict the correct
output. Therefore, below are two assumptions for a better random forest
classifier:
1. There should be some actual values in the feature variables of the dataset,
so that the classifier can predict accurate results rather than guessed
results.
2. The predictions from each tree must have very low correlations.
Why use Random Forest?
Below are some points that explain why we should use the Random Forest
algorithm:
 It takes less training time as compared to other algorithms.
 It predicts output with high accuracy; even for a large dataset it runs
efficiently.
 It can also maintain accuracy when a large proportion of the data is missing.
How does the Random Forest algorithm work?
 Random Forest works in two phases: the first is to create the random forest
by combining N decision trees, and the second is to make predictions using
the trees created in the first phase.
The working process can be explained in the below steps and diagram, with
a sketch in code after the list:
 Step-1: Select K random data points from the training set.
 Step-2: Build the decision trees associated with the selected data points
(subsets).
 Step-3: Choose the number N of decision trees that you want to build.
 Step-4: Repeat Steps 1 and 2.
 Step-5: For a new data point, find the predictions of each decision tree, and
assign the new data point to the category that wins the majority vote.
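A compact sketch of these steps, assuming scikit-learn's RandomForestClassifier (which performs the random subset sampling and majority voting internally) and a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number N of decision trees in the forest;
# each tree is trained on a bootstrap sample of the training set.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))
print("majority vote for one new point:", forest.predict(X_test[:1]))
```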
Applications of Random Forest
There are four main sectors where Random Forest is mostly used:
 Banking: The banking sector mostly uses this algorithm for the
identification of loan risk.
 Medicine: With the help of this algorithm, disease trends and risks of a
disease can be identified.
 Land Use: We can identify areas of similar land use with this algorithm.
 Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
 Random Forest is capable of performing both Classification and Regression
tasks.
 It is capable of handling large datasets with high dimensionality.
 It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest
 Although random forest can be used for both classification and regression
tasks, it is less suitable for regression tasks.
Thanks !!!

More Related Content

Similar to Introduction to Classification . pptx

Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set neweyavagal
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newuopassignment
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newrhettwhitee
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newkingrani623
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newNoahliamwilliam
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newFaarooqkhaann
 
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Ravi Kumar
 
Machine learning Method and techniques
Machine learning Method and techniquesMachine learning Method and techniques
Machine learning Method and techniquesMarkMojumdar
 
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...IOSRjournaljce
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classificationLokarchanaD
 
Supervised learning and unsupervised learning
Supervised learning and unsupervised learningSupervised learning and unsupervised learning
Supervised learning and unsupervised learningArunakumariAkula1
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
Advanced business mathematics and statistics for entrepreneurs
Advanced business mathematics and statistics for entrepreneursAdvanced business mathematics and statistics for entrepreneurs
Advanced business mathematics and statistics for entrepreneursDr. Trilok Kumar Jain
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.ShwetaPatil174
 
3_learning.ppt
3_learning.ppt3_learning.ppt
3_learning.pptbutest
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffRaman Kannan
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptShayanChowdary
 

Similar to Introduction to Classification . pptx (20)

Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Ash bus 308 week 2 problem set new
Ash bus 308 week 2 problem set newAsh bus 308 week 2 problem set new
Ash bus 308 week 2 problem set new
 
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Rd1 r17a19 datawarehousing and mining_cap617t_cap617
Rd1 r17a19 datawarehousing and mining_cap617t_cap617
 
Machine learning Method and techniques
Machine learning Method and techniquesMachine learning Method and techniques
Machine learning Method and techniques
 
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...An Experimental Study of Diabetes Disease Prediction System Using Classificat...
An Experimental Study of Diabetes Disease Prediction System Using Classificat...
 
Unit-4 classification
Unit-4 classificationUnit-4 classification
Unit-4 classification
 
Naive bayes classifier
Naive bayes classifierNaive bayes classifier
Naive bayes classifier
 
Supervised learning and unsupervised learning
Supervised learning and unsupervised learningSupervised learning and unsupervised learning
Supervised learning and unsupervised learning
 
Machine Learning_PPT.pptx
Machine Learning_PPT.pptxMachine Learning_PPT.pptx
Machine Learning_PPT.pptx
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
Advanced business mathematics and statistics for entrepreneurs
Advanced business mathematics and statistics for entrepreneursAdvanced business mathematics and statistics for entrepreneurs
Advanced business mathematics and statistics for entrepreneurs
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Data mining classifiers.
Data mining classifiers.Data mining classifiers.
Data mining classifiers.
 
3_learning.ppt
3_learning.ppt3_learning.ppt
3_learning.ppt
 
M08 BiasVarianceTradeoff
M08 BiasVarianceTradeoffM08 BiasVarianceTradeoff
M08 BiasVarianceTradeoff
 
chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
 

More from Harsha Patel

Introduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxIntroduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxHarsha Patel
 
Introduction to Clustering . pptx
Introduction    to     Clustering . pptxIntroduction    to     Clustering . pptx
Introduction to Clustering . pptxHarsha Patel
 
Introduction to Machine Learning.pptx
Introduction  to  Machine  Learning.pptxIntroduction  to  Machine  Learning.pptx
Introduction to Machine Learning.pptxHarsha Patel
 
Unit-V-Introduction to Data Mining.pptx
Unit-V-Introduction to  Data Mining.pptxUnit-V-Introduction to  Data Mining.pptx
Unit-V-Introduction to Data Mining.pptxHarsha Patel
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxHarsha Patel
 
Unit-III-AI Search Techniques and solution's
Unit-III-AI Search Techniques and solution'sUnit-III-AI Search Techniques and solution's
Unit-III-AI Search Techniques and solution'sHarsha Patel
 
Unit-II-Introduction of Artifiial Intelligence.pptx
Unit-II-Introduction of Artifiial Intelligence.pptxUnit-II-Introduction of Artifiial Intelligence.pptx
Unit-II-Introduction of Artifiial Intelligence.pptxHarsha Patel
 
Unit-I-Introduction to Recent Trends.pptx
Unit-I-Introduction to Recent Trends.pptxUnit-I-Introduction to Recent Trends.pptx
Unit-I-Introduction to Recent Trends.pptxHarsha Patel
 
Using Unix Commands.pptx
Using Unix Commands.pptxUsing Unix Commands.pptx
Using Unix Commands.pptxHarsha Patel
 
Using Vi Editor.pptx
Using Vi Editor.pptxUsing Vi Editor.pptx
Using Vi Editor.pptxHarsha Patel
 
Shell Scripting and Programming.pptx
Shell Scripting and Programming.pptxShell Scripting and Programming.pptx
Shell Scripting and Programming.pptxHarsha Patel
 
Managing Processes in Unix.pptx
Managing Processes in Unix.pptxManaging Processes in Unix.pptx
Managing Processes in Unix.pptxHarsha Patel
 
Introduction to Unix Concets.pptx
Introduction to Unix Concets.pptxIntroduction to Unix Concets.pptx
Introduction to Unix Concets.pptxHarsha Patel
 
Handling Files Under Unix.pptx
Handling Files Under Unix.pptxHandling Files Under Unix.pptx
Handling Files Under Unix.pptxHarsha Patel
 
Introduction to OS.pptx
Introduction to OS.pptxIntroduction to OS.pptx
Introduction to OS.pptxHarsha Patel
 
Using Unix Commands.pptx
Using Unix Commands.pptxUsing Unix Commands.pptx
Using Unix Commands.pptxHarsha Patel
 
Introduction to Unix Concets.pptx
Introduction to Unix Concets.pptxIntroduction to Unix Concets.pptx
Introduction to Unix Concets.pptxHarsha Patel
 
Shell Scripting and Programming.pptx
Shell Scripting and Programming.pptxShell Scripting and Programming.pptx
Shell Scripting and Programming.pptxHarsha Patel
 
Using Vi Editor.pptx
Using Vi Editor.pptxUsing Vi Editor.pptx
Using Vi Editor.pptxHarsha Patel
 
Managing Processes in Unix.pptx
Managing Processes in Unix.pptxManaging Processes in Unix.pptx
Managing Processes in Unix.pptxHarsha Patel
 

More from Harsha Patel (20)

Introduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxIntroduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptx
 
Introduction to Clustering . pptx
Introduction    to     Clustering . pptxIntroduction    to     Clustering . pptx
Introduction to Clustering . pptx
 
Introduction to Machine Learning.pptx
Introduction  to  Machine  Learning.pptxIntroduction  to  Machine  Learning.pptx
Introduction to Machine Learning.pptx
 
Unit-V-Introduction to Data Mining.pptx
Unit-V-Introduction to  Data Mining.pptxUnit-V-Introduction to  Data Mining.pptx
Unit-V-Introduction to Data Mining.pptx
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptx
 
Unit-III-AI Search Techniques and solution's
Unit-III-AI Search Techniques and solution'sUnit-III-AI Search Techniques and solution's
Unit-III-AI Search Techniques and solution's
 
Unit-II-Introduction of Artifiial Intelligence.pptx
Unit-II-Introduction of Artifiial Intelligence.pptxUnit-II-Introduction of Artifiial Intelligence.pptx
Unit-II-Introduction of Artifiial Intelligence.pptx
 
Unit-I-Introduction to Recent Trends.pptx
Unit-I-Introduction to Recent Trends.pptxUnit-I-Introduction to Recent Trends.pptx
Unit-I-Introduction to Recent Trends.pptx
 
Using Unix Commands.pptx
Using Unix Commands.pptxUsing Unix Commands.pptx
Using Unix Commands.pptx
 
Using Vi Editor.pptx
Using Vi Editor.pptxUsing Vi Editor.pptx
Using Vi Editor.pptx
 
Shell Scripting and Programming.pptx
Shell Scripting and Programming.pptxShell Scripting and Programming.pptx
Shell Scripting and Programming.pptx
 
Managing Processes in Unix.pptx
Managing Processes in Unix.pptxManaging Processes in Unix.pptx
Managing Processes in Unix.pptx
 
Introduction to Unix Concets.pptx
Introduction to Unix Concets.pptxIntroduction to Unix Concets.pptx
Introduction to Unix Concets.pptx
 
Handling Files Under Unix.pptx
Handling Files Under Unix.pptxHandling Files Under Unix.pptx
Handling Files Under Unix.pptx
 
Introduction to OS.pptx
Introduction to OS.pptxIntroduction to OS.pptx
Introduction to OS.pptx
 
Using Unix Commands.pptx
Using Unix Commands.pptxUsing Unix Commands.pptx
Using Unix Commands.pptx
 
Introduction to Unix Concets.pptx
Introduction to Unix Concets.pptxIntroduction to Unix Concets.pptx
Introduction to Unix Concets.pptx
 
Shell Scripting and Programming.pptx
Shell Scripting and Programming.pptxShell Scripting and Programming.pptx
Shell Scripting and Programming.pptx
 
Using Vi Editor.pptx
Using Vi Editor.pptxUsing Vi Editor.pptx
Using Vi Editor.pptx
 
Managing Processes in Unix.pptx
Managing Processes in Unix.pptxManaging Processes in Unix.pptx
Managing Processes in Unix.pptx
 

Recently uploaded

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

Introduction to Classification . pptx

  • 1. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 2.  The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then classifies new observation into a number of classes or groups. Such as, Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called as targets/labels or categories.  Unlike regression, the output variable of Classification is a category, not a value, such as "Green or Blue", "fruit or animal", etc. Since the Classification algorithm is a Supervised learning technique, hence it takes labeled input data, which means it contains input with the corresponding output.  In classification algorithm, a discrete output function(y) is mapped to input variable(x). y=f(x), where y = categorical output Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 3.  The best example of an ML classification algorithm is Email Spam Detector.  The main goal of the Classification algorithm is to identify the category of a given dataset, and these algorithms are mainly used to predict the output for the categorical data.  Classification algorithms can be better understood using the below diagram. In the below diagram, there are two classes, class A and Class B. These classes have features that are similar to each other and dissimilar to other classes. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 4. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 5.  The algorithm which implements the classification on a dataset is known as a classifier. There are two types of Classifications:  Binary Classifier: If the classification problem has only two possible outcomes, then it is called as Binary Classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.  Multi-class Classifier: If a classification problem has more than two outcomes, then it is called as Multi-class Classifier. Example: Classifications of types of crops, Classification of types of music. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 6. Learners in Classification Problems: In the classification problems, there are two types of learners:  Lazy Learners: Lazy Learner firstly stores the training dataset and wait until it receives the test dataset. In Lazy learner case, classification is done on the basis of the most related data stored in the training dataset. It takes less time in training but more time for predictions. Example: K-NN algorithm, Case-based reasoning.  Eager Learners: Eager Learners develop a classification model based on a training dataset before receiving a test dataset. Opposite to Lazy learners, Eager learners take less time in training and more time in prediction. Example: Decision Trees, Naïve Bayes, ANN. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 7. Types of ML Classification Algorithms: Classification Algorithms can be further divided into the Mainly two category:  Linear Models • Logistic Regression • Support Vector Machines  Non-linear Models • K-Nearest Neighbors • Kernel SVM • Naïve Bayes • Decision Tree Classification • Random Forest Classification
  • 8. What is Logistic Regression :  Logistic regression is a supervised ML classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression which outputs continuous number values like what will be salary of new employee or price of house etc, logistic regression transforms its output into probability value which can then be mapped to two or more discrete classes.  For example,  i) From given students training dataset, it learns to decide whether new or another student data yield result pass or fail.  ii) From the symptom of patient, whether he/she is infected by corona virus or not.  iii) whether, a particular animal is cat or dog or rat etc. Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune
  • 9. Types of Logistic Regression:
 On the basis of the categories, Logistic Regression can be classified into three types:
 Binomial Logistic regression: the response variable has only two values, such as 0 and 1, pass and fail, or true and false.
 Multinomial Logistic regression: there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "rat".
 Ordinal Logistic regression: there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
  • 10.
 Let's first try to understand why logistic, and why not linear?
 Let 'x' be some feature and 'y' be the output, which can be either 0 or 1 in binary classification.
 The probability that the output is 1 given its input x can be represented as P(y=1|x).
 If we predict this probability via linear regression, we can write it as:
P(X) = b0 + b1*X, where P(X) = P(y=1|x)
 A linear regression model can generate the predicted probability as any number ranging from −∞ to +∞, whereas the probability of an outcome can only lie in 0 < P(x) < 1. Also, outliers have a considerable effect on linear regression.
  • 11.
 To avoid this problem, the log-odds or logit function is used.
 Logistic regression can therefore be expressed in terms of the logit function as:
log(P(x) / (1 − P(x))) = b0 + b1*X
 where the left-hand side is called the logit or log-odds function, and P(x)/(1 − P(x)) is called the odds.
 The odds signify the ratio of the probability of success, P(x), to the probability of failure, 1 − P(x). Therefore, in logistic regression, a linear combination of the inputs is mapped to the log(odds) of the output being equal to 1.
 If we take the inverse of the above function, we get:
P(x) = e^(b0 + b1*X) / (1 + e^(b0 + b1*X))
  • 12.
 In a more simplified form, the above equation becomes:
P(x) = 1 / (1 + e^−(b0 + b1*X))
 This is known as the Sigmoid function, since it gives an S-shaped curve. It always gives a probability value in the range 0 < P(x) < 1.
 In order to map this probability to a discrete class (true/false, cat/dog), we select a threshold value or tipping point, above which we classify values into class 1 and below which we classify values into class 2:
P(x) ≥ 0.5 → class = 1
P(x) < 0.5 → class = 0
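As an illustrative sketch only: a minimal scikit-learn example, with a made-up hours-studied dataset standing in for the student pass/fail case, showing the probability output and the 0.5 threshold. The numbers are hypothetical.

    # Minimal logistic regression sketch; the hours-studied data is invented
    # only to mirror the pass/fail student example in the text.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])  # hours studied
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # 0 = fail, 1 = pass

    model = LogisticRegression()
    model.fit(X, y)

    # predict_proba gives P(y=0|x) and P(y=1|x); the sigmoid keeps both in (0, 1)
    p = model.predict_proba([[4.5]])[0, 1]
    print(p)                     # probability of passing
    print(1 if p >= 0.5 else 0)  # apply the 0.5 threshold from the slide

    # The same value via the sigmoid formula P(x) = 1 / (1 + e^-(b0 + b1*x))
    b0, b1 = model.intercept_[0], model.coef_[0, 0]
    print(1 / (1 + np.exp(-(b0 + b1 * 4.5))))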
  • 13. Support Vector Machine (SVM):
 SVM is a supervised learning algorithm that can be used for classification as well as regression problems, but it is mostly used for classification in machine learning.
 The goal of the SVM algorithm is to create the best hyperplane or decision boundary that can separate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future.
 SVM chooses the extreme points/vectors, called support vectors, that help in creating the hyperplane. Consider the following diagram, in which two different categories are classified using a decision boundary or hyperplane:
  • 14. [Figure: two categories separated by an optimal hyperplane; the nearest points on either side are the support vectors.]
  • 15. Hyperplane and Support Vectors in the SVM algorithm:
 Hyperplane: There can be multiple lines/decision boundaries to separate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features (as shown in the image), the hyperplane is a straight line, and if there are 3 features, the hyperplane is a 2-dimensional plane. We always create the hyperplane that has the maximum margin, i.e. the maximum distance between the hyperplane and the nearest data points of each class.
 Support Vectors: The data points or vectors that are closest to the hyperplane and affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
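A small sketch of these ideas with scikit-learn's SVC; the six 2-D points are invented for illustration. After fitting, the support vectors and the hyperplane coefficients can be read directly off the model:

    # Linear SVM sketch on made-up, linearly separable 2-D points.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
    y = np.array([0, 0, 0, 1, 1, 1])  # two linearly separable classes

    clf = SVC(kernel='linear', C=1.0)
    clf.fit(X, y)

    print(clf.support_vectors_)       # the points closest to the hyperplane
    print(clf.coef_, clf.intercept_)  # w and b of the hyperplane w·x + b = 0
    print(clf.predict([[3, 2]]))      # classify a new point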
  • 16.
 Suppose there is a strange cat that has some features of a dog, and we want a model that can accurately identify whether it is a cat or a dog; in such cases we can use SVM. We first train our model with many features of cats and dogs so that it can learn from them, and then test it with this strange animal. The support vectors create a decision boundary between the two classes (cat and dog) using the extreme cases of each class; on the basis of these extreme cases, the model will classify the animal as a cat.
  • 17.
 The simple SVM algorithm can be used to find the decision boundary for linearly separable data. However, in the case of non-linearly separable data, such as in the following figure, a straight line or flat hyperplane cannot be used as a decision boundary.
 In the case of non-linearly separable data, the simple SVM algorithm cannot be used. Rather, a modified version of SVM, called Kernel SVM, is used.
  • 18.
 A kernel is a mathematical function that transforms an input data space into the required form. The kernel takes a low-dimensional input space and transforms it into a higher-dimensional space. In other words, it converts a non-separable problem into a separable one by adding a nonlinear boundary shape, or by adding more dimensions, as shown in the following figure. It is most useful in non-linear separation problems. The kernel trick, or kernel SVM, helps us build a more accurate classifier.
  • 19.
 Kernel SVM algorithms use different types of kernel functions: for example, linear, nonlinear, polynomial, radial basis function (RBF), and sigmoid. Kernel functions exist for sequence data, graphs, text, and images, as well as vectors. The most used kernel function is the RBF, because it has a localized and finite response along the entire x-axis. A kernel function returns the inner product between two points in a suitable feature space, thus defining a notion of similarity with little computational cost, even in very high-dimensional spaces.
 As seen in the above figure, the kernel adds a level (a new dimension) based on the distance of the nonlinear data points from the origin: points at the origin sit on the lowest level, and as we move away from the origin (from the centre of the plane towards the margins) the level of the points becomes higher. Now we can easily separate the two classes. These transformations are called kernels. Popular kernels are: Linear Kernel, Polynomial Kernel, Gaussian Kernel, Radial Basis Function (RBF) Kernel, Laplace RBF Kernel, Sigmoid Kernel, ANOVA RBF Kernel, etc.
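A brief sketch of the kernel trick with scikit-learn, using the toy make_circles dataset (an assumption for illustration, not from the slides): a straight line cannot separate concentric circles, but an RBF kernel can.

    # Linear vs RBF kernel on non-linearly separable (concentric-circle) data.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear_clf = SVC(kernel='linear').fit(X, y)
    rbf_clf = SVC(kernel='rbf', gamma='scale').fit(X, y)

    print(linear_clf.score(X, y))  # poor: no straight line separates the circles
    print(rbf_clf.score(X, y))     # near 1.0: the kernel lifts the data so a hyperplane works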
  • 20. Naïve Bayes Classifier:
 Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms, all of which share a common principle: every pair of features being classified is independent of each other.
 A Naive Bayes classifier is a probabilistic machine learning model used for classification tasks. The heart of the classifier is Bayes' theorem, which is as follows:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
 P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
 P(B|A) is the Likelihood: the probability of the evidence given that hypothesis A is true.
 P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
 P(B) is the Marginal probability: the probability of the evidence.
  • 21. Types of Naïve Bayes Model:
There are three types of Naive Bayes model, which are given below:
 Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
 Multinomial: The Multinomial Naïve Bayes classifier is used when the data has a multinomial distribution. It is primarily used for document classification problems, i.e. deciding which category a particular document belongs to, such as Sports, Politics, Education, etc.
 Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also well known for document classification tasks.
  • 22. Working of Naïve Bayes' Classifier:
 Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:
 Convert the given dataset into frequency tables.
 Generate a likelihood table by finding the probabilities of the given features.
 Now, use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the dataset below:
  • 23.
Outlook    Play
Rainy      Yes
Sunny      Yes
Overcast   Yes
Overcast   Yes
Sunny      No
Rainy      Yes
Sunny      Yes
Overcast   Yes
Rainy      No
Sunny      No
Sunny      Yes
Rainy      No
Overcast   Yes
Overcast   Yes
  • 24.
 Frequency table for the weather conditions:
Weather    Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total      10     4
  • 25.
 Likelihood table for the weather conditions:
Weather    No            Yes            P(Weather)
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
  • 26.
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
 Hence, on a sunny day, the player can play the game.
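The calculation can be checked with a few lines of plain Python built directly from the 14-row table above (no library assumed):

    # Reproducing P(Yes|Sunny) from the weather dataset.
    outlook = ['Rainy', 'Sunny', 'Overcast', 'Overcast', 'Sunny', 'Rainy', 'Sunny',
               'Overcast', 'Rainy', 'Sunny', 'Sunny', 'Rainy', 'Overcast', 'Overcast']
    play    = ['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes',
               'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes']

    n = len(play)
    p_yes = play.count('Yes') / n          # prior: 10/14
    p_sunny = outlook.count('Sunny') / n   # evidence: 5/14
    sunny_and_yes = sum(1 for o, p in zip(outlook, play) if o == 'Sunny' and p == 'Yes')
    p_sunny_given_yes = sunny_and_yes / play.count('Yes')  # likelihood: 3/10

    # Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
    print(p_sunny_given_yes * p_yes / p_sunny)  # 0.6, matching the slide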
  • 27.
Advantages of Naïve Bayes Classifier:
 Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
 It can be used for binary as well as multi-class classification.
 It performs well in multi-class predictions compared to the other algorithms.
 It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
 Naive Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features.
Applications of Naïve Bayes Classifier:
 It is used for credit scoring.
 It is used in medical data classification.
 It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner.
 It is used in text classification, such as spam filtering and sentiment analysis.
  • 28. K-Nearest Neighbour (K-NN):
 K-Nearest Neighbour is one of the simplest supervised machine learning algorithms. It compares the features (attributes) of a new data point with those of the K most similar data points in the available dataset and puts the new point into the category it is most similar to; hence the name K-NN.
 This algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
 It is a non-parametric algorithm, meaning it makes no assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on it.
 In other words, at the training phase the K-NN algorithm just stores the dataset, and when it gets new data it classifies that data into the category that is most similar to the new data.
  • 29.
 For example, suppose we have a new animal that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity principle. Our KNN model will simply find, in the available cat-and-dog dataset, the examples whose features are most similar to the new data, and based on these most similar examples it will put the animal in either the cat or the dog category. The intuition of KNN is made clearer by the following implementation steps:
 Step 1: Identify the problem type as either classification or regression.
 Step 2: Fix a value for K, the number of nearest neighbours to compare against, which can be any number greater than zero.
 Step 3: Find the K examples in the existing dataset whose feature values are closest to those of the unknown/uncategorized object, based on a distance measure (Euclidean distance, Manhattan distance, etc.).
 Step 4: Find the solution in either of the following steps:
  • 30.
1. If classification, assign the uncategorized object to the class to which the maximum number of neighbours belong.
or
2. If regression, find the average value of all the closest neighbours and assign it as the value for the unknown object.
 For step 3, the most used distance formula is the Euclidean distance, which is given as follows:
 By the Euclidean distance, the distance between two points P1(x1, y1) and P2(x2, y2) can be expressed as:
d(P1, P2) = √((x2 − x1)² + (y2 − y1)²)
  • 31. Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below:
  • 32. How does K-NN work?
The working of K-NN can be explained on the basis of the following algorithm:
 Step-1: Select the number K of neighbours.
 Step-2: Calculate the Euclidean distance from the new point to the existing data points.
 Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
 Step-4: Among these K neighbours, count the number of data points in each category.
 Step-5: Assign the new data point to the category for which the number of neighbours is maximum.
 Step-6: Our model is ready.
 Suppose we have a new data point and we need to put it in the required category. Consider the image below:
  • 33.
Firstly, we will choose the number of neighbours; we will choose K = 5. Next, we will calculate the Euclidean distance between the data points, i.e. the distance between two points as already studied in geometry, using the formula given earlier.
  • 34.
By calculating the Euclidean distance, we get the nearest neighbours: three nearest neighbours in category A and two nearest neighbours in category B. Consider the image below:
  • 35.
As we can see, the 3 nearest neighbours are from category A; hence this new data point must belong to category A.
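A short sketch of the same procedure with scikit-learn's KNeighborsClassifier and K = 5; the sample points and the query point are invented for illustration.

    # KNN sketch: 5 nearest neighbours vote on the class of a new point.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[1, 1], [1, 2], [2, 2], [2, 1], [6, 5], [7, 6], [6, 7], [7, 5]])
    y = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

    knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
    knn.fit(X, y)  # "training" just stores the data (lazy learner)

    new_point = [[3, 3]]
    print(knn.predict(new_point))    # majority class among the 5 nearest neighbours
    print(knn.kneighbors(new_point)) # their Euclidean distances and indices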
  • 36. How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the KNN algorithm:
 There is no particular way to determine the best value for K, so we need to try several values to find the best among them. The most preferred value for K is 5.
 A very low value for K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
 Large values for K smooth out noise, but too large a value may include points from other categories and blur the decision boundary.
  • 37. Advantages of KNN Algorithm:
 It is simple to implement.
 It is robust to noisy training data.
 It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
 The value of K always needs to be determined, which may sometimes be complex.
 The computation cost is high, because the distance between the new point and all the training samples must be calculated.
  • 38. Decision Tree:
 A Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
 It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
 In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
 The decisions or tests are performed on the basis of the features of the given dataset.
 It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
  • 39.
 It is called a decision tree because, similar to a tree, it starts at the root node and expands further branches, constructing a tree-like structure.
 In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
 A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
 The diagram below explains the general structure of a decision tree:
  • 40. [Figure: general structure of a decision tree — a root node splitting through decision nodes down to leaf nodes.]
  • 41. Why use Decision Trees?
Below are the two main reasons for using a decision tree:
1. Decision trees usually mimic human thinking while making a decision, so they are easy to understand.
2. The logic behind a decision tree can be easily understood because it shows a tree-like structure.
Decision Tree Terminologies
 Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
 Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
  • 42.
 Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
 Branch/Sub-tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing unwanted branches from the tree.
 Parent/Child node: The root node of the tree is called the parent node, and the other nodes are called child nodes.
 How does the Decision Tree algorithm work? In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node. For the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
  • 43.
 Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
 Step-3: Divide S into subsets that contain the possible values of the best attribute.
 Step-4: Generate the decision tree node that contains the best attribute.
 Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are the leaf nodes.
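A compact sketch of these steps using scikit-learn's CART implementation; the iris dataset is assumed here purely as a stand-in for the complete dataset S.

    # Growing and reading a CART decision tree with scikit-learn.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
    tree.fit(X_train, y_train)   # Steps 2-5 happen recursively inside fit()

    print(export_text(tree))     # root node, decision nodes, and leaf nodes as text
    print(tree.score(X_test, y_test))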
  • 44. Attribute Selection Measures:
 While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
1. Information Gain:
 Information gain is the measurement of the change in entropy after segmenting a dataset based on an attribute.
 It calculates how much information a feature provides us about a class.
 According to the value of information gain, we split the node and build the decision tree.
 A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute having the highest information gain is split first. It can be calculated using the formula below:
Information Gain = Entropy(S) − [(Weighted Avg) * Entropy(each feature)]
  • 45.
Entropy: Entropy is a metric for measuring the impurity of a given attribute. It specifies the randomness in the data. Entropy can be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where,
S = the set of samples
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
 The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
 An attribute with a low Gini index should be preferred over one with a high Gini index.
 The CART algorithm only creates binary splits, and it uses the Gini index to create them.
 The Gini index can be calculated using the formula below:
Gini Index = 1 − Σj (Pj)²
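To make the two measures concrete, here is a short Python check of both formulas, assuming the 10-Yes/4-No class split of the weather dataset above:

    # Entropy and Gini impurity for a 10-Yes / 4-No split.
    import math

    p_yes, p_no = 10 / 14, 4 / 14

    entropy = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
    gini = 1 - (p_yes**2 + p_no**2)

    print(entropy)  # ≈ 0.863 — impurity in bits
    print(gini)     # ≈ 0.408 — a pure node would give 0 for both measures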
  • 46. Pruning: Getting an Optimal Decision Tree
 Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision tree.
 A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. A technique that decreases the size of the learned tree without reducing accuracy is therefore known as pruning. There are mainly two types of tree pruning techniques used: 1. Cost Complexity Pruning and 2. Reduced Error Pruning.
Advantages of the Decision Tree
 It is simple to understand, as it follows the same process a human follows while making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes of a problem.
 There is less requirement for data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
 A decision tree may contain many layers, which makes it complex.
 It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
 For more class labels, the computational complexity of the decision tree may increase.
  • 47. Random Forest:
 Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning, which is the process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.
 As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of these predictions, it predicts the final output.
 A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
 The diagram below explains the working of the Random Forest algorithm:
  • 48. [Figure: the dataset is split into random subsets, one decision tree is built per subset, and the trees' predictions are combined by majority voting into the final class.]
  • 49.
 Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some decision trees predict the correct output while others do not. But together, all the trees predict the correct output. Therefore, below are two assumptions for a better random forest classifier:
1. The feature variables of the dataset should contain some actual signal, so that the classifier can predict accurate results rather than guessed results.
2. The predictions from the individual trees must have very low correlations with each other.
Why use Random Forest?
Below are some points that explain why we should use the Random Forest algorithm:
 It takes less training time compared to other algorithms.
 It predicts output with high accuracy, and it runs efficiently even for large datasets.
 It can also maintain accuracy when a large proportion of the data is missing.
  • 50. How does the Random Forest algorithm work?
 Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make a prediction with each tree created in the first phase. The working process can be explained in the steps and diagram below:
 Step-1: Select K random data points from the training set.
 Step-2: Build the decision tree associated with the selected data points (subset).
 Step-3: Choose the number N of decision trees that you want to build.
 Step-4: Repeat Steps 1 & 2 for each of the N trees.
 Step-5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority of votes.
  • 51. [Figure: N decision trees each vote on the new instance; the majority class is returned as the final prediction.]
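A minimal sketch of the two phases with scikit-learn's RandomForestClassifier; the iris dataset is again assumed as a stand-in training set.

    # Random forest sketch: n_estimators is the N trees; each is trained on a
    # random bootstrap subset, and predict() combines the trees' outputs
    # (scikit-learn averages their probability estimates, which acts like voting).
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)

    print(forest.score(X_test, y_test))
    print(forest.estimators_[0].predict(X_test[:5]))  # one individual tree's vote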
  • 52. Applications of Random Forest
There are mainly four sectors where random forest is mostly used:
 Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
 Medicine: With the help of this algorithm, disease trends and disease risks can be identified.
 Land Use: We can identify areas of similar land use with this algorithm.
 Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
 Random Forest is capable of performing both classification and regression tasks.
 It is capable of handling large datasets with high dimensionality.
 It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest
 Although random forest can be used for both classification and regression tasks, it is less suitable for regression tasks.
  • 53. Thanks !!! Mrs.Harsha Patil,Dr.D.Y.Patil ACS College,Pimpri,Pune