What's in it for you?
What is Machine Learning?
Applications of Random Forest
What is Classification?
Why Random Forest?
What is Random Forest?
Random Forest and Decision Tree
Use Case
How does Random Forest work?
Applications of Random Forest

Remote Sensing
Used in ETM devices to acquire images of the earth's surface. Accuracy is higher and training time is shorter.

Object Detection
Multiclass object detection is done using Random Forest algorithms, which provide better detection in complicated environments.

Kinect
Random Forest is used in the Kinect game console to track body movements and recreate them in the game.
Application of Random Forest: Kinect

1. A training set is used to teach the Random Forest classifier to identify body parts.
2. The user performs a step and Kinect registers the movement.
3. The classifier identifies the body parts while the user is dancing.
4. The game scores the user's avatar based on the accuracy of the movement.
What’s in it for you?
What is Machine Learning?
Applications of Random Forest
What is Classification?
Why Random Forest?
What is Random Forest?
Random Forest and Decision Tree
Comparing Random Forest and Regression
Use Case – Iris Flower Analysis
Types of Machine Learning

Machine Learning is broadly divided into:
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Types of Supervised Learning

Supervised Learning is further divided into Classification and Regression.
What is Classification?

Classification is a kind of supervised learning problem in which the outputs are categorical, such as 'Yes' or 'No', 'True' or 'False', '0' or '1'.
Solutions under Classification

Common algorithms for classification problems include:
KNN (K-Nearest Neighbors)
Naïve Bayes
Decision Tree
Random Forest
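For reference, a minimal sketch of how each of these classifiers can be instantiated in scikit-learn, the library used in the use case later in this deck. The iris dataset and the default parameters here are illustrative assumptions, not part of the original slides.

# Illustrative sketch: the four classification algorithms named above,
# as provided by scikit-learn. Parameters are library defaults, not tuned.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X, y)  # fit on the full dataset, just to show the common API
    print(name, "training accuracy:", clf.score(X, y))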
Why Random Forest?

No overfitting: the use of multiple trees reduces the risk of overfitting, and training time is less.

Estimates missing data: Random Forest can maintain accuracy even when a large proportion of the data is missing.

High accuracy: for large data it produces highly accurate predictions and runs efficiently on large databases.
What is Random Forest?

Random Forest (or Random Decision Forest) is a method that operates by constructing multiple Decision Trees during the training phase. The decision of the majority of the trees is chosen by the Random Forest as the final decision.

Decision Tree 1 -> Output 1
Decision Tree 2 -> Output 2
Decision Tree 3 -> Output 3
Random Forest: majority of the outputs -> Final Decision
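As a rough illustration of this majority vote, here is a minimal Python sketch that trains a few decision trees on bootstrap samples and lets them vote. The iris dataset, the number of trees, and the specific parameters are assumptions made for the example, not details from the deck.

# Minimal sketch of the majority-vote idea behind Random Forest.
# Assumes scikit-learn and the iris dataset; bootstrap sampling and
# random feature selection are shown in simplified form.
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
rng = np.random.RandomState(0)

# Train three decision trees, each on a bootstrap sample of the data
trees = []
for _ in range(3):
    idx = rng.choice(len(X), size=len(X), replace=True)
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Each tree produces its own output; the majority becomes the final decision
sample = X[100].reshape(1, -1)
outputs = [int(tree.predict(sample)[0]) for tree in trees]
final = Counter(outputs).most_common(1)[0][0]
print("Outputs:", outputs, "-> Final decision:", iris.target_names[final])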
Random Forest and Decision Tree

Decision Tree
A Decision Tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction. For example:

Is diameter >= 3? (False / True)
Is color orange? (False / True)
Decision Tree - Important Terms

A Decision Tree is a graph that uses a branching method to illustrate every possible outcome of a decision. Its important terms are Entropy, Information Gain, Leaf Node, Decision Node, and Root Node.

Entropy
Entropy is the measure of randomness or unpredictability in the dataset. The messy initial dataset has high entropy (E1); after a decision split into Set 1 and Set 2, the entropy of the resulting subsets (E2) is lower.

Information Gain
Information gain is the measure of the decrease in entropy after the dataset is split:
Information gain = E1 - E2

Leaf Node
A leaf node carries the classification or the decision.

Decision Node
A decision node has two or more branches.

Root Node
The topmost decision node is known as the root node.
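To make entropy and information gain concrete, here is a small sketch that computes both for a split. The Shannon-entropy formula is an assumption on my part (the deck never writes the formula out), and the fruit labels come from the toy example that follows.

# Sketch: entropy and information gain, assuming the usual Shannon entropy
# H = -sum(p * log2(p)) over the class proportions p.
import numpy as np

def entropy(labels):
    # Entropy of a list of class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    # Decrease in entropy after splitting `parent` into `left` and `right`
    n = len(parent)
    e2 = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - e2  # E1 - E2

# Fruit labels from the decision tree example below, split on "diameter >= 3"
parent = ["Apple", "Apple", "Lemon", "Lemon", "Grapes", "Grapes"]
left = ["Grapes", "Grapes"]                   # diameter < 3
right = ["Apple", "Apple", "Lemon", "Lemon"]  # diameter >= 3
print("E1 =", round(entropy(parent), 3))
print("Information gain =", round(information_gain(parent, left, right), 3))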
How does a Decision Tree work?

Let's try to understand how a Decision Tree works with a simple example.

Problem statement: classify the different types of fruit in a bowl based on different features. The dataset (the bowl) looks quite messy, and the entropy is high in this case.
Training Dataset

Color    Diameter    Label
Red      3           Apple
Red      3           Apple
Yellow   3           Lemon
Yellow   3           Lemon
Purple   1           Grapes
Purple   1           Grapes
How to split the data

We have to frame the conditions that split the data in such a way that the information gain is the highest. (Note: gain is the measure of the decrease in entropy after splitting.)

We try to choose the condition that gives us the highest gain. We do that by splitting the data using each candidate condition and checking the gain we get from each of them. The condition that gives us the highest gain is used to make the first split.
Conditions

The candidate conditions for the split are:
Color == Purple?
Color == Yellow?
Color == Red?
Diameter = 3?
Diameter = 1?

Let's say the condition on diameter gives us the maximum gain, so we use it to make the first split.
We split the data: Is diameter >= 3? (False / True)

The entropy after splitting has decreased considerably. The False branch contains only one kind of label (grapes), so that node has already attained an entropy value of zero and no further splitting is required for it. The True branch, however, still contains a mix of labels, so it requires another split to decrease the entropy further.

So we split the right node further based on the color: Is color yellow? (False / True). The entropy in each resulting node is now zero, and the tree can predict:

Diameter < 3: grapes, with 100% accuracy
Diameter >= 3 and color yellow: lemon, with 100% accuracy
Diameter >= 3 and color not yellow: apple, with 100% accuracy
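The finished tree can be written out as two nested rules. The following sketch only illustrates that logic in Python; the row pairing of the toy dataset is an assumption based on the splits described above.

# Sketch of the decision tree built above, expressed as plain Python rules.
def predict_fruit(color, diameter):
    # First split: Is diameter >= 3?
    if diameter < 3:
        return "Grapes"          # this branch reached zero entropy immediately
    # Second split on the right node: Is color yellow?
    if color == "Yellow":
        return "Lemon"
    return "Apple"

# Toy training set from the example (pairing assumed from the splits above)
training_set = [
    ("Red", 3, "Apple"), ("Red", 3, "Apple"),
    ("Yellow", 3, "Lemon"), ("Yellow", 3, "Lemon"),
    ("Purple", 1, "Grapes"), ("Purple", 1, "Grapes"),
]
# The two splits classify every training row correctly
print(all(predict_fruit(c, d) == label for c, d, label in training_set))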
How does a Random Forest work?

A Random Forest grows several different decision trees for the same problem. For example:

Tree 1: Is diameter >= 3? Is the color orange?
Tree 2: Is the color red? Is the shape a circle?
Tree 3: Is the diameter 1? Does it grow in summer?
Now let's try to classify a new fruit with the following features:

Diameter = 3
Colour = orange
Grows in summer = yes
Shape = circle

Tree 1 classifies it as an orange, Tree 2 classifies it as cherries, and Tree 3 classifies it as an orange.
The three trees vote: orange (Tree 1), cherry (Tree 2), orange (Tree 3). Orange gets the majority of the votes, so the Random Forest classifies the fruit as an orange.
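Putting the walkthrough together, here is a small sketch of the vote itself. The three rule functions below are simplified stand-ins for the trees sketched above, not an exact transcription of the slides.

# Sketch: three simplified trees vote on the fruit described above.
from collections import Counter

fruit = {"diameter": 3, "colour": "orange", "grows_in_summer": True, "shape": "circle"}

def tree_1(f):
    # Diameter and colour based rule
    return "orange" if f["diameter"] >= 3 and f["colour"] == "orange" else "other"

def tree_2(f):
    # Shape and colour based rule; this tree happens to vote for cherry
    return "cherry" if f["shape"] == "circle" and f["colour"] != "red" else "other"

def tree_3(f):
    # Seasonal rule
    return "orange" if f["grows_in_summer"] and f["diameter"] >= 3 else "other"

votes = [tree_1(fruit), tree_2(fruit), tree_3(fruit)]
print("Votes:", votes)
print("Majority:", Counter(votes).most_common(1)[0][0])  # -> orange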
Use Case – Iris Flower Analysis
Wonder what species of Iris these flowers belong to?
Use Case - Problem Statement
Let’s try to predict the species of the
flowers using machine learning in Python
Use Case - Problem Statement
Let's see how it can be done.
Use Case - Implementation
# Loading the library with the iris dataset
from sklearn.datasets import load_iris
# Loading scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier
# Loading pandas
import pandas as pd
# Loading numpy
import numpy as np
# Setting random seed
np.random.seed(0)
Use Case - Implementation
# Creating an object called iris with the iris data
iris = load_iris()
# Creating a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Viewing the top 5 rows
df.head()
Use Case - Implementation
# Adding a new column for the species name
df['species'] = pd.Categorical.from_codes(iris.target,
iris.target_names)
# Viewing the top 5 rows
df.head()
Use Case - Implementation
# Creating Test and Train Data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# View the top 5 rows
df.head()
Use Case - Implementation
# Creating dataframes with test rows and training rows
train, test = df[df['is_train']==True], df[df['is_train']==False]
# Show the number of observations for the test and training dataframes
print('Number of observations in the training data:', len(train))
print('Number of observations in the test data:',len(test))
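As an aside, the same split is more commonly done with scikit-learn's train_test_split helper. A minimal equivalent sketch follows; the 75/25 proportion mirrors the is_train threshold above and the random_state is an arbitrary choice.

# Alternative sketch: the same 75/25 split using scikit-learn's helper
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)
print(len(train_df), len(test_df))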
Use Case - Implementation
# Create a list of the feature column's names
features = df.columns[:4]
# View features
features
Use Case - Implementation
# Converting each species name into digits
y = pd.factorize(train['species'])[0]
# Viewing target
y
Use Case - Implementation
# Creating a random forest Classifier.
clf = RandomForestClassifier(n_jobs=2, random_state=0)
# Training the classifier
clf.fit(train[features], y)
Use Case - Implementation
# Applying the trained Classifier to the test
clf.predict(test[features])
Use Case - Implementation
# Viewing the predicted probabilities of the first 10 observations
clf.predict_proba(test[features])[0:10]
Use Case - Implementation
# Mapping the plant names for each predicted plant class
preds = iris.target_names[clf.predict(test[features])]
# Viewing the PREDICTED species for the first five observations
preds[0:5]
Use Case - Implementation
# Viewing the ACTUAL species for the first five observations
test['species'].head()
Use Case - Implementation
# Creating confusion matrix
pd.crosstab(test['species'], preds, rownames=['Actual Species'],
colnames=['Predicted Species'])
Use Case - Implementation
Use Case - Implementation

Total number of predictions = 32
Number of accurate predictions = 30
Number of inaccurate predictions = 2

Model Accuracy = (30 / 32) x 100 ≈ 93.75%

So the model accuracy is about 94%.
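For completeness, a short sketch of how this accuracy figure could be computed directly in the same script. The variable names follow the code above, and the exact counts depend on the random train/test split.

# Sketch: computing the model accuracy directly from the predictions above
correct = (test['species'] == preds).sum()
total = len(test)
print('Accuracy: {:.2f}%'.format(100 * correct / total))

# Equivalently, scikit-learn provides a helper for this
from sklearn.metrics import accuracy_score
print(accuracy_score(test['species'], preds))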
Key takeaways