Machine Learning - V
Random Forest
To understand Random Forest, let’s first understand the ensemble model.
An ensemble model combines the outputs of multiple models to produce more
accurate predictions.
Ensemble models are in high demand because multiple models can be implemented
with little time and effort while achieving high prediction accuracy.
A decision tree, in turn, is a branching method made up of one or more
if-then-else statements on the predictors.
* It is very useful for data exploration, breaking the dataset down into
smaller and smaller subsets.
Single Decision Tree
In decision trees, the branching of the tree is driven by Information Gain.
Information Gain = Entropy of the parent node – weighted entropy of the
split (children)
Entropy is a measure of how disorganized the system is.
Entropy ranges from 0 to 1. A pure node has an entropy of 0, while a maximally
impure node has an entropy of 1.
* The core algorithm for building decision trees is ID3, by J. R. Quinlan.
• It uses a top-down approach and can be used to build both classification
and regression decision trees.
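To make the split criterion concrete, here is a minimal sketch, with hypothetical class labels, of how entropy and Information Gain can be computed for a candidate split in Python.

```python
import math

def entropy(labels):
    """Entropy of a set of class labels: -sum(p * log2(p)) over the classes."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(parent, children):
    """Information Gain = entropy(parent) - weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical example: a 50/50 parent node has entropy 1 (maximally impure),
# and a split into two pure children yields the maximum gain of 1.
parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 5, ["no"] * 5
print(entropy(parent))                          # 1.0
print(information_gain(parent, [left, right]))  # 1.0
```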
Decision Making in Regression Decision Trees
As we know, the main aim in a regression tree is to reduce the standard
deviation, while in a classification tree the main aim is to reduce entropy.
Random Forest is suitable for numerical values, and since a random forest is
a collection of decision trees, let’s first understand how numerical values
work in a decision tree.
For numerical values the decision tree, i.e. the regression tree, uses
standard deviation scores to do the splitting. The attribute with the
largest standard deviation reduction is chosen for the next decision
node (a node that can be split further). A branch whose standard deviation
is greater than 0 usually needs further splitting.
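As a worked illustration, here is a minimal sketch of the standard deviation reduction (SDR) computation for one candidate split, using hypothetical target values; the attribute whose split yields the largest SDR would become the next decision node.

```python
import statistics

def sdr(parent, children):
    """Standard deviation reduction = sd(parent) - weighted sd of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * statistics.pstdev(child) for child in children)
    return statistics.pstdev(parent) - weighted

# Hypothetical target values at a node and one candidate split of them
parent = [10, 12, 11, 30, 32, 31]
left, right = [10, 12, 11], [30, 32, 31]
print(sdr(parent, [left, right]))  # large reduction -> strong split candidate
```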
Decision Making in Regression Decision Trees
 A stopping/pruning criterion, usually size based, is provided to stop the
tree from growing further, since unrestricted growth leads to overfitting
problems.
 The process of splitting the decision nodes runs recursively until it
reaches the terminal/leaf nodes (the nodes that cannot be split further).
 When the number of instances at a leaf node is more than one, we take
their average as the final value for the target.
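As a hedged sketch, size-based stopping criteria of this kind are typically expressed as tree hyperparameters; for example, with scikit-learn's DecisionTreeRegressor (the data below is hypothetical), limiting the depth and the minimum samples per leaf stops growth before the tree overfits, and each leaf predicts the mean target of the instances that reached it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: 100 samples, 3 numeric features, numeric target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Size-based stopping criteria: limit depth and require a minimum leaf size
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5).fit(X, y)

# Each leaf's prediction is the average target of the training instances in it
print(tree.predict(X[:3]))
```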
Decision Tree Algorithms
ID3, or Iterative Dichotomiser 3, is one of the earliest decision tree
algorithms to be implemented, developed by J. R. Quinlan in 1986.
C4.5 – the next version, also developed by Quinlan, handles both
continuous and discrete features and improves on the overfitting
problem by using a bottom-up approach known as pruning.
CART or Classification & Regression Trees
 The CART implementation is similar to C4.5; it prunes the tree by
imposing a complexity penalty based on the number of leaves in the
tree.
 CART uses the Gini method to create binary splits and is the most
commonly used decision tree algorithm.
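Since CART splits on the Gini index, here is a minimal sketch, with hypothetical class labels, of how Gini impurity is computed for a node.

```python
def gini(labels):
    """Gini impurity: 1 - sum(p_c^2) over the classes present at a node."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["yes"] * 10))              # 0.0 -> pure node
print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5 -> maximally impure binary node
```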
Advantages of Single DT
 It is a non-parametric method, i.e. independent of the type and size of
the underlying population, so it can be used even when the sample size is
low. It is therefore very fast and easy to understand and implement.
 It can handle outliers and missing values, therefore requires less data
preparation than other machine learning methods, and can be used for both
categorical and numerical data types.
Here let’s focus more on the disadvantages of a decision tree so we can
arrive at a solution.
Disadvantages of Single DT
 As we know, decision trees are easily prone to overfitting, so this
needs to be controlled with pruning techniques.
 They split on ranges of values rather than the actual values of
continuous numerical variables, so they are sometimes not very effective
at estimating continuous values.
 The robustness to outliers and skewness comes at the cost of throwing
away some of the information in the dataset.
 When an input variable has too many possible values, the values need to
be aggregated into groups, otherwise the result is too many splits, which
may lead to poor predictive performance.
These disadvantages of a decision tree have given rise to ensemble
methods.
Ensemble Methods
* A collection of several models, in this case a collection of decision
trees, is used to increase predictive power, and the final score is
obtained by aggregating their outputs.
• This is known as an ensemble method in machine learning.
Random forest for continuous numerical variables, and boosting &
bagging for categorical variables, are the most popular ensemble
methods.
However, the basic functionality remains the same, i.e. the original
concept of creating a tree using entropy & information gain.
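As a minimal sketch of this aggregation idea (hypothetical data; bagging-style bootstrap samples), several decision trees are trained independently and their predictions are averaged to give the ensemble's final score.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: 200 samples, 4 numeric features, numeric target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=200)

# Train several trees, each on a bootstrap sample of the data
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Aggregate: the ensemble prediction is the average of the individual trees
ensemble_pred = np.mean([t.predict(X[:5]) for t in trees], axis=0)
print(ensemble_pred)
```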
Random Forest in Brief
• The goal of random forest is to improve prediction accuracy by using a
collection of un-pruned decision trees combined with rule-based criteria.
So let’s understand the goals of random forest in detail.
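For instance, here is a hedged sketch using scikit-learn's RandomForestRegressor on hypothetical data; by default its trees are grown un-pruned (max_depth=None), and the forest averages their predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: 200 samples, 4 numeric features, numeric target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=200)

# A random forest: many un-pruned trees (max_depth=None by default), each grown
# on a bootstrap sample with a random subset of features considered per split
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]))  # averaged prediction across the 100 trees
```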
Rupak Roy
