2. Outline
1 Introduction
2 Why Random Forest?
3 What is Random Forest?
4 Random Forest Example
5 How Random Forest Works?
6 References
Subject: Machine Learning, Dr. Varun Kumar, Lecture 8
3. Introduction
Supervised machine learning
1 Regression
Linear regression
Logistic regression (despite its name, typically used for classification)
2 Classification: the process of dividing a data set into different
categories or groups by assigning labels.
Decision tree
Naive Bayes
Random forest
K nearest neighbor (KNN)
4. Random Forest
⇒ A random forest is an ensemble classifier built from decision tree
models.
⇒ An ensemble model combines the results from different models.
⇒ A random forest is therefore a combination of multiple decision trees.
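To make "combines the results" concrete, here is a minimal sketch in Python. It assumes each trained tree is simply a function from a sample to a class label; the three hand-written rules (tree_a, tree_b, tree_c) are hypothetical stand-ins for trained decision trees, not rules taken from the lecture. The forest's answer is the majority vote of the trees.

```python
from collections import Counter
from typing import Callable, Dict, List

Sample = Dict[str, str]
Tree = Callable[[Sample], str]


# Hypothetical stand-ins for three trained decision trees.
def tree_a(x: Sample) -> str:
    return "No" if x["Outlook"] == "Sunny" and x["Humidity"] == "High" else "Yes"


def tree_b(x: Sample) -> str:
    return "No" if x["Outlook"] == "Rain" and x["Wind"] == "Strong" else "Yes"


def tree_c(x: Sample) -> str:
    return "No" if x["Humidity"] == "High" else "Yes"


def forest_predict(trees: List[Tree], x: Sample) -> str:
    """Ask every tree for a label and return the most common one."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


day = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
print(forest_predict([tree_a, tree_b, tree_c], day))  # -> No (2 votes to 1)
```

Each tree may be wrong on its own; the vote lets the majority outweigh individual mistakes.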
6. Why random forest?
Use case: credit risk detection
7. Continued
1 Minimizing loss: the bank needs a decision rule that predicts whether
to approve a loan.
2 Applicant demographics: income, debit/credit history, and
socio-economic profile are considered.
3 Data-science-based assistance tool: helps model the behavioral
patterns of individual customers.
Variable          Measurement
Marital status    Married or unmarried
Gender            Male or female
Age               Varied
Status            Default or not
Time of payment   Varied
Employment        Employed or unemployed
Home ownership    With home or without home
Education level   Above or below secondary
8. What is random forest?
⇒ Random forest is a versatile algorithm capable of both
Regression
Classification
⇒ It is a type of ensemble learning method.
⇒ It is a commonly used predictive modeling and machine learning technique.
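The same forest handles both tasks; what changes is how the trees' outputs are combined. A minimal sketch, assuming the individual tree outputs are already available (the values below are made up for illustration): for classification the forest takes the majority vote, for regression it averages.

```python
from collections import Counter
from statistics import mean
from typing import List


def aggregate_classification(tree_outputs: List[str]) -> str:
    # Classification: the most frequent class label wins (majority vote).
    return Counter(tree_outputs).most_common(1)[0][0]


def aggregate_regression(tree_outputs: List[float]) -> float:
    # Regression: the forest returns the mean of the trees' numeric predictions.
    return mean(tree_outputs)


print(aggregate_classification(["Yes", "No", "Yes"]))  # -> Yes
print(aggregate_regression([2.0, 3.0, 4.0]))           # -> 3.0
```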
9. Random forest algorithm
T: Number of features
D: Number of trees to be constructed
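The algorithm figure for this slide did not survive extraction; only the two parameters remain. Under the assumption that T is the number of features each tree may draw on and D is the number of trees, the sampling step that makes the trees differ can be sketched as follows. plan_forest is a hypothetical helper. Note that Breiman's random forest re-draws the feature subset at every split, so drawing it once per tree, as done here, is a simplification.

```python
import random
from typing import List, Tuple


def plan_forest(n_rows: int, features: List[str], T: int, D: int,
                seed: int = 0) -> List[Tuple[List[int], List[str]]]:
    """For each of the D trees, draw a bootstrap sample of row indices and a
    random subset of T features.

    Assumption: T is read as the number of features a tree may use.
    """
    rng = random.Random(seed)
    plan = []
    for _ in range(D):
        # Bootstrap: n_rows draws with replacement, so some rows repeat
        # and others are left out ("out-of-bag").
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Random feature subset: T distinct features, drawn without replacement.
        feats = rng.sample(features, T)
        plan.append((rows, feats))
    return plan


for rows, feats in plan_forest(14, ["Outlook", "Humidity", "Wind"], T=2, D=3):
    print(sorted(rows), feats)
```

Each tree would then be trained only on its own rows and features, which is what keeps the trees in the ensemble from all making the same mistakes.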
10. How random forest works
Day  Outlook   Humidity  Wind    Play
01   Sunny     High      Weak    No
02   Sunny     High      Strong  No
03   Overcast  High      Weak    Yes
04   Rain      High      Weak    Yes
05   Rain      Normal    Weak    Yes
06   Rain      Normal    Strong  No
07   Overcast  Normal    Strong  Yes
08   Sunny     High      Weak    No
09   Sunny     Normal    Weak    Yes
10   Rain      Normal    Weak    Yes
11   Sunny     Normal    Strong  Yes
12   Overcast  High      Strong  Yes
13   Overcast  Normal    Weak    Yes
14   Rain      High      Strong  No
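As an illustration only, the sketch below grows a tiny forest on this 14-day table. To stay short, each "tree" is a one-level decision stump rather than a full decision tree (a deliberate simplification). Each stump is trained on a bootstrap sample of the days, considers T = 2 randomly chosen features, and keeps the one whose value-to-majority-label rule makes the fewest errors. The forest predicts by majority vote. The names fit_stump, fit_forest, predict_forest and the settings D = 25, T = 2 are hypothetical choices.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

FEATURES = ["Outlook", "Humidity", "Wind"]
# Days 01-14 from the table above as (Outlook, Humidity, Wind, Play).
DATA = [
    ("Sunny", "High", "Weak", "No"), ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"), ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"), ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"), ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"), ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"), ("Rain", "High", "Strong", "No"),
]

# A stump: (feature index, value -> label rule, fallback label for unseen values).
Stump = Tuple[int, Dict[str, str], str]


def majority(labels: Sequence[str]) -> str:
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows: List[tuple], feat_ids: List[int]) -> Stump:
    """Keep the candidate feature whose rule makes the fewest training errors."""
    default = majority([r[-1] for r in rows])
    best_errors, best = len(rows) + 1, None
    for f in feat_ids:
        groups: Dict[str, List[str]] = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[-1])
        rule = {value: majority(labels) for value, labels in groups.items()}
        errors = sum(rule[r[f]] != r[-1] for r in rows)
        if errors < best_errors:
            best_errors, best = errors, (f, rule, default)
    return best


def predict_stump(stump: Stump, x: Sequence[str]) -> str:
    f, rule, default = stump
    return rule.get(x[f], default)


def fit_forest(data, D: int = 25, T: int = 2, seed: int = 0) -> List[Stump]:
    """Grow D stumps, each on a bootstrap sample with T random candidate features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(D):
        rows = [rng.choice(data) for _ in range(len(data))]  # sample with replacement
        feat_ids = rng.sample(range(len(FEATURES)), T)       # random feature subset
        forest.append(fit_stump(rows, feat_ids))
    return forest


def predict_forest(forest: List[Stump], x: Sequence[str]) -> str:
    """Majority vote over all stumps."""
    return majority([predict_stump(s, x) for s in forest])


forest = fit_forest(DATA)
print(predict_forest(forest, ("Overcast", "High", "Weak")))
```

For example, a stump trained on all 14 days using only Outlook learns Sunny → No, Overcast → Yes, Rain → Yes.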
12. Features of random forest
1 Among the most accurate general-purpose learning algorithms
2 Works well for both classification and regression problems
3 Runs efficiently on large databases
4 Requires little input preparation
5 Performs implicit feature selection
6 Can easily be grown in parallel
7 Provides methods for balancing error in unbalanced data sets
Important steps
⇒ Data acquisition
⇒ Divide the data set into (1) a training set and (2) a testing set
⇒ Implement the model
⇒ Visualize
⇒ Model validation
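The steps above can be sketched end to end on the weather table. This is a hedged outline, not the lecture's own code: the "Implement model" step is filled by a majority-class baseline (fit_majority_baseline, a hypothetical placeholder where the random forest would be trained), the "Visualize" step is omitted, and the 70/30 split is an assumed ratio.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, str, str, str]  # (Outlook, Humidity, Wind, Play)
Model = Callable[[Sequence[str]], str]

# Data acquisition: days 01-14 of the weather table.
DATA: List[Row] = [
    ("Sunny", "High", "Weak", "No"), ("Sunny", "High", "Strong", "No"),
    ("Overcast", "High", "Weak", "Yes"), ("Rain", "High", "Weak", "Yes"),
    ("Rain", "Normal", "Weak", "Yes"), ("Rain", "Normal", "Strong", "No"),
    ("Overcast", "Normal", "Strong", "Yes"), ("Sunny", "High", "Weak", "No"),
    ("Sunny", "Normal", "Weak", "Yes"), ("Rain", "Normal", "Weak", "Yes"),
    ("Sunny", "Normal", "Strong", "Yes"), ("Overcast", "High", "Strong", "Yes"),
    ("Overcast", "Normal", "Weak", "Yes"), ("Rain", "High", "Strong", "No"),
]


def train_test_split(data: Sequence[Row], test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Divide the data: shuffle a copy, then cut off a test portion."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_majority_baseline(train: Sequence[Row]) -> Model:
    """Placeholder for 'Implement model': always predicts the most common
    training label. A random forest would be trained here instead."""
    label = Counter(r[-1] for r in train).most_common(1)[0][0]
    return lambda features: label


def accuracy(model: Model, test: Sequence[Row]) -> float:
    """Model validation: fraction of test rows predicted correctly."""
    return sum(model(r[:-1]) == r[-1] for r in test) / len(test)


train, test = train_test_split(DATA)
model = fit_majority_baseline(train)
print(len(train), len(test), accuracy(model, test))
```

Validating on rows the model never saw during training is what makes the accuracy figure an honest estimate of performance on new data.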
13. References
E. Alpaydin, Introduction to Machine Learning. MIT Press, 2020.
T. M. Mitchell, The Discipline of Machine Learning. Carnegie Mellon University, School of Computer Science, Machine Learning, 2006, vol. 9.
J. Grus, Data Science from Scratch: First Principles with Python. O’Reilly Media, 2019.