The Random Forest algorithm's widespread popularity stems from its user-friendly nature and adaptability, which let it tackle both classification and regression problems effectively. Its strength lies in its ability to handle complex datasets and mitigate overfitting, making it a valuable tool for a wide range of predictive tasks in machine learning.
One of the most important features of the Random Forest algorithm is that it can handle datasets containing both continuous variables, as in regression, and categorical variables, as in classification, and it performs well on both kinds of task. In this tutorial, we will walk through how random forest works and implement it on a classification task.
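To make the mixed-feature-type claim concrete, here is a minimal, hypothetical sketch (not part of the original slides): a tiny invented dataset with one continuous and one categorical column. Note that scikit-learn's tree-based models expect numeric input, so the categorical column is one-hot encoded first.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented toy data: one continuous feature, one categorical feature.
df = pd.DataFrame({
    "income": [35_000, 82_000, 47_000, 120_000, 29_000, 64_000],
    "region": ["north", "south", "south", "east", "north", "east"],
    "approved": [0, 1, 0, 1, 0, 1],  # label
})

# scikit-learn trees need numeric input, so one-hot encode the categorical column.
X = pd.get_dummies(df[["income", "region"]], columns=["region"])
y = df["approved"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:2]))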
2. Agenda
• What Is Random Forest And History
• What Is Supervised Learning
• What Is Decision Tree
• Decision Tree Important Terms
• How Does a Decision Tree Work
• Ensemble Learning
• Dataset Preparation
• Bagging At Training Time
• Bagging At Inference Time
• Random Subspace Method At Training Time
• Random Subspace Method At Inference Time
• Definition
• Random Forest Model
• Why Use Random Forest Model
• Advantages & Disadvantages
• Random Forest Application
3. Introduction
• A Random Forest is a supervised machine learning algorithm consisting of many decision trees.
• The general method of random decision forests was first proposed by Ho in 1995. It was later developed by Leo Breiman in 2001.
4. What Is Supervised Learning:
1. Supervised learning is a type of machine learning in which machines are trained on labeled training data.
2. On the basis of that data, the machine predicts the output for new inputs.
3. Labeled data means the input data is already tagged with the correct output (a small sketch follows below).
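The following is a minimal, hypothetical sketch of supervised learning: inputs paired with known labels are used to fit a model, which then predicts the label for an unseen input. The feature values and labels are invented for illustration.

from sklearn.tree import DecisionTreeClassifier

# Labeled training data: each input row is tagged with the correct output.
X_train = [[25, 40_000], [35, 60_000], [45, 80_000], [20, 20_000]]  # [age, income]
y_train = [0, 1, 1, 0]                                              # 0 = no, 1 = yes

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)           # learn from labeled examples

print(model.predict([[30, 50_000]]))  # predict the output for a new, unseen input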
5. What Is Decision Tree
A decision tree is a flowchart-like model in which each internal node tests a feature, each branch represents an outcome of that test, and each leaf node holds a final prediction.
6. Decision Tree Important Terms
Key terms include the root node (the topmost split), decision nodes (internal splits), leaf nodes (terminal predictions), splitting (dividing a node into sub-nodes), and pruning (removing sub-nodes to reduce overfitting).
7. How a Decision Tree Works
• It follows a tree-like model of decisions and their possible consequences.
• The algorithm works by recursively splitting the data into subsets based on the most significant feature at each node of the tree, as sketched below.
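Below is a minimal, from-scratch sketch of the core splitting step, assuming "most significant feature" is measured by Gini impurity (a common choice; the slides do not name a criterion). It scans every feature and threshold and returns the split that most reduces impurity.

import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Scan all features and thresholds; return the (feature, threshold, score)
    giving the lowest weighted Gini impurity over the two child subsets."""
    best = (None, None, np.inf)
    n = len(y)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            right = ~left
            if left.all() or right.all():
                continue  # a valid split needs two non-empty subsets
            score = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if score < best[2]:
                best = (feature, threshold, score)
    return best

# Tiny invented dataset: two features, binary labels.
X = np.array([[2.0, 1.0], [3.0, 1.0], [10.0, 2.0], [11.0, 2.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # best: feature 0 at threshold 3.0 (perfectly pure children)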
9. Ensemble Learning
Ensemble learning creates a stronger model by aggregating the predictions of multiple weak models. Random Forest is an example of ensemble learning in which each model is a decision tree. The idea behind it is the wisdom of the crowd: majority-vote aggregation can achieve better accuracy than any of the individual models, as the sketch below illustrates.
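As a hypothetical illustration of majority-vote aggregation, this sketch trains several shallow decision trees on different random samples of invented data and combines their predictions by taking the most common vote per example; all data and parameter choices here are ours, not from the slides.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)

# Train several weak models, each on a different random sample of the data.
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
    trees.append(DecisionTreeClassifier(max_depth=2).fit(X[idx], y[idx]))

# Majority vote: each tree predicts, and the most common label per row wins.
votes = np.array([t.predict(X) for t in trees])   # shape (25, 200)
majority = (votes.mean(axis=0) > 0.5).astype(int)  # binary labels 0/1

print("single tree accuracy:", trees[0].score(X, y))
print("majority vote accuracy:", (majority == y).mean())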
15. Definition
The Random Forest algorithm is an ensemble learning method consisting of many decision trees built using bagging and feature bagging, which together create an uncorrelated forest of trees whose combined prediction is more accurate than that of any single tree. For classification tasks, the final prediction is made by majority vote; for regression tasks, it is the average of the predictions of all the individual trees. A usage sketch follows below.
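To close the loop on the classification task promised in the introduction, here is a minimal sketch using scikit-learn's RandomForestClassifier on the built-in Iris dataset (the dataset choice and parameter values are ours, not from the slides). Bagging corresponds to bootstrap=True, and feature bagging is controlled by max_features; for classification, predict aggregates the trees' votes.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# bootstrap=True -> bagging; max_features="sqrt" -> feature bagging at each split.
forest = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", bootstrap=True, random_state=42
)
forest.fit(X_train, y_train)

print("test accuracy:", forest.score(X_test, y_test))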
17. Why Use Random Forest
• Random forests are an effective tool for prediction.
• Forests give results competitive with boosting and adaptive bagging, yet do not progressively change the training set.
• Random inputs and random features produce good results in classification, less so in regression.
• For larger datasets, accuracy can be improved by combining random features with boosting.
18. Advantages and Disadvantages
Advantages:
• Versatile uses
• Easy-to-understand hyperparameters
• Classifier doesn't overfit with enough trees
Disadvantages:
• Increased accuracy requires more trees
• More trees slow down the model
• Can't describe relationships within the data
19. Random Forest Applications
• Detects reliable debtors and potential fraudsters in finance
• Verifies medicine components and patient data in healthcare
• Gauges whether customers will like products in e-commerce