Master the Art of Analytics
Basic Analytics for Citizen Data Scientists
The Augmented Analytics Journey
March - 2021
Random Forest Regression
Terminology
Introduction & Example
Standard Input/Tuning Parameters & Sample UI
Sample Output UI
Interpretation of Output
Limitations
Business Use Cases
Overview
Introduction – Random Forest Regression
Figure 1 – Structure of Random Forest Regression
Random Forest Regression builds a set
of Decision Trees, each from
a randomly selected subset of the
training set, and averages the values
predicted by the individual trees to
decide the final target value.
Predictors and Target Variable
Target Variable (usually denoted by Y)
represents the variable that will be
predicted and is also called Dependent
Variable, Response Variable or Outcome
Variable.
Predictor (usually denoted by X) is
sometimes called an Independent
Variable or Explanatory Variable, and is
the variable used to predict the Target
Variable Y.
Methodology
How Random Forest works:
1. Pick k data points at random from
the training set.
2. Build a decision tree associated with
these k data points.
3. Choose the number of trees, N,
that you want to build and repeat
steps 1 and 2.
4. For a new data point, have each
of the N trees predict the value of y
for that point, and assign the new
data point the average of all
the predicted y values.
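The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation. It assumes the k points are drawn with replacement, uses a deliberately small hand-written regression tree, and omits the per-split feature sampling that full Random Forest implementations usually add. The function names (`build_tree`, `fit_forest`, `predict_forest`) and the toy data are hypothetical.

```python
import random
from statistics import mean


def sse(rows):
    """Sum of squared deviations of the targets from their mean."""
    ys = [y for _, y in rows]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(rows, depth=0, max_depth=3, min_size=2):
    """Grow a small regression tree; a leaf is a float, a split is a tuple."""
    ys = [y for _, y in rows]
    if depth >= max_depth or len(set(ys)) == 1:
        return mean(ys)
    best = None  # (error, feature index, threshold, left rows, right rows)
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t, left, right)
    if best is None:  # no valid split: stop and predict the mean
        return mean(ys)
    _, j, t, left, right = best
    return (j, t,
            build_tree(left, depth + 1, max_depth, min_size),
            build_tree(right, depth + 1, max_depth, min_size))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_forest(rows, n_trees=20, k=None, seed=0):
    """Steps 1-3: draw k points at random (assumed with replacement), build a tree, repeat N times."""
    rng = random.Random(seed)
    k = k or len(rows)
    return [build_tree([rng.choice(rows) for _ in range(k)]) for _ in range(n_trees)]


def predict_forest(trees, x):
    """Step 4: average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in trees)


# Toy data (made up for illustration): one predictor, target y = 3 * x
rows = [((float(x),), 3.0 * x) for x in range(20)]
forest = fit_forest(rows, n_trees=20)
print(predict_forest(forest, (10.0,)))
```

Because every leaf stores a mean of training targets, the averaged prediction always lies within the range of target values seen in training.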
Sample Random Forest Regression
Here we perform Random Forest Regression analysis on the independent variables carpet area, build up area, market distance, rainfall and city type,
with house price as the target variable.
Sample data (Target Variable Y: House Price; Independent variables Xi: all other columns)
House Price   Carpet Area   Build up Area   Market Distance   Rainfall   City Type
2027000       1624          2171            1                 870        CAT A
6118000       986           1822            0.7               1160       CAT C
5916000       1627          1770            0.2               340        CAT A
4350000       1816          1154            2                 1250       CAT B
8976000       1160          2000            0.1               1150       CAT B
7157000       1309          1807            0.4               420        CAT C
5934000       1543          1678            3                 680        CAT A
6354000       2019          1543            1                 1100       CAT B
Regression Statistics
Accuracy                  78%
Root Mean Square Error    179.23
Mean Absolute Error       92.20
Model is a good fit as Accuracy > 70%
• Root Mean Square Error: Square root of
the average of squared differences between
prediction and actual observation
• Mean Absolute Error: Average of the
absolute differences between prediction and
actual observation.
Step 1: Select the Target Variable (options shown: Build up Area, Carpet Area, House Price, Market Distance)
Step 2: Select the Predictors (options shown: Build up Area, Carpet Area, House Price, Market Distance). More than one predictor can be selected.
Step 3
Number of Trees = 20
By default these parameters
should be set with the values
mentioned
Step 4
Display the output window containing the following:
o Model summary
o Interpretation
o Residual plot
▪ Categorical predictors should be auto detected and converted to dummy/binary variables before applying regression
▪ Decision on selection of predictors depends on business knowledge and the correlation value between the target variable and predictors.
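As a rough illustration of converting a categorical predictor to dummy/binary variables, the sketch below encodes the City type column from the sample data. The helper name `one_hot` is hypothetical, and the tool's own auto-detection and encoding may differ (for example, it may drop one category as a reference level).

```python
def one_hot(values):
    """Return the sorted category labels and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# City type values taken from the sample data table
city_type = ["CAT A", "CAT C", "CAT A", "CAT B", "CAT B", "CAT C", "CAT A", "CAT B"]
categories, dummies = one_hot(city_type)

print(categories)
for label, row in zip(city_type, dummies):
    print(label, row)
```

Each row contains exactly one 1, in the column of the category that the original value belongs to.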
Standard Input/Tuning Parameters & Sample UI
Sample Output - Model Summary
Root Mean Square Error (RMSE): Square root of the average of squared differences between
prediction and actual observation. It is the standard deviation of residual error.
Mean Absolute Error (MAE): Average of the absolute differences between prediction and actual
observation.
Accuracy: Reveals appropriateness of the fit of the model, with
a value between 1 and 100. The closer the value to 100, the
better the model.
Root Mean Square Error 179.23
Mean Absolute Error 92.20
RMSE and MAE are used to identify the variation of errors between predicted and actual values. Lower values (near
zero) of RMSE and MAE represent a better fit of the regression model.
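The two error measures can be computed directly from their definitions. The sketch below is a minimal example; the `actual` and `predicted` values are made up for illustration and are not taken from the sample output.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Average of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Illustrative values only
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]

print(rmse(actual, predicted))  # sqrt((1 + 0 + 1 + 4) / 4) = sqrt(1.5)
print(mae(actual, predicted))   # (1 + 0 + 1 + 2) / 4 = 1.0
```

RMSE is never smaller than MAE on the same data, and it grows faster when a few predictions are far off, because the errors are squared before averaging.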
Sample Output - Interpretation
The Feature Importance chart reveals the impact of each Influencer on the Target Variable.
Sample Output - Plot
The Residual Plot is used to check the assumption of equal error variances and outliers
*See Interpretation sample for more details
Interpretation of Model Statistics
Accuracy
• Accuracy >70%
indicates the model is
a good fit for the data,
and that the predicted
values are reasonably
accurate
• Accuracy <70%
indicates that the model
is not a good fit for the
data, and the predicted
values are likely to have
significant errors
Root Mean Square
Error (RMSE):
• Square root of the
average of squared
differences between
the prediction and the
actual observation
(standard deviation of
residual error)
• Used to identify
variation from predicted
to actual values
• Lower values
(near zero) indicate a
better fit of regression
model
Mean Absolute
Error (MAE)
• Average of absolute
differences between
prediction and actual
observation
• Used to identify
variation of errors
from predicted to
actual values
• Lower values
(near zero)
indicate better fit of
regression model
Feature Importance
• Values are used to check
the impact of each
Influencer (Predictor) on
the Target Variable
• Random Forest is a
useful technique to
determine Feature
Importance
• Feature importance is
useful in selecting
appropriate predictors
affecting the target
which is helpful in
training the model.
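One common way to estimate how much each predictor influences the target is permutation importance: shuffle one predictor column at a time and measure how much the prediction error rises. The sketch below uses that technique as an assumption. The deck does not say how its Feature Importance values are computed, and tree-based tools often report an impurity-based measure instead. The `toy_model` function and the data are hypothetical stand-ins for a trained model and real data.

```python
import random


def mse(predict, X, y):
    """Mean squared error of a prediction function on rows X with targets y."""
    return sum((predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE after shuffling each predictor column in turn."""
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    scores = []
    for j in range(len(X[0])):
        column = [x[j] for x in X]
        rng.shuffle(column)
        X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
        scores.append(mse(predict, X_perm, y) - baseline)
    return scores


def toy_model(x):
    """Hypothetical stand-in for a trained model: uses only the first predictor."""
    return 3.0 * x[0]


# Made-up data: the target depends on predictor 0 and not on predictor 1
X = [[float(i), float(i % 4)] for i in range(20)]
y = [3.0 * x[0] for x in X]

importance = permutation_importance(toy_model, X, y)
print(importance)
```

A predictor the model ignores gets a score of zero, while shuffling a predictor the model relies on raises the error.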
Interpretation of Plots - Residual vs. Fit Plot
Shows a scatter plot of standardized residuals on the Y axis against predicted (fitted) values on the X axis
Note: The red data point in figure 1 is an outlier and should be removed from the data before interpreting the model
Used to detect unequal residual variances and outliers in the data
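The points of a residual vs. fit plot can be prepared as follows. This sketch assumes a simple standardization (residual minus mean residual, divided by the sample standard deviation of the residuals), whereas statistical packages may use a leverage-adjusted version. The cut-off of 2 for flagging a point and the data are illustrative assumptions.

```python
from statistics import mean, stdev


def standardized_residuals(actual, predicted):
    """Residuals centered on their mean and scaled by their sample standard deviation."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    m, s = mean(residuals), stdev(residuals)
    return [(r - m) / s for r in residuals]


# Made-up values; the last observation is far from its prediction
predicted = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
actual = [11.0, 19.0, 31.0, 39.0, 51.0, 59.0, 71.0, 110.0]

z = standardized_residuals(actual, predicted)
points = list(zip(predicted, z))              # (X axis, Y axis) pairs for the plot
flagged = [i for i, value in enumerate(z) if abs(value) > 2]  # assumed cut-off

print(points)
print(flagged)
```

In this toy data only the last observation exceeds the cut-off, which is the kind of point the plot is meant to expose.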
Limitations
• Random Forest Regression is limited
to predicting numeric output so
dependent variable must be numeric
in nature.
• The minimum sample size should be
at least 20 cases per independent
variable.
• Residuals should be time
independent, as illustrated in the
image.
Time independent error (fairly constant over time and within a certain range)
Limitations
• Target/Independent variables should be normally
distributed
A normal distribution is an arrangement of a dataset in
which most values are midrange and the rest taper off
symmetrically toward either extreme. It will look like a
bell curve as shown in figure 1 on the right
• Outliers in data (target as well as Independent Variables)
can affect the analysis, and must be removed or replaced.
Outliers are observations lying outside the overall pattern
of the distribution, as shown in figure 2 on the right. These
extreme values can be replaced with the 1st or 99th
percentile values to improve model accuracy
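Replacing extreme values with the 1st or 99th percentile value can be sketched as below. The percentile here uses linear interpolation between sorted values, which is an assumption because tools differ in how they compute percentiles. The function names and the data are hypothetical.

```python
def percentile(sorted_values, q):
    """Percentile q (0 to 100) of pre-sorted values, by linear interpolation."""
    position = (len(sorted_values) - 1) * q / 100
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def cap_outliers(values, lower_q=1, upper_q=99):
    """Replace values below the lower percentile or above the upper percentile."""
    ordered = sorted(values)
    low = percentile(ordered, lower_q)
    high = percentile(ordered, upper_q)
    return [min(max(v, low), high) for v in values]


# Made-up data: 99 ordinary values and one extreme value
values = [float(v) for v in range(1, 100)] + [10000.0]
capped = cap_outliers(values)

print(max(values), max(capped))
```

Values inside the two percentile limits are left unchanged, so only the tails of the distribution are affected.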
Figure 1: Normal distribution (bell curve)
Figure 2: Outliers
Business Use Case - House Price
Business Problem
A real-estate brokerage company wants to measure the impact of locality, the number of rooms, the
area (sq. yards) etc. on a house price. The goal of this statistical analysis is to help us understand the
relationship between house features and house price, and how these variables can be used to predict house price.
Target
House Price
Predictors
Area with carpet, Rainfall, city, parking, distance from hospital, distance from shopping, etc
Business Benefit
• The business can determine which predictors have a significant impact on house price.
• Pricing strategies and recommendations will be more accurate and result in quicker sales.
• If the number of rooms or the distance from shopping or schools is a significant factor, it can be
given more focus when searching for a house that fits a client's budget, which in turn affects profit.
Business Use Case - Agriculture
Business Problem
An agriculture business wants to measure the impact of weather, market price, quality of crop, land
used etc. on the crop price.
Input Data
Predictor/Independent Variables
• Weather
• Demand
• Crop health
Dependent Variable
Crop Price
Business Benefit
• Business can clarify which factors have a significant impact on crop price.
• Pricing strategies can be refined to improve accuracy and meet targeted crop pricing and revenue.
• If crop health and climate are significant factors, these factors would receive more focus when
deciding crop price.
Business Use Case - Compensation Policies
Business Problem
A business wishes to estimate the salary of an employee based on position, experience, degree, level,
productive hours etc.
Input Data
Predictor/Independent Variables
• Position
• Years of experience
• Productive hours
Dependent Variable
Salary of Employee
Business Benefit
• The business can clarify which predictors have a significant impact on employee salary.
• Salary policies and strategies can more accurately reflect employee value and targeted salaries.
• If productive hours and experience are significant factors, these factors would be given more focus
when developing salary policies.
Want to
Learn More?
Contact Us: Support@Smarten.com
Explore & Learn: Smarten.com

What is Simple Linear Regression and How Can an Enterprise Use this Technique...
 
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
 

Random Forest Regression Analysis Reveals Impact of Variables on Target Values

  • 1. Master the Art of Analytics: Basic Analytics for Citizen Data Scientists. The Augmented Analytics Journey. March 2021
  • 3. Overview: Terminology; Introduction & Example; Standard Input/Tuning Parameters & Sample UI; Sample Output UI; Interpretation of Output; Limitations; Business Use Cases
  • 4. Introduction – Random Forest Regression. Figure 1 – Structure of Random Forest Regression. Random Forest Regression builds a set of decision trees, each from a randomly selected subset of the training set, and averages the values predicted by the individual trees to decide the final target value. Predictors and Target Variable: The Target Variable (usually denoted by Y) is the variable to be predicted; it is also called the Dependent Variable, Response Variable or Outcome Variable. A Predictor (usually denoted by X), sometimes called an Independent Variable or Explanatory Variable, is a variable used to predict the Target Variable Y.
  • 5. Methodology. How Random Forest works: 1. Pick k data points at random from the training set. 2. Build a decision tree from these k data points. 3. Choose the number of trees, N, that you want to build and repeat steps 1 and 2 until N trees are built. 4. For a new data point, have each of the N trees predict the value of y, and assign the new data point the average of all the predicted y values.
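The four steps above can be sketched in plain Python. This is only an illustrative sketch, not the implementation behind the slides: the names `build_tree`, `predict_tree`, `fit_forest` and `predict_forest` are hypothetical, the k points are assumed to be drawn with replacement (a bootstrap sample), and the per-split feature sampling that full random forests also use is left out for brevity. The data are the first two predictor columns and the house prices from the sample table on slide 6.

```python
import random
from statistics import mean


def build_tree(rows, targets, depth=0, max_depth=3, min_leaf=1):
    """Grow a small regression tree; each leaf stores the mean target value."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)
    best = None  # (sum of squared errors, feature index, threshold)
    for j in range(len(rows[0])):
        for threshold in sorted({r[j] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[j] < threshold]
            right = [t for r, t in zip(rows, targets) if r[j] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            m_left, m_right = mean(left), mean(right)
            sse = (sum((t - m_left) ** 2 for t in left)
                   + sum((t - m_right) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold)
    if best is None:  # no valid split, e.g. all sampled rows are identical
        return mean(targets)
    _, j, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[j] < threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[j] >= threshold]
    return (
        j,
        threshold,
        build_tree([r for r, _ in left], [t for _, t in left],
                   depth + 1, max_depth, min_leaf),
        build_tree([r for r, _ in right], [t for _, t in right],
                   depth + 1, max_depth, min_leaf),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's stored mean."""
    while isinstance(node, tuple):
        j, threshold, left, right = node
        node = left if row[j] < threshold else right
    return node


def fit_forest(rows, targets, n_trees=20, k=None, seed=0):
    """Steps 1-3: build n_trees trees, each on k randomly drawn points."""
    rng = random.Random(seed)
    k = k or len(rows)
    forest = []
    for _ in range(n_trees):
        # Assumption: sample with replacement (bootstrap).
        idx = [rng.randrange(len(rows)) for _ in range(k)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Step 4: average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Carpet area and build-up area from the slide 6 sample table.
X = [[1624, 2171], [986, 1822], [1627, 1770], [1816, 1154],
     [1160, 2000], [1309, 1807], [1543, 1678], [2019, 1543]]
y = [2027000, 6118000, 5916000, 4350000,
     8976000, 7157000, 5934000, 6354000]

forest = fit_forest(X, y, n_trees=20, seed=42)
prediction = predict_forest(forest, [1500, 1800])  # hypothetical new house
```

Because every leaf stores an average of training targets and the forest averages the leaves, a prediction always falls between the smallest and largest target seen in training. Eight rows are far too few for a usable model; they only keep the sketch short.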
  • 6. Sample Random Forest Regression. Here we perform Random Forest Regression analysis on the independent variables (Xi) carpet area, build-up area, market distance, rainfall and city type, with the target variable (Y) house price.
      House Price | Carpet Area | Build-up Area | Market Distance | Rainfall | City Type
      2027000 | 1624 | 2171 | 1 | 870 | CAT A
      6118000 | 986 | 1822 | 0.7 | 1160 | CAT C
      5916000 | 1627 | 1770 | 0.2 | 340 | CAT A
      4350000 | 1816 | 1154 | 2 | 1250 | CAT B
      8976000 | 1160 | 2000 | 0.1 | 1150 | CAT B
      7157000 | 1309 | 1807 | 0.4 | 420 | CAT C
      5934000 | 1543 | 1678 | 3 | 680 | CAT A
      6354000 | 2019 | 1543 | 1 | 1100 | CAT B
      Regression Statistics: Accuracy 78%; Root Mean Square Error 179.23; Mean Absolute Error 92.20. The model is a good fit as Accuracy > 70%.
      Root Mean Square Error: Square root of the average of squared differences between prediction and actual observation.
      Mean Absolute Error: Average of the absolute differences between prediction and actual observation.
  • 7. Standard Input/Tuning Parameters & Sample UI. Step 1: Select the Target Variable (options shown: Build up Area, Carpet Area, House Price, Market Distance). Step 2: Select the Predictors (same options); more than one predictor can be selected. Step 3: Number of Trees = 20; by default these parameters should be set with the values mentioned. Step 4: Display the output window containing the model summary, the interpretation and the residual plot. Notes: Categorical predictors should be auto-detected and converted to dummy/binary variables before applying regression. The selection of predictors depends on business knowledge and on the correlation between the target variable and the predictors.
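The note about converting categorical predictors to dummy/binary variables could look roughly like the sketch below. How the tool in the slides actually detects and encodes categorical columns is not stated; `is_categorical` and `to_dummies` are hypothetical helpers, and treating any column with a non-numeric value as categorical is an assumption made for illustration. The city types are the first four rows of the slide 6 sample table.

```python
def is_categorical(column):
    """Assumption: a column is categorical if any value is non-numeric."""
    return any(not isinstance(v, (int, float)) for v in column)


def to_dummies(values):
    """Return one 0/1 indicator per category for every value."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


city_type = ["CAT A", "CAT C", "CAT A", "CAT B"]
rainfall = [870, 1160, 340, 1250]

if is_categorical(city_type):
    encoded = to_dummies(city_type)
# encoded[0] is {"is_CAT A": 1, "is_CAT B": 0, "is_CAT C": 0}
```

Each row ends up with exactly one indicator set to 1, so the tree-building step only ever sees numeric inputs.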
  • 8. Sample Output - Model Summary. Sample values: Root Mean Square Error 179.23; Mean Absolute Error 92.20. Root Mean Square Error (RMSE): Square root of the average of squared differences between prediction and actual observation; it is the standard deviation of the residual error. Mean Absolute Error (MAE): Average of the absolute differences between prediction and actual observation. Both are used to identify the variation of errors from predicted to actual values; lower values (near zero) of RMSE and MAE represent a better fit of the regression model. Accuracy: Reveals the appropriateness of the fit of the model, with a value between 1 and 100; the closer the value is to 100, the better the model.
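As a small worked example of the two error measures defined above, the sketch below computes RMSE and MAE directly from their definitions. The `actual` and `predicted` lists are made-up numbers chosen for easy arithmetic, not values from the slides.

```python
import math


def rmse(actual, predicted):
    """Square root of the average squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Average of the absolute differences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical values: the errors are 1, 0, -1 and -3.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 12.0]

error_rmse = rmse(actual, predicted)  # sqrt((1 + 0 + 1 + 9) / 4) = sqrt(2.75)
error_mae = mae(actual, predicted)    # (1 + 0 + 1 + 3) / 4 = 1.25
```

RMSE is never smaller than MAE, and the gap widens when a few errors are much larger than the rest, as with the single error of 3 here.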
  • 9. Sample Output - Interpretation: The Feature Importance chart reveals the impact of each Influencer (Predictor) on the Target Variable.
  • 10. Sample Output - Plot: The Residual Plot is used to check the assumption of equal error variances and to detect outliers. (See the Interpretation sample for more details.)
  • 11. Interpretation of Model Statistics. Accuracy: Accuracy >70% indicates that the model is a good fit for the data and that the predicted values are reasonably accurate; Accuracy <70% indicates that the model is not a good fit for the data and that the predicted values are likely to have significant errors. Root Mean Square Error (RMSE): Square root of the average of squared differences between the prediction and the actual observation (standard deviation of the residual error); used to identify the variation of errors from predicted to actual values; lower values (near zero) indicate a better fit of the regression model. Mean Absolute Error (MAE): Average of absolute differences between prediction and actual observation; used to identify the variation of errors from predicted to actual values; lower values (near zero) indicate a better fit of the regression model. Feature Importance: Values are used to check the impact of each Influencer (Predictor) on the Target Variable; Random Forest is a useful technique for determining Feature Importance; Feature Importance helps in selecting the predictors that affect the target, which is helpful when training the model.
  • 12. Interpretation of Plots - Residual vs. Fit Plot: A scatter plot of standardized residuals on the Y axis against predicted (fitted) values on the X axis. It is used to detect unequal residual variances and outliers in the data. Note: The red data point in figure 1 is an outlier and should be removed from the data before interpreting the model.
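The standardized residuals plotted on the Y axis can be computed as in the sketch below, which also flags candidate outliers. The slides do not say how residuals are standardized or which cut-off marks an outlier; dividing by the population standard deviation and using a threshold of 2 are assumptions, and the data are invented for illustration.

```python
from statistics import mean, pstdev


def standardized_residuals(actual, predicted):
    """Residuals centred on their mean and scaled by their standard deviation."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu, sigma = mean(residuals), pstdev(residuals)
    if sigma == 0:  # all residuals identical: nothing stands out
        return [0.0 for _ in residuals]
    return [(r - mu) / sigma for r in residuals]


def flag_outliers(z_scores, threshold=2.0):
    """Indices whose standardized residual exceeds the assumed threshold."""
    return [i for i, z in enumerate(z_scores) if abs(z) > threshold]


# Hypothetical data: the last observation is far from its prediction.
actual = [10, 12, 11, 13, 12, 11, 10, 30]
predicted = [10.5, 11.5, 11, 12.5, 12, 11.5, 10.5, 12]

z_scores = standardized_residuals(actual, predicted)
outliers = flag_outliers(z_scores)  # flags index 7 for this data
```

With only a handful of points a standardized residual cannot grow very large, which is why a cut-off of 2 rather than 3 is used in this small example.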
  • 13. Limitations: Random Forest Regression is limited to predicting numeric output, so the dependent variable must be numeric. The sample size should be at least 20 cases per independent variable. Residuals should be time independent, as illustrated in the image: a time-independent error is fairly constant over time and stays within a certain range.
  • 14. Limitations (continued): Target and independent variables should be normally distributed. A normal distribution is an arrangement of a dataset in which most values are midrange and the rest taper off symmetrically toward either extreme; it looks like a bell curve, as shown in figure 1. Outliers in the data (in the target as well as the independent variables) can affect the analysis and must be removed. Outliers are observations lying outside the overall pattern of the distribution, as shown in figure 2. These extreme values can instead be replaced with the 1st or 99th percentile values to improve model accuracy.
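Replacing extreme values with the 1st or 99th percentile, as suggested above, might be sketched as follows. The helper names `percentile` and `cap_outliers` are hypothetical, and the linear-interpolation percentile is one common convention among several; the slides do not specify which one is used.

```python
def percentile(values, q):
    """Percentile for q in [0, 100], interpolating between neighbouring ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def cap_outliers(values, low_q=1, high_q=99):
    """Clamp every value into the [1st percentile, 99th percentile] range."""
    low, high = percentile(values, low_q), percentile(values, high_q)
    return [min(max(v, low), high) for v in values]


# Hypothetical column: 1..100 plus one extreme value.
values = list(range(1, 101)) + [10_000]
capped = cap_outliers(values)
# The extreme 10_000 is pulled down to the 99th percentile (100),
# and the smallest value 1 is raised to the 1st percentile (2).
```

Capping keeps every row in the dataset, which can matter when the sample is already close to the minimum size recommended on the previous slide.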
  • 15. Business Use Case - House Price. Business Problem: A real-estate brokerage company wants to measure the impact of locality, the number of rooms, the area (sq. yards), etc. on house price. The goal of this analysis is to understand the relationship between house features and house price, and how these variables can be used to predict it. Target: House Price. Predictors: Carpet area, rainfall, city, parking, distance from hospital, distance from shopping, etc. Business Benefit: The business can determine which predictors have a significant impact on house price. Pricing strategies and recommendations will be more accurate and result in quicker sales. If the number of rooms or the distance from shopping or schools is a significant factor, it is given more focus when searching for a house that fits a client's budget, which in turn affects profit.
  • 16. Business Use Case - Agriculture. Business Problem: An agriculture business wants to measure the impact of weather, market price, quality of crop, land used, etc. on the crop price. Input Data: Predictor/Independent Variables: Weather, Demand, Crop health. Dependent Variable: Crop Price. Business Benefit: The business can clarify which factors have a significant impact on crop price. Pricing strategies can be refined to improve accuracy and meet targeted crop pricing and revenue. If crop health and climate are significant factors, these factors would receive more focus when deciding crop price.
  • 17. Business Use Case - Compensation Policies. Business Problem: A business wishes to predict employee salary based on position, experience, degree, level, productive hours, etc. Input Data: Predictor/Independent Variables: Position, Years of experience, Productive hours. Dependent Variable: Salary of Employee. Business Benefit: The business can clarify which predictors have a significant impact on employee salary. Salary policies and strategies can more accurately reflect employee value and targeted salaries. If productive hours and experience are significant factors, these factors would be given more focus when developing salary policies.
  • 18. Want to Learn More? Contact Us: Support@Smarten.com Explore & Learn: Smarten.com