Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Random Forest Classification
What is Covered
▪ Terminologies
▪ Introduction & Example
▪ Standard input/tuning parameters & Sample UI
▪ Sample output UI
▪ Interpretation of Output
▪ Limitations
▪ Business use cases
Terminologies
▪ Target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable,
output variable, response variable or outcome variable (e.g., the Churn column in the table below).
▪ Predictor, sometimes called an independent variable, is a variable that is used to predict the
target variable (e.g., the Contract, Tenure and Internet Service columns in the table below).
The predictors are the attributes on which the target variable (i.e., Churn) depends.
Contract         Tenure   Internet Service   Churn
Month-to-month   2        DSL                Yes
Two-year         72       Fibre Optic        No
Month-to-month   29       Fibre Optic        Yes
One-year         12       DSL                No
Month-to-month   30       DSL                No
Terminologies (Continued...)
Feature Importance:
• Feature importance values are used to check the impact of each influencer (predictor) on the target
variable.
• The Random Forest Classification algorithm gives an estimate of which variables are important in the
classification.
• For instance, when predicting which customers are prone to churn, feature importance identifies which
factors most strongly determine the rate of attrition (churn).
Target Variable: Churn
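One way to make feature importance concrete is permutation importance: shuffle one predictor and measure how much accuracy drops. The sketch below is illustrative only. It uses the five rows of the terminology table, a hypothetical rule-based `toy_model` standing in for a trained forest, and hypothetical helper names. Permutation importance is one common method; the deck does not say which method the product uses.

```python
import random

# Toy rows copied from the terminology table: (contract, tenure, internet_service, churn).
ROWS = [
    ("Month-to-month", 2, "DSL", "Yes"),
    ("Two-year", 72, "Fibre Optic", "No"),
    ("Month-to-month", 29, "Fibre Optic", "Yes"),
    ("One-year", 12, "DSL", "No"),
    ("Month-to-month", 30, "DSL", "No"),
]
FEATURES = ["Contract", "Tenure", "Internet Service"]


def toy_model(contract, tenure, internet_service):
    """Hypothetical stand-in for a trained forest; it ignores internet_service."""
    return "Yes" if contract == "Month-to-month" and tenure < 30 else "No"


def accuracy(rows):
    """Share of rows whose predicted class equals the actual Churn value."""
    hits = sum(toy_model(*row[:3]) == row[3] for row in rows)
    return hits / len(rows)


def permutation_importance(rows, column, repeats=50, seed=0):
    """Average drop in accuracy after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(rows)
    total_drop = 0.0
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value in place of the original one.
        permuted = [
            row[:column] + (value,) + row[column + 1:]
            for row, value in zip(rows, shuffled)
        ]
        total_drop += baseline - accuracy(permuted)
    return total_drop / repeats


importances = {
    name: permutation_importance(ROWS, index)
    for index, name in enumerate(FEATURES)
}
for name, score in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Because the toy model never looks at Internet Service, shuffling that column cannot change any prediction, so its importance is exactly zero. The other two scores depend on the random shuffles.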
Introduction
• Objective:
– Random Forest Classification is a supervised
machine learning technique that explores the
relationship between one or more predictors (Xi)
and a categorical target variable (Y).
• Benefit:
– Random Forest Classification output helps
identify the important factors (Xi) impacting the
dependent variable (Y) and the relative strength
of each factor's influence on the
dependent variable.
• Model:
– The Random Forest Classification model constructs
many decision trees; each tree votes for a class,
and the most popular class is output as the
prediction result.
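The voting step can be sketched in a few lines. The three "trees" below are hypothetical hand-written rules, not learned trees. A real forest would learn each tree from a random sample of the rows and predictors. The customer rows reuse values from the terminology table.

```python
from collections import Counter


# Hypothetical one-rule "trees": each looks at one predictor and votes Yes or No.
def tree_contract(row):
    return "Yes" if row["Contract"] == "Month-to-month" else "No"


def tree_tenure(row):
    return "Yes" if row["Tenure"] < 24 else "No"


def tree_internet(row):
    return "Yes" if row["Internet Service"] == "Fibre Optic" else "No"


FOREST = [tree_contract, tree_tenure, tree_internet]


def predict(row, forest=FOREST):
    """Collect one vote per tree and return the most popular class."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


customers = [
    {"Contract": "Month-to-month", "Tenure": 2, "Internet Service": "DSL"},
    {"Contract": "Two-year", "Tenure": 72, "Internet Service": "Fibre Optic"},
    {"Contract": "Month-to-month", "Tenure": 29, "Internet Service": "Fibre Optic"},
]
predictions = [predict(customer) for customer in customers]
print(predictions)
```

An odd number of trees is used here so that a two-class vote can never end in a tie.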
Example: Random Forest Classification
Let's conduct the Random Forest Classification analysis on the independent variables Contract, Tenure, Internet Service, Tech Support and Online Security,
with Churn as the target variable, as shown below:
Churn   Contract         Tenure   Internet Service   Tech Support          Online Security
Yes     Month-to-month   2        DSL                No internet service   Yes
No      Two-year         72       Fibre optic        No                    No
Yes     Month-to-month   29       Fibre optic        No                    No internet service
No      One-year         12       DSL                Yes                   No
Yes     Month-to-month   30       DSL                Yes                   No
Target variable (Y): Churn
Independent variables (Xi): Contract, Tenure, Internet Service, Tech Support, Online Security
Model is an excellent fit as Accuracy > 75%
Classification Evaluation Metric
Accuracy 78.6%
Classification Error 21.4%
• Classification Accuracy:
○ A crucial criterion for assessing model
performance.
○ As a rule of thumb, a model with prediction
accuracy > 75% is considered useful.
• Classification Error = 100% - Accuracy = 21.4%
○ Indicates that 21.4% of the records were
classified incorrectly.
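The arithmetic behind these two metrics is simple enough to sketch. The function names and the record counts below are hypothetical; the counts were chosen only to reproduce the reported 78.6% accuracy.

```python
def accuracy_pct(correct, total):
    """Accuracy as a percentage of correctly classified records."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def classification_error_pct(accuracy):
    """Classification error is the remainder after accuracy, in percent."""
    return 100.0 - accuracy


def is_useful(accuracy, threshold=75.0):
    """Rule of thumb from the slide: accuracy above 75% is considered useful."""
    return accuracy > threshold


# Hypothetical counts: 786 correct predictions out of 1000 records.
acc = accuracy_pct(correct=786, total=1000)
err = classification_error_pct(acc)
print(f"accuracy={acc:.1f}% error={err:.1f}% useful={is_useful(acc)}")
```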
Standard Input/Tuning Parameters & Sample UI
Step 1: Select the target variable
Available variables: Contract, Churn, Online Security, Tenure, Tech Support, Internet Service
Step 2: Select the predictor variable(s)
Available variables: Contract, Churn, Online Security, Tenure, Tech Support, Internet Service
More than one predictor can be selected.
Step 3: Set the tuning parameters
Number of Trees = 20 (range for no. of trees: 1-128)
Depth of Trees = 20 (range for max depth: 1-30)
By default, these parameters should be set to the values mentioned.
Step 4: Display the output window containing the following:
● Scatter Plot
● Dimension Contribution
● Dimension Counts By Percentage
● Average Measures by Target Classes
Note:
▪ The selection of predictors depends on business knowledge and on the correlation between the target variable and the predictors.
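The tuning parameters from Step 3 can be captured as a small validated configuration. This is a minimal sketch: the class and field names are hypothetical, and only the default values (20 and 20) and the ranges (1-128 trees, depth 1-30) come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForestParams:
    """Hypothetical container for the two tuning parameters from Step 3."""

    num_trees: int = 20  # default from the slide
    max_depth: int = 20  # default from the slide

    def __post_init__(self):
        # Reject values outside the ranges stated on the slide.
        if not 1 <= self.num_trees <= 128:
            raise ValueError("num_trees must be between 1 and 128")
        if not 1 <= self.max_depth <= 30:
            raise ValueError("max_depth must be between 1 and 30")


defaults = ForestParams()
print(defaults)

try:
    ForestParams(num_trees=500)
except ValueError as exc:
    print("rejected:", exc)
```

Validating at construction time means an out-of-range value is caught before any model is trained.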
The Influencer's Importance chart shows the impact of each predictor on the target variable.
Chart: Influencer's Importance (Target Variable: Churn)
Sample Output: 1. Interpretation
● Accuracy: It shows the goodness of fit of the model. It lies
between 0 and 100%, and the closer the value is to 100%, the better the model.
● Precision: Proportion of records predicted as a class that actually belong to that class. Generally, higher precision (>70%) indicates
that predictions of that class can be trusted.
● Recall/Sensitivity/Hit Rate: Proportion of records actually in a class that were predicted correctly. Generally, higher recall
(>70%) indicates that the model finds most of the actual members of that class.
Class Wise Precision and Recall
Class      Precision   Recall
No         79.91%      94.23%
Yes        70.78%      37.1%
Accuracy   78.6%
Actual versus Predicted Class
             Predicted No   Predicted Yes
Actual No    3503           195
Actual Yes   880            507
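Precision, recall and accuracy can all be derived from the Actual versus Predicted Class counts. The sketch below uses the four counts as printed; the function names are hypothetical. The results come out close to, though not identical with, the percentages reported in the Class Wise Precision and Recall table.

```python
# Counts copied from the "Actual versus Predicted Class" table,
# stored as confusion[actual][predicted].
confusion = {
    "No": {"No": 3503, "Yes": 195},
    "Yes": {"No": 880, "Yes": 507},
}
classes = list(confusion)


def precision(cls):
    """Of all records predicted as cls, the share that really are cls."""
    predicted_as_cls = sum(confusion[actual][cls] for actual in classes)
    return confusion[cls][cls] / predicted_as_cls


def recall(cls):
    """Of all records that really are cls, the share predicted as cls."""
    actually_cls = sum(confusion[cls].values())
    return confusion[cls][cls] / actually_cls


def accuracy():
    """Correct predictions (the diagonal) over all records."""
    correct = sum(confusion[c][c] for c in classes)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


for cls in classes:
    print(f"{cls}: precision={precision(cls):.2%} recall={recall(cls):.2%}")
print(f"accuracy={accuracy():.2%}")
```

The low recall for the Yes class shows why accuracy alone can mislead: most actual churners are predicted as No even though overall accuracy is above 75%.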
Sample Output: 2. Model Summary
Sample Output: 3. Predicted Class & Probability
Churn   Contract         Online Security       Tech Support          Tenure   Internet Service   Monthly Charges   Probability   Predicted Churn
No      Month-to-month   No                    No                    3        Fibre optic        90.4              0.72          Yes
No      Two year         No internet service   No internet service   8        No                 19.5              0.91          No
No      One year         No                    No                    60       Fibre optic        100.5             0.77          No
No      Two year         No internet service   No internet service   66       No                 20.55             0.93          No
No      One year         Yes                   Yes                   27       DSL                81.7              0.92          No
No      Month-to-month   No                    No                    12       Fibre optic        79.95             0.69          Yes
The data output will contain the predicted class column along with the probability of the prediction.
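In the table, the Probability column appears to refer to the predicted class (for example, 0.72 alongside Yes and 0.91 alongside No). A minimal sketch of how such a pair could be produced is shown below. It assumes the probability is the share of trees voting for the winning class; an actual product may instead average per-tree class probabilities. The function name and the vote counts are hypothetical.

```python
from collections import Counter


def predict_with_probability(tree_votes):
    """Return (predicted_class, probability) from a list of per-tree votes.

    Assumption: probability is the share of trees voting for the winning class.
    """
    counts = Counter(tree_votes)
    predicted_class, winning_votes = counts.most_common(1)[0]
    return predicted_class, winning_votes / len(tree_votes)


# Hypothetical votes from a 20-tree forest for one customer.
votes = ["Yes"] * 14 + ["No"] * 6
label, probability = predict_with_probability(votes)
print(label, probability)
```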
Interpretation of Important Model Summary Statistics
Accuracy:
• Accuracy > 75% indicates that the model fits the provided data well and the predicted values are reasonably accurate.
• Accuracy < 75% indicates that the model does not fit the provided data well and the predicted values are likely to be inaccurate and carry a high chance of error.
Precision:
• Proportion of records predicted as a class that actually belong to that class. Generally, higher precision (>70%) indicates that predictions of that class can be trusted.
Recall:
• Proportion of records actually in a class that were predicted correctly. Generally, higher recall (>70%) indicates that the model finds most of the actual members of that class.
Feature Importance:
• Feature importance values are used to check the impact of each influencer (predictor) on the target variable.
Interpretation of Plots: Scatter Plot
● This plot shows the quality of the model's classification; the less the classes overlap in the plot,
the better the classification.
● We can also visually analyze how a particular class is assigned.
● Scatter plots give an overview of the input data, allowing a user to see general trends for the
attributes.
● The graph is plotted against measures within the data.
Scatter plot: Monthly Charges and Tenure, with points marked by Churn class (No, Yes)
Interpretation of Plots: Dimension Contribution
● This plot displays how dimension values are distributed for each class of the target variable.
● For instance, the plot above shows how the values of Contract (Month-to-month, One year,
Two year) are distributed within each class of the response (Yes, No), i.e., the count of each target
class (Yes, No) for each Contract value.
Interpretation of plots: Dimension Counts by Percentage
● This plot is used to visually analyze how dimension counts are distributed across target variable classes.
● For instance, the plot above can be used to check whether a particular target class (churn status)
accounts for a relatively larger share of the counts for a given dimension value.
Interpretation of Plots: Average Measures by Target Class
● This plot is used to visually analyze how average measures are distributed across target variable classes.
● For instance, the plot above shows how average Tenure is distributed within each Churn status.
Chart: Avg(Tenure) and Avg(Monthly Charges) by Churn
Limitations
● Minimum sample size should be at least 20 cases per independent variable.
● Random Forests can be computationally intensive on large datasets, i.e.,
training time and memory use grow with the size of the data.
● The main limitation of random forest is that a large number of trees can make
the algorithm too slow and ineffective for real-time predictions.
● The model behaves largely as a black box, so the user has very little control over how it reaches its predictions.
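● Random forest itself does not assume normally distributed variables, but checking the distribution of the target/independent variables helps reveal skew and data-quality problems.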
Limitations (Continued…)
● A normal distribution is an arrangement of
a data set in which most values cluster in
the middle of the range and the rest taper
off symmetrically towards the extremes. It
looks like a bell curve, as shown in figure 1.
● Outliers in data (target as well as
independent variables) can affect the
analysis, hence outliers should be reviewed
and, where appropriate, removed.
● Outliers are the observations lying outside
the overall pattern of the distribution, as shown in
figure 2.
Figure 1: Normal distribution (bell curve)
Figure 2: Outliers lying outside the overall pattern of the distribution
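Outliers can be flagged before modeling with a simple rule. The deck does not name a detection method, so the sketch below assumes the common 1.5 x IQR rule. The function name is hypothetical, and the value 500 is a made-up extreme added to the Tenure values from the sample table purely for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Tenure values from the sample table, plus one hypothetical extreme value (500).
tenure = [2, 72, 29, 12, 30, 500]
print(iqr_outliers(tenure))
```

Flagged values are candidates for review, not automatic deletion, since some may be genuine observations.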
Business Use Case 1
• Business Problem: Predict loan default
• Based on historical data related to credit card payments, loan payments, existing loan status and job
status, we want to classify the customers into defaulters and non-defaulters.
• Input Data:
• Predictor/independent variables:
• Home ownership status
• Existing loan status
• Occupation
• Account Balance
• Target/dependent variable:
• Default Status
• Business Benefit:
• The predictive model will help us identify whether a customer is likely to fail to repay the loan, depending on
certain factors. This makes it easier to identify risky customers and helps the bank avert the
risk of delinquencies.
Business Use Case 2
• Business Problem: Predict quality of Red Wine
• The data is the result of an analysis to determine the quality of red wine based on its chemical
constituents.
• Input Data:
• Predictor/independent variables:
• Citric Acid
• Density
• Residual Sugar
• Chlorides
• Target/dependent variable:
• Quality_Category
• Business Benefit:
• Using random forest classification, we can determine the quality of red wine (high, low) based upon
its influential chemical attributes.
Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
September 2021

More Related Content

What's hot

Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)
cairo university
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
Yogendra Tamang
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
YashwantGahlot1
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Simplilearn
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Edureka!
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
Kazuki Yoshida
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time series
Luigi Piva CQF
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
Krish_ver2
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
Suraj Parmar
 
Discriminant analysis group no. 4
Discriminant analysis  group no. 4Discriminant analysis  group no. 4
Discriminant analysis group no. 4Advait Bhobe
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
surbhidutta4
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic net
KyusonLim
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 
Linear regression
Linear regressionLinear regression
Linear regression
MartinHogg9
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
Venkata Reddy Konasani
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
Venkata Reddy Konasani
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
Knoldus Inc.
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
Roshan86572
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
zekeLabs Technologies
 

What's hot (20)

Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time series
 
5.2 mining time series data
5.2 mining time series data5.2 mining time series data
5.2 mining time series data
 
Linear regression with gradient descent
Linear regression with gradient descentLinear regression with gradient descent
Linear regression with gradient descent
 
Discriminant analysis group no. 4
Discriminant analysis  group no. 4Discriminant analysis  group no. 4
Discriminant analysis group no. 4
 
support vector machine 1.pptx
support vector machine 1.pptxsupport vector machine 1.pptx
support vector machine 1.pptx
 
Regularization and variable selection via elastic net
Regularization and variable selection via elastic netRegularization and variable selection via elastic net
Regularization and variable selection via elastic net
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Model selection and cross validation techniques
Model selection and cross validation techniquesModel selection and cross validation techniques
Model selection and cross validation techniques
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 

Similar to What Is Random Forest Classification And How Can It Help Your Business?

What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
Smarten Augmented Analytics
 
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
Smarten Augmented Analytics
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Smarten Augmented Analytics
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Smarten Augmented Analytics
 
Session 1 and 2.pptx
Session 1 and 2.pptxSession 1 and 2.pptx
Session 1 and 2.pptx
AkshitMGoel
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco Churners
MohitMhapuskar
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedinAsoka Korale
 
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docxSix Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
whitneyleman54422
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
CIToolkit
 
A high level overview of all that is Analytics
A high level overview of all that is AnalyticsA high level overview of all that is Analytics
A high level overview of all that is Analytics
Ramkumar Ravichandran
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
Akanksha Gohil
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET Journal
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
Smarten Augmented Analytics
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
Sanghamitra Deb
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
Ritu Sarkar
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
Smarten Augmented Analytics
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Md. Main Uddin Rony
 
5(re dfd-erd-data dictionay)
5(re dfd-erd-data dictionay)5(re dfd-erd-data dictionay)
5(re dfd-erd-data dictionay)randhirlpu
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
IRJET Journal
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 

Similar to What Is Random Forest Classification And How Can It Help Your Business? (20)

What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
What Is Multilayer Perceptron Classifier And How Is It Used For Enterprise An...
 
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
What is Hierarchical Clustering and How Can an Organization Use it to Analyze...
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
 
Session 1 and 2.pptx
Session 1 and 2.pptxSession 1 and 2.pptx
Session 1 and 2.pptx
 
Data Mining to Classify Telco Churners
Data Mining to Classify Telco ChurnersData Mining to Classify Telco Churners
Data Mining to Classify Telco Churners
 
customer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedincustomer_profiling_based_on_fuzzy_principals_linkedin
customer_profiling_based_on_fuzzy_principals_linkedin
 
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docxSix Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
Six Sigma ReportHammettSix Sigma DMAIC Project Report Templa.docx
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
A high level overview of all that is Analytics
A high level overview of all that is AnalyticsA high level overview of all that is Analytics
A high level overview of all that is Analytics
 
Data Analytics Using R - Report
Data Analytics Using R - ReportData Analytics Using R - Report
Data Analytics Using R - Report
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
5(re dfd-erd-data dictionay)
5(re dfd-erd-data dictionay)5(re dfd-erd-data dictionay)
5(re dfd-erd-data dictionay)
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 

More from Smarten Augmented Analytics

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – Smarten
Smarten Augmented Analytics
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
Smarten Augmented Analytics
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – Smarten
Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
Smarten Augmented Analytics
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – Smarten
Smarten Augmented Analytics
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - Smarten
Smarten Augmented Analytics
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Smarten Augmented Analytics
 
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenPredictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Smarten Augmented Analytics
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - Smarten
Smarten Augmented Analytics
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - Smarten
Smarten Augmented Analytics
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - Smarten
Smarten Augmented Analytics
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
Smarten Augmented Analytics
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
Smarten Augmented Analytics
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
Smarten Augmented Analytics
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
Smarten Augmented Analytics
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
Smarten Augmented Analytics
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
Smarten Augmented Analytics
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
Smarten Augmented Analytics
 
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
What is ARIMAX Forecasting and How is it Used for Enterprise Analysis?
Smarten Augmented Analytics
 

What Is Random Forest Classification And How Can It Help Your Business?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series For Citizen Data Scientists. Journey Towards Augmented Analytics.
  • 3. What is Covered: Terminologies; Introduction & Example; Standard input/tuning parameters & Sample UI; Sample output UI; Interpretation of Output; Limitations; Business use cases
  • 4. Terminologies
    ▪ Target variable, usually denoted by Y, is the variable being predicted. It is also called the dependent variable, output variable, response variable or outcome variable (e.g., the Churn column in the table below, highlighted in a red box on the slide).
    ▪ Predictor, sometimes called an independent variable, is a variable used to predict the target variable (e.g., the Contract, Tenure and Internet Service columns in the table below, highlighted in a green box on the slide).
    Contract | Tenure | Internet Service | Churn
    Month-to-month | 2 | DSL | Yes
    Two-year | 72 | Fibre Optic | No
    Month-to-month | 29 | Fibre Optic | Yes
    One-year | 12 | DSL | No
    Month-to-month | 30 | DSL | No
    The predictors in the green box are the attributes on which the target variable in the red box (i.e., Churn) depends.
  • 5. Terminologies (Continued...)
    Feature Importance:
    • Feature importance values are used to check the impact of each influencer (predictor) on the target variable.
    • The Random Forest Classification algorithm gives an estimate of which variables are important in the classification.
    • For instance, when predicting which customers are prone to churn, it identifies which variables are important, i.e., which factors determine the rate of attrition (churn).
    Target Variable: Churn
  • 6. Introduction
    • Objective:
      – Random Forest Classification is a statistical technique for exploring the relationship between one or more predictor variables (Xi) and a categorical target variable (Y).
    • Benefit:
      – Random Forest Classification output helps identify the important factors (Xi) impacting the dependent variable (Y) and the nature of the relationship between each of these factors and the dependent variable.
    • Model:
      – A Random Forest Classification model constructs many trees; each tree votes, and the most popular class is output as the prediction result.
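The voting step described on this slide can be illustrated with a minimal sketch. It assumes each tree has already produced a class label; the function name `majority_vote` and the sample votes are hypothetical and not part of the original deck.

```python
from collections import Counter

def majority_vote(tree_votes):
    """Return the class label chosen by the largest number of trees."""
    counts = Counter(tree_votes)
    # most_common(1) returns a one-element list: [(label, count)]
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical votes from five trees for a single customer.
votes = ["Yes", "No", "Yes", "Yes", "No"]
print(majority_vote(votes))
```

In this sketch a tie is resolved in favour of the label that appears first in the list of votes; a real implementation may break ties differently.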
  • 7. Example: Random Forest Classification
    Let's conduct the Random Forest Classification analysis on the independent variables (Xi) Contract, Tenure, Internet Service, Tech Support and Online Security, and the target variable (Y) Churn, as shown below:
    Churn | Contract | Tenure | Internet Service | Tech Support | Online Security
    Yes | Month-to-month | 2 | DSL | No internet service | Yes
    No | Two-year | 72 | Fibre optic | No | No
    Yes | Month-to-month | 29 | Fibre optic | No | No Internet Service
    No | One-year | 12 | DSL | Yes | No
    Yes | Month-to-month | 30 | DSL | Yes | No
    Classification Evaluation Metric:
    Accuracy | 78.6%
    Classification Error | 21.4%
    The model is an excellent fit as Accuracy > 75%.
    • Classification Accuracy:
      ○ A crucial criterion for assessing model performance.
      ○ A model with prediction accuracy > 75% is useful.
    • Classification Error = 100 - Accuracy = 21.4%
      ○ Indicates that there is a 21.4% chance of error in classification.
  • 8. Standard Input/Tuning Parameters & Sample UI
    Step 1: Select the target variable (options shown: Contract, Churn, Online Security, Tenure, Tech Support, Internet Service).
    Step 2: Select the predictor variable(s) (options shown: Contract, Churn, Online Security, Tenure, Tech Support, Internet Service). More than one predictor can be selected.
    Step 3: Set the tuning parameters. By default, these parameters should be set to the values mentioned:
      Number of Trees = 20 (range for no. of trees: 1-128)
      Depth of Trees = 20 (range for max depth: 1-30)
    Step 4: Display the output window containing the following:
      ● Scatter Plot
      ● Dimension Contribution
      ● Dimension Counts By Percentage
      ● Average Measures by Target Classes
    Note:
    ▪ The selection of predictors depends on business knowledge and on the correlation between the target variable and the predictors.
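The defaults and ranges on this slide could be captured in a small configuration object. The sketch below is only an illustration: the class name `ForestParams` and its field names are hypothetical, and the actual Smarten UI may validate its inputs differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ForestParams:
    """Tuning parameters with the defaults and ranges stated on the slide."""
    num_trees: int = 20  # slide default; allowed range 1-128
    max_depth: int = 20  # slide default; allowed range 1-30

    def __post_init__(self):
        # Reject values outside the ranges given on the slide.
        if not 1 <= self.num_trees <= 128:
            raise ValueError("num_trees must be between 1 and 128")
        if not 1 <= self.max_depth <= 30:
            raise ValueError("max_depth must be between 1 and 30")

params = ForestParams()
print(params)
```

For readers working in scikit-learn rather than the UI, the comparable settings would be `n_estimators` and `max_depth` on `RandomForestClassifier`.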
  • 9. Sample Output: 1. Interpretation
    The influencer's importance chart shows the impact of each predictor on the target variable.
    [Chart: Influencer's Importance; Target Variable: Churn]
  • 10. Sample Output: 2. Model Summary
    ● Accuracy: It shows the goodness of fit of the model. It lies between 0 and 100; the closer the value is to 100, the better the model.
    ● Precision: Proportion of the predictions for a class that were actually correct. Generally, higher precision (>70%) indicates that confidence in the predicted class is high.
    ● Recall/Sensitivity/Hit Rate: Proportion of the actual members of a class that were predicted correctly. Generally, higher recall (>70%) indicates that most actual cases of the class are being identified.
    Class Wise Precision and Recall:
    Class | Precision | Recall
    No | 79.91% | 94.23%
    Yes | 70.78% | 37.1%
    Accuracy | 78.6%
    Actual versus Predicted Class:
    Actual \ Predicted | No | Yes
    No | 3503 | 195
    Yes | 880 | 507
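As a rough sketch of how these metrics relate to the actual-versus-predicted counts, the code below recomputes accuracy, precision and recall from the four cell counts on this slide. The function names are hypothetical, and values recomputed this way may differ slightly from the percentages printed on the slide.

```python
# Actual-versus-predicted counts from the slide:
# outer key = actual class, inner key = predicted class.
confusion = {
    "No": {"No": 3503, "Yes": 195},
    "Yes": {"No": 880, "Yes": 507},
}

def precision(cm, cls):
    """Share of rows predicted as `cls` that really are `cls`, in percent."""
    predicted_as_cls = sum(cm[actual][cls] for actual in cm)
    return 100.0 * cm[cls][cls] / predicted_as_cls

def recall(cm, cls):
    """Share of rows that really are `cls` and were predicted as `cls`, in percent."""
    actual_cls = sum(cm[cls].values())
    return 100.0 * cm[cls][cls] / actual_cls

def accuracy(cm):
    """Share of all rows whose predicted class equals the actual class, in percent."""
    correct = sum(cm[c][c] for c in cm)
    total = sum(sum(row.values()) for row in cm.values())
    return 100.0 * correct / total

for cls in confusion:
    print(f"{cls}: precision={precision(confusion, cls):.2f}% "
          f"recall={recall(confusion, cls):.2f}%")
acc = accuracy(confusion)
print(f"accuracy={acc:.2f}% classification error={100 - acc:.2f}%")
```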
  • 11. Sample Output: 3. Predicted Class & Probability
    Churn | Contract | Online Security | Tech Support | Tenure | Internet Service | Monthly Charges | Probability | Predicted Churn
    No | Month-to-month | No | No | 3 | Fibre optic | 90.4 | 0.72 | Yes
    No | Two year | No internet service | No internet service | 8 | No | 19.5 | 0.91 | No
    No | One year | No | No | 60 | Fibre optic | 100.5 | 0.77 | No
    No | Two year | No internet service | No internet service | 66 | No | 20.55 | 0.93 | No
    No | One year | Yes | Yes | 27 | DSL | 81.7 | 0.92 | No
    No | Month-to-month | No | No | 12 | Fibre optic | 79.95 | 0.69 | Yes
    The data output will contain a predicted class column along with the probability of the prediction.
  • 12. Interpretation of Important Model Summary Statistics
    Accuracy:
    • Accuracy > 75% indicates that the model fits the provided data well and that its predictions are reasonably accurate.
    • Accuracy < 75% indicates that the model does not fit the provided data well and that its predictions are likely to be inaccurate, with a high chance of error.
    Precision:
    • Proportion of the predictions for a class that were actually correct. Generally, higher precision (>70%) indicates that confidence in the predicted class is high.
    Recall:
    • Proportion of the actual members of a class that were predicted correctly. Generally, higher recall (>70%) indicates that most actual cases of the class are being identified.
    Feature Importance:
    • Feature importance values are used to check the impact of each influencer (predictor) on the target variable.
  • 13. Interpretation of Plots: Scatter Plot
    ● This plot shows the quality of the model's classification: the less the classes overlap in the plot, the better the classification.
    ● We can also visually analyze how a particular class is assigned.
    ● Scatter plots give an overview of the input data, allowing a user to see general trends for the attributes.
    ● The graph is plotted against measures within the data.
    [Scatter plot: axes Monthly Charges and Tenure; legend: No, Yes]
  • 14. Interpretation of Plots: Dimension Contribution
    ● This plot is used to display how dimension values are distributed for each class in the target variable.
    ● For instance, the plot above shows how the various values of Contract period (Month-to-month, One year, Two year) are distributed within each class of the response (Yes, No).
    The graph shows counts of each target class (Yes, No) for each Contract value (Month-to-month, One year, Two year).
  • 15. Interpretation of Plots: Dimension Counts by Percentage
    ● This plot is used to visually analyze how dimension counts are distributed across target variable classes.
    ● For instance, the plot above shows, for each churn status, whether a particular target class has a relatively higher share of the counts.
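The quantities behind the two dimension plots can be approximated with a simple cross-tabulation. The sketch below uses the Contract and Churn values from the five example rows on slide 7. How the percentages are normalised (here, within each Churn class) is an assumption, since the original charts are not available in this transcript.

```python
from collections import Counter

# (Contract, Churn) pairs from the five example rows on slide 7.
rows = [
    ("Month-to-month", "Yes"),
    ("Two-year", "No"),
    ("Month-to-month", "Yes"),
    ("One-year", "No"),
    ("Month-to-month", "Yes"),
]

# Dimension contribution: count of each target class per Contract value.
counts = Counter(rows)

# Dimension counts by percentage: share of each Contract value
# within a Churn class (assumed normalisation).
class_totals = Counter(churn for _, churn in rows)
percentages = {
    (contract, churn): 100.0 * n / class_totals[churn]
    for (contract, churn), n in counts.items()
}

for (contract, churn), n in sorted(counts.items()):
    share = percentages[(contract, churn)]
    print(f"Contract={contract}, Churn={churn}: count={n}, share={share:.1f}%")
```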
  • 16. Interpretation of Plots: Average Measures by Target Class
    ● This plot is used to visually analyze how average measures are distributed across target variable classes.
    ● For instance, the plot above shows how average Tenure is distributed within each Churn status.
    [Plot: Avg(Tenure) and Avg(Monthly Charges) by Churn]
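An average-by-class summary like this one amounts to grouping a measure by the target class and taking the mean. The sketch below does this for Tenure, using the five example rows from slide 7. The variable names are hypothetical, and the sample is far too small to be meaningful.

```python
from collections import defaultdict
from statistics import mean

# (Churn, Tenure) pairs from the five example rows on slide 7.
rows = [("Yes", 2), ("No", 72), ("Yes", 29), ("No", 12), ("Yes", 30)]

# Collect the Tenure values observed for each Churn class.
tenure_by_class = defaultdict(list)
for churn, tenure in rows:
    tenure_by_class[churn].append(tenure)

# Average Tenure within each Churn class.
avg_tenure = {churn: mean(values) for churn, values in tenure_by_class.items()}

for churn, avg in avg_tenure.items():
    print(f"Churn={churn}: Avg(Tenure)={avg:.2f}")
```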
  • 17. Limitations
    ● The sample should contain at least 20 cases per independent variable.
    ● Random Forests can be computationally intensive on large datasets, so training can be slow.
    ● The main limitation of random forest is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions.
    ● The model gives the user very little control over how it works internally.
    ● Target/independent variables are ideally close to normally distributed, although tree-based models do not strictly require this.
  • 18. Limitations (Continued…)
    ● A normal distribution is an arrangement of a data set in which most values cluster in the middle of the range and the rest taper off symmetrically towards the extremes. It looks like a bell curve, as shown in Figure 1.
    ● Outliers in the data (target as well as independent variables) can affect the analysis, so they need to be removed.
    ● Outliers are observations lying outside the overall pattern of the distribution, as shown in Figure 2.
    [Figure 1] [Figure 2]
  • 19. Business Use Case 1
    • Business Problem: Predict loan default
      • Based on historical data related to credit card payments, loan payments, existing loan status and job status, we want to classify customers into defaulters and non-defaulters.
    • Input Data:
      • Predictor/independent variables:
        • Home ownership status
        • Existing loan status
        • Occupation
        • Account Balance
      • Target/dependent variable:
        • Default Status
    • Business Benefit:
      • The predictive model will help us identify whether a customer is likely to fail to repay the loan depending on certain factors, which makes risky customers easier to identify and helps the bank avert the risk of delinquencies.
  • 20. Business Use Case 2
    • Business Problem: Predict quality of Red Wine
      • The data is the result of an analysis to determine the quality of red wine based on its chemical constituents.
    • Input Data:
      • Predictor/independent variables:
        • Citric Acid
        • Density
        • Residual Sugar
        • Chlorides
      • Target/dependent variable:
        • Quality_Category
    • Business Benefit:
      • Using random forest classification, we can determine the quality of red wine (high, low) based on its influential chemical attributes.
  • 21. Want to Learn More? Get in touch with us at support@Smarten.com, and do check out the Learning section on Smarten.com. September 2021