Using Insight-informed Data to
Determine Factors Driving the
Lead Conversion Process
An Application of
Logistic Regression Modelling
Author: Anthony Mok
Date: 16 Nov 2023
Email: xxiaohao@yahoo.com
Agenda
1. Unlock Insights, Efficient Workflows
2. Machine Learning
3. Logistic Regression Model
4. Primary Goals
5. Description of Dataset
6. Modelling Strategies
7. Findings, Conclusions, and Recommendations
Unlock Insights, Efficient Workflows
Traditional Organisations
and Workflows
• Workflow defines the specific steps
taken to complete a task or process
• Workflow is supported by systems,
procedures and roles to ensure tasks
are completed efficiently and
consistently
Artificial Intelligence (AI)
Enables:
• Analysis of large amounts of data for
patterns, trends, and insights
• Automation of repetitive and
time-consuming tasks
• Predictions of future events or
outcomes
AI Augmenting Workflow
• AI is unlikely to completely eliminate
the need for workflows in organisations,
since workflows provide structure
and ensure that tasks are completed
correctly and consistently
• AI can help optimise workflows by
identifying bottlenecks and suggesting
improvements
Machine Learning
• Machine learning, a branch of
AI, uses algorithms to learn
from data, improving on tasks
without explicit instructions
• Used to improve automation, accuracy,
and personalization in areas such as
image recognition, language
processing, and more
Logistic Regression Modelling
• Logistic regression is a statistical model that
predicts the probability of an outcome being
one of two classes (e.g., win/lose, yes/no)
• It is popular for its ease of interpretation and
efficiency, making it well-suited for tasks like
spam detection or sentiment analysis where
factors influencing the outcome need to be
understood
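As a minimal sketch of how such a model turns inputs into a probability, the snippet below applies the logistic (sigmoid) function to a weighted sum of features. The function names, coefficients and feature values are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Return P(outcome = 1) for one observation under a logistic regression model."""
    # Linear score (log-odds): intercept plus the weighted sum of the features.
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps any real score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_class(probability, threshold=0.5):
    """Assign class 1 when the probability reaches the threshold, otherwise class 0."""
    return 1 if probability >= threshold else 0


# Hypothetical coefficients and feature values, for illustration only.
p = predict_probability(intercept=-1.0, coefficients=[0.8, 1.5], features=[1.0, 0.0])
print(round(p, 3), predict_class(p))
```

Because each coefficient acts on the log-odds directly, its sign and size show how a factor pushes the outcome, which is the ease of interpretation mentioned above.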
Primary Goals
• To determine what factors are driving
the lead conversion process
• To identify which leads are more
likely to convert to paid customers
Description of Dataset
Data Description
Dataset consists of 4613 rows and 15
columns
Data Dictionary
A sample data dictionary* is shown on the slide
* More details are found in the project report, which is
not released at the request of the Social Enterprise
Modelling Strategies
Plan
1. Perform Dummy
Encoding
2. List Variables for
Modeling
3. Identify metric of
interest to judge
model's performance
Build
4. Build Logistic
Regression Model
(Preliminary Model)
5. Observe the metrics
of the model
Improve
6. Identify the
significant variables
7. Rebuild model
8. Observe the metrics
of the models
Decide
9. Compare the results
of Logistic Regression
model (Base model)
and Decision Tree
Model
10. Conclude on best
model for this project
Recommend
11. Determine factors
driving the lead
conversion process
12. Recommend what
may help to
identify which leads
are more likely to
convert to paying
customers
1. Perform Dummy Encoding
1. The ‘CSV Read’ and ‘Data
Explorer’ nodes were dragged and
dropped onto the KNIME Platform
to ingest and explore the variables
and data in the dataset: no
missing values were found
2. The ‘One_to_Many’ node was used to
convert non-numeric variables into
numeric form. Except for the ‘ID’ and
‘status’ variables, the remaining
non-numeric variables, totalling 9, were moved
to the ‘Include’ box of the ‘Configure’
function of the ‘One_to_Many’ node
3. The final operation was to
filter the columns, using the
‘Column Filter’ node to remove
at least one of the dummy-encoded
responses from each
of the non-numeric variables
2. Identify Variables for Modeling
4. This is the list of 9 non-numeric
variables that can be used for modeling
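The dummy-encoding steps described for the KNIME nodes can be approximated in plain Python. The sketch below is not the KNIME implementation: `dummy_encode` is a hypothetical helper, the sample rows are made up, and the `level_variable` column naming is an assumption based only on the ‘Yes_digital_media’ name that appears in this deck.

```python
def dummy_encode(rows, columns, drop_first=True):
    """One-hot encode the named columns of a list of dict rows."""
    # Collect the sorted distinct levels of each categorical column.
    levels = {c: sorted({row[c] for row in rows}) for c in columns}
    encoded = []
    for row in rows:
        # Carry over every column that is not being encoded (e.g. ID, status).
        out = {k: v for k, v in row.items() if k not in columns}
        for c in columns:
            # Dropping one level per variable avoids perfectly collinear dummies.
            kept = levels[c][1:] if drop_first else levels[c]
            for level in kept:
                out[f"{level}_{c}"] = 1 if row[c] == level else 0
        encoded.append(out)
    return encoded


# Hypothetical rows; the column names only mimic the style of the dataset.
leads = [
    {"ID": "L1", "digital_media": "Yes", "status": 1},
    {"ID": "L2", "digital_media": "No", "status": 0},
]
for row in dummy_encode(leads, ["digital_media"]):
    print(row)
```

Setting `drop_first=False` keeps every level, which corresponds to the state before the column-filtering step.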
3. Select Metric of Interest
Goal
To predict whether the lead is converted
to a paid customer or not. The model
could make wrong predictions, with
the following consequences:
False Negatives
The model predicts that the lead will not
convert to a paid customer, but the lead
converts. The impact is that resources are spent
when they are not needed, since the lead
already intended to convert, thereby affecting
cost minimisation
False Positives
The model predicts that the lead will convert
to a paid customer, but the lead does not. The
impact is the loss of a customer because
little or no effort went toward creating the conversion,
since the sales and marketing team believed that
the lead would convert without further investment
To minimise cost and to convert
customers, the model has to reduce both its
False Negatives and False Positives, as
wrongly classifying leads
affects either the sales from the
customer base or the monetary resources spent
on the leads
These are the metrics of interest, where TP, TN, FP and FN are the
counts of true positives, true negatives, false positives and false negatives:
• Accuracy = (TP + TN) / (TP + FP + FN + TN)
• Precision = TP / (TP + FP)
• Recall or sensitivity = TP / (TP + FN)
• F1 Score = (2 × Precision × Recall) / (Precision + Recall)
A good model should have a high F1 Score!
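The four metrics of interest can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: `classification_metrics` is a hypothetical helper and the counts are made up, not taken from the project's scorers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=70, fp=30, fn=40, tn=260)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either false positives or false negatives are frequent, which is why it suits a goal of reducing both.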
4. Build Logistic Regression Model
1. The ‘Partitioning’ node
was configured to split
the dataset into training
and testing sets in the
ratio of 7:3
2. The ‘Logistic
Regression Learner’ node was
created with ‘status’ as the
‘Target’
3. Two sets of ‘Logistic
Regression Predictor’,
‘Scorer’ and ‘ROC Curve’ nodes
were created: one to
process the training dataset
and the other to process the
testing dataset
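The 7:3 split in step 1 can be sketched as a simple random partition. This assumes a plain random draw with a fixed seed; the sampling option actually chosen in the ‘Partitioning’ node is not stated in the slides, and `partition` is a hypothetical helper.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    # A seeded generator makes the split repeatable across runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 4613 rows of the dataset.
train, test = partition(range(4613))
print(len(train), len(test))
```

Every row lands in exactly one of the two sets, so the testing set stays unseen while the model is fitted.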
5. Observe Metrics of Model
After the training and testing datasets from the ‘Partitioning’ node were fed into these
nodes, the scorers produced the following metrics, with their corresponding ROC Curves
Training Dataset Scorer
ROC Curves
Testing Dataset Scorer
• The model’s performance is nearly the same on
the training and testing datasets. The overall accuracy on training
and testing data is 0.823 and 0.814, respectively
• The Recall for predicting the ‘1’ class is around 0.651, which
means that roughly 35% of the actual ‘1’ rows are incorrectly predicted as ‘0’.
The F1-score for predicting the ‘1’ class is around 0.67 because the
recall (at 0.65) and precision (at 0.70) for predicting ‘1’ are both low
• The AUC score for test data is around 0.87. As the
difference between the training and testing dataset metrics is
within 10%, the model shows no sign of overfitting
• As the F1 score is low, other modeling methods should be explored
to improve performance
ROC Curves
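The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F1 score from the stated precision and recall, and expresses the training/testing accuracy gap as a fraction of the training value. Reading "within 10%" as a relative gap is an assumption, and both helper names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_gap(train_metric, test_metric):
    """Absolute train/test difference as a fraction of the training value."""
    return abs(train_metric - test_metric) / train_metric


# Precision, recall and accuracy values as quoted on the slide.
f1 = f1_score(precision=0.70, recall=0.65)
gap = relative_gap(train_metric=0.823, test_metric=0.814)
print(round(f1, 3), round(gap, 4), gap <= 0.10)
```

The recomputed F1 rounds to the 0.67 quoted above, and the accuracy gap is far below the 10% rule of thumb.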
6. Identify Significant Variables
The p-value indicates how likely it is that a relationship at least as strong as the one
observed would arise by chance alone. In the preliminary model, there are
7 variables whose p-values are more than 0.05, starting with ‘Yes_digital_media’ at
0.608. Typically, a p-value that is less than or equal to 0.05 is treated as statistically significant,
which suggests that the observed relationship is unlikely to be a result of
chance
7. Rebuild the Model
The model was rebuilt using the following
steps:
• Move the variable with the highest p-value
above 0.05 to the ‘Exclude’ box of the
‘Configure’ function of the ‘Logistic
Regression Learner’
• Using the remaining variables, re-execute the
node
• Observe the changes in the p-values
through the ‘Coefficients and Statistics’
function of the node
• Identify the next variable with the highest
p-value
• Continue to iterate the process until the
p-values of all remaining variables are ≤ 0.05
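The rebuild procedure is a form of backward elimination, which can be sketched as a loop. In the sketch below, `fit_p_values` is a hypothetical callable standing in for re-executing the learner and reading its p-values. The stub `fake_fit` returns made-up values, apart from the 0.608 quoted for ‘Yes_digital_media’, and unlike a real refit its p-values do not change between iterations.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable, one at a time, until all p-values are <= alpha."""
    kept = list(variables)
    while kept:
        # Refit on the current variables; fit_p_values returns {variable: p-value}.
        p_values = fit_p_values(kept)
        worst = max(kept, key=lambda v: p_values[v])
        if p_values[worst] <= alpha:
            break  # every remaining variable is statistically significant
        kept.remove(worst)  # exclude only one variable per iteration
    return kept


def fake_fit(kept):
    """Stand-in for a real model fit: fixed, mostly made-up p-values."""
    table = {"var_a": 0.01, "Yes_digital_media": 0.608, "var_c": 0.20, "var_d": 0.03}
    return {v: table[v] for v in kept}


print(backward_eliminate(["var_a", "Yes_digital_media", "var_c", "var_d"], fake_fit))
```

Removing one variable per pass matters because p-values shift after each refit, so a variable that looks insignificant at first may become significant once another is excluded.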
These are the nine variables with p-value ≤ 0.05 that were
retained to rebuild the model, since they are statistically significant
8. Observe Metrics of Models
After the model was rebuilt, the scorers and ROC Curves for the training and testing
datasets showed the following information:
Training Dataset Scorer
ROC Curves
Testing Dataset Scorer
• The rebuilt model’s performance is nearly the same
on the training and testing datasets
• The overall accuracy on training and testing data is 0.819 and 0.818,
respectively. Each is within 0.004 of the 0.823 and 0.814 of the previous
Logistic Regression Model
• For testing data, the Recall for predicting the ‘1’ class is around
0.658, compared with 0.651 for the previous model
• The F1-score for predicting the ‘1’ class is around 0.68 (previously
0.67)
• The AUC score for test data is around 0.87, which is
the same as for the previous model
• As the difference between the training and testing dataset metrics is
within 10%, the model shows no sign of overfitting
• These results suggest that removing the insignificant variables has not materially improved the model
• As the rebuilt model’s F1 score is still low, other modeling methods should be explored to improve performance,
which confirms the earlier recommendation
ROC Curves
9 & 10. Which Is the Better Model?
A Decision Tree Model was created, using the same dataset, and its metrics of interest
were compared with the rebuilt Logistic Regression Model:
• The figures highlighted in green and yellow show that the metrics of interest for
the Decision Tree are superior to those of the rebuilt Logistic Regression Model, which suggests it should
be the model used to predict whether the lead is converted to a paid customer or not.
• Nevertheless, despite these better metrics, Decision Trees have limitations. These
include instability and prediction accuracy that may not match that of other
modelling approaches, such as Logistic Regression.
• Because of these disadvantages, the rebuilt Logistic Regression Model has been used for the
final analysis
11. Factors Driving Lead Conversion*
It is observed that the following features have a positive impact on the
conversion of leads to paying customers:
• A high or medium level of completion of the lead’s profile on the website/mobile app
• The lead first interacted with Social Enterprise through its website
• The lead’s current occupation is in the professional field, or the lead is unemployed
• The lead heard about Social Enterprise through references
• The lead’s last interaction with Social Enterprise was through activity on
its website (live chat with a representative, updating the profile on the website, etc.)
or through email (requesting details about the programme by email, a
representative sharing information such as a programme brochure with the lead, etc.)
* More details are found in the project report, which is
not released at the request of the Social Enterprise
12. Which Leads are Likely to Convert*
A one-unit increase in the completion of a lead’s profile on Social Enterprise’s
website/mobile app, one more lead interacting with Social
Enterprise’s representatives through its website, and marketing
and sales outreach that brings in one more lead who works in the
professional field would each increase the conversion rate of leads into
paying customers
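In a logistic regression, a one-unit increase in a predictor multiplies the odds of conversion by the exponential of that predictor's coefficient. The sketch below illustrates this with a hypothetical coefficient of 0.5 and a hypothetical starting probability of 0.30. Neither value comes from the project, whose coefficients are not shown in these slides.

```python
import math


def odds_ratio(coefficient):
    """Factor by which the odds change for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def probability_after_one_unit(probability, coefficient):
    """Convert a probability to odds, apply the odds ratio, and convert back."""
    odds = probability / (1.0 - probability)
    new_odds = odds * math.exp(coefficient)
    return new_odds / (1.0 + new_odds)


# Hypothetical coefficient and baseline probability, for illustration only.
print(round(odds_ratio(0.5), 3))
print(round(probability_after_one_unit(0.30, 0.5), 3))
```

A positive coefficient gives an odds ratio above 1, which is the sense in which the factors listed here raise the likelihood of conversion.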
These insights are useful in informing decisions on creating a positive
overall experience for leads using the social media channels managed by Social
Enterprise, on marketing efforts to increase awareness of
these channels amongst leads, and on fine-tuning the
marketing and product mix to appeal to professionals
* More details are found in the project report, which is
not released at the request of the Social Enterprise
Thank you
Author: Anthony Mok
Date: 16 Nov 2023
Email: xxiaohao@yahoo.com

VidaXL dropshipping via API with DroFx.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Predictive Analysis - Using Insight-informed Data to Determine Factors Driving the Lead Conversion Process

  • 1. Using Insight-informed Data to Determine Factors Driving the Lead Conversion Process: An Application of Logistic Regression Modelling. Author: Anthony Mok. Date: 16 Nov 2023. Email: xxiaohao@yahoo.com
  • 2. Agenda
      1. Unlock Insights, Efficient Workflows
      2. Machine Learning
      3. Logistic Regression Model
      4. Primary Goals
      5. Description of Dataset
      6. Modelling Strategies
      7. Findings, Conclusions, and Recommendations
  • 3. Unlock Insights, Efficient Workflows
      Traditional Organisations and Workflows
      • A workflow defines the specific steps taken to complete a task or process
      • Workflows are supported by systems, procedures and roles to ensure tasks are completed efficiently and consistently
      Artificial Intelligence (AI) Enables
      • Analysis of large amounts of data for patterns, trends, and insights
      • Automation of repetitive and time-consuming tasks
      • Prediction of future events or outcomes
      AI Augmenting Workflow
      • AI is unlikely to completely eliminate the need for workflows in organisations, since workflows provide structure and ensure tasks are completed correctly and consistently
      • AI can help optimise workflows by identifying bottlenecks and suggesting improvements
  • 4. Machine Learning
      • Machine learning, a branch of AI, uses algorithms to learn from data, improving at tasks without explicit instructions
      • It is used for automation, accuracy, and personalization, in applications such as image recognition and language processing
  • 5. Logistic Regression Modelling
      • Logistic regression is a statistical model that predicts the probability that an outcome belongs to one of two classes (e.g., win/lose, yes/no)
      • It is popular for its ease of interpretation and efficiency, making it well suited to tasks such as spam detection or sentiment analysis, where the factors influencing the outcome need to be understood
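As a minimal sketch of how a logistic regression turns inputs into a probability: the linear score (the log-odds) is passed through the logistic function. The intercept, coefficients, feature values and the 0.5 cut-off below are hypothetical values chosen for illustration; none of them come from the slides.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Return P(class = 1) for a logistic regression with the given weights."""
    # Linear score (log-odds): intercept + sum of coefficient * feature value.
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps any log-odds value into (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical weights and a hypothetical lead, for illustration only.
p = predict_probability(-1.0, [0.8, 1.5], [1.0, 0.0])
# 0.5 is a common default cut-off, not a value taken from the slides.
label = 1 if p >= 0.5 else 0
```

Because each coefficient acts on the log-odds, its sign shows whether a factor pushes the prediction towards or away from class ‘1’, which is what makes the model easy to interpret.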
  • 6. Primary Goals
      • To determine what factors are driving the lead conversion process
      • To identify which leads are more likely to convert to paid customers
  • 7. Description of Dataset
      Data Description: The dataset consists of 4613 rows and 15 columns
      Data Dictionary: A sample data dictionary* was given on the slide (table not captured in this transcript)
      * More details are found in the project report, which is not released at the request of the Social Enterprise
  • 8. Modelling Strategies
      Plan
      1. Perform Dummy Encoding
      2. List Variables for Modeling
      3. Identify metric of interest to judge model's performance
      Build
      4. Build Logistic Regression Model (Preliminary Model)
      5. Observe the metrics of the model
      Improve
      6. Identify the significant variables
      7. Rebuild model
      8. Observe the metrics of the models
      Decide
      9. Compare the results of Logistic Regression model (Base model) and Decision Tree Model
      10. Conclude on best model for this project
      Recommend
      11. Determine factors driving the lead conversion process
      12. Recommend what may help to identify which leads are more likely to convert to paying customers
  • 9. Step 1: Perform Dummy Encoding
      1. The ‘CSV Read’ and ‘Data Explorer’ nodes were dragged and dropped onto the KNIME Platform to ingest and explore the variables and data in the dataset; no missing values were found
      2. The ‘One_to_Many’ node was used to convert non-numeric variables into numeric form. Except for the ‘ID’ and ‘status’ variables, the remaining 9 non-numeric variables were moved to the ‘Include’ box of the ‘Configure’ function of the ‘One_to_Many’ node
  • 10. Step 2: Identify Variables for Modeling
      3. The final operation was to filter the columns, using the ‘Column Filter’ node to remove at least one of the dummy-encoded responses from each of the non-numeric variables
      4. This is the list of 9 non-numeric variables that can be used for modeling (list shown on the slide, not captured in this transcript)
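The dummy-encoding and column-filtering steps can be sketched in plain Python, as a stand-in for the KNIME nodes rather than a reproduction of them. The column name `digital_media` and the sample rows are assumptions, inferred only from the ‘Yes_digital_media’ variable name that appears later in the slides. One level per variable is dropped as the reference, because keeping every level makes the indicator columns perfectly collinear.

```python
def dummy_encode(rows, column):
    """Replace `column` with 0/1 indicator columns, dropping one reference level."""
    levels = sorted({row[column] for row in rows})
    kept = levels[1:]  # the first level (alphabetically) becomes the baseline
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # One indicator per kept level, named "<level>_<column>".
        for level in kept:
            new_row[f"{level}_{column}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded

# Hypothetical rows; only 'ID' and 'status' are column names from the slides.
leads = [
    {"ID": "A1", "digital_media": "Yes", "status": 1},
    {"ID": "A2", "digital_media": "No", "status": 0},
]
encoded = dummy_encode(leads, "digital_media")
```

Here ‘No’ is the dropped level, so the single remaining column `Yes_digital_media` carries all the information in the original two-level variable.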
  • 11. Step 3: Select Metric of Interest
      Goal: To predict whether or not a lead is converted to a paid customer. The model can make two kinds of wrong prediction, with the following consequences:
      • False Negatives: The model predicts that the lead will not convert to a paid customer, but the lead converts. The impact is that resources are spent when they are not needed, since the lead already intended to convert, which works against cost minimisation
      • False Positives: The model predicts that the lead will convert to a paid customer, but the lead does not. The impact is the loss of a customer, because the sales and marketing team, believing that the lead would convert without further investment, put little or no effort into the conversion
      To minimise cost and convert customers, the model has to reduce both its False Negatives and its False Positives, since wrongly identifying leads affects either the sales from the customer base or the monetary resources spent on the leads
      These are the metrics of interest:
      • Accuracy = (TP + TN) / (TP + FP + FN + TN)
      • Precision = TP / (TP + FP)
      • Recall or sensitivity = TP / (TP + FN)
      • F1 Score = (2 × Precision × Recall) / (Precision + Recall)
      A good model should have a high F1 Score!
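The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses hypothetical counts, not figures from the project; note that the accuracy numerator is the grouped sum (TP + TN).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of the predicted '1' rows, how many are truly '1'
    recall = tp / (tp + fn)      # of the actual '1' rows, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
m = classification_metrics(tp=70, fp=30, fn=35, tn=265)
```

Because F1 is the harmonic mean of precision and recall, it is high only when both False Positives and False Negatives are low, which matches the goal of reducing both kinds of error.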
  • 12. Step 4: Build Logistic Regression Model
      1. The ‘Partitioning’ node was configured to split the dataset into training and testing sets in the ratio 7:3
      2. The ‘Logistic Regression Learner’ was created with ‘status’ as the ‘Target’
      3. Two sets of ‘Logistic Regression Predictor’, ‘Scorer’ and ‘ROC Curve’ nodes were created: one to process the training dataset and the other to process the testing dataset
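The partition, learn and score sequence can be illustrated in plain Python. This is a sketch under stated assumptions, not the KNIME implementation: the data is synthetic, the helper names are hypothetical, and the model is fitted by simple batch gradient descent rather than by whatever solver the KNIME learner uses.

```python
import math
import random

def split_70_30(rows, seed=0):
    """Shuffle the rows and split them 7:3 into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """Predicted probability of class 1; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def fit_logistic(rows, n_features, lr=0.1, epochs=500):
    """Fit weights by batch gradient descent on the log-loss."""
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in rows:
            err = score(w, x) - y  # gradient of the log-loss w.r.t. the log-odds
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def accuracy(w, rows):
    hits = sum((score(w, x) >= 0.5) == (y == 1) for x, y in rows)
    return hits / len(rows)

# Synthetic data: one feature, label 1 when the noisy feature is positive.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(-3, 3)
    data.append(([x], 1 if x + rng.gauss(0, 0.5) > 0 else 0))

train, test = split_70_30(data)
weights = fit_logistic(train, n_features=1)
train_acc, test_acc = accuracy(weights, train), accuracy(weights, test)
```

Scoring the same fitted weights on both partitions mirrors the two Predictor and Scorer branches: the training score shows how well the model fits, and the testing score shows how well it generalises.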
  • 13. Step 5: Observe Metrics of Model
      After the training and testing datasets from the ‘Partitioning’ node were fed into these nodes, the scorers produced the following metrics with their corresponding ROC Curves
      [Figures: Training Dataset Scorer output and ROC Curves]
  • 14. Step 5: Observe Metrics of Model
      After the training and testing datasets from the ‘Partitioning’ node were fed into these nodes, the scorers produced the following metrics with their corresponding ROC Curves
      [Figures: Testing Dataset Scorer output and ROC Curves]
      • The model’s performance is observed to be nearly the same for the Training and Testing datasets. The overall accuracy on training and testing data is 0.823 and 0.814, respectively
      • The Recall for predicting the ‘1’ class is around 0.651, which means that about 35% of the actual ‘1’ rows are incorrectly predicted as ‘0’. The F1-score for predicting the ‘1’ class is around 0.67, because the recall (at 0.65) and precision (at 0.70) for predicting ‘1’ are both low
      • The AUC score for the test data is observed to be around 0.87. As the difference between the training and testing dataset metrics is within 10%, the model is not overfitted
      • As the F1 score is low, other modeling methods should be explored to improve performance
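The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F1 score from the rounded precision and recall, and measures the gap between training and testing accuracy. Treating "within 10%" as a relative difference is an assumption; the slides do not say whether the threshold is relative or absolute.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def relative_gap(train_metric, test_metric):
    """Difference between the two metrics as a fraction of the training value."""
    return abs(train_metric - test_metric) / train_metric

f1 = f1_from(0.70, 0.65)          # rounded precision and recall from the slide
gap = relative_gap(0.823, 0.814)  # training and testing accuracy from the slide
within_tolerance = gap <= 0.10    # assumed reading of the 10% rule
```

With these inputs the recomputed F1 agrees with the reported value of about 0.67, and the accuracy gap is roughly 1%, well inside the stated tolerance.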
  • 15. Step 6: Identify Significant Variables
      The p-value indicates how statistically significant each variable’s relationship with the outcome is. In the dataset, there are 7 variables whose p-values are more than 0.05, starting with ‘Yes_digital_media’ at 0.608. Typically, a p-value that is less than or equal to 0.05 is treated as statistically significant, meaning that the observed relationship is unlikely to be a result of chance
  • 16. Step 7: Rebuild the Model
      The model was rebuilt using the following steps:
      • Shift the variable with the highest p-value that is > 0.05 to the ‘Exclude’ box of the ‘Configure’ function of the ‘Logistic Regression Learner’
      • Using the remaining variables, re-execute the node
      • Observe the changes in the p-values through the ‘Coefficients and Statistics’ function of the node
      • Identify the next variable with the highest p-value
      • Continue to iterate the process until the p-values of all remaining variables are ≤ 0.05
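The iteration described above is a form of backward elimination, and its control flow can be sketched as follows. The `fit_p_values` argument is a hypothetical callable standing in for refitting the model and reading its p-values; the stub used here returns fixed numbers, whereas a real refit would change the p-values after every removal. Only ‘Yes_digital_media’ and its p-value of 0.608 come from the slides; the other names and values are invented for illustration.

```python
def backward_eliminate(features, fit_p_values, alpha=0.05):
    """Drop the highest-p-value feature one at a time until all are <= alpha."""
    remaining = list(features)
    while remaining:
        p_values = fit_p_values(remaining)  # refit using only the remaining features
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # every remaining feature is significant
        remaining.remove(worst)
    return remaining

def stub_fit(names):
    """Hypothetical stand-in for a model fit; returns fixed p-values."""
    table = {"Yes_digital_media": 0.608, "feature_a": 0.01, "feature_b": 0.20}
    return {name: table[name] for name in names}

kept = backward_eliminate(["Yes_digital_media", "feature_a", "feature_b"], stub_fit)
```

Removing one variable per pass, rather than all insignificant variables at once, matters because a variable's p-value can move across the 0.05 threshold once a correlated variable has been dropped.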
  • 17. Step 7: Rebuild the Model
      These are the nine variables with p-value ≤ 0.05 that are retained to rebuild the model, since they are statistically significant (list shown on the slide, not captured in this transcript)
  • 18. Step 8: Observe Metrics of Models
      After the model has been rebuilt, the scorers and ROC Curves for the training and testing datasets show the following information:
      [Figures: Training Dataset Scorer output and ROC Curves]
  • 19. Step 8: Observe Metrics of Models
      [Figures: Testing Dataset Scorer output and ROC Curves]
      • The rebuilt model’s performance is observed to be nearly the same for the Training and Testing datasets
      • The overall accuracy on training and testing data is 0.819 and 0.818, respectively. Each is just 0.004 away from the 0.823 and 0.814 of the last Logistic Regression Model
      • For the Testing data, the Recall for predicting the ‘1’ class is around 0.658, compared with 0.651 for the previous model
      • The F1-score for predicting the ‘1’ class is around 0.68 (the last was 0.67)
      • The AUC score for the test data is observed to be around 0.87, which is the same as the last model
      • As the difference between the training and testing dataset metrics is within 10%, the model is not overfitted
      • These results suggest that the rebuilt model has not materially improved after removing all the insignificant variables
      • As the rebuilt model’s F1 score is still low, other modeling methods should be explored to improve performance; the rebuilt model confirms the recommendation made earlier
  • 20. Steps 9 & 10: Which Is the Better Model?
      A Decision Tree Model was created using the same dataset, and its metrics of interest were compared with those of the rebuilt Logistic Regression Model:
      [Table: metrics of interest for the Decision Tree and the rebuilt Logistic Regression Model]
      • The numbers shown in green and yellow indicate that the Decision Tree’s metrics of interest are superior to those of the rebuilt Logistic Regression Model, which suggests that the Decision Tree should be the model used to predict whether or not a lead is converted to a paid customer
      • Nevertheless, despite these better metrics, Decision Trees have limitations. These include instability, and prediction accuracy that is not as good as that of more complicated models, like the Logistic Regression approach
      • Because of these disadvantages, the rebuilt Logistic Regression Model has been used for the final analysis
  • 21. Step 11: Factors Driving Lead Conversion*
      Leads with the following features are observed to have a positive impact on conversion to paying customers:
      • A high or medium level of profile completion on the website/mobile app
      • First interaction with Social Enterprise through its website
      • A current occupation in the professional field, or being unemployed
      • Having heard about Social Enterprise through references
      • A last interaction with Social Enterprise through website activity (live chat with a representative, profile update on the website, etc.) or through email (seeking details about the programme by email, a representative sharing information such as a programme brochure, etc.)
      * More details are found in the project report, which is not released at the request of the Social Enterprise
  • 22. Step 12: Which Leads are Likely to Convert*
      A one-unit increase in the completion of the leads’ profiles on Social Enterprise’s website/mobile app, one more unit of leads interacting with Social Enterprise’s representatives through its website, and marketing and sales outreach targeted to gain one more unit of leads who work in the professional field would each increase the conversion rate of leads into paying customers
      These insights are useful in informing decisions on creating a positive overall experience for leads using the social media channels managed by Social Enterprise, on marketing efforts to increase awareness of these channels amongst leads, and on fine-tuning the marketing and product mix to appeal to professionals
      * More details are found in the project report, which is not released at the request of the Social Enterprise
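In a logistic regression, the effect of a one-unit increase in a predictor is usually read through its coefficient: the odds of conversion are multiplied by exp(coefficient). The sketch below illustrates that relationship. The coefficient of 0.5 and the baseline probability of 0.3 are hypothetical, since the fitted coefficients are not given in the slides.

```python
import math

def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)

def updated_probability(baseline_probability, coefficient):
    """Probability after a one-unit increase, starting from a baseline probability."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio(coefficient)
    return new_odds / (1 + new_odds)

# Hypothetical values, for illustration only.
ratio = odds_ratio(0.5)
new_p = updated_probability(0.3, 0.5)
```

A positive coefficient gives an odds ratio above 1 and therefore a higher conversion probability, while the size of the change in probability depends on the baseline the lead starts from.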
  • 23. Thank you. Author: Anthony Mok. Date: 16 Nov 2023. Email: xxiaohao@yahoo.com