Lead Scoring
Case Study
• Anubhav Maheshwari
• Rachna Goel
• Ritu
CASE STUDY DESCRIPTION
An education company named X Education sells online courses to industry professionals. On any given day,
many professionals who are interested in the courses land on its website and browse for courses.
When these people fill out a form providing their email address or phone number, they are classified as leads,
which are then passed to the sales team to start making calls or sending emails to convert them. The typical
lead conversion rate at X Education is around 30%.
Although X Education gets a lot of leads, its lead conversion rate is poor. To make this process more
efficient, the company wishes to identify the leads with the most potential, also known as ‘Hot Leads’. If it successfully
identifies this set of leads, the lead conversion rate should go up, because the sales team will focus on
communicating with the potential leads rather than making calls to everyone.
PROBLEM STATEMENT
The company needs a model that assigns a lead score to each lead such that
customers with a higher lead score have a higher chance of conversion and customers with a lower lead score
have a lower chance of conversion.
This case study focuses on building a logistic regression model that assigns a lead score between 0 and 100 to
each lead, which the company can use to target potential leads. A higher score means that
the lead is hot, i.e. most likely to convert, whereas a lower score means that the lead is cold and will
most likely not convert.
Identifying the leads that can possibly be converted is the focus of the case study.
APPROACH
To raise the lead conversion rate to around 80%, a logistic regression model is built to identify the important
variables and derive insights on how to increase the number of converted leads.
The following steps are performed in the case study:
• Data Loading & Cleaning
• Data Quality & Missing Values Check
• Handling Outliers
• Exploratory Data Analysis
• Data Preparation for Modelling
• Train-Test Data Split
• Scaling
• Feature Selection
• Recursive Model Building to find the optimal model
• Model Evaluation using Performance Metrics & Building ROC Curve
• Finding Optimal Cut-Off point
• Predictions on Test data using Final Model
• Final Evaluation using Performance Metrics on Test Data
• Calculating Lead Score
ASSUMPTIONS
• For this case study we have dropped the columns where more than 40% of values are missing ('Lead Quality', 'Asymmetrique Activity
Index', 'Asymmetrique Profile Score', 'Asymmetrique Activity Score', 'Asymmetrique Profile Index'), because imputing
such a large share of missing values could distort the overall analysis.
• Values appearing as ‘SELECT’ in a few columns have been replaced with null.
• For a few categorical columns, null values have been replaced with a new category, “Others”, to segregate the data.
• For a few categorical columns, categories with a low volume of records have been merged into the “Others” category.
• For numerical columns where the mean and median are the same but the maximum is far out of range, null values have been
imputed with IQR*1.5 of the variable.
• A few unnecessary columns with heavily skewed data were dropped so that they do not affect the overall model building.
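The first three cleaning rules above can be sketched in pandas roughly as follows. This is a minimal illustration on a toy frame, not the notebook used in the case study: the helper name `clean_missing`, the exact spelling of the placeholder (`"Select"`), and the sample values are assumptions.

```python
import numpy as np
import pandas as pd

def clean_missing(df, threshold=0.40, placeholder="Select"):
    """Turn placeholder strings into NaN, then drop columns whose
    share of missing values exceeds `threshold` (hypothetical helper)."""
    cleaned = df.replace(placeholder, np.nan)
    missing_share = cleaned.isna().mean()          # fraction of NaN per column
    to_drop = missing_share[missing_share > threshold].index.tolist()
    return cleaned.drop(columns=to_drop), to_drop

# Toy data, invented for illustration only
leads = pd.DataFrame({
    "Lead Quality": [np.nan, np.nan, np.nan, "High", np.nan],
    "Specialization": ["Select", "Finance", "Marketing", "Finance", "Finance"],
    "City": ["Mumbai", "Select", "Mumbai", "Thane", "Mumbai"],
})

cleaned, dropped = clean_missing(leads)
# Remaining nulls in a categorical column go into a new "Others" category
cleaned["Specialization"] = cleaned["Specialization"].fillna("Others")
```

Here 'Lead Quality' is 80% missing and is dropped, while 'Specialization' and 'City' stay below the 40% threshold and are kept.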
OUTLIERS TREATMENT
• Outliers were observed in two numerical columns,
identified using box plots.
• For this case study, outliers in continuous variables
were capped at the upper-bound values so that a
proper model could be built.
• For categorical variables, we used the following two
approaches:
  • Creating a new category for missing values
  • Proportionately distributing the values across existing
categories based on their distribution.
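As a rough sketch of the capping step, the snippet below clips a continuous variable at an upper bound. The slides do not say how the upper bound was defined, so the common box-plot rule Q3 + 1.5 * IQR is assumed here; the function name `cap_upper` and the sample values are hypothetical.

```python
import pandas as pd

def cap_upper(series, k=1.5):
    """Clip values above Q3 + k * IQR (assumed definition of the upper bound)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    upper = q3 + k * (q3 - q1)
    return series.clip(upper=upper), upper

# Toy values with one extreme point, invented for illustration
visits = pd.Series([0, 1, 2, 2, 3, 3, 4, 5, 251], dtype=float, name="TotalVisits")
capped, upper = cap_upper(visits)
```

Only the extreme value is changed; every observation at or below the bound is left as it was.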
EXPLORATORY DATA ANALYSIS
UNIVARIATE ANALYSIS
From the plots, we can infer that:
• The majority of people come through either Google or Direct
Traffic as the lead source
• Unemployed people make up the majority of
visitors to the site
• The last activity for the majority of leads is Email
Opened
• The majority of leads have Landing Page Submission as
the Lead Origin
BIVARIATE ANALYSIS
From the plots, it can be inferred that:
• Working Professionals have the
higher conversion
• Unemployed people have the highest count
in the lead category, and additional
focus can be given to converting
them
• Google as the lead source has the
highest conversion, and the two largest
lead counts come from Direct Traffic
and Google
• Landing Page Submission as the Lead
Origin has the highest count of
leads along with the most conversions
• The city plot shows that the
conversion rate and lead rate are the same for
Other Cities of Maharashtra; more
emphasis can be put on
advertisements in other states to get
more leads
• People who said No to a free copy of
Mastering the Interview have the highest
conversion
MULTIVARIATE ANALYSIS
From the plots, it can be inferred that:
• Students/Others who visited the site
regularly are more likely to become converted
leads
• Leads who spend more
time on the website are mostly converted,
irrespective of Specialization.
• Welingak Website, Olark Chat,
Referral Sites and Organic
Search are the lead sources with the
largest share of converted leads among
all lead sources.
CORRELATION MATRIX
From the correlation matrix, we can see that Converted has:
• a positive correlation with Total Time
Spent on Website
• a negative correlation with Page Views Per Visit
MODEL BUILDING
DATA PREPARATION STEPS
• Converted binary variables (Yes/No) to 1 and 0 for model building.
• Created dummy variables for all categorical columns using pd.get_dummies.
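The two preparation steps above might look like the following in pandas. The column values are toy data, and `drop_first=True` is an assumption (the slides do not say whether one level per category was dropped); it is a common way to avoid perfectly collinear dummy columns.

```python
import pandas as pd

# Toy rows, invented for illustration
leads = pd.DataFrame({
    "Do Not Email": ["No", "Yes", "No"],
    "Lead Source": ["Google", "Direct Traffic", "Google"],
})

# Binary Yes/No column -> 1/0
leads["Do Not Email"] = leads["Do Not Email"].map({"Yes": 1, "No": 0})

# One dummy column per category level; drop_first is an assumption
dummies = pd.get_dummies(
    leads["Lead Source"], prefix="Lead Source", drop_first=True, dtype=int
)
model_df = pd.concat([leads.drop(columns="Lead Source"), dummies], axis=1)
```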
TRAIN-TEST SPLIT
• Split the data into train and test data frames in a 70:30 ratio, using train_test_split from sklearn.
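A minimal sketch of a 70:30 split with scikit-learn is shown below. The arrays are placeholders, and the `random_state` value is an arbitrary assumption used only to make the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix and target, invented for illustration
X = np.arange(200).reshape(100, 2)
y = np.array([0, 1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=100
)
```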
FEATURE SCALING
• We used MinMaxScaler to bring the numerical columns onto comparable scales. Without comparable scales, some
of the coefficients obtained by fitting the model may be very large or very small compared to the others, which makes
the model harder to evaluate.
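For illustration, min-max scaling maps each column to the range [0, 1] using the minimum and maximum seen in the training data. In the sketch below the scaler is fitted on the training rows only and then reused on the test rows; that ordering is a common practice and an assumption here, as are the toy numbers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy numeric columns, invented for illustration
X_train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 50.0]])
X_test = np.array([[2.5, 30.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min and max per column
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max
```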
MODEL BUILDING
• Built the first logistic regression training model using all features.
• To find the best-fit model, we used Recursive Feature Elimination (RFE) to select the top 20 features for the next model.
• For each model built, we checked that the p-value of every feature is less than 0.05.
• To detect multicollinearity, we calculated the Variance Inflation Factor (VIF) to check that the feature variables are not correlated with each other.
• Dropped features with a high p-value or high correlation one at a time, rebuilding the model after each drop to arrive at the optimal model.
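The feature-selection loop can be sketched as below, under several assumptions: the data is synthetic, only 3 features are kept instead of 20, and VIF is computed with NumPy as the diagonal of the inverse correlation matrix rather than with whatever library the original notebook used. The p-value check is not shown, since scikit-learn's LogisticRegression does not report p-values.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: feature 5 is almost a copy of feature 0
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 6))
X[:, 5] = X[:, 0] + 0.05 * rng.normal(size=n)
logit = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Recursive Feature Elimination around a logistic regression estimator
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)
selected = np.flatnonzero(selector.support_)  # indices of the kept features

def vif(features):
    """VIF per column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))

vifs = vif(X)
```

In this toy setup the two near-duplicate columns get very large VIF values, which is the signal used to drop one of them before refitting.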
MODEL EVALUATION – TRAIN DATA
• After arriving at the optimal model, we evaluated the performance metrics: Accuracy, Recall, Precision and F1 score.
• Plotted the ROC curve, which shows the trade-off between sensitivity and specificity (any increase in sensitivity is accompanied by a decrease in
specificity).
• The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
• Calculated the optimal cut-off point between sensitivity and specificity. From the plot, we obtained 0.33 as the optimal cut-off point.
• Also checked the precision-recall trade-off, as this helps identify how many leads predicted as CONVERTED are actually CONVERTED.
• The precision-recall trade-off point came out to be 0.38, which we used as the cut-off probability on the test data.
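One way to locate a cut-off where sensitivity and specificity balance is to tabulate both over a grid of probability thresholds and pick the threshold where they are closest. The sketch below does this on made-up labels and probabilities; the function names are hypothetical, and the slides read the point off a plot rather than necessarily computing it this way.

```python
import numpy as np

def cutoff_table(y_true, y_prob, cutoffs):
    """Return (cutoff, accuracy, sensitivity, specificity) for each cutoff."""
    rows = []
    for c in cutoffs:
        pred = (y_prob >= c).astype(int)
        tp = int(((pred == 1) & (y_true == 1)).sum())
        tn = int(((pred == 0) & (y_true == 0)).sum())
        fp = int(((pred == 1) & (y_true == 0)).sum())
        fn = int(((pred == 0) & (y_true == 1)).sum())
        accuracy = (tp + tn) / len(y_true)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        rows.append((c, accuracy, sensitivity, specificity))
    return rows

def balanced_cutoff(rows):
    """Cutoff where sensitivity and specificity are closest to each other."""
    return min(rows, key=lambda r: abs(r[2] - r[3]))[0]

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1])
y_prob = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])
cutoffs = [round(0.1 * i, 1) for i in range(1, 10)]

rows = cutoff_table(y_true, y_prob, cutoffs)
best = balanced_cutoff(rows)
```

The same table can be extended with precision and recall per cut-off to examine the precision-recall trade-off.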
MODEL EVALUATION – TEST DATA
• Ran the final optimal model on the test dataset, with the following observations:
  • The ROC curve came out similar to the one obtained on the train data.
  • Recall/Sensitivity: 85.4%
  • Accuracy: 88.9%
  • Precision: 86.4%
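For reference, these metrics all derive from the four confusion-matrix counts. The helper below is a small sketch with a hypothetical name, and the counts passed to it are invented; they are not the counts behind the figures reported above.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (sensitivity) and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Invented counts, for illustration only
metrics = classification_metrics(tp=80, fp=20, tn=180, fn=20)
```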
LEAD SCORE PREDICTION
• The final_predicted column shows the conversion probability of
each prospective lead
• Leads with a Lead Score above 39 have a high tendency to convert and fall in the Hot Lead
category
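A lead score on the 0 to 100 scale can be obtained by scaling the predicted conversion probability, as in the sketch below. The function and field names are hypothetical, the probabilities are invented, and the hot-lead flag uses the 0.38 cut-off probability quoted for the test data as an assumed default.

```python
def score_leads(probabilities, cutoff=0.38):
    """Map conversion probabilities to 0-100 lead scores and flag hot leads."""
    scored = []
    for p in probabilities:
        scored.append({
            "conversion_prob": p,
            "lead_score": int(round(p * 100)),  # probability scaled to 0-100
            "hot_lead": p >= cutoff,            # assumed cut-off probability
        })
    return scored

# Invented probabilities, for illustration only
scored = score_leads([0.12, 0.45, 0.91])
```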
RECOMMENDATIONS
• Prospects who spend more time on the website have a high chance of becoming Hot Leads, so the sales team can focus more on reaching out to
them.
• Leads whose Lead Source is the Welingak website or a referral have the highest number of conversions, so additional marketing can be done
on those websites, and the sales team can send course details and promotional offers to existing users to get more Hot Leads.
• Leads contacted via email/SMS have a higher chance of conversion.
• More leads can be generated from the Unemployed and Working Professional occupation categories by reaching out to them and providing information about the
courses available.
CONCLUSION
• The final model shows 88.9% accuracy, with Recall of 85.7% and Precision of 85.3%.
• The optimal cut-off was selected based on the Precision-Recall trade-off.
• The model also performed well on the test dataset, with Recall of 85.4% and Precision of 86.4%.
• Overall, the model looks good and is able to identify the leads that have a high chance of conversion using the Lead Score prediction.

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

Lead Scoring Case Study_Final.pptx

  • 1. Lead Scoring Case Study: Anubhav Maheshwari, Rachna Goel, Ritu
  • 2. CASE STUDY DESCRIPTION An education company named X Education sells online courses to industry professionals. On any given day, many professionals who are interested in the courses land on its website and browse for courses. When these people fill in a form with their email address or phone number, they are classified as leads and passed to the sales team, which starts calling or emailing them to convert them. The typical lead conversion rate at X Education is around 30%. Although X Education gets a lot of leads, its lead conversion rate is very poor. To make this process more efficient, the company wishes to identify the most promising leads, also known as ‘Hot Leads’. If it successfully identifies this set of leads, the lead conversion rate should go up, because the sales team will focus on communicating with the promising leads rather than calling everyone.
  • 3. PROBLEM STATEMENT The company needs a model that assigns a lead score to each lead, such that leads with a higher score have a higher chance of conversion and leads with a lower score have a lower chance. This case study builds a logistic regression model that assigns each lead a score between 0 and 100, which the company can use to target potential leads. A higher score means the lead is hot, i.e. most likely to convert, whereas a lower score means the lead is cold and will mostly not convert. Identifying the leads that can possibly be converted is the focus of the case study.
  • 4. APPROACH To improve the lead conversion rate to around 80%, a Logistic Regression model is created to identify the important variables and derive insights on how to improve the lead conversion count. The following steps are performed in the case study:
      • Data Loading & Cleaning
      • Data Quality & Missing Values Check
      • Handling Outliers
      • Exploratory Data Analysis
      • Data Preparation for Modelling
      • Train-Test Data Split
      • Scaling
      • Feature Selection
      • Recursive Model Building to find the optimal model
      • Model Evaluation using Performance Metrics & Building ROC Curve
      • Finding Optimal Cut-Off point
      • Predictions on Test data using Final Model
      • Final Evaluation using Performance Metrics on Test Data
      • Calculating Lead Score
  • 5. ASSUMPTIONS
      • Columns with more than 40% missing values ('Lead Quality', 'Asymmetrique Activity Index', 'Asymmetrique Profile Score', 'Asymmetrique Activity Score', 'Asymmetrique Profile Index') have been dropped, because imputing such a large share of missing values can distort the overall analysis.
      • Values appearing as ‘SELECT’ in a few columns have been replaced with null.
      • For a few categorical columns, null values have been replaced with a new category, “Others”, to segregate the data.
      • For a few categorical columns, categories with a low volume of records have been merged into the “Others” category.
      • For numerical columns whose mean and median are the same but whose maximum is far out of range, null values have been imputed with IQR*1.5 of the variable.
      • A few unnecessary columns with heavily skewed data have been dropped so that they do not affect model building.
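The first two assumptions above can be sketched in pandas as follows. This is a minimal illustration, not the authors' notebook: the helper name `clean_missing`, the toy values, and the exact placeholder spelling `"Select"` are assumptions, while the 40% threshold and the 'Lead Quality' column name come from the slide.

```python
import numpy as np
import pandas as pd


def clean_missing(df: pd.DataFrame, placeholder: str = "Select",
                  threshold: float = 0.40) -> pd.DataFrame:
    """Turn placeholder entries into NaN, then drop columns that are
    missing in more than `threshold` of the rows."""
    out = df.replace(placeholder, np.nan)
    missing_share = out.isna().mean()          # fraction of NaN per column
    return out.loc[:, missing_share <= threshold]


# Toy data (hypothetical values).
toy = pd.DataFrame({
    "Lead Quality": [np.nan, np.nan, np.nan, "High", "Low"],        # 60% missing
    "Specialization": ["Select", "Finance", "HR", "IT", "Finance"],  # 20% after replacement
    "TotalVisits": [2, 5, 1, 7, 3],
})
cleaned = clean_missing(toy)
print(list(cleaned.columns))
```

Replacing the placeholder before measuring the missing share matters, since an unselected dropdown value is effectively a missing value.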
  • 6. OUTLIERS TREATMENT
      • Outliers were observed in two numerical columns, identified using box plots.
      • For this case study, outliers in continuous variables have been capped at the upper bound value so that a proper model can be built.
      • For categorical variables, the following two approaches have been used:
          • Creating a new category for missing values.
          • Distributing the values proportionately across the existing categories, based on their distribution.
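Capping at an upper bound might look like the sketch below. The slide does not define the bound, so the common box-plot whisker Q3 + 1.5 * IQR is assumed here; the function name and the sample values are hypothetical.

```python
import pandas as pd


def cap_at_upper_bound(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values above Q3 + k * IQR (assumed definition of the upper bound)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    upper = q3 + k * (q3 - q1)
    return series.clip(upper=upper)


# Hypothetical visit counts with one extreme value.
visits = pd.Series([1, 2, 2, 3, 3, 4, 5, 250])
capped = cap_at_upper_bound(visits)
print(capped.max())
```

Capping keeps the row in the data set, which avoids losing leads whose other columns are still informative.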
  • 8. UNIVARIATE ANALYSIS From the above plots, we can infer that:
      • The majority of people come through either Google or Direct Traffic as the lead source.
      • Unemployed people are the majority of visitors to the site.
      • The last activity for the majority of leads is Email Opened.
      • The majority of leads have Landing Page Submission as the Lead Origin.
  • 9. BIVARIATE ANALYSIS From the above plots, it can be inferred that:
      • Working Professionals have the higher conversion.
      • Unemployed people have the highest count among leads, so additional focus can be given to converting them.
      • Google as the lead source has the highest conversion, and the two largest lead counts come from Direct Traffic and Google.
      • Landing Page Submission as the Lead Origin has the highest count of leads along with the most conversions.
      • From the city plot, the conversion rate and lead rate are the same for other cities of Maharashtra; more emphasis can be put on advertisements in other states to get more leads.
      • People who said No to a free copy of Mastering the Interview have the highest conversion.
  • 11. MULTIVARIATE ANALYSIS From the above plots, it can be inferred that:
      • Students/Others who visited the site regularly are more likely to be converted leads.
      • Leads who spend more time on the website are mostly converted, irrespective of Specialization.
      • Among the lead sources, Welingak Website, Olark Chat, Referral Sites and Organic Search have most of their leads converted.
  • 12. CORRELATION MATRIX From the above correlation matrix, we can see that Converted has a positive correlation with Total Time Spent on Website and a negative correlation with Page Views Per Visit.
  • 14. DATA PREPARATION STEPS
      • Converted binary variables (Yes/No) to 0 and 1 for model building.
      • Created dummy variables for all categorical columns using pd.get_dummies.
    TRAIN-TEST SPLIT
      • Split the data into train and test data frames using a 70-30% ratio. At this stage we imported train_test_split from sklearn.
    FEATURE SCALING
      • We used the MinMax Scaler to bring the numerical columns onto comparable scales. Without comparable scales, some of the fitted coefficients can be very large or very small compared to the others, which makes the model harder to evaluate.
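The preparation steps above can be sketched end to end as follows. The rows are invented and only a few columns are shown; `drop_first=True`, the `random_state`, and the Yes to 1 / No to 0 mapping are assumptions not stated on the slide. The scaler is fitted on the training rows only, a common precaution against leaking test information.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample rows; column names follow the case study's data set.
leads = pd.DataFrame({
    "Do Not Email": ["No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No"],
    "Lead Source": ["Google", "Direct Traffic", "Google", "Olark Chat", "Google",
                    "Direct Traffic", "Olark Chat", "Google", "Direct Traffic", "Google"],
    "Total Time Spent on Website": [120.0, 0.0, 950.0, 300.0, 45.0,
                                    1500.0, 610.0, 10.0, 880.0, 230.0],
    "Converted": [0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
})

# Binary Yes/No column to 1/0 (assumed mapping).
leads["Do Not Email"] = leads["Do Not Email"].map({"Yes": 1, "No": 0})

# One-hot encode the categorical column; drop_first avoids a redundant dummy.
leads = pd.get_dummies(leads, columns=["Lead Source"], drop_first=True, dtype=int)

X = leads.drop(columns="Converted")
y = leads["Converted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the scaler on the training rows only, then apply it to both splits.
num_cols = ["Total Time Spent on Website"]
scaler = MinMaxScaler()
X_train, X_test = X_train.copy(), X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])
```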
  • 15. MODEL BUILDING
      • Built the first Logistic Regression training model using all features.
      • To find the best-fit model, used the Recursive Feature Elimination (RFE) technique to select the top 20 features for the next model.
      • For each model built, checked that every feature's p-value is less than 0.05.
      • To remove multicollinearity, calculated the Variance Inflation Factor (VIF) to check that the feature variables are not highly correlated with each other.
      • Dropped features with a high p-value or high correlation one at a time, rebuilding the model after each drop, until the optimal model was reached.
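A rough sketch of the RFE and VIF steps on synthetic data is shown below. The data set, the helper `vif`, and all parameters except the top-20 selection are assumptions. VIF is computed here as 1 / (1 - R²) from a regression with an intercept, which may differ slightly from other implementations, and the p-value check is omitted because scikit-learn does not report p-values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic stand-in for the prepared training data (not the real leads data).
X, y = make_classification(n_samples=500, n_features=30, n_informative=8,
                           n_redundant=0, random_state=0)

# Recursive Feature Elimination down to the top 20 features.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=20)
rfe.fit(X, y)
X_sel = X[:, rfe.support_]


def vif(features: np.ndarray, j: int) -> float:
    """VIF of column j: regress it on the other columns, return 1 / (1 - R^2)."""
    others = np.delete(features, j, axis=1)
    r2 = LinearRegression().fit(others, features[:, j]).score(others, features[:, j])
    return 1.0 / (1.0 - r2)


vifs = [vif(X_sel, j) for j in range(X_sel.shape[1])]
print(max(vifs))
```

A frequently used rule of thumb is to look more closely at features whose VIF exceeds about 5, though the slide does not say which threshold was applied.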
  • 16. MODEL EVALUATION – TRAIN DATA
      • After reaching the optimal model, evaluated the performance metrics Accuracy, Recall, Precision and F1 score.
      • Plotted the ROC curve, which shows the trade-off between sensitivity and specificity (any increase in sensitivity is accompanied by a decrease in specificity).
      • The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
      • The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
      • Calculated the optimal cut-off point between sensitivity and specificity. From the plot, 0.33 is the optimal cut-off point.
      • Also checked the Precision and Recall trade-off, as this helps identify whether a predicted CONVERTED is an actual CONVERTED.
      • The Precision and Recall trade-off came out at 0.38, which we used as the cut-off probability on the test data.
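One way to locate a cut-off where sensitivity and specificity balance is a simple grid search, sketched below with NumPy. The labels and probabilities are simulated, so the resulting cut-off will not match the 0.33 reported on the slide; the function names and the grid are hypothetical.

```python
import numpy as np


def sensitivity_specificity(y_true, y_prob, cutoff):
    """Sensitivity (recall on positives) and specificity at a given cut-off."""
    y_pred = y_prob >= cutoff
    pos = y_true == 1
    sensitivity = np.mean(y_pred[pos])      # TP / (TP + FN)
    specificity = np.mean(~y_pred[~pos])    # TN / (TN + FP)
    return sensitivity, specificity


def balanced_cutoff(y_true, y_prob, grid):
    """Grid value where |sensitivity - specificity| is smallest."""
    gaps = [abs(np.subtract(*sensitivity_specificity(y_true, y_prob, c)))
            for c in grid]
    return float(grid[int(np.argmin(gaps))])


# Simulated labels and predicted probabilities (illustrative only).
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.4, size=2000)
y_prob = np.clip(0.25 + 0.35 * y_true + rng.normal(0, 0.15, size=2000), 0, 1)

grid = np.linspace(0.05, 0.95, 19)
cutoff = balanced_cutoff(y_true, y_prob, grid)
sens, spec = sensitivity_specificity(y_true, y_prob, cutoff)
print(cutoff, sens, spec)
```

The same search can be repeated with precision and recall in place of sensitivity and specificity to find the precision-recall trade-off point.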
  • 17. MODEL EVALUATION – TEST DATA The final optimal model was run on the test data set, with the following observations:
      • The ROC curve came out similar to the one obtained on the train data.
      • Recall/Sensitivity: 85.4%
      • Accuracy: 88.9%
      • Precision: 86.4%
  • 18. LEAD SCORE PREDICTION
      • The final_predicted column shows the conversion probability of each prospective lead.
      • Leads with a Lead Score above 39 have a high tendency to convert and fall into the Hot Lead category.
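Turning a predicted probability into a 0 to 100 lead score can be sketched as below. The column names, the prospect IDs and the probabilities are hypothetical; the threshold of 39 is the one quoted on the slide.

```python
import pandas as pd

# Hypothetical predicted conversion probabilities for four prospects.
pred = pd.DataFrame({
    "Prospect ID": ["P1", "P2", "P3", "P4"],
    "Conversion_Prob": [0.12, 0.38, 0.41, 0.93],
})

# Lead score: probability scaled to 0-100 and rounded to an integer.
pred["Lead_Score"] = (pred["Conversion_Prob"] * 100).round().astype(int)

# Flag hot leads using the score threshold quoted in the slide.
HOT_THRESHOLD = 39
pred["Hot_Lead"] = pred["Lead_Score"] > HOT_THRESHOLD
print(pred)
```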
  • 19. RECOMMENDATIONS
      • Prospects who spend more time on the website have high chances of becoming Hot Leads, so the sales team can focus more on reaching out to them.
      • Leads whose Lead Source is the Welingak website or a referral have the highest number of conversions, so additional marketing can be done on these websites, and the sales team can send course details and promotional offers to existing users to get more Hot Leads.
      • Leads contacted via email/SMS have higher chances of conversion.
      • Unemployed people and Working Professionals, as Occupation categories, can generate more leads if the team reaches out to them with information about the available courses.
    CONCLUSION
      • The final model shows 88.9% accuracy, with Recall at 85.7% and Precision at 85.3%.
      • The optimal cut-off was selected based on the Precision and Recall trade-off score.
      • The model also worked well on the test data set, with Recall at 85.4% and Precision at 86.4%.
      • Overall, the model looks good and is able to identify the leads that have high chances of conversion using the Lead Score prediction.