Hackathon
Machine Learning
By Pro Squad
Apoorva, Deepak, Kunal & Yogesh
Problem Statement
Problem: A mall is running a coupon campaign and wants to ensure its success using a robust prediction model built with machine learning techniques.
Context: The mall has provided historical data comprising recommended coupons, customer details and coupon consumption details from previous years.
Relevance: The mall is going to run the campaign again and, based on the historical effectiveness of its coupons, wants to increase footfall, which in turn increases business for the shops in the mall.
Aims and Objectives: The aim of the project is to derive business insights from the data provided and to train a machine learning model that predicts the success of the campaign with the highest possible accuracy.
Challenges in Historical Data
• 26 features: 9 numerical and 17 categorical
• Missing values in 5 columns
• Categorical columns have many labels, up to 25 labels in one column
• Label frequencies in the categorical columns show outliers and skewness
• Most of the features are correlated
Missing Value Treatment
Missing Values
• Car: Only 84 of the 10147 rows have a value in this column (less than 1%), so we removed the column because it carries almost no information.
• Bar, CoffeeHouse, CarryAway, RestaurantLessThan20, Restaurant20To50: About 2% of the values are missing in each of these columns, so we filled them with the most frequently occurring value (the mode) of the respective column.
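The two treatments above can be sketched with pandas. This is a minimal illustration, not the team's actual notebook: the column names follow the slide, while the tiny data frame and its values are made up for the example.

```python
import pandas as pd

# Hypothetical miniature frame; column names follow the slide, values are invented.
df = pd.DataFrame({
    "Car": [None, None, "Scooter", None, None, None],
    "Bar": ["never", "less1", None, "never", "1~3", "never"],
    "CoffeeHouse": ["1~3", None, "1~3", "less1", "never", "1~3"],
    "CarryAway": ["4~8", "1~3", "1~3", None, "1~3", "gt8"],
    "RestaurantLessThan20": ["1~3", "1~3", "4~8", "1~3", None, "less1"],
    "Restaurant20To50": ["less1", "never", "less1", "less1", "1~3", None],
})

# Drop the column that is almost entirely empty.
df = df.drop(columns=["Car"])

# Fill each visit-frequency column with its most frequent value (the mode).
freq_cols = ["Bar", "CoffeeHouse", "CarryAway",
             "RestaurantLessThan20", "Restaurant20To50"]
for col in freq_cols:
    most_common = df[col].mode().iloc[0]  # mode() ignores missing values
    df[col] = df[col].fillna(most_common)

print(df.isna().sum())
```

Mode imputation keeps the column categorical and is simple to apply, at the cost of slightly inflating the share of the most common label.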
Binning
The Occupation column has 25 labels whose frequencies vary widely, which creates outliers and skewness in the label counts. We used binning to reduce the number of labels, which removed the outliers and the skewness.
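The slide does not say which bins were used, so the following is only one plausible way to bin a high-cardinality column: labels whose share falls below a threshold are merged into a single bucket. The helper name `bin_rare_labels`, the threshold and the sample labels are all assumptions made for illustration.

```python
import pandas as pd

def bin_rare_labels(series: pd.Series, min_share: float = 0.05,
                    other: str = "Other") -> pd.Series:
    """Replace labels whose relative frequency is below min_share with one bucket."""
    shares = series.value_counts(normalize=True)
    rare = shares[shares < min_share].index
    # Keep values that are not rare; replace the rest with the bucket label.
    return series.where(~series.isin(rare), other)

# Invented sample: three frequent occupations and two rare ones.
occupation = pd.Series(
    ["Student"] * 8 + ["Unemployed"] * 6 + ["Retired"] * 4 + ["Legal"] + ["Farming"]
)
binned = bin_rare_labels(occupation, min_share=0.10)
print(binned.value_counts())
```

Merging rare labels shortens the long tail of the frequency distribution, which is the effect the slide describes as removing outliers and skewness.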
Binning contd..
Outliers: The left-hand image shows two dots. These are outliers, which we handled with binning; the right-hand image shows the categorical column after binning.
Skewness: In the left-hand image the curve is skewed to the right. We handled this with binning as well; the right-hand image shows the categorical column after binning.
Data Analysis
Success of Coupons (Historical Data)
• Coffee House: 28%
• Restaurant(<20): 27%
• Carry out & Take away: 25%
• Bar: 11%
• Restaurant(20-50): 9%
Coffee House, Restaurant(<20) and Carry out & Take away were the most successful coupons.
Age Vs Coupons (Historical Data)
Age    N      Y
<21    164    268
21     862    1271
26     817    1216
31     751    885
36     495    570
41     363    516
46     235    303
50+    692    739
Coupon usage is very high in the 21 to 31 and 50+ age groups. Below 21 years, fewer coupons were distributed and hence fewer were used.
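The N and Y counts in the Age Vs Coupons chart can also be read as acceptance rates, Y / (N + Y), which separates "many coupons used" from "a high share of coupons used". The sketch below assumes the first series in the chart is N and the second is Y, in the age order shown on the axis; the variable names are chosen for the example.

```python
# Counts transcribed from the Age Vs Coupons chart.
# Assumption: first series is N (not used), second is Y (used), in axis order.
ages = ["<21", "21", "26", "31", "36", "41", "46", "50+"]
not_used = [164, 862, 817, 751, 495, 363, 235, 692]
used = [268, 1271, 1216, 885, 570, 516, 303, 739]

# Acceptance rate per age group: used / (used + not used).
rates = {age: y / (n + y) for age, n, y in zip(ages, not_used, used)}

total_rows = sum(not_used) + sum(used)
print(f"rows covered: {total_rows}")
for age, rate in rates.items():
    print(f"{age:>4}: {rate:.1%}")
```

The counts add up to 10147, the row count quoted on the Missing Values slide, which supports reading the two series as a full N/Y split of the data.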
Data Analysis contd..
Occupation Vs Coupon Success (Historical Data)
The success rate is high for the Student, Unemployed, Computer professional and Retired categories.
Marital Status (Historical Data)
Chart data labels: N, 860; Y, 1262
• Single: 40%
• Married partner: 38%
• Unmarried partner: 17%
• Divorced: 4%
• Widowed: 1%
Data Analysis contd..
Multicollinearity Chart
Colour Legend
• Yellow shade: correlation is 0
• Red and dark green: correlation is -1 and +1 respectively
Business Understanding
• Customer ID, Temperature, Time, Weather, Direction, Passenger and Driving Distance have very low impact.
• Age, Has Children, Marital Status, Gender and Occupation have intermediate impact.
• The restaurant-type visit rating has the highest impact.
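A correlation matrix like the one behind the multicollinearity chart can be sketched as follows. The slide does not state how categorical columns were encoded, so this example assumes simple integer codes; the column names and values are invented.

```python
import pandas as pd

# Invented sample; "Y" stands for coupon acceptance (1 = used).
df = pd.DataFrame({
    "temperature": [30, 55, 80, 80, 55, 30],
    "has_children": [0, 1, 1, 0, 1, 0],
    "weather": ["Snowy", "Rainy", "Sunny", "Sunny", "Rainy", "Snowy"],
    "Y": [0, 1, 1, 1, 0, 0],
})

# Assumption: categorical columns are turned into integer codes before correlating.
encoded = df.copy()
for col in ["weather"]:
    encoded[col] = encoded[col].astype("category").cat.codes

# Pearson correlation between every pair of columns, values in [-1, +1].
corr = encoded.corr()
print(corr.round(2))
```

Integer codes impose an arbitrary order on nominal labels, so correlations involving such columns should be read with caution; one-hot encoding is a common alternative.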
Machine Learning Model
ML Models with their accuracy scores
ML Model 1: Logistic Regression, cross-validation accuracy 68.97%
ML Model 2: Decision Tree with hyperparameter tuning, cross-validation accuracy 70.95%
ML Model 3: Random Forest with hyperparameter tuning, cross-validation accuracy 76.46%
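A comparison of the three model families by cross-validated accuracy might look like the sketch below. It uses scikit-learn on synthetic data as a stand-in for the encoded coupon data, so the printed scores will not match the figures on the slide; the model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded coupon data (not the mall's data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in cv_accuracy.items():
    print(f"{name}: {score:.2%}")
```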
Machine Learning Model
Random Forest: hyperparameter tuning to maximise accuracy
No. of Estimators: We used Randomized Search and Grid Search to find the number of estimators (trees) that gives the highest accuracy score, and then used that value in our machine learning model.
No. of Folds: We used 5-fold cross-validation, which creates five train/test splits and produces five accuracy scores; their average is reported as the model's score.
Random State: We set the random state to 80, the value that gave the maximum accuracy score in our model.
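The tuning steps described above can be sketched with scikit-learn's `GridSearchCV`. The data is synthetic and the candidate values for `n_estimators` are hypothetical; only the 5 folds and the random state of 80 come from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the encoded coupon data (not the mall's data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical candidate tree counts; the slide does not list the values tried.
param_grid = {"n_estimators": [10, 25, 50]}

search = GridSearchCV(
    RandomForestClassifier(random_state=80),  # random state as stated on the slide
    param_grid,
    cv=5,                # 5 folds, so 5 accuracy scores per candidate
    scoring="accuracy",
)
search.fit(X, y)

best_trees = search.best_params_["n_estimators"]
best_cv_accuracy = search.best_score_  # mean accuracy over the 5 folds
print(f"best n_estimators: {best_trees}, mean CV accuracy: {best_cv_accuracy:.2%}")
```

`RandomizedSearchCV` from the same module takes `param_distributions` and `n_iter` instead of a full grid and samples candidates, which is cheaper when the search space is large.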
Business Insights
Advantages to Business
1. Coffee House, Restaurant (<20) and Take away coupons are the most successful.
2. Coupons are mostly used by the 21 to 31 and 50+ age groups.
3. Computer workers, retired people, students and the unemployed use the coupons the most.
4. Customers tend to use the coupons if the driving distance is between 5 and 15 minutes.
5. Customers tend to use the coupons mostly when the weather is sunny.
6. Carry away coupon utilization is highest among customers who use carry away 1~3 times a month.
7. Most footfall occurs at 7:00 AM and 6:00 PM, probably to pick up a snack.
Thank You
