Hackathon
Machine Learning
By Pro Squad
Apoorva, Deepak, Kunal & Yogesh
Problem Statement
Problem: A mall is running a coupon campaign and wants to ensure its success using a robust prediction model built with machine learning techniques.
Context: The mall has provided historical data comprising recommended coupons, customer details and coupon consumption details from previous years.
Relevance: The mall is going to run the campaign again. Based on the historical effectiveness of its coupons, it wants to increase footfall, which will in turn increase business for the shops in the mall.
Aims and Objectives: The aim of the project is to derive business insights from the data provided and to train a machine learning model that predicts the success of the campaign with the highest possible accuracy.
Challenges in Historical Data
• 26 features: 9 numerical and 17 categorical
• Missing values in 5 columns
• Categorical columns have many labels, up to 25 labels in a single column
• Categorical data has outliers and skewness
• Most of the features are correlated
Missing Value Treatment
Missing Values
• Car – Only 84 of the 10,147 rows have a value in this column (less than 1%), so we removed the column; it is too sparse to have any impact.
• Bar, CoffeeHouse, CarryAway, RestaurantLessThan20, Restaurant20To50 – Around 2% of the values in these columns are missing, so we filled the gaps with the most commonly occurring value in each column (mode imputation).
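The treatment above can be sketched with pandas. This is a minimal illustration, not the team's actual code: the use of pandas, the toy DataFrame and all of its values are assumptions; only the column names come from the slide.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the mall data; every value here is made up.
df = pd.DataFrame({
    "Car": [np.nan, np.nan, "Scooter", np.nan],
    "Bar": ["never", "less1", np.nan, "never"],
    "CoffeeHouse": ["1~3", np.nan, "1~3", "4~8"],
})

# Drop the column that is almost entirely empty.
df = df.drop(columns=["Car"])

# Fill each remaining gap with that column's most frequent value (the mode).
for col in ["Bar", "CoffeeHouse"]:
    most_common = df[col].mode(dropna=True).iloc[0]
    df[col] = df[col].fillna(most_common)

print(df)
```

The same loop would extend to CarryAway, RestaurantLessThan20 and Restaurant20To50 by adding them to the list.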
Binning
The Occupation column has 25 labels whose frequencies vary widely, which creates outliers and skewness. We used binning to reduce the number of labels, which removed the outliers and the skewness.
Binning contd..
Outliers: In the left-hand image there are two dots. These are outliers, which we tackled with binning; the right-hand image shows the categorical column after binning.
Skewness: In the left-hand image the curve is skewed to the right, which we also tackled with binning; the right-hand image shows the categorical column after binning.
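The slides do not say how the Occupation labels were grouped. One common way to bin a high-cardinality categorical column is to merge rare labels into a single bucket, sketched below. The 10% cut-off, the "Other" bucket name and the sample labels are all assumptions made for illustration.

```python
import pandas as pd

# Made-up occupation labels: three frequent ones and two rare ones.
occupation = pd.Series(
    ["Student"] * 5 + ["Retired"] * 4 + ["Unemployed"] * 3
    + ["Legal"] + ["Farming"],
    name="occupation",
)

MIN_SHARE = 0.10  # assumed cut-off: labels below 10% of rows count as rare

# Share of rows per label, then the labels that fall under the cut-off.
shares = occupation.value_counts(normalize=True)
rare_labels = shares[shares < MIN_SHARE].index

# Keep frequent labels as they are and replace rare ones with "Other".
binned = occupation.where(~occupation.isin(rare_labels), "Other")

print(binned.value_counts())
```

After binning, the column has fewer labels and no label with a very small count, which is the effect described for the right-hand images.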
Data Analysis
Success of Coupons (Historical Data)
Coffee House: 28%
Restaurant(<20): 27%
Carry out & Take away: 25%
Bar: 11%
Restaurant(20-50): 9%
Coffee House, Carry out and Restaurant(<20) were the most successful coupons.
Age Vs Coupons (Historical Data)
Age   <21    21    26    31    36    41    46   50+
N     164   862   817   751   495   363   235   692
Y     268  1271  1216   885   570   516   303   739
Coupon usage is highest in the 21 to 31 age groups and in the 50+ group. Few coupons were distributed to customers below 21, so usage in that group is low as well.
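The counts from the age chart can also be turned into a usage rate per age group, which separates how many coupons a group used from how often it used the coupons it received. The sketch below assumes that the first eight numbers on the slide are the N (not used) series and the last eight are the Y (used) series, in the age order shown.

```python
# Counts read off the Age Vs Coupons chart; the N/Y assignment is an assumption.
age_groups = ["<21", "21", "26", "31", "36", "41", "46", "50+"]
not_used = [164, 862, 817, 751, 495, 363, 235, 692]
used = [268, 1271, 1216, 885, 570, 516, 303, 739]

# Usage rate = coupons used / coupons distributed to that age group.
rates = {}
for group, n, y in zip(age_groups, not_used, used):
    rates[group] = y / (n + y)

for group, rate in rates.items():
    print(f"{group:>4}: {rate:.1%}")

# Sanity check: all counts together should equal the number of rows.
total_rows = sum(not_used) + sum(used)
print("total rows:", total_rows)
```

The sixteen counts add up to 10,147, the row count quoted on the missing-value slide, which supports reading the chart this way.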
Data Analysis contd..
Occupation Vs Coupon Success (Historical Data)
The success rate is high for the Student, Unemployed, Computer professional and Retired categories.
Marital Status (Historical Data)
Bar chart values: N 860, Y 1262
Single: 40%
Married partner: 38%
Unmarried partner: 17%
Divorced: 4%
Widowed: 1%
Data Analysis contd..
Multicollinearity Chart
Colour Legend
• Yellow shade – correlation is 0
• Red and dark green – correlation is -1 and +1
Business Understanding
• Customer ID, Temperature, Time, Weather, Direction, Passenger and Driving Distance have very low impact.
• Age, Has Children, Marital Status, Gender and Occupation have intermediate impact.
• Restaurant type visit rating has the highest impact.
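A correlation matrix of the kind behind such a chart can be computed as sketched below. This is only an illustration under assumptions: the column names, the sample values and the ordinal encoding of the visit-frequency labels are hypothetical, and the slides do not say how the team encoded categorical features.

```python
import pandas as pd

# Made-up sample with numeric, binary and categorical columns.
df = pd.DataFrame({
    "temperature": [30, 55, 80, 80, 55, 30],
    "has_children": [0, 1, 1, 0, 1, 0],
    "coffee_house": ["never", "1~3", "4~8", "1~3", "4~8", "never"],
    "accepted": [0, 1, 1, 1, 1, 0],
})

# Assumed ordinal encoding for the visit-frequency labels.
visit_order = {"never": 0, "1~3": 1, "4~8": 2}
encoded = df.assign(coffee_house=df["coffee_house"].map(visit_order))

# Pairwise Pearson correlation; values range from -1 to +1.
corr = encoded.corr()
print(corr.round(2))
```

Pairs of features with values close to -1 or +1 are the ones a heat map would show in the strongest colours.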
Machine Learning Model
ML Models with their accuracy scores
ML Model 1: Logistic Regression – cross-validation accuracy 68.97%
ML Model 2: Decision Tree (hyperparameter tuning) – cross-validation accuracy 70.95%
ML Model 3: Random Forest (hyperparameter tuning) – cross-validation accuracy 76.46%
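A comparison of this kind can be sketched with scikit-learn as follows. The library choice, the synthetic data from `make_classification` and the model settings are assumptions for illustration, so the printed scores will not match the figures on the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the coupon dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The three model families named on the slide, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.2%}")
```

Scoring every model with the same folds and the same metric keeps the comparison like for like.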
Machine Learning Model
Random Forest – hyperparameter tuning to improve accuracy
No. of Estimators: We used Randomized Search and Grid Search to find the number of estimators (trees) that gives the highest accuracy score, and then used that value in our machine learning model.
No. of Folds: We used 5 folds to create random train and test splits within the model. This generates 5 accuracy scores, and their average was taken as the model's score.
Random State: We set the random state to 80, which gave the maximum accuracy score in our model.
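The tuning steps above can be sketched with scikit-learn's `RandomizedSearchCV` and `GridSearchCV`. This is a minimal sketch under assumptions: the slides do not say how the two searches were combined, so running a coarse randomized search first and a fine grid search second is one plausible reading, and the data and candidate values are made up. Only the 5 folds and the random state of 80 come from the slide.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic data standing in for the coupon dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

forest = RandomForestClassifier(random_state=80)

# Step 1: coarse randomized search over the number of trees, 5-fold CV.
coarse = RandomizedSearchCV(
    forest,
    param_distributions={"n_estimators": list(range(10, 101, 10))},
    n_iter=4,
    cv=5,
    scoring="accuracy",
    random_state=80,
)
coarse.fit(X, y)
best_n = coarse.best_params_["n_estimators"]

# Step 2: fine grid search around the best coarse value, again 5-fold CV.
fine = GridSearchCV(
    forest,
    param_grid={"n_estimators": [best_n - 5, best_n, best_n + 5]},
    cv=5,
    scoring="accuracy",
)
fine.fit(X, y)

# best_score_ is the mean accuracy over the 5 folds for the best setting.
print(fine.best_params_, round(fine.best_score_, 4))
```

Both searches report the mean of the five fold accuracies, which corresponds to the averaging described under No. of Folds.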
Business Insights
Advantages to Business
1. Coffee House, Restaurant (<20) and Take away coupons are the most successful.
2. Coupons are mostly used by the 21 to 31 and 50+ age groups.
3. Computer workers, retired people, students and the unemployed use the coupons most.
4. Customers tend to use the coupons if the driving distance is between 5 and 15 minutes.
5. Customers tend to use the coupons mostly when the weather is sunny.
6. Carry away coupon utilization is highest among customers who use carry away 1 to 3 times a month.
7. Most footfall is at 7:00 AM and 6:00 PM, probably to pick up a snack.
Thank You
