SlideShare a Scribd company logo
Wayfair
Cutomer data analysis
Goal of the Project
1. Whether a B2B customer will purchase or not in
the next 30 days
2. How much a B2B customer will spend in the next
30 days
Steps to Process
• Data Cleaning
• Data Splitting
• Model Selection
• Model Evaluation
• Feature Importance
• Model Optimization
• Final Output
Data Cleaning
1.First I divided the columns into two parts :
• Categorical Column
• Numerical Column
2. 75% of the numerical column has values 0.0
I put 0.0 in null values.
3. The categorical columns are given values
0,1,2,3..to help the operation.
For example:
Purchase id : ‘None’ : 0,
’1to2’ : 1,
4.This is how all the columns are turned into
numerical columns
Data Splitting
• K fold method
• Train test split
• K Nearest Neighbor (KNN)
• Logistic Regression (LR)
• Stochastic Gradient Descent (SDGC)
• Naïve Byes (NB)
• Random Forest (RF)
• Gradient Boosting (GB)
• Decision Tree (DT)
• Lasso Regression
• ElasticNet Regression
• XGB Boost
• Ridge Regression
Model Selection
K Fold
Method
Train Test
Split
Method
Classification Model Evaluation
For ‘Convert_30’ based on accuracy and AUC score
36,15
%
45,80
%
85,20
%
Baseline and Optimization
Classification Problem
• After analyzing AUC and
Accuracy Score the model
Random Forest, Gradient
Boosting and Naïve Byes are
three best models for the
classification problem.
• The results shows Gradient
Boosting as the best model .
Feature Importance
A Startup PowerPoint Presentation
Positively Co-related
• The Top three Positively
Correlated features are :
• Days since last visit
• Number of total page view
thirty seven days
• Current Status
7
6
5
• The Top three Negatively Correlated
features are :
• Number of billing
• Average quote price
• Number of tasks attendance in year
sixty
Negatively Co-
related
• The Optimized Gradient Boosting
Score is 0.891
• According to AUC and ROC curve
the normalized data give AUC of
0.755
AUC ROC Curve
• After Running the dataset through
Linear ,Lasso, Ridge, Random
Forest and Gradient Boosting
egression, we get these rmse
scores.
• Gradient Boosting gives lowest
rmse score among them
Regression Model Evaluation
Based on Root Means Square Error
Model Selection and Final
Output
For Predicting ‘Convert_30’ and ’Revenue Gradient Boosting algorithm gives out best scores

More Related Content

Similar to Wayfair-Data Science Project

30thSep2014
30thSep201430thSep2014
30thSep2014
Mia liu
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
khairulhuda242
 

Similar to Wayfair-Data Science Project (20)

Six sigma
Six sigma Six sigma
Six sigma
 
Six Sigma Workshop for World Bank, Chennai - India
Six Sigma Workshop for World Bank, Chennai -  IndiaSix Sigma Workshop for World Bank, Chennai -  India
Six Sigma Workshop for World Bank, Chennai - India
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
1710 track3 zhu
1710 track3 zhu1710 track3 zhu
1710 track3 zhu
 
Reducing the Cost of the Linear Growth Effect using Adaptive Rules with Unlin...
Reducing the Cost of the Linear Growth Effect using Adaptive Rules with Unlin...Reducing the Cost of the Linear Growth Effect using Adaptive Rules with Unlin...
Reducing the Cost of the Linear Growth Effect using Adaptive Rules with Unlin...
 
When Should I Use Simulation?
When Should I Use Simulation?When Should I Use Simulation?
When Should I Use Simulation?
 
Dadm (lys)
Dadm (lys)Dadm (lys)
Dadm (lys)
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1Week 12 Dimensionality Reduction Bagian 1
Week 12 Dimensionality Reduction Bagian 1
 
BMDSE v1 - Data Scientist Deck
BMDSE v1 - Data Scientist DeckBMDSE v1 - Data Scientist Deck
BMDSE v1 - Data Scientist Deck
 
Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016 Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016
 
machineLearningTypingTool_Rev1
machineLearningTypingTool_Rev1machineLearningTypingTool_Rev1
machineLearningTypingTool_Rev1
 
Kaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning ChallengeKaggle Higgs Boson Machine Learning Challenge
Kaggle Higgs Boson Machine Learning Challenge
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Ali upload
Ali uploadAli upload
Ali upload
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Customer_Churn_prediction.pptx
Customer_Churn_prediction.pptxCustomer_Churn_prediction.pptx
Customer_Churn_prediction.pptx
 
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
Like-for-Like Comparisons of Machine Learning Algorithms - Dominik Dahlem, Bo...
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Virtual segment brief
Virtual segment briefVirtual segment brief
Virtual segment brief
 

Recently uploaded

Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Domenico Conte
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
MAQIB18
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 

Recently uploaded (20)

Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
Professional Data Engineer Certification Exam Guide  _  Learn  _  Google Clou...
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 

Wayfair-Data Science Project

  • 2. Goal of the Project 1. Whether a B2B customer will purchase or not in the next 30 days 2. How much a B2B customer will spend in the next 30 days Steps to Process • Data Cleaning • Data Splitting • Model Selection • Model Evaluation • Feature Importance • Model Optimization • Final Output
  • 3. Data Cleaning 1.First I divided the columns into two parts : • Categorical Column • Numerical Column 2. 75% of the numerical column has values 0.0 I put 0.0 in null values. 3. The categorical columns are given values 0,1,2,3..to help the operation. For example: Purchase id : ‘None’ : 0, ’1to2’ : 1, 4.This is how all the columns are turned into numerical columns
  • 4. Data Splitting • K fold method • Train test split • K Nearest Neighbor (KNN) • Logistic Regression (LR) • Stochastic Gradient Descent (SDGC) • Naïve Byes (NB) • Random Forest (RF) • Gradient Boosting (GB) • Decision Tree (DT) • Lasso Regression • ElasticNet Regression • XGB Boost • Ridge Regression Model Selection
  • 5. K Fold Method Train Test Split Method Classification Model Evaluation For ‘Convert_30’ based on accuracy and AUC score
  • 6. 36,15 % 45,80 % 85,20 % Baseline and Optimization Classification Problem • After analyzing AUC and Accuracy Score the model Random Forest, Gradient Boosting and Naïve Byes are three best models for the classification problem. • The results shows Gradient Boosting as the best model .
  • 7. Feature Importance A Startup PowerPoint Presentation Positively Co-related • The Top three Positively Correlated features are : • Days since last visit • Number of total page view thirty seven days • Current Status 7 6 5 • The Top three Negatively Correlated features are : • Number of billing • Average quote price • Number of tasks attendance in year sixty Negatively Co- related
  • 8. • The Optimized Gradient Boosting Score is 0.891 • According to AUC and ROC curve the normalized data give AUC of 0.755 AUC ROC Curve
  • 9. • After Running the dataset through Linear ,Lasso, Ridge, Random Forest and Gradient Boosting egression, we get these rmse scores. • Gradient Boosting gives lowest rmse score among them Regression Model Evaluation Based on Root Means Square Error
  • 10. Model Selection and Final Output For Predicting ‘Convert_30’ and ’Revenue Gradient Boosting algorithm gives out best scores