Instacart Market
Basket Analysis
Marine Veits
Motivation
Create a classification model that can predict
which previously purchased products will be in a
user’s next order, to better understand users’
preferences and optimize the recommendation
system.
Procedure
Features
User features:
User total orders
User avg. cart size
User total products
User avg. days since prior order
User avg. reorder per cart
User avg. days between orders
Product features:
Product total orders
Product avg. add to cart order
Product total reorder
Product reorder probability
Department total order
Aisle total order
Avg. order hour of day
Avg. order day of week
User-Product features:
User-product total orders
User-product latest in cart
User-product avg. add to cart
User-product order frequency
Orders since previous product order
User-product avg. days between orders
User-product order days max
User-product days since last product order
User-product avg. order hour of day
User-product avg. order day of week
User-product days since last order max
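
The editor's notes define two of these features as ratios: user_product_order_freq = user_product_total_orders / user_total_orders, and product_reorder_probability = product_total_reorder / product_total_orders. A minimal sketch of both follows, in plain Python rather than the PostgreSQL used for the deck. The toy rows, the tuple layout, and the function name build_features are assumptions made for illustration, not the author's pipeline.

```python
from collections import Counter

# Hypothetical toy order history: (user_id, order_number, product_id, reordered).
# The layout loosely mirrors the Kaggle tables; the rows themselves are made up.
order_lines = [
    (1, 1, "banana", 0),
    (1, 1, "milk", 0),
    (1, 2, "banana", 1),
    (1, 3, "banana", 1),
    (1, 3, "milk", 1),
    (2, 1, "banana", 0),
    (2, 2, "bread", 0),
]


def build_features(lines):
    """Return (user_product_order_freq, product_reorder_probability) as dicts."""
    user_orders = {}                  # user -> set of that user's order numbers
    user_product_total = Counter()    # (user, product) -> times ordered
    product_total_orders = Counter()  # product -> times ordered by anyone
    product_total_reorder = Counter() # product -> times flagged as a reorder

    for user, order, product, reordered in lines:
        user_orders.setdefault(user, set()).add(order)
        user_product_total[(user, product)] += 1
        product_total_orders[product] += 1
        product_total_reorder[product] += reordered

    # Share of a user's orders that contain the product.
    freq = {
        (user, product): count / len(user_orders[user])
        for (user, product), count in user_product_total.items()
    }
    # Share of a product's purchases that were reorders.
    prob = {
        product: product_total_reorder[product] / count
        for product, count in product_total_orders.items()
    }
    return freq, prob


user_product_order_freq, product_reorder_probability = build_features(order_lines)
```

In this toy data user 1 bought bananas in all three of their orders, so that pair gets a frequency of 1.0, while bread was never reordered and gets a probability of 0.0.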
Results: XGBoost & F1 Optimization
Scores      XGBoost: Training Baseline    XGBoost: Test + Optimization
F1          0.206                         0.396
Accuracy    0.906                         0.862
Precision   0.62                          0.456
Recall      0.123                         0.350
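
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two rows. The helper name f1_score below is an assumption for this sketch; the input numbers are the ones reported on the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
baseline_f1 = f1_score(0.62, 0.123)
optimized_f1 = f1_score(0.456, 0.350)

print(round(baseline_f1, 3), round(optimized_f1, 3))
```

The optimized column reproduces the reported 0.396. The baseline comes out near 0.205, which is within rounding of the reported 0.206 because precision is given to only two decimals.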
Threshold Adjustment - F1 Optimization
Decision Threshold: 0.199
Class balance: negative ~90%, positive ~10%
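
One common way to arrive at a threshold like this is to scan candidate cut-offs over predicted probabilities and keep the one with the highest F1. The deck does not show how its 0.199 was found, so the sketch below is only an illustration of the idea: the labels, probabilities, candidate grid, and function names are all made up for the example.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN, TN when predicting positive for prob >= threshold."""
    tp = fp = fn = tn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_at(y_true, y_prob, threshold):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when there are no positives at all."""
    tp, fp, fn, _ = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return the first candidate threshold that maximizes F1."""
    return max(candidates, key=lambda t: f1_at(y_true, y_prob, t))


# Made-up imbalanced toy data: 7 negatives, 3 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_prob = [0.02, 0.05, 0.10, 0.15, 0.30, 0.08, 0.12, 0.25, 0.45, 0.18]

candidates = [i / 100 for i in range(1, 100)]
best = best_threshold(y_true, y_prob, candidates)
```

On this toy data the default cut-off of 0.5 predicts no positives at all, so F1 is 0, while a much lower threshold recovers all three positives at the cost of one false positive. The threshold should be chosen on validation data and then applied unchanged to the test set.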
Feature Importance
Conclusions
● Additional model features could
further improve model scores
● Additional metrics, such as dates,
user location, user personal
information, could further improve
the model
● Pickle your models and results!
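
A minimal sketch of the last point, using Python's standard pickle module. The results dictionary stands in for a fitted model object, which can usually be saved the same way; the key names and the file name results.pkl are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model plus its evaluation results (values from the slides).
results = {
    "model_name": "xgboost",
    "decision_threshold": 0.199,
    "test_f1": 0.396,
}

# Write to a temporary directory so the example leaves no files behind in the project.
path = os.path.join(tempfile.mkdtemp(), "results.pkl")

with open(path, "wb") as f:
    pickle.dump(results, f)

# Later, or in another session: load the object back instead of retraining.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.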

Metis Project 3: Predicting Instacart Product Reorder - Kaggle Challenge


Editor's Notes

  1. For this project I worked on a past Kaggle competition, Instacart Market Basket Analysis.
  2. The main goal of this problem was to create a classification model that can predict which previously purchased products will be in a user’s next order, to better understand users’ preferences and optimize the recommendation system.
  3. Basically, we want to use each user’s order history to make relevant recommendations for that user and maximize the probability that they order a recommended product.
  4. I used the Instacart dataset from Kaggle, which includes more than 30M rows of data on more than 200k Instacart users. The dataset includes 5 relational tables (orders, products, product orders, aisles and departments). Data processing was done using PostgreSQL. To create and test models for this data I used AWS (64 GiB). The initial evaluation, training and testing of my models was done on 10% of the data to save time, and then I moved on to the whole dataset. For my final model I engineered 25 product, user, time and product-user features. During model training the data was split twice to create a holdout set: first 80% for training and 20% for testing, and then the training set was split again, 75% for training and 25% for validation. The best classification model for this data was XGBoost.
  5. The target of my model was the reorder of a product by a user: 1 if a specific product will be reordered by a specific user in the next cart and 0 if the product won’t be reordered. During model selection and training I created multiple features using the metrics that were given in the dataset. These are the final 25 features I used in my classification model.
     User features:
       1. user_total_orders: how many orders a user has
       2. user_avg_cartsize: average number of products they buy in an order
       3. user_total_products: how many different products they've bought over time
       4. user_avg_days_since_prior_order: how long they typically wait between orders
       5. user_avg_reorder_per_cart: average number of reorders per cart
       6. avg_days_between_orders
       7. user_avg_order_hod: average order hour of day
       8. user_avg_order_dow: average order day of the week
     Product features:
       1. product_total_orders: product popularity across all users
       2. product_avg_add_to_cart_order: typical priority level in an order, found by averaging its add_to_cart_order
       3. product_total_reorder: number of times the product was reordered
       4. product_reorder_probability = product_total_reorder / product_total_orders
       5. department_total_order: categorical
       6. aisle: categorical
     User-Product features:
       1. user_product_total_order: how many times a user ordered a product
       2. latest_in_cart: products in the user’s latest cart
       3. user_product_avg_add_to_cart_order: a sense of how much priority each user places on each product, from the typical add_to_cart_order for that user-product combination
       4. user_product_order_freq = user_product_total_orders / user_total_orders: % of times a product occurs across all of a user's orders
       5. orders_since_previous_product_order: how many orders the user has made since the last time they ordered the product
       6. user_product_order_days_max: max days of product order per user
       7. user_product_days_since_last_product_order: how many days have passed since the product was last ordered
       8. user_product_days_last_order_max = days_since_previous_product_order - user_product_order_days_max
  6. After evaluating and comparing various scores, the best classification model for this dataset was XGBoost. In addition, because of target class imbalance (reordered products were only 10% of all targets in this data), I adjusted the decision threshold to optimize the F1 score after model training. We can see improvement in the F1 and recall scores. Accuracy = (TP+TN)/(TP+TN+FP+FN): share of all samples classified correctly. Precision = TP/(TP+FP): correctly classified positives out of all positive predictions. Recall (sensitivity) = TP/(TP+FN): correctly classified positives out of all actual positives.
  7. Adjusting the decision threshold to optimize F1 after model training gave the highest score in this case. The adjustment produced a decision threshold of 0.199, so I used this threshold on my final test set.
  8. I found that the most important features in this model were user-product order frequency, product reorder probability, average reorders per cart and user total orders. user_product_order_freq = user_product_total_orders / user_total_orders: % of times a product occurs across all of a user's orders. product_reorder_probability = product_total_reorder / product_total_orders. user_avg_reorder_per_cart: average number of reorders per cart. user_total_orders: how many orders a user has.
  9. To conclude: additional model features could further improve model scores. Additional metrics, such as dates, user location and user personal information, could further improve the model. Pickle your models and results!
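
The two-stage split described in note 4 (80/20 for train/test, then 75/25 of the training part for train/validation) works out to 60% train, 20% validation and 20% test overall. A minimal sketch under stated assumptions: the notes do not say whether rows or users were split, or which tool was used, so the function name two_stage_split, the fixed seed, and the use of integer ids are illustrative only.

```python
import random


def two_stage_split(items, seed=0):
    """Shuffle, hold out 20% for test, then 25% of the remainder for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible

    # Stage 1: 80% train+validation, 20% test.
    n_test = round(0.20 * len(shuffled))
    test, rest = shuffled[:n_test], shuffled[n_test:]

    # Stage 2: of the remaining 80%, 75% train and 25% validation.
    n_val = round(0.25 * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# Hypothetical ids standing in for whatever unit is being split.
train, val, test = two_stage_split(range(1000))
print(len(train), len(val), len(test))  # 600 200 200
```

If the ids represent users rather than rows, all of a user's orders stay in the same partition, which avoids the same user appearing in both training and test data.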