Data Science @ Instacart
Sharath Rao
Data Scientist / Manager
Search and Discovery
Collaborators: Angadh Singh and Shishir Prasad
v
The Instacart Value Proposition
Groceries from stores
you love
delivered
to your
doorstep
in as little
as an hour
+ + + =
v
Customer Experience
Select a

Store
Shop for
Groceries
Checkout Select Delivery
Time
Delivered
to Doorstep
v
Shopper Experience
Accept Order Find the
Groceries
Out for
Delivery
Delivered
to Doorstep
Scan Barcode
v
Four Sided Marketplace
Customers Shoppers
Products

(Advertisers)
Search
Advertising
Shopping
Delivery
Customer Service
Inventory
Picking
Loyalty
Stores

(Retailers)
v
Two topics today
A Recommendation System
for Discovery
Using Data Science for out of
stock mitigation
v
Online grocery vs Traditional e-commerce
[Figure: Online Grocery vs Traditional e-commerce over Week 1, Week 2, Week 3]
v
Grocery Shopping in “Low Dimensional Space”
Search
Restock
Explore
+
+
=
v
Why personalization at Instacart
Your store
Everybody’s store
v
Repeat purchases increase LTV of recommendations
Today: $5.49
A year later: 1 + … + 100 purchases = $549
v
Different recommendation systems address different needs
v
Personalized Top N recommendations
Promote broad-based discovery
in a dynamic catalog
Including from stores customers
may have never shopped
v
Run out of X?
Rank products by
repurchase probability
v
Personalized recommendations of new products
when customers seek out what is new out there
Also addresses product cold start problems
v
Replacement Product Recommendations
Mitigate adverse impact of
last-minute out of stocks
v
“Frequently bought with” Recommendations
Not necessarily
consumed together
Help customers shop for
complementary products
and try alternatives
Probably
consumed together
v
Personalized Top N Recommendations
v
Learning from feedback
Traditionally collaborative filtering used explicit feedback to predict ratings
There may still be bias in whether the user chooses to rate
Explicit Feedback Implicit Feedback
v
Learning from Explicit Feedback
• Explicit feedback may be more reliable but there is much less of it
• Less reliable if users rate based on aspirations instead of true preferences
vs
v
Implicit Feedback - trade-off quality and quantity
[Figure: Strength of evidence vs. Number of events]
v
Architecture
[Diagram: Event Data (x2); Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
v
A Matrix Factorization Formulation for Implicit Feedback
User Product Matrix R (M Users x N Products); "-" marks a pair with no purchases
1   -   -
9   -   -
-   3   20
Binary preferences: Preference Matrix R (M x N)
1   0   0
1   0   0
0   1   1
“Collaborative Filtering for Implicit Feedback Datasets” - Hu et al.
v
A Matrix Factorization Formulation for Implicit Feedback
$R \approx X^{T} Y$
Preference Matrix R (M x N)
1   0   0
1   0   0
0   1   1
X^T: User Factors (M x k)
Y: Product Factors (k x N)
v
Matrix Factorization from Implicit Feedback - The Intuition
#Purchases | Preference p | Confidence c
0          | 0            | Low
1          | 1            | Low
>>1        | 1            | High
• Confidence increases linearly with purchases r
• c = 1 + alpha * r
• alpha controls the marginal rate of learning from user purchases
• Key questions
• How should the unobserved events be treated
• How should one trade-off observed and the unobserved
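The mapping from purchase counts to preference and confidence can be sketched as follows. This is a minimal illustration of c = 1 + alpha * r from the slide; the function name and the value alpha = 40 are assumptions chosen for the example, not values from the talk.

```python
import numpy as np

def preference_and_confidence(purchases, alpha=40.0):
    """Map raw purchase counts r to binary preference p and confidence c."""
    r = np.asarray(purchases, dtype=float)
    p = (r > 0).astype(float)   # p = 1 if the user ever bought the product, else 0
    c = 1.0 + alpha * r         # c = 1 + alpha * r; unobserved pairs keep c = 1
    return p, c

# Illustrative counts: never bought, bought once, bought 20 times.
p, c = preference_and_confidence([0, 1, 20], alpha=40.0)
```

Unobserved pairs are not dropped: they enter the model as p = 0 with the minimum confidence of 1, which is one way to answer the "how should the unobserved events be treated" question.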
v
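Regularized Weighted Squared Loss
$$\min_{X, Y} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^{T} y_i \right)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)$$
c_ui: Confidence; p_ui: Preference Matrix entry; x_u: factor vector of user u (User Factors Matrix); y_i: factor vector of product i (Product Factors Matrix); lambda term: Regularization
Solve using Alternating Least Squares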
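A toy, dense version of Alternating Least Squares for this weighted loss is sketched below. It follows the closed-form updates from Hu et al. and is only meant to show the alternation; it is not the Spark implementation used in the talk. The function names, the 3 x 3 count matrix, and the default values of k, alpha and lam are assumptions for illustration. Factors are stored one row per user or product, so predictions are X @ Y.T.

```python
import numpy as np

def implicit_als(R, k=2, alpha=40.0, lam=0.1, iterations=10, seed=0):
    """Dense toy ALS for implicit feedback; R holds purchase counts (M x N)."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = (R > 0).astype(float)                  # binary preferences
    C = 1.0 + alpha * R                        # confidence weights
    X = rng.normal(scale=0.1, size=(M, k))     # user factors, one row per user
    Y = rng.normal(scale=0.1, size=(N, k))     # product factors, one row per product
    reg = lam * np.eye(k)
    for _ in range(iterations):
        # Fix Y, solve a small weighted ridge regression for each user.
        for u in range(M):
            Yw = Y.T * C[u]                    # scales each product by its confidence
            X[u] = np.linalg.solve(Yw @ Y + reg, Yw @ P[u])
        # Fix X, solve the same problem for each product.
        for i in range(N):
            Xw = X.T * C[:, i]
            Y[i] = np.linalg.solve(Xw @ X + reg, Xw @ P[:, i])
    return X, Y

def weighted_loss(R, X, Y, alpha=40.0, lam=0.1):
    """Regularized weighted squared loss for the factors X, Y."""
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    err = P - X @ Y.T
    return float((C * err ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))

R = np.array([[1, 0, 0], [9, 0, 0], [0, 3, 20]], dtype=float)
X, Y = implicit_als(R)
```

Each inner solve is an exact minimizer for one row with the other side held fixed, so the loss never increases from one sweep to the next.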
v
Architecture
[Diagram: Event Data (x2); Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
v
Spark ALS Hyper-parameter Tuning
• rank k - diminishing returns after 150
• alpha - controls rate of learning from observed events
• iterations - ALS tends to converge within 5, seldom more than 10
• lambda - regularization parameter
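One simple way to explore these four hyper-parameters is an exhaustive grid, sketched here in plain Python. The key names follow the parameter names of Spark ML's ALS estimator (rank, alpha, maxIter, regParam); the candidate values are hypothetical and not the ones used in the talk.

```python
from itertools import product

# Hypothetical search grid; values are illustrative only.
grid = {
    "rank": [50, 100, 150],       # k: diminishing returns reported after 150
    "alpha": [1.0, 10.0, 40.0],   # rate of learning from observed events
    "maxIter": [5, 10],           # ALS tends to converge within 5 to 10 iterations
    "regParam": [0.01, 0.1],      # lambda
}

def candidate_settings(grid):
    """Yield one dict per combination of hyper-parameter values."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))

settings = list(candidate_settings(grid))
```

Each setting would then be trained and compared on an offline metric before any live test.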
v
Architecture
[Diagram: Event Data (x2); Generate User-Product Matrix; ALS Matrix Factorization (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity]
v
Scoring user and products
With millions of products and users, scoring every (user, product) pair is prohibitive
Two goals in selecting products to score
• Long-tail products that have not been discovered
• Products that have an a priori high purchase rate (popular)
v
Trade-off popularity and discovery in the tail
We start with simple stratified sampling
For each user, score N products
Sample h products from Head
Sample t products from tail
N ~ 10000
h ~ 3000
t ~ 7000
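The stratified sampling step can be sketched as below, assuming the catalog is already sorted by popularity. The function name and the `head_size` cut-off are hypothetical, since the slides do not say how the head is delimited.

```python
import random

def select_candidates(products_by_popularity, head_size, h, t, seed=None):
    """Pick h products from the popular head and t from the long tail."""
    rng = random.Random(seed)
    head = products_by_popularity[:head_size]   # assumed: most popular first
    tail = products_by_popularity[head_size:]
    picked_head = rng.sample(head, min(h, len(head)))
    picked_tail = rng.sample(tail, min(t, len(tail)))
    return picked_head + picked_tail

# Scaled-down example: 100 products, top 20 treated as the head.
catalog = list(range(100))
candidates = select_candidates(catalog, head_size=20, h=3, t=7, seed=1)
```

With the slide's numbers this would be called with h = 3000 and t = 7000, so each user is scored against roughly 10000 products instead of the full catalog.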
v
Tuning Spark For ALS
Understanding Spark execution model and its implementation of ALS helps
• Training is communication heavy [1]; set partitions <= #CPU cores
• Scoring is memory intensive
• Broad guidelines [2]
• Limit executor memory to 64GB
• 5 cores per executor
• Set executors based on data size
[1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html
[2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
v
A/B Test Setup
[Diagram: Event Data (x2); Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking]
Weekly for past N months data
Weekly for users with recent activity
v
A/B Test Results
• Statistically significant increases
• Items per order
• GMV per order
• Total product sales spread over more
categories
v
Ok, we have a recommendation system
Where do we go from here?
v
What else do you do with user and product factors?
Score (user, product) pair on demand
Get Top N similar users
Get Top N similar products
As features in other models
v
Products similar to “Haigs Spicy Hummus”
More “Spicy Hummus”
Spicy Salsas
Generated using Approximate Nearest Neighbor
(“annoy” from Spotify)
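Finding similar products from the learned product factors can be illustrated with an exact, brute-force cosine search. This is a stand-in for the approximate nearest neighbor index mentioned on the slide, which trades a little accuracy for speed on large catalogs; the function name and the tiny factor matrix are assumptions for the example.

```python
import numpy as np

def top_n_similar(product_factors, query_index, n=5):
    """Return the n products whose factor vectors are closest in cosine similarity."""
    Y = np.asarray(product_factors, dtype=float)
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    unit = Y / np.clip(norms, 1e-12, None)      # guard against zero vectors
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf                 # never return the query itself
    order = np.argsort(-sims)[:n]
    return [(int(i), float(sims[i])) for i in order]

# Four hypothetical products with 2-dimensional factors.
factors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = top_n_similar(factors, query_index=0, n=2)
```

The same factors support scoring a (user, product) pair on demand as the dot product of the user vector and the product vector.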
v
What next
• Make recommendations more contextual
• Explain recommendations (“Because you did X”)
v
Mitigating the effect of out of stocks
v
• what are out of stocks
• why do they happen
• how data science helps mitigate effects
v
Out of stocks - Customer Context
“Deliver Ice Cream from Whole Foods
Market SOMA at 8 pm tomorrow”
v
Online services
[2x2 grid: Supply (Infinite / Limited) by Fulfillment (Immediate / Future)]
v
Traditional E-commerce
• Manage inventory in warehouses optimized for quick
fulfillment
• Customers only specify the “What”
• Disallow users from ordering out of stock products
• Set expectations
• “3 day shipping” but will ship in 10 business days
v
On-demand delivery from local retailers
• Shoppers navigate a complex environment where products
• may have run out
• may be misplaced
• may be damaged
• Customers specify “What”, “When” and “Where from”
• Improvise under uncertainty
v
Everybody loses when out of stocks happen
Customers
• don’t get exactly what they want
• must contemplate and/or communicate replacements
Shoppers
• waste time searching for items that aren’t in store
• context switch to searching and communicating replacements
Advertisers (brands)
lose revenue and trust of customers
Stores (Retailers)
lose revenue and trust of customers
v
Out of stock rate - an illustration
v
v
A probable solution
Do not show or allow customers to order items
that are currently out of stock
v
A probable (but terrible) solution
• Customers really know these stores
• “Missing” items is seen as a sign of an unreliable catalog/service
• May have been out of stock this morning but could be available when the
order is fulfilled
• Sets up negative spirals
“I was there over the
weekend. Please check behind
the cheeses aisle”
“Are you telling
me they don’t carry
strawberries?”
v
Solution that works reasonably well
• Shoppers can see Instacart recommended
replacements while shopping in the store
• Customers may also specify or choose from
recommended replacements
• Relatively more flexibility with groceries
• Some services offer to cancel the order if
an item isn’t available
v
Instacart Recommended Replacements
Flavor, Packing, Size, Brand, Price, Diet Info
• Several product attributes matter
• Context matters, might benefit from personalization
• Must scale to millions of products
• Not always symmetric
• May be ok to replace X with gluten free X but not the other way around
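The asymmetry in the gluten-free example can be captured with a directional check on dietary attributes. The rule below is a hypothetical illustration, not Instacart's replacement logic: a replacement is allowed only if it keeps every dietary property of the original item.

```python
def can_replace(original_diet_tags, replacement_diet_tags):
    """True if the replacement preserves all dietary properties of the original."""
    return set(original_diet_tags) <= set(replacement_diet_tags)

regular_pasta = set()                  # no dietary claims
gluten_free_pasta = {"gluten-free"}

forward = can_replace(regular_pasta, gluten_free_pasta)    # X -> gluten free X
backward = can_replace(gluten_free_pasta, regular_pasta)   # gluten free X -> X
```

Here `forward` is allowed and `backward` is not, so a symmetric similarity score alone would not be enough to rank replacements.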
v
Replacement Recommendations for Shoppers
• Shoppers are trained to pick replacements
• But shoppers can benefit from algorithmic suggestions
• Many unfamiliar products in a vast catalog
• Validation for common products
• Finding replacements fast improves operational efficiency
v
Replacement Recommendations for Customers
• Customers can specify replacements while placing the order
• Can choose to communicate with the shopper in store to verify
v
What could we do if we could predict item availability?
Customer
location
Nearest
store
Farther, but
better availability
Controlling for retailer and quality,
customer is indifferent to physical location
v
The Item Availability Prediction
Probability( Item in store | time, context)
What is the probability that an item will be at
the store when the shopper shows up to
look for it?
v
Item Availability as a Classification Problem
TIMESTAMP, ITEM IDENTIFIED, IN STORE?
• Millions of examples from historical data
• Feature Engineering
• historical availability at multiple resolutions
• E.g., time since last “not found” event
• Item attributes
• E.g., perishables restocked differently than personal care
• Temporal Features
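As an illustration of the historical availability features, the sketch below computes the time since the last "not found" event for an item at a given prediction time. The function name and the data layout (a sorted list of event timestamps per item) are assumptions made for the example.

```python
from bisect import bisect_right
from datetime import datetime

def hours_since_last_not_found(not_found_times, as_of):
    """Hours between as_of and the latest "not found" event at or before it.

    not_found_times must be sorted ascending. Returns None when the item has
    no such event yet, so later events never leak into the feature.
    """
    idx = bisect_right(not_found_times, as_of)
    if idx == 0:
        return None
    return (as_of - not_found_times[idx - 1]).total_seconds() / 3600.0

events = [datetime(2017, 1, 1, 8), datetime(2017, 1, 1, 12)]
feature = hours_since_last_not_found(events, datetime(2017, 1, 1, 15))
```

Restricting the lookup to events at or before the prediction time keeps training features consistent with what is known when scoring.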
v
Training and Scoring
[Diagram: Event Data, Feature Extraction, XGBoost Training, Model Store; Event Data, Feature Extraction, Scoring, Cache availability scores]
Training: weekly with over 2 months of training data
Scoring: tens of millions of items every hour
v
Serving and Optimization Layer
[Diagram: Order (Items, eligible store locations) and Availability scores feed the Fulfillment Engine, which produces a Fulfillment plan: Store location, Shopper etc.]
Active in production with an acceptable trade-off between fulfillment efficiency and refund rate
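One way availability scores could inform store selection is sketched below: among the eligible store locations, pick the one with the highest expected number of items found. This is a deliberately simplified assumption; it ignores distance, shopper supply and the efficiency versus refund-rate trade-off that the real fulfillment engine balances. The function name, store names and scores are hypothetical.

```python
def pick_store(order_items, availability):
    """availability: {store: {item: probability}}; returns (store, expected_found)."""
    best = None
    for store, scores in availability.items():
        # Expected number of items found is the sum of per-item probabilities.
        expected_found = sum(scores.get(item, 0.0) for item in order_items)
        if best is None or expected_found > best[1]:
            best = (store, expected_found)
    return best

availability = {
    "nearest": {"ice cream": 0.50, "milk": 0.90},
    "farther": {"ice cream": 0.95, "milk": 0.90},
}
store, expected_found = pick_store(["ice cream", "milk"], availability)
```

In this toy case the farther store wins because the ice cream is far more likely to be on the shelf there.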
v
What's next
• Leverage model predictions for other features/data products
• Avoid negative feedback loops!
• Biased training data
• only have access to what is ordered through Instacart
• Tighter integrations with retailer data
• Scaling: continue to score a growing catalog at tight SLAs
WE’RE HIRING!


@sharathrao
v
Appendix
v
Offline evaluation
• Ideally we want to evaluate user response to recommendations
• But we will only know this from a live A/B test
• Recall-based metrics are an offline proxy (albeit not the best)
• Recall: “Fraction of purchased products covered among Top N
recommendations”
• We only use this for hyperparameter tuning
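The recall metric defined above can be computed per user as in this minimal sketch. The function name and the example product IDs are illustrative.

```python
def recall_at_n(recommended, purchased, n):
    """Fraction of purchased products that appear in the top-n recommendations."""
    purchased = set(purchased)
    if not purchased:
        return 0.0
    hits = len(set(recommended[:n]) & purchased)
    return hits / len(purchased)

ranked = ["a", "b", "c", "d"]          # recommendations, best first
bought = ["b", "d", "z"]               # held-out purchases
score = recall_at_n(ranked, bought, n=3)
```

Averaging this value over users gives a single number for comparing hyper-parameter settings offline.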
v
Ensembles
Use different types of evidence and/or product metadata to easily create ensembles
User x Products Purchased
User x Products Viewed
User x Brands Purchased
Model or Linear
Combination
…
v
What better promotes broad-based discovery
vs
v
Online ranking for diversity
“Diversity within sessions, Novelty across sessions”
“Establish trust in a fresh and comprehensive catalog”
“Less is more”
Cached list of
~1000 products
per user
Final list of
<100 products
promote diversity
v
Diversity
Top K products - ranked by score
Rank product categories by their median product score
> > >
v
Weighted sampling for diversity
Sample category in
proportion to score
Within category, sample in
proportion to product score
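The two-level weighted sampling can be sketched as follows. It assumes the category score is the median of its remaining product scores (borrowed from the category ranking on the previous slide), that all scores are positive, and that products are drawn without replacement; those choices and the function name are assumptions for the example.

```python
import random
from statistics import median

def diverse_sample(scored_products, n, seed=None):
    """scored_products: {category: {product: score}} with positive scores."""
    rng = random.Random(seed)
    # Copy so the caller's dictionaries are left untouched.
    pools = {c: dict(p) for c, p in scored_products.items() if p}
    picked = []
    while pools and len(picked) < n:
        # Step 1: sample a category in proportion to its score.
        cats = list(pools)
        cat_weights = [median(pools[c].values()) for c in cats]
        cat = rng.choices(cats, weights=cat_weights, k=1)[0]
        # Step 2: within the category, sample a product in proportion to its score.
        prods = list(pools[cat])
        prod = rng.choices(prods, weights=[pools[cat][p] for p in prods], k=1)[0]
        picked.append((cat, prod))
        del pools[cat][prod]
        if not pools[cat]:
            del pools[cat]
    return picked

scored = {"dairy": {"milk": 0.9, "yogurt": 0.5}, "snacks": {"chips": 0.4}}
final_list = diverse_sample(scored, n=3, seed=0)
```

Because the draw is random, repeated calls give different orderings, which fits the stated goal of diversity within sessions and novelty across sessions.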
v
Architecture
[Diagram: Event Data (x2); Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking]
v
Out of stocks happen due to uncertainty in several places
Order fulfillment in (distant) future
Cannot hold inventory
Real-time inventory tracking across
thousands of locations isn’t perfect (yet)
Customer might reschedule delivery

Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Sanjeev Rampal
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
eutxy
 

Recently uploaded (20)

The+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptxThe+Prospects+of+E-Commerce+in+China.pptx
The+Prospects+of+E-Commerce+in+China.pptx
 
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdfJAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
JAVIER LASA-EXPERIENCIA digital 1986-2024.pdf
 
This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!This 7-second Brain Wave Ritual Attracts Money To You.!
This 7-second Brain Wave Ritual Attracts Money To You.!
 
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
急速办(bedfordhire毕业证书)英国贝德福特大学毕业证成绩单原版一模一样
 
guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...guildmasters guide to ravnica Dungeons & Dragons 5...
guildmasters guide to ravnica Dungeons & Dragons 5...
 
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
一比一原版(CSU毕业证)加利福尼亚州立大学毕业证成绩单专业办理
 
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptxInternet-Security-Safeguarding-Your-Digital-World (1).pptx
Internet-Security-Safeguarding-Your-Digital-World (1).pptx
 
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024
 
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
原版仿制(uob毕业证书)英国伯明翰大学毕业证本科学历证书原版一模一样
 
BASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptxBASIC C++ lecture NOTE C++ lecture 3.pptx
BASIC C++ lecture NOTE C++ lecture 3.pptx
 
Comptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guideComptia N+ Standard Networking lesson guide
Comptia N+ Standard Networking lesson guide
 
Latest trends in computer networking.pptx
Latest trends in computer networking.pptxLatest trends in computer networking.pptx
Latest trends in computer networking.pptx
 
1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...1.Wireless Communication System_Wireless communication is a broad term that i...
1.Wireless Communication System_Wireless communication is a broad term that i...
 
How to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptxHow to Use Contact Form 7 Like a Pro.pptx
How to Use Contact Form 7 Like a Pro.pptx
 
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
一比一原版(SLU毕业证)圣路易斯大学毕业证成绩单专业办理
 
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shopHistory+of+E-commerce+Development+in+China-www.cfye-commerce.shop
History+of+E-commerce+Development+in+China-www.cfye-commerce.shop
 
test test test test testtest test testtest test testtest test testtest test ...
test test  test test testtest test testtest test testtest test testtest test ...test test  test test testtest test testtest test testtest test testtest test ...
test test test test testtest test testtest test testtest test testtest test ...
 
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
1比1复刻(bath毕业证书)英国巴斯大学毕业证学位证原版一模一样
 
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesMulti-cluster Kubernetes Networking- Patterns, Projects and Guidelines
Multi-cluster Kubernetes Networking- Patterns, Projects and Guidelines
 
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
一比一原版(LBS毕业证)伦敦商学院毕业证成绩单专业办理
 

Data Science @ Instacart

  • 1. Data Science @ Instacart Sharath Rao Data Scientist / Manager Search and Discovery Collaborators: Angadh Singh and Shishir Prasad
  • 2. The Instacart Value Proposition: groceries from stores you love, delivered to your doorstep in as little as an hour
  • 3. Customer Experience: Select a Store, Shop for Groceries, Checkout, Select Delivery Time, Delivered to Doorstep
  • 4. Shopper Experience: Accept Order, Find the Groceries, Out for Delivery, Delivered to Doorstep, Scan Barcode
  • 5. Four Sided Marketplace: Customers, Shoppers, Products (Advertisers), Stores (Retailers). Search, Advertising, Shopping, Delivery, Customer Service, Inventory, Picking, Loyalty
  • 6. Two topics today: a recommendation system for discovery; using data science for out of stock mitigation
  • 7. Online grocery vs traditional e-commerce (figure: Online Grocery and Traditional e-commerce compared across Week 1, Week 2, Week 3)
  • 8. Grocery Shopping in “Low Dimensional Space”: Search, Restock, Explore
  • 9. Why personalization at Instacart: Your store / Everybody’s store
  • 10. Repeat purchases increase LTV of recommendations: $5.49 today, $549 a year later (1 + … + 100)
  • 11. Different recommendation systems address different needs
  • 12. Personalized Top N recommendations: promote broad-based discovery in a dynamic catalog, including from stores customers may have never shopped
  • 13. Run out of X? Rank products by repurchase probability
  • 14. Personalized recommendations of new products when customers seek out what is new out there. Also addresses product cold start problems
  • 15. Replacement Product Recommendations: mitigate adverse impact of last-minute out of stocks
  • 16. “Frequently bought with” Recommendations: help customers shop for complementary products and try alternatives (figure labels: “Not necessarily consumed together”, “Probably consumed together”)
  • 17. Personalized Top N Recommendations
  • 18. Learning from feedback: Explicit Feedback and Implicit Feedback. Traditionally, collaborative filtering used explicit feedback to predict ratings. There may still be bias in whether the user chooses to rate
  • 19. Learning from Explicit Feedback • Explicit feedback may be more reliable but there is much less of it • Less reliable if users rate based on aspirations instead of true preferences
  • 20. Implicit Feedback - trade-off quality and quantity (figure axes: Strength of evidence vs Number of Events)
  • 21. Architecture (diagram components): Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
  • 22. A Matrix Factorization Formulation for Implicit Feedback. User-Product Matrix R (M x N), with M users as rows and N products as columns: [1 - -; 9 - -; - 3 20]. Binary preferences, Preference Matrix R (M x N): [1 0 0; 1 0 0; 0 1 1]. “Collaborative Filtering for Implicit Feedback” - Hu et al.
  • 23. A Matrix Factorization Formulation for Implicit Feedback. Preference Matrix R (M x N) [1 0 0; 1 0 0; 0 1 1] ~ Y x X^T, with User Factors Y (M x k) and Product Factors X^T (k x N)
  • 24. Matrix Factorization from Implicit Feedback - The Intuition. (#Purchases, Preference p, Confidence c): (0, 0, Low); (1, 1, Low); (>>1, 1, High) • Confidence increases linearly with purchases r • c = 1 + alpha * r • alpha controls the marginal rate of learning from user purchases • Key questions • How should the unobserved events be treated • How should one trade-off observed and the unobserved
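
The mapping on slide 24 from a purchase count r to a binary preference p and a confidence c = 1 + alpha * r can be sketched in a few lines. This is a minimal illustration, not Instacart's code: the function name is hypothetical, and the default alpha of 40 is only a placeholder value, since the slides do not say which alpha was used.

```python
def preference_and_confidence(purchases, alpha=40.0):
    """Map a purchase count r to (preference p, confidence c).

    p is 1 when the user bought the product at least once, else 0.
    c grows linearly with the number of purchases: c = 1 + alpha * r.
    The default alpha is an illustrative assumption, not a slide value.
    """
    if purchases < 0:
        raise ValueError("purchase count must be non-negative")
    p = 1 if purchases > 0 else 0
    c = 1.0 + alpha * purchases
    return p, c


if __name__ == "__main__":
    # Rows of the slide's table: 0 purchases, 1 purchase, many purchases.
    for r in (0, 1, 20):
        print(r, preference_and_confidence(r))
```

Note that an unobserved pair (r = 0) still gets a preference of 0 with the minimum confidence of 1, which is one way of answering the slide's question about how to treat unobserved events.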
  • 25. Regularized Weighted Squared Loss (equation labels: Confidence, User Factors Matrix, Product Factors Matrix, Preference Matrix, Regularization). Solve using Alternating Least Squares
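
The equation on slide 25 did not survive extraction; only its labels did. A plausible reconstruction, assuming the slide follows the Hu et al. formulation cited on slide 22, is given below. The symbols are chosen to match the earlier slides: Y holds the user factors and X the product factors (so the preference matrix is approximated by Y X^T as on slide 23), p is the binary preference, and c = 1 + alpha * r is the confidence from slide 24. The row vectors y_u and x_i and the diagonal matrix C^u are notation introduced here for illustration.

```latex
\min_{Y,\,X} \;
  \sum_{u=1}^{M} \sum_{i=1}^{N}
    c_{ui} \left( p_{ui} - y_u^{\top} x_i \right)^{2}
  \;+\; \lambda \left( \sum_{u=1}^{M} \lVert y_u \rVert^{2}
                     + \sum_{i=1}^{N} \lVert x_i \rVert^{2} \right),
\qquad
c_{ui} = 1 + \alpha\, r_{ui},
\quad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases}

% With X held fixed, each user vector has a closed-form ridge solution,
% where C^{u} = \operatorname{diag}(c_{u1}, \dots, c_{uN}) and p_u is
% the preference vector of user u:
y_u = \left( X^{\top} C^{u} X + \lambda I \right)^{-1} X^{\top} C^{u}\, p_u
```

Alternating Least Squares exploits the second line: holding X fixed makes the objective quadratic in each y_u, and holding Y fixed gives the symmetric update for each x_i, so the two updates are repeated in turn.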
  • 26. Architecture (diagram components): Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
  • 27. Spark ALS Hyper-parameter Tuning • rank k - diminishing returns after 150 • alpha - controls rate of learning from observed events • iterations - ALS tends to converge within 5, seldom more than 10 • lambda - regularization parameter
  • 28. Architecture (diagram components): Event Data; Generate User-Product Matrix; ALS Matrix Factorization (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time ranking for diversity
  • 29. Scoring users and products. With millions of products and users, scoring every (user, product) pair is prohibitive. Two goals in selecting products to score • Long-tail products that have not been discovered • Products that have an a priori high purchase rate (popular)
  • 30. Trade-off popularity and discovery in the tail. We start with simple stratified sampling. For each user, score N products: sample h products from the head and t products from the tail. N ~ 10000, h ~ 3000, t ~ 7000
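
The stratified sampling on slide 30 can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how the head/tail boundary is chosen, so `head_size` is a hypothetical parameter, and the function name and the popularity-sorted input list are likewise made up for the example.

```python
import random


def sample_candidates(products_by_popularity, head_size, h, t, seed=None):
    """Pick h products from the popular head and t from the long tail.

    products_by_popularity: product ids sorted from most to least popular.
    head_size: number of leading products treated as the head (an
               assumption; the slides do not define the boundary).
    """
    rng = random.Random(seed)
    head = products_by_popularity[:head_size]
    tail = products_by_popularity[head_size:]
    # Sample without replacement within each stratum; cap at stratum size.
    picked_head = rng.sample(head, min(h, len(head)))
    picked_tail = rng.sample(tail, min(t, len(tail)))
    return picked_head + picked_tail


if __name__ == "__main__":
    catalog = list(range(100_000))  # toy catalog, id 0 is the most popular
    candidates = sample_candidates(catalog, head_size=10_000,
                                   h=3_000, t=7_000, seed=0)
    print(len(candidates))
```

With h ~ 3000 and t ~ 7000 as on the slide, each user is scored against about 10000 products instead of the full catalog, and most of that budget goes to the tail.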
  • 31. Tuning Spark For ALS. Understanding Spark's execution model and its implementation of ALS helps • Training is communication heavy [1], set partitions <= #CPU cores • Scoring is memory intensive • Broad guidelines [2] • Limit executor memory to 64GB • 5 cores per executor • Set executors based on data size. [1] http://apache-spark-user-list.1001560.n3.nabble.com/Error-No-space-left-on-device-tp9887p9896.html [2] http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-1/
  • 32. A/B Test Setup (diagram components): Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking. Annotations: “Weekly for past N months data”, “Weekly for users with recent activity”
  • 33. A/B Test Results • Statistically significant increases in items per order and GMV per order • Total product sales spread over more categories
  • 34. Ok, we have a recommendation system. Where do we go from here?
  • 35. What else do you do with user and product factors? Score (user, product) pair on demand; get Top N similar users; get Top N similar products; use as features in other models
  • 36. Products similar to “Haigs Spicy Hummus”: more “Spicy Hummus”, Spicy Salsas. Generated using Approximate Nearest Neighbor (“annoy” from Spotify)
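
To make the "similar products" idea on slides 35 and 36 concrete, here is a small sketch that ranks products by cosine similarity of their factor vectors. It deliberately uses exact brute-force search in plain Python rather than the approximate nearest neighbor index (annoy) named on the slide, and the product ids and two-dimensional factor values are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(product_factors, query_id, n):
    """Return the n product ids whose factors are closest to query_id's."""
    query = product_factors[query_id]
    scored = [(cosine(query, vec), pid)
              for pid, vec in product_factors.items() if pid != query_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:n]]


if __name__ == "__main__":
    # Made-up factors; real ones would come from the ALS product factors.
    factors = {
        "spicy_hummus_a": [0.9, 0.1],
        "spicy_hummus_b": [0.8, 0.2],
        "spicy_salsa": [0.6, 0.5],
        "paper_towels": [-0.1, 0.9],
    }
    print(top_n_similar(factors, "spicy_hummus_a", 2))
```

Brute force costs one pass over the catalog per query, which is why an approximate index becomes attractive at the scale of millions of products.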
  • 37. What next • Make recommendations more contextual • Explain recommendations (“Because you did X”)
  • 38. Mitigating the effect of out of stocks
  • 39. • What are out of stocks • Why do they happen • How data science helps mitigate effects
  • 40. Out of stocks - Customer Context: “Deliver Ice Cream from Whole Foods Market SOMA at 8 pm tomorrow”
  • 42. Traditional E-commerce • Manage inventory in warehouses optimized for quick fulfillment • Customers only specify the “What” • Disallow users from ordering out of stock products • Set expectations • “3 day shipping” but will ship in 10 business days
  • 43. On-demand delivery from local retailers • Shoppers navigate a complex environment where products • may have run out • may be misplaced • may be damaged • Customers specify “What”, “When” and “Where from” • Improvise under uncertainty
  • 44. Everybody loses when out of stocks happen. Customers • don’t get exactly what they want • must contemplate and/or communicate replacements. Shoppers • waste time searching for items that aren’t in store • context switch to searching and communicating replacements. Advertisers (brands) • lose revenue and trust of customers. Stores (Retailers) • lose revenue and trust of customers
  • 45. Out of stock rate - an illustration
  • 47. A probable solution: do not show or allow customers to order items that are currently out of stock
  • 48. A probable (but terrible) solution • Customers really know these stores • “Missing” items are seen as a sign of an unreliable catalog/service • May have been out of stock this morning but could be available when the order is fulfilled • Sets up negative spirals. Customer quotes: “I was there over the weekend. Please check behind the cheeses aisle”, “Are you telling me they don’t carry strawberries?”
  • 49. Solution that works reasonably well • Shoppers can see Instacart recommended replacements while shopping in the store • Customers may also specify or choose from recommended replacements • Relatively more flexibility with groceries • Some services offer to cancel the order if an item isn’t available
  • 50. Instacart Recommended Replacements • Several product attributes matter (Flavor, Packing, Size, Brand, Price, Diet Info) • Context matters, might benefit from personalization • Must scale to millions of products • Not always symmetric • May be ok to replace X with gluten free X but not the other way around
  • 51. Replacement Recommendations for Shoppers • Shoppers are trained to pick replacements • But shoppers can benefit from algorithmic suggestions • Many unfamiliar products in a vast catalog • Validation for common products • Finding replacements fast improves operational efficiency
  • 52. Replacement Recommendations for Customers • Customers can specify replacements while placing the order • Can choose to communicate with the shopper in store to verify
  • 53. What could we do if we could predict item availability? (figure labels: Customer location; Nearest store; Farther, but better availability) Controlling for retailer and quality, the customer is indifferent to physical location
  • 54. The Item Availability Prediction: Probability(Item in store | time, context). What is the probability that an item will be at the store when the shopper shows up to look for it?
  • 55. Item Availability as a Classification Problem. Examples of the form (TIMESTAMP, ITEM IDENTIFIED, IN STORE?) • Millions of examples from historical data • Feature Engineering • historical availability at multiple resolutions • E.g.: time since last “not found” event • Item attributes • E.g.: perishables restocked differently than personal care • Temporal Features
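
One of the features named on slide 55, the time since the last “not found” event, could be derived from the (timestamp, item, in store?) examples roughly as below. The function name, the tuple layout and the use of integer epoch seconds are assumptions made for the sketch; the slides do not describe the actual feature pipeline.

```python
def seconds_since_last_not_found(events, item_id, now):
    """Time elapsed since the item was last reported as not found.

    events: iterable of (timestamp, item_id, in_store) tuples, with
            timestamps as epoch seconds and in_store as a bool.
    Returns None when the item has no "not found" event up to `now`.
    """
    last_not_found = None
    for timestamp, event_item, in_store in events:
        if event_item != item_id or in_store or timestamp > now:
            continue  # other item, a successful find, or a future event
        if last_not_found is None or timestamp > last_not_found:
            last_not_found = timestamp
    if last_not_found is None:
        return None
    return now - last_not_found


if __name__ == "__main__":
    history = [
        (1_000, "ice_cream", True),
        (2_000, "ice_cream", False),
        (2_500, "strawberries", False),
        (3_000, "ice_cream", True),
    ]
    print(seconds_since_last_not_found(history, "ice_cream", now=5_000))
```

Events after `now` are skipped so that a training example never sees information from its own future.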
  • 56. Training and Scoring (diagram components): Event Data, Feature Extraction, XGBoost Training, Model Store; Event Data, Feature Extraction, Scoring, Cache availability scores. Training: weekly, with over 2 months of training data. Scoring: tens of millions of items every hour
  • 57. Serving and Optimization Layer (diagram components): Order; Items, eligible store locations; Availability scores; Fulfillment Engine; Fulfillment plan: Store location, Shopper etc. Active in production with an acceptable trade-off between fulfillment efficiency and refund rate
  • 58. What's next • Leverage model predictions for other features/data products • Avoid negative feedback loops! • Biased training data • only have access to what is ordered through Instacart • Tighter integrations with retailer data • Scaling: continue to score a growing catalog at tight SLAs
  • 61. Offline evaluation • Ideally we want to evaluate user response to recommendations • But we will only know this from a live A/B test • Recall-based metrics are an offline proxy (albeit not the best) • Recall: “Fraction of purchased products covered among Top N recommendations” • We only use this for hyper-parameter tuning
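
The recall metric quoted on slide 61 (“fraction of purchased products covered among Top N recommendations”) is simple to compute per user. The sketch below is one straightforward reading of that definition; the function name and the choice to return None for users with no purchases are assumptions.

```python
def recall_at_n(recommended, purchased, n):
    """Fraction of purchased products that appear in the top-n recommendations.

    recommended: product ids ordered from best to worst.
    purchased:   product ids the user actually bought in the held-out period.
    Returns None when the user bought nothing, since recall is undefined.
    """
    purchased_set = set(purchased)
    if not purchased_set:
        return None
    top_n = set(recommended[:n])
    return len(top_n & purchased_set) / len(purchased_set)


if __name__ == "__main__":
    recs = ["hummus", "salsa", "chips", "pita", "yogurt"]
    bought = ["salsa", "pita", "bananas"]
    print(recall_at_n(recs, bought, n=3))
```

Averaging this value over users gives a single number that can be compared across hyper-parameter settings, which is the only use the slide claims for it.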
  • 62. Ensembles: use different types of evidence and/or product metadata to easily create ensembles. Inputs: User x Products Purchased; User x Products Viewed; User x Brands Purchased; … Combined by a model or linear combination
  • 63. What better promotes broad-based discovery? (figure comparing two alternatives)
  • 64. Online ranking for diversity: “Diversity within sessions, Novelty across sessions”; “Establish trust in a fresh and comprehensive catalog”; “Less is more”. Cached list of ~1000 products per user; final list of <100 products; promote diversity
  • 65. Diversity: Top K products - ranked by score. Rank product categories by their median product score
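
Ranking categories by their median product score, as slide 65 describes, might look like the following. The (product id, category, score) tuple layout, the function name and the toy data are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def rank_categories(scored_products):
    """Order categories by the median score of their products, best first.

    scored_products: iterable of (product_id, category, score) tuples,
    for example the cached top-K list for one user.
    """
    scores_by_category = defaultdict(list)
    for _product_id, category, score in scored_products:
        scores_by_category[category].append(score)
    return sorted(scores_by_category,
                  key=lambda cat: median(scores_by_category[cat]),
                  reverse=True)


if __name__ == "__main__":
    top_k = [
        ("p1", "dips", 0.9), ("p2", "dips", 0.7),
        ("p3", "snacks", 0.95), ("p4", "snacks", 0.2), ("p5", "snacks", 0.3),
        ("p6", "dairy", 0.5),
    ]
    print(rank_categories(top_k))
```

Using the median rather than the maximum means a category with a single high-scoring product does not outrank one whose products score well across the board.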
  • 66. Weighted sampling for diversity: sample category in proportion to score; within category, sample in proportion to product score
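
The two-stage weighted sampling on slide 66 can be sketched as below. Several details are assumptions rather than facts from the slides: the category score is taken to be the median product score (following slide 65), products are drawn without replacement, all scores are assumed to be strictly positive, and the function name and data layout are hypothetical.

```python
import random
from collections import defaultdict
from statistics import median


def sample_diverse(scored_products, k, seed=None):
    """Draw up to k distinct products via category-then-product sampling.

    scored_products: iterable of (product_id, category, score) tuples
    with strictly positive scores (an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for product_id, category, score in scored_products:
        by_category[category].append((product_id, score))

    picked = []
    while len(picked) < k and by_category:
        # Stage 1: pick a category in proportion to its (median) score.
        categories = list(by_category)
        category_weights = [median(s for _, s in by_category[c])
                            for c in categories]
        category = rng.choices(categories, weights=category_weights, k=1)[0]

        # Stage 2: pick a product in that category in proportion to its score.
        items = by_category[category]
        index = rng.choices(range(len(items)),
                            weights=[s for _, s in items], k=1)[0]
        picked.append(items.pop(index)[0])
        if not items:
            del by_category[category]  # category exhausted
    return picked


if __name__ == "__main__":
    cached = [
        ("p1", "dips", 0.9), ("p2", "dips", 0.7),
        ("p3", "snacks", 0.95), ("p4", "snacks", 0.2),
        ("p5", "dairy", 0.5),
    ]
    print(sample_diverse(cached, k=3, seed=42))
```

Because the draw is random rather than a strict top-K cut, repeated calls with different seeds produce different lists, which fits the slide 64 goal of novelty across sessions.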
  • 67. Architecture (diagram components): Event Data; Generate User-Product Matrix; ALS (Spark/EMR); User/Product Factors; Candidate Selection; Score and Select Top N (Spark/EMR); Run-time diversity ranking
  • 68. Out of stocks happen due to uncertainty in several places: order fulfillment in (distant) future; cannot hold inventory; real-time inventory tracking across thousands of locations isn’t perfect (yet); customer might reschedule delivery