Learned Embeddings for
Search and Discovery at Instacart
Sharath Rao
Data Scientist / Manager
Search and Discovery
Collaborators: Angadh Singh
v
Talk Outline
• Quick overview of Search and Discovery at Instacart
• Word2vec for product recommendations
• Extending Word2vec embeddings to improve search ranking
v
About me
• Worked on several ML products in catalog, search and
personalization at Instacart since 2015
• Currently leading full stack search and discovery team
@sharathrao
v
Search and Discovery @ Instacart
Help customers find what they are looking for
and discover what they might like
v
Grocery Shopping in “Low Dimensional Space”
Search
Restock
Discover
v
Search
“Here is what I want”
Query has easily accessible information content
Discovery
“Here I am, what next?”
Context is the query, and recommendations vary as contexts do
v
Entities we try to model
items
products
aisles
departments
retailers
queries
customers
brands
Most of our data products are
about modeling relationships
between them
(part of which is learning embeddings)
v
Common paradigm for search/discovery problems
1st phase: Candidate Generation
2nd phase: Reranking
v
• Select top 100s from among potentially millions
• Must be fast and simple
• often not even a learned model
• Recall oriented
Candidate Generation
v
• Ranks fewer products, but uses richer models/features
• Tuned towards high precision
• Often happens in real-time
Reranking
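The two phases above can be sketched roughly as follows. This is a minimal illustration, not Instacart's implementation: the token-overlap matcher, the `popularity_score` function and the toy catalog are all assumptions made up for the example.

```python
def generate_candidates(query, catalog, limit=100):
    """Phase 1: cheap, recall-oriented token match (a stand-in for real retrieval)."""
    query_tokens = set(query.lower().split())
    matches = [p for p in catalog if query_tokens & set(p["name"].lower().split())]
    return matches[:limit]


def rerank(query, candidates, score_fn):
    """Phase 2: apply a richer scoring function to the few surviving candidates."""
    return sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)


# Toy catalog with made-up purchase counts (illustrative only).
catalog = [
    {"name": "Organic Whole Milk", "purchases": 120},
    {"name": "Almond Milk Unsweetened", "purchases": 300},
    {"name": "Milk Chocolate Bar", "purchases": 45},
    {"name": "Sourdough Bread", "purchases": 500},
]


def popularity_score(query, product):
    """Hypothetical reranking score; a learned model would normally go here."""
    return product["purchases"]


candidates = generate_candidates("milk", catalog)
ranked = rerank("milk", candidates, popularity_score)
ranked_names = [p["name"] for p in ranked]
```

The point of the split is cost: the first phase touches the whole catalog and so must stay simple, while the second phase can afford expensive features because it only sees a short list.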
v
Word2vec for Product Recommendations
v
“Frequently bought with” recommendations
Not necessarily
consumed together
Help customers shop for the next item
Probably
consumed together
v
Quick Tour of Word2vec
(simplified, with a view to develop some intuition)
v
• We need to represent words as features in a model/task
• Naive representation => one hot encoding
• Problem: Every word is dissimilar to every other word => not intuitive
One-hot encodings are good, but not good enough
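A small sketch of the problem described above: with one-hot vectors, any two distinct words have cosine similarity 0, so no pair is ever "closer" than another. The four-word vocabulary is an assumption chosen for the example.

```python
import math


def one_hot(word, vocabulary):
    """Return a one-hot vector for `word` over an ordered vocabulary."""
    return [1.0 if w == word else 0.0 for w in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocabulary = ["stadium", "field", "restaurant", "game"]
stadium = one_hot("stadium", vocabulary)
field = one_hot("field", vocabulary)
restaurant = one_hot("restaurant", vocabulary)

# Distinct one-hot vectors are orthogonal, so both similarities are 0.
sim_stadium_field = cosine(stadium, field)
sim_stadium_restaurant = cosine(stadium, restaurant)
```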
v
What one-hot encodings fail to capture
Data
Observation 1: “game at the stadium”
Observation 2: “game at the field”
Observation 3: “ate at the restaurant”
Observation 4: “met at the game”
What models must learn
• stadium, field and restaurant are in some sense similar
• stadium and field are more similar than stadium and restaurant
• game has another meaning/sense that is similar to stadium/field/restaurant
None of these are easy to learn with one-hot encoded representations without supplementing with hand-engineered features
v
“You shall know a word by the company it keeps” - Firth, J.R., 1957
Core motivation behind semantic word representations
v
• Learns a feature vector per word
• for 1M vocabulary and 100 dimensional vector, we learn 1M vectors => 100M numbers
• Vectors must be such that words appearing in similar contexts are closer than any random pair of words
• Must work with unlabeled training data
• Must scale to large (unlabeled) datasets
• Embedding space itself must be general enough that word representations are broadly applicable
Word2vec is a scalable way to learn semantic word representations
v
Word2vec beyond text and words
v
On graphs where random walks are the sequences
v
On songs where plays are the sequences
v
Even the emojis weren’t spared!
v
You shall know a product by what it's purchased with
(adapted from “You shall know a word by the company it keeps”)
v
We applied word2vec to purchase sequences
Typical purchasing session contains tens of cart adds
Sequences of products added to cart are the ‘sentences’
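To make the "cart adds as sentences" idea concrete, here is a sketch of how (target, context) training pairs could be extracted from one session. It assumes the skip-gram variant of word2vec with a fixed window; the talk does not say which variant or window size was used, and the product identifiers are made up.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for every item and its neighbors within `window`."""
    pairs = []
    for i, target in enumerate(sequence):
        start = max(0, i - window)
        end = min(len(sequence), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


# One hypothetical shopping session: products in the order they were added to the cart.
cart_session = ["pasta", "marinara_sauce", "parmesan", "garlic_bread"]
pairs = skipgram_pairs(cart_session, window=1)
```

Products that are repeatedly added near each other end up in many shared pairs, which is what pulls their vectors together during training.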
v
“Frequently Bought With” Recommendations
Extract
Training
Sequences
Learn
word2vec
representations
Eliminate
substitute
products
Event Data
Approximate
Nearest Neighbors
Cache
recommendations
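The last stages of this pipeline might look like the sketch below. Several things here are assumptions: exact brute-force cosine search stands in for approximate nearest neighbors, "same category" is used as a crude proxy for "substitute", and the vectors and categories are invented toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def frequently_bought_with(product_id, embeddings, category_of, k=2):
    """Rank other products by similarity, skipping likely substitutes."""
    query_vec = embeddings[product_id]
    scored = []
    for other, vec in embeddings.items():
        if other == product_id:
            continue
        if category_of[other] == category_of[product_id]:
            continue  # crude substitute filter (an assumption, not the talk's method)
        scored.append((cosine(query_vec, vec), other))
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Toy embeddings and categories (illustrative only).
embeddings = {
    "spaghetti": [0.9, 0.1, 0.0],
    "penne": [0.88, 0.12, 0.0],
    "marinara": [0.8, 0.3, 0.1],
    "parmesan": [0.7, 0.4, 0.2],
    "dish_soap": [0.0, 0.1, 0.9],
}
category_of = {
    "spaghetti": "pasta",
    "penne": "pasta",
    "marinara": "sauce",
    "parmesan": "cheese",
    "dish_soap": "cleaning",
}

recommendations = frequently_bought_with("spaghetti", embeddings, category_of)

# Precompute once and serve from a cache, mirroring the final pipeline step.
cache = {pid: frequently_bought_with(pid, embeddings, category_of) for pid in embeddings}
```

Without the filter, "penne" would be the top neighbor of "spaghetti", which is a substitute rather than something to buy alongside it.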
v
Word2vec for product recommendations
Surfaces
complementary
products
v
Next step is to make recommendations contextual
Not ideal if
already shopped
for sugar recently
inappropriate if
allergic to walnuts
Not if favorite brand
of butter isn’t
otherwise popular
v
We see word2vec recommendations as a candidate generation step
v
Word2vec for Search Ranking
v
The Search Ranking Problem
In response to a query, rank products in the order that
maximizes the probability of purchase
v
The Search Ranking Problem
Learning to
Rank Model
Candidate Generation
Boolean matching
BM25 ranking
Synonym/Semantic Expansion
(Online) Reranking
Matching features
Historical aggregates
Personalization features
v
Search Ranking - Training
Generate
training
features
Label Generation
with Implicit
Feedback
Event Data
Learning to Rank
Training
Model
Repository
v
Search Ranking - Scoring
Online reranker
query
Model
Repository
Reranked
products
Top N
products
Top N
products
processed query
Final
ranking
v
Features for Learning to Rank
Historical aggregates => normalized purchase counts
High coverage
Low precision
Low coverage
More precision
[query, product]
[query, product, user]
[query, brand]
[query, aisle]
[product]
But historical aggregates alone are not enough, because of sparsity and cold start.
[user, brand]
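A sketch of what a normalized purchase count could look like at two levels of granularity. The normalization used here (a key's share of all purchases for its query) is an assumption, since the talk does not define it, and the events are toy data.

```python
from collections import Counter


def normalized_counts(events, key_fn):
    """Purchase share per key, normalized by total purchases for the key's query."""
    counts = Counter(key_fn(e) for e in events)
    per_query = Counter(e["query"] for e in events)
    # key[0] is always the query in the keys built below.
    return {key: n / per_query[key[0]] for key, n in counts.items()}


# Hypothetical converted searches: one row per purchase that followed a query.
events = [
    {"query": "milk", "product": "whole_milk_1gal", "aisle": "dairy"},
    {"query": "milk", "product": "whole_milk_1gal", "aisle": "dairy"},
    {"query": "milk", "product": "almond_milk", "aisle": "dairy"},
    {"query": "milk", "product": "milk_chocolate", "aisle": "candy"},
]

query_product = normalized_counts(events, lambda e: (e["query"], e["product"]))
query_aisle = normalized_counts(events, lambda e: (e["query"], e["aisle"]))
```

A product never purchased for "milk" has no [query, product] entry at all, while its aisle may still have one. That is the coverage versus precision trade-off, and the sparsity problem, in miniature.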
v
Learning word2vec embeddings from search logs
• Representations learnt from Wikipedia or petabyte-scale text aren’t ideal for Instacart
• No constraint that word2vec models must be learnt on temporal or spatial sequences
• We constructed training data for word2vec from search logs
v
Training Contexts from Search Logs
• Create contexts that are observed and desirable
<query1> <purchased description of converted product1>
<query2> <purchased description of converted product2>
..
..
• Learn embeddings for each word in the unified vocabulary of query
tokens and catalog product descriptions
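One way the training contexts above could be assembled is sketched here: each converted search becomes one "sentence" made of the query tokens followed by the tokens of the purchased product's description. Whitespace tokenization, lowercasing and the example descriptions are assumptions.

```python
def build_training_sentences(converted_searches):
    """Turn (query, purchased product description) pairs into token lists."""
    sentences = []
    for query, description in converted_searches:
        tokens = query.lower().split() + description.lower().split()
        sentences.append(tokens)
    return sentences


# Hypothetical converted searches: (query, description of the product purchased).
converted_searches = [
    ("bread", "Whole Wheat Sandwich Bread"),
    ("raisins", "Seedless Raisins"),
]

sentences = build_training_sentences(converted_searches)

# Queries and descriptions share one vocabulary, so they land in one embedding space.
vocabulary = sorted({token for sentence in sentences for token in sentence})
```

Token lists in this shape are what a typical word2vec trainer expects as input.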
v
Examples of nearest neighbors in embeddings space
OK, so now we have word representations
Matching features require product features in the same space
bread
orowheat
bonaparte
zissel
ryebread
raisins
sunmade
monukka
raisinettes
fruitsource
mushrooms
cremini
portabello
shitake
v
Deriving other features from word embeddings
• One option is to create contexts with product identifiers in training sequences
• Promising but we will have a product cold start problem
Use learnt word embeddings to derive features for other
entities such as products, brand names, queries and users etc.
v
Simple averaging of word representations works well
Product
Average embeddings
of words in product
description
Brands:
Average embeddings of
products sold by the
brand
User
Average embeddings of
products purchased by
the user
Aisles/Departments
Average embeddings of
products in the aisle/
department
Word
Representation
Learnt from converted
search logs
Query
Average embeddings
of words in query
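The averaging scheme above is simple enough to sketch in a few lines. The two-dimensional word vectors are toy values, and skipping out-of-vocabulary words is an assumption about how unknown tokens would be handled.

```python
def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def embed_text(text, word_vectors):
    """Average the vectors of known words; return None if no word is known."""
    known = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    return average_vectors(known) if known else None


# Toy word vectors standing in for embeddings learnt from search logs.
word_vectors = {
    "whole": [1.0, 0.0],
    "wheat": [0.0, 1.0],
    "bread": [1.0, 1.0],
}

product_vec = embed_text("Whole Wheat Bread", word_vectors)  # product = mean of its words
query_vec = embed_text("bread", word_vectors)                # query = mean of its words
brand_vec = average_vectors([product_vec, [0.0, 0.0]])       # brand = mean of its products
```

Because a product vector needs only the words in its description, a brand-new product gets a representation immediately, which sidesteps the product cold start problem noted earlier.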
v
Wait, so we have 2 different representations for products?
Embeddings learnt from
purchase sequences
Embeddings derived from words
from search logs
Products are similar if
they are bought together
Products are similar if
their descriptions are
semantically similar
v
Product representations in different embedding spaces
Embeddings learnt from
purchase sequences
Products are similar if
they are bought together
v
Product representations in different embedding spaces
Embeddings derived from words
from search logs
Products are similar if
their descriptions are
semantically similar
v
Examples of nearest neighbors for products
Cinnamon Toast Crunch Cereal
Golden Grahams Cereal
Not much common between the product names
v
Examples of nearest neighbors for brands
v
Using word2vec features in search ranking
[query, product]
[query, brand]
[query, aisle]
[query, department]
[user, product]
[user, brand]
[user, aisle]
[user, department]
• Construct similarity scores between different entities as features
Matching Features
Personalization
Features
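As an illustration, the eight entity pairs above could be turned into features like this. Cosine similarity is an assumed choice (the talk only says "similarity scores"), and the feature names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Matching features pair the query with an entity; personalization features pair the user.
ENTITY_PAIRS = [
    ("query", "product"), ("query", "brand"), ("query", "aisle"), ("query", "department"),
    ("user", "product"), ("user", "brand"), ("user", "aisle"), ("user", "department"),
]


def word2vec_features(vectors):
    """One similarity score per entity pair, keyed by a hypothetical feature name."""
    return {f"{a}_{b}_w2v_sim": cosine(vectors[a], vectors[b]) for a, b in ENTITY_PAIRS}


# Toy vectors for one (query, user, product) scoring request.
vectors = {
    "query": [1.0, 0.0],
    "user": [0.0, 1.0],
    "product": [1.0, 0.0],
    "brand": [1.0, 1.0],
    "aisle": [0.0, 1.0],
    "department": [-1.0, 0.0],
}

features = word2vec_features(vectors)
```

These scores would sit alongside the historical aggregates and other inputs in the learning-to-rank feature vector.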
v
We saw significant improvement with word2vec features
[Bar chart: relative improvement with word2vec features. Metrics: AUC and Recall@10. Series: Baseline vs. with word2vec features. Y-axis: 98.0% to 104.0%.]
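For readers unfamiliar with the two metrics, here is a sketch of one common way to compute them. The exact evaluation protocol behind the reported numbers is not described in the talk, so these definitions and the toy inputs are assumptions.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy model scores with purchase labels (1 = purchased, 0 = not purchased).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
example_auc = auc(scores, labels)

# Toy ranking in which two of the four products were purchased.
example_recall = recall_at_k(["a", "b", "c", "d"], {"a", "d"}, k=2)
```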
v
word2vec features rank high among top features
[query, aisle] word2vec
[query, product] historical aggregate
[query, department] word2vec
[query, product] historical aggregate
[user, product] word2vec
lda model 2
lda model 1
query length
[user, brand] word2vec
[query, product] word2vec
bm25
position
XGBoost
logistic loss
with early stopping
v
Other contextual recommendation problems
v
Broad based discovery oriented recommendations
Including from stores customers may have never shopped from
v
Run out of X?
Rank products by
repurchase probability
v
Introduce customers to products new in the catalog
Also mitigates product cold start problems
v
Replacement Product Recommendations
Mitigate adverse impact of
last-minute out of stocks
v
Data Products in Search and Discovery
Search
• query autocorrection
• query spell correction
• query expansion
• deep matching/document expansion
• search ranking
• search advertising
Discovery
• Substitute/replacement products
• Frequently bought with products
• Next basket recommendations
• Guided discovery
• Interpretable recommendations
v
Thank you!
We are
hiring!
Senior Machine Learning Engineer http://bit.ly/2kzHpcg

Learned Embeddings for Search and Discovery at Instacart

  • 1. Learned Embeddings for Search and Discovery at Instacart Sharath Rao Data Scientist / Manager Search and Discovery Collaborators: Angadh Singh
  • 2. Talk Outline • Quick overview of Search and Discovery at Instacart • Word2vec for product recommendations • Extending Word2vec embeddings to improve search ranking
  • 3. About me • Worked on several ML products in catalog, search and personalization at Instacart since 2015 • Currently leading the full-stack search and discovery team @sharathrao
  • 4. Search and Discovery @ Instacart: Help customers find what they are looking for and discover what they might like
  • 5. Grocery Shopping in “Low Dimensional Space”: Search, Restock, Discover
  • 6. Search: “Here is what I want”. The query has easily accessible information content. Discovery: “Here I am, what next?” Context is the query, and recommendations vary as contexts do.
  • 7. Entities we try to model: items, products, aisles, departments, retailers, queries, customers, brands. Most of our data products are about modeling relationships between them (part of which is learning embeddings)
  • 8. Common paradigm for search/discovery problems: 1st phase: Candidate Generation; 2nd phase: Reranking
  • 9. Candidate Generation • Select top 100s from among potentially millions • Must be fast and simple • Often not even a learned model • Recall oriented
  • 10. Reranking • Ranks fewer products, but uses richer models/features • Tuned towards high precision • Often happens in real time
  • 11. Word2vec for Product Recommendations
  • 12. “Frequently bought with” recommendations: Help customers shop for the next item. Example labels: “Not necessarily consumed together” and “Probably consumed together”
  • 13. Quick Tour of Word2vec (simplified, with a view to developing some intuition)
  • 14. One-hot encodings are good, but not good enough • We need to represent words as features in a model/task • Naive representation => one-hot encoding • Problem: Every word is dissimilar to every other word => not intuitive
  • 15. What one-hot encodings fail to capture. Data: Observation 1: “game at the stadium” Observation 2: “game at the field” Observation 3: “ate at the restaurant” Observation 4: “met at the game”. What models must learn: • stadium, field and restaurant are in some sense similar • stadium and field are more similar than stadium and restaurant • game has another meaning/sense that is similar to stadium/field/restaurant. None of these are easy to learn with one-hot encoded representations without supplementing with hand-engineered features
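
The point about one-hot encodings can be checked directly. The following is a minimal sketch (the four-word vocabulary is taken from the observations above; the helper names are illustrative): under a one-hot encoding, every pair of distinct words has cosine similarity zero, so stadium is no closer to field than to restaurant.

```python
import math

def one_hot(word, vocab):
    """Return a one-hot vector for `word` over the ordered list `vocab`."""
    return [1.0 if w == word else 0.0 for w in vocab]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vocab = ["stadium", "field", "restaurant", "game"]
stadium = one_hot("stadium", vocab)
field = one_hot("field", vocab)
restaurant = one_hot("restaurant", vocab)

# Distinct one-hot vectors are orthogonal, so both similarities are zero.
sim_stadium_field = cosine(stadium, field)
sim_stadium_restaurant = cosine(stadium, restaurant)
print(sim_stadium_field, sim_stadium_restaurant)
```

Because both similarities are identical, a model fed only these vectors has no signal that stadium and field are closer in meaning than stadium and restaurant.
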
  • 16. “You shall know a word by the company it keeps” - Firth, J.R., 1957. Core motivation behind semantic word representations
  • 17. Word2vec is a scalable way to learn semantic word representations • Learns a feature vector per word • For a 1M-word vocabulary and 100-dimensional vectors, we learn 1M vectors => 100M numbers • Vectors must be such that words appearing in similar contexts are closer than any random pair of words • Must work with unlabeled training data • Must scale to large (unlabeled) datasets • Embedding space itself must be general enough that word representations are broadly applicable
  • 19. On graphs where random walks are the sequences
  • 20. On songs where plays are the sequences
  • 21. Even the emojis weren’t spared!
  • 22. You shall know a product by what it’s purchased with (adapting “You shall know a word by the company it keeps”)
  • 23. We applied word2vec to purchase sequences: A typical purchasing session contains tens of cart adds. Sequences of products added to cart are the ‘sentences’
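
To make the “cart as sentence” idea concrete, here is a small sketch of how (center, context) training pairs could be drawn from one purchasing session. It assumes the skip-gram variant of word2vec with a fixed window; the talk does not say which variant or window size was used, and the product names are made up.

```python
def skipgram_pairs(sequence, window=2):
    """Return (center, context) pairs for every item and its neighbors within `window`."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

# One hypothetical purchasing session, in cart-add order.
cart = ["pasta", "pasta sauce", "parmesan", "garlic bread"]

# With window=1, each product is paired only with its immediate neighbors.
pairs = skipgram_pairs(cart, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Products that repeatedly appear near each other across many carts end up with similar vectors, which is what the recommendation step relies on.
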
  • 24. “Frequently Bought With” Recommendations. Pipeline components: Extract Training Sequences; Learn word2vec representations; Eliminate substitute products; Event Data; Approximate Nearest Neighbors; Cache recommendations
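
The retrieval half of this pipeline can be sketched as follows. Several things here are assumptions rather than details from the talk: the search is exact brute-force cosine similarity instead of approximate nearest neighbors, a product is treated as a substitute when it shares an aisle with the query product, and the embeddings and aisle labels are toy values.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(product_id, embeddings, aisle_of, k=2):
    """Top-k nearest products by cosine similarity, skipping assumed substitutes."""
    query = embeddings[product_id]
    scored = []
    for other, vec in embeddings.items():
        if other == product_id:
            continue
        # Assumption: same aisle means "substitute", so it is filtered out.
        if aisle_of[other] == aisle_of[product_id]:
            continue
        scored.append((cosine(query, vec), other))
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Toy 2-d embeddings and aisle labels (hypothetical).
embeddings = {
    "spaghetti": [0.9, 0.1],
    "penne": [0.88, 0.12],
    "marinara": [0.8, 0.3],
    "parmesan": [0.7, 0.4],
    "dish soap": [0.0, 1.0],
}
aisle_of = {
    "spaghetti": "pasta",
    "penne": "pasta",
    "marinara": "sauces",
    "parmesan": "cheese",
    "dish soap": "cleaning",
}

recs = recommend("spaghetti", embeddings, aisle_of, k=2)

# Precompute and cache recommendations for every product.
cache = {pid: recommend(pid, embeddings, aisle_of) for pid in embeddings}
print(recs)
```

At catalog scale the inner loop would be replaced by an approximate nearest neighbor index, which is why the slide lists that step explicitly.
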
  • 25. Word2vec for product recommendations: Surfaces complementary products
  • 26. Next step is to make recommendations contextual • Not ideal if already shopped for sugar recently • Inappropriate if allergic to walnuts • Not if favorite brand of butter isn’t otherwise popular
  • 27. We see word2vec recommendations as a candidate generation step
  • 29. The Search Ranking Problem: In response to a query, rank products in the order that maximizes the probability of purchase
  • 30. The Search Ranking Problem. Candidate Generation: Boolean matching, BM25 ranking, Synonym/Semantic Expansion. (Online) Reranking with a Learning to Rank Model: Matching features, Historical aggregates, Personalization features
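
For readers unfamiliar with the BM25 ranking used in candidate generation, the sketch below scores a few toy product descriptions against a query. It uses the common Okapi BM25 form with a non-negative IDF; the parameter values k1=1.2 and b=0.75 are conventional defaults, not values from the talk, and the product descriptions are invented.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_tokens` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

docs = [
    "organic whole wheat bread".split(),
    "sourdough bread loaf".split(),
    "organic bananas".split(),
]
scores = bm25_scores("organic bread".split(), docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

The top-ranked candidates from a scorer like this would then be handed to the learned reranker.
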
  • 31. Search Ranking - Training. Diagram components: Generate training features; Label Generation with Implicit Feedback; Event Data; Learning to Rank Training; Model Repository
  • 32. Search Ranking - Scoring. Diagram components: query; processed query; Top N products; Online reranker; Model Repository; Reranked products; Final ranking
  • 33. Features for Learning to Rank. Historical aggregates => normalized purchase counts, at granularities ranging from high coverage / low precision to low coverage / more precision: [query, product] [query, product, user] [query, brand] [query, aisle] [product] [user, brand]. But historical aggregates alone are not enough, because of sparsity and cold start.
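
A small sketch of what historical aggregates at two granularities might look like. The normalization is an assumption (each count is divided by the total purchases for that query; the talk only says “normalized purchase counts”), and the purchase events are invented.

```python
from collections import Counter

# Hypothetical converted-search events: which product was bought for which query.
events = [
    {"query": "milk", "product": "whole milk", "aisle": "dairy"},
    {"query": "milk", "product": "whole milk", "aisle": "dairy"},
    {"query": "milk", "product": "oat milk", "aisle": "dairy"},
    {"query": "milk", "product": "milk chocolate", "aisle": "candy"},
]

def aggregate(events, field):
    """Share of each query's purchases that fall on each value of `field`."""
    pair_counts = Counter((e["query"], e[field]) for e in events)
    query_counts = Counter(e["query"] for e in events)
    return {pair: n / query_counts[pair[0]] for pair, n in pair_counts.items()}

query_product = aggregate(events, "product")  # precise, but sparse
query_aisle = aggregate(events, "aisle")      # coarser, but covers new products

print(query_product[("milk", "whole milk")])
print(query_aisle[("milk", "dairy")])
```

A newly added dairy product has no [query, product] entry at all, yet it still inherits the [query, aisle] signal. That is the coverage versus precision trade-off on the slide, and the gap that embedding features are meant to fill.
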
  • 34. Learning word2vec embeddings from search logs • Representations learnt from Wikipedia or petabyte-scale text aren’t ideal for Instacart • No constraint that word2vec models must be learnt on temporal or spatial sequences • We constructed training data for word2vec from search logs
  • 35. Training Contexts from Search Logs • Create contexts that are observed and desirable: <query1> <purchased description of converted product1> <query2> <purchased description of converted product2> .. .. • Learn embeddings for each word in the unified vocabulary of query tokens and catalog product descriptions
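
One way to read the context construction above is sketched below: each converted search contributes one training “sentence” made of the query tokens followed by the tokens of the purchased product’s description. The log schema (`query`, `product_description`, `converted`) and whitespace tokenization are assumptions for illustration only.

```python
def build_contexts(search_log):
    """Turn converted searches into token sequences: query tokens + product description tokens."""
    contexts = []
    for record in search_log:
        if not record["converted"]:
            continue  # keep only searches that led to a purchase
        tokens = record["query"].lower().split()
        tokens += record["product_description"].lower().split()
        contexts.append(tokens)
    return contexts

# Hypothetical search log records.
search_log = [
    {"query": "bread", "product_description": "Organic Whole Wheat Bread", "converted": True},
    {"query": "raisins", "product_description": "Seedless Raisins", "converted": True},
    {"query": "bred", "product_description": "Dog Treats", "converted": False},
]

contexts = build_contexts(search_log)

# Unified vocabulary over query tokens and product description tokens.
vocabulary = sorted({token for context in contexts for token in context})
print(contexts)
print(vocabulary)
```

These sequences would then be fed to a word2vec trainer in place of natural-language sentences.
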
  • 36. Examples of nearest neighbors in embedding space. bread: orowheat, bonaparte, zissel, ryebread. raisins: sunmade, monukka, raisinettes, fruitsource. mushrooms: cremini, portabello, shitake. OK, so now we have word representations. Matching features require product features in the same space
  • 37. Deriving other features from word embeddings • One option is to create contexts with product identifiers in training sequences • Promising, but we will have a product cold start problem. Use learnt word embeddings to derive features for other entities such as products, brand names, queries and users.
  • 38. Simple averaging of word representations works well. Word representation: learnt from converted search logs. Query: average embeddings of words in the query. Product: average embeddings of words in the product description. Brand: average embeddings of products sold by the brand. User: average embeddings of products purchased by the user. Aisle/Department: average embeddings of products in the aisle/department.
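
The averaging scheme can be sketched in a few lines. The word vectors below are toy 2-d values rather than learned embeddings, and skipping out-of-vocabulary tokens is an assumption about how unknown words would be handled.

```python
def average(vectors):
    """Element-wise mean of a list of equal-length vectors, or None if the list is empty."""
    vectors = list(vectors)
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def embed_text(text, word_vectors):
    """Embed a query or product description as the mean of its known word vectors."""
    known = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    return average(known)

# Toy word vectors standing in for embeddings learnt from search logs.
word_vectors = {"rye": [1.0, 0.0], "bread": [0.0, 1.0], "wheat": [1.0, 1.0]}

# Product embedding: average over words in the description.
rye_bread = embed_text("Rye Bread", word_vectors)
wheat_bread = embed_text("Wheat Bread", word_vectors)

# Brand embedding: average over the brand's product embeddings.
brand = average([rye_bread, wheat_bread])
print(rye_bread, wheat_bread, brand)
```

User and aisle/department embeddings follow the same pattern, averaging over purchased products or over the products in the aisle. Because everything is built from word vectors, a brand-new product gets an embedding as soon as it has a description.
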
  • 39. Wait, so we have 2 different representations for products? Embeddings learnt from purchase sequences: products are similar if they are bought together. Embeddings derived from words from search logs: products are similar if their descriptions are semantically similar.
  • 40. Product representations in different embedding spaces. Embeddings learnt from purchase sequences: products are similar if they are bought together
  • 41. Product representations in different embedding spaces. Embeddings derived from words from search logs: products are similar if their descriptions are semantically similar
  • 42. Examples of nearest neighbors for products: Cinnamon Toast Crunch Cereal, Golden Grahams Cereal. Not much in common between the product names
  • 43. Examples of nearest neighbors for brands
  • 44. Using word2vec features in search ranking • Construct similarity scores between different entities as features. Matching features: [query, product] [query, brand] [query, aisle] [query, department]. Personalization features: [user, product] [user, brand] [user, aisle] [user, department]
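
A sketch of how these eight similarity features might be assembled for one (query, user, candidate product) triple. Cosine similarity is assumed as the similarity score (the talk does not name the measure), and the feature names and toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

ENTITY_KEYS = ("product", "brand", "aisle", "department")

def similarity_features(query_vec, user_vec, candidate):
    """Build matching (query vs. entity) and personalization (user vs. entity) features."""
    features = {}
    for key in ENTITY_KEYS:
        features["query_" + key + "_w2v"] = cosine(query_vec, candidate[key])
        features["user_" + key + "_w2v"] = cosine(user_vec, candidate[key])
    return features

# Toy embeddings for one candidate product and the entities it belongs to.
candidate = {
    "product": [1.0, 0.0],
    "brand": [0.8, 0.2],
    "aisle": [0.6, 0.4],
    "department": [0.5, 0.5],
}
query_vec = [1.0, 0.0]
user_vec = [0.0, 1.0]

features = similarity_features(query_vec, user_vec, candidate)
for name, value in sorted(features.items()):
    print(name, round(value, 3))
```

Each resulting dictionary would become part of one feature row for the learning-to-rank model, alongside the historical aggregates.
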
  • 45. We saw significant improvement with word2vec features. [Figure: relative improvement with word2vec features on AUC and Recall@10, Baseline vs. With word2vec features; axis from 98.0% to 104.0%]
  • 46. word2vec features rank high among top features. [Figure: feature importance chart with labels [query, aisle] word2vec; [query, product] historical aggregate; [query, department] word2vec; [query, product] historical aggregate; [user, product] word2vec; lda model 2; lda model 1; query length; [user, brand] word2vec; [query, product] word2vec; bm25; position] Model: XGBoost, logistic loss with early stopping
  • 48. Broad-based, discovery-oriented recommendations, including from stores customers may have never shopped from
  • 49. Run out of X? Rank products by repurchase probability
  • 50. Introduce customers to products new in the catalog. Also mitigates product cold start problems
  • 51. Replacement Product Recommendations: Mitigate adverse impact of last-minute out-of-stocks
  • 52. Data Products in Search and Discovery. Search: • query autocorrection • query spell correction • query expansion • deep matching/document expansion • search ranking • search advertising. Discovery: • Substitute/replacement products • Frequently bought with products • Next basket recommendations • Guided discovery • Interpretable recommendations
  • 53. Thank you! We are hiring! Senior Machine Learning Engineer http://bit.ly/2kzHpcg