Haystack 2019 - Evolution of Yelp search to a generalized ranking platform - Umesh Dangat

OpenSource Connections
Yelp Search and Learning to Rank
Umesh Dangat
Yelp’s Mission
Connecting people with great
local businesses.
Yelp by the Numbers
● Our users have written more than 177
million reviews by the end of Q4 2018
● Monthly average of unique visitors who
visited Yelp in Q4 2018
○ 33 million via the Yelp app and
○ 69 million via mobile web
● Billions of queries served per year
Yelp Core Search
Yelp core Search SLAs
● Search query latencies: two-digit
milliseconds
● Real time indexing: no more than a
few seconds of delay for indexed
data to be searchable.
Search Backend
Yelp Custom Scoring
● Documents are recalled/retrieved as
per the “filters” in the query
● Each document is scored using a
heuristic
● Typical heuristics are
○ Document features
○ Query features
○ Some derivatives of the two
above
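The recall-then-score flow on this slide can be sketched roughly as follows. This is a minimal illustration, not Yelp's scoring code: the field names (`name`, `city`, `rating`), the weights, and the specific features are all assumptions chosen for the example.

```python
import math


def heuristic_score(doc, query):
    """Hand-tuned score built from document, query, and derived features."""
    # Document feature (hypothetical field): the stored average rating.
    rating = doc["rating"]
    # Query feature: the number of terms in the query text.
    terms = query["text"].lower().split()
    # Derived feature: fraction of query terms that appear in the name.
    name_terms = set(doc["name"].lower().split())
    matched = sum(1 for term in terms if term in name_terms)
    name_match = matched / len(terms) if terms else 0.0
    # The weights 2.0 and 0.5 are illustrative, not taken from the talk.
    return 2.0 * name_match + 0.5 * math.log1p(rating)


def search(docs, query):
    """Recall documents with a filter, then rank them by the heuristic."""
    recalled = [d for d in docs if d["city"] == query["city"]]
    return sorted(recalled, key=lambda d: heuristic_score(d, query), reverse=True)


docs = [
    {"name": "Joe's Pizza", "city": "SF", "rating": 4.5},
    {"name": "Taco Stand", "city": "SF", "rating": 5.0},
    {"name": "Pizza Palace", "city": "NYC", "rating": 4.0},
]
results = search(docs, {"text": "pizza", "city": "SF"})
```

Here the filter removes the out-of-city document before any scoring happens, and the heuristic then orders what remains.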
ML and search scoring
● Search relevance engineers train models “offline”
○ Iterate on models using data for logged feature score
components
● Serialized format of model is deployed “online”
○ Elasticsearch uses this model at query time to rank
documents
Scoring Before LTR
Issues with this
approach
● Scoring logic API leak
○ No single point of contained relevance logic
○ Hard to understand and iterate
Issues with this
approach
● Issues with code pushes
○ Not always rollback safe
○ Easy to push code with missing/new features
● Difficult to extend for other types of models
● Queries get larger and serialization/deserialization
(ser-de) becomes more expensive
● More teams at Yelp want to solve similar
ranking problems backed by Elasticsearch
Looking for an
alternative
● Solution should scale to more than one specific
team/use case
● Decouple model and feature training from online
deployment
● Should allow for iterations without Elasticsearch
cluster restarts.
● A hosted model server (i.e. scoring outside
Elasticsearch) was ruled out due to latency
constraints
● Allow for tiered scoring
This plugin:
● Allows you to store features (Elasticsearch query templates) in Elasticsearch
● Logs feature scores (relevance scores) to create a training set for offline model development
● Stores linear, xgboost, or ranklib ranking models in Elasticsearch that use features you've stored
● Ranks search results using a stored model
Elasticsearch Learning to Rank Plugin
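As a rough sketch of what the plugin's request bodies look like, the Python below builds a feature set (features as mustache query templates) and a search request that rescores the top hits with a stored model through the plugin's `sltr` query. The index field names (`name`, `city`, `rating`), the model name, and the window size are assumptions for illustration; using `rescore` this way is one possible reading of "tiered scoring", not a description of Yelp's configuration.

```python
def build_featureset():
    """Request body for storing a feature set in the LTR plugin."""
    return {
        "featureset": {
            "features": [
                {
                    # Text-match feature: relevance score of the name match.
                    "name": "title_query",
                    "params": ["keywords"],
                    "template_language": "mustache",
                    "template": {"match": {"name": "{{keywords}}"}},
                },
                {
                    # Document feature: read a numeric field as the score.
                    "name": "user_rating",
                    "params": [],
                    "template_language": "mustache",
                    "template": {
                        "function_score": {
                            "query": {"match_all": {}},
                            "field_value_factor": {"field": "rating", "missing": 0},
                        }
                    },
                },
            ]
        }
    }


def build_tiered_search(keywords, city, model_name, window_size=100):
    """Search body: a cheap first-pass query, then model rescoring of the top hits."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name": keywords}},
                "filter": [{"term": {"city": city}}],
            }
        },
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {"params": {"keywords": keywords}, "model": model_name}
                }
            },
        },
    }


featureset_body = build_featureset()
search_body = build_tiered_search("pizza", "sf", "hypothetical_model")
```

The bodies are plain dictionaries, so they can be sent with any HTTP or Elasticsearch client; no client call is shown here because the deployment details are not part of the talk.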
Looking for an
alternative
● Solution should scale to more than one specific
team/use case (yes)
● Decouple model and feature training from online
deployment (maybe)
● Should allow for iterations without Elasticsearch
cluster restarts. (yes generally speaking)
● A hosted model server (i.e. scoring outside
Elasticsearch) was ruled out due to latency
constraints (NA)
● Allow for tiered scoring (yes)
Scoring with
LTR
Learning to Rank
Plugin and Yelp
● Yelp had to make some changes to the then-existing
LTR plugin functionality in order to make it
workable for our use cases.
● Let us look at a couple of the most important
ones.
Learning to Rank
Plugin and Yelp
● Selective feature selection
A linear model for these features might resemble:
{
"title_query" : 0.3,
"user_rating" : 0.5
}
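To make the weights above concrete, here is a minimal sketch of how such a linear model turns per-document feature values into a score: each feature value is multiplied by its weight and the products are summed. The feature values are made up, and treating a missing feature as 0.0 is an assumption of this sketch.

```python
def linear_score(weights, features):
    """Weighted sum of feature values; missing features count as 0.0 (assumption)."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


# Weights taken from the slide.
weights = {"title_query": 0.3, "user_rating": 0.5}

# Hypothetical feature values for one document.
features = {"title_query": 2.0, "user_rating": 4.0}

score = linear_score(weights, features)  # 0.3 * 2.0 + 0.5 * 4.0
```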
Learning to Rank
Plugin and Yelp
● Selective feature selection
Learning to Rank
Plugin and Yelp
● Passing the feature vector between LTR and native Java plugins so
that features do not have to be recomputed
● Consider a scenario where you have one base feature
potentially used as a seed value in multiple derived features
● Example
○ Base feature: document field value look up e.g. rating
○ Derived features: derived computation
■ feature A: log(rating) + log(word_score)
■ feature B: log(rating) + log(popularity)
● In the above example we don’t want to end up re-computing the
rating multiple times.
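The idea of sharing a base feature across derived features can be sketched as a small cache keyed by field name. This is only an illustration of the principle in Python; the actual change lives in Java plugin code, and the class and function names here are hypothetical.

```python
import math


class FeatureVector:
    """Caches base feature lookups so derived features can reuse them."""

    def __init__(self, doc):
        self._doc = doc
        self._cache = {}
        self.lookups = 0  # counts real document lookups, for illustration

    def base(self, field):
        # Only touch the document the first time a field is requested.
        if field not in self._cache:
            self.lookups += 1
            self._cache[field] = self._doc[field]
        return self._cache[field]


def feature_a(fv):
    # feature A: log(rating) + log(word_score)
    return math.log(fv.base("rating")) + math.log(fv.base("word_score"))


def feature_b(fv):
    # feature B: log(rating) + log(popularity)
    return math.log(fv.base("rating")) + math.log(fv.base("popularity"))


fv = FeatureVector({"rating": 4.0, "word_score": 2.0, "popularity": 10.0})
a = feature_a(fv)
b = feature_b(fv)
```

After both features are computed, `fv.lookups` is 3 rather than 4, because `rating` is fetched once and reused.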
Learning to Rank
Plugin and Yelp
LTR @Yelp today
● Enabled Yelp search to do tiered scoring
● Newer ranking use cases at Yelp solved using
LTR
○ E.g. painless scripts, ES query features,
custom native plugins.
LTR @Yelp challenges
● Decouple model and feature training from online
deployment (maybe)
● Potential to support neural network models and
vector embeddings.
Thank you
● Doug Turnbull for being accessible to answer our
questions!
● David Causse for all the code reviews!
www.yelp.com/careers/
We're Hiring!
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
Questions?
Thank you.