SlideShare a Scribd company logo
1 of 17
Download to read offline
Using AI to Understand Search Intent
MICES 2021
Aritra Mandal
eBay Search
June, 2021
About Me
Aritra Mandal is an applied researcher on the Search team
at eBay. He focuses on search quality and is leveraging
AI/ML, structured data, and knowledge graphs to improve
the search engine that powers eBay’s marketplace. Aritra
received his BEng in computer science from Birla Institute of
Technology and his MS in computer and information science
from Indiana University–Purdue University Indianapolis.
Agenda
• What is Query Understanding?
• Using AI to Understand Search Intent
– Query Categorization
– Query Equivalence and Similarity
• Summary
What is Query Understanding?
Query understanding is the process of inferring the intent of a search query from the
searcher’s keywords.
It takes place before the search engine retrieves and ranks results.
Query understanding is useful not only for retrieval and ranking, but also for interface
decisions, recommendations, promotions, analytics, etc.
Spectrum of Query Intent
• Coarse-Grained
– Query Categorization
– Broad vs. Narrow
• Fine-Grained
– Entity Recognition
– Query Equivalence
Query Categorization
Benefits of Query Categorization
• Relevance filtering: excludes out-of-category results that match keywords.
– Especially useful for non-default sorts, like sorting by price.
• Helps provide searchers accurate summary information about query results.
– Accurate numbers for total number of results and category / facet counts.
– Contextual appropriate sub-categories and facets, sorts, filters, etc.
vs.
Using AI for Query Categorization
• Create labeled training data from clicks or purchases.
– Query => clicked or purchased product => product category.
– Identify the appropriate category with sufficient demand for each query.
– Use fastText, BERT, or any text classification model.
• Apply model to other queries -- especially torso and tail queries.
– Compute categories for medium to low frequency queries online.
Collect Query
Engaged Items
pairs
Queries with
Few or No
engagements
Use category
Hierarchy to roll
up and create
training records
Train Model
Bert/FastText
Category
Prediction for
Queries
Query Similarity based on Intent
• Multiple queries often map to the same intent.
– e.g., mens shoes = shoes for men.
Why group of queries with similar intent ?
• If we can recognize queries with the same (or nearly the same) intent, we can:
– Map poorly performing queries to better-performing equivalent ones.
– Intelligently recover from queries that return few or no results.
– Analyze search behavior grouping by intent, rather than by query.
– Obtain better signals to train machine learning models, e.g., for ranking.
• Recognizing query equivalence and similarity allows us to transform search queries
into canonical representations of search intent, establishing a more robust foundation
to optimize the search experience.
Recognizing Query Equivalence: 2 Strategies
• How do we recognize that two queries represent the same search intent?
– Surface Query Similarity
• stemming, word order, compounds, noise words
• e.g., mens wristwatch = wrist watches for men
– Post-Search Behavior
• engagement (clicks, conversions) with similar results
• requires a way to measure similar results (hold that thought!)
Surface Query Similarity
• The Good:
– Easy to recognize queries that differ only in stemming, word order, etc.
– High precision as a standalone indicator of query equivalence.
– Decent coverage that can be extended (e.g., low-edit-distance query pairs).
– Simple and explainable!
• The Bad:
– There are false positives, e.g., kiss != kisses; dress shirt != shirt dress.
– Need guardrails to avoid embarrassing mistakes.
– Limited coverage; extending it significantly increases risk of false positives.
Post-Search Behavior
• The Good:
– Higher coverage than can be obtained from surface query similarity.
– Learn from user behavior, which can look far beyond literal query tokens.
– Use whole-query context for contextual expansion and relaxation.
• The Bad:
– Complex compared to surface similarity, stochastic, and difficult to explain.
– Replace simple binary query equivalence with continuous similarity metric.
– Picking the right similarity threshold is as much an art as a science.
– Rely on a pipeline of black-box approaches, starting with embeddings.
Representing Queries as Vectors
• Think of a query as a bag of products associated with the query.
– Conversions, clicks, or even impressions.
– Trade-off between signal strength and sparsity.
• Compute query vector as average (mean pooling) of associated product vectors.
– Assumes that products can be mapped to vectors!
– Use pre-trained embeddings, fine-tuning, or can train from scratch.
• Product titles tend to be longer and more self-contained than queries.
– Especially on a marketplace where sellers optimize for findability.
Representing Queries as Vectors
The Devil is in the Details!
• Challenging to decide how similar is similar enough for equivalence.
– Threshold depends on data, application, and specific query.
– Similarity may not be the only goal, e.g., may be trying to increase recall.
• Evaluation of similarity model is critical -- but also challenging.
– Use human judgements of query pairs or end-to-end results.
– Ultimately need to A/B test the end-to-end search application.
• This approach only works when you can pre-compute query vectors offline.
– Associating queries with results at query time is too slow and expensive.
– Offline approach works for head and maybe torso, but not for tail.
Summary
• Query understanding is the process of inferring searcher’s intent from keywords.
• Spectrum from coarse-grained categorization to fine-grained equivalence.
• Categorization improves relevance, navigation, presentation, and analytics.
• Train query categorization using the categories of clicked and purchased products.
• Recognizing query equivalence establishes canonical representation of search intent.
• Model a query as average of associated product vectors from an embedding model.
For More information : arimandal@ebay.com LinkedIn : linkedin.com/in/aritram

More Related Content

What's hot

Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
Yingjun Wu
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
Dan McKinley
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 

What's hot (20)

Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Rethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming SystemsRethinking State Management in Cloud-Native Streaming Systems
Rethinking State Management in Cloud-Native Streaming Systems
 
Query Understanding: A Manifesto
Query Understanding: A ManifestoQuery Understanding: A Manifesto
Query Understanding: A Manifesto
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB Fundamentals
 
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
Natural Language Processing and Search Intent Understanding C3 Conductor 2019...
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
MongoDB Schema Design
MongoDB Schema DesignMongoDB Schema Design
MongoDB Schema Design
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
 
Using Apache Spark to Solve Sessionization Problem in Batch and Streaming
Using Apache Spark to Solve Sessionization Problem in Batch and StreamingUsing Apache Spark to Solve Sessionization Problem in Batch and Streaming
Using Apache Spark to Solve Sessionization Problem in Batch and Streaming
 
NLTK
NLTKNLTK
NLTK
 
MongoDB, E-commerce and Transactions
MongoDB, E-commerce and TransactionsMongoDB, E-commerce and Transactions
MongoDB, E-commerce and Transactions
 
Geospatial Options in Apache Spark
Geospatial Options in Apache SparkGeospatial Options in Apache Spark
Geospatial Options in Apache Spark
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 

Similar to Using AI to understand search intent

Similar to Using AI to understand search intent (20)

Search Behavior Patterns
Search Behavior PatternsSearch Behavior Patterns
Search Behavior Patterns
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
Demystifying Recommendation Systems
Demystifying Recommendation SystemsDemystifying Recommendation Systems
Demystifying Recommendation Systems
 
Search Analytics - Comperio
Search Analytics - ComperioSearch Analytics - Comperio
Search Analytics - Comperio
 
Keyword research - Digital Marketing - SEO
Keyword research - Digital Marketing - SEOKeyword research - Digital Marketing - SEO
Keyword research - Digital Marketing - SEO
 
Lecture Notes on Recommender System Introduction
Lecture Notes on Recommender System IntroductionLecture Notes on Recommender System Introduction
Lecture Notes on Recommender System Introduction
 
Haystacks slides
Haystacks slidesHaystacks slides
Haystacks slides
 
Keyword research tools for Search Engine Optimisation (SEO)
Keyword research tools for Search Engine Optimisation (SEO)Keyword research tools for Search Engine Optimisation (SEO)
Keyword research tools for Search Engine Optimisation (SEO)
 
Tuning Up Site Search - IA Summit 2007
Tuning Up Site Search - IA Summit 2007Tuning Up Site Search - IA Summit 2007
Tuning Up Site Search - IA Summit 2007
 
Data warehouse 16 data analysis techniques
Data warehouse 16 data analysis techniquesData warehouse 16 data analysis techniques
Data warehouse 16 data analysis techniques
 
Research Theory Pro-Forma (1).pptx
Research Theory Pro-Forma (1).pptxResearch Theory Pro-Forma (1).pptx
Research Theory Pro-Forma (1).pptx
 
Great Survey Design
Great Survey DesignGreat Survey Design
Great Survey Design
 
e3_chapter__5_evaluation_technics_HCeVpPLCvE.ppt
e3_chapter__5_evaluation_technics_HCeVpPLCvE.ppte3_chapter__5_evaluation_technics_HCeVpPLCvE.ppt
e3_chapter__5_evaluation_technics_HCeVpPLCvE.ppt
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise Search
 
meetup-talk
meetup-talkmeetup-talk
meetup-talk
 
ACIS 2015 Bibliographical-based Facets for Expertise Search
ACIS 2015 Bibliographical-based Facets for Expertise SearchACIS 2015 Bibliographical-based Facets for Expertise Search
ACIS 2015 Bibliographical-based Facets for Expertise Search
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
detailed notes AMR.pdf
detailed notes AMR.pdfdetailed notes AMR.pdf
detailed notes AMR.pdf
 

Recently uploaded

PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
galaxypingy
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
ydyuyu
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
ayvbos
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
ydyuyu
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
ydyuyu
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 

Recently uploaded (20)

PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 

Using AI to understand search intent

  • 1. Using AI to Understand Search Intent MICES 2021 Aritra Mandal eBay Search June, 2021
  • 2. About Me Aritra Mandal is an applied researcher on the Search team at eBay. He focuses on search quality and is leveraging AI/ML, structured data, and knowledge graphs to improve the search engine that powers eBay’s marketplace. Aritra received his BEng in computer science from Birla Institute of Technology and his MS in computer and information science from Indiana University–Purdue University Indianapolis.
  • 3. Agenda • What is Query Understanding? • Using AI to Understand Search Intent – Query Categorization – Query Equivalence and Similarity • Summary
  • 4. What is Query Understanding? Query understanding is the process of inferring the intent of a search query from the searcher’s keywords. It takes place before the search engine retrieves and ranks results. Query understanding is useful not only for retrieval and ranking, but also for interface decisions, recommendations, promotions, analytics, etc.
  • 5. Spectrum of Query Intent • Coarse-Grained – Query Categorization – Broad vs. Narrow • Fine-Grained – Entity Recognition – Query Equivalence
  • 7. Benefits of Query Categorization • Relevance filtering: excludes out-of-category results that match keywords. – Especially useful for non-default sorts, like sorting by price. • Helps provide searchers accurate summary information about query results. – Accurate numbers for total number of results and category / facet counts. – Contextual appropriate sub-categories and facets, sorts, filters, etc. vs.
  • 8. Using AI for Query Categorization • Create labeled training data from clicks or purchases. – Query => clicked or purchased product => product category. – Identify the appropriate category with sufficient demand for each query. – Use fastText, BERT, or any text classification model. • Apply model to other queries -- especially torso and tail queries. – Compute categories for medium to low frequency queries online. Collect Query Engaged Items pairs Queries with Few or No engagements Use category Hierarchy to roll up and create training records Train Model Bert/FastText Category Prediction for Queries
  • 9. Query Similarity based on Intent • Multiple queries often map to the same intent. – e.g., mens shoes = shoes for men.
  • 10. Why group of queries with similar intent ? • If we can recognize queries with the same (or nearly the same) intent, we can: – Map poorly performing queries to better-performing equivalent ones. – Intelligently recover from queries that return few or no results. – Analyze search behavior grouping by intent, rather than by query. – Obtain better signals to train machine learning models, e.g., for ranking. • Recognizing query equivalence and similarity allows us to transform search queries into canonical representations of search intent, establishing a more robust foundation to optimize the search experience.
  • 11. Recognizing Query Equivalence: 2 Strategies • How do we recognize that two queries represent the same search intent? – Surface Query Similarity • stemming, word order, compounds, noise words • e.g., mens wristwatch = wrist watches for men – Post-Search Behavior • engagement (clicks, conversions) with similar results • requires a way to measure similar results (hold that thought!)
  • 12. Surface Query Similarity • The Good: – Easy to recognize queries that differ only in stemming, word order, etc. – High precision as a standalone indicator of query equivalence. – Decent coverage that can be extended (e.g., low-edit-distance query pairs). – Simple and explainable! • The Bad: – There are false positives, e.g., kiss != kisses; dress shirt != shirt dress. – Need guardrails to avoid embarrassing mistakes. – Limited coverage; extending it significantly increases risk of false positives.
  • 13. Post-Search Behavior • The Good: – Higher coverage than can be obtained from surface query similarity. – Learn from user behavior, which can look far beyond literal query tokens. – Use whole-query context for contextual expansion and relaxation. • The Bad: – Complex compared to surface similarity, stochastic, and difficult to explain. – Replace simple binary query equivalence with continuous similarity metric. – Picking the right similarity threshold is as much an art as a science. – Rely on a pipeline of black-box approaches, starting with embeddings.
  • 14. Representing Queries as Vectors • Think of a query as a bag of products associated with the query. – Conversions, clicks, or even impressions. – Trade-off between signal strength and sparsity. • Compute query vector as average (mean pooling) of associated product vectors. – Assumes that products can be mapped to vectors! – Use pre-trained embeddings, fine-tuning, or can train from scratch. • Product titles tend to be longer and more self-contained than queries. – Especially on a marketplace where sellers optimize for findability.
  • 16. The Devil is in the Details! • Challenging to decide how similar is similar enough for equivalence. – Threshold depends on data, application, and specific query. – Similarity may not be the only goal, e.g., may be trying to increase recall. • Evaluation of similarity model is critical -- but also challenging. – Use human judgements of query pairs or end-to-end results. – Ultimately need to A/B test the end-to-end search application. • This approach only works when you can pre-compute query vectors offline. – Associating queries with results at query time is too slow and expensive. – Offline approach works for head and maybe torso, but not for tail.
  • 17. Summary • Query understanding is the process of inferring searcher’s intent from keywords. • Spectrum from coarse-grained categorization to fine-grained equivalence. • Categorization improves relevance, navigation, presentation, and analytics. • Train query categorization using the categories of clicked and purchased products. • Recognizing query equivalence establishes canonical representation of search intent. • Model a query as average of associated product vectors from an embedding model. For More information : arimandal@ebay.com LinkedIn : linkedin.com/in/aritram