MMM, Search!
Daniel Tunkelang
dtunkelang@gmail.com

Presented to Wikimedia Foundation on April 27, 2020
What is search?
Search is a process.
• Searchers
• start with information-seeking goals.

• express and elaborate those goals as queries.

• Search Engines
• translate queries into representations of intent.

• retrieve results relevant to that intent and rank them.

Communication isn’t perfect, so the process is iterative.
Search is many things.
• Known-Item search vs. exploratory search.

• Seeking specific item vs. knowing when you see it.

• Search is a means to an end, not the end itself.

• Getting information, shopping, communication, etc.

• It takes a lot of hard work to make search feel effortless.

• Indexing, query understanding, matching, ranking.
Metrics, Models, Methods
The most important decisions for a search engine are:

• Metrics: what we measure and optimize for.

• Models: how we model the search experience.

• Methods: how we help searchers achieve success.
Metrics
Metrics:
What do we need to know?
• Binary Relevance

• Are searchers finding relevant results?

• Session Success

• How often are search sessions successful?

• Search Efficiency

• How much effort are searchers making?
Binary Relevance
Relevance is a measure of
information conveyed by a
document relative to a query.



Relationship between document
and query, though necessary, is not
sufficient to determine relevance.

William Goffman, 1964
Relevance has shades of gray, but
non-relevance is black and white.
Example: Email
• Can Google decide which of my emails are important?

• ¯\_(ツ)_/¯

• Can Google decide which of my emails are spam?

• Definitely!
Measure Binary Relevance!
• Build a (query, document) binary relevance model.

• (we’ll get back to that in a moment)

• Embrace positional bias: measure at top ranks.

• Can use top k results or weighted sample.

• Stratify for meaningful query and user segments.

• Leverage query classification and user data.
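
For example, precision at the top ranks can be computed directly from judged (query, document) pairs. A minimal Python sketch, assuming a hypothetical search_fn that returns ranked document ids, a judgments map of relevant documents per query, and a segment_fn that assigns each query to a segment:

```python
from collections import defaultdict

def precision_at_k(ranked_results, relevant_docs, k=10):
    """Fraction of the top-k results judged relevant (binary relevance)."""
    top_k = ranked_results[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant_docs) / len(top_k)

def stratified_precision(queries, search_fn, judgments, segment_fn, k=10):
    """Average precision@k per segment (e.g., query category or user cohort)."""
    by_segment = defaultdict(list)
    for query in queries:
        results = search_fn(query)              # ranked doc ids from the engine
        relevant = judgments.get(query, set())  # docs judged relevant for this query
        by_segment[segment_fn(query)].append(precision_at_k(results, relevant, k))
    return {seg: sum(scores) / len(scores) for seg, scores in by_segment.items()}
```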
Search is a journey.
Search isn’t always one-shot.
Search can’t always be one-shot.
Measure Session Success!
• Measure session conversion, not just query conversion.

• Much better proxy for user’s success!

• Compute metrics based on first query of session.

• Distribution of journeys for common intent.

• Segment sessions into tasks? Maybe, but optional. 

• Multi-task sessions uncommon; treat as noise.
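
A minimal sketch of session success attributed to the first query of each session; the session schema (first_query, converted) is an assumption, not from the talk:

```python
from collections import defaultdict

def session_success_by_first_query(sessions):
    """Attribute each session's outcome to its first query and report success rates."""
    totals, successes = defaultdict(int), defaultdict(int)
    for session in sessions:
        query = session["first_query"]
        totals[query] += 1
        successes[query] += bool(session["converted"])
    return {query: successes[query] / totals[query] for query in totals}
```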
Search Efficiency
Searching is not fun.
Having found is fun.
• If search is too hard or takes too long, searchers give up.

• Compare successful and unsuccessful sessions.

• Measure how much time searchers spend in sessions.

• Especially time on search rather than results.

• Measure searcher effort.

• Pagination, reformulation, refinement, etc.
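
One way to make the comparison concrete: average effort for successful vs. unsuccessful sessions. The session fields used below (converted, time_on_search_s, num_reformulations, num_result_pages) are assumptions about what a session log might contain:

```python
from statistics import mean

def effort_summary(sessions):
    """Compare average effort for successful vs. unsuccessful sessions."""
    def summarize(subset):
        if not subset:
            return {}
        return {
            "time_on_search_s": mean(s["time_on_search_s"] for s in subset),
            "reformulations": mean(s["num_reformulations"] for s in subset),
            "result_pages": mean(s["num_result_pages"] for s in subset),
        }
    return {
        "successful": summarize([s for s in sessions if s["converted"]]),
        "unsuccessful": summarize([s for s in sessions if not s["converted"]]),
    }
```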
Metrics: Summary
• Binary Relevance

• Are searchers finding relevant results?

• Session Success

• How often are search sessions successful?

• Search Efficiency

• How much effort are searchers making?
Models
Models:
What do we model and how?
• Query Categorization

• What is the primary domain for a query?

• Query Similarity

• Do two queries express similar / identical intent?

• Binary Relevance

• How to estimate relevance of results to queries?
Query Categorization
Search starts with query understanding.
Query understanding starts with categorization.
• Map query to a primary content taxonomy.

• Subject, product type, domain, etc.

• Identify high-level intent, independent of content interest.

• Title, category, brand, site help, etc.

• Categories should be coherent, distinctive, and useful.

• Good categorization requires good categories.
How to Train Your
Query Categorization Model
• Label your most frequent head queries manually.

• Top 1000 queries are probably worth it.

• For torso queries, infer categories from engagement.

• Looking for overwhelmingly dominant category.

• Now train a model using labeled head and torso queries.

• This training data is biased, but manageably so.

• No need to use fancy deep learning / AI. Try fastText.
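
A minimal fastText sketch along these lines; the training file name, label set, and confidence threshold are illustrative assumptions:

```python
import fasttext  # pip install fasttext

# Hypothetical training file, one labeled query per line, e.g.:
#   __label__film marvel movies 2019
#   __label__help how do i reset my password
model = fasttext.train_supervised(
    input="labeled_queries.txt",  # labeled head queries + engagement-inferred torso queries
    epoch=25,
    lr=0.5,
    wordNgrams=2,                 # bigrams help with short, order-sensitive queries
)

labels, probs = model.predict("jaguar repair manual", k=1)
if probs[0] >= 0.8:               # illustrative threshold: only trust confident predictions
    print("category:", labels[0].replace("__label__", ""))
```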
Query Similarity
Query ambiguity is rare.
Query similarity is common.
• Some queries do not express a clear intent, but most do.

• Most “ambiguous” queries turn out to be broad.

• Bigger opportunity: multiple queries express same intent.

• Or at least the same distribution of intents.

• Recognizing similar / identical queries is huge opportunity.

• Query rewriting, aggregating signals, etc.
How to Model
Query Similarity
• Start with the simple stuff: shallow query canonicalization.

• Character normalization, stemming, word order.

• Look at edit distance, especially for spelling errors.

• Tail queries at edit distance 1 from head queries.

• Compare embeddings of queries and results.

• Especially to keep the other methods honest.
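
A sketch of the simple stuff: shallow canonicalization plus a plain Levenshtein check. Stemming and embedding comparison are left out here but would layer on the same way:

```python
import re
import unicodedata

def canonicalize(query):
    """Shallow canonicalization: lowercase, strip accents, ignore word order.
    Stemming could be added on top (e.g., a Snowball stemmer)."""
    query = unicodedata.normalize("NFKD", query.lower())
    query = "".join(c for c in query if not unicodedata.combining(c))
    return " ".join(sorted(re.findall(r"\w+", query)))

def edit_distance(a, b):
    """Plain Levenshtein distance; enough to catch single-typo variants."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def likely_same_intent(query_a, query_b):
    """Candidate pair for query rewriting or signal aggregation."""
    return (canonicalize(query_a) == canonicalize(query_b)
            or edit_distance(query_a, query_b) <= 1)
```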
Binary Relevance
Focus on simplest question.
• Worry whether a result is relevant or non-relevant.

• Relevant vs. more relevant is often subjective.

• Assume that query understanding has done its job first.

• Result relevance depends on query understanding.

• Assume that relevance is objective and universal.

• Personalization: a nice-to-have, not a must-have.
How to Train Your
Binary Relevance Model
• Collect human binary relevance judgments. Lots of them. 

• Quantity is more important than quality.

• Pay attention to query distribution and stratify sample. 

• Collect judgments that teach you something.

• Come to terms with presentation and position bias.

• Users mostly interact with top-ranked results.
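
One way to stratify the judgment sample so head queries don't dominate it; the traffic-band thresholds below are illustrative assumptions:

```python
import random

def sample_queries_for_judgment(query_counts, per_stratum=200, seed=42):
    """Sample queries from head / torso / tail traffic bands so the judged
    set reflects the query distribution rather than just the head."""
    rng = random.Random(seed)
    strata = {"head": [], "torso": [], "tail": []}
    for query, count in query_counts.items():
        band = "head" if count >= 1000 else "torso" if count >= 10 else "tail"
        strata[band].append(query)
    return {
        band: rng.sample(queries, min(per_stratum, len(queries)))
        for band, queries in strata.items()
    }
```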
Models: Summary
• Query Categorization

• Simple model to map query to primary intent.

• Query Similarity

• Recognize queries with same or similar intent.

• Binary Relevance

• Use human judgments to train relevance model.
Methods
Methods:
What are some useful tricks?
• Optimize for Query Performance

• Help searchers make better queries.

• Map Tail Queries to Head Intents

• Searchers aren’t as unique as you think!

• Learn from Successful Sessions

• Help others discover successful paths.
Optimize for
Query Performance
What is query performance?
• Expected searcher success for a query.

• A function of the query, not of any particular result.

• Can use any measure of searcher success.

• But consider focusing on session success.

• Can incorporate sorting, refinement, or other factors.

• But keep it simple. The query is probably enough.
Best way to predict query performance?
Historical query performance.
Stuck in the tail? No data?
These methods can help.
Predict query performance.
Then optimize for it.
• Consider every surface where you suggest queries.

• Autocomplete, guides, related searches, etc.

• Offer suggestions with high predicted performance.

• Or at least nudge users wherever possible.

• Use query rewriting to improve query performance.

• Rewrite to similar, high-performing queries.
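
A sketch of nudging searchers toward high-performing suggestions, assuming a query_performance map of historical per-query success rates (e.g., from the session-success sketch earlier):

```python
def rank_suggestions(prefix, candidate_queries, query_performance, default_score=0.5):
    """Order autocomplete candidates by predicted query performance.
    Queries with no history fall back to a default (or model-predicted) score."""
    matches = [q for q in candidate_queries if q.startswith(prefix)]
    return sorted(matches,
                  key=lambda q: query_performance.get(q, default_score),
                  reverse=True)

# Hypothetical usage:
# rank_suggestions("jag",
#                  ["jaguar", "jaguar xf price", "jagged alliance"],
#                  {"jaguar": 0.35, "jaguar xf price": 0.72})
```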
Pull Your Tail From Your Head
Many tail queries
express head intents.
• Misspelled queries are often misspellings of head queries.

• Common misspellings are uncommon.

• Many queries have a dominant singular or plural form.

• Often, though not always, the same intent.

• Also word order or other grammatical transformations.

• Such as removal of low-information / noise words.
Rewrite tail queries!
• Prioritize correcting misspellings of head queries.

• Be more aggressive, skip tokenization, etc.

• Look for head queries equivalent to tail queries.

• Stemming, reordering terms, dropping noise words.

• But check to make sure intent is actually preserved!

• Remember earlier discussion of query similarity.
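
A sketch of edit-distance-1 rewriting against a head-query dictionary, using a SymSpell-style delete-variant index; the helpers here are illustrative, not a production spelling corrector:

```python
def delete_variants(text):
    """All strings formed by deleting one character (SymSpell-style index keys)."""
    return {text[:i] + text[i + 1:] for i in range(len(text))}

def build_head_index(head_queries):
    """Map each head query and its delete-1 variants back to the head query."""
    index = {}
    for head in head_queries:
        index.setdefault(head, head)
        for variant in delete_variants(head):
            index.setdefault(variant, head)
    return index

def rewrite_tail_query(query, head_index):
    """Rewrite a tail query to a near-identical head query (about one edit away).
    Intent preservation should still be validated (see query similarity earlier)."""
    query = query.strip().lower()
    if query in head_index:
        return head_index[query]
    for variant in delete_variants(query):  # covers insertions and substitutions
        if variant in head_index:
            return head_index[variant]
    return query
```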
Learn From Success
Successful searchers
can help everyone else.
• Some queries lead to great performance for everyone.

• e.g., known-item searches by name or title.

• But for some queries, performance is user-dependent.

• Some users are more sophisticated or persistent.

• Successful users discover successful paths.

• Use trails of successful users to build shortcuts!
Optimize complex journeys.
• Detect the searches for which searchers need help.

• Queries for which successful sessions are long.

• Find the actions that successful searchers take.

• Category / facet refinements, reformulations.

• Promote those actions in the search experience.

• Create shortcuts in the navigational landscape.
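
A sketch of mining those shortcuts from logs, assuming a hypothetical session record with first_query, converted, and a list of refinement actions:

```python
from collections import Counter, defaultdict

def shortcut_candidates(sessions, min_sessions=50, top_n=3):
    """For each first query, find the actions most common in successful sessions.
    Actions might be facet refinements or reformulations, e.g. "facet:category=Laptops"."""
    actions_by_query = defaultdict(Counter)
    successful_sessions = Counter()
    for session in sessions:
        if not session["converted"]:
            continue
        query = session["first_query"]
        successful_sessions[query] += 1
        actions_by_query[query].update(session["actions"])
    return {
        query: counts.most_common(top_n)      # candidate shortcuts to promote
        for query, counts in actions_by_query.items()
        if successful_sessions[query] >= min_sessions  # require enough evidence
    }
```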
Methods: Summary
• Optimize for Query Performance

• Suggest better queries and rewrite others.

• Map Tail Queries to Head Intents

• Rewrite tail queries as similar head queries.

• Learn from Successful Sessions

• Create shortcuts based on successful paths.
Putting It All Together
• Metrics, models, and methods — they all matter.

• Query understanding first, then result relevance.

• Binary result relevance first, then result ranking.

• Session performance, not just query performance.

• Get as much leverage as possible from head queries.
Thank You!
• More Resources

• Query Understanding

https://queryunderstanding.com/

• My Medium (not just about search)

https://medium.com/@dtunkelang

• Contact me directly!

dtunkelang@gmail.com
