
Yahoo Real Time Search SMX March 2010


  1. Yahoo! Search Real Time Search: Exploring the frontiers in modern information retrieval <ul><li>March 2010 </li></ul>Ivan Davtchev @ivan_d
  2. What is real time search? <ul><li>Showing the most relevant, up-to-date content for a topic of recently increased interest. </li></ul><ul><li>The freshest content is great, but not always the best. </li></ul>
  3. Recent Y! real time search launches: Tweets on the SRP (search results page); improved Yahoo! News + Twitter
  4. Challenges of real time search <ul><ul><li>Real-time indexing: get new content as soon as it is published </li></ul></ul><ul><ul><ul><li>Crawl really, really fast </li></ul></ul></ul><ul><ul><ul><li>Index news feeds, RSS, Twitter </li></ul></ul></ul><ul><ul><li>Query analysis: discover which queries to handle differently </li></ul></ul><ul><ul><ul><li>For most queries, promoting recent content degrades relevance </li></ul></ul></ul><ul><ul><li>Ranking for fresh content: adjust ranking algorithms </li></ul></ul><ul><ul><ul><li>For most newly discovered content, many traditional ranking signals (e.g., anchor text) are weak or do not exist yet </li></ul></ul></ul>
  5. How do you know a query is really hot? <ul><li>Find, in real time, queries about emerging events and news stories </li></ul><ul><ul><li>E.g., natural disasters, sports updates, breaking political stories </li></ul></ul><ul><li>The standard approach is not ideal: </li></ul><ul><ul><li>Maintain a temporal model for each query </li></ul></ul><ul><ul><ul><li>Full time series or just statistics </li></ul></ul></ul><ul><ul><li>Identify irregularities in the model </li></ul></ul><ul><ul><ul><li>A change in the moving average of more than n standard deviations (σ) </li></ul></ul></ul><ul><ul><li>Works well for head queries, less so for torso/tail queries </li></ul></ul>Screenshots of Google Trends (©2008 Google) taken at 22:23 PST on 3/1/2010 to illustrate the temporal model and tail queries.
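The "standard approach" above can be sketched as a moving-average detector over one query's time series. This is a minimal illustration, not Yahoo!'s implementation: it assumes a list of per-interval search counts for a single query, and the function name `is_spiking` and the `window` and `n_sigma` defaults are hypothetical.

```python
from statistics import mean, pstdev


def is_spiking(counts, window=24, n_sigma=3.0):
    """Flag the latest count if it departs from the moving average of the
    previous `window` counts by more than `n_sigma` standard deviations.

    `counts` is a hypothetical per-interval series for one query, oldest first.
    """
    if len(counts) <= window:
        return False  # not enough history to build a baseline
    history = counts[-window - 1:-1]  # the `window` counts before the latest
    mu = mean(history)
    sigma = pstdev(history)
    latest = counts[-1]
    if sigma == 0:
        # Perfectly flat history: any change at all is an irregularity.
        return latest != mu
    return abs(latest - mu) > n_sigma * sigma


# Head query: steady ~1000 searches per interval, then a jump to 5000.
head = [990, 1010] * 12 + [5000]
# Tail query: almost always 0, so one extra search already looks like a spike.
tail = [0] * 23 + [1] + [1]
print(is_spiking(head), is_spiking(tail))  # True True
```

The tail example shows why this works well for head queries but not for torso/tail ones: with so few searches, a single extra query moves the count by several standard deviations.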
  6. Our Answer: Yahoo! TimeSense
  7. How is this different and better? <ul><li>It uses language modeling </li></ul><ul><ul><li>Language = “collection of words” </li></ul></ul><ul><ul><ul><li>All words in Webster’s Dictionary = English </li></ul></ul></ul><ul><ul><ul><li>All words in Yahoo! query logs, including misspellings = Query Log Language </li></ul></ul></ul><ul><ul><li>Language model = a way to explain a “vocabulary” to a computer </li></ul></ul>
<table>
<tr><th>Sentence</th><th>Times seen</th><th>Fraction of all sentences</th></tr>
<tr><td>ebay</td><td>90000</td><td>1/1000</td></tr>
<tr><td>apple</td><td>80000</td><td>1/2000</td></tr>
<tr><td>britney spears</td><td>40000</td><td>1/5000</td></tr>
<tr><td>ebay apple britney spears</td><td>0</td><td>0</td></tr>
</table>
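A table like the one above can be built by counting whole queries in a log. The sketch below is illustrative only: it assumes each query is one "sentence", lowercases and trims queries as a stand-in for real normalization, and uses a tiny hypothetical log; `build_language_model` is a hypothetical name.

```python
from collections import Counter
from fractions import Fraction


def build_language_model(query_log):
    """Treat each whole query as one 'sentence' and return, for each
    sentence, (times seen, fraction of all sentences), most frequent first."""
    counts = Counter(q.strip().lower() for q in query_log)
    total = sum(counts.values())
    return {
        sentence: (seen, Fraction(seen, total))
        for sentence, seen in counts.most_common()
    }


log = ["ebay", "apple", "ebay", "britney spears", "ebay", "apple"]
model = build_language_model(log)
for sentence, (seen, fraction) in model.items():
    print(f"{sentence:15} {seen:3} {fraction}")

# A sentence never seen in the log is simply absent: 0 times, fraction 0.
print(model.get("ebay apple britney spears", (0, 0)))
```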
  8. Why Language Modeling? <ul><li>A language model allows us to ask questions like </li></ul><ul><ul><li>“Is this word part of the language?” </li></ul></ul><ul><ul><ul><li>Answer: if it is in the table, yes </li></ul></ul></ul><ul><ul><li>“Which word is more likely to appear in the language, A or B?” </li></ul></ul><ul><ul><ul><li>Answer: whichever is higher up in the table </li></ul></ul></ul><ul><ul><li>“Is a sentence more likely to be found in text from language A or language B?” </li></ul></ul><ul><ul><ul><li>Answer: whichever language’s table gives the sentence a higher fraction </li></ul></ul></ul><ul><ul><li>We convert the task of classifying buzzing queries into a series of “language model questions” </li></ul></ul><ul><ul><li>And build models from query logs to answer them </li></ul></ul>
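The three questions above map directly onto lookups in a frequency table. This is a minimal sketch assuming a simple whole-query model with no smoothing; the class `QueryLanguageModel`, the helper `more_likely_language`, and the sample logs are hypothetical.

```python
from collections import Counter


class QueryLanguageModel:
    """Hypothetical frequency table over whole queries from a query log."""

    def __init__(self, queries):
        self.counts = Counter(queries)
        self.total = sum(self.counts.values())

    def contains(self, sentence):
        # "Is this part of the language?" -> is it in the table at all?
        return self.counts[sentence] > 0

    def prob(self, sentence):
        # Fraction of all sentences; 0.0 for sentences never seen.
        return self.counts[sentence] / self.total if self.total else 0.0

    def more_likely(self, a, b):
        # "Which is more likely, A or B?" -> whichever is higher in the table.
        return a if self.prob(a) >= self.prob(b) else b


def more_likely_language(sentence, model_a, model_b):
    # "Is this sentence more likely in language A or B?" -> whichever
    # table gives it the higher fraction.
    return "A" if model_a.prob(sentence) >= model_b.prob(sentence) else "B"


news = QueryLanguageModel(["snow storm", "snow storm", "ebay"])
general = QueryLanguageModel(["ebay", "ebay", "apple", "snow storm"])
print(general.contains("britney spears"))                 # False
print(general.more_likely("ebay", "apple"))               # ebay
print(more_likely_language("snow storm", news, general))  # A
```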
  9. Buzzing / Spiking Queries <ul><li>Q: Is this query much more prominent right now? </li></ul><ul><li>To answer, we build many small language models: </li></ul><ul><ul><li>One for each X minutes of query logs in the past month </li></ul></ul>Figure: timeline from Feb. 6 to Feb. 8 with one model per X-minute window (e.g., “Model for 02/07/2010 1:0Xpm”) and the current model for the last X minutes. Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
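Building "one model for each X minutes of query logs in the past month" amounts to bucketing timestamped queries. The sketch below assumes log records of the form (Unix seconds, query) and represents each small model as a `Counter`; the function name, the 30-day default for "past month", and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict

SECONDS_PER_DAY = 24 * 3600


def build_window_models(log, x_minutes, now_seconds, history_days=30):
    """Build one small query-frequency model (a Counter) per X-minute window.

    `log` holds (unix_seconds, query) records. Bucket k covers the interval
    [k * X * 60, (k + 1) * X * 60) seconds; only records from the last
    `history_days` days (up to `now_seconds`) are kept.
    """
    bucket_seconds = x_minutes * 60
    oldest = now_seconds - history_days * SECONDS_PER_DAY
    models = defaultdict(Counter)
    for unix_seconds, query in log:
        if oldest <= unix_seconds <= now_seconds:
            models[unix_seconds // bucket_seconds][query] += 1
    return dict(models)


log = [(0, "ebay"), (599, "ebay"), (600, "snow storm"), (1200, "ebay")]
models = build_window_models(log, x_minutes=10, now_seconds=1200)
print(models)
```

Keeping each window's model small and separate is what makes the comparisons on the next slide cheap: any two windows can be compared without rescanning the log.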
  10. Buzzing / Spiking Queries <ul><li>Q: Is this query much more prominent right now? </li></ul><ul><ul><li>Language model question: is this query more likely to belong to the last X minutes than to </li></ul></ul><ul><ul><ul><li>The previous X minutes </li></ul></ul></ul><ul><ul><ul><li>The same X minutes on the previous day </li></ul></ul></ul><ul><ul><ul><li>The same X minutes in the previous week </li></ul></ul></ul><ul><ul><ul><li>Etc. </li></ul></ul></ul>Figure: timeline from Feb. 6 to Feb. 8 comparing the current model for the last X minutes with the same X minutes on the previous day. Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
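One way to answer "is this query more likely to belong to the last X minutes than to each reference window?" is a likelihood ratio per reference window. This is a sketch under stated assumptions, not the method from the WSDM 2010 paper: add-one (Laplace) smoothing, the all-windows rule, and the threshold of 5 are hypothetical choices, as are the sample counts.

```python
from collections import Counter


def smoothed_prob(model, query, vocab_size, alpha=1.0):
    # Add-alpha (Laplace) smoothing: queries unseen in a window still get a
    # small non-zero probability, so the ratios below never divide by zero.
    total = sum(model.values())
    return (model[query] + alpha) / (total + alpha * vocab_size)


def buzz_ratios(query, current, references, alpha=1.0):
    """How many times more likely `query` is under the current X-minute model
    than under each named reference window's model."""
    vocab = set(current).union(*references.values(), {query})
    v = len(vocab)
    p_now = smoothed_prob(current, query, v, alpha)
    return {
        name: p_now / smoothed_prob(model, query, v, alpha)
        for name, model in references.items()
    }


def is_buzzing(query, current, references, threshold=5.0):
    # Hypothetical decision rule: buzzing only if the query is at least
    # `threshold` times more likely now than in *every* reference window.
    return all(r >= threshold for r in buzz_ratios(query, current, references).values())


current = Counter({"storm": 40, "ebay": 60})  # last X minutes
references = {
    "previous X minutes": Counter({"storm": 2, "ebay": 98}),
    "same X minutes, previous day": Counter({"storm": 1, "ebay": 99}),
    "same X minutes, previous week": Counter({"ebay": 100}),
}
print(is_buzzing("storm", current, references))  # True
print(is_buzzing("ebay", current, references))   # False
```

Requiring the query to beat every reference window filters out purely periodic patterns: a query that is always popular at this hour on this weekday will not beat the "previous day" or "previous week" windows.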
  11. There is more secret sauce, of course… <ul><li>Perhaps building language models for fresh content like Yahoo! News and Twitter… </li></ul><ul><li>And doing this very fast in production… </li></ul><ul><li>And then ranking content in real time: we have an interesting paper coming out soon, “Improving Recency Ranking Using Twitter Data” </li></ul>
  12. Yahoo! Real Time features to look forward to <ul><li>Ranking + indexing even closer to true real time </li></ul><ul><li>Real time results in search verticals beyond news </li></ul><ul><li>Real time relevance algorithms powering experiences in Yahoo! properties beyond Search </li></ul>
  13. Real Time practices to avoid <ul><li>Do not create content with unrelated buzz terms </li></ul><ul><li>Do not abuse URL-shortening services for spam links </li></ul><ul><li>Do not go overboard with Twitter #hashtags </li></ul><ul><li>We aim to completely remove real time spam! </li></ul>Screenshots taken on 3/1/2010, © 2010 Twitter
  14. Learn more <ul><li>Yahoo! Search Blog: </li></ul><ul><li> </li></ul><ul><li>@YahooSearch </li></ul><ul><li>Yahoo! Search Sciences: </li></ul><ul><li> </li></ul><ul><li>@YahooLabs </li></ul>