Yahoo Real Time Search SMX March 2010

  • 1. Yahoo! Search: Real Time Search
    • Exploring the frontiers in modern information retrieval
    • March 2010
    Ivan Davtchev @ivan_d
  • 2. What is real time search?
    • Showing the most relevant up-to-date content for a topic of recently increased interest.
    • Freshest content is great, but not always best.
  • 3. Recent Y! real time search launches
    • Tweets on SRP
    • Improved Yahoo! News + Twitter
  • 4. Challenges of real time search
      • Real-time indexing : get new content as it is published
        • Crawl really, really fast
        • Index news feeds, RSS, Twitter
      • Query analysis : discover queries to handle differently
        • For most queries, promoting recent content degrades relevance
      • Ranking for fresh content : adjust ranking algorithms
        • For most newly-discovered content, many traditional ranking signals do not exist or are weak (e.g. anchor text)
  • 5. How do you know a query is really hot?
    • Find, in real-time, queries about emerging events and news stories
      • E.g. natural disasters; sports updates; political breaking stories; etc.
    • Standard approach not ideal:
      • Maintain temporal model for each query
        • Full time series or just statistics
      • Identify irregularities in model
        • Change in moving average of more than n σ’s
      • Works well for head queries, not so for torso/tail
    Screenshots of Google Trends (©2008 Google) taken at 22:23 PST on 3/1/2010 to illustrate temporal model and tail queries.
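The standard approach on this slide can be sketched in a few lines. This is only an illustration under assumptions that are not in the talk: the function name `is_spiking`, the window length, the default of 3 σ, and the sample counts are all hypothetical, and the check compares the latest count against a trailing mean rather than tracking the moving average itself.

```python
from statistics import mean, stdev

def is_spiking(counts, window=24, n_sigma=3.0):
    """Return True if the latest count is more than n_sigma standard
    deviations above the mean of the trailing window.

    counts: per-interval counts for one query, oldest first;
            the last entry is the current interval.
    """
    history = counts[-(window + 1):-1]  # trailing window, excluding the current interval
    if len(history) < 2:
        return False                    # too little data to estimate a deviation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return counts[-1] > mu          # flat history: any increase is a change
    return (counts[-1] - mu) / sigma > n_sigma

# Hypothetical per-interval counts (not real query-log data).
head_query = [1000, 1040, 980, 1010, 990, 1020, 1005, 2500]
tail_query = [0, 0, 1, 0, 0, 0, 0, 2]

print(is_spiking(head_query))
print(is_spiking(tail_query))
```

Both sample series cross the threshold. For the sparse tail series, two stray occurrences are enough to do so, which is one plausible reading of why the slide says this approach works less well for torso and tail queries.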
  • 6. Our Answer: Yahoo! TimeSense
  • 7. How is this different and better?
    • It uses language modeling
      • Language = “collection of words”
        • All words in Webster’s Dictionary = English
        • All words in Yahoo! query logs, including misspellings = Query Log Language
      • Language model = a way to explain a “vocabulary” to a computer
    Sentence                     Times seen   Fraction of all sentences
    ebay                         90000        1/1000
    apple                        80000        1/2000
    britney spears               40000        1/5000
    ebay apple britney spears    0            0
  • 8. Why Language Modeling?
    • Language model allows us to ask questions like
      • “Is this word part of the language?”
        • Answer: if in table, yes
      • “Which word is more likely to appear in the language, A or B?”
        • Answer: whatever is higher up in the table
      • “Is a sentence more likely to be found in text from language A or language B?”
        • Answer: pick the language whose table ranks the sentence higher
      • We convert the task of classifying buzzing queries to a series of “language model questions”
      • And build models to answer these from query logs
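The three "language model questions" above can be sketched with a frequency table built from a query log. This is a toy illustration, not the TimeSense implementation: the function names and the sample log are hypothetical, and each whole query is treated as one "sentence", as in the table on the previous slide.

```python
from collections import Counter

def build_model(queries):
    """Map each distinct query to its fraction of all queries seen."""
    counts = Counter(queries)
    total = sum(counts.values())
    return {query: count / total for query, count in counts.items()}

def in_language(model, query):
    """'Is this part of the language?' -> yes if it is in the table."""
    return query in model

def more_likely(model, a, b):
    """'Which is more likely, A or B?' -> the one higher up in the table."""
    return a if model.get(a, 0.0) >= model.get(b, 0.0) else b

def more_likely_language(model_a, model_b, query):
    """'Language A or language B?' -> the table where the query ranks higher."""
    p_a, p_b = model_a.get(query, 0.0), model_b.get(query, 0.0)
    if p_a == p_b:
        return None  # no evidence either way
    return "A" if p_a > p_b else "B"

# Hypothetical toy query log (not real data).
query_log = ["ebay"] * 9 + ["apple"] * 8 + ["britney spears"] * 4
model = build_model(query_log)

print(in_language(model, "ebay apple britney spears"))
print(more_likely(model, "ebay", "apple"))
```

Unseen queries get probability zero here, which matches the "if in table, yes" rule on the slide; a production model would presumably need some form of smoothing.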
  • 9. Buzzing / Spiking Queries
    • Q: Is this query much more prominent right now?
    • To answer, we build many small language models:
      • One for each X minutes of query logs in the past month
    [Figure: timeline over Feb. 6, Feb. 7, Feb. 8 with one model per X-minute window; labels: "Current Model for last X minutes", "Model for 02/07/2010 1:0Xpm"]
    Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
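Building "one model for each X minutes of query logs" amounts to bucketing timestamped queries into fixed windows. The sketch below assumes a log of `(timestamp, query)` pairs and keeps raw counts per window; the names `window_start` and `build_window_models`, the 10-minute default, and the sample log are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime, time, timedelta

def window_start(ts, minutes):
    """Round a timestamp down to the start of its X-minute window."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    floored = seconds - seconds % (minutes * 60)
    return datetime.combine(ts.date(), time.min) + timedelta(seconds=floored)

def build_window_models(log, minutes=10):
    """Group (timestamp, query) pairs into per-window query counts."""
    windows = defaultdict(Counter)
    for ts, query in log:
        windows[window_start(ts, minutes)][query] += 1
    return dict(windows)

# Hypothetical log entries (not real data).
sample_log = [
    (datetime(2010, 2, 7, 13, 1, 5), "ebay"),
    (datetime(2010, 2, 7, 13, 4, 0), "apple"),
    (datetime(2010, 2, 7, 13, 9, 59), "ebay"),
    (datetime(2010, 2, 7, 13, 10, 0), "ebay"),
]
models = build_window_models(sample_log, minutes=10)
for start, counts in sorted(models.items()):
    print(start, dict(counts))
```

Keeping counts rather than normalized fractions leaves the choice of smoothing to whatever compares the windows later.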
  • 10. Buzzing / Spiking Queries
    • Q: Is this query much more prominent right now?
      • Language Model: Is this more likely to belong to the last X minutes than to
        • The previous X minutes
        • The same X minutes in the previous day
        • The same X minutes in the previous week
        • Etc.
    [Figure: timeline over Feb. 6, Feb. 7, Feb. 8; labels: "Current Model for last X minutes", "Same X minutes, previous day"]
    Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
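One way to phrase the comparison above in code is a log-ratio between the query's probability in the current window and in each reference window. This is a sketch under assumptions, not the formula from the cited paper: the add-alpha smoothing, the `vocab_size` constant, the threshold, the rule that every reference must be exceeded, and all names and counts are hypothetical.

```python
import math

def smoothed_prob(counts, query, alpha=1.0, vocab_size=1000):
    """Add-alpha smoothed probability of a query in one window's counts.
    vocab_size is an assumed constant standing in for the number of distinct queries."""
    total = sum(counts.values())
    return (counts.get(query, 0) + alpha) / (total + alpha * vocab_size)

def buzz_scores(query, current, references):
    """Log-ratio of the query's probability now versus each reference window."""
    p_now = smoothed_prob(current, query)
    return {
        label: math.log(p_now / smoothed_prob(ref, query))
        for label, ref in references.items()
    }

def is_buzzing(query, current, references, threshold=1.0):
    """Flag the query if it is much more likely now than in every reference."""
    return all(s > threshold for s in buzz_scores(query, current, references).values())

# Hypothetical window counts (not real data).
current = {"earthquake chile": 50, "ebay": 100}
references = {
    "previous X minutes": {"earthquake chile": 1, "ebay": 100},
    "same X minutes, previous day": {"ebay": 120},
}

print(is_buzzing("earthquake chile", current, references))
print(is_buzzing("ebay", current, references))
```

Smoothing matters here because a query absent from a reference window would otherwise have probability zero and make the ratio undefined.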
  • 11. There is more secret sauce of course…
    • Perhaps building language models for fresh content like Yahoo! News and Twitter…
    • And doing this very fast in production…
    • And then ranking content in real time – we have an interesting paper coming out soon: “Improving Recency Ranking Using Twitter Data”
  • 12. Yahoo! Real Time features to look forward to
    • Ranking + indexing even closer to true real time
    • Real time results in search verticals beyond news
    • Real time relevance algorithms powering experiences in Yahoo! properties beyond Search
  • 13. Real Time practices to avoid
    • Do not create content with unrelated buzz terms
    • Do not abuse shortening services for spam links
    • Do not go overboard with Twitter #hashtags
    • We aim to completely remove real time spam!
    Screenshots taken on 3/1/2010. © 2010 Twitter
  • 14. Learn more
    • Yahoo! Search Blog:
    • @YahooSearch
    • Yahoo! Search Sciences:
    • @YahooLabs