Yahoo Real Time Search SMX March 2010

Transcript

  • 1. Yahoo! Search: Real Time Search
    • Exploring the frontiers in modern information retrieval
    • March 2010
    Ivan Davtchev @ivan_d
  • 2. What is real time search?
    • Showing the most relevant up-to-date content for a topic of recently increased interest.
    • Freshest content is great, but not always best.
  • 3. Recent Y! real time search launches
    • Tweets on SRP
    • Improved Yahoo! News + Twitter
  • 4. Challenges of real time search
      • Real-time indexing: get new content as it is published
        • Crawl really, really fast
        • Index news feeds, RSS, Twitter
      • Query analysis: discover queries to handle differently
        • For most queries, promoting recent content degrades relevance
      • Ranking for fresh content: adjust ranking algorithms
        • For most newly-discovered content, many traditional ranking signals do not exist or are weak (e.g. anchor text)
  • 5. How do you know a query is really hot?
    • Find, in real-time, queries about emerging events and news stories
      • E.g. natural disasters; sports updates; political breaking stories; etc.
    • Standard approach not ideal:
      • Maintain temporal model for each query
        • Full time series or just statistics
      • Identify irregularities in model
        • Change in moving average of more than n σ’s
      • Works well for head queries, not so for torso/tail
    Screenshots of Google Trends (©2008 Google) taken at 22:23 PST on 3/1/2010 to illustrate temporal model and tail queries.
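The "standard approach" on this slide can be sketched in a few lines. This is a simplified illustration, not Yahoo!'s implementation: it compares the latest count for a query against the mean and standard deviation of its recent counts. The function name `is_spiking`, the default of 3 σ, and all sample counts are assumptions made for the example.

```python
from statistics import mean, pstdev


def is_spiking(history, current, n_sigma=3.0):
    """Return True if `current` is more than n_sigma standard deviations
    above the mean of `history` (recent per-interval query counts)."""
    if len(history) < 2:
        return False  # too little data to estimate a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current > mu  # flat history: any increase counts as a change
    return (current - mu) / sigma > n_sigma


# Invented counts for a head query: a jump to 1500 stands out clearly.
head_history = [1000, 1020, 980, 1010, 990]
print(is_spiking(head_history, 1500))

# Invented counts for a tail query: two searches already cross the threshold,
# so the signal is hard to tell apart from noise.
tail_history = [0, 0, 1, 0, 0]
print(is_spiking(tail_history, 2))
```

With counts this sparse, the tail query is flagged after only two searches, which is one plausible reading of why the slide says the approach does not work as well for torso and tail queries.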
  • 6. Our Answer: Yahoo! TimeSense
  • 7. How is this different and better?
    • It uses language modeling
      • Language = “collection of words”
        • All words in Webster’s Dictionary = English
        • All words in Yahoo! query logs, including misspellings = Query Log Language
      • Language model = a way to explain a “vocabulary” to a computer
    Sentence                    Times seen   Fraction of all sentences
    ebay                        90000        1/1000
    apple                       80000        1/2000
    britney spears              40000        1/5000
    ebay apple britney spears   0            0
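A table like the one above can be built by counting how often each whole query ("sentence") appears in a log. The sketch below is only an illustration: the function name `build_language_model`, the lowercasing step, and the toy log are assumptions, not details from the talk.

```python
from collections import Counter
from fractions import Fraction


def build_language_model(queries):
    """Map each distinct query ("sentence") to its fraction of all queries."""
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    total = sum(counts.values())
    return {query: Fraction(count, total) for query, count in counts.items()}


# Toy query log, invented for illustration only.
toy_log = ["ebay", "ebay", "apple", "britney spears", "ebay", "apple"]
model = build_language_model(toy_log)

for query, fraction in sorted(model.items(), key=lambda item: -item[1]):
    print(query, fraction)

# A query that never appears in the log has fraction 0, like the last table row.
print(model.get("ebay apple britney spears", Fraction(0)))
```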
  • 8. Why Language Modeling?
    • Language model allows us to ask questions like
      • “Is this word part of the language?”
        • Answer: if in table, yes
      • “Which word is more likely to appear in the language, A or B?”
        • Answer: whatever is higher up in the table
      • “Is a sentence more likely to be found in text from language A or language B?”
        • Answer: compare the two tables and pick the language where the sentence ranks higher
      • We convert the task of classifying buzzing queries to a series of “language model questions”
      • And build models to answer these from query logs
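The three questions above map onto simple lookups once each language model is stored as a table from sentence to fraction. In this sketch the `query_log` fractions come from the table on slide 7, while the `news` table, its values, and all function names are hypothetical and exist only to give the third question a second language to compare against.

```python
def in_language(model, sentence):
    """Question 1: is the sentence in the table at all?"""
    return model.get(sentence, 0) > 0


def more_likely(model, a, b):
    """Question 2: which of two sentences is higher up in the table?
    Ties go to `a`."""
    return a if model.get(a, 0) >= model.get(b, 0) else b


def more_likely_language(models, sentence):
    """Question 3: which language's table ranks the sentence highest?
    Returns the language name, or None if no table contains the sentence."""
    best = max(models, key=lambda name: models[name].get(sentence, 0))
    return best if models[best].get(sentence, 0) > 0 else None


# Fractions taken from the slide's table.
query_log = {"ebay": 1 / 1000, "apple": 1 / 2000, "britney spears": 1 / 5000}
# Hypothetical second language, invented for this example.
news = {"apple": 1 / 500, "earthquake": 1 / 100}

print(in_language(query_log, "ebay apple britney spears"))
print(more_likely(query_log, "ebay", "apple"))
print(more_likely_language({"query_log": query_log, "news": news}, "apple"))
```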
  • 9. Buzzing / Spiking Queries
    • Q: Is this query much more prominent right now?
    • To answer, we build many small language models:
      • One for each X minutes of query logs in the past month
    Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
    [Figure: timeline for Feb. 6, Feb. 7 and Feb. 8 with one model per X-minute window, e.g. “Model for 02/07/2010 1:0Xpm” and “Current Model for last X minutes”]
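One way to picture "one model for each X minutes" is to bucket timestamped queries into fixed windows and keep a separate count table per window. This is a minimal sketch under assumptions of my own: windows are aligned to midnight, X defaults to 10 minutes, and the names `window_start` and `build_window_models` and the sample log are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def window_start(ts, minutes):
    """Floor a timestamp to the start of its X-minute window (from midnight)."""
    seconds = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    offset = int((ts - midnight).total_seconds()) // seconds * seconds
    return midnight + timedelta(seconds=offset)


def build_window_models(log, minutes=10):
    """Group (timestamp, query) pairs into one Counter of queries per window."""
    models = defaultdict(Counter)
    for ts, query in log:
        models[window_start(ts, minutes)][query] += 1
    return dict(models)


# Invented log entries, for illustration only.
sample_log = [
    (datetime(2010, 2, 7, 13, 3), "superbowl score"),
    (datetime(2010, 2, 7, 13, 7), "superbowl score"),
    (datetime(2010, 2, 7, 13, 8), "ebay"),
    (datetime(2010, 2, 7, 13, 12), "superbowl score"),
]
models = build_window_models(sample_log, minutes=10)
for start in sorted(models):
    print(start, dict(models[start]))
```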
  • 10. Buzzing / Spiking Queries
    • Q: Is this query much more prominent right now?
      • Language Model: Is this more likely to belong to the last X minutes than to
        • The previous X minutes
        • The same X minutes in the previous day
        • The same X minutes in the previous week
        • Etc.
    Source: Y! paper “Towards Recency Ranking in Web Search”, WSDM 2010
    [Figure: timeline for Feb. 6, Feb. 7 and Feb. 8 comparing “Current Model for last X minutes” with “Same X minutes, previous day”]
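The comparison on this slide can be sketched as a log-ratio between the query's probability under the current window and under each reference window. The slides do not give the actual formula, so everything specific here is an assumption: the additive (add-alpha) smoothing, the threshold of 1.0, the rule that the query must beat every reference window, and the names `smoothed_prob`, `buzz_ratios`, and `is_buzzing`.

```python
import math
from collections import Counter


def smoothed_prob(counts, query, vocab_size, alpha=1.0):
    """Additively smoothed probability of `query` under one window's counts."""
    total = sum(counts.values())
    return (counts.get(query, 0) + alpha) / (total + alpha * vocab_size)


def buzz_ratios(query, current, references, alpha=1.0):
    """Log of P(query | current window) / P(query | reference window),
    one value per reference window."""
    if not references:
        raise ValueError("need at least one reference window")
    vocab = set(current) | {query}
    for ref in references:
        vocab.update(ref)
    size = len(vocab)
    p_now = smoothed_prob(current, query, size, alpha)
    return [math.log(p_now / smoothed_prob(ref, query, size, alpha))
            for ref in references]


def is_buzzing(query, current, references, threshold=1.0):
    """Assumed rule: buzzing if the query beats every reference window."""
    return all(r > threshold for r in buzz_ratios(query, current, references))


# Invented window counts, for illustration only.
current = Counter({"earthquake": 50, "ebay": 100})
previous_window = Counter({"earthquake": 1, "ebay": 100})
same_window_previous_day = Counter({"ebay": 110, "apple": 20})
refs = [previous_window, same_window_previous_day]

print(is_buzzing("earthquake", current, refs))
print(is_buzzing("ebay", current, refs))
```

Smoothing matters here because a query that is new in the current window has a count of zero in every reference window, and an unsmoothed ratio would divide by zero. Requiring every ratio to clear the threshold is the conservative choice; taking the largest ratio instead would flag more queries.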
  • 11. There is more secret sauce of course…
    • Perhaps building language models for fresh content like Yahoo! News and Twitter…
    • And doing this very fast in production…
    • And then ranking content in real time – we have an interesting paper coming out soon: “Improving Recency Ranking Using Twitter Data”
  • 12. Yahoo! Real Time features to look forward to
    • Ranking + indexing even closer to true real time
    • Real time results in search verticals beyond news
    • Real time relevance algorithms powering experiences in Yahoo! properties beyond Search
  • 13. Real Time practices to avoid
    • Do not create content with unrelated buzz terms
    • Do not abuse shortening services for spam links
    • Do not go overboard with Twitter #hashtags
    • We aim to completely remove real time spam!
    Screenshots taken from Twitter.com on 3/1/2010 © 2010 Twitter
  • 14. Learn more
    • Yahoo! Search Blog:
    • http://www.ysearchblog.com/
    • @YahooSearch
    • Yahoo! Search Sciences:
    • http://labs.yahoo.com/Search_Sciences
    • @YahooLabs