Effective Use of the Twitter Search API



Chirp 2010 presentation on recent API changes, particularly ranking the top results beyond recency.

Published in: Technology


  • I will talk about:
    - First, some of our thinking about why we have a search API and what differentiates it from the other APIs Twitter offers
    - Then, some technical implications of these differences with respect to polling on search versus tracking keywords on the streaming API
    - Next, briefly, how the search API has changed over time
    - Finally, we’ll dig into the most recent change, where we began ranking the top results beyond recency order. I’ll show you how I’ve modified one of our own search API clients to take advantage of that change
  • Simple definition: the user provides a query by engaging with an API application, and we provide the best stuff (currently tweets and trends) for that query

    Obviously the “best” stuff for Twitter has a lot to do with how recent it is, so our primary focus is on the “here and now”
  • Just to give you an idea of the parameters search operates under:
    - As Ev told you yesterday, we are doing more than 600M queries per day, and we’ve recently seen up to 750M in a single day
    - While realtime is our main focus, our index does contain hundreds of millions of tweets, and we’ve roughly doubled its size in the last six months
    - Of course, the number of tweets has grown even faster than we’ve increased that index size, so the index only covers about a week of them right now, but that is something we’re currently working on expanding
  • So obviously we’re operating at a large scale, but what’s really interesting to me about the search API is the variety of applications you as developers have found for it. I’ve listed just a few here to illustrate what people are currently doing with the API.
  • So that’s what people are doing with the search API, but the streaming API also supports tracking keywords and some location and language filtering. So, if you’re developing a new app, how do you decide which to use?
  • The biggest difference between the search API and the track API is how you get new results matching your standing query. On the streaming API the push model makes this obvious: new results are sent to you as they come in. Since the focus of the search API is on apps that let the user manipulate the query (whether explicitly or implicitly), registering a standing query for every request makes less sense. Instead, the search API uses a polling model with a cursor.


    Make sure you explain this diagram by pointing at it (or at least describing it). It took me a minute to understand the visual presentation.

  • One question that comes up frequently is why we encourage apps to use this cursor to poll, and how that helps us support refreshes more efficiently. Here’s a diagram of what happens under the covers. Much like the streaming API, when you make any query to search we actually do register it as a standing query, but only in one of our caching layers, which we call the timeline cache.

  • Next I’d like to take a step back and talk briefly about the history of the search API and how our thinking about it has developed.

    Twitter search and the API have been around for about two years now. We made a lot of changes early on, like supporting location search, but after that we had to shift our focus to scaling the system to support the growth in tweets and queries. It’s really just in the last six months that we’ve made enough progress with scaling, and grown the search team enough, to be able to focus more on relevance and figuring out what that means for Twitter search.
  • Our mission:

    Under “many factors” you should note that it’s not always the popular users that show up here -- that seems to be an early misconception. Our algorithm looks to find things that are interesting from any user - things that “resonate,” to use a word that Dick talked about yesterday (good to tie it in to other things being said at Chirp).

    Rather than “not final” (which seems to imply there is a “final” step after which we won’t be improving this), I’d say something like “First step of a long road of relevance improvements” (implying that we’ve got lots of ideas and we’ll be delivering cool stuff for a long time).
  • right now at the top

  • explain that this uses since_id

  • we want to hear from you
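
The cursor-based polling described in the notes above (each response carries a refresh_url containing since_id, and the client follows it on the next poll) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the client shown in the talk: the function names initialUrl, nextUrl, and pollOnce are hypothetical, fetchJson is an injected function the caller must supply, encoding the query with encodeURIComponent is an addition, and the search.twitter.com endpoint is the historical one from the slides.

```javascript
// Base URL as it appears in the slides (historical endpoint).
const SEARCH_BASE = 'http://search.twitter.com/search.json';

// URL for the first request of a query, before any cursor exists.
function initialUrl(query) {
  return SEARCH_BASE + '?q=' + encodeURIComponent(query);
}

// URL for the next poll. If the previous response carried a refresh_url
// (which embeds since_id), follow it; otherwise start from scratch.
function nextUrl(query, lastResponse) {
  if (lastResponse && lastResponse.refresh_url) {
    return SEARCH_BASE + lastResponse.refresh_url;
  }
  return initialUrl(query);
}

// Hypothetical single poll step. fetchJson is assumed to take a URL and
// return a Promise of the parsed JSON response; onTweets receives the
// results array (empty when the response has no results field).
async function pollOnce(query, lastResponse, fetchJson, onTweets) {
  const response = await fetchJson(nextUrl(query, lastResponse));
  onTweets(response.results || []);
  return response; // keep this so the next poll can use its refresh_url
}
```

A caller would keep the response returned by pollOnce and pass it back in on the next poll, so each request only asks for tweets newer than the last since_id it saw.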
  • Effective Use of the Twitter Search API

    1. Effective Use of the Twitter Search API • Eric Jensen, Twitter Search • Submit your questions via http://bit.ly/chirpsearch or hashtag #chirpsearch
    2. Agenda • Mission of the Twitter Search API • History • Most recently: ranking the top results • What’s next
    3. Search API Mission Connect users with what's most important and interesting to them in the here and now (return the best stuff for a query)
    4. Search Stats • Over 600 million queries per day • Typically less than 200 milliseconds per query • Typically less than 20 seconds indexing latency • Index of hundreds of millions of tweets
    5. Search API Use Cases • Search interfaces: collecta, oneriot, crowdeye, ... • Dashboard clients: tweetdeck, seesmic, ... • Widgets: twitter, tweetgrid, monitter, ... • Location search: trendsmap, foursquare, ... • Visualizations: radian6, crimsonhexagon, twistori, ... • Analytics: stocktwits, trendrr, tweetstats, ... • Recommenders: mrtweet, ... • Thousands not listed here + not invented yet
    6. Search vs. Streaming • Do use the search API for your app when: • The user can input a query • You need immediate results, not tracking • Don’t use the search API for your app when: • Your user experience requires comprehensive results (all the tweets, not just the best ones) • You only need tweets from/to/at particular users
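    7. Refreshing Results [Diagram: exchange between Client and API] • Client requests search.json?q=twitter • API responds with "refresh_url":"?since_id=9290798834&q=twitter" • About 20 seconds later, client requests search.json?since_id=9290798834&q=twitter • API responds with "refresh_url":"?since_id=9290800152&q=twitter"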
    8. Why is this OK? [Diagram, steps numbered 1 to 4: the requests search.json?q=twitter and search.json?since_id=9290798834&q=twitter each reach a Timeline Cache, which holds q=twitter; behind the caches sit Search and the Tweets Index]
    9. Search API History [Timeline] • Apr 4, 2008: Summize Launches Twitter Search • Jul 14, 2008: Summize Acquired by Twitter • Apr 1, 2009: Search on Twitter.com • Nov 5, 2009: Quality Filtering on Trends • Jan 6, 2010: Local Trends • Apr 1, 2010: Top Results Include Popular • Apr 15, 2010: Chirp!
    10. Ranking Top Results • Best stuff for a query • Many factors • First step • Available from API
    11. Top Results API • New parameter: result_type • mixed: Eventually this will become the default value. Include both popular and real time results in the response. • recent: The current default value. Return only the most recent results in the response. • popular: Return only the most popular results in the response.
    12. Top Results Metadata {"results":[ {"text":"@twitterapi http://tinyurl.com/ctrefg", "from_user":"jkoum", "metadata": {"result_type":"popular", "recent_retweets": 100}, "id":1478555574,
    13. Top Results API Example • Initial load includes top results • Metadata annotates them • Refreshes recent results on top
    14. Include Top Results url = 'http://search.twitter.com/search.' + format + '?q=' + query + '&result_type=mixed'
    15. Annotate w/ Metadata if (tweet.metadata.result_type == 'popular') { return '<div class="twtr-popular">' + tweet.metadata.recent_retweets + ' recent retweets</div>'; }
    16. Refresh Recent Results refresh_url = response.refresh_url ... url = 'http://search.twitter.com/search.' + format + refresh_url
    17. The Near Future • Remove duplicates (retweets) • Deeper index • Hit highlighting in the API • More consistency (with the REST API) • Better rate limiting
    18. The Future (cont) • More relevance • More metadata • More stuff • More operators • places, @anywhere, annotations
    19. Open Source in Search • http://twitter.com/about/opensource • mysql, hadoop, kestrel, twitter-text, etc. • lucene • commons-pipeline • varnish • jmeter • nutch language identifier • mecab
    20. We’re Hiring • http://twitter.com/jobs • Data Analyst - Search • Product Manager - Search • Software Engineer - Search • Software Engineer - Search Front-End • Software Engineer - Search Relevance
    21. Questions? http://bit.ly/chirpsearch or hashtag #chirpsearch Also join us at the Real-Time Search Birds of a Feather @ 1:30 in The Coop
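
As a rough illustration of slides 11 through 15, the snippets for requesting top results and annotating them can be put together into two small functions. This is a minimal sketch, not the actual widget code from the talk: the function names searchUrl and annotate are hypothetical, the check against the three result_type values and the use of encodeURIComponent are additions, and the tweet shape is assumed to match the metadata example on slide 12.

```javascript
// The three result_type values listed on slide 11.
const RESULT_TYPES = ['mixed', 'recent', 'popular'];

// Build a search URL that asks for a given result_type.
// format is the response format suffix, for example 'json'.
function searchUrl(format, query, resultType) {
  if (!RESULT_TYPES.includes(resultType)) {
    throw new Error('Unknown result_type: ' + resultType);
  }
  return 'http://search.twitter.com/search.' + format +
    '?q=' + encodeURIComponent(query) +
    '&result_type=' + resultType;
}

// Return an HTML badge for popular tweets, or an empty string for
// anything else (including tweets that carry no metadata at all).
function annotate(tweet) {
  const meta = tweet.metadata || {};
  if (meta.result_type === 'popular') {
    return '<div class="twtr-popular">' + meta.recent_retweets +
      ' recent retweets</div>';
  }
  return '';
}
```

For example, searchUrl('json', 'twitter', 'mixed') yields a URL of the same form as the one on slide 14, and annotate applied to the tweet from slide 12 yields the badge markup from slide 15.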