We are losing our tweets!

    We are losing our tweets! - Presentation Transcript

    1. We are losing our tweets!
      An analysis, a prototype, lessons learned, and proposed third party solution to the problem
      John O’Brien III
      @jobrieniii
      http://www.linkedin.com/in/jobrieniii
    2. Twitter “Primer”
      Social network / micro blogging site
      Send / read 140 character messages
      You can follow anyone, and they can follow you
      Sent messages are delivered to all your followers
      Sent messages are also publicly indexed and searchable
      Permissions can be established to restrict delivery, but this is not the norm
    3. Problem
      As the usage of Twitter has exploded, Twitter’s ability to provide long term access to tweets that mention key events (typically #hashtag’ed) has eroded
    4. First, who cares?
      Individuals
      Bloggers
      Conference Attendees / Leaders
      Academia / “Web” Ecologists
      Media Outlets
      Companies
      Government
    5. So let's dive into the problem...
      Followers
      Search
    6. Search UI / API Constraints
      Limited to keywords, #hashtags, or @mentions within 140 char body of tweet
      100 tweets x 15 pages = 1500 per search term
      Tweets for a given keyword remain in search for “around 1.5 weeks but is dynamic and subject to shrink as the number of tweets per day continues to grow.” – Twitter website
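To make the 100 x 15 ceiling concrete, here is a minimal Python sketch of paging through results for one term. `fetch_page` is a hypothetical stand-in for a call to the /search API, not a real client; only the two limits come from the slide.

```python
from typing import Callable, Dict, List

PER_PAGE = 100   # tweets per page (from the slide)
MAX_PAGES = 15   # pages per search term (from the slide)


def collect_all(term: str,
                fetch_page: Callable[[str, int, int], List[Dict]]) -> List[Dict]:
    """Walk every page the search will serve for one term.

    fetch_page(term, page, per_page) is a hypothetical callable that returns
    a list of tweet dicts, or an empty list once results run out.
    """
    tweets: List[Dict] = []
    for page in range(1, MAX_PAGES + 1):
        batch = fetch_page(term, page, PER_PAGE)
        if not batch:
            break
        tweets.extend(batch)
        if len(batch) < PER_PAGE:
            break  # a short page means there is nothing further
    return tweets


def fake_fetch(term: str, page: int, per_page: int, total: int = 5000) -> List[Dict]:
    """Illustrative fake: pretends `total` tweets match the term."""
    start = (page - 1) * per_page
    stop = min(start + per_page, total)
    return [{"id": total - i, "text": f"{term} tweet"} for i in range(start, stop)]
```

Even with 5000 matching tweets in the fake backlog, `collect_all` can only ever return 1500, which is the gap the rest of the deck is about.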
    7. Hmmmm….
      No other ‘in the cloud’ sites were found back in June, only client side applications and ‘hacked’ custom scripts
      RSS feeds were considered but initially dismissed because they typically require an end user client
      Decision was to “build our own” and see if we can solve the problem
    8. A little bit about my thoughts on the SDLC process…
      **FOCUS** ON LEARNING
      “Minimally Viable” PROTOTYPE
    9. “Minimally Viable” Micro App
      What if we could get ahead of the problem and store the data before Twitter “loses” it?
      Functional Requirements
      Ability for user to define #hashtags of importance
      Create a background script that leverages the Twitter /search REST API to keep an eye on each hash tag and store data in local database
      **Sweep, grab, and record…**
      Must be running at all times and publicly available
      Technical Specs
      Build on LAMP stack, put into the cloud, running 24/7/365
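The "sweep, grab, and record" requirement could be sketched roughly as follows. This is an illustration in Python with SQLite rather than the PHP / MySQL stack named on the slide; the `search` callable, the table layout, and the field names are all assumptions made for the example.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per captured tweet, tagged with its archive.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " tweet_id INTEGER, hashtag TEXT, author TEXT, text TEXT)"
    )


def sweep(conn: sqlite3.Connection,
          hashtags: Iterable[str],
          search: Callable[[str], List[Dict]]) -> int:
    """One pass over every tracked hashtag: query, then store what came back.

    search(hashtag) is a hypothetical wrapper around the /search API that
    returns tweet dicts with 'id', 'author' and 'text' keys.
    Returns the number of rows written in this pass.
    """
    stored = 0
    for tag in hashtags:
        for tweet in search(tag):
            conn.execute(
                "INSERT INTO tweets (tweet_id, hashtag, author, text)"
                " VALUES (?, ?, ?, ?)",
                (tweet["id"], tag, tweet["author"], tweet["text"]),
            )
            stored += 1
    conn.commit()
    return stored
```

To meet the "running at all times" requirement, a pass like this would be repeated on a schedule (for example from cron). Note that this naive version stores whatever the search returns, including repeats.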
    10. “Minimally Viable” Micro App
      [Diagram] A php script queries the Twitter /search API over the internet for each #hashtag and stores the results in Our Database
    11. TwapperKeeper.com “BETA” was born on Saturday and released to public on Sunday…
    12. And we started to grow and get customer feedback…
    13. And we lived through a key world event…
      http://mashable.com/2009/09/16/white-house-records/
    14. So what did we learn?
      We need to be whitelisted
      People often don’t start archiving until after they have started using a #hashtag
      Thus, a point-forward solution is not enough; we need to reach back as well
      While hashtags are the norm, some people would just like to track keywords
      Velocity of tweets can be a major issue
      What if a hashtag’s results exceed 1500 tweets per minute?
      Hashtags of archive interest typically spike in velocity and die off in traffic.
      However some archives get VERY, VERY big!
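The velocity question can be worked through as simple arithmetic. The sketch below assumes a steady tweet rate and that each sweep can retrieve at most the 1500 most recent results (100 x 15 from the constraints slide); real traffic spikes, so treat it as a back-of-the-envelope model only.

```python
SEARCH_CAP = 1500  # most tweets one search can return (100 x 15)


def max_poll_interval_seconds(tweets_per_minute: float) -> float:
    """Longest gap between sweeps that still stays under the search cap."""
    if tweets_per_minute <= 0:
        return float("inf")
    return SEARCH_CAP / tweets_per_minute * 60.0


def missed_per_sweep(tweets_per_minute: float, interval_seconds: float) -> float:
    """Tweets lost per sweep when more arrive than one search can return."""
    produced = tweets_per_minute * interval_seconds / 60.0
    return max(0.0, produced - SEARCH_CAP)
```

Under this model, a hashtag running at 3000 tweets per minute has to be swept at least every 30 seconds; sweeping it once a minute would drop 1500 tweets each time.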
    15. And more learning…
      URL shortening services are a long-term concern to users and the archiving community
      Twitter /search REST API is periodically unresponsive
      Twitter /search REST API sometimes glitches and returns duplicate data
      People want not only output in html, but raw exports for publication, analysis and real time consumption (txt, csv, xml, json, etc.)
      Twitter engineers contacted us and recommended also incorporating the newly released real time streams
      /track, /sample, /firehose
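As an illustration of the raw export request above, here is a small Python sketch that renders the same archived tweets as csv and json using only the standard library. The field names are assumptions for the example, not TwapperKeeper's actual export schema.

```python
import csv
import io
import json
from typing import Dict, List

FIELDS = ["id", "author", "text"]  # hypothetical export columns


def to_csv(tweets: List[Dict]) -> str:
    """Render tweets as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(tweets)
    return buf.getvalue()


def to_json(tweets: List[Dict]) -> str:
    """Render the same columns as a JSON array of objects."""
    rows = [{key: tweet.get(key) for key in FIELDS} for tweet in tweets]
    return json.dumps(rows, ensure_ascii=False)
```

Keeping one column list shared by every exporter means additional formats such as xml or txt can be added without the outputs drifting apart.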
    16. Recommended “out-of-beta” V2.0
      Anticipate #hashtags to archive based upon Twitter trending stats and autocreate archives
      Hybrid approach of using /search and /track (real time stream) APIs to handle velocity issues
      Check for duplicates “before” inserts
      Implement monitoring and “self healing” services
      Shortened URLs should be resolved into fully qualified URLs and stored separately for reference (at time of capture)
      Create TwapperKeeper API by modularizing the archiving engine into a service-oriented architecture (/create, /info, /get) for internal and external consumption
      Include additional output formats to be provided for download
      “Extracts” of large archives should be automatically generated on a daily basis and made available for download
      VERSION 2.0
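The "check for duplicates before inserts" recommendation might look something like this sketch. It assumes every tweet carries a unique numeric ID and that the same tweet may legitimately appear in more than one archive; the table layout is hypothetical.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: a tweet is unique within one hashtag archive.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " tweet_id INTEGER, hashtag TEXT, text TEXT,"
        " PRIMARY KEY (tweet_id, hashtag))"
    )


def insert_if_new(conn: sqlite3.Connection,
                  tweet_id: int, hashtag: str, text: str) -> bool:
    """Look the tweet up first and insert only if this archive lacks it.

    Returns True when a row was written, False when it was a duplicate.
    """
    found = conn.execute(
        "SELECT 1 FROM tweets WHERE tweet_id = ? AND hashtag = ?",
        (tweet_id, hashtag),
    ).fetchone()
    if found is not None:
        return False
    conn.execute(
        "INSERT INTO tweets (tweet_id, hashtag, text) VALUES (?, ?, ?)",
        (tweet_id, hashtag, text),
    )
    conn.commit()
    return True
```

The primary key is a second line of defense: if two archiving processes raced past the lookup, the database would still refuse the repeat.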
    17. Recommended “out-of-beta” V2.0
      [Diagram] Components shown:
      Twitter /track API
      Twitter /search API
      hybrid php / curl script to archive per #hashtag
      Monitor Health and Self Heal
      Twitter /trends API
      auto create trends
      Our Database
      File extractor
      api: /create, /info, /get
      external sites
      short url lookup
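One way to picture the /create, /info, /get split is as three operations on an archive store. The in-memory Python class below is only a sketch: the method names mirror the endpoint names from the slides, but the parameters, return shapes, and the `record` helper are assumptions.

```python
from typing import Dict, List


class ArchiveService:
    """Hypothetical in-memory stand-in for the archiving engine's API."""

    def __init__(self) -> None:
        self._archives: Dict[str, List[dict]] = {}

    def create(self, term: str) -> bool:
        """/create: start an archive; False if it already exists."""
        key = term.lower()
        if key in self._archives:
            return False
        self._archives[key] = []
        return True

    def info(self, term: str) -> dict:
        """/info: summary of one archive (raises KeyError if unknown)."""
        key = term.lower()
        return {"term": key, "count": len(self._archives[key])}

    def get(self, term: str, limit: int = 100) -> List[dict]:
        """/get: up to `limit` stored tweets, in capture order."""
        return list(self._archives[term.lower()][:limit])

    def record(self, term: str, tweet: dict) -> None:
        """Internal hook the archiving scripts would call for each capture."""
        self._archives[term.lower()].append(tweet)
```

Putting the same three calls behind HTTP would let the site's own pages and external sites consume archives the same way, which is the point of the service-oriented design.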
    18. Questions?

    Lessons learned from TwapperKeeper prototype.

    © All Rights Reserved
