Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)


The 15-minute version of the longer talk that I delivered at SXSW in March. More details: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/

  1. Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS
     Dawn Foster, Intel Community Manager for MeeGo
     dawn@fastwonder.com
  2. Information Overload
     CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
  3. Who Cares?
     ● Most of it is …
       – complete crap
       – out of date / obsolete
       – not interesting to you
       – irrelevant for you
     Junk Pile: http://www.flickr.com/photos/zen/4013525/
  4. You Want to Find the Needle
     Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
  5. RSS Alone is a Start
     ● Sources you care about delivered right to you. But …
       – Do you care about everything in each feed?
       – What about the feeds you aren't subscribed to?
       – Can you keep up with what you have?
  6. Prioritize Your Reader
     ● Put things you care about at the top
     ● Categorize
     ● Don't try to read everything
  7. The Real Magic is in Filtering RSS
     (Diagram labels: Complete Crap / Interesting / Maybe Relevant / Yay!)
     ● In my Google Reader right now:
       – Analyst research blogs mentioning Online Community
       – Analyst research blogs mentioning MeeGo
       – Searches across social sites mentioning me, my projects, my websites, etc., filtering out things I don't care about
       – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions
  8. RSS Filtering Tools
     ● Yahoo Pipes (my favorite)
       – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …)
       – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it.
     ● Other Options
       – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out.
       – RSS readers with filtering / alerts (FeedDemon)
       – Code: write your own filters
       – Note: many free RSS filtering services have gone out of business; they can be bandwidth intensive & costly to host.
  9. Yahoo Pipes Filtering Example
     ● Input:
       – WebWorkerDaily
       – ReadWriteWeb
     ● Filter by content:
       – Collaborate
       – Collaboration
       – Collaborative
     ● Output:
       – 1 RSS Feed
       – Matching 3 keywords
     2 Minute Yahoo Pipe Video How-tos: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
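The slide mentions "Code: write your own filters" as an alternative to Pipes. As a rough sketch of what the filter above could look like in Python, assuming plain RSS 2.0 feeds that have already been downloaded as strings (the function names and the sample feed are made up for illustration, not taken from the talk):

```python
import xml.etree.ElementTree as ET

# Keywords from the slide, lowercased for case-insensitive matching.
KEYWORDS = ("collaborate", "collaboration", "collaborative")


def matching_items(feed_xml, keywords=KEYWORDS):
    """Return (title, link) pairs for items whose title or description mentions a keyword."""
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if any(word in text for word in keywords):
            matches.append((title, item.findtext("link", default="")))
    return matches


def merge_filtered(feeds, keywords=KEYWORDS):
    """Filter several feeds and combine the matches into one list."""
    merged = []
    for feed_xml in feeds:
        merged.extend(matching_items(feed_xml, keywords))
    return merged


# Made-up sample data standing in for a downloaded feed.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Five collaboration tools</title><link>http://example.com/1</link>
<description>Work together.</description></item>
<item><title>New phone released</title><link>http://example.com/2</link>
<description>Specs inside.</description></item>
</channel></rss>"""

print(merge_filtered([SAMPLE_FEED]))
```

This uses simple substring matching, so "collaborate" would also match "collaborated". Writing the merged items back out as a new RSS feed is left out to keep the sketch short.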
  10. PostRank
     ● Best Posts in a feed
     ● Ranked on engagement (links, sharing, comments)
     ● Can get output as RSS feed
     ● Feed includes PostRank number as a field
  11. What's In a Feed?
     (Screenshot: PostRank feed, Yahoo Pipes view)
     ● Content in feeds varies wildly depending on site.
     ● Common: title, author, pubDate, link, content, description
     ● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these)
     ● API: usually has additional data & can output RSS
     ● If it's in the feed, you can use it!
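One quick way to see what a feed actually carries, outside of the Pipes view, is to list every element name that appears inside its items. The sketch below assumes an RSS 2.0 feed held in a string; the `pr:postrank` element and its namespace URI are placeholders invented for the sample, not PostRank's real markup.

```python
import xml.etree.ElementTree as ET


def item_fields(feed_xml):
    """Return the sorted set of element names found inside the feed's items."""
    root = ET.fromstring(feed_xml)
    fields = set()
    for item in root.iter("item"):
        for child in item:
            # Namespaced elements show up as "{namespace-uri}name".
            fields.add(child.tag)
    return sorted(fields)


# Made-up sample; the namespace URI is a placeholder.
SAMPLE_ITEM_FEED = """<rss version="2.0" xmlns:pr="http://example.com/ns"><channel>
<item><title>A post</title><link>http://example.com/a</link>
<pr:postrank>7.5</pr:postrank></item>
</channel></rss>"""

print(item_fields(SAMPLE_ITEM_FEED))
```

Any name this prints is a field that could be filtered or sorted on, even if a feed reader never displays it.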
  12. Reformatting / Modifying RSS Feeds
     Don't be satisfied with default RSS feed formats!
     (Screenshots: Twitter Search, Twitter RSS Feed)
     Modify & more quickly scan key data
  13. Yahoo Pipes: Reformat Twitter Feed
     ● Input:
       – Twitter Search feed
     ● Loop String Build:
       – Author
       – : (spacing)
       – Title
     ● Loop Assign:
       – Store result back into title
     ● Output:
       – 1 RSS feed
       – Efficient format
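The loop, string build and assign steps above amount to rewriting each item's title in place. A minimal Python sketch of the same idea follows, assuming a simplified RSS 2.0 feed where each item has plain `title` and `author` elements (a real Twitter Search feed may be structured differently; the function name and sample are illustrative):

```python
import xml.etree.ElementTree as ET


def prefix_titles_with_author(feed_xml):
    """Rewrite each item title as "author: title" and return the modified feed XML."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.find("title")
        author = item.findtext("author", default="").strip()
        if title is not None and author:
            # String build (author + ": " + title), then assign back into title.
            title.text = f"{author}: {title.text or ''}"
    return ET.tostring(root, encoding="unicode")


# Made-up sample data.
SAMPLE_TWEETS = (
    '<rss version="2.0"><channel><item>'
    "<title>Hacking RSS talk</title><author>dawn</author>"
    "</item></channel></rss>"
)

print(prefix_titles_with_author(SAMPLE_TWEETS))
```

The same pattern would cover the PostRank variant of this slide by building the title from the score field instead of the author.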
  14. BackTweets (BackType API)
     ● Data about links on Twitter
     ● Finds links regardless of shortening service
     ● No RSS Feeds
     ● But … You can use API + Pipes to build one!
  15. BackType + Twitter API + Pipes Output
     ● Data from BackType + Twitter
     ● Built an RSS feed using Yahoo Pipes
     ● Included the information relevant for me
     ● Could have included or filtered on: name, listed count, location, profile image, user URL, ...
  16. Admit it, we ALL do vanity searches
     ● You can enter your search queries in Google, Twitter, Flickr …
       – Add a new project & have to update all of them
       – Can be hard to filter out some results
       – May have duplicates from multiple searches
     ● Yahoo Pipes
       – Update keywords in a CSV file
       – Use CSV file as input into a bunch of searches (RSS or API inputs)
       – Filter out what you don't want
       – Get 1 filtered RSS feed as output
     2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
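The keyword CSV approach can be sketched in Python as three small steps: read the keywords, turn each one into a search URL per service, and drop duplicate results by link. The endpoints below are placeholders (real search services each have their own URL format), and the result items are assumed to be dicts with a `link` key.

```python
import csv
import io
from urllib.parse import urlencode

# Placeholder endpoints; substitute the search feed URLs you actually use.
SEARCH_ENDPOINTS = [
    "https://search.example.com/feed",
    "https://social.example.org/search.rss",
]


def read_keywords(csv_text):
    """Return every non-empty cell of the CSV as a keyword."""
    reader = csv.reader(io.StringIO(csv_text))
    return [cell.strip() for row in reader for cell in row if cell.strip()]


def build_search_urls(keywords, endpoints=SEARCH_ENDPOINTS):
    """Build one search URL per (endpoint, keyword) pair."""
    return [
        f"{endpoint}?{urlencode({'q': keyword})}"
        for endpoint in endpoints
        for keyword in keywords
    ]


def dedupe_by_link(items):
    """Keep the first item seen for each link, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item["link"] not in seen:
            seen.add(item["link"])
            unique.append(item)
    return unique


keywords = read_keywords("Dawn Foster,MeeGo\nfastwonderblog\n")
for url in build_search_urls(keywords):
    print(url)
```

Adding a new project then means adding one line to the CSV rather than editing every saved search.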
  17. How Should / Shouldn't You Use All of This?
     ● Do:
       – Use this for personal productivity
       – Play around, create prototypes and understand the possibilities
     ● Don't:
       – Don't violate licenses on content or republish w/o permission
       – Don't use in critical or production environments
     ● For production use or putting data on websites:
       – Re-write in a real programming language with cached results and error checking
     XKCD Comic: http://xkcd.com/327/
  18. Learn More
     About Dawn:
     ● Intel Community Manager for MeeGo
     ● Author of Companies and Communities
     ● More Info: http://fastwonderblog.com
     ● Dawn@FastWonder.com
     ● @geekygirldawn on Twitter
     Additional Reading & audio from 1 hour version of this talk:
     ● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
     Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
  19. Backup
  20. Outsource / Crowdsource New Sources
  21. Yahoo Pipes: Reformat PostRank Feed
     ● Input:
       – 3 PostRank feeds
     ● Loop String Build:
       – PostRank
       – : (spacing)
       – Title
     ● Loop Assign:
       – Store result back into title
     ● Output:
       – 1 RSS feed
       – Efficient format
  22. Yahoo Pipes PostRank Example
     ● Input PostRank Feeds:
       – Engadget
       – CrunchGear
       – Boy Genius
     ● Filter by content
       – Tablet
     ● Sort:
       – PostRank
     ● Output
       – 1 RSS feed
       – Best tablet posts
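Filtering on a keyword and then sorting on the PostRank score is a two-line operation once the items are in memory. The sketch below assumes the feed items have already been parsed into dicts with `title` and `postrank` keys; the function name, the dict shape and the sample scores are invented for illustration.

```python
def best_posts(items, keyword, limit=None):
    """Keep items whose title mentions the keyword, highest PostRank first."""
    keyword = keyword.lower()
    matches = [item for item in items if keyword in item["title"].lower()]
    matches.sort(key=lambda item: item["postrank"], reverse=True)
    # A limit of None slices the whole list, so every match is returned.
    return matches[:limit]


# Made-up sample items standing in for three merged PostRank feeds.
POSTS = [
    {"title": "New tablet hands-on", "postrank": 6.2},
    {"title": "Phone rumor roundup", "postrank": 9.1},
    {"title": "Tablet pricing leaked", "postrank": 8.4},
]

for post in best_posts(POSTS, "tablet"):
    print(post["postrank"], post["title"])
```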
  23. Using Web APIs 101
     ● Many API calls are basically URLs
     ● Constructing URLs
       – Use API documentation/examples to format the URL
       – http://api.twitter.com/1/statuses/show/ID.xml
         ● Version 1 of the API, show status for ID, in .format
     ● API keys
       – Tell the API who you are (like a password)
     ● Rate limiting
       – Only get so much & you're cut off
       – Limited by IP or API key
       – Chill out for a while & come back
     XKCD Comic: http://xkcd.com/844/
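Two of the points above translate directly into code: building the URL from a pattern, and backing off when rate limited. The sketch below uses the URL pattern shown on the slide (that version 1 endpoint has since been retired, so treat it as a pattern only). The `RateLimited` exception, the injected `fetch` function and the doubling delay are assumptions made for illustration; a real client would detect rate limiting however its API signals it.

```python
import time

API_ROOT = "http://api.twitter.com/1"  # URL pattern from the slide


def status_url(tweet_id, fmt="xml"):
    """Build the 'show status' URL for one tweet ID in the given format."""
    return f"{API_ROOT}/statuses/show/{tweet_id}.{fmt}"


class RateLimited(Exception):
    """Hypothetical signal raised by a fetch function when the API says slow down."""


def fetch_with_backoff(url, fetch, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), waiting longer after each rate-limit error before retrying."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == retries - 1:
                raise  # out of attempts, let the caller decide
            # Chill out for a while & come back: 1x, 2x, 4x the base delay.
            sleep(delay * 2 ** attempt)


print(status_url(12345))
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network.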
  24. BackTweets API + Twitter API + Yahoo Pipes
     ● What we want to do:
       – Start with a set of URLs (blog posts in a feed)
       – Find any tweet mentioning those URLs
       – Return the tweet and data about the person who posted it
     ● Mission: Build feed using only data from these 2 APIs
     ● BackType API provides Tweet ID (not humanly useful)
       – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
       – List of Twitter Status IDs for Tweets linking to URL
       – Note: I think this feature may be deprecated
     ● Twitter API uses Tweet ID to get everything else
       – http://api.twitter.com/1/statuses/show/ID.xml
       – Returns a single status with all relevant data for ID
  25. BackTweets API: Get Tweet ID
     ● Take WebWorkerDaily Author Feed
     ● Use WWD URLs to build URLs for BackType API call
     ● Fetch data from BackType URLs to get Tweet ID
  26. Twitter API: Get Data Based on Tweet ID
     ● Use BackType tweet ID to build URL for Twitter API
     ● Fetch data about Tweet & User from Twitter API
     ● Re-Build title to show “user (followers): tweet”
  27. Add Filters to BackType + Twitter Example
     ● Show only tweets from people with 1000+ followers
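The BackType + Twitter walkthrough in the last few slides can be sketched as one small Python pipeline: build the BackType URL for each post, look up the tweet IDs, fetch each status, rebuild the title and apply the follower filter. The two URL patterns come from the slides; everything else is an assumption. In particular, the response formats of both APIs are not shown in the deck, so fetching and parsing are left to caller-supplied functions (`fetch_tweet_ids`, `fetch_status`), and the `user`, `followers` and `text` keys are hypothetical.

```python
from urllib.parse import urlencode

# URL patterns as shown on the slides.
BACKTYPE_SEARCH = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_STATUS = "http://api.twitter.com/1/statuses/show/{tweet_id}.xml"


def backtype_url(post_url, api_key):
    """Build the BackType link-search URL for one blog post URL."""
    query = urlencode({"q": post_url, "mode": "batch", "key": api_key})
    return f"{BACKTYPE_SEARCH}?{query}"


def twitter_url(tweet_id):
    """Build the Twitter status URL for one tweet ID."""
    return TWITTER_STATUS.format(tweet_id=tweet_id)


def build_entries(post_urls, api_key, fetch_tweet_ids, fetch_status, min_followers=1000):
    """Return feed entries titled "user (followers): tweet" for well-followed authors.

    fetch_tweet_ids(url) should return a list of tweet IDs, and fetch_status(url)
    a dict with hypothetical keys "user", "followers" and "text".
    """
    entries = []
    for post_url in post_urls:
        for tweet_id in fetch_tweet_ids(backtype_url(post_url, api_key)):
            status = fetch_status(twitter_url(tweet_id))
            if status["followers"] < min_followers:
                continue  # the 1000+ followers filter
            title = f'{status["user"]} ({status["followers"]}): {status["text"]}'
            entries.append({"title": title, "link": post_url})
    return entries


print(backtype_url("http://example.com/post", "KEY"))
```

In line with the advice on the "How Should / Shouldn't You Use All of This?" slide, a production version would also want caching and error checking around both fetch functions.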