Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)
Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS Dawn Foster, Intel Community Manager for MeeGo firstname.lastname@example.org
Information Overload CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
Who Cares? ● Most of it is … – complete crap – out of date / obsolete – not interesting to you – irrelevant for you Junk Pile: http://www.flickr.com/photos/zen/4013525/
You Want to Find the Needle Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
RSS Alone is a Start ● Sources you care about delivered right to you. But … – Do you care about everything in each feed? – What about the feeds you aren't subscribed to? – Can you keep up with what you have?
Prioritize Your Reader ● Put things you care about at the top ● Categorize ● Don't try to read everything
The Real Magic is in Filtering RSS Complete Crap Interesting Maybe Relevant Yay! ● In my Google Reader right now: – Analyst research blogs mentioning Online Community – Analyst research blogs mentioning MeeGo – Searches across social sites mentioning me, my projects, my websites, etc., filtering out things I don't care about – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions
RSS Filtering Tools ● Yahoo Pipes (my favorite) – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …) – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it. ● Other Options – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out. – RSS readers with filtering / alerts (FeedDemon) – Code: write your own filters – Note: many free RSS filtering services have gone out of business because they can be bandwidth intensive & costly to host.
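For the "write your own filters" option, here is a minimal sketch of what that could look like, assuming a plain RSS 2.0 feed with a single channel. The function name `filter_rss` and the sample feed are made up for illustration; it only uses Python's standard library and keeps items where a keyword shows up in the chosen fields.

```python
import xml.etree.ElementTree as ET


def filter_rss(rss_xml, keywords, fields=("title", "description")):
    """Return RSS XML keeping only items where a keyword appears in one of the fields.

    Hypothetical helper: assumes RSS 2.0 with one <channel> holding the <item>s.
    """
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    wanted = [k.lower() for k in keywords]
    # Copy the item list first so removing items does not disturb the loop.
    for item in list(channel.findall("item")):
        text = " ".join((item.findtext(f) or "") for f in fields).lower()
        if not any(k in text for k in wanted):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Made-up sample feed, just for the demo.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>MeeGo 1.1 released</title><description>Release notes</description></item>
<item><title>Cat pictures</title><description>Nothing relevant</description></item>
</channel></rss>"""

filtered = filter_rss(SAMPLE, ["meego"])
titles = [i.findtext("title") for i in ET.fromstring(filtered).iter("item")]
print(titles)  # ['MeeGo 1.1 released']
```

Passing a different `fields` tuple (for example `("author",)`) filters on another field, which is roughly the flexibility Pipes offers through its filter module.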
PostRank ● Best Posts in a feed ● Ranked on engagement (links, sharing, comments) ● Can get output as RSS feed ● Feed includes postrank number as a field
What's In a Feed? PostRank (Yahoo Pipes View) ● Content in feeds varies wildly depending on site. ● Common: title, author, pubDate, link, content, description ● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these) ● API: usually has additional data & can output RSS ● If it's in the feed, you can use it!
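To see what a feed really contains, you can list every field in an item and then use a site-specific one. This is a rough sketch: the namespace URI and the exact `postrank` element layout below are assumptions for illustration (check the real feed for the actual names), and `item_fields` / `top_posts` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Assumption: a made-up namespace for the site-specific field.
PR_NS = "http://example.com/postrank"

SAMPLE = f"""<rss version="2.0" xmlns:pr="{PR_NS}"><channel>
<item><title>Popular post</title><link>http://example.com/a</link><pr:postrank>8.5</pr:postrank></item>
<item><title>Quiet post</title><link>http://example.com/b</link><pr:postrank>2.0</pr:postrank></item>
</channel></rss>"""


def item_fields(item):
    """List the field names present in one item, without the {namespace} prefix."""
    return [child.tag.split("}")[-1] for child in item]


def top_posts(rss_xml, min_rank):
    """Return (title, rank) for items whose postrank is at least min_rank."""
    root = ET.fromstring(rss_xml)
    result = []
    for item in root.iter("item"):
        # Namespaced tags are addressed as {namespace-uri}localname.
        rank = float(item.findtext(f"{{{PR_NS}}}postrank", default="0"))
        if rank >= min_rank:
            result.append((item.findtext("title"), rank))
    return result


first_item = ET.fromstring(SAMPLE).find("channel/item")
print(item_fields(first_item))  # ['title', 'link', 'postrank']
print(top_posts(SAMPLE, 5.0))   # [('Popular post', 8.5)]
```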
Reformatting / Modifying RSS Feeds Don't be satisfied with default RSS feed formats! Twitter Search Twitter RSS Feed Modify & more quickly scan key data
Yahoo Pipes: Reformat Twitter Feed ● Input: – Twitter Search feed ● Loop String Build: – Author – : (spacing) – Title ● Loop Assign: – Store result back into title ● Output: – 1 RSS feed – Efficient format
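The same "string build, then assign back into title" loop can be sketched outside of Pipes. This assumes each item carries plain `author` and `title` elements (real search feeds may name or nest these differently), and `prefix_titles_with_author` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def prefix_titles_with_author(rss_xml):
    """Rewrite each item title as 'author: title' and return the new feed XML."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.find("title")
        author = item.findtext("author")
        if title is not None and author:
            # String build (author + ': ' + title), then store back into title.
            title.text = f"{author}: {title.text or ''}"
    return ET.tostring(root, encoding="unicode")


# Made-up sample item for the demo.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Trying out Yahoo Pipes</title><author>geekygirldawn</author></item>
</channel></rss>"""

new_feed = prefix_titles_with_author(SAMPLE)
new_title = ET.fromstring(new_feed).findtext("channel/item/title")
print(new_title)  # geekygirldawn: Trying out Yahoo Pipes
```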
BackTweets (BackType API) ● Data about links on Twitter ● Finds links regardless of shortening service ● No RSS Feeds ● But … You can use API + Pipes to build one!
BackType + Twitter API + Pipes Output ● Data from BackType + Twitter ● Built an RSS feed using Yahoo Pipes ● Included the information relevant for me ● Could have included or filtered on: name, listed count, location, profile image, user URL, ...
Admit it, we ALL do vanity searches ● You can enter your search queries in Google, Twitter, Flickr … – Add a new project & have to update all of them – Can be hard to filter out some results – May have duplicates from multiple searches ● Yahoo Pipes – Update keywords in a CSV file – Use CSV file as input into a bunch of searches (RSS or API inputs) – Filter out what you don't want – Get 1 filtered RSS feed as output 2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
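The keyword CSV idea can be sketched in a few lines. Everything named here is an assumption for illustration: the search URL templates point at placeholder hosts, the results are plain dicts with `title` and `link`, and fetching the feeds is left out so the sketch stays offline.

```python
import csv
import io
from urllib.parse import quote_plus

# Hypothetical search feeds; swap in the real search URLs you use.
SEARCH_TEMPLATES = [
    "http://search.example.com/blogs.rss?q={q}",
    "http://search.example.org/photos.rss?q={q}",
]


def load_keywords(csv_text):
    """Read one keyword per row from the first CSV column, skipping blanks."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def build_search_urls(keywords):
    """One URL per keyword per search source, with the keyword URL-encoded."""
    return [t.format(q=quote_plus(k)) for k in keywords for t in SEARCH_TEMPLATES]


def merge_results(items, blocked_words=()):
    """Combine results into one list: drop duplicate links and unwanted titles."""
    seen = set()
    merged = []
    for item in items:
        if item["link"] in seen:
            continue
        if any(w.lower() in item["title"].lower() for w in blocked_words):
            continue
        seen.add(item["link"])
        merged.append(item)
    return merged


keywords = load_keywords("Dawn Foster\nMeeGo\n")
urls = build_search_urls(keywords)
print(urls[0])  # http://search.example.com/blogs.rss?q=Dawn+Foster
```

Adding a new project then means adding one row to the CSV, rather than editing every saved search.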
How Should / Shouldn't You Use All of This? ● Do: – Use this for personal productivity – Play around, create prototypes and understand the possibilities ● Don't: – Don't violate licenses on content or republish w/o permission – Don't use in critical or production environments ● For production use or putting data on websites: – Re-write in a real programming language with cached results and error checking XKCD Comic: http://xkcd.com/327/
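As one possible shape for "cached results and error checking", here is a small sketch. `CachedFetcher` is a hypothetical class, not part of any library; the actual download function is passed in, so the caching and fallback logic can be tried without touching the network.

```python
import time


class CachedFetcher:
    """Wrap a fetch function with a time-limited cache and a stale fallback."""

    def __init__(self, fetch, ttl_seconds=900, clock=time.time):
        self._fetch = fetch      # callable taking a URL and returning the body
        self._ttl = ttl_seconds  # how long a cached body counts as fresh
        self._clock = clock      # injectable so tests can control time
        self._cache = {}         # url -> (fetched_at, body)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        try:
            body = self._fetch(url)
        except Exception:
            # Error checking: serve the stale copy if there is one, else re-raise.
            if cached is not None:
                return cached[1]
            raise
        self._cache[url] = (now, body)
        return body
```

A caller would build it once, for example `CachedFetcher(my_download_function)`, and call `get(url)` wherever the feed is needed.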
Learn More About Dawn: ● Intel Community Manager for MeeGo ● Author of Companies and Communities ● More Info: http://fastwonderblog.com ● Dawn@FastWonder.com ● @geekygirldawn on Twitter Additional Reading & audio from 1 hour version of this talk: ● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/ Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
Using Web APIs 101 ● Many API calls are basically URLs ● Constructing URLs – Use API documentation/examples to format the URL – http://api.twitter.com/1/statuses/show/ID.xml ● Version 1 of the API, show status for ID, in the given format (.xml) ● API keys – Tell the API who you are (like a password) ● Rate limiting – You only get so much & then you're cut off – Limited by IP or API key – Chill out for a while & come back XKCD Comic: http://xkcd.com/844/
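Both ideas, building the URL from its parts and backing off when rate limited, can be sketched like this. The URL pattern is the one from the slide (that API version may no longer be available); `RateLimited`, `status_url` and `fetch_with_backoff` are hypothetical names, and the fetch function is passed in so nothing here depends on a live API.

```python
import time


class RateLimited(Exception):
    """Raised by a fetch function when the API says we hit the limit (assumed)."""


def status_url(tweet_id, fmt="xml", version=1):
    """Build the 'show status' URL: API version, method, ID, then format."""
    return f"http://api.twitter.com/{version}/statuses/show/{tweet_id}.{fmt}"


def fetch_with_backoff(fetch, url, max_tries=3, wait_seconds=60, sleep=time.sleep):
    """Call fetch(url); when rate limited, chill out for a while and come back."""
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == max_tries - 1:
                raise  # out of tries, let the caller decide what to do
            sleep(wait_seconds * (attempt + 1))  # wait longer each time


print(status_url(12345))  # http://api.twitter.com/1/statuses/show/12345.xml
```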
Backtweets API + Twitter API + Yahoo Pipes ● What we want to do: – Start with a set of URLs (blog posts in a feed) – Find any tweet mentioning those URLs – Return the tweet and data about the person who posted it ● Mission: Build feed using only data from these 2 APIs ● BackType API provides Tweet ID (not humanly useful) – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY – List of Twitter Status IDs for Tweets linking to URL – Note: I think this feature may be deprecated ● Twitter API uses Tweet ID to get everything else – http://api.twitter.com/1/statuses/show/ID.xml – Returns a single status with all relevant data for ID
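The URL-building half of this chain can be sketched as follows, using the two URL patterns from the slide (both services may have changed or shut down since). The function names and the example post URL and IDs are made up. The main detail worth showing is that the blog post URL has to be URL-encoded before it goes into the `q` parameter.

```python
from urllib.parse import urlencode

# Endpoints as given on the slide; they may no longer exist.
BACKTYPE_ENDPOINT = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_STATUS = "http://api.twitter.com/1/statuses/show/{tweet_id}.xml"


def backtype_url(post_url, api_key):
    """Build the BackType call that lists tweet IDs linking to post_url."""
    query = urlencode({"q": post_url, "mode": "batch", "key": api_key})
    return BACKTYPE_ENDPOINT + "?" + query


def twitter_status_urls(tweet_ids):
    """Build one Twitter 'show status' URL per tweet ID from BackType."""
    return [TWITTER_STATUS.format(tweet_id=t) for t in tweet_ids]


print(backtype_url("http://example.com/post", "KEY"))
# http://api.backtype.com/tweets/search/links.xml?q=http%3A%2F%2Fexample.com%2Fpost&mode=batch&key=KEY
print(twitter_status_urls([111, 222]))
```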
BackTweets API: Get Tweet ID ● Take WebWorkerDaily Author Feed ● Use WWD URLs to build URLs for BackType API call ● Fetch data from BackType URLs to get Tweet ID
Twitter API: Get Data Based on Tweet ID ● Use BackType tweet ID to build URL for Twitter API ● Fetch data about Tweet & User from Twitter API ● Re-Build title to show “user (followers): tweet”
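Rebuilding the title as "user (followers): tweet" might look like this once the status XML has been fetched. The element names (`text`, `user/screen_name`, `user/followers_count`) are an assumption about the response layout, and the sample status is invented; check a real response before relying on them.

```python
import xml.etree.ElementTree as ET


def tweet_title(status_xml):
    """Build 'user (followers): tweet' from one status document."""
    status = ET.fromstring(status_xml)
    user = status.findtext("user/screen_name", default="unknown")
    followers = int(status.findtext("user/followers_count", default="0"))
    text = status.findtext("text", default="")
    return f"{user} ({followers}): {text}"


# Invented sample, shaped like the assumed response.
SAMPLE_STATUS = """<status>
<id>12345</id>
<text>Great post on filtering RSS</text>
<user><screen_name>example_user</screen_name><followers_count>1500</followers_count></user>
</status>"""

print(tweet_title(SAMPLE_STATUS))  # example_user (1500): Great post on filtering RSS
```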
Add Filters to BackType + Twitter Example ● Show only tweets from people with 1000+ followers
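The follower filter itself is a one-liner once each tweet has been reduced to a small record. The dict keys (`user`, `followers`, `text`) and the sample data below are assumptions for illustration.

```python
def filter_by_followers(tweets, min_followers=1000):
    """Keep only tweets whose author has at least min_followers followers."""
    return [t for t in tweets if t["followers"] >= min_followers]


# Invented sample records.
tweets = [
    {"user": "big_account", "followers": 2500, "text": "Nice write-up"},
    {"user": "small_account", "followers": 40, "text": "Reading this now"},
    {"user": "edge_account", "followers": 1000, "text": "Bookmarked"},
]

kept = filter_by_followers(tweets)
print([t["user"] for t in kept])  # ['big_account', 'edge_account']
```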