Hacking RSS:
        Filtering & Processing
    Obscene Amounts of Information
              #hackingRSS

       Dawn Foster
Intel Community Manager
        for MeeGo
 dawn@fastwonder.com
Information Overload




                       CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
Who Cares?


●   Most of it is …
    –   complete crap
    –   out of date / obsolete
    –   not interesting to you
    –   irrelevant for you




                                 Junk Pile: http://www.flickr.com/photos/zen/4013525/
You Want to Find the Needle




                      Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
RSS Alone is a Start
●   Sources you care about delivered right to you. But …
    –   Do you care about everything in each feed?
    –   What about the feeds you aren't subscribed to?
    –   Can you keep up with what you have?
Prioritize Your Reader



●   Put things you care about at the top
●   Categorize
●   Don't try to read everything
The Real Magic is in Filtering RSS
                       Complete Crap
                         Interesting
                        Maybe Relevant
                               Yay!
●   In my Google Reader right now:
    –   Analyst research blogs mentioning Online Community
    –   Analyst research blogs mentioning MeeGo
    –   Searches across social sites mentioning me, my projects, my
        websites etc. - filtering out things I don't care about
    –   My favorite blogs filtered using PostRank to find only the
        ones with a lot of comments or social mentions
RSS Filtering Tools
●   Yahoo Pipes (my favorite)
    –   More powerful & flexible: options to filter any data found in
        any field in the RSS feed (URL, title, description, author …)
    –   Downside: takes some time to learn & can be a little flaky at
        times. Also a single point of failure if Yahoo ever killed it.



●   Other Options
    –   FeedRinse: easy to use, not as flexible. Import RSS feeds,
        add filters, get new RSS feeds out.
    –   RSS readers with filtering / alerts (FeedDemon)
    –   Code: write your own filters
    –   Note: many free RSS filtering services have gone out of
        business – can be bandwidth intensive & costly to host.
Yahoo Pipes Filtering Example
●   Input:
    –   WebWorkerDaily
    –   ReadWriteWeb
●   Filter by content:
    –   Collaborate
    –   Collaboration
    –   Collaborative
●   Output:
    –   1 RSS Feed
    –   Matching 3 keywords




          2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
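
For anyone who prefers the "write your own filters" route, the same merge-and-filter idea can be sketched in a few lines of Python. This is only an illustration, not how Yahoo Pipes works internally: the sample feeds, the `example.com` links, and the function names are made up, and it assumes plain RSS 2.0 with no namespaces and matches keywords in the title and description only.

```python
import xml.etree.ElementTree as ET

KEYWORDS = ("collaborate", "collaboration", "collaborative")

def parse_items(rss_xml):
    """Return a list of dicts (title, link, description) from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]

def matches(item, keywords=KEYWORDS):
    """True if any keyword appears in the title or description (case-insensitive)."""
    text = (item["title"] + " " + item["description"]).lower()
    return any(keyword in text for keyword in keywords)

def filter_feeds(feeds_xml, keywords=KEYWORDS):
    """Merge several feeds, then keep only the items that match a keyword."""
    merged = []
    for xml_text in feeds_xml:
        merged.extend(parse_items(xml_text))
    return [item for item in merged if matches(item, keywords)]

# Made-up sample feeds standing in for the two input blogs.
SAMPLE_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Tools for remote collaboration</title><link>http://example.com/a1</link><description>Overview</description></item>
<item><title>Coffee reviews</title><link>http://example.com/a2</link><description>Nothing relevant here</description></item>
</channel></rss>"""

SAMPLE_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Weekly links</title><link>http://example.com/b1</link><description>A collaborative editing roundup</description></item>
<item><title>Phone news</title><link>http://example.com/b2</link><description>New handsets</description></item>
</channel></rss>"""

filtered = filter_feeds([SAMPLE_A, SAMPLE_B])
```

Writing the result back out as RSS is left out here; the point is only that "2 feeds in, 3 keywords, 1 list out" is a small amount of code.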
PostRank
●   Best Posts in a
    feed
●   Ranked on
    engagement (links,
    sharing, comments)
●   Can get output as
    RSS feed
●   Feed includes
    postrank number as
    a field
What's In a Feed? PostRank (Yahoo Pipes View)




●   Content in feeds varies wildly depending on site.
●   Common: title, author, pubDate, link, content, description
●   Site-specific: postrank, lat/long, image links, username,
    twitter source … (most RSS readers don't show these)
●   API: usually has additional data & can output RSS
●   If it's in the feed, you can use it!
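
A quick way to see what is really in a feed is to dump every child element of an item, including the site-specific ones a reader hides. The sketch below assumes RSS 2.0; the `ex:rank` element and its namespace URI are invented for the example and are not PostRank's actual field name.

```python
import xml.etree.ElementTree as ET

def item_fields(rss_xml):
    """Map each child tag of the first <item> to its text content."""
    root = ET.fromstring(rss_xml)
    first = root.find("./channel/item")
    if first is None:
        return {}
    # Namespaced (site-specific) elements show up as "{namespace-uri}name".
    return {child.tag: (child.text or "").strip() for child in first}

# Hypothetical feed with one extension field in a made-up namespace.
SAMPLE = """<rss version="2.0" xmlns:ex="http://example.com/ns/rank"><channel>
<title>Demo</title>
<item><title>Hello</title><link>http://example.com/1</link><ex:rank>8.7</ex:rank></item>
</channel></rss>"""

fields = item_fields(SAMPLE)
```

Any key that appears in `fields` is something you could filter or sort on.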
Reformatting / Modifying RSS Feeds
   Don't be satisfied with default RSS feed formats!

 Twitter Search

 Twitter RSS Feed

           Modify & more quickly scan key data
Yahoo Pipes: Reformat Twitter Feed
●   Input:
    –   Twitter Search
        feed
●   Loop String Build:
    –   Author
    –   : (spacing)
    –   Title
●   Loop Assign:
    –   Store result back
        into title
●   Output:
    –   1 RSS feed
    –   Efficient format
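
In code, the string-build-then-assign loop is a one-liner per item. A minimal sketch, assuming each feed item has already been parsed into a dict with `author` and `title` keys (those key names are an assumption, not something a particular feed guarantees):

```python
def reformat_titles(items):
    """Return copies of the items with the title rewritten as 'author: title'."""
    reformatted = []
    for item in items:
        new_item = dict(item)  # copy, so the original items are left untouched
        new_item["title"] = f"{item['author']}: {item['title']}"
        reformatted.append(new_item)
    return reformatted

# Made-up sample data.
sample = [
    {"author": "alice", "title": "Trying out a new feed reader"},
    {"author": "bob", "title": "RSS is not dead"},
]
result = reformat_titles(sample)
```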
BackTweets (BackType API)
●   Data about links on
    Twitter
●   Finds links regardless of
    shortening service
●   No RSS Feeds
●   But … You can use
    API + Pipes to build
    one!
BackType + Twitter API + Pipes Output
●   Data from BackType + Twitter
●   Built an RSS feed using Yahoo Pipes
●   Included the information relevant for me
●   Could have included or filtered on: name, listed count,
    location, profile image, user URL, ...
Admit it, we ALL do vanity searches
 ●   You can enter your search queries in Google, Twitter,
     Flickr …
       –   Add a new project & have to update all of them
       –   Can be hard to filter out some results
       –   May have duplicates from multiple searches
 ●   Yahoo Pipes
       –   Update keywords in a CSV file
       –   Use CSV file as input into a bunch of searches (RSS or
           API inputs)
       –   Filter out what you don't want
       –   Get 1 filtered RSS feed as output



2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
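
The CSV-driven approach translates to code fairly directly. The sketch below is one possible shape, not the pipe from the video: the two search endpoints are placeholders on `example.com` / `example.org`, the `q` parameter name is assumed, and items are assumed to be dicts with `title` and `link` keys once fetched and parsed.

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical search endpoints; swap in the real RSS or API search URLs you use.
SEARCH_BASES = [
    "http://search.example.com/feed?",
    "http://photos.example.org/rss?",
]

def load_keywords(csv_text):
    """Read keywords from CSV text, one or more per row, skipping blanks."""
    reader = csv.reader(io.StringIO(csv_text))
    return [cell.strip() for row in reader for cell in row if cell.strip()]

def build_search_urls(keywords, bases=SEARCH_BASES):
    """One search URL per (keyword, service) pair."""
    return [base + urlencode({"q": keyword}) for keyword in keywords for base in bases]

def filter_and_dedupe(items, blocked_words):
    """Drop items whose title contains a blocked word, then drop repeated links."""
    seen_links = set()
    kept = []
    for item in items:
        title = item["title"].lower()
        if any(word in title for word in blocked_words):
            continue
        if item["link"] in seen_links:
            continue
        seen_links.add(item["link"])
        kept.append(item)
    return kept

keywords = load_keywords("MeeGo\nDawn Foster\n")
urls = build_search_urls(keywords)
```

Adding a new project then means adding one row to the CSV file rather than editing every saved search.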
How Should / Shouldn't You Use All of This?
●   Do:
    –   Use this for personal productivity
    –   Play around, create prototypes and understand the possibilities
●   Don't:
    –   Don't violate licenses on content or republish w/o permission
    –   Don't use in critical or production environments




●   For production use or putting data on websites:
    –   Re-write in a real programming language with cached results
        and error checking
                       XKCD Comic: http://xkcd.com/327/
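
As a rough idea of what "cached results and error checking" can look like, here is a small sketch. The class name, the five-minute default, and the choice to serve stale data on failure are all assumptions made for the example; the actual fetch function is passed in so the caching logic stays independent of any HTTP library.

```python
import time

class CachedFetcher:
    """Wrap a fetch(url) function with a time-limited cache and a stale fallback."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # url -> (time fetched, body)

    def get(self, url):
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh, no network call
        try:
            body = self._fetch(url)
        except OSError:
            # Network-style failure: fall back to the stale copy if there is one.
            if cached is not None:
                return cached[1]
            raise
        self._cache[url] = (now, body)
        return body
```

A real deployment would also want logging, timeouts, and validation of the fetched feed before caching it.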
Learn More
About Dawn:
● Intel Community Manager for MeeGo

● Author of Companies and Communities

● More Info: http://fastwonderblog.com

● Dawn@FastWonder.com

● @geekygirldawn on Twitter






Additional Reading & audio from 1 hour version of this talk:
● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/


                            Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
Backup
Outsource / Crowdsource New Sources
Yahoo Pipes: Reformat PostRank Feed
●   Input:
    –   3 PostRank feeds
●   Loop String Build:
    –   PostRank
    –   : (spacing)
    –   Title
●   Loop Assign:
    –   Store result back
        into title
●   Output:
    –   1 RSS feed
    –   Efficient format
Yahoo Pipes PostRank Example
●   Input PostRank
    Feeds:
    –   Engadget
    –   CrunchGear
    –   Boy Genius
●   Filter by content
    –   Tablet
●   Sort:
    –   PostRank
●   Output
    –   1 RSS feed
    –   Best tablet posts
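
The filter-then-sort step is easy to mirror in code. A minimal sketch, assuming the feed items are already parsed into dicts and that the rank arrives as a numeric string under a `postrank` key (the key name and the sample values are assumptions):

```python
def best_posts(items, keyword, rank_field="postrank"):
    """Keep items whose title mentions the keyword, highest rank first."""
    needle = keyword.lower()
    hits = [item for item in items if needle in item["title"].lower()]
    return sorted(hits, key=lambda item: float(item[rank_field]), reverse=True)

# Made-up items standing in for the merged input feeds.
sample = [
    {"title": "New tablet review", "postrank": "7.5"},
    {"title": "Phone launch", "postrank": "9.9"},
    {"title": "Tablet roundup", "postrank": "9.1"},
]
ranked = best_posts(sample, "tablet")
```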
Using Web APIs 101
●   Many API calls are basically URLs
●   Constructing URLs
    –   Use API documentation/examples to
        format the URL
    –   http://api.twitter.com/1/statuses/show/ID.xml
         ●   Version 1 of the API: show the status for ID
             in the given format (.xml)
●   API keys
    –   Tell the API who you are (like a password)
●   Rate limiting
    –   Only get so much & you're cut off
    –   Limited by IP or API key
    –   Chill out for a while & come back
                                                   XKCD Comic: http://xkcd.com/844/
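
Both ideas, building the URL and backing off when rate limited, fit in a short sketch. The URL pattern is copied from the slide and may no longer be served; the `RateLimited` exception, the retry count, and the doubling wait are assumptions for illustration, and the real fetch function is supplied by the caller.

```python
import time

# Pattern from the slide: API version, resource, ID, then the output format.
STATUS_URL = "http://api.twitter.com/1/statuses/show/{id}.{fmt}"

def status_url(tweet_id, fmt="xml"):
    """Fill the ID and format into the URL pattern."""
    return STATUS_URL.format(id=tweet_id, fmt=fmt)

class RateLimited(Exception):
    """Hypothetical signal raised by the caller's fetch function when cut off."""

def fetch_with_backoff(fetch, url, retries=3, first_wait=1.0, sleep=time.sleep):
    """Call fetch(url); on RateLimited, wait (doubling each time) and retry."""
    wait = first_wait
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except RateLimited:
            if attempt == retries:
                raise  # out of retries, let the caller decide what to do
            sleep(wait)
            wait *= 2
```

How a given API reports rate limiting (status code, header, error body) varies, so the fetch function is where that detection would live.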
Backtweets API + Twitter API + Yahoo Pipes
●   What we want to do:
    –   Start with a set of URLs (blog posts in a feed)
    –   Find any tweet mentioning those URLs
    –   Return the tweet and data about the person who posted it
●   Mission: Build feed using only data from these 2 APIs
●   BackType API provides Tweet ID (not humanly useful)
    –   http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY
    –   List of Twitter Status IDs for Tweets linking to URL
    –   Note: I think this feature may be deprecated
●   Twitter API uses Tweet ID to get everything else
    –   http://api.twitter.com/1/statuses/show/ID.xml
    –   Returns a single status with all relevant data for ID
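
The two-step chain can be sketched as follows. The URL patterns come from the slide; everything else is an assumption. In particular, the response formats are not shown here, so both the fetch function and the function that pulls tweet IDs out of a BackType response are passed in by the caller rather than guessed at.

```python
from urllib.parse import urlencode

BACKTYPE_BASE = "http://api.backtype.com/tweets/search/links.xml"
TWITTER_STATUS = "http://api.twitter.com/1/statuses/show/{id}.xml"

def backtype_url(post_url, api_key):
    """Build the BackType call that lists tweets linking to post_url."""
    query = urlencode({"q": post_url, "mode": "batch", "key": api_key})
    return BACKTYPE_BASE + "?" + query

def tweets_for_posts(post_urls, api_key, fetch, extract_ids):
    """For each post URL, look up tweet IDs, then fetch each tweet's full data.

    fetch(url) and extract_ids(response) are caller-supplied (hypothetical
    helpers); this function only wires the two API calls together.
    """
    results = []
    for post_url in post_urls:
        listing = fetch(backtype_url(post_url, api_key))
        for tweet_id in extract_ids(listing):
            results.append(fetch(TWITTER_STATUS.format(id=tweet_id)))
    return results
```

Each post costs one BackType call plus one Twitter call per tweet, which is where rate limits start to matter.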
BackTweets API: Get Tweet ID




●   Take WebWorkerDaily Author Feed
●   Use WWD URLs to build URLs for BackType API call
●   Fetch data from BackType URLs to get Tweet ID
Twitter API: Get Data Based on Tweet ID




●   Use BackType tweet ID to build URL for Twitter API
●   Fetch data about Tweet & User from Twitter API
●   Re-Build title to show “user (followers): tweet”
Add Filters to BackType + Twitter Example
●   Show only tweets from people with 1000+ followers
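
The title rebuild and the follower filter together amount to a few lines. A minimal sketch, assuming each tweet has already been reduced to a dict with `user`, `followers`, and `text` keys (those names are assumptions, not the API's field names):

```python
def format_and_filter(tweets, min_followers=1000):
    """Keep tweets from accounts with enough followers; title them 'user (followers): tweet'."""
    kept = []
    for tweet in tweets:
        if tweet["followers"] < min_followers:
            continue
        title = f"{tweet['user']} ({tweet['followers']}): {tweet['text']}"
        kept.append({**tweet, "title": title})
    return kept

# Made-up sample data.
sample = [
    {"user": "alice", "followers": 1500, "text": "Nice post"},
    {"user": "bob", "followers": 20, "text": "Interesting"},
]
result = format_and_filter(sample)
```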

Hacking RSS: Filtering & Processing Obscene Amounts of Information (short version)

  • 1. Hacking RSS: Filtering & Processing Obscene Amounts of Information #hackingRSS Dawn Foster Intel Community Manager for MeeGo dawn@fastwonder.com
  • 2. Information Overload CD Photo: http://www.flickr.com/photos/chefranden/2751354004/
  • 3. Who Cares? ● Most of it is … – complete crap – out of date / obsolete – not interesting to you – irrelevant for you Junk Pile: http://www.flickr.com/photos/zen/4013525/
  • 4. You Want to Find the Needle Haystacks: http://www.flickr.com/photos/rasekh/4911673659/
  • 5. RSS Alone is a Start ● Sources you care about delivered right to you. But … – Do you care about everything in each feed? – What about the feeds you aren't subscribed to? – Can you keep up with what you have?
  • 6. Prioritize Your Reader ● Put things you care about at the top ● Categorize ● Don't try to read everything
  • 7. The Real Magic is in Filtering RSS Complete Crap Interesting Maybe Relevant Yay! ● In my Google Reader right now: – Analyst research blogs mentioning Online Community – Analyst research blogs mentioning MeeGo – Searches across social sites mentioning me, my projects, my websites etc. - filtering out things I don't care about – My favorite blogs filtered using PostRank to find only the ones with a lot of comments or social mentions
  • 8. RSS Filtering Tools ● Yahoo Pipes (my favorite) – More powerful & flexible: options to filter any data found in any field in the RSS feed (URL, title, description, author …) – Downside: takes some time to learn & can be a little flaky at times. Also a single point of failure if Yahoo ever killed it. ● Other Options – FeedRinse: easy to use, not as flexible. Import RSS feeds, add filters, get new RSS feeds out. – RSS readers with filtering / alerts (FeedDemon) – Code: write your own filters – Note: many free RSS filtering services have gone out of business – can be bandwidth intensive & costly to host.
  • 9. Yahoo Pipes Filtering Example ● Input: – WebWorkerDaily – ReadWriteWeb ● Filter by content: – Collaborate – Collaboration – Collaborative ● Output: – 1 RSS Feed – Matching 3 keywords 2 Minute Yahoo Pipe Video How-to's: http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/
  • 10. PostRank ● Best Posts in a feed ● Ranked on engagement (links, sharing, comments) ● Can get output as RSS feed ● Feed includes postrank number as a field
  • 11. What's In a Feed? PostRank (Yahoo Pipes View) ● Content in feeds varies wildly depending on site. ● Common: title, author, pubDate, link, content, description ● Site-specific: postrank, lat/long, image links, username, twitter source … (most RSS readers don't show these) ● API: usually has additional data & can output RSS ● If it's in the feed, you can use it!
  • 12. Reformatting / Modifying RSS Feeds Don't be satisfied with default RSS feed formats! Twitter Search Twitter RSS Feed Modify & more quickly scan key data
  • 13. Yahoo Pipes: Reformat Twitter Feed ● Input: – Twitter Search feed ● Loop String Build: – Author – : (spacing) – Title ● Loop Assign: – Store result back into title ● Output: – 1 RSS feed – Efficient format
  • 14. BackTweets (BackType API) ● Data about links on Twitter ● Finds links regardless of shortening service ● No RSS Feeds ● But … You can use API + Pipes to build one!
  • 15. BackType + Twitter API + Pipes Output ● Data from BackType + Twitter ● Built an RSS feed using Yahoo Pipes ● Included the information relevant for me ● Could have included or filtered on: name, listed count, location, profile image, user URL, ...
  • 16. Admit it, we ALL do vanity searches ● You can enter your search queries in Google, Twitter, Flickr … – Add a new project & have to update all of them – Can be hard to filter out some results – May have duplicates from multiple searches ● Yahoo Pipes – Update keywords in a CSV file – Use CSV file as input into a bunch of searches (RSS or API inputs) – Filter out what you don't want – Get 1 filtered RSS feed as output 2 minute video: http://fastwonderblog.com/2009/05/01/keyword-csv-files-and-searching-2-minute-yahoo-pipes-demo/
  • 17. How Should / Shouldn't You Use All of This? ● Do: – Use this for personal productivity – Play around, create prototypes and understand the possibilities ● Don't: – Don't violate licenses on content or republish w/o permission – Don't use in critical or production environments ● For production use or putting data on websites: – Re-write in a real programming language with cached results and error checking XKCD Comic: http://xkcd.com/327/
  • 18. Learn More About Dawn: ● Intel Community Manager for MeeGo ● Author of Companies and Communities ● More Info: http://fastwonderblog.com ● Dawn@FastWonder.com ● @geekygirldawn on Twitter Additional Reading & audio from 1 hour version of this talk: ● http://fastwonderblog.com/yahoo-pipes-and-rss-hacks/ Photo of Dawn: http://www.flickr.com/photos/ahockley/3036575066/
  • 20. Outsource / Crowdsource New Sources
  • 21. Yahoo Pipes: Reformat PostRank Feed ● Input: – 3 PostRank feeds ● Loop String Build: – PostRank – : (spacing) – Title ● Loop Assign: – Store result back into title ● Output: – 1 RSS feed – Efficient format
  • 22. Yahoo Pipes PostRank Example ● Input PostRank Feeds: – Engadget – CrunchGear – Boy Genius ● Filter by content – Tablet ● Sort: – PostRank ● Output – 1 RSS feed – Best tablet posts
  • 23. Using Web APIs 101 ● Many API calls are basically URLs ● Constructing URLs – Use API documentation/examples to format the URL – http://api.twitter.com/1/statuses/show/ID.xml ● Version 1 of API show status for ID in .format ● API keys – Tells API who you are (password) ● Rate limiting – Only get so much & you're cut off – Limited by IP or API key – Chill out for a while & come back XKCD Comic: http://xkcd.com/844/
  • 24. BackTweets API + Twitter API + Yahoo Pipes ● What we want to do: – Start with a set of URLs (blog posts in a feed) – Find any tweet mentioning those URLs – Return the tweet and data about the person who posted it ● Mission: Build feed using only data from these 2 APIs ● BackType API provides Tweet ID (not humanly useful) – http://api.backtype.com/tweets/search/links.xml?q=URL&mode=batch&key=KEY – List of Twitter Status IDs for Tweets linking to URL – Note: I think this feature may be deprecated ● Twitter API uses Tweet ID to get everything else – http://api.twitter.com/1/statuses/show/ID.xml – Returns a single status with all relevant data for ID
  • 25. BackTweets API: Get Tweet ID ● Take WebWorkerDaily Author Feed ● Use WWD URLs to build URLs for BackType API call ● Fetch data from BackType URLs to get Tweet ID
  • 26. Twitter API: Get Data Based on Tweet ID ● Use BackType tweet ID to build URL for Twitter API ● Fetch data about Tweet & User from Twitter API ● Re-Build title to show “user (followers): tweet”
  • 27. Add Filters to BackType + Twitter Example ● Show only tweets from people with 1000+ followers
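
Slide 17 suggests rewriting pipes in a real programming language for anything beyond prototyping. As a rough illustration only, the keyword filter from slide 9 and the "author: title" rebuild from slide 13 might look something like the Python sketch below. The sample feed, the function name filter_and_reformat and the choice to match against title plus description are all assumptions made for this example, not part of the original talk or of Yahoo Pipes. A real version would fetch the feed over HTTP and add the caching and error checking that slide 17 calls for.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Tools that help teams collaborate</title>
      <author>alice</author>
      <description>Remote work tips.</description>
    </item>
    <item>
      <title>New phone released</title>
      <author>bob</author>
      <description>Specs and pricing.</description>
    </item>
    <item>
      <title>Wiki roundup</title>
      <author>carol</author>
      <description>Collaborative editing compared.</description>
    </item>
  </channel>
</rss>"""

# The three keywords from the slide 9 example, lowercased for matching.
KEYWORDS = ("collaborate", "collaboration", "collaborative")


def filter_and_reformat(rss_text, keywords=KEYWORDS):
    """Keep items that mention any keyword and rebuild each title as 'author: title'."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        author = item.findtext("author", default="unknown")
        description = item.findtext("description", default="")
        # Assumption: match case-insensitively against title and description.
        haystack = (title + " " + description).lower()
        if any(word in haystack for word in keywords):
            results.append(f"{author}: {title}")
    return results


if __name__ == "__main__":
    for line in filter_and_reformat(SAMPLE_RSS):
        print(line)
```

The same loop could take a numeric test instead of a keyword test, for example keeping only entries whose follower count is 1000 or more as on slide 27, provided that field is present in the feed.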