Plagger: the duct tape of the Internet

at XML developers' day 2006


Transcript

  • 1. Plagger: the duct tape of the Web
    • Tatsuhiko Miyagawa
    • Six Apart, Ltd. / Shibuya Perl Mongers
    • XML Developers' Day #9
  • 2.
    • IRC
    • #plagger-ja
    • chat.freenode.net
    • (iso-2022-jp)
  • 3.
    • This feels like an away game, so
    • a self-introduction
  • 4.
    • Tatsuhiko Miyagawa
  • 5.  
  • 6.  
  • 7. http://www.vox.com/
  • 8.
    • What is Plagger?
  • 9.  
  • 10.  
  • 11.  
  • 12.  
  • 13.
    • Pluggable
    • RSS/Atom
    • Aggregator
  • 14.
    • Why Pluggable?
    • Just for feed aggregation?
  • 15.
    • History
  • 16.
    • 2002 Apr.
    • baseball2rss
    • http://search.cpan.org/dist/WWW-Baseball-NPB/
  • 17.
    • 2003 Oct.
    • rss2javascript
    • http://blog.bulknews.net/cookbook/blosxom/rss/rss2js.html
  • 18.
    • 2004 Sep.
    • bloglines2ipod
    • http://bulknews.net/lib/utils/bloglines2ipod/
  • 19.  
  • 20.
    • 2004 Oct.
    • rss2audiobook
    • http://bulknews.net/lib/utils/rss2audiobook/
  • 21.
    • 2005 Aug.
    • bloglines2gmail
    • http://svn.bulknews.net/repos/public/bloglines2email/trunk/
  • 22.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;

    my $url   = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $xml   = $agent->get($url)->content;

    my $rss = XML::RSS->new;
    $rss->parse($xml);

    for my $item (@{$rss->items}) {
        # do something with $item
    }
  • 24.
    • "What if the HTTP request fails?"
  • 25.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;

    my $url   = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $res   = $agent->get($url);

    if ($res->is_error) {
        die "Bah.";
    }

    my $xml = $res->content;
    my $rss = XML::RSS->new;
    $rss->parse($xml);

    for my $item (@{$rss->items}) {
        # do something with $item
    }
  • 26.
    • "I want to read Atom too!"
  • 27.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;
    use XML::Atom::Feed;

    my $url   = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $res   = $agent->get($url);

    if ($res->is_error) {
        die "Bah.";
    }

    my $xml = $res->content;
    if ($res->content_type =~ /atom/) {
        # pass a scalar reference so the string is parsed as XML, not used as a filename
        my $feed = XML::Atom::Feed->new(\$xml);
    } else {
        my $rss = XML::RSS->new;
        $rss->parse($xml);
    }
  • 28.
    • "I want to save bandwidth
    • with If-Modified-Since!"
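  • 29.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;

    my $url   = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;

    # cache_path_for() is not defined on the slide
    my $local = cache_path_for($url);

    # mirror() sends If-Modified-Since based on the mtime of $local
    # and writes the response body to that file
    my $res = $agent->mirror($url, $local);
    if ($res->is_error) {
        die "Bah.";
    }

    # read the feed back from the mirrored file
    open my $fh, '<', $local or die "Cannot open $local: $!";
    my $xml = do { local $/; <$fh> };
    …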
  • 30.
    • "I want to parse
    • broken feeds too!"
  • 31.
    • Etc., etc.
  • 32.
    • And there's more
  • 33.
    • rss2opml
    • http://aruntx.com/software/rss2opml/
  • 34.
    • rss2pdf
    • http://rss2pdf.com/
  • 35.
    • rss2atom
    • brian.wanamaker.com/mybicycle/2004/02/rss2atom.html
  • 36.
    • atom2rss
    • http://www.2rss.com/software.php?page=atom2rss
  • 37.
    • rss2ical
    • http://bura-bura.com/blog/archives/2004/06/22/rss2ical/
  • 38.
    • Bloglines2opml
    • http://mycvs.org/wp/wp-content/wp-transform.php
  • 39.
    • rss2gmail
    • http://www.cs.utexas.edu/~karu/gmailrss/
  • 40.
    • rss2imap
    • http://rss2imap.sourceforge.jp/
  • 41.
    • ebay2rss
    • http://www.2rss.com/software.php?page=ebay2rss
  • 42.
    • svn2rss
    • http://twiki.org/cgi-bin/view/Codev/Svn2rss
  • 43.
    • <any>2<any>
    Where either of <any> is RSS|Atom|OPML
  • 44.
    • This is ridiculous.
  • 45.
    • Different Languages,
    • Different Bugs.
    • No hackability
  • 46.  
  • 47. via http://www.atmarkit.co.jp/fnetwork/rensai/5minplagger/02.html
  • 48.
    • Subscription: Bloglines, Config, OPML, XOXO, File, DBI, FOAF …
    • CustomFeed: Mixi, Frepa, POP3, iCal, iTunes, Amazon, YouTube …
    • Filter: StripRSSAd, TruePermalink, EntryFullText, Pipe, Thumbnail, FindEnclosures, FetchEnclosure, SpamAssassin, RSSLiberalDateTime, URLBL, ResolveRelativeLink …
    • Publish: Gmail, Delicious, PDF, MT, Feed, Planet, Speech …
    • Notify: IRC, Eject, Growl, MSAgent, SSTP …
  • 52.
    • Just like UNIX pipe
    Subscribe OPML | StripRSSAd | ResolveRelativeLink | Publish Feed --type=Atom
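The pipe analogy on slide 52 can be sketched in a few lines. This is a minimal illustration in Python, not Plagger's Perl implementation: each stage is a plain function over a list of entries, and `strip_ads`, `resolve_relative_links`, and `pipeline` are hypothetical names whose rules are toy stand-ins for the real plugins.

```python
from functools import reduce
from urllib.parse import urljoin


def strip_ads(entries):
    """Toy stand-in for an ad filter: drop entries whose title starts with 'AD:'."""
    return [e for e in entries if not e["title"].startswith("AD:")]


def resolve_relative_links(base_url):
    """Build a stage that rewrites relative entry links against base_url."""
    def stage(entries):
        return [{**e, "link": urljoin(base_url, e["link"])} for e in entries]
    return stage


def pipeline(*stages):
    """Compose stages left to right, like `a | b | c` in a shell."""
    def run(entries):
        return reduce(lambda acc, stage: stage(acc), stages, entries)
    return run


run = pipeline(strip_ads, resolve_relative_links("http://example.com/blog/"))
entries = [
    {"title": "AD: buy now", "link": "/ad"},
    {"title": "Hello", "link": "post/1"},
]
result = run(entries)
```

Because every stage has the same shape (entries in, entries out), stages can be reordered or dropped without touching the others, which is the property the pipe comparison relies on.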
  • 53. I believe RSS has the potential to be the “UNIX pipe of the internet” … Ray Ozzie CTO of Microsoft http://rayozzie.spaces.live.com/blog/cns!FB3017FBB9B2E142!285.entry
  • 54. "the Unix shell for Web 2.0"
  • 55.
    • Number of combinations
    • {CustomFeed,Subscription}/*.pm: 35
    • {Publish,Notify}/*.pm: 37
    • 35 * 37 = 1295
  • 56.
    • Plagger
    • Core features
  • 57.
    • RSS/Atom
    • Auto-Discovery
  • 58.
    • Supported feed formats
    • RSS 0.91 to Atom 1.0
    • (XML::Feed + hacks)
  • 59.
    • Parsing support for
    • broken feeds
    • (XML::Liberal)
  • 60.
    • HTTP optimizations
    • If-Modified-Since / gzip
    • (URI::Fetch)
  • 61.
    • Podcast / Videocast
    • Support
    • (RSS 2.0 & Atom 1.0)
  • 62.
    • Photocast
    • Media RSS
    • iTunes RSS*
  • 63.
    • Asynchronous downloads
    • cURL, wget, HTTP::Parallel & HTTP::Async*
  • 64.
    • Full internationalization
    • Unicode & Timezone
  • 65.
    • Access to
    • browser Cookies
    • IE, Safari, Firefox and w3m
  • 66.
    • Screen-scraping
    • Via CustomFeed::*
  • 67.
    • Stackable Plugins
  • 68.
    • Rule-based
    • Dispatch of Plugins
  • 69.
    • Installing
    • Plagger
  • 70.  
  • 71.  
  • 72.
    • Skipped; see
    http://plagger.org/trac/wiki/PlaggerQuickStart
  • 73.
    • Plagger
    • quick tutorial
  • 74.
    • "I want to read RSS feeds
    • in my email client!"
    • (Gmail, Thunderbird)
  • 75. rss2email.yaml
    plugins:
      - module: Subscription::Config
        config:
          feed:
            - http://bulknews.vox.com/library/posts/atom.xml
            - http://bulknews.typepad.com/blog/
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 76. RSS in Gmail
  • 77. HTML + Images
  • 78. Feed Image (Logo / Buddy Icon)
  • 79. Search
  • 80. Auto grouping (“Conversations”)
  • 81. Diff
  • 82.
    • "I want to read feeds
    • offline (on a plane) too!"
  • 83. rss2email.yaml
    plugins:
      - module: Subscription::Config
        config:
          feed:
            - http://bulknews.vox.com/library/posts/atom.xml
            - http://bulknews.typepad.com/blog/
      - module: Filter::FindEnclosures
      - module: Filter::FetchEnclosure
        config:
          dir: /tmp
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
          attach_enclosures: 1
  • 84. Offline Mode POP3 + Thunderbird
  • 85.
    • "Managing the feed list
    • in YAML is a hassle"
  • 86. opml2email.yaml
    plugins:
      - module: Subscription::OPML
        config:
          url: http://example.com/subscription.opml
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com

    # subscription.opml
    <?xml version="1.0"?>
    <opml>
      <outline title="Subscriptions">
        <outline title="miyagawa" type="rss"
                 xmlUrl="http://bulknews.typepad.com/blog/atom.xml" />
        <outline title="miyagawa on Vox" type="rss"
                 htmlUrl="http://bulknews.vox.com/" />
      </outline>
    </opml>
  • 87.
    • "Editing OPML by hand is incredibly tedious"
  • 88. filesub2email.yaml
    plugins:
      - module: Subscription::File
        config:
          url: file:///path/to/subscription.txt
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com

    > cat subscription.txt
    http://bulknews.typepad.com/blog/atom.xml
    http://bulknews.vox.com/
  • 89.
    • "With more than 1,000 feeds, fetching them all on my own machine wastes bandwidth. I want Bloglines to do the crawling."
  • 90. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 91.
    • "Do that with Livedoor Reader"
  • 92. bloglines2email.yaml
    plugins:
      - module: Subscription::LivedoorReader
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 93.
    • "I want to bookmark to del.icio.us
    • from Gmail / Thunderbird!"
  • 94. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: delicious
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 95.  
  • 96.
    • "Do that with Hatena Bookmark"
  • 97. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: hatena_bookmark_users
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 98.  
  • 99.
    • "RSS ads are annoying"
  • 100. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: delicious
      - module: Filter::StripRSSAd
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 101.
    • Quick tour
    • For more plugins
  • 102. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 104.
    • Subscription
    • load subscriptions
    • (list the feeds/URLs to aggregate)
  • 105.
    • Subscription::Config
    - module: Subscription::Config
      config:
        feed:
          - http://www.yapcchicago.org/feed/
          - http://tokyo.yapcasia.org/blog/
  • 106.
    • Subscription::OPML
    - module: Subscription::OPML
      config:
        url: http://www.example.com/subs.opml

    # subs.opml
    <opml>
      <outline xmlUrl="http://www.yapcchicago.org/feed/" />
      <outline htmlUrl="http://tokyo.yapcasia.org/blog/" />
    </opml>
  • 107.
    • Subscription::File
    - module: Subscription::File
      config:
        url: file:///path/to/subscription.txt

    % cat subscription.txt
    http://www.yapcchicago.org/feed/
    http://tokyo.yapcasia.org/blog/
  • 108.
    • Subscription::XOXO
    - module: Subscription::XOXO
      config:
        url: http://www.example.com/subscription.html

    # subscription.html
    <ul class="xoxo">
      <li><a href="http://www.yapcchicago.org/feed/">YAPC::NA</a></li>
      <li><a href="http://tokyo.yapcasia.org/blog/">YAPC::Asia</a></li>
    </ul>
  • 109.
    • Subscription::Bookmarks
    • Read bookmarks file of IE, Firefox and Safari
  • 110. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 111.
    • Filter
    • Normalize / Repair feed metadata
    • Upgrade feed content
    • Filter feed content using text filters
    • Invoke some action on entries
  • 112.
    • Filter::EntryFullText
    • Upgrades feeds that carry no body text
    • Fetches each entry's HTML page and extracts the content with a regular expression / XPath
  • 113.
    • Filter::TruePermalink
    • Canonicalizes redirect URLs and the like
    • (e.g. http://…/go.php?url=….)
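The idea behind canonicalizing a redirect link such as `go.php?url=…` can be sketched as follows. This is a simplified Python illustration, not how Filter::TruePermalink is implemented: the function name `true_permalink` and the assumption that the target always sits in a query parameter named `url` are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def true_permalink(link, param="url"):
    """Return the redirect target embedded in the query string, if any.

    Falls back to the original link when the parameter is missing or
    does not look like an absolute HTTP(S) URL.
    """
    query = parse_qs(urlparse(link).query)
    targets = query.get(param)
    if targets and targets[0].startswith(("http://", "https://")):
        return targets[0]
    return link


encoded = true_permalink(
    "http://example.com/go.php?url=http%3A%2F%2Fexample.org%2Fpost%2F1"
)
untouched = true_permalink("http://example.org/post/2")
```

Redirectors that hide the target behind an HTTP 3xx response instead of a query parameter would need an actual request to resolve; that case is outside this sketch.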
  • 114.
    • Filter::FindEnclosures
    • Extracts enclosures from entry content
    • <a href="http://…./foo.mp3">episode #1</a>
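Finding enclosures in entry content amounts to scanning the HTML for links that point at media files. The sketch below shows one way to do that with Python's standard HTML parser; it is an illustration only, and `EnclosureFinder`, `find_enclosures`, and the extension list are assumptions rather than the behavior of Filter::FindEnclosures.

```python
from html.parser import HTMLParser

# Assumed list of media extensions; the real plugin may use other rules.
MEDIA_EXTENSIONS = (".mp3", ".m4a", ".mp4", ".mov")


class EnclosureFinder(HTMLParser):
    """Collect href values of <a> tags that look like media files."""

    def __init__(self):
        super().__init__()
        self.enclosures = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any query string when checking the extension.
        path = href.split("?", 1)[0].lower()
        if path.endswith(MEDIA_EXTENSIONS):
            self.enclosures.append(href)


def find_enclosures(html):
    finder = EnclosureFinder()
    finder.feed(html)
    return finder.enclosures


found = find_enclosures(
    '<p><a href="http://example.com/foo.mp3">episode #1</a> '
    '<a href="/about">about</a></p>'
)
```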
  • 115. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 116.
    • Publish
    • Publish aggregated entries to online services
    • reBlogging
    • Convert feeds to other formats
  • 117.
    • Publish::Feed
    • Republish feed in RSS/Atom
    • Good to use with scrapers
  • 118.
    • Publish::MT
    • Reblog entries using MT XML-RPC
  • 119.
    • Publish::MTWidget
  • 120.
    • Publish::Email
    • text/plain, multipart/alternative
    • Pluggable email protocols
    • (SMTP, SMTP Auth, IMAP, Maildir …)
  • 121.
    • Publish::iCal
    • Publish iCal feeds out of RSS/Atom
  • 122.
    • Publish::Excel
    For those spare moments at work!
  • 123. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 124.
    • Search
    • Index aggregated entries on search engines
  • 125.
    • Search::Spotlight
  • 126.
    • Search::Estraier
    • Uses HyperEstraier XMLRPC node API
  • 127.
    • Search::Lucene*
    • Use Lucene WebService API
    • (OpenSearch 1.1 and Atom)
  • 128. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 129.
    • Notify
    • Notify feed updates in various ways
  • 130.
    • Notify::Growl
  • 131.
    • Notify::Balloon
  • 132.
    • Notify::MSAgent
  • 133. Notify::Eject Supports: Windows, Linux, FreeBSD and Mac OSX!
  • 134.
    • Notify::Pizza
  • 135.
    • Notify::Pizza
    Now it does Sushi too!
  • 136.
    • Misconceptions
    • about Plagger
  • 137.  
  • 138. http://d.hatena.ne.jp/sugarcut/20061117/p1
    • "That's Pla-brain"
    • "You can do that with Plagger"
    • "You can explain that with plasma"
  • 139.  
  • 140.
    • It can do anything
    • … well, not quite
  • 141.  
  • 142.
    • Plagger is a
    • "Feed" aggregator
  • 143.
    • Plagger::Subscription (holds several Plagger::Feed objects)
    • Plagger::Feed: author, id, link, title, tags, url, entries
    • Plagger::Entry: author, id, link, permalink, title, tags, enclosures
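The object model on slide 143 can be mirrored with a few data classes. This Python sketch only reuses the field names shown on the slide; the types, defaults, and the `all_entries` helper are assumptions added for illustration and do not describe the Perl classes' actual API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    title: str
    link: str
    permalink: Optional[str] = None
    id: Optional[str] = None
    author: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    enclosures: List[str] = field(default_factory=list)


@dataclass
class Feed:
    url: str
    title: str = ""
    link: Optional[str] = None
    id: Optional[str] = None
    author: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    entries: List[Entry] = field(default_factory=list)


@dataclass
class Subscription:
    feeds: List[Feed] = field(default_factory=list)

    def all_entries(self) -> List[Entry]:
        """Flatten the entries of every subscribed feed (hypothetical helper)."""
        return [entry for feed in self.feeds for entry in feed.entries]


subscription = Subscription(feeds=[
    Feed(
        url="http://example.com/atom.xml",
        entries=[Entry(title="Hello", link="http://example.com/1",
                       enclosures=["http://example.com/foo.mp3"])],
    ),
    Feed(url="http://example.org/rss.xml"),
])
titles = [entry.title for entry in subscription.all_entries()]
```

Whatever the source format was, plugins later in the chain only see this one shape, which is what lets a filter or publisher stay ignorant of RSS versus Atom.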
  • 144. The point of Pizza Pla
  • 145.  
  • 146.  
  • 147.  
  • 148. "the Unix shell for Web 2.0"
  • 149. "RSS is the Standard IO for the Web"
  • 151. Feed formats
    • RSS 0.91, RSS 2.0, RSS 1.0 / RDF, Atom 1.0, JSON, iCal, OPML, XOXO, XBEL, Sitemaps, attention.xml, Amazon API, Google API, OpenSearch, AtomPP, GData
  • 152. Feed Vocabulary / Extensions
    • rvw:, Enclosures, Photocast, iTunes RSS, Media RSS, Dublin Core, FOAF, microformats
  • 153.  
  • 154. Plagger = The duct tape of the web.
  • 155.
    • Example:
    • Location metadata
  • 156. N 37.7782 W 122.3973
  • 157. GeoRSS
    <georss:point>37.7782 -122.3973</georss:point>
    <georss:where>
      <gml:Point>
        <gml:pos>37.7782 -122.3973</gml:pos>
      </gml:Point>
    </georss:where>
    xmlns:georss="http://www.georss.org/georss"
    xmlns:gml="http://www.opengis.net/gml"
  • 158. RDF geo vocabulary
    <foaf:based_near>
      <geo:Point>
        <geo:lat>35.678</geo:lat>
        <geo:long>139.770</geo:long>
      </geo:Point>
    </foaf:based_near>
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  • 159. Flickr geo tag
    <media:category scheme="urn:flickr:tags">
      geo:lat=37.7782 geo:lon=-122.3973
    </media:category>
    xmlns:media="http://search.yahoo.com/mrss/"
  • 160. GeoURL
    <meta name="ICBM" content="37.7782, -122.3973" />
  • 161. geo microformats
    <div class="geo">
      <span class="latitude">37.7782</span>
      <span class="longitude">-122.3973</span>
    </div>
  • 162. Links to Google Maps
    <a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">
      Link to Google Maps</a>
  • 163. Eznavi mail
  • 164. Photo EXIF
  • 165. adr
    <div class="adr">
      <div class="street-address">548 4th St.</div>
      <span class="locality">San Francisco</span>,
      <span class="region">CA</span>
      <span class="postal-code">94107</span>
      <div class="country-name">U.S.A.</div>
    </div>
  • 166.
    • We don't care
    • about format diffs.
  • 167. GeoRSS
    <georss:point>37.7782 -122.3973</georss:point>
    <georss:where>
      <gml:Point>
        <gml:pos>37.7782 -122.3973</gml:pos>
      </gml:Point>
    </georss:where>
    • Namespace::GeoRSS
  • 168. RDF geo vocabulary
    <foaf:based_near>
      <geo:Point>
        <geo:lat>35.678</geo:lat>
        <geo:long>139.770</geo:long>
      </geo:Point>
    </foaf:based_near>
    • Namespace::Geo
  • 169. Flickr geo tag
    <media:category scheme="urn:flickr:tags">
      geo:lat=37.7782 geo:lon=-122.3973
    </media:category>
    • Filter::geotagged
  • 170. GeoURL
    <meta name="ICBM" content="37.7782, -122.3973" />
    • Filter::GeoURL
  • 171. geo microformats
    <div class="geo">
      <span class="latitude">37.7782</span>
      <span class="longitude">-122.3973</span>
    </div>
    • Filter::Microformats::geo
  • 172. Links to Google Maps
    <a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">
      Link to Google Maps</a>
    • Filter::ExtractMapsLinks
  • 173. Eznavi mail
    • Filter::ExtractMapsLinks
  • 174. Photo EXIF
    • Filter::FetchEnclosure + Filter::ExtractEXIF
  • 175. adr
    <div class="adr">
      <div class="street-address">548 4th St.</div>
      <span class="locality">San Francisco</span>,
      <span class="region">CA</span>
      <span class="postal-code">94107</span>
      <div class="country-name">U.S.A.</div>
    </div>
    • Filter::Microformats::adr + Filter::Geocoding::US
  • 176.
    • Publish::KML
    • Publish::GoogleMaps
    • Publish::Feed
    • (with geotags)
  • 177.
    • Everything's done
    • in plugins
    • = Clean & extensible.
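The location example boils down to mapping many notations onto one pair of coordinates. The Python sketch below handles four of the formats from the slides with regular expressions. It is a deliberately rough illustration: the name `extract_point` is hypothetical, real plugins would parse XML and HTML properly, and the RDF, microformat, EXIF, and street-address cases are left out.

```python
import re
from typing import Optional, Tuple

_NUM = r"(-?\d+(?:\.\d+)?)"

# One pattern per notation; each captures latitude, then longitude.
_PATTERNS = [
    # <georss:point>37.7782 -122.3973</georss:point>
    re.compile(r"<georss:point>\s*" + _NUM + r"\s+" + _NUM + r"\s*</georss:point>"),
    # <meta name="ICBM" content="37.7782, -122.3973" />
    re.compile(r'<meta\s+name="ICBM"\s+content="\s*' + _NUM + r"\s*,\s*" + _NUM),
    # http://maps.google.com/maps?q=37.7782,-122.3973&z=16
    re.compile(r"maps\.google\.com/maps\?q=" + _NUM + "," + _NUM),
    # Flickr tags: geo:lat=37.7782 geo:lon=-122.3973
    re.compile(r"geo:lat=" + _NUM + r"\s+geo:lon=" + _NUM),
]


def extract_point(text: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) from the first notation that matches."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return float(match.group(1)), float(match.group(2))
    return None


samples = [
    "<georss:point>37.7782 -122.3973</georss:point>",
    '<meta name="ICBM" content="37.7782, -122.3973" />',
    '<a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">map</a>',
    "geo:lat=37.7782 geo:lon=-122.3973",
]
points = [extract_point(sample) for sample in samples]
```

Once every notation is reduced to the same tuple, a single publisher can emit KML, a map, or a geotagged feed without knowing where the coordinates came from.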
  • 178.
    • Plagger
    • dev. Status
  • 179.
    • Version
    • 0.7.13
  • 180.
    • Coming Soon …
  • 181.
    • iTunes RSS support
  • 182.
    • Geo extensions
  • 183.
    • Enclosure processors
    • ffmpeg, Sync::PSP, Sync::iPodVideo
  • 184.
    • Pluggable summarizer
    • & text formatter
    • Lingua::EN::Summarize, Text::Original, HTML::WikiConverter, HTML::FormatText
  • 185.
    • Rich Media metadata
    • ID3 tag in enclosures
    • Links to imdb.com / amazon.com
    • hReview microformats
  • 186.
    • Calendar Support
    • iCal parser & emitter
    • hCalendar microformats
    • .ics attached in emails
    • Sync::SyncML
  • 187.
    • Email refactoring
    • text/plain, iso-2022-jp support
    • Pluggable storage engines
  • 188.
    • http://plagger.org/
    • Planet, Mailing List, IRC
    • Bug Tracking, SVN repository
  • 189.
    • #plagger-ja on freenode
  • 190.
    • Join Us!
  • 191.
    • Thank you
    • Questions?