Plagger the duct tape of internet

at XML developers' day 2006

Plagger the duct tape of internet Presentation Transcript

  • 1. Plagger: the duct tape of the Web
    • Tatsuhiko Miyagawa [email_address]
    • Six Apart, Ltd. / Shibuya Perl Mongers
    • XML Developers' Day #9
  • 2.
    • IRC
    • #plagger-ja
    • chat.freenode.net
    • (iso-2022-jp)
  • 3.
    • Since this feels like an away game for me,
    • a quick self-introduction
  • 4.
    • Tatsuhiko Miyagawa
  • 5.  
  • 6.  
  • 7. http://www.vox.com/
  • 8.
    • What is Plagger?
  • 9.  
  • 10.  
  • 11.  
  • 12.  
  • 13.
    • Pluggable
    • RSS/Atom
    • Aggregator
  • 14.
    • Why Pluggable?
    • Just for a feed aggregation?
  • 15.
    • History
  • 16.
    • 2002 Apr.
    • baseball2rss
    • http://search.cpan.org/dist/WWW-Baseball-NPB/
  • 17.
    • 2003 Oct.
    • rss2javascript
    • http://blog.bulknews.net/cookbook/blosxom/rss/rss2js.html
  • 18.
    • 2004 Sep.
    • bloglines2ipod
    • http://bulknews.net/lib/utils/bloglines2ipod/
  • 19.  
  • 20.
    • 2004 Oct.
    • rss2audiobook
    • http://bulknews.net/lib/utils/rss2audiobook/
  • 21.
    • 2005 Aug.
    • bloglines2gmail
    • http://svn.bulknews.net/repos/public/bloglines2email/trunk/
  • 22.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;
    my $url = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $xml = $agent->get($url)->content;
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    for my $item (@{$rss->items}) {
        # do something with $item
    }
  • 24.
    • "What if the HTTP request
    • fails?"
  • 25.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;
    my $url = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $res = $agent->get($url);
    if ($res->is_error) { die "Bah." }
    my $xml = $res->content;
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    for my $item (@{$rss->items}) {
        # do something with $item
    }
  • 26.
    • "I want to read Atom too!"
  • 27.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;
    use XML::Atom::Feed;
    my $url = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $res = $agent->get($url);
    if ($res->is_error) { die "Bah." }
    my $xml = $res->content;
    if ($res->content_type =~ /atom/) {
        # pass a scalar ref: a plain string is treated as a filename
        my $feed = XML::Atom::Feed->new(\$xml);
    } else {
        my $rss = XML::RSS->new;
        $rss->parse($xml);
    }
  • 28.
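    • "I want to save bandwidth
    • with If-Modified-Since!"
  • 29.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use XML::RSS;
    my $url = "http://example.com/rss.xml";
    my $agent = LWP::UserAgent->new;
    my $local = cache_path_for($url);   # helper not shown on the slide
    my $res = $agent->mirror($url, $local);   # sends If-Modified-Since
    if ($res->is_error) { die "Bah." }
    # mirror() saves the body to $local instead of returning it
    open my $fh, '<', $local or die "Bah.";
    my $xml = do { local $/; <$fh> };
    …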
  • 30.
    • "I want to parse
    • broken feeds too!"
  • 31.
    • Etc., etc.
  • 32.
    • And there's more
  • 33.
    • rss2opml
    • http://aruntx.com/software/rss2opml/
  • 34.
    • rss2pdf
    • http://rss2pdf.com/
  • 35.
    • rss2atom
    • brian.wanamaker.com/mybicycle/2004/02/rss2atom.html
  • 36.
    • atom2rss
    • http://www.2rss.com/software.php?page=atom2rss
  • 37.
    • rss2ical
    • http://bura-bura.com/blog/archives/2004/06/22/rss2ical/
  • 38.
    • Bloglines2opml
    • http://mycvs.org/wp/wp-content/wp-transform.php
  • 39.
    • rss2gmail
    • http://www.cs.utexas.edu/~karu/gmailrss/
  • 40.
    • rss2imap
    • http://rss2imap.sourceforge.jp/
  • 41.
    • ebay2rss
    • http://www.2rss.com/software.php?page=ebay2rss
  • 42.
    • svn2rss
    • http://twiki.org/cgi-bin/view/Codev/Svn2rss
  • 43.
    • <any>2<any>
    Where either of <any> is RSS|Atom|OPML
  • 44.
    • This is ridiculous.
  • 45.
    • Different Languages,
    • Different Bugs.
    • No hackability
  • 46.  
  • 47. via http://www.atmarkit.co.jp/fnetwork/rensai/5minplagger/02.html
  • 48.
    • Subscription: Bloglines, Config, OPML, XOXO, File, DBI, FOAF …
    • CustomFeed: Mixi, Frepa, POP3, iCal, iTunes, Amazon, YouTube …
    • Filter: StripRSSAd, TruePermalink, EntryFullText, Pipe, Thumbnail, FindEnclosures, FetchEnclosure, SpamAssassin, RSSLiberalDateTime, URLBL, ResolveRelativeLink …
    • Publish: Gmail, Delicious, PDF, MT, Feed, Planet, Speech …
    • Notify: IRC, Eject, Growl, MSAgent, SSTP …
  • 52.
    • Just like UNIX pipe
    Subscribe OPML | StripRSSAd | ResolveRelativeLink | Publish Feed --type=Atom
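    The pipe analogy above can be sketched in plain Perl. This is only an illustration under stated assumptions: the stage names `strip_ads` and `resolve_relative_links`, the `AD:` title convention, and `run_pipeline` are hypothetical stand-ins and are not Plagger's actual API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for plugins: each stage takes a list of entries
# (hash refs) and returns a new list. None of this is Plagger's real API.
sub strip_ads {
    my @entries = @_;
    return grep { $_->{title} !~ /^AD:/ } @entries;
}

sub resolve_relative_links {
    my ($base, @entries) = @_;
    return map {
        my %e = %$_;                                   # copy, do not mutate input
        $e{link} = $base . $e{link} if $e{link} =~ m{^/};
        \%e;
    } @entries;
}

# Run the entries through each stage in order, like a shell pipeline.
sub run_pipeline {
    my ($entries, @stages) = @_;
    my @current = @$entries;
    @current = $_->(@current) for @stages;
    return @current;
}

my @entries = (
    { title => 'Hello',       link => '/blog/1' },
    { title => 'AD: Buy now', link => 'http://ads.example.com/' },
    { title => 'World',       link => 'http://example.com/blog/2' },
);

my @out = run_pipeline(
    \@entries,
    \&strip_ads,
    sub { resolve_relative_links('http://example.com', @_) },
);

print "$_->{title} <$_->{link}>\n" for @out;
```

    Because every stage has the same shape (entries in, entries out), stages can be reordered or dropped without touching the others, which is the property the pipe comparison is pointing at.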
  • 53. I believe RSS has the potential to be the “UNIX pipe of the internet” … Ray Ozzie CTO of Microsoft http://rayozzie.spaces.live.com/blog/cns!FB3017FBB9B2E142!285.entry
  • 54. "the Unix shell for Web 2.0"
  • 55.
    • Number of combinations
    • {CustomFeed,Subscription}/*.pm: 35
    • {Publish,Notify}/*.pm: 37
    • 35 * 37 = 1295
  • 56.
    • Plagger
    • Core features
  • 57.
    • RSS/Atom
    • Auto-Discovery
  • 58.
    • Support Feed formats
    • RSS 0.91 to Atom 1.0
    • (XML::Feed + hacks)
  • 59.
    • Support parsing
    • Broken feeds
    • (XML::Liberal)
  • 60.
    • HTTP optimizations
    • If-Modified-Since / gzip
    • (URI::Fetch)
  • 61.
    • Podcast / Videocast
    • Support
    • (RSS 2.0 & Atom 1.0)
  • 62.
    • Photocast
    • Media RSS
    • iTunes RSS*
  • 63.
    • Asynchronous downloads
    • cURL, wget, HTTP::Parallel & HTTP::Async*
  • 64.
    • Full internationalization
    • Unicode & Timezone
  • 65.
    • Access to
    • browser Cookies
    • IE, Safari, Firefox and w3m
  • 66.
    • Screen-scraping
    • Via CustomFeed::*
  • 67.
    • Stackable Plugins
  • 68.
    • Rule-based
    • Dispatch of Plugins
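    One way to picture rule-based dispatch is a plugin list where each plugin may carry a predicate that decides whether it runs for a given entry. The sketch below is an assumption for illustration only: the plugin names, the `rule`/`run` keys, and `dispatch` are hypothetical and do not reflect how Plagger itself expresses rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each hypothetical plugin carries an optional rule (a predicate on the entry).
my @plugins = (
    {
        name => 'tag_podcast',
        rule => sub { $_[0]{link} =~ /\.mp3$/ },
        run  => sub { push @{ $_[0]{tags} }, 'podcast' },
    },
    {
        name => 'tag_seen',                  # no rule: always dispatched
        run  => sub { push @{ $_[0]{tags} }, 'seen' },
    },
);

# Run every plugin whose rule is absent or returns true for this entry.
sub dispatch {
    my ($entry) = @_;
    for my $plugin (@plugins) {
        my $rule = $plugin->{rule};
        next if $rule && !$rule->($entry);
        $plugin->{run}->($entry);
    }
    return $entry;
}

my $audio = dispatch({ link => 'http://example.com/ep1.mp3', tags => [] });
my $post  = dispatch({ link => 'http://example.com/post/1',  tags => [] });

print join(',', @{ $audio->{tags} }), "\n";
print join(',', @{ $post->{tags} }), "\n";
```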
  • 69.
    • Installing
    • Plagger
  • 70.  
  • 71.  
  • 72.
    • Skipped
    http://plagger.org/trac/wiki/PlaggerQuickStart
  • 73.
    • Plagger
    • quick tutorial
  • 74.
    • I want to read RSS feeds
    • in my email client!
    • (Gmail, Thunderbird)
  • 75. rss2email.yaml
    plugins:
      - module: Subscription::Config
        config:
          feed:
            - http://bulknews.vox.com/library/posts/atom.xml
            - http://bulknews.typepad.com/blog/
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 76. RSS in Gmail
  • 77. HTML + Images
  • 78. Feed Image (Logo / Buddy Icon)
  • 79. Search
  • 80. Auto grouping (“Conversations”)
  • 81. Diff
  • 82.
    • I want to read feeds
    • offline too (on a plane)!
  • 83. rss2email.yaml
    plugins:
      - module: Subscription::Config
        config:
          feed:
            - http://bulknews.vox.com/library/posts/atom.xml
            - http://bulknews.typepad.com/blog/
      - module: Filter::FindEnclosures
      - module: Filter::FetchEnclosure
        config:
          dir: /tmp
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
          attach_enclosures: 1
  • 84. Offline Mode POP3 + Thunderbird
  • 85.
    • "Managing the feed list
    • in YAML is a pain"
  • 86. opml2email.yaml
    plugins:
      - module: Subscription::OPML
        config:
          url: http://example.com/subscription.opml
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
    # subscription.opml
    <?xml version="1.0"?>
    <opml>
      <outline title="Subscriptions">
        <outline title="miyagawa" type="rss" xmlUrl="http://bulknews.typepad.com/blog/atom.xml" />
        <outline title="miyagawa on Vox" type="rss" htmlUrl="http://bulknews.vox.com/" />
      </outline>
    </opml>
  • 87.
    • "Editing OPML by hand is a massive drag"
  • 88. filesub2email.yaml
    plugins:
      - module: Subscription::File
        config:
          url: file:///path/to/subscription.txt
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
    > cat subscription.txt
    http://bulknews.typepad.com/blog/atom.xml
    http://bulknews.vox.com/
    >
  • 89.
    • "With 1,000+ feeds, crawling them from my own machine wastes bandwidth. I want Bloglines to do the crawling."
  • 90. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 91.
    • "Do that with Livedoor Reader"
  • 92. bloglines2email.yaml
    plugins:
      - module: Subscription::LivedoorReader
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 93.
    • "I want to bookmark to del.icio.us
    • from Gmail / Thunderbird!"
  • 94. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: delicious
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 95.  
  • 96.
    • "Do that with Hatena Bookmark"
  • 97. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: hatena_bookmark_users
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 98.  
  • 99.
    • "RSS ads are annoying"
  • 100. bloglines2email.yaml
    plugins:
      - module: Subscription::Bloglines
        config:
          username: YOU@example.com
          password: blahblahblah
      - module: Widget::Simple
        config:
          widget: delicious
      - module: Filter::StripRSSAd
      - module: Publish::Gmail
        config:
          mailto: miyagawa@gmail.com
  • 101.
    • Quick tour
    • For more plugins
  • 102. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 104.
    • Subscription
    • load subscriptions
    • (list the feeds/URLs to aggregate)
  • 105.
    • Subscription::Config
    - module: Subscription::Config
      config:
        feed:
          - http://www.yapcchicago.org/feed/
          - http://tokyo.yapcasia.org/blog/
  • 106.
    • Subscription::OPML
    - module: Subscription::OPML
      config:
        url: http://www.example.com/subs.opml
    # subs.opml
    <opml>
      <outline xmlUrl="http://www.yapcchicago.org/feed/" />
      <outline htmlUrl="http://tokyo.yapcasia.org/blog/" />
    </opml>
  • 107.
    • Subscription::File
    - module: Subscription::File
      config:
        url: file:///path/to/subscription.txt
    % cat subscription.txt
    http://www.yapcchicago.org/feed/
    http://tokyo.yapcasia.org/blog/
    %
  • 108.
    • Subscription::XOXO
    - module: Subscription::XOXO
      config:
        url: http://www.example.com/subscription.html
    # subscription.html
    <ul class="xoxo">
      <li><a href="http://www.yapcchicago.org/feed/">YAPC::NA</a></li>
      <li><a href="http://tokyo.yapcasia.org/blog/">YAPC::Asia</a></li>
    </ul>
  • 109.
    • Subscription::Bookmarks
    • Read bookmarks file of IE, Firefox and Safari
  • 110. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 111.
    • Filter
    • Normalize / Repair feed metadata
    • Upgrade feed content
    • Filter feed content using text filters
    • Invoke some action on entries
  • 112.
    • Filter::EntryFullText
    • Upgrade feeds that lack full body text
    • Fetch each entry's HTML page and extract with regex / XPath
  • 113.
    • Filter::TruePermalink
    • Canonicalize redirect URLs and the like
    • (e.g. http://…/go.php?url=….)
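    As a rough illustration of the redirect case on this slide, the sketch below pulls the real target out of a `url` query parameter using core Perl only. It assumes the parameter is named `url`, as in the slide's `go.php?url=…` example; the function names and example hosts are hypothetical, and this is not the plugin's actual implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-decode a query-string value (core Perl only).
sub percent_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# If the link carries its real target in a "url" query parameter,
# return that target; otherwise return the link unchanged.
sub true_permalink {
    my ($link) = @_;
    if ($link =~ /[?&]url=([^&#]+)/) {
        return percent_decode($1);
    }
    return $link;
}

print true_permalink(
    'http://redirect.example.com/go.php?url=http%3A%2F%2Fexample.com%2Fpost%2F1'
), "\n";
print true_permalink('http://example.com/post/2'), "\n";
```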
  • 114.
    • Filter::FindEnclosures
    • Extract enclosures from content
    • <a href="http://…./foo.mp3">episode #1</a>
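    The idea of finding enclosures inside entry content can be sketched with a regular expression over the HTML. This is a simplified assumption for illustration: `find_enclosures`, the list of file extensions, and the sample markup are hypothetical, and a real implementation would more likely use an HTML parser than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect hrefs of <a> tags that point at common media files.
sub find_enclosures {
    my ($html) = @_;
    my @urls;
    while ($html =~ /<a\s[^>]*?href\s*=\s*["']([^"']+)["']/gi) {
        my $url = $1;
        push @urls, $url if $url =~ /\.(?:mp3|m4a|mp4|mov|wmv)(?:\?|$)/i;
    }
    return @urls;
}

my $content = <<'HTML';
<p>New show: <a href="http://example.com/audio/foo.mp3">episode #1</a></p>
<p>See also <a href="http://example.com/notes.html">show notes</a>.</p>
HTML

print "$_\n" for find_enclosures($content);
```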
  • 115. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 116.
    • Publish
    • Publish aggregated entry to online services
    • reBlogging
    • Convert feeds to other formats
  • 117.
    • Publish::Feed
    • Republish feed in RSS/Atom
    • Good to use with scrapers
  • 118.
    • Publish::MT
    • Reblog entries using MT XML-RPC
  • 119.
    • Publish::MTWidget
  • 120.
    • Publish::Email
    • text/plain, multipart/alternative
    • Pluggable email protocols
    • (SMTP, SMTP Auth, IMAP, Maildir …)
  • 121.
    • Publish::iCal
    • Publish iCal feeds out of RSS/Atom
  • 122.
    • Publish::Excel
    For the gaps between work tasks!
  • 123. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 124.
    • Search
    • Index aggregated entries on search engines
  • 125.
    • Search::Spotlight
  • 126.
    • Search::Estraier
    • Uses HyperEstraier XMLRPC node API
  • 127.
    • Search::Lucene*
    • Use Lucene WebService API
    • (OpenSearch 1.1 and Atom)
  • 128. Plugin phases (types)
    • Subscription
    • Aggregator
    • CustomFeed
    • Filter
    • Publish
    • Search
    • Notify
  • 129.
    • Notify
    • Notify feed updates in various ways
  • 130.
    • Notify::Growl
  • 131.
    • Notify::Balloon
  • 132.
    • Notify::MSAgent
  • 133. Notify::Eject Supports: Windows, Linux, FreeBSD and Mac OSX!
  • 134.
    • Notify::Pizza
  • 135.
    • Notify::Pizza
    Now it does Sushi too!
  • 136.
    • Misconceptions
    • about Plagger
  • 137.  
  • 138. http://d.hatena.ne.jp/sugarcut/20061117/p1 "Pla brain": "You can do that with Plagger" / "You can explain that with plasma"
  • 139.  
  • 140.
    • It can do anything
    • … well, not quite
  • 141.  
  • 142.
    • Plagger is a
    • "Feed" aggregator
  • 143.
    • Plagger::Subscription
    • Plagger::Feed (shown three times): author, id, link, title, tags, url, entries
    • Plagger::Entry: author, id, link, permalink, title, tags, enclosures
  • 144. The point of the pizza plugin
  • 145.  
  • 146.  
  • 147.  
  • 148. "the Unix shell for Web 2.0"
  • 149. "RSS is the Standard IO for the Web"
  • 151. Feed formats
    • RSS 0.91, RSS 2.0, RSS 1.0 / RDF, Atom 1.0
    • JSON, iCal, OPML, XOXO, XBEL, Sitemaps, attention.xml
    • Amazon API, Google API, OpenSearch, AtomPP, GData
  • 152. Feed Vocabulary / Extensions
    • rvw:, Enclosures, Photocast, iTunes RSS, Media RSS
    • Dublin Core, FOAF, microformats
  • 153.  
  • 154. Plagger = The duct tape of the web.
  • 155.
    • Example:
    • Location metadata
  • 156. N 37.7782 W 122.3973
  • 157. GeoRSS
    <georss:point>37.7782 -122.3973</georss:point>
    <georss:where>
      <gml:Point>
        <gml:pos>37.7782 -122.3973</gml:pos>
      </gml:Point>
    </georss:where>
    xmlns:georss="http://www.georss.org/georss"
    xmlns:gml="http://www.opengis.net/gml"
  • 158. RDF geo vocabulary
    <foaf:based_near>
      <geo:Point>
        <geo:lat>35.678</geo:lat>
        <geo:long>139.770</geo:long>
      </geo:Point>
    </foaf:based_near>
    xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
  • 159. Flickr geo tag
    <media:category scheme="urn:flickr:tags">
      geo:lat=37.7782 geo:lon=-122.3973
    </media:category>
    xmlns:media="http://search.yahoo.com/mrss/"
  • 160. GeoURL
    <meta name="ICBM" content="37.7782, -122.3973" />
  • 161. geo microformats
    <div class="geo">
      <span class="latitude">37.7782</span>
      <span class="longitude">-122.3973</span>
    </div>
  • 162. Links to Google Maps
    <a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">Link to Google Maps</a>
  • 163. Eznavi mail
  • 164. Photo EXIF
  • 165. adr
    <div class="adr">
      <div class="street-address">548 4th St.</div>
      <span class="locality">San Francisco</span>,
      <span class="region">CA</span>
      <span class="postal-code">94107</span>
      <div class="country-name">U.S.A.</div>
    </div>
  • 166.
    • We don't care
    • about format diffs.
  • 167. GeoRSS
    <georss:point>37.7782 -122.3973</georss:point>
    <georss:where>
      <gml:Point>
        <gml:pos>37.7782 -122.3973</gml:pos>
      </gml:Point>
    </georss:where>
    Namespace::GeoRSS
  • 168. RDF geo vocabulary
    <foaf:based_near>
      <geo:Point>
        <geo:lat>35.678</geo:lat>
        <geo:long>139.770</geo:long>
      </geo:Point>
    </foaf:based_near>
    Namespace::Geo
  • 169. Flickr geo tag
    <media:category scheme="urn:flickr:tags">
      geo:lat=37.7782 geo:lon=-122.3973
    </media:category>
    Filter::geotagged
  • 170. GeoURL
    <meta name="ICBM" content="37.7782, -122.3973" />
    Filter::GeoURL
  • 171. geo microformats
    <div class="geo">
      <span class="latitude">37.7782</span>
      <span class="longitude">-122.3973</span>
    </div>
    Filter::Microformats::geo
  • 172. Links to Google Maps
    <a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">Link to Google Maps</a>
    Filter::ExtractMapsLinks
  • 173. Eznavi mail Filter::ExtractMapsLinks
  • 174. Photo EXIF Filter::FetchEnclosure + Filter::ExtractEXIF
  • 175. adr
    <div class="adr">
      <div class="street-address">548 4th St.</div>
      <span class="locality">San Francisco</span>,
      <span class="region">CA</span>
      <span class="postal-code">94107</span>
      <div class="country-name">U.S.A.</div>
    </div>
    Filter::Microformats::adr + Filter::Geocoding::US
  • 176.
    • Publish::KML
    • Publish::GoogleMaps
    • Publish::Feed
    • (with geotags)
  • 177.
    • Everything's done
    • in plugins
    • = Clean & extensible.
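    The location example can be sketched as a list of small extractors that all normalize to the same latitude/longitude pair. This is a simplified assumption for illustration: the extractors are hypothetical regex stand-ins for four of the formats shown in the slides, `extract_location` is not a Plagger function, and real plugins would parse XML/HTML properly rather than pattern-match on text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each extractor is a hypothetical stand-in for one format-specific plugin:
# it returns (lat, lon) when its format is present, or an empty list.
my @extractors = (
    # GeoRSS simple: <georss:point>lat lon</georss:point>
    sub { $_[0] =~ m{<georss:point>\s*(-?[\d.]+)\s+(-?[\d.]+)\s*</georss:point>} ? ($1, $2) : () },
    # GeoURL: <meta name="ICBM" content="lat, lon" />
    sub { $_[0] =~ m{name="ICBM"\s+content="\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)} ? ($1, $2) : () },
    # Flickr machine tags: geo:lat=... geo:lon=...
    sub { $_[0] =~ m{geo:lat=(-?[\d.]+)\s+geo:lon=(-?[\d.]+)} ? ($1, $2) : () },
    # Google Maps link: ...maps?q=lat,lon
    sub { $_[0] =~ m{maps\.google\.com/maps\?q=(-?[\d.]+),(-?[\d.]+)} ? ($1, $2) : () },
);

# Try each extractor in turn; the first match wins.
sub extract_location {
    my ($text) = @_;
    for my $extract (@extractors) {
        my ($lat, $lon) = $extract->($text);
        return { lat => $lat + 0, lon => $lon + 0 } if defined $lat;
    }
    return;
}

my @samples = (
    '<georss:point>37.7782 -122.3973</georss:point>',
    '<meta name="ICBM" content="37.7782, -122.3973" />',
    '<media:category scheme="urn:flickr:tags">geo:lat=37.7782 geo:lon=-122.3973</media:category>',
    '<a href="http://maps.google.com/maps?q=37.7782,-122.3973&z=16">Link to Google Maps</a>',
);

for my $sample (@samples) {
    my $loc = extract_location($sample);
    printf "%.4f %.4f\n", $loc->{lat}, $loc->{lon};
}
```

    Once every input format lands in the same `{ lat, lon }` shape, an output stage only has to understand that one shape, whatever the source format was.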
  • 178.
    • Plagger
    • dev. Status
  • 179.
    • Version
    • 0.7.13
  • 180.
    • Coming Soon …
  • 181.
    • iTunes RSS support
  • 182.
    • Geo extensions
  • 183.
    • Enclosure processors
    • ffmpeg, Sync::PSP, Sync::iPodVideo
  • 184.
    • Pluggable summarizer
    • & text formatter
    • Lingua::EN::Summarize, Text::Original, HTML::WikiConverter, HTML::FormatText
  • 185.
    • Rich Media metadata
    • ID3 tag in enclosures
    • Links to imdb.com / amazon.com
    • hReview microformats
  • 186.
    • Calendar Support
    • iCal parser & emitter
    • hCalendar microformats
    • .ics attached in emails
    • Sync::SyncML
  • 187.
    • Email refactoring
    • text/plain, iso-2022-jp support
    • Pluggable storage engines
  • 188.
    • http://plagger.org/
    • Planet, Mailing List, IRC
    • Bug Tracking, SVN repository
  • 189.
    • #plagger-ja on freenode
  • 190.
    • Join Us!
  • 191.
    • Thank you
    • Questions?