Introducing a Parallel Distributed Crawler with Dynamic HTML Scraping

An introduction to a parallel distributed crawler that supports dynamic HTML scraping
Sapporo RubyKaigi 02

Published in: Technology


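  1. require 'rubygems'
     require 'sinatra'
     # Each browser POSTs the page it just scraped to this endpoint.
     post '/' do
       url = params[:url]    # address of the scraped page
       data = params[:data]  # scrape result, JSON-encoded by the userscript
       store(url, data)
       next_url = process(url)
       # The response body is the URL the browser navigates to next.
       next_url
     end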
  2. // ==UserScript==
     // @name greasi_scraper
     // @namespace http://libelabo.jp/
     // @include http://images.google.co.jp/*
     // @require http://ajax.googleapis.com/ajax/libs/jquery/1.3.1/jquery.min.js
     // ==/UserScript==
     function postData(data) {
       var postData = $.param({url: location.href, data: JSON.stringify(data)});
       GM_xmlhttpRequest({
         method: "POST",
         url: "http://libelabo.jp/greasi/",
         headers: {'Content-type': 'application/x-www-form-urlencoded'},
         data: postData,
         onload: function(xhr) { location.href = xhr.responseText }
       });
     }
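
The slides call `store(url, data)` and `process(url)` without showing their bodies. The following is only a minimal sketch of what they might look like, not the author's implementation. It assumes an in-memory store, a hypothetical `CrawlState` class, and a hypothetical `"links"` key in the posted JSON that lists newly discovered URLs. A mutex guards the shared state because several browsers may post at the same time.

```ruby
require 'json'
require 'set'

# Hypothetical in-memory backing for the store/process helpers on slide 1.
class CrawlState
  attr_reader :results

  def initialize(seed_urls)
    @lock = Mutex.new
    @pending = seed_urls.dup    # URLs not yet handed to any browser
    @seen = Set.new(seed_urls)  # every URL ever queued, to avoid repeats
    @results = {}               # url => parsed scrape result
  end

  # Save what a browser scraped from url. data is the JSON string
  # the userscript builds with JSON.stringify.
  def store(url, data)
    parsed = JSON.parse(data)
    @lock.synchronize { @results[url] = parsed }
  end

  # Queue links found on url (assumed to sit under a "links" key),
  # then return the next URL to visit, or nil when none are left.
  def process(url)
    @lock.synchronize do
      page = @results[url]
      links = page.is_a?(Hash) ? Array(page['links']) : []
      links.each do |link|
        # Set#add? returns nil when the link was already queued once.
        @pending << link if @seen.add?(link)
      end
      @pending.shift
    end
  end
end

state = CrawlState.new(['http://example.com/b'])
state.store('http://example.com/a',
            '{"links": ["http://example.com/b", "http://example.com/c"]}')
first = state.process('http://example.com/a')
second = state.process('http://example.com/a')
third = state.process('http://example.com/a')
```

Under these assumptions, the Sinatra route could delegate to one shared `CrawlState` instance, so each connected browser receives a different URL from the same queue. What to send back once `process` returns `nil` is left open here, since the slides do not say how a crawl ends.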
