Introduction to a Parallel Distributed Crawler That Scrapes Dynamic HTML

Kei Shiratsuchi, Senior Web Developer at Oh My Glasses, Inc
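require 'rubygems'
require 'sinatra'

# Each browser running the userscript POSTs the page it has just scraped
# and receives the next URL to visit as the response body.
# Note: store and process are not defined on the slide.
post '/' do
  url  = params[:url]   # address of the page that was scraped
  data = params[:data]  # scraped data, JSON-encoded by the userscript
  store(url, data)
  next_url = process(url)
  next_url              # the block's return value becomes the response body
end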
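
The slide leaves `store` and `process` undefined. The following is only a minimal sketch of what they might do, assuming the server keeps a shared in-memory queue of URLs and hands one out to each browser that reports in. The `CrawlQueue` class, its `result_for` method, and the seed-list constructor are hypothetical names introduced for illustration; they are not from the original deck.

```ruby
# Hypothetical in-memory stand-in for the store/process helpers.
# A Mutex guards the shared state because several browsers may
# post to the server at the same time.
class CrawlQueue
  def initialize(seed_urls)
    @lock    = Mutex.new
    @pending = seed_urls.uniq  # URLs not yet handed out
    @results = {}              # url => scraped data
  end

  # Remember the data a browser scraped from url.
  def store(url, data)
    @lock.synchronize { @results[url] = data }
  end

  # Mark url as done and hand out the next pending URL.
  # Returns an empty string when no URL is left.
  def process(url)
    @lock.synchronize do
      @pending.delete(url)
      @pending.shift.to_s
    end
  end

  # Look up the data stored for url (nil if none).
  def result_for(url)
    @lock.synchronize { @results[url] }
  end
end
```

Under this assumption, the Sinatra handler's `store(url, data)` and `process(url)` calls could delegate to one shared `CrawlQueue` instance, so that every connected browser draws its next page from the same queue.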
// ==UserScript==
// @name      greasi_scraper
// @namespace http://libelabo.jp/
// @include   http://images.google.co.jp/*
// @require   http://ajax.googleapis.com/ajax/libs/jquery/1.3.1/jquery.min.js
// ==/UserScript==

// Send the scraped data to the crawler server, then navigate to the
// URL that the server returns in the response body.
function postData(data) {
  // Form-encode the current page URL and the JSON-serialized data.
  var body = $.param({url: location.href, data: JSON.stringify(data)});
  GM_xmlhttpRequest({
    method:  "POST",
    url:     "http://libelabo.jp/greasi/",
    headers: {'Content-Type': 'application/x-www-form-urlencoded'},
    data:    body,
    onload:  function (xhr) { location.href = xhr.responseText; }
  });
}