Mechanize at the Ruby Drink-up of Sophia, November 2011

Simple web-scraping with Mechanize and Nokogiri
Nov 8th 2011
Muriel Salvan, Open Source Lead developer and architect, X-Aeon Solutions
http://x-aeon.com
The need
  • Parse HTML pages (DOM) => Nokogiri
Installation
  • Warning: version 1.0.0 can be more stable than 2.x.x for some complex queries (run "gem install mechanize -v 1.0.0" to pin it).
Basic example
    require 'mechanize'
    agent = Mechanize.new
    page = agent.get('http://rivierarb.fr')
    element = page.root.css('h1.logo').first
    element.content
    # => "Riviera.rb"
Main usage
  • Use the agent to perform HTTP(S) requests (get, post). Each request returns a page whose DOM is a Nokogiri document.
  • Parse the page using CSS selectors, XPath, or DOM iterators.
  • Fill and post forms using intuitive helpers.
Common requests
    page = agent.get('http://rivierarb.fr')
    page2 = page.links_with(:text => 'Green King').first.click
    page3 = agent.back
    agent.user_agent = 'My user agent'
Common parsing: Selectors
    page.root.css('body div.myclass').each { |element| … }
    page.root.xpath('//h3/a[@class="l"]').each { |element| … }
Common parsing: Elements
    <div>
      <a href="http://www.google.com">
        Click here
        <img src="http://www.google.com/favicon.ico"/>
      </a>
    </div>
With element bound to the <a> node:
    element['href']
    # => "http://www.google.com"
    element.content
    # => "    Click here       "
    element.children[1].name
    # => "img"
    element.parent.name
    # => "div"
Filling and submitting forms: Basic example (Google search)
    form = agent.get('http://www.google.com').forms.first
    form.q = 'Rivierarb'
    results_page = form.submit
Filling and submitting forms: Fields
When your HTML form has
    <input … name="myfield">...</input>
you can write
    form.myfield = 'The field value'
    form.field_with(:name => 'myfield').value = 'The field value'
    form.checkboxfield = '1'
    form.selectfield = '5'
Filling and submitting forms: Buttons
Warning: Mechanize does not add the value of the button being clicked. If the web server relies on button values in the POST data, add them manually.
    <input type="submit" name="btn1" value="Clicked">...</input>
    form.add_field!('btn1', 'Clicked')
    button = form.button_with(:name => 'btn1')
    page = form.click_button(button)
  • Use HTML parsers other than Nokogiri
  • Does not have a JavaScript engine (therefore no Ajax)
  • Nokogiri element API
This presentation is available under CC-BY license by Muriel Salvan
Q/A