SlideShare a Scribd company logo
Simple web-scraping with Mechanize and Nokogiri Nov 8 th  2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
The need ,[object Object],[object Object]
Parse  HTML pages (DOM) =>  Nokogiri
Installation ,[object Object],[object Object]
! Version 1.0.0 can be more stable than 2.x.x for some complex queries (" gem install mechanize -v 1.0.0 " to enforce it).
Basic example require   'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content =>   "Riviera.rb"
Main usage ,[object Object]
Use the agent to  perform HTTP(S) requests  (get, post). Each request gives a Nokogiri page.
Parse the page  using CSS selectors, XPath, DOM iterators.
Fill and post forms  using intuitive helpers.
Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text   =>   'Green King' ) . first . click page3 = agent. back agent. user_agent  =  'My user agent'
Common parsing Selectors page. root . css ( 'body div.myclass' ) . each   {   | element |  …  } page. root . xpath ( '//h3/a[@class="l"]' ) . eac h   {   | element |  …  }
Common parsing Elements < div > < a   href = &quot; http://www.google.com &quot; > Click here < img   src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] =>   &quot;http: // www.google.com&quot; element. content =>   &quot;    Click here       &quot; element. children . second . name =>   &quot;img&quot; element. parent . name =>   &quot;div&quot; element
Filling and submitting forms Basic example Google search form =  agent. get ( 'http://www.google.com' ) . forms . first form. q  =  'Rivierarb' results_page = form. submit

More Related Content

What's hot

Changing Template Engine
Changing Template EngineChanging Template Engine
Changing Template Engine
Takatsugu Shigeta
 
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
Ho Chi Minh City Software Testing Club
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutes
cameronbot
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With Blogging
Takatsugu Shigeta
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projects
Konstantin Kudryashov
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management System
Valent Mustamin
 
Capybara
CapybaraCapybara
Capybara
Flavian Missi
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.net
Vijay Saklani
 
Ajax
AjaxAjax
Ajax
wangjiaz
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestra
Fabio Akita
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page site
Moneer kamal
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)
Pamela Fox
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008
Rob
 
BDD with cucumber
BDD with cucumberBDD with cucumber
BDD with cucumber
Kerry Buckley
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワーク
Takatsugu Shigeta
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be Okay
FITC
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQuery
David Park
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Richard Rabins
 
Symfony2
Symfony2Symfony2
Symfony2
mdpatrick
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - Fronteers
Robert Nyman
 

What's hot (20)

Changing Template Engine
Changing Template EngineChanging Template Engine
Changing Template Engine
 
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutes
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With Blogging
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projects
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management System
 
Capybara
CapybaraCapybara
Capybara
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.net
 
Ajax
AjaxAjax
Ajax
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestra
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page site
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008
 
BDD with cucumber
BDD with cucumberBDD with cucumber
BDD with cucumber
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワーク
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be Okay
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQuery
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
 
Symfony2
Symfony2Symfony2
Symfony2
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - Fronteers
 

Similar to Mechanize at the Ruby Drink-up of Sophia, November 2011

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint Dev
Zeddy Iskandar
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
Chris Toohey
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for Mobile
Chris Toohey
 
ASP_NET Features
ASP_NET FeaturesASP_NET Features
ASP_NET Features
Biswadip Goswami
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
Saket Choudhary
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
Saket Choudhary
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial Gadgets
Atlassian
 
Vb.Net Web Forms
Vb.Net  Web FormsVb.Net  Web Forms
Vb.Net Web Forms
Dutch Dasanaike {LION}
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
Stephan Schmidt
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Sergey Ilinsky
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2
borkweb
 
BluePrint Mobile Framework
BluePrint Mobile FrameworkBluePrint Mobile Framework
BluePrint Mobile Framework
Christian Heilmann
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile Widgets
Jose Palazon
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Final
ematrix
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!
Herman Peeren
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nl
Joomla!Days Netherlands
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgets
giurca
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITP
yucefmerhi
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages ppt
Iblesoft
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overview
reybango
 

Similar to Mechanize at the Ruby Drink-up of Sophia, November 2011 (20)

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint Dev
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for Mobile
 
ASP_NET Features
ASP_NET FeaturesASP_NET Features
ASP_NET Features
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial Gadgets
 
Vb.Net Web Forms
Vb.Net  Web FormsVb.Net  Web Forms
Vb.Net Web Forms
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2
 
BluePrint Mobile Framework
BluePrint Mobile FrameworkBluePrint Mobile Framework
BluePrint Mobile Framework
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile Widgets
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Final
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nl
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgets
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITP
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages ppt
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overview
 

More from rivierarb

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
rivierarb
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013
rivierarb
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
rivierarb
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012
rivierarb
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
rivierarb
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012
rivierarb
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
rivierarb
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011
rivierarb
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011
rivierarb
 

More from rivierarb (9)

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011
 

Recently uploaded

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 

Recently uploaded (20)

“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 

Mechanize at the Ruby Drink-up of Sophia, November 2011

  • 1. Simple web-scraping with Mechanize and Nokogiri Nov 8 th 2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
  • 2.
  • 3. Parse HTML pages (DOM) => Nokogiri
  • 4.
  • 5. ! Version 1.0.0 can be more stable than 2.x.x for some complex queries (&quot; gem install mechanize -v 1.0.0 &quot; to enforce it).
  • 6. Basic example require 'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content => &quot;Riviera.rb&quot;
  • 7.
  • 8. Use the agent to perform HTTP(S) requests (get, post). Each request gives a Nokogiri page.
  • 9. Parse the page using CSS selectors, XPath, DOM iterators.
  • 10. Fill and post forms using intuitive helpers.
  • 11. Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text => 'Green King' ) . first . click page3 = agent. back agent. user_agent = 'My user agent'
  • 12. Common parsing Selectors page. root . css ( 'body div.myclass' ) . each { | element | … } page. root . xpath ( '//h3/a[@class=&quot;l&quot;]' ) . eac h { | element | … }
  • 13. Common parsing Elements < div > < a href = &quot; http://www.google.com &quot; > Click here < img src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] => &quot;http: // www.google.com&quot; element. content => &quot; Click here &quot; element. children . second . name => &quot;img&quot; element. parent . name => &quot;div&quot; element
  • 14. Filling and submitting forms Basic example Google search form = agent. get ( 'http://www.google.com' ) . forms . first form. q = 'Rivierarb' results_page = form. submit
  • 15. Filling and submitting forms Fields When your HTML form has < input … name = &quot;myfield&quot; >...< / input > you can write form. myfield = 'The field value' form. field_with ( :name => 'myfield' ) . value = 'The field value' form. checkboxfield = '1' form. selectfield = '5'
  • 16. Filling and submitting forms Buttons ! Mechanize does not add the value of the button being clicked ! If the web server cares for buttons values in POST data, add them manually. < input type = &quot;submit&quot; name = &quot;btn1&quot; value = &quot;Clicked&quot;>...< / input > form. add_field ! ( 'btn1' , 'Clicked' ) b utton = form. button_with ( :name => 'btn1' ) page = form. click_button ( button )
  • 17.
  • 21. Use HTML parsers other than Nokogiri
  • 22. Does not have JavaScript engine (therefore no Ajax)
  • 23.
  • 26. Nokogiri element API This presentation is available under CC-BY license by Muriel Salvan
  • 27. Q/A