SlideShare a Scribd company logo
Web scraping
with Ruby
@LeWagonTokyo
Web scraping with Ruby
Understanding HTML DOM
Uunderstanding Ruby
Happy scraping!
Today’s repository
https://github.com/hidehiro98/ruby-
scraping
Workshop outline
1. Set up
Setting tools up
2. Theoretical intros
Basic concepts to understand
3. Build with me
Step by step


4. Try your own scraping
Enjoy it ; )
1. Let’s set up
Setup
https://www.gitpod.io/#https://
github.com/hidehiro98/ruby-scraping
2. Theoritical intro about
Web scraping
When do we need web
scraping?
You want information, but no API for
that.
What's a web scraper?
Web scraping is data scraping used
for extracting data from websites.
- Wikipedia
What's an HTML DOM?
We will use the OpenURI
and the Nokogiri
3. Let’s build
Install the Nokogiri
$ gem install nokogiri
4. Scrape any website with
live code
5. Scrape any website by
yourself!
If you want to scrape
dynamic website
https://readysteadycode.com/howto-
scrape-websites-with-ruby-and-
headless-chrome
Go further
lewagon.com/tokyo
@LeWagonTokyo
Thank you!
@LeWagonTokyo

More Related Content

Similar to Web scraping with Ruby

WebGL Awesomeness
WebGL AwesomenessWebGL Awesomeness
WebGL Awesomeness
Stephan Seidt
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
Iván Compañy Avi
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
Iván Compañy Avi
 
Hack Rio/OS
Hack Rio/OSHack Rio/OS
Hack Rio/OS
Kishore Neelamegam
 
Ruby And The Cloud
Ruby And The CloudRuby And The Cloud
Ruby And The Cloud
Sau Sheong Chang
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Rhio Kim
 
OSINT tools for security auditing with python
OSINT tools for security auditing with pythonOSINT tools for security auditing with python
OSINT tools for security auditing with python
Jose Manuel Ortega Candel
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Neo4j
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine Spiders
CJ Jenkins
 
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Florent BENOIT
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
PRISMA CSI
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
Myles Braithwaite
 
Shifting Gears
Shifting GearsShifting Gears
Shifting Gears
Christian Heilmann
 
Sinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote FormatSinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote Format
Adam Lowe
 
Sinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF FormatSinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF Format
Adam Lowe
 
Free Mongo on OpenShift
Free Mongo on OpenShiftFree Mongo on OpenShift
Free Mongo on OpenShift
Steven Pousty
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With Python
Robert Dempsey
 
Scrabbly GTUG presentation
Scrabbly GTUG presentationScrabbly GTUG presentation
Scrabbly GTUG presentation
Grant Goodale
 
Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018
Chris Tankersley
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
Lukas Eppler
 

Similar to Web scraping with Ruby (20)

WebGL Awesomeness
WebGL AwesomenessWebGL Awesomeness
WebGL Awesomeness
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
 
Practical webcrawling with scrapy
Practical webcrawling with scrapyPractical webcrawling with scrapy
Practical webcrawling with scrapy
 
Hack Rio/OS
Hack Rio/OSHack Rio/OS
Hack Rio/OS
 
Ruby And The Cloud
Ruby And The CloudRuby And The Cloud
Ruby And The Cloud
 
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
Node.js 기반 정적 페이지 블로그 엔진, 하루프레스
 
OSINT tools for security auditing with python
OSINT tools for security auditing with pythonOSINT tools for security auditing with python
OSINT tools for security auditing with python
 
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine Spiders
 
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
Devoxx France: Développement JAVA avec un IDE dans le Cloud: Yes we can !
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
Shifting Gears
Shifting GearsShifting Gears
Shifting Gears
 
Sinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote FormatSinatra Heroku You And You - Keynote Format
Sinatra Heroku You And You - Keynote Format
 
Sinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF FormatSinatra Heroku You And You - PDF Format
Sinatra Heroku You And You - PDF Format
 
Free Mongo on OpenShift
Free Mongo on OpenShiftFree Mongo on OpenShift
Free Mongo on OpenShift
 
Web Scraping With Python
Web Scraping With PythonWeb Scraping With Python
Web Scraping With Python
 
Scrabbly GTUG presentation
Scrabbly GTUG presentationScrabbly GTUG presentation
Scrabbly GTUG presentation
 
Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018Docker for Developers - PHP Detroit 2018
Docker for Developers - PHP Detroit 2018
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 

Recently uploaded

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 

Recently uploaded (20)

Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Web scraping with Ruby