Web scraping
Generally a bad idea
Web scraping
If it sounds painful
That's because it is
Web scraping
Should I do it?
No
Thanks for coming
Any questions?
What is web scraping
โ— Programmatically extracting data from web pages
Example
Web scraping is a horrible idea
โ— The scripts are tightly linked to the HTML
โ— The scripts fragile and prone to breaking
โ— Identifying HTML elements to extract is messy work
โ— Legal gray area
โ— You could be blocked from the web site
Sometimes web scraping is all we have
โ— The data isnโ€™t accessible any other way
โ— We still need the data
Benefits of web scraping
โ— Automation
โ— Scalability
Techniques to demonstrate
1. Simple technique
○ For simple/static web pages
2. Advanced technique
○ JavaScript must execute
○ Interaction
○ Authentication
Tools
1. Simple technique
○ request-promise
○ cheerio
2. Advanced technique
○ nightmare (headless browser)
○ cheerio
Live coding
The code:
https://github.com/ashleydavis/brisjs-web-scraping-talk
The pages to scrape:
Simple: https://quotes.wsj.com/AU/XASX/CBA
Advanced: https://www.asx.com.au/asx/share-price-research/company/CBA
Production issues...
Performance
โ— Cache the Nightmare object / batch requests
โ— Disable image download
Debugging
โ— Show the Electron window
โ— Enable devtools
โ— Handle errors from Nightmare
โ— Display logging from the headless browser
Resources
โ— Code
โ—‹ github.com/ashleydavis/brisjs-web-scraping-talk
โ— Contact
โ—‹ Email: ashley@codecapers.com.au
โ—‹ Twitter: @ashleydavis75
โ—‹ GitHub:
โ–  ashleydavis
โ–  data-forge
โ— Data Wrangling with JavaScript
โ—‹ datawranglingwithjavascript.com
โ— The Data Wrangler
โ—‹ the-data-wrangler.com
My book
