Web scraping

Web scraping
Generally a bad idea

Web scraping
If it sounds painful
That’s because it is

Web scraping
Should I do it?
No

Thanks for coming
Any questions?

What is web scraping
● Programmatically extracting data from web pages

Web scraping is a horrible idea
● The scripts are tightly linked to the HTML
● The scripts are fragile and prone to breaking
● Identifying HTML elements to extract is messy work
● Legal gray area
● You could be blocked from the web site

Sometimes web scraping is all we have
● The data isn’t accessible any other way
● We still need the data

Benefits of web scraping
● Automation
● Scalability

Techniques to demonstrate
1. Simple technique
○ For simple/static web pages
2. Advanced technique
○ JavaScript must execute
○ Interaction
○ Authentication

Tools
1. Simple technique
○ request-promise
○ cheerio
2. Advanced technique
○ nightmare (headless browser)
○ cheerio
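
A minimal sketch of how the simple technique might look with request-promise and cheerio. It is not the code from the talk: the function names are made up for illustration, and the URL and CSS selector are left as parameters because the real selectors depend on the page's HTML. It assumes `npm install request request-promise cheerio`.

```javascript
// Sketch of the simple technique: download static HTML, then query it with cheerio.
// All function names here are hypothetical; pass in your own URL and selector.

// Return the trimmed text of the first element matching `selector` in an HTML string.
function extractText(html, selector) {
    const cheerio = require('cheerio'); // loaded on first use
    const $ = cheerio.load(html);
    return $(selector).first().text().trim();
}

// Turn scraped text such as "$1,234.50" into a number.
// Returns NaN when no numeric characters are left.
function parseNumber(text) {
    return parseFloat(text.replace(/[^0-9.-]/g, ''));
}

// Download a static page and extract one numeric value from it.
async function scrapeStaticPage(url, selector) {
    const request = require('request-promise'); // loaded on first use
    const html = await request(url); // resolves to the response body as a string
    return parseNumber(extractText(html, selector));
}
```

A call would look something like `scrapeStaticPage(pageUrl, '.some-price-class').then(console.log)`, where `.some-price-class` stands in for a selector found by inspecting the page. This only works when the value is already present in the HTML the server returns.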

Live coding
The code:
https://github.com/ashleydavis/brisjs-web-scraping-talk
The pages to scrape:
Simple: https://quotes.wsj.com/AU/XASX/CBA
Advanced: https://www.asx.com.au/asx/share-price-research/company/CBA
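
For pages where JavaScript must execute before the data appears, the advanced technique might be sketched as follows. This is an illustration under assumptions, not the code from the repository above: `scrapeDynamicPage` and both selector parameters are hypothetical, and it assumes `npm install nightmare cheerio`.

```javascript
// Sketch of the advanced technique: let a headless browser render the page,
// then hand the rendered HTML to cheerio.
// `waitSelector` and `valueSelector` are placeholders supplied by the caller.
async function scrapeDynamicPage(url, waitSelector, valueSelector) {
    const Nightmare = require('nightmare'); // loaded on first use
    const cheerio = require('cheerio');

    const nightmare = new Nightmare();
    try {
        const html = await nightmare
            .goto(url)
            .wait(waitSelector) // wait until the page's JavaScript has added the element
            .evaluate(() => document.body.innerHTML); // runs inside the browser page

        const $ = cheerio.load(html);
        return $(valueSelector).first().text().trim();
    } finally {
        await nightmare.end(); // always shut down the Electron process
    }
}
```

Interaction and authentication would fit into the same chain before `.wait`, for example with Nightmare's `.type(selector, text)` and `.click(selector)` calls against a login form.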

Production issues...
Performance
● Cache the Nightmare object / batch requests
● Disable image download
Debugging
● Show the Electron window
● Enable devtools
● Handle errors from Nightmare
● Display logging from the headless browser
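
The performance and debugging points above could be combined roughly as follows. This is a sketch under assumptions rather than the talk's code: `buildOptions`, `getBrowser`, `scrapeAll` and the `debug` flag are hypothetical names, and it assumes `npm install nightmare`.

```javascript
// Build Nightmare constructor options. `debug` is a flag invented for this sketch.
function buildOptions(debug) {
    return {
        show: debug, // show the Electron window while debugging
        openDevTools: debug ? { mode: 'detach' } : false, // only honored when `show` is true
        webPreferences: { images: false }, // disable image download
    };
}

let browser = null; // cached Nightmare instance, shared between requests

// Create the Nightmare instance once and reuse it afterwards.
function getBrowser(debug = false) {
    if (!browser) {
        const Nightmare = require('nightmare'); // loaded on first use
        browser = new Nightmare(buildOptions(debug));
        // Display logging from the headless browser in the Node.js console.
        browser.on('console', (type, ...args) => console.log(`[page ${type}]`, ...args));
    }
    return browser;
}

// Scrape a batch of URLs with one browser, recording errors instead of aborting.
async function scrapeAll(urls, selector, debug = false) {
    const results = [];
    try {
        for (const url of urls) {
            try {
                const text = await getBrowser(debug)
                    .goto(url)
                    .wait(selector)
                    .evaluate(sel => document.querySelector(sel).textContent, selector);
                results.push({ url, text: text.trim() });
            } catch (err) {
                results.push({ url, error: err }); // handle errors from Nightmare
            }
        }
    } finally {
        if (browser) {
            await browser.end(); // shut down the Electron process
            browser = null;
        }
    }
    return results;
}
```

Reusing one instance avoids paying the Electron startup cost for every page. Whether an instance stays usable after a failed request is worth verifying for the Nightmare version in use.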

Resources
● Code
○ github.com/ashleydavis/brisjs-web-scraping-talk
● Contact
○ Email: ashley@codecapers.com.au
○ Twitter: @ashleydavis75
○ GitHub:
■ ashleydavis
■ data-forge
● Data Wrangling with JavaScript
○ datawranglingwithjavascript.com
● The Data Wrangler
○ the-data-wrangler.com
My book
