Getting started with Scrapy in Python

Web Scraping with Scrapy
Virendra Rajput

Hacker @Markitty

Agenda
● What is web scraping and why it's fun
● My experiments with web scraping
● Getting started with Scrapy
● How Scrapy works and a quick Demo
● Why Scrapy
● Questions

What is Web Scraping?
● Extracting information from websites
● Problem:
○ Static websites
○ No access to APIs to extract the data you need
○ Need to extract data periodically
● Manual solution: go to the website and copy the required data
● Smarter solution: Web Scraping

Web Scraping in Python
● Download the webpage with urllib2 (urllib.request in Python 3) or requests

● Parse the page with BeautifulSoup/lxml

● Select elements with XPath or CSS selectors
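
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not the speaker's code: it swaps in `urllib.request` and `xml.etree.ElementTree` for requests and BeautifulSoup/lxml so it runs without extra installs, and it assumes the markup is well formed, which real pages often are not. The sample page, the `talk` class name and the helper names `download` and `extract_talks` are hypothetical.

```python
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def download(url):
    """Step 1: fetch the raw page (defined here, not called in this sketch)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


# Hypothetical, well-formed sample page so the sketch runs offline.
SAMPLE_PAGE = """<html><body>
<ul>
  <li class="talk"><a href="/talks/1">Intro to Scrapy</a></li>
  <li class="talk"><a href="/talks/2">XPath basics</a></li>
  <li class="ad"><a href="/ads/9">Sponsored</a></li>
</ul>
</body></html>"""


def extract_talks(page):
    """Steps 2 and 3: parse the page, then select nodes with an XPath expression."""
    root = ET.fromstring(page)
    # ElementTree supports only a subset of XPath; this predicate is within it.
    links = root.findall(".//li[@class='talk']/a")
    return [{"title": a.text, "url": a.get("href")} for a in links]


talks = extract_talks(SAMPLE_PAGE)
print(talks)
```

For messy real-world HTML, a tolerant parser such as BeautifulSoup or lxml, as named on the slide, is usually the better choice.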

Scrapy - a fast, high-level screen scraping and web crawling framework
● Pick a website
● Define the data you want to scrape
● Write the spider to extract the data
● Run the spider
● Store the Data

Why Scrapy
● Simplicity
● Fast
● Productive and extensible
● Portable
● Well documented, with a healthy community
● Commercial Support

Advanced Features (built in)
● Interactive shell for trying out XPath expressions (useful for debugging)
● Selecting and extracting data from HTML sources
● Cleaning and sanitizing the scraped data
● Generating feed exports (JSON, CSV)
● Media pipeline for downloading files such as images
● Middlewares for cookies, HTTP compression, caching, user-agent spoofing, etc.
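
As one illustration of the cleaning step, Scrapy passes every scraped item through item pipelines, which are plain classes with a `process_item(item, spider)` method. The sketch below is an assumption-laden example, not code from the talk: the class name `CleanTextPipeline` and the module path in the comment are hypothetical, and items are assumed to be plain dicts.

```python
class CleanTextPipeline:
    """Collapse runs of whitespace in every string field of a scraped item."""

    def process_item(self, item, spider):
        for key in list(item):
            value = item[key]
            if isinstance(value, str):
                # split() with no arguments drops leading, trailing and
                # repeated whitespace, including newlines and tabs.
                item[key] = " ".join(value.split())
        return item


# To enable it in a Scrapy project, settings.py would contain something like
# (hypothetical module path):
# ITEM_PIPELINES = {"myproject.pipelines.CleanTextPipeline": 300}
```

Outside a project it can be tried directly, for example `CleanTextPipeline().process_item({"text": "  hello \n world "}, spider=None)`.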
