Getting started with Scrapy in Python

Published in: Technology

  1. Web Scraping with Scrapy
     Virendra Rajput, Hacker @Markitty
  2. Agenda
     ● What is web scraping and why it's fun
     ● My experiments with web scraping
     ● Getting started with Scrapy
     ● How Scrapy works and a quick demo
     ● Why Scrapy
     ● Questions
  3. What is Web Scraping?
     ● Extracting information from websites
     ● Problem:
       ○ Static websites
       ○ No access to APIs to extract the data you need
       ○ Need to extract data periodically
     ● Manual solution: go to the website and copy the required data
     ● Smarter solution: web scraping
  4. My Experiments with Scraping
  5. Web Scraping in Python
     ● Download the web page with urllib2 or requests
     ● Parse the page with BeautifulSoup or lxml
     ● Select elements with XPath or CSS selectors
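
The parse-and-select steps from slide 5 can be sketched without any third-party packages. This is only an illustration under stated assumptions: the markup in `PAGE` is made up and stands in for a downloaded page, and the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset and needs well-formed markup) is swapped in for BeautifulSoup/lxml, which cope far better with real-world HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content. In practice this string would come from a
# download step such as requests.get(url).text.
PAGE = """
<html>
  <body>
    <ul id="books">
      <li class="book"><a href="/b/1">Learning Python</a><span class="price">30</span></li>
      <li class="book"><a href="/b/2">Web Scraping Basics</a><span class="price">25</span></li>
    </ul>
  </body>
</html>
"""


def extract_books(markup):
    """Parse the markup and select each book's title, link and price."""
    root = ET.fromstring(markup.strip())
    books = []
    # XPath-style selection: every <li class="book"> anywhere in the tree.
    for li in root.findall(".//li[@class='book']"):
        link = li.find("a")
        price = li.find("span[@class='price']")
        books.append(
            {"title": link.text, "url": link.get("href"), "price": int(price.text)}
        )
    return books


books = extract_books(PAGE)
```

With lxml the selection would look much the same, since it accepts full XPath expressions; the download, parse and select stages stay separate either way.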
  6. Scrapy: a fast, high-level screen scraping and web crawling framework
     ● Pick a website
     ● Define the data you want to scrape
     ● Write the spider to extract the data
     ● Run the spider
     ● Store the data
  7. Demo
  8. Why Scrapy
     ● Simple
     ● Fast
     ● Productive / Extensible
     ● Portable
     ● Good documentation and a healthy community
     ● Commercial support
  9. Advanced Features (built in)
     ● Interactive shell for trying XPaths (useful for debugging)
     ● Selecting and extracting data from HTML sources
     ● Cleaning and sanitizing the scraped data
     ● Generating feed exports (JSON, CSV)
     ● Media pipeline for downloading files
     ● Middlewares for cookies, HTTP compression, caching, user-agent spoofing, etc.
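
As one illustration of the "cleaning and sanitizing" feature on slide 9, Scrapy lets a project register item pipelines: plain classes with a `process_item(item, spider)` method that receive each scraped item. The sketch below assumes items are plain dicts; the class name and the module path in the settings comment are hypothetical.

```python
class CleanTextPipeline:
    """Collapse runs of whitespace in every string field of an item."""

    def process_item(self, item, spider):
        # Copy the pairs first so the dict is not mutated while iterating.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = " ".join(value.split())
        return item


# In a Scrapy project the pipeline would be enabled in settings.py, e.g.
# (hypothetical module path):
# ITEM_PIPELINES = {"myproject.pipelines.CleanTextPipeline": 300}

# The class has no Scrapy dependency, so it can be tried on its own.
cleaned = CleanTextPipeline().process_item(
    {"title": "  Learning \n Python  ", "price": 30}, spider=None
)
```

Keeping clean-up in a pipeline rather than in the spider means the extraction code stays short and the same clean-up can be reused across spiders.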
  10. Questions?
