Web Scraping with Python
by @sauravtom
(work in progress …)
Data Scraping
Automated Process
Specify a CSS selector or XPath expression
grab the content
store it in a database
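The three steps above can be sketched with the standard library alone. This is a minimal illustration, not code from the slides: the sample markup, the `quotes` table and the `scrape_to_db` helper are all assumptions, and `xml.etree.ElementTree` only handles well-formed markup with a limited XPath subset.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real-world HTML is rarely this clean
# and usually needs a forgiving parser such as lxml or bs4.
PAGE = """
<html><body>
  <div class="quote"><span class="text">Simple is better than complex.</span></div>
  <div class="quote"><span class="text">Readability counts.</span></div>
  <div class="ad"><span class="text">Buy now</span></div>
</body></html>
"""

def scrape_to_db(page, conn):
    """Select nodes by path, grab their text, store it in a database."""
    root = ET.fromstring(page)
    # 1. Specify the path: text spans inside quote blocks only.
    nodes = root.findall(".//div[@class='quote']/span[@class='text']")
    # 2. Grab the content.
    texts = [node.text for node in nodes]
    # 3. Store it in a database (the table name is an assumption).
    conn.execute("CREATE TABLE IF NOT EXISTS quotes (text TEXT)")
    conn.executemany("INSERT INTO quotes (text) VALUES (?)", [(t,) for t in texts])
    conn.commit()
    return texts

conn = sqlite3.connect(":memory:")
scraped = scrape_to_db(PAGE, conn)
rows = [row[0] for row in conn.execute("SELECT text FROM quotes ORDER BY rowid")]
```

The path expression skips the `ad` block, so only the two quote strings end up in the table.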
Who uses Scrapers?
Scrapers as the backbone of Big Data
Important at industry level as well as in indie projects
Why choose Python?
Robust, flexible and powerful
Relatively short development time
Easy to learn and use
Huge standard library, thorough documentation and helpful community.
Scraping libraries in Python
lxml
BS4
Scrapy
Mechanize
twill
...
Scraper Demonstration in bs4
Inspect the element
Find the node
Plug it in
(some code and pictures)
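Since the slide's own code is not included here, the following is only a sketch of the inspect / find / plug-in flow using bs4. The sample markup, the `talks` id and the `talk` class are assumptions standing in for whatever the browser inspector shows on a real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, as it might appear in the browser's element inspector.
HTML = """
<html><body>
  <ul id="talks">
    <li class="talk"><a href="/talks/1">Web Scraping with Python</a></li>
    <li class="talk"><a href="/talks/2">Profiling in Python</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Find the node: the list that holds the items of interest.
container = soup.find("ul", id="talks")

# Plug it in: pull the title and link out of every matching item.
talks = []
for item in container.find_all("li", class_="talk"):
    link = item.a
    talks.append((link.get_text(strip=True), link["href"]))

print(talks)
```

For a live page, the `HTML` string would be replaced by the body of an HTTP response.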
Making Scrapers faster
Threads and Queues
(some code ...)
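One common way to combine threads and queues is a worker pool that pulls URLs from a task queue. The sketch below is not the code from the slides: `fetch` is a hypothetical stand-in that performs no network access, and the URLs and worker count are assumptions.

```python
import queue
import threading

def fetch(url):
    # Hypothetical stand-in for a real HTTP request.
    return "<html>%s</html>" % url

def worker(tasks, results):
    """Pull URLs off the task queue until a None sentinel arrives."""
    while True:
        url = tasks.get()
        try:
            if url is None:
                break
            results.put((url, fetch(url)))
        finally:
            tasks.task_done()

def scrape_all(urls, num_workers=4):
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()         # block until every queued item is processed
    for thread in threads:
        thread.join()
    pages = {}
    while not results.empty():
        url, body = results.get()
        pages[url] = body
    return pages

URLS = ["http://example.com/page/%d" % i for i in range(10)]
pages = scrape_all(URLS)
```

Threads help here because scraping is mostly I/O-bound: while one worker waits on the network, the others can keep running.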
Detecting bottlenecks
Introduction to profiling in Python
(some code)
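As a sketch of what profiling a scraper might look like, the standard `cProfile` and `pstats` modules can report where the time goes. The `parse_page` and `scrape_site` functions are hypothetical placeholders for real parsing and crawling work.

```python
import cProfile
import io
import pstats

def parse_page(n):
    # Hypothetical CPU-heavy step standing in for HTML parsing.
    return sum(i * i for i in range(n))

def scrape_site():
    # Hypothetical crawl loop: "parse" 50 pages.
    return [parse_page(20000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
scrape_site()
profiler.disable()

# Sort by cumulative time so the most expensive call chains come first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A whole script can be profiled the same way from the command line with `python -m cProfile -s cumulative script.py`.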
Making Scrapers even faster
Using memcache to reduce redundant scraping
(some code)
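The caching idea can be sketched without a running memcached server. In the example below, `DictCache` is a hypothetical in-process stand-in that exposes only `get` and `set`, the two calls a real memcached client would be used for here, and `fetch` is again a placeholder that does no network access.

```python
import hashlib

class DictCache:
    """Hypothetical in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a miss

    def set(self, key, value):
        self._data[key] = value

fetch_count = 0

def fetch(url):
    # Hypothetical stand-in for a real HTTP request; counts how often it runs.
    global fetch_count
    fetch_count += 1
    return "<html>%s</html>" % url

def cached_fetch(url, cache):
    # Hash the URL: memcached keys are length-limited and may not contain spaces.
    key = "page:" + hashlib.sha1(url.encode("utf-8")).hexdigest()
    body = cache.get(key)
    if body is None:
        body = fetch(url)
        cache.set(key, body)
    return body

cache = DictCache()
first = cached_fetch("http://example.com/a", cache)
second = cached_fetch("http://example.com/a", cache)  # served from the cache
```

Swapping in a real client should only require replacing `DictCache()` with the client object, plus an expiry time on `set` so stale pages are eventually re-scraped.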
That's it!
(links to the code present in these slides)

Introduction to Web Scraping in Python

Published in: Internet