Web Scraping with Python
by @sauravtom
(work in progress …)
Data Scraping
Automated Process
Specify a CSS selector or XML path (XPath)
Grab the content
Store it in a database
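The three steps above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the markup in PAGE, the path expression, and the "posts" table are all hypothetical, and ElementTree only handles well-formed markup with a limited XPath subset, so real pages usually need a more forgiving parser.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="post"><h2>First post</h2></div>
    <div class="post"><h2>Second post</h2></div>
    <div class="ad"><h2>Buy now</h2></div>
  </body>
</html>
"""


def scrape_titles(markup):
    """Return the text of every <h2> directly inside a <div class="post">."""
    root = ET.fromstring(markup)
    # 1. Specify the path (ElementTree supports a limited XPath subset).
    nodes = root.findall(".//div[@class='post']/h2")
    # 2. Grab the content.
    return [node.text for node in nodes]


def store_titles(conn, titles):
    """3. Store the scraped titles in a (hypothetical) 'posts' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS posts (title TEXT)")
    conn.executemany("INSERT INTO posts (title) VALUES (?)", [(t,) for t in titles])
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
store_titles(conn, scrape_titles(PAGE))
rows = [row[0] for row in conn.execute("SELECT title FROM posts ORDER BY rowid")]
print(rows)
```

The "ad" block is skipped because the path only matches divs whose class is "post".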
Who uses scrapers?
Scrapers are the backbone of Big Data
Important at the industry level as well as in indie
projects.
Why choose Python?
Robust, flexible and powerful
Relatively short development time
Easy to learn and use
Huge standard library, thorough documentation and
helpful community.
Scraping libraries in Python
lxml
Beautiful Soup (bs4)
Scrapy
Mechanize
twill
...
Scraper Demonstration in bs4
Inspect the element
Find the node
Plug it in
(some code and pictures)
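A minimal sketch of the inspect / find / plug-in workflow, assuming the bs4 package is installed. The HTML fragment and the selector are hypothetical; on a real page the selector would come from the browser's element inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; in practice this would be the downloaded HTML.
HTML = """
<ul id="talks">
  <li class="talk"><a href="/talks/1">Web Scraping with Python</a></li>
  <li class="talk"><a href="/talks/2">Profiling for Beginners</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")  # parser from the standard library

# "Find the node": a CSS selector matching the links seen in the inspector.
links = soup.select("ul#talks li.talk a")

# "Plug it in": pull out the text and the href attribute of each node.
talks = [(a.get_text(strip=True), a["href"]) for a in links]
print(talks)
```

Here soup.select takes a CSS selector. The same nodes could also be reached with find_all, which filters by tag name and attributes instead.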
Making Scrapers faster
Threads and Queues
(some code ...)
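One possible shape for a threaded scraper, as a sketch rather than the code from the talk. The fetch function is a hypothetical stand-in that performs no network access, and the worker count and example URLs are arbitrary assumptions. Threads help mainly because real fetches spend most of their time waiting on the network.

```python
import queue
import threading


def fetch(url):
    """Hypothetical stand-in for a network request plus parsing."""
    return "content of " + url


def worker(tasks, results):
    """Pull URLs off the task queue until a None sentinel arrives."""
    while True:
        url = tasks.get()
        if url is None:
            tasks.task_done()
            break
        results.put((url, fetch(url)))
        tasks.task_done()


def scrape_all(urls, num_workers=4):
    """Scrape every URL using a small pool of worker threads."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker so each one exits
    tasks.join()
    for t in threads:
        t.join()
    # Drain the results; completion order is not guaranteed.
    scraped = {}
    while not results.empty():
        url, body = results.get()
        scraped[url] = body
    return scraped


urls = ["http://example.com/page/%d" % i for i in range(10)]
pages = scrape_all(urls)
print(len(pages))
```

Results are collected into a dict keyed by URL because the workers may finish in any order.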
Detecting bottlenecks
Introduction to profiling in Python
(some code)
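A short sketch of profiling with the standard library's cProfile and pstats modules. The parse_page and scrape functions are hypothetical stand-ins for real scraping work; the point is only to show how a report sorted by cumulative time is produced.

```python
import cProfile
import io
import pstats


def parse_page(n):
    """Hypothetical CPU-bound step standing in for HTML parsing."""
    return sum(i * i for i in range(n))


def scrape():
    """Hypothetical scraper that parses twenty pages."""
    return [parse_page(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
scrape()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Functions near the top of the report are the first candidates for optimization. A whole script can also be profiled without code changes by running python -m cProfile -s cumulative on it.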
Making Scrapers even faster
Using memcache to reduce redundant
scraping
(some code)
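The caching idea can be sketched without a running memcached server. In the example below, DictCache is a hypothetical in-process stand-in that exposes get and set. Swapping in a real memcached client is an assumption left to the reader, and fetch is again a placeholder for the expensive scrape.

```python
class DictCache:
    """In-process stand-in exposing get/set, for illustration only."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._store[key] = value


fetch_count = 0  # counts how often the expensive step actually runs


def fetch(url):
    """Hypothetical stand-in for the expensive download-and-parse step."""
    global fetch_count
    fetch_count += 1
    return "content of " + url


def cached_fetch(cache, url):
    """Return the cached page if present; otherwise scrape and remember it."""
    page = cache.get(url)
    if page is None:
        page = fetch(url)
        cache.set(url, page)
    return page


cache = DictCache()
first = cached_fetch(cache, "http://example.com/a")
second = cached_fetch(cache, "http://example.com/a")  # served from the cache
print(fetch_count)
```

With a real memcached server, keys are limited to 250 bytes and may not contain whitespace, so long URLs are usually hashed before being used as keys.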
That's it!
(links to the code present in these slides)
