Day 4 - Advance Python - Ground Gurus

Advance Python
Day 4
CHA PLADIN
cpladin@ibex.co | cpladin@ama.edu
Introduction to Web Scraping and Data Analysis

AGENDA
● Web scraping
● HTML tag familiarization and inspecting elements
● Request
● Data scraping process
● File reading and writing (.csv)

QUICK RECAP
● Introduction to classes
● Instance of a class, instance variables (self) and class variables
● Inheritance (parent and child class)
● Method overriding
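
The recap points can be illustrated with a small sketch. The class names Animal and Dog and the sample values are hypothetical, chosen only for illustration; they are not from the earlier sessions.

```python
class Animal:
    # Class variable: shared by every instance of the class.
    kingdom = "Animalia"

    def __init__(self, name):
        # Instance variable: belongs to this particular object (self).
        self.name = name

    def speak(self):
        return f"{self.name} makes a sound."


class Dog(Animal):
    # Child class: inherits kingdom, __init__ and speak from Animal.
    def speak(self):
        # Method overriding: replaces the parent's version of speak().
        return f"{self.name} barks."


generic = Animal("Generic animal")
rex = Dog("Rex")

print(generic.speak())  # uses Animal.speak
print(rex.speak())      # uses the overriding Dog.speak
print(rex.kingdom)      # class variable reached through the instance
```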

Activity - Super Idol
Create a parent class called Worker with the instance variables
name, email_address, employee_id and basic_pay.
Create a subclass called CEO that inherits the Worker attributes
and adds employee_department, a list to which only one value
is assigned.
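
One possible shape for this activity is sketched below. The constructor parameter named department and all sample values (name, email, ID, pay) are assumptions made up for the example; other designs would satisfy the activity equally well.

```python
class Worker:
    def __init__(self, name, email_address, employee_id, basic_pay):
        # Instance variables requested by the activity.
        self.name = name
        self.email_address = email_address
        self.employee_id = employee_id
        self.basic_pay = basic_pay


class CEO(Worker):
    def __init__(self, name, email_address, employee_id, basic_pay, department):
        # Reuse the parent initializer for the inherited attributes.
        super().__init__(name, email_address, employee_id, basic_pay)
        # employee_department is a list that holds exactly one value.
        self.employee_department = [department]


# Made-up sample data, for illustration only.
boss = CEO("Juan Dela Cruz", "juan@example.com", "E-0001", 100000, "Executive")
print(boss.name, boss.employee_department)
```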

Web scraping
- data scraping used for extracting data from websites;
- refers to automated processes implemented using a bot or web crawler.

Libraries to use
● beautifulsoup4
● lxml
● requests
● html5lib

Libraries
beautifulsoup4 - library designed for quick-turnaround projects like
screen scraping.
lxml - easy-to-use library for processing XML and HTML in Python.
requests - lets you send HTTP/1.1 requests without manually adding
query strings to URLs or form-encoding POST data.
html5lib - standards-compliant library for parsing and serializing
HTML documents and fragments in Python.
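
To get familiar with HTML tags before touching a live site, a small inline document can be parsed with BeautifulSoup. This is a minimal sketch: the HTML fragment is made up for illustration, and it uses Python's built-in "html.parser" so that only beautifulsoup4 needs to be installed ("lxml" or "html5lib" could be passed instead if available).

```python
from bs4 import BeautifulSoup

# A small, made-up HTML fragment used only for illustration.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1 class="heading">Branches</h1>
    <ul>
      <li><a href="/branch/1">Branch One</a></li>
      <li><a href="/branch/2">Branch Two</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Reach a tag directly by name and read its text.
page_title = soup.title.get_text()

# find() returns the first match; class_ filters on the class attribute.
heading = soup.find("h1", class_="heading").get_text()

# find_all() returns every match; attributes are read like dict keys.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(page_title, heading, links)
```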

Basic Rules of Web Scraping
● Use an API if one is provided, instead of scraping
data.
● Respect the Terms of Service (ToS).
● Respect the rules of robots.txt.
● Use a reasonable crawl rate. Respect the crawl-delay
setting provided in robots.txt; if there's none, use a
conservative crawl rate (e.g. 1 request per 10-15
seconds).
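
The robots.txt and crawl-rate rules above can be checked in code with the standard library's urllib.robotparser. The sketch below parses an inline, made-up robots.txt rather than downloading one; the example.com URLs and the 10-second fallback are assumptions chosen to match the conservative rate suggested above.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""

# Fallback delay in seconds when robots.txt gives no Crawl-delay (assumption).
DEFAULT_DELAY = 10

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="*"):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def delay_for(user_agent="*"):
    """Return the Crawl-delay from robots.txt, or the conservative default."""
    delay = parser.crawl_delay(user_agent)
    return delay if delay is not None else DEFAULT_DELAY


print(allowed("https://example.com/search/"))
print(allowed("https://example.com/private/data"))
print(delay_for())
```

For a live site, parser.set_url(...) followed by parser.read() would load the real robots.txt, and time.sleep(delay_for()) between requests would enforce the crawl rate.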

Practice
We will scrape data from
https://old.yellow-pages.ph/search/jollibee/metro-manila/page-1,
getting a few pieces of information such as establishment
name, company, address and phone number.

Import Libraries and setup source
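
The code for this slide is not preserved in the transcript, so the following is only a sketch of what the setup might look like. The User-Agent string, the function names fetch_html, make_soup and main, and the choice of "html.parser" are all assumptions; "lxml" from the library list could be passed to BeautifulSoup instead.

```python
import requests
from bs4 import BeautifulSoup

# Source URL from the practice slide.
SOURCE_URL = "https://old.yellow-pages.ph/search/jollibee/metro-manila/page-1"

# Hypothetical User-Agent string; replace it with one identifying your script.
HEADERS = {"User-Agent": "day4-practice-scraper/0.1"}


def fetch_html(url):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.text


def make_soup(html):
    """Parse HTML text into a BeautifulSoup tree."""
    return BeautifulSoup(html, "html.parser")


def main():
    # Performs a real network request when called.
    soup = make_soup(fetch_html(SOURCE_URL))
    print(soup.title)
```

Calling main() fetches the page and prints its title tag. The tags and classes that hold the establishment name, company, address and phone number have to be found by inspecting the page elements in the browser, since the page structure is not shown here.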

PRACTICE 2
Create a program to scrape
http://books.toscrape.com/catalogue/page-2.html
that generates the following for each book:
● Product/Book Name
● Book Link
● Star rating
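
A possible approach to this practice is sketched below against a trimmed sample instead of the live page. The sample markup (article.product_pod, the title attribute on the link, and a rating word stored as a second class on p.star-rating) is an assumption about how the listing page is structured, and the two sample books are made up; verify the real tags by inspecting the page. The sketch also shows writing the result as CSV text.

```python
import csv
import io
from urllib.parse import urljoin

from bs4 import BeautifulSoup

PAGE_URL = "http://books.toscrape.com/catalogue/page-2.html"

# Assumed shape of book entries on the listing page (made-up sample data).
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="sample-book_1/index.html" title="Sample Book">Sample ...</a></h3>
</article>
<article class="product_pod">
  <p class="star-rating Five"></p>
  <h3><a href="another-book_2/index.html" title="Another Book">Another ...</a></h3>
</article>
"""

FIELDS = ["name", "link", "star_rating"]


def parse_books(html, base_url=PAGE_URL):
    """Return a list of dicts with name, absolute link and star rating."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.select("article.product_pod"):
        link = article.select_one("h3 a")
        rating_tag = article.select_one("p.star-rating")
        # The rating word is the class that is not "star-rating".
        rating = next(c for c in rating_tag["class"] if c != "star-rating")
        books.append({
            "name": link["title"],
            # Links on the page are relative, so resolve them against the page URL.
            "link": urljoin(base_url, link["href"]),
            "star_rating": rating,
        })
    return books


def books_to_csv(books):
    """Serialize the book dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


books = parse_books(SAMPLE_HTML)
csv_text = books_to_csv(books)
print(csv_text)
```

To save a file instead of building a string, the same DictWriter can be given a file opened with open("books.csv", "w", newline="", encoding="utf-8"), and csv.DictReader reads it back row by row.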
