SlideShare a Scribd company logo
INTRODUCTION TO
WEB SCRAPING USING
PYTHON
Tushar Mittal
@techytushar
AGENDA
What we’ll do
What is Web Scraping?
Need of Web Scraping.
Real Life Used Cases.
Workflow and Libraries used.
Demo (Scrape a Website)
Rules of Web Scraping.
WEB SCRAPING
What is it?
Web Scraping is a technique to fetch data and
information from websites.
Everything you see on a webpage can be
scraped.
Can be done in most programming languages,
we’ll use Python (coz its a python meetup :p).
NEED OF WEB SCRAPING
But I Can Just Copy/Paste the Data
What about a thousand webpages or even more.
When no API is provided or there is only limited
number of requests.
Online tools with less customizations.
Learn something new and be your own boss!
USAGE
Real Life Used Cases
Web Crawlers
E-Commerce price comparer.
Preparing dataset for your ML model.
Scraping Social Media Profiles.
Weather Data.
(Sky’s the limit)
WORKFLOW & LIBRARIES
Steps and Tools Involved
Send Request and Load the webpage.
(Requests, urllib, httplib)
Parse the content for desired data.
(Beautiful Soup, re, Scrapy)
Store the data the way you want.
LET’S SCRAPE SOME DATA
RULES OF WEB SCRAPING
Beware!
Don’t crawl at disruptive rate.
Read T&C of Use.
Data is valuable use it wisely.

More Related Content

What's hot

What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
Brijesh Prajapati
 
Web Scraping Basics
Web Scraping BasicsWeb Scraping Basics
Web Scraping Basics
Kyle Banerjee
 
What is Web-scraping?
What is Web-scraping?What is Web-scraping?
What is Web-scraping?
Yu-Chang Ho
 
Web Scraping and Data Extraction Service
Web Scraping and Data Extraction ServiceWeb Scraping and Data Extraction Service
Web Scraping and Data Extraction Service
PromptCloud
 
Web scraping
Web scrapingWeb scraping
Web scraping
Ashley Davis
 
Web Scraping
Web ScrapingWeb Scraping
Web Scraping
primeteacher32
 
Scraping data from the web and documents
Scraping data from the web and documentsScraping data from the web and documents
Scraping data from the web and documents
Tommy Tavenner
 
Skillshare - Introduction to Data Scraping
Skillshare - Introduction to Data ScrapingSkillshare - Introduction to Data Scraping
Skillshare - Introduction to Data Scraping
School of Data
 
Web development | Derin Dolen
Web development | Derin Dolen Web development | Derin Dolen
Web development | Derin Dolen
Derin Dolen
 
Web Development
Web DevelopmentWeb Development
Web Development
Aditya Raman
 
Intro to beautiful soup
Intro to beautiful soupIntro to beautiful soup
Intro to beautiful soup
Andreas Chandra
 
Front end web development
Front end web developmentFront end web development
Front end web development
viveksewa
 
presentation in html,css,javascript
presentation in html,css,javascriptpresentation in html,css,javascript
presentation in html,css,javascript
FaysalAhammed5
 
Beautiful soup
Beautiful soupBeautiful soup
Beautiful soup
DeepakRaghavan4
 
Web Scrapping Using Python
Web Scrapping Using PythonWeb Scrapping Using Python
Web Scrapping Using Python
ComputerScienceJunct
 
Web Development
Web DevelopmentWeb Development
Web Development
Lena Petsenchuk
 
Introduction to web development
Introduction to web developmentIntroduction to web development
Introduction to web developmentMohammed Safwat
 
ppt of web development for diploma student
ppt of web development for diploma student ppt of web development for diploma student
ppt of web development for diploma student
Abhishekchauhan863165
 
Introduction to Web Development
Introduction to Web DevelopmentIntroduction to Web Development
Introduction to Web Development
Parvez Mahbub
 

What's hot (20)

What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
Web Scraping Basics
Web Scraping BasicsWeb Scraping Basics
Web Scraping Basics
 
What is Web-scraping?
What is Web-scraping?What is Web-scraping?
What is Web-scraping?
 
Web Scraping and Data Extraction Service
Web Scraping and Data Extraction ServiceWeb Scraping and Data Extraction Service
Web Scraping and Data Extraction Service
 
Web scraping
Web scrapingWeb scraping
Web scraping
 
Web Scraping
Web ScrapingWeb Scraping
Web Scraping
 
Scraping data from the web and documents
Scraping data from the web and documentsScraping data from the web and documents
Scraping data from the web and documents
 
Skillshare - Introduction to Data Scraping
Skillshare - Introduction to Data ScrapingSkillshare - Introduction to Data Scraping
Skillshare - Introduction to Data Scraping
 
Web development | Derin Dolen
Web development | Derin Dolen Web development | Derin Dolen
Web development | Derin Dolen
 
Web Development
Web DevelopmentWeb Development
Web Development
 
Intro to beautiful soup
Intro to beautiful soupIntro to beautiful soup
Intro to beautiful soup
 
Front end web development
Front end web developmentFront end web development
Front end web development
 
presentation in html,css,javascript
presentation in html,css,javascriptpresentation in html,css,javascript
presentation in html,css,javascript
 
Beautiful soup
Beautiful soupBeautiful soup
Beautiful soup
 
Web Scrapping Using Python
Web Scrapping Using PythonWeb Scrapping Using Python
Web Scrapping Using Python
 
Web Development
Web DevelopmentWeb Development
Web Development
 
Web Crawler
Web CrawlerWeb Crawler
Web Crawler
 
Introduction to web development
Introduction to web developmentIntroduction to web development
Introduction to web development
 
ppt of web development for diploma student
ppt of web development for diploma student ppt of web development for diploma student
ppt of web development for diploma student
 
Introduction to Web Development
Introduction to Web DevelopmentIntroduction to Web Development
Introduction to Web Development
 

Similar to Introduction to Web Scraping using Python and Beautiful Soup

Mashup Application at Barcampbkk2
Mashup Application at Barcampbkk2Mashup Application at Barcampbkk2
Mashup Application at Barcampbkk2bunthidj
 
Scrappy
ScrappyScrappy
Scrappy
Vishwas N
 
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyWeb scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
LITTINRAJAN
 
Living On A Cloud, Dr Keith Marlow
Living On A Cloud, Dr Keith MarlowLiving On A Cloud, Dr Keith Marlow
Living On A Cloud, Dr Keith Marlow
guesta04b0
 
Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy cabral   search marketing summit - scraping data-driven content (1)Jeremy cabral   search marketing summit - scraping data-driven content (1)
Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy Cabral
 
Api strategy and practice
Api strategy and practiceApi strategy and practice
Api strategy and practiceritc
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014
openi_ict
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards   25-6-2014API Athens Meetup - API standards   25-6-2014
API Athens Meetup - API standards 25-6-2014Michael Petychakis
 
Web 2.0 for IA's
Web 2.0 for IA'sWeb 2.0 for IA's
Web 2.0 for IA's
Dave Malouf
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
Matthew Russell
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
Eli White
 
Semantic Web Science
Semantic Web ScienceSemantic Web Science
Semantic Web Science
James Hendler
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
Matthew Russell
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
Mahmoud Yasser
 
CSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
CSOM (Client Side Object Model). Explained @ SharePoint Saturday HoustonCSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
CSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
Kunaal Kapoor
 
So You Want to Be a SharePoint Developer - SPS Utah 2015
So You Want to Be a SharePoint Developer - SPS Utah 2015So You Want to Be a SharePoint Developer - SPS Utah 2015
So You Want to Be a SharePoint Developer - SPS Utah 2015
Ryan Schouten
 
Leading Your Business To Success & The Cloud
Leading Your Business To Success & The CloudLeading Your Business To Success & The Cloud
Leading Your Business To Success & The Cloud
Richard Harbridge
 
Api strategy and practice
Api strategy and practiceApi strategy and practice
Api strategy and practice
Dave Goldberg
 
Services, Apps and the API Powered Web
Services, Apps and the API Powered WebServices, Apps and the API Powered Web
Services, Apps and the API Powered Web
Steven Willmott
 

Similar to Introduction to Web Scraping using Python and Beautiful Soup (20)

Mashup Application at Barcampbkk2
Mashup Application at Barcampbkk2Mashup Application at Barcampbkk2
Mashup Application at Barcampbkk2
 
Scrappy
ScrappyScrappy
Scrappy
 
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyWeb scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
 
Living On A Cloud, Dr Keith Marlow
Living On A Cloud, Dr Keith MarlowLiving On A Cloud, Dr Keith Marlow
Living On A Cloud, Dr Keith Marlow
 
Jeremy cabral search marketing summit - scraping data-driven content (1)
Jeremy cabral   search marketing summit - scraping data-driven content (1)Jeremy cabral   search marketing summit - scraping data-driven content (1)
Jeremy cabral search marketing summit - scraping data-driven content (1)
 
Api strategy and practice
Api strategy and practiceApi strategy and practice
Api strategy and practice
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards 25-6-2014
 
API Athens Meetup - API standards 25-6-2014
API Athens Meetup - API standards   25-6-2014API Athens Meetup - API standards   25-6-2014
API Athens Meetup - API standards 25-6-2014
 
Web 2.0 for IA's
Web 2.0 for IA'sWeb 2.0 for IA's
Web 2.0 for IA's
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
 
Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011Big data and APIs for PHP developers - SXSW 2011
Big data and APIs for PHP developers - SXSW 2011
 
Semantic Web Science
Semantic Web ScienceSemantic Web Science
Semantic Web Science
 
Mining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started GuideMining the Social Web for Fun and Profit: A Getting Started Guide
Mining the Social Web for Fun and Profit: A Getting Started Guide
 
Null 1
Null 1Null 1
Null 1
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
CSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
CSOM (Client Side Object Model). Explained @ SharePoint Saturday HoustonCSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
CSOM (Client Side Object Model). Explained @ SharePoint Saturday Houston
 
So You Want to Be a SharePoint Developer - SPS Utah 2015
So You Want to Be a SharePoint Developer - SPS Utah 2015So You Want to Be a SharePoint Developer - SPS Utah 2015
So You Want to Be a SharePoint Developer - SPS Utah 2015
 
Leading Your Business To Success & The Cloud
Leading Your Business To Success & The CloudLeading Your Business To Success & The Cloud
Leading Your Business To Success & The Cloud
 
Api strategy and practice
Api strategy and practiceApi strategy and practice
Api strategy and practice
 
Services, Apps and the API Powered Web
Services, Apps and the API Powered WebServices, Apps and the API Powered Web
Services, Apps and the API Powered Web
 

Recently uploaded

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 

Recently uploaded (20)

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 

Introduction to Web Scraping using Python and Beautiful Soup

  • 1. INTRODUCTION TO WEB SCRAPING USING PYTHON Tushar Mittal @techytushar
  • 2. AGENDA What we’ll do What is Web Scraping? Need of Web Scraping. Real Life Used Cases. Workflow and Libraries used. Demo (Scrape a Website) Rules of Web Scraping.
  • 3. WEB SCRAPING What is it? Web Scraping is a technique to fetch data and information from websites. Everything you see on a webpage can be scraped. Can be done in most programming languages, we’ll use Python (coz its a python meetup :p).
  • 4. NEED OF WEB SCRAPING But I Can Just Copy/Paste the Data What about a thousand webpages or even more. When no API is provided or there is only limited number of requests. Online tools with less customizations. Learn something new and be your own boss!
  • 5. USAGE Real Life Used Cases Web Crawlers E-Commerce price comparer. Preparing dataset for your ML model. Scraping Social Media Profiles. Weather Data. (Sky’s the limit)
  • 6. WORKFLOW & LIBRARIES Steps and Tools Involved Send Request and Load the webpage. (Requests, urllib, httplib) Parse the content for desired data. (Beautiful Soup, re, Scrapy) Store the data the way you want.
  • 8. RULES OF WEB SCRAPING Beware! Don’t crawl at disruptive rate. Read T&C of Use. Data is valuable use it wisely.