SlideShare a Scribd company logo
1 of 7
Collect
Data???
API
Based
Web
Scrapping
In many cases where a website wants you to have access to their data, they will have a web-based
API that makes downloading the data easier than reading it from an HTML page.
One big advantage of using an API is that it is an officially sanctioned method of getting access to
the data. Aside from the legal protections, using an API means that the website owners and
programmers know that people are using it, and won’t change the API without a lot of advance
warning. If you are scraping a webpage and depending on it’s format to remain unchanged so that
you can find the data you want, any time the website gets an update your web-scraper will break!
Web scraping is the practice of using a computer program to sift through a web page and gather the data
that you need in a format most useful to you while at the same time preserving the structure of the data.
However, before learning how to do this, you should understand that some websites put legal (and ethical)
restrictions on the data they provide. In some cases a website intends the data to be seen by humans, but
do not want a program to download the data automatically. Website owners have several legitimate reasons
for not wanting you to download their data automatically. In some cases the website owners worry about
programs overloading their server.
Scrapy BeautifulSoup RegEx Jsoup
• response.xpath('//title/text()').extract_first()XPath
• response.css('title::text').extract()CSS Path
• response.css('title::text').re(r'LG.*')Regex
CSV
JSON
XML
Text

More Related Content

Similar to Web Scraping

Datasets, APIs, and Web Scraping
Datasets, APIs, and Web ScrapingDatasets, APIs, and Web Scraping
Datasets, APIs, and Web ScrapingDamian T. Gordon
 
What are the different types of web scraping approaches
What are the different types of web scraping approachesWhat are the different types of web scraping approaches
What are the different types of web scraping approachesAparna Sharma
 
Instagram Scraping Using Selenium.docx
Instagram Scraping Using Selenium.docxInstagram Scraping Using Selenium.docx
Instagram Scraping Using Selenium.docxRohitBatta4
 
Is web scraping legal or not?
Is web scraping legal or not?Is web scraping legal or not?
Is web scraping legal or not?Aparna Sharma
 
A_Complete_Guide_to_API_Development.pdf
A_Complete_Guide_to_API_Development.pdfA_Complete_Guide_to_API_Development.pdf
A_Complete_Guide_to_API_Development.pdfPamRobert
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...Techugo
 
Top 13 web scraping tools in 2022
Top 13 web scraping tools in 2022Top 13 web scraping tools in 2022
Top 13 web scraping tools in 2022Aparna Sharma
 
Services, Apps and the API Powered Web
Services, Apps and the API Powered WebServices, Apps and the API Powered Web
Services, Apps and the API Powered WebSteven Willmott
 
APIs: the Glue of Cloud Computing
APIs: the Glue of Cloud ComputingAPIs: the Glue of Cloud Computing
APIs: the Glue of Cloud Computing3scale
 
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...Techugo Inc
 
An SEO optimized website is best charged up.pdf
An SEO optimized website is best charged up.pdfAn SEO optimized website is best charged up.pdf
An SEO optimized website is best charged up.pdfMindfire LLC
 
Biggest Challenges behind SERP Scraping in 2023
Biggest Challenges behind SERP Scraping in 2023Biggest Challenges behind SERP Scraping in 2023
Biggest Challenges behind SERP Scraping in 2023sonu jain
 
Shane Media DMA - Essential SEO Tools For Agencies
Shane Media  DMA - Essential SEO Tools For AgenciesShane Media  DMA - Essential SEO Tools For Agencies
Shane Media DMA - Essential SEO Tools For AgenciesShane Media DMA
 
Guide To API Development.pdf
Guide To API Development.pdfGuide To API Development.pdf
Guide To API Development.pdfTechugo
 
iProspect - Tech SEO - Task - 17/12/2019
iProspect - Tech SEO - Task - 17/12/2019iProspect - Tech SEO - Task - 17/12/2019
iProspect - Tech SEO - Task - 17/12/2019Nick Samuel
 

Similar to Web Scraping (20)

Datasets, APIs, and Web Scraping
Datasets, APIs, and Web ScrapingDatasets, APIs, and Web Scraping
Datasets, APIs, and Web Scraping
 
What are the different types of web scraping approaches
What are the different types of web scraping approachesWhat are the different types of web scraping approaches
What are the different types of web scraping approaches
 
Instagram Scraping Using Selenium.docx
Instagram Scraping Using Selenium.docxInstagram Scraping Using Selenium.docx
Instagram Scraping Using Selenium.docx
 
Is web scraping legal or not?
Is web scraping legal or not?Is web scraping legal or not?
Is web scraping legal or not?
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
A_Complete_Guide_to_API_Development.pdf
A_Complete_Guide_to_API_Development.pdfA_Complete_Guide_to_API_Development.pdf
A_Complete_Guide_to_API_Development.pdf
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...
Guide To API Development – Cost, Importance, Types, Tools, Terminology, and B...
 
Top 13 web scraping tools in 2022
Top 13 web scraping tools in 2022Top 13 web scraping tools in 2022
Top 13 web scraping tools in 2022
 
Services, Apps and the API Powered Web
Services, Apps and the API Powered WebServices, Apps and the API Powered Web
Services, Apps and the API Powered Web
 
APIs: the Glue of Cloud Computing
APIs: the Glue of Cloud ComputingAPIs: the Glue of Cloud Computing
APIs: the Glue of Cloud Computing
 
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...
How to Develop APIs - Importance, Types, Tools, Terminology, and Best Practic...
 
An SEO optimized website is best charged up.pdf
An SEO optimized website is best charged up.pdfAn SEO optimized website is best charged up.pdf
An SEO optimized website is best charged up.pdf
 
DFY Suite
DFY SuiteDFY Suite
DFY Suite
 
Biggest Challenges behind SERP Scraping in 2023
Biggest Challenges behind SERP Scraping in 2023Biggest Challenges behind SERP Scraping in 2023
Biggest Challenges behind SERP Scraping in 2023
 
API full form
API full formAPI full form
API full form
 
Shane Media DMA - Essential SEO Tools For Agencies
Shane Media  DMA - Essential SEO Tools For AgenciesShane Media  DMA - Essential SEO Tools For Agencies
Shane Media DMA - Essential SEO Tools For Agencies
 
API.docx
API.docxAPI.docx
API.docx
 
Guide To API Development.pdf
Guide To API Development.pdfGuide To API Development.pdf
Guide To API Development.pdf
 
iProspect - Tech SEO - Task - 17/12/2019
iProspect - Tech SEO - Task - 17/12/2019iProspect - Tech SEO - Task - 17/12/2019
iProspect - Tech SEO - Task - 17/12/2019
 

Recently uploaded

1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 

Recently uploaded (20)

Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 

Web Scraping

  • 1.
  • 3. In many cases where a website wants you to have access to their data, they will have a web-based API that makes downloading the data easier than reading it from an HTML page. One big advantage of using an API is that it is an officially sanctioned method of getting access to the data. Aside from the legal protections, using an API means that the website owners and programmers know that people are using it, and won’t change the API without a lot of advance warning. If you are scraping a webpage and depending on it’s format to remain unchanged so that you can find the data you want, any time the website gets an update your web-scraper will break!
  • 4. Web scraping is the practice of using a computer program to sift through a web page and gather the data that you need in a format most useful to you while at the same time preserving the structure of the data. However, before learning how to do this, you should understand that some websites put legal (and ethical) restrictions on the data they provide. In some cases a website intends the data to be seen by humans, but do not want a program to download the data automatically. Website owners have several legitimate reasons for not wanting you to download their data automatically. In some cases the website owners worry about programs overloading their server. Scrapy BeautifulSoup RegEx Jsoup
  • 5.