Scraping Zomato's Top 100 Restaurants using Selenium
Launched in 2010, this technology platform connects customers, restaurant partners and delivery partners, serving their multiple needs.
Customers use the platform to search and discover restaurants, read and write customer-generated reviews, view and upload photos, order food delivery, book a table and make payments while dining out at restaurants.
On the other hand, Zomato provides restaurant partners with industry-specific marketing tools which enable them to engage and acquire customers to grow their business, while also providing a reliable and efficient last-mile delivery service. If you have not come across "ZOMATO" yet, welcome to planet Earth and do check out Zomato.
Let's walk you through 'WEB SCRAPING'!!
Web scraping is an automated method for obtaining large amounts of data from websites. Most of this data is unstructured data in HTML format, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.
There are many different ways to perform web scraping to obtain data from websites. These include using online services, particular APIs, or even creating your own web scraping code from scratch. Many large websites, like Google, Twitter, Facebook, StackOverflow, etc., have APIs that allow you to access their data in a structured format.
This is the best option, but other sites don't allow users to access large amounts of data in a structured form, or are simply not that technologically advanced. In that situation, it's best to use web scraping to scrape the website for data. To learn more, check out web scraping.
Objective:
Scrape the top 100 listings on Zomato by parsing the information from the website into tabular data.
List of details we are looking for on the website:
1- Top 100 listings of restaurants for each location.
2- The 'Name' of the restaurants for each location.
3- The 'Ratings' of dining at the restaurants for each location.
4- The 'Link' of the restaurants for each location.
Outline of the project:
1- Understanding the structure of Zomato's website.
2- Installing and importing the required libraries.
3- Simulating the page and extracting the names, ratings and URLs of different restaurants from the website using Selenium.
4- Accessing each restaurant and building a method to locate the exact location of the restaurant name, ratings and URL for the top 100 places.
5- Parsing the top 100 restaurants for each location, consisting of the name of the place, its dining rating and its link, using helper functions.
6- Storing the extracted data in a dictionary.
7- Compiling all the data into a DataFrame using pandas and saving the data into a CSV file.
Use the "Run" button to execute the code.
By the end of the project we will create a DataFrame in the following format:
Project Code On Replit
The code used for this project is publicly available on the Replit platform. Feel free to explore the code and make changes to improve it and make it more efficient. Let's get on the road and identify how the details are fetched and scraped for this project.
Replit Platform
The List Of Packages Used
FIRST -- SELENIUM -- what is selenium
SECOND -- PANDAS -- what is pandas
THIRD -- TIME -- why do we use TIME
FOURTH -- OS -- why do we use OS
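If any of these packages are missing, they can be installed in a notebook cell as sketched below. Note that time and os ship with Python, so only selenium and pandas need installing, and Selenium additionally needs a matching Chrome/chromedriver on the machine:
!pip install selenium pandas --quiet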
Let's Discuss The Steps In The Project
1ST STEP
At the beginning of the project, we import the required packages, as shown below:
import os
import pandas as pd
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
2ND STEP
Let's create a function to build the web driver that we will use to extract webpage information. The driver function is as follows:
def get_driver():
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-dev-shm-usage')
    # to access Zomato's website we need to set up a 'user-agent'; we can't
    # access the site without sending a standard 'user-agent'. Learn more about user-agent setup
    user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'
    chrome_options.add_argument('user-agent={0}'.format(user_agent))
    driver = webdriver.Chrome(options=chrome_options)
    return driver
# calling the driver to carry out further steps
driver = get_driver()
3RD STEP
Creating a helper function to get the list of restaurant names from the website. We call it 'res_name(driver)':
def res_name(driver):
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    res_names = []
    for i in tags:
        res_names.append(i.find_element(By.XPATH, ".//div/section/div[1]/a").text)
    return res_names[:100]
# Here we fetch the common class name holding the details of all the required restaurants,
# then use a common XPATH to read the 'NAME' of each place. To learn about XPATH, click here.
# We need to understand the HTML code structure before we scrape any website. To learn about HTML, click here.
A Little Brief On HTML And XPATH
Before we go deeper into the explanation of the code, it is imperative that readers have a basic understanding of HTML, the language of the web, and XPaths, which are used to navigate through elements and attributes in an HTML/XML document. HTML (HyperText Markup Language) is the code used to structure a web page and its content. For example, content could be structured within a set of paragraphs, a list of bulleted points, or using images and data tables. We will be using XPaths to point to tags, attributes and elements of an HTML webpage to extract the required information: in our case, restaurant names, ratings, URLs, etc. To avoid putting too much information into one notebook, and to save time for readers who are already familiar with HTML and XPaths, we will keep this overview brief.
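To make XPaths concrete, here is a small self-contained sketch. The markup below is made up for illustration, it is not Zomato's real HTML, but it shows how a relative XPath like the one in res_name walks down the tags:
from lxml import html

# hypothetical, simplified listing-card markup (NOT Zomato's real HTML)
doc = html.fromstring("""
<div class="card">
  <div>
    <section>
      <div><a href="/restaurant-1">Cafe Example</a></div>
      <div class="rating">4.2</div>
    </section>
  </div>
</div>
""")

# the same relative XPath used in res_name: descend into div/section,
# take its first inner div, and read the anchor's text
names = doc.xpath(".//div/section/div[1]/a/text()")
print(names)  # ['Cafe Example']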
4TH STEP
Creating a helper function to get the list of URLs from the website. We call it 'res_url(driver)':
def res_url(driver):
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    urls = []
    for i in tags:
        urls.append(i.find_element(By.TAG_NAME, "a").get_attribute('href'))
    return urls[:100]
5TH STEP
Creating a helper function to get the list of ratings from the website. We call it 'res_ratings(driver)':
def res_ratings(driver):
    place_divs_tag = 'sc-bke1zw-0'
    places = driver.find_element(By.CLASS_NAME, place_divs_tag)
    tags = places.find_elements(By.CLASS_NAME, 'sc-bke1zw-1')
    ratings = []
    for i in tags:
        try:
            ratings.append(i.find_element(By.CLASS_NAME, 'sc-1q7bklc-5').text)
        except NoSuchElementException:
            ratings.append('.')
    return ratings[:100]
To avoid running into an exception while running this code, we make use of 'try and except'. Here you can learn more about try and except.
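As a side note, the same guard can be pulled out into a small reusable helper; a minimal sketch (the name safe_text is our own, not part of the project code):
# illustrative helper: return the text of a child element,
# or a placeholder if that element is missing from the card
def safe_text(element, class_name, default='.'):
    try:
        return element.find_element(By.CLASS_NAME, class_name).text
    except NoSuchElementException:
        return default

# inside res_ratings, the loop body could then read:
# ratings.append(safe_text(i, 'sc-1q7bklc-5'))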
6TH STEP
We create a parser function named "get_all_cities()" to extract the required details (NAME, RATINGS, LINK) from the website in the form of a dictionary. The function works irrespective of the location, e.g. mumbai, pune, bangalore, delhi, chandigarh, etc. By creating this function we get the required details, which was the objective of this project, and it can be done for any location; in this case we are scraping for Mumbai, Bangalore and Pune.
def get_all_cities():
    cities = ['mumbai', 'bangalore', 'pune']
    dic = {'NAME': [], 'RATINGS': [], 'LINK': []}
    for i in cities:
        base_url = 'https://www.zomato.com/' + i + '/great-food-no-bull'
        driver.get(base_url)
        dic['NAME'].extend(res_name(driver))
        dic['RATINGS'].extend(res_ratings(driver))
        dic['LINK'].extend(res_url(driver))
    return dic
7TH STEP
We create a pandas DataFrame of the parsed data, export it to a CSV file named best100.csv, and achieve the expected result as shown again below.
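The notebook shows only the resulting table; a minimal sketch of this step, reusing the get_all_cities() function from the 6th step, could look like this:
# build the DataFrame from the parsed dictionary and save it as best100.csv
data = get_all_cities()
best100_df = pd.DataFrame(data)
best100_df.to_csv('best100.csv', index=False)
best100_df.head()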
SUMMARY
It is quite fascinating how much ease web scraping brings to the life of coders. Summing up, we essentially built the code in the following steps:
- We set up the required packages: selenium, pandas, time and os.
- We created helper functions to get the names, ratings and URLs for the top 100 listings.
- We created a parser function to get the names, ratings and URLs of the top 100 listings for three locations in dictionary form.
- We created the DataFrame and saved the work in CSV format.
FUTURE WORK
- The code can be adapted to fetch the same details for different locations simply by changing the cities list.
- More details, like a restaurant's contact number, can be fetched using the proper path for each place.
- We can identify the most consistent restaurants, those likely to stay in the top 50 listings on average, and provide useful insight for investors who would like to invest in successful restaurants.
- This can be done for any location, which is the beauty of this project, and it can always be improved over time.
- The project can be set up on a service like AWS Lambda for automatic timed scraping; a rough sketch follows below.
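As an illustration of that last point, an AWS Lambda entry point could wrap the scraper as below. This is a sketch only: it assumes the project code is packaged as a module named scraper together with a headless-Chrome layer, neither of which this article sets up.
import json

def lambda_handler(event, context):
    # 'scraper' is an assumed module name wrapping get_driver() and
    # get_all_cities() from this project; adjust to your own packaging
    from scraper import get_all_cities
    data = get_all_cities()
    # store or forward the results here, e.g. upload the CSV to S3
    return {'statusCode': 200, 'body': json.dumps({'rows': len(data['NAME'])})}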
REFERENCES
Jovian: Web Scraping with Python
Web scraping
Complete guide on Selenium
HTML Tutorial
XPATH
pandas
Exceptions
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="web-scraping-project")
[jovian] Updating notebook "hai-advisoryservices/web-scraping-project" on https://jovian.ai
[jovian] Committed successfully! https://jovian.ai/hai-advisoryservices/web-scraping-project
'https://jovian.ai/hai-advisoryservices/web-scraping-project'