By: Anil Khare
Submitted to: DATA FOLKZ
Project-Web Scrapping with EDA
Web Scraping with EDA
Capstone Project 1
Data Analytics
Introduction
• A website can be accessed in many ways. A browser, for example, shows only one page at a time.
• Web scrapers are excellent at gathering and processing large amounts of data.
• Rather than viewing one page at a time, we can view databases spanning thousands of pages at once.
• Web scraping is the practice of gathering data through any means other than a program interacting with an API.
• Web scraping draws on a wide variety of programming techniques and technologies, and is combined here with exploratory data analysis (EDA).
Uses of Web Scraping
• Web scraping is used for:
  • Social media sentiment analysis
  • E-commerce pricing
  • Investment opportunities
  • Machine learning
  • Email address gathering
  • Research and development, and many more applications
The Project – Web Scraping with EDA
• The topic given for the data analysis project was 'Web Scraping with EDA'.
• I chose to scrape the Flipkart website for the various TVs available.
• The problem statement was 'Getting the best proposal of TV with best available features in lowest price.'
• The steps followed for the project:
1. Scrape the Flipkart website for the various TVs available on the site.
2. The Requests library was used to open the URL.
3. The Beautiful Soup library was used to parse the HTML and get the raw data.
4. The data was then cleaned using various techniques in Python and …
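Steps 1 to 3 above can be sketched roughly as follows. The slides do not show the project's code, so this is only an illustration under stated assumptions: the class names `tv-name` and `tv-price` and the sample HTML are hypothetical placeholders rather than Flipkart's real markup, and the standard library's `urllib` and `html.parser` stand in for Requests and Beautiful Soup so that the sketch runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url: str) -> str:
    """Download a page as text (the role Requests played in the project)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


class ListingParser(HTMLParser):
    """Collect the text of elements whose class is 'tv-name' or 'tv-price'.

    Both class names are hypothetical placeholders, not Flipkart's markup.
    """

    FIELDS = {"tv-name": "name", "tv-price": "price"}

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per listing
        self._field = None  # field whose text is expected next, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELDS:
                self._field = self.FIELDS[cls]
                if self._field == "name":  # a name starts a new listing
                    self.rows.append({})

    def handle_data(self, data):
        if self._field and data.strip() and self.rows:
            self.rows[-1][self._field] = data.strip()
            self._field = None


def parse_listings(html: str) -> list:
    """Return a list of {'name': ..., 'price': ...} dicts found in the HTML."""
    parser = ListingParser()
    parser.feed(html)
    return parser.rows


# Made-up sample page; a real run would use fetch_html(<listing URL>) instead.
SAMPLE_HTML = """
<div class="listing"><div class="tv-name">Mi 4X 13</div><div class="tv-price">₹40,999</div></div>
<div class="listing"><div class="tv-name">LG</div><div class="tv-price">₹34,999</div></div>
"""

rows = parse_listings(SAMPLE_HTML)
print(rows)
```

With the libraries named in the slides, `fetch_html` would typically reduce to `requests.get(url).text`, and the parser class to a few `BeautifulSoup(html, "html.parser").select(...)` calls.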
The Data
• A structured dataset of 24 rows and 12 columns was saved in the CSV file 'Capstone 111.csv'. The columns and their data types are:
Column name | Data type
TV Name | object
OS | object
HD | object
Speaker(in_W) | int64
refresh_rate(in_Hz) | int64
USB(in_Nos) | int64
Price(in_Rs) | int64
Rating | float64
HDMI(in_Nos) | int64
pixel | float64
Display size (in inch) | int64
pixel_rows | int64
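The slides do not detail the cleaning step. As a minimal sketch of how raw scraped strings might be turned into some of the typed columns listed above, assume raw values such as '₹40,999' or '20 W'. These raw formats are guesses for illustration, not taken from the project.

```python
import re


def to_int(raw: str) -> int:
    """Keep only the digits, e.g. '₹40,999' -> 40999 or '20 W' -> 20.

    Only suitable for whole-number fields; a decimal point would be dropped.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    if not digits:
        raise ValueError(f"no digits in {raw!r}")
    return int(digits)


def clean_row(raw: dict) -> dict:
    """Convert one raw scraped record into typed values."""
    return {
        "TV Name": raw["TV Name"].strip(),
        "Price(in_Rs)": to_int(raw["Price(in_Rs)"]),
        "Speaker(in_W)": to_int(raw["Speaker(in_W)"]),
        "Rating": float(raw["Rating"]),
    }


# Hypothetical raw record; the string formats are assumptions.
raw_row = {
    "TV Name": " Mi 4X 13 ",
    "Price(in_Rs)": "₹40,999",
    "Speaker(in_W)": "20 W",
    "Rating": "4.4",
}
cleaned = clean_row(raw_row)
print(cleaned)
```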
The Data
TV Name | OS | HD | Speaker(in_W) | refresh_rate(in_Hz) | USB(in_Nos) | Price(in_Rs) | Rating | HDMI(in_Nos) | pixel | Display size (in inch) | pixel_rows
Mi 4X 13 | Android Based | Ultra HD (4K) | 20 | 60 | 2 | 40999 | 4.4 | 3 | 3840 | 55 | 2160
LG | WebOS | Ultra HD (4K) | 20 | 50 | 2 | 34999 | 4.4 | 3 | 3840 | 43 | 2160
Mi 4X 12 | Android | Ultra HD (4K) | 20 | 60 | 2 | 34999 | 4.4 | 3 | 3840 | 50 | 2160
SAMSUNG | Tizen | Full HD | 20 | 60 | 1 | 31999 | 4.4 | 2 | 1920 | 43 | 1080
LG | WebOS | Full HD | 20 | 50 | 1 | 29999 | 4.4 | 2 | 1920 | 43 | 1080
Mi 4X | Android | Ultra HD (4K) | 20 | 60 | 2 | 27999 | 4.4 | 3 | 3840 | 43 | 2160
Mi 4X | Android | Ultra HD (4K) | 20 | 60 | 2 | 27999 | 4.4 | 3 | 3840 | 43 | 2160
iFFALCON by TCL | Android | Ultra HD (4K) | 24 | 60 | 1 | 27999 | 4.3 | 3 | 3840 | 50 | 2160
iFFALCON by TCL 107 | Android | Full HD | 20 | 60 | 1 | 26999 | 4.2 | 2 | 1920 | 43 | 1080
OnePlus Y Series | Android | Full HD | 20 | 60 | 2 | 25499 | 4.3 | 2 | 1920 | 43 | 1080
Mi 4A Pro | Android | Full HD | 20 | 60 | 3 | 24999 | 4.4 | 3 | 1920 | 43 | 1080
Vu Premium | Android | Full HD | 24 | 60 | 2 | 24999 | 4.3 | 2 | 1920 | 43 | 1080
iFFALCON by TCL | Android | Ultra HD (4K) | 24 | 60 | 1 | 24999 | 4.3 | 2 | 3840 | 43 | 2160
realme | Android | Full HD | 24 | 60 | 2 | 23999 | 4.3 | 3 | 1920 | 43 | 1080
Mi 4A | Android | Full HD | 20 | 60 | 2 | 21999 | 4.4 | 3 | 1920 | 40 | 1080
iFFALCON by TCL 10 | Android | Full HD | 20 | 60 | 1 | 19999 | 4.3 | 2 | 1920 | 40 | 1080
SAMSUNG | Tizen | HD Ready | 20 | 60 | 1 | 17999 | 4.3 | 2 | 1366 | 32 | 768
SAMSUNG | Tizen | HD Ready | 20 | 60 | 1 | 17490 | 4.4 | 2 | 1366 | 32 | 768
Box Plot for outliers
As per the box plot, there are no outliers in price in the data.
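A box plot conventionally flags points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule numerically, assuming this convention is what the plot used; the function name `iqr_outliers` and the demo numbers are illustrative and are not the project's prices.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # 'inclusive' interpolates linearly between the sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


demo = [10, 12, 13, 15, 100]  # made-up numbers, not the project's data
print(iqr_outliers(demo))     # [100]
```

Applied to the Price(in_Rs) column, an empty list would correspond to the "no outliers" reading of the plot.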
Minimum & Maximum Price Analysis
Feature | Max_Price | Min_Price
TV Name | Mi 4X 13 | Mi 4A PRO
OS | Android Based | Android
HD | Ultra HD (4K) | HD Ready
Speaker(in_W) | 20 | 20
refresh_rate(in_Hz) | 60 | 60
USB(in_Nos) | 2 | 2
Price(in_Rs) | 40999 | 14499
Rating | 4.4 | 4.4
HDMI(in_Nos) | 3 | 3
pixel | 3840 | 1366
Display size (in inch) | 55 | 32
pixel_rows | 2160 | 768
• The observations from the data are:
  • The minimum price is Rs. 14999/- and the maximum price is Rs. 40999/-.
  • Comparing the maximum- and minimum-priced TVs, only the OS, HD, pixel, pixel rows and display size features differ; the remaining features are the same.
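The comparison above can be reproduced with a small field-by-field check. This is a sketch only: the pairing of each value with a column name assumes the values are listed in the dataset's column order, and `differing_features` is an illustrative helper name.

```python
COLUMNS = [
    "TV Name", "OS", "HD", "Speaker(in_W)", "refresh_rate(in_Hz)",
    "USB(in_Nos)", "Price(in_Rs)", "Rating", "HDMI(in_Nos)", "pixel",
    "Display size (in inch)", "pixel_rows",
]

# Values as listed under Max_Price and Min_Price on the slide.
max_row = dict(zip(COLUMNS, ["Mi 4X 13", "Android Based", "Ultra HD (4K)",
                             20, 60, 2, 40999, 4.4, 3, 3840, 55, 2160]))
min_row = dict(zip(COLUMNS, ["Mi 4A PRO", "Android", "HD Ready",
                             20, 60, 2, 14499, 4.4, 3, 1366, 32, 768]))


def differing_features(a, b, ignore=("TV Name", "Price(in_Rs)")):
    """List the columns whose values differ, skipping the name and the price."""
    return [col for col in a if col not in ignore and a[col] != b[col]]


diff = differing_features(max_row, min_row)
print(diff)
# ['OS', 'HD', 'pixel', 'Display size (in inch)', 'pixel_rows']
```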
Heatmap for the Data
• From the heatmap we can infer that only the following have a strong positive correlation with price:
  • Pixel
  • Display size
  • Pixel rows
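Each cell of such a heatmap is normally a Pearson correlation coefficient between two numeric columns. As a hedged sketch, the coefficient between price and display size can be computed as below. The lists hold only the 18 rows visible in the data table (the slides mention 24 rows), so the result need not equal the value shown on the heatmap.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Price(in_Rs) and Display size (in inch) for the rows shown in the table.
price = [40999, 34999, 34999, 31999, 29999, 27999, 27999, 27999, 26999,
         25499, 24999, 24999, 24999, 23999, 21999, 19999, 17999, 17490]
display = [55, 43, 50, 43, 43, 43, 43, 50, 43,
           43, 43, 43, 43, 43, 40, 40, 32, 32]

r = pearson(price, display)
print(round(r, 2))
```

A value close to +1 indicates a strong positive linear relationship, which is how the heatmap cells are read.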
Comparison HD vs Price vs OS
Ultra HD (4K) TVs have a rating of 4.4, higher than Full HD and HD Ready TVs.
Android TVs are rated 4.3, whereas Android Based, WebOS and Tizen TVs are rated 4.4.
Comparison HD and OS
Ultra HD (4K) TVs are available only with the Android, Android Based and WebOS operating systems.
Tizen is available in Full HD and HD Ready only.
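This kind of cross-check amounts to grouping the operating systems by HD type. The sketch below uses the (OS, HD) pairs of the 18 rows visible in the data table; the slides mention 24 rows, so the full dataset may contain further combinations.

```python
from collections import Counter, defaultdict

# (OS, HD) pairs for the rows shown in the data table.
pairs = (
    [("Android Based", "Ultra HD (4K)")]
    + [("WebOS", "Ultra HD (4K)")]
    + [("Android", "Ultra HD (4K)")] * 5
    + [("Tizen", "Full HD")]
    + [("WebOS", "Full HD")]
    + [("Android", "Full HD")] * 7
    + [("Tizen", "HD Ready")] * 2
)

counts = Counter(pairs)  # how many TVs per (OS, HD) combination

os_by_hd = defaultdict(set)  # which operating systems occur for each HD type
for os_name, hd in pairs:
    os_by_hd[hd].add(os_name)

for hd, systems in os_by_hd.items():
    print(hd, "->", sorted(systems))
```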
Comparison of HD vs Price
Price range for Ultra HD (4K):
• Rs. 25000-30000: 2 nos.
• Rs. 35000: 1 no.
• Above Rs. 40000: 1 no.
Price range for Full HD:
• Rs. 20000-25000: 4 nos.
• Rs. 25000-30000: 3 nos.
• Rs. 30001-35000: 1 no.
Price range for HD Ready:
• Below Rs. 15000: 1 no.
• Rs. 15000-20000: 5 nos.
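Counts like these come from binning the prices and tallying each HD type per bin. A minimal sketch follows, assuming bands 5000 rupees wide with inclusive lower edges (the slide does not say how boundary prices were assigned) and using only the 18 rows visible in the data table, so the tallies need not reproduce every figure above.

```python
from collections import Counter


def price_band(price, width=5000):
    """Label the band containing `price`, e.g. 27999 -> '25000-29999'."""
    low = (price // width) * width
    return f"{low}-{low + width - 1}"


# (HD, Price(in_Rs)) for the rows shown in the data table.
rows = [
    ("Ultra HD (4K)", 40999), ("Ultra HD (4K)", 34999), ("Ultra HD (4K)", 34999),
    ("Full HD", 31999), ("Full HD", 29999), ("Ultra HD (4K)", 27999),
    ("Ultra HD (4K)", 27999), ("Ultra HD (4K)", 27999), ("Full HD", 26999),
    ("Full HD", 25499), ("Full HD", 24999), ("Full HD", 24999),
    ("Ultra HD (4K)", 24999), ("Full HD", 23999), ("Full HD", 21999),
    ("Full HD", 19999), ("HD Ready", 17999), ("HD Ready", 17490),
]

band_counts = Counter((hd, price_band(price)) for hd, price in rows)

for (hd, band), n in sorted(band_counts.items()):
    print(f"{hd:14s} Rs. {band}: {n}")
```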
Comparison Price vs OS
Price range Rs. 25000-30000:
• Android Based: Nil
• Tizen: Nil
• WebOS: 1 no.
• Android: 5 nos.
Price range Rs. 35000:
• Android Based: Nil
• Tizen: Nil
• WebOS: 1 no.
• Android: 1 no.
Price range above Rs. 40000:
• Android Based: Nil
• Tizen: Nil
• WebOS: 1 no.
• Android: 1 no.
The Conclusion
An Ultra HD (4K) TV can be purchased with the Android operating system for Rs. 25000/-.
With WebOS, the price is Rs. 35000/-.