PANDAS APPLICATION
AUTHOR
NAME :- Soham Chakraborty
COLLEGE :- Techno India University
COURSE :- B.Sc Data Science
SEMESTER :- 4th sem, 2nd year
YEAR OF PASSING :- 2021
e-mail ID :- sohamchakraborty777@gmail.com
CONTENT
• Introduction
• Python
• Libraries
• Integrated development environment
• Problem statement
• Solution
• Code
• Output
• Source
• Conclusion
INTRODUCTION
Machine learning is a subset of artificial intelligence that often uses statistical
techniques to give computers the ability to "learn" from data without being explicitly
programmed. Machine learning helps us analyse large amounts of data in less time and
with good accuracy. Madrid's pollution levels vary widely and can change drastically
within days. Machine learning helps us estimate the level of one gas by analysing the
levels of other gases.
PYTHON
• Python is a high-level, general-purpose, open-source, strongly typed programming
language. The language provides constructs intended to enable clear programs on
both a small and a large scale.
• Python was created by Guido van Rossum.
• The Python Software Foundation (PSF) is the organization behind Python.
• Versions current at the time of writing:
• 3.6.3
• 2.7.14
• Python features
• Some of the features of Python include
• Dynamic
• Object oriented
• Multipurpose
• Strongly typed
• Open source
• Python is widely used in many domains
• Web Development
• Data Analysis
• Machine Learning
• Internet Of Things
• GUI Development
• Image processing
• Data visualization
• Game Development
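The "Dynamic" and "Strongly typed" features listed above can look contradictory at first. A minimal sketch of the difference follows; the variable names are illustrative only and not part of the original slides.

```python
# Dynamic typing: a name can be rebound to a value of a different type.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__

# Strong typing: unrelated types are not silently converted into each other.
try:
    result = "1" + 1
except TypeError:
    result = None  # str + int is rejected instead of guessing a conversion

print(first_type, second_type, result)
```

The type belongs to the value rather than to the name, which is why rebinding works while mixing `str` and `int` in one expression does not.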
LIBRARIES
• Pandas
• In computer programming, pandas is a software library written for the Python
programming language for data manipulation and analysis. In particular, it offers data
structures and operations for manipulating numerical tables and time series.
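As a rough illustration of the data structures mentioned above, the sketch below builds a small DataFrame and runs a few common operations on it. The app names and numbers are invented for this example and do not come from the dataset used later.

```python
import pandas as pd

# Hypothetical app data, invented for illustration only.
apps = pd.DataFrame(
    {
        "App Name": ["Alpha", "Beta", "Gamma"],
        "Category": ["TOOLS", "GAMES", "TOOLS"],
        "Reviews": [120.0, 4500.0, 980.0],
    }
)

reviews = apps["Reviews"]                        # a single column is a Series
tools_only = apps[apps["Category"] == "TOOLS"]   # boolean-mask row selection
mean_by_category = apps.groupby("Category")["Reviews"].mean()

print(apps.shape)
print(mean_by_category)
```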
INTEGRATED DEVELOPMENT ENVIRONMENT
• An integrated development environment is a software application that provides
comprehensive facilities to computer programmers for software development. An
IDE normally consists of a source code editor, build automation tools, and a
debugger.
• SPYDER is the Scientific Python Development Environment:
• A powerful interactive development environment for the Python language with advanced
editing, interactive testing, debugging and introspection features.
• And a numerical computing environment thanks to the support of IPython (enhanced
interactive Python interpreter) and popular Python libraries such as NumPy (linear
algebra), SciPy (signal and image processing) or matplotlib (interactive 2D/3D plotting).
PROBLEM STATEMENT
• Real-world data are rarely clean and homogeneous. Values can go missing during data
extraction or collection. Missing values need to be handled because they degrade every
performance metric we measure. They can also lead to wrong predictions or
classifications and can introduce a high bias in whatever model is used.
• Depending on the data source, missing data are marked differently. Pandas represents
missing values as NaN. However, unless the data have been pre-processed, an analyst
will not always encounter missing values as NaN: they can appear as a question mark (?),
a zero (0), a minus one (-1) or a blank. It is therefore important that a data scientist
performs exploratory data analysis (EDA) before writing any machine learning
algorithm. EDA is simply a litmus test for understanding the behaviour of our data.
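One possible way to turn such placeholders into NaN is sketched below. The CSV text, the column names, and the decision to treat -1 as a sentinel are all assumptions made for this example; in real data a 0 or -1 may be a legitimate value, so that choice has to come from EDA.

```python
import io
import pandas as pd

# Hypothetical raw file: "?" and -1 stand in for missing values, one cell is blank.
raw_csv = io.StringIO(
    "app,reviews,rating\n"
    "Alpha,120,4.5\n"
    "Beta,?,3.9\n"
    "Gamma,-1,\n"
    "Delta,980,4.1\n"
)

# Treat "?" as missing at read time; blank cells are read as NaN by default.
df_demo = pd.read_csv(raw_csv, na_values=["?"])

# Assumption for this sketch: -1 is a sentinel, not a real review count.
df_demo["reviews"] = df_demo["reviews"].replace(-1, float("nan"))

missing_per_column = df_demo.isnull().sum()
print(missing_per_column)
```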
SOLUTION
• There are several options for handling missing values, each with its own pros and cons. The right
choice depends largely on the nature of the data and of the missing values. The main options are
summarised below.
1) DROP MISSING VALUES
2) FILL MISSING VALUES WITH A SUMMARY STATISTIC (E.G. MEAN OR MEDIAN)
3) PREDICT MISSING VALUES WITH A MACHINE LEARNING ALGORITHM
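The three options can be sketched on a toy table as follows. The data are invented, and the "prediction" step uses a plain least-squares line from NumPy as a stand-in for a machine learning model; it is an illustration of the idea, not the method used in this presentation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "reviews" has two gaps, "installs" is complete.
toy = pd.DataFrame(
    {
        "installs": [100.0, 200.0, 300.0, 400.0, 500.0],
        "reviews": [10.0, np.nan, 30.0, np.nan, 50.0],
    }
)

# 1) Drop rows that contain a missing value.
dropped = toy.dropna(subset=["reviews"])

# 2) Fill with a summary statistic (here the median of the observed values).
filled = toy["reviews"].fillna(toy["reviews"].median())

# 3) Predict from another column: a least-squares line fitted on complete rows.
known = toy.dropna(subset=["reviews"])
slope, intercept = np.polyfit(known["installs"], known["reviews"], deg=1)
predicted = toy["reviews"].fillna(slope * toy["installs"] + intercept)

print(len(dropped))
print(filled.tolist())
print(predicted.round(2).tolist())
```

Dropping loses rows, filling keeps every row but flattens variation, and predicting keeps more structure at the cost of extra modelling work.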
• Below is a short list of commands for detecting missing values during EDA
1. data_name.describe()
2. data_name.info()
3. data_name.head(x)
4. data_name.isnull().sum()
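The four commands above can be tried on a small frame as in the sketch below. The frame reuses the placeholder name `data_name` from the list, and its contents are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in "Reviews".
data_name = pd.DataFrame(
    {
        "App Name": ["Alpha", "Beta", "Gamma", "Delta"],
        "Reviews": [120.0, np.nan, 980.0, 4500.0],
    }
)

print(data_name.describe())   # count excludes NaN, so a low count hints at gaps
data_name.info()              # prints non-null counts and dtypes per column
print(data_name.head(2))      # first rows, for a quick visual check
missing = data_name.isnull().sum()
print(missing)                # number of missing values per column
```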
CODE
# Import pandas for DataFrame operations
import pandas as pd

# Raw string so the backslashes in the Windows path are taken literally
df = pd.read_csv(r"D:\Data_Sets\Google-Playstore-32K(1).csv")
print("The First 3 Rows of the table are shown as below")
print(df.head(3))
print("Dimension of the acquired Data Frame:", df.shape)
print("Description of the only numeric Column is given below\n", df.describe())
print("Mean for Reviews Column(with missing values):", df["Reviews"].mean())
print("Median for Reviews Column(with missing values):", df["Reviews"].median())
print("Total No. of Reviews:", df["Reviews"].count())

# Check the number of missing values
print("The Number of missing values in each column\n", df.isnull().sum())

# User-defined function to fill the missing numeric values with the median
def fill_with_median(series):
    return series.fillna(series.median())

# Fill the empty cells of the Reviews column
df["Reviews"] = df["Reviews"].transform(fill_with_median)

# Check the number of missing values again:
# there are now no empty cells in the Reviews column
print("The Number of missing values in each column after transforming\n", df.isnull().sum())
print("Mean for Reviews Column(without missing values):", df["Reviews"].mean())
print("Median for Reviews Column(without missing values):", df["Reviews"].median())
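For a single column, the same fill can be written without a helper function or `transform`. The sketch below is an alternative under that assumption, run on a small invented column rather than the real Reviews data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Reviews column, with two gaps.
df_small = pd.DataFrame({"Reviews": [100.0, np.nan, 300.0, 2000.0, np.nan]})

median_reviews = df_small["Reviews"].median()   # NaN values are skipped by default
df_small["Reviews"] = df_small["Reviews"].fillna(median_reviews)

print(median_reviews)
print(df_small["Reviews"].isnull().sum())
```

The median is often preferred over the mean for heavily skewed columns such as review counts, because a few very large values pull the mean far away from a typical app.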
OUTPUT
The First 3 Rows of the table are shown as below
App Name | Category | Rating | Reviews | Installs | Size | Price | Content Rating | Last Updated | Minimum Version | Latest Version
DoorDash – Food Delivery | FOOD_AND_DRINK | 4.548561573 | 305034.0 | 5,000,000+ | Varies with device | 0 | Everyone | 29-Mar-19 | Varies with device | Varies with device
TripAdvisor Hotels... | TRAVEL_AND_LOCAL | 4.400671482 | 1207922.0 | 100,000,000+ | Varies with device | 0 | Everyone | 29-Mar-19 | Varies with device | Varies with device
Peapod | SHOPPING | 3.656329393 | 1967.0 | 100,000+ | 1.4M | 0 | Everyone | 20-Sep-18 | 5.0 and up | 2.2.0
Dimension of the acquired Data Frame: (32000, 11)
OUTPUT
Description of the only numeric Column is given below
            Reviews
count  3.199300e+04
mean   9.850928e+04
std    1.173820e+06
min    1.000000e+00
25%    1.390000e+02
50%    1.464000e+03
75%    1.445100e+04
max    8.621429e+07
Mean for Reviews Column(with missing values): 98509.28305747504
Median for Reviews Column(with missing values): 1464.0
Total No. of Reviews: 31993
OUTPUT
The Number of missing values in each column
App Name           0
Category           0
Rating             0
Reviews            7
Installs           0
Size               0
Price              0
Content Rating     0
Last Updated       0
Minimum Version    0
Latest Version     0
dtype: int64
OUTPUT
The Number of missing values in each column after transforming
App Name           0
Category           0
Rating             0
Reviews            0
Installs           0
Size               0
Price              0
Content Rating     0
Last Updated       0
Minimum Version    0
Latest Version     1
dtype: int64
OUTPUT
Mean for Reviews Column(without missing values): 98488.05440180622
Median for Reviews Column(without missing values): 1464.0
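The drop in the mean after filling can be checked by hand from the figures reported in the outputs: 7 of the 32000 rows were filled with the median, 1464.0, which is far below the original mean. The sketch below only redoes that arithmetic with the printed numbers; it does not read the dataset.

```python
n_total = 32000                   # rows in the DataFrame
n_observed = 31993                # non-missing Reviews before filling
mean_before = 98509.28305747504   # reported mean with missing values
median_fill = 1464.0              # value written into the empty cells

n_filled = n_total - n_observed   # 7 cells were filled
mean_after = (n_observed * mean_before + n_filled * median_fill) / n_total
print(mean_after)
```

The result should agree with the reported mean after filling up to floating-point rounding. The median stays at 1464.0 because the filled cells take exactly the median value.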
SOURCE OF DATASET
• All the data in this dataset come from Kaggle.com, which should be acknowledged for
the data collection. The dataset aims to provide a more convenient format for data
scientists, as well as some enhanced context in a single place.
CONCLUSION
• As the few lines of code above show, using pandas to fill empty cells is quite
simple. This is where the library shines: it makes it easy to manipulate data to get
the required insights.
THANK YOU
