SlideShare a Scribd company logo
1 of 20
Presentation On
“Handling missing values in Machine Learning”
Supervised
Fazle Rabbi
Lecturer
Faculty of Science And Information Technology
Daffodil International University
Undertaken By
Md. Shamim Bhuiyan
ID: 0242310005343002
©Daffodil International University
Handling missing values
in machine learning
WHY VALUES ARE MISSING IN MACHINE LEARNING:
We take data from sometimes sources like kaggle.com, sometimes
we collect from different sources by doing web scrapping
containing missing values in it. But do you think ?
• We can collect data and start doing our work?
• Can we implement machine learning algorithms using that data?
The answer is no. Because the data collected is
unprocessed. Unprocessed data must be containing
some unauthorized data.
• Missing values
• Outliers
• Unstructured manner
• Categorical data(which needs to be converted to numerical
variables)
Let’s have a look at the missing data in the dataset below:
Let’s begin with the methods to solve the problem
1. Delete Rows with Missing Values
One way of handling missing values is the deletion of the rows
or columns having null values. If any columns have more than
half of the values as null then you can drop the entire column. In
the same way, rows can also be dropped if having one or more
columns values as null. Before using this method one thing we
have to keep in mind is that we should not be losing information.
When to delete the rows/column in a dataset?
• If a certain column has many missing values then you can
choose to drop the entire column.
• When you have a huge dataset. Deleting for e.g. 2-3
rows/columns will not make much difference.
• Output results do not depend on the Deleted data.
2. Replacing With Arbitrary Value
We can replace the missing value with some arbitrary value using
fillna().
Ex. In the below code, we are replacing the missing values with
‘0’.As well you can replace any particular column missing values
with some arbitrary value also.
There are two ways to replacing with arbitrary value :
1.Replacing with previous value-Forward fill
2.Replacing with next value-Backward fill
i. Replacing with previous value – Forward fill
We can impute the values with the previous value by using forward
fill. It is mostly used in time series data.
Syntax: df.fillna(method=’fill’)
ii.Replacing with next value – Backward fill
In backward fill, the missing value is imputed using the next value. It is
mostly used in time series data.
3. Interpolation
Missing values can also be imputed using ‘interpolation’.
Pandas interpolate method can be used to replace the
missing values with different interpolation methods like
‘polynomial’, ‘linear’, ‘quadratic’. The default method is
‘linear’.
Syntax: df.interpolate(method=’linear’)
Handling missing values: python code:
We have taken dataset titanic.csv which is
freely available at kaggle.com. This
dataset was taken as it has missing values.
i.Reading the data
import pandas as pd df =
pd.read_csv("train.csv",
usecols=['Age','Fare','Survived'])
df
The dataset is read and used three
columns ‘Age’, ’Fare’, ’Survived’.
ii.Checking if there are missing values
df.isnull().sum()
Output:
Survived 4
Age 179
Fare 2
dtype: int64
iii.Filling missing values with 0
new_df = df.fillna(0)
new_df
4. Filling NaN values with forward fill value
new_df = df.fillna(method=“ffill")
new_df
If we use forward fill that simply means we are forwarding the
previous value where ever we have NaN values.
5. Setting forward fill limit to 1
new_df = df.fillna(method="ffill",limit=1)
new_df
6. Interpolate of missing values
new_df = df.interpolate() df
In this, we were having two values 22 and 26. And in
between value was a NaN value. So that NaN value is
computed by getting the mean of 22 and 26 i.e. 24. In the
same way, other NaN values were also computed.
7. Dropna()
new_df = df.dropna() new_df
Previously we were having 891 rows and after running this
code we are left with 710 rows because some of the rows
were continuing NaN values were dropped.
8. Deleting the rows having all NaN values
new_df = df.dropna(how='all')
new_df
Those rows in which all the values are
NaN values will be deleted. If the row even
has one value even then it will not be
dropped.
We have another technique to handle missing values.
The End

More Related Content

What's hot

Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade offVARUN KUMAR
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionDerek Kane
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Edureka!
 
Data cleaning and visualization
Data cleaning and visualizationData cleaning and visualization
Data cleaning and visualizationTapan Gautam
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and RegressionMegha Sharma
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.Megha Sharma
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and BoostingMohit Rajput
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality ReductionSaad Elbeleidy
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Simplilearn
 
Classification
ClassificationClassification
ClassificationCloudxLab
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisEva Durall
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kambererror007
 

What's hot (20)

Bias and variance trade off
Bias and variance trade offBias and variance trade off
Bias and variance trade off
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
Logistic Regression in Python | Logistic Regression Example | Machine Learnin...
 
KNN
KNN KNN
KNN
 
Data cleaning and visualization
Data cleaning and visualizationData cleaning and visualization
Data cleaning and visualization
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Classification Algorithm.
Classification Algorithm.Classification Algorithm.
Classification Algorithm.
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
Hierarchical Clustering | Hierarchical Clustering in R |Hierarchical Clusteri...
 
Classification
ClassificationClassification
Classification
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Data Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data AnalysisData Visualization in Exploratory Data Analysis
Data Visualization in Exploratory Data Analysis
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 6 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 

Similar to Handling Missing Values for Machine Learning.pptx

Interpolation Missing values.pptx
Interpolation Missing values.pptxInterpolation Missing values.pptx
Interpolation Missing values.pptxRushikeshGore18
 
01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptx01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptxDwijBaxi
 
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxmaxinesmith73660
 
QUEUE in data-structure (classification, working procedure, Applications)
QUEUE in data-structure (classification, working procedure, Applications)QUEUE in data-structure (classification, working procedure, Applications)
QUEUE in data-structure (classification, working procedure, Applications)Mehedi Hasan
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningAmAn Singh
 
Algorithm and Data Structure - Queue
Algorithm and Data Structure - QueueAlgorithm and Data Structure - Queue
Algorithm and Data Structure - QueueAndiNurkholis1
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Congrats ! You got your Data Science Job
Congrats ! You got your Data Science JobCongrats ! You got your Data Science Job
Congrats ! You got your Data Science JobRohit Dubey
 
INTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptINTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptBharatDaiyaBharat
 
Stacks
StacksStacks
StacksAcad
 

Similar to Handling Missing Values for Machine Learning.pptx (20)

Interpolation Missing values.pptx
Interpolation Missing values.pptxInterpolation Missing values.pptx
Interpolation Missing values.pptx
 
01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptx01-Introduction of DSA-1.pptx
01-Introduction of DSA-1.pptx
 
MyStataLab Assignment Help
MyStataLab Assignment HelpMyStataLab Assignment Help
MyStataLab Assignment Help
 
ml ppt.pptx
ml ppt.pptxml ppt.pptx
ml ppt.pptx
 
Data structures
Data structuresData structures
Data structures
 
Numerical data.
Numerical data.Numerical data.
Numerical data.
 
Queues
QueuesQueues
Queues
 
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docxConsider this code using the ArrayBag of Section 5.2 and the Locat.docx
Consider this code using the ArrayBag of Section 5.2 and the Locat.docx
 
QUEUE in data-structure (classification, working procedure, Applications)
QUEUE in data-structure (classification, working procedure, Applications)QUEUE in data-structure (classification, working procedure, Applications)
QUEUE in data-structure (classification, working procedure, Applications)
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
4. method overloading
4. method overloading4. method overloading
4. method overloading
 
Algorithm and Data Structure - Queue
Algorithm and Data Structure - QueueAlgorithm and Data Structure - Queue
Algorithm and Data Structure - Queue
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Congrats ! You got your Data Science Job
Congrats ! You got your Data Science JobCongrats ! You got your Data Science Job
Congrats ! You got your Data Science Job
 
interenship.pptx
interenship.pptxinterenship.pptx
interenship.pptx
 
Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
Queue and its operations
Queue and its operationsQueue and its operations
Queue and its operations
 
INTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.pptINTRODUCTION TO BOOSTING.ppt
INTRODUCTION TO BOOSTING.ppt
 
Stacks
StacksStacks
Stacks
 
05-stack_queue.ppt
05-stack_queue.ppt05-stack_queue.ppt
05-stack_queue.ppt
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 

Handling Missing Values for Machine Learning.pptx

  • 1. Presentation On “Handling missing values in Machine Learning” Supervised Fazle Rabbi Lecturer Faculty of Science And Information Technology Daffodil International University Undertaken By Md. Shamim Bhuiyan ID: 0242310005343002 ©Daffodil International University
  • 2. Handling missing values in machine learning
  • 3. WHY VALUES ARE MISSING IN MACHINE LEARNING: We take data from sometimes sources like kaggle.com, sometimes we collect from different sources by doing web scrapping containing missing values in it. But do you think ? • We can collect data and start doing our work? • Can we implement machine learning algorithms using that data?
  • 4. The answer is no. Because the data collected is unprocessed. Unprocessed data must be containing some unauthorized data. • Missing values • Outliers • Unstructured manner • Categorical data(which needs to be converted to numerical variables)
  • 5. Let’s have a look at the missing data in the dataset below:
  • 6. Let’s begin with the methods to solve the problem 1. Delete Rows with Missing Values One way of handling missing values is the deletion of the rows or columns having null values. If any columns have more than half of the values as null then you can drop the entire column. In the same way, rows can also be dropped if having one or more columns values as null. Before using this method one thing we have to keep in mind is that we should not be losing information.
  • 7. When to delete the rows/column in a dataset? • If a certain column has many missing values then you can choose to drop the entire column. • When you have a huge dataset. Deleting for e.g. 2-3 rows/columns will not make much difference. • Output results do not depend on the Deleted data.
  • 8. 2. Replacing With Arbitrary Value We can replace the missing value with some arbitrary value using fillna(). Ex. In the below code, we are replacing the missing values with ‘0’.As well you can replace any particular column missing values with some arbitrary value also. There are two ways to replacing with arbitrary value : 1.Replacing with previous value-Forward fill 2.Replacing with next value-Backward fill
  • 9. i. Replacing with previous value – Forward fill We can impute the values with the previous value by using forward fill. It is mostly used in time series data. Syntax: df.fillna(method=’fill’)
  • 10. ii.Replacing with next value – Backward fill In backward fill, the missing value is imputed using the next value. It is mostly used in time series data.
  • 11. 3. Interpolation Missing values can also be imputed using ‘interpolation’. Pandas interpolate method can be used to replace the missing values with different interpolation methods like ‘polynomial’, ‘linear’, ‘quadratic’. The default method is ‘linear’. Syntax: df.interpolate(method=’linear’)
  • 12. Handling missing values: python code: We have taken dataset titanic.csv which is freely available at kaggle.com. This dataset was taken as it has missing values. i.Reading the data import pandas as pd df = pd.read_csv("train.csv", usecols=['Age','Fare','Survived']) df The dataset is read and used three columns ‘Age’, ’Fare’, ’Survived’.
  • 13. ii.Checking if there are missing values df.isnull().sum() Output: Survived 4 Age 179 Fare 2 dtype: int64 iii.Filling missing values with 0 new_df = df.fillna(0) new_df
  • 14. 4. Filling NaN values with forward fill value new_df = df.fillna(method=“ffill") new_df If we use forward fill that simply means we are forwarding the previous value where ever we have NaN values.
  • 15. 5. Setting forward fill limit to 1 new_df = df.fillna(method="ffill",limit=1) new_df
  • 16. 6. Interpolate of missing values new_df = df.interpolate() df In this, we were having two values 22 and 26. And in between value was a NaN value. So that NaN value is computed by getting the mean of 22 and 26 i.e. 24. In the same way, other NaN values were also computed.
  • 17. 7. Dropna() new_df = df.dropna() new_df Previously we were having 891 rows and after running this code we are left with 710 rows because some of the rows were continuing NaN values were dropped.
  • 18. 8. Deleting the rows having all NaN values new_df = df.dropna(how='all') new_df Those rows in which all the values are NaN values will be deleted. If the row even has one value even then it will not be dropped.
  • 19. We have another technique to handle missing values.