Interpolation Missing values
• Interpolation is a technique for estimating unknown
data points that lie between known data points. In
Python, it is mostly used to impute missing values in
a pandas DataFrame or Series while preprocessing
data, for example before running machine learning
algorithms or when using Python in Power BI.
Interpolation is also used in image processing: when
an image is enlarged, each new pixel value is
estimated from its neighboring pixels.
When to Use Interpolation?
• We can use interpolation to estimate a missing (null)
value from its neighbors. When imputing missing values
with the overall average is a poor fit, interpolation is
the technique most people turn to next.
• Interpolation is mostly used with time-series data,
because there we prefer to fill a missing value from the
values closest to it in time. For example, with
temperature readings we would rather fill today's
missing temperature with the mean of the last 2 days
than with the mean of the whole month. We can also
use interpolation when calculating moving averages.
EXAMPLE
• You can use interpolation when the data has an order or
sequence and you want to estimate a missing value in
that sequence. For example, say train travel has various
classes of tickets, such as first class, second
class, and so on. You would naturally expect a ticket
in a higher class to cost more than one in a
lower class.
• In that case, if the ticket price of an intermediate class
is missing, you can use interpolation to estimate the
missing value.
Using Interpolation to Fill Missing
Values in Series Data
import numpy as np
import pandas as pd

fare = {'first_class': 100,
        'second_class': np.nan,
        'third_class': 60,
        'open_class': 20}
ser = pd.Series(fare)
ser
OUTPUT
first_class     100.0
second_class      NaN
third_class      60.0
open_class       20.0
dtype: float64
Linear Interpolation
• Linear interpolation estimates a missing value
by joining the known points on either side of it
with a straight line. By default, pandas treats
the values as equally spaced.
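As a minimal sketch of linear interpolation, the fare Series from the earlier slide can be filled with pandas' `Series.interpolate()`. The variable name `filled` is an illustrative choice, not part of the original slides.

```python
import numpy as np
import pandas as pd

ser = pd.Series({'first_class': 100,
                 'second_class': np.nan,
                 'third_class': 60,
                 'open_class': 20})

# method="linear" is the default; it ignores the index labels and
# treats the values as equally spaced
filled = ser.interpolate(method="linear")
print(filled)
```

The missing second-class fare lands halfway between its two neighbors, 100 and 60.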
Polynomial Interpolation
• In polynomial interpolation, you need to
specify an order, which is the degree of the
polynomial curve drawn through the available
data points; the missing values are then read
off that curve. With order=2 the curve is
shaped like a parabola, and higher orders
allow more bends.
• In pandas, this method requires SciPy and a
numeric or datetime index. Here a is a Series
that contains missing values:
• a.interpolate(method="polynomial", order=2)
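To illustrate what a degree-2 fit means, the sketch below uses NumPy's `polyfit` and `polyval` rather than the pandas call above, so it runs without SciPy. The data points are hypothetical: they are samples of y = x² with the value at x = 2 left out.

```python
import numpy as np

# Hypothetical data: y = x**2 sampled at x = 0, 1, 3, 4; x = 2 is missing
x_known = np.array([0.0, 1.0, 3.0, 4.0])
y_known = np.array([0.0, 1.0, 9.0, 16.0])

# Fit a degree-2 polynomial (a parabola) through the known points
coeffs = np.polyfit(x_known, y_known, deg=2)

# Read the missing value off the fitted curve
estimate = np.polyval(coeffs, 2.0)
print(round(float(estimate), 6))
```

Because the known points lie exactly on a parabola, the estimate recovers the value a straight line between x = 1 and x = 3 would miss (a linear fill would give 5 instead of 4).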
Interpolation padding
• Interpolation with padding (forward fill) simply means
filling each missing value with the last known value
above it in the dataset. If the missing value is in the
first row, there is nothing above it to copy, so it stays
NaN. The optional limit parameter sets the maximum
number of consecutive NaN values to fill.
• If you are working on a real-world project and want
to fill every missing value with the previous value, leave
limit unset; the default (None) places no cap on the fill.
• a.interpolate(method="pad", limit=2)
• Recent pandas releases deprecate method="pad" in
interpolate() in favor of a.ffill(limit=2).
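A small sketch of padding with a limit, using `Series.ffill()`, which performs the same forward fill. The Series `a` here is hypothetical data chosen to show both the leading-NaN case and the effect of `limit`.

```python
import numpy as np
import pandas as pd

# Hypothetical series: one leading NaN and a run of three NaNs
a = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0])

# Forward fill, but fill at most 2 consecutive NaNs per gap
padded = a.ffill(limit=2)
print(padded)
```

The first value stays NaN because nothing precedes it, and the third NaN in the run stays NaN because the limit of 2 has been reached.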
Application
• Uses of interpolation include helping users
estimate values that fall between the points
they actually collected. Scientists, engineers,
photographers and mathematicians similarly
use it to fit data, analyse trends and so on.
Discretization
• Data discretization is the process of converting
continuous data into discrete buckets by grouping it.
Discretized data is also easier to maintain. Training a
model on discrete data can be faster than training on
the same data in continuous form. Although
continuous-valued data contains more information,
huge amounts of it can slow the model down, and
discretization helps strike a balance between the two.
Two well-known discretization methods are binning
and using a histogram. The main challenge is choosing
the range of each bucket well.
• Here we make use of a function
called pandas.cut(). It segments the values
into bins and sorts each value into its
bin.
• Perform bucketing using the pd.cut() function on
the marks column and display the first 10
rows. The cut() function takes parameters
such as x, bins, and labels, and these are the
three used here. Add the following code to
implement this:
df['bucket'] = pd.cut(df['marks'], 5,
                      labels=['Poor', 'Below_average', 'Average',
                              'Above_Average', 'Excellent'])
df.head(10)
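The slide's snippet assumes a DataFrame `df` with a `marks` column already exists. A self-contained sketch, with hypothetical marks chosen so that none falls exactly on a bin edge, might look like this:

```python
import pandas as pd

# Hypothetical marks; the original slides do not show the data
df = pd.DataFrame({'marks': [0, 10, 30, 50, 70, 90, 100]})

labels = ['Poor', 'Below_average', 'Average', 'Above_Average', 'Excellent']

# bins=5 splits the range from min to max into five equal-width intervals
df['bucket'] = pd.cut(df['marks'], bins=5, labels=labels)
print(df.head(10))
```

With marks spanning 0 to 100, each bucket covers a width of 20, so 0 and 10 fall in the lowest bucket and 90 and 100 in the highest.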
Binning
• Data binning, also known as
bucketing or discretization, is a technique
used in data processing and statistics.
• Binning can be used, for example, when there are
more possible data points than observed data
points.
• Bins do not have to be numerical;
they can be categorical values of any kind, like
"dogs", "cats", and so on.
• Binning is also used in image processing, where it
reduces the amount of data by combining
neighboring pixels into a single pixel: k x k
binning reduces each area of k x k pixels to one
pixel.
• Pandas provides easy ways to create bins and to
bin data.
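The k x k image binning mentioned above can be sketched with NumPy. This is one possible approach under stated assumptions: the "image" is a hypothetical 4 x 4 array, k is 2, and each block is combined by averaging (summing is another common choice).

```python
import numpy as np

k = 2
# Hypothetical 4 x 4 grayscale image with pixel values 0..15
img = np.arange(16, dtype=float).reshape(4, 4)

h, w = img.shape
# Split rows and columns into groups of k, then average within each block
binned = img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
print(binned)
```

The 16 pixels are reduced to 4, each the mean of one 2 x 2 block. The reshape trick assumes the image height and width are both divisible by k.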
Example of Binning
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Ray', 'Jane', 'Kate', 'Nik', 'Autumn',
             'Kasi', 'Mandeep', 'Evan', 'Kyra', 'Jim'],
    'Age': [12, 7, 33, 34, 45, 65, 77, 11, 32, 55]
})
print(df.head())
Output
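• Bin the ages into four quantile-based groups (quartiles)
with pd.qcut(), so that each group holds roughly the
same number of rows:
df['Age Groups'] = pd.qcut(df['Age'], 4)
print(df.head())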
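A self-contained version of the quantile binning step, using the same names and ages as the slide. The `labels` argument and the labels `Q1` to `Q4` are additions for readability; the original call omits them and returns interval labels instead.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ray', 'Jane', 'Kate', 'Nik', 'Autumn',
             'Kasi', 'Mandeep', 'Evan', 'Kyra', 'Jim'],
    'Age': [12, 7, 33, 34, 45, 65, 77, 11, 32, 55]
})

# Split Age at its quartiles; the bin edges come from the data itself
df['Age Groups'] = pd.qcut(df['Age'], 4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
print(df)
```

Unlike `pd.cut()`, which makes equal-width bins, `pd.qcut()` places the edges at quantiles, so the bins have unequal widths but similar counts.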
Outlier Detection
• Outliers are an important part of a dataset. They
can hold useful information about your data.
• In simple terms, an outlier is a data point that is
extremely high or extremely low relative to the
other values in the dataset or graph you're
working with.
• Outliers stand out greatly from the overall
pattern of values.
How to Identify an Outlier in a
Dataset
• IQR (interquartile range) = Q3 - Q1, where Q1 and Q3
are the first and third quartiles
• outlier < Q1 - 1.5(IQR)
• outlier > Q3 + 1.5(IQR)
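The two inequalities above can be applied with NumPy as in the following sketch. The `values` array is hypothetical data with one obviously extreme point, and the quartiles are computed with `np.percentile` using its default interpolation.

```python
import numpy as np

# Hypothetical measurements with one extreme value
values = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

lower = q1 - 1.5 * iqr   # anything below this is flagged
upper = q3 + 1.5 * iqr   # anything above this is flagged

outliers = values[(values < lower) | (values > upper)]
print(outliers)
```

Different tools compute quartiles slightly differently, so the exact fences may vary between libraries for small datasets.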
Random Sampling With and Without
Replacement
• sample() is a built-in function of Python's random module
that returns a list of a given length containing items
chosen from a sequence such as a list, tuple, or string.
It is used for random sampling without replacement.
• Syntax: random.sample(sequence, k)
• Parameters:
sequence: a list, tuple, or string. From Python 3.11
onward a set is no longer accepted and must be
converted to a list or tuple first.
k: an integer that specifies the length of the sample.
• Returns: a new list of k elements chosen from the
sequence.
• Randomly selecting records from a large data set may be
helpful if the data set is so large that it prevents or slows
processing, or if you are conducting a survey and need to
select a random sample from some master database. When
you select records randomly from a larger data set (or some
master database), you can carry out the sampling in a few
different ways, including:
• sampling without replacement, in which a subset of the
observations are selected randomly, and once an
observation is selected it cannot be selected again.
• sampling with replacement, in which a subset of
observations are selected randomly, and an observation
may be selected more than once.
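Both kinds of sampling are available in Python's standard `random` module: `sample()` draws without replacement and `choices()` draws with replacement. A short sketch, where the population and the seed value are illustrative choices:

```python
import random

population = [12, 13, 14, 15, 16, 17, 18]
rng = random.Random(42)  # fixed seed so repeated runs give the same draws

# Without replacement: each item can appear at most once
without_repl = rng.sample(population, k=3)

# With replacement: the same item may be drawn again
with_repl = rng.choices(population, k=10)

print(without_repl)
print(with_repl)
```

Drawing 10 items from only 7 values is possible only with replacement, and it guarantees at least one repeat; `rng.sample(population, k=10)` would raise a `ValueError`.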
• Sampling with replacement:
• Consider a population of potato sacks, each of which has
either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the
values are equally likely. Suppose that, in this population,
there is exactly one sack with each number, so the whole
population has seven sacks. If I sample two with
replacement, then I first pick one (say 14). I had a 1/7
probability of choosing that one. Then I replace it and
pick another. Every sack still has a 1/7 probability of
being chosen, so there are exactly 7 x 7 = 49 different
possibilities (assuming we distinguish between the
first and second pick). They are: (12,12), (12,13), (12,14),
(12,15), (12,16), (12,17), (12,18), (13,12), (13,13), (13,14),
etc.
• Sampling without replacement:
• Consider the same population of potato sacks, each of
which has either 12, 13, 14, 15, 16, 17, or 18 potatoes, and
all the values are equally likely. Suppose that, in this
population, there is exactly one sack with each number, so
the whole population has seven sacks. If I sample two
without replacement, then I first pick one (say 14). I had a
1/7 probability of choosing that one. Then I pick another. At
this point, there are only six possibilities: 12, 13, 15, 16, 17,
and 18. So there are only 7 x 6 = 42 different possibilities
(again assuming that we distinguish between the first and
the second pick). They are: (12,13), (12,14), (12,15), (12,16),
(12,17), (12,18), (13,12), (13,14), (13,15), etc.
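The two counts for the potato-sack population can be checked by enumerating every ordered pair with the standard `itertools` module. This is a sketch: `product` with `repeat=2` lists the pairs allowed with replacement, and `permutations` lists the pairs allowed without replacement.

```python
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]

# With replacement: the second pick may repeat the first
with_replacement = list(product(sacks, repeat=2))

# Without replacement: the second pick must differ from the first
without_replacement = list(permutations(sacks, 2))

print(len(with_replacement), len(without_replacement))
print(with_replacement[:3])
print(without_replacement[:3])
```

The difference between the two counts is the seven pairs in which the same sack is picked twice, such as (14, 14).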
