Interpolation Missing values
• Interpolation is a technique for estimating unknown
data points that lie between known data points. In
Python, it is mostly used to impute missing values in a
pandas DataFrame or Series while preprocessing data.
You can use it to estimate missing data points when
running Python in Power BI or when preparing data for
machine learning algorithms. Interpolation is also used
in image processing: when an image is enlarged, each
new pixel value is estimated from the neighboring
pixels.
When to Use Interpolation?
• We can use interpolation to estimate a missing (null)
value from its neighbors. When imputing missing values
with the overall average is a poor fit, interpolation is a
common alternative.
• Interpolation is mostly used with time-series data,
because there a missing value is better estimated from
nearby observations than from a global statistic. For
example, with temperature we would prefer to fill
today's missing reading from the last two days rather
than with the mean of the whole month. Moving averages
over recent values are a related way to smooth or fill
such series.
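The temperature example can be sketched as follows. The dates and readings are made up for illustration, and the "mean of the previous two days" fill is one possible reading of the slide, not a pandas interpolation method.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures; the reading for 2024-01-03 is missing.
days = pd.date_range("2024-01-01", periods=4, freq="D")
temps = pd.Series([20.0, 22.0, np.nan, 26.0], index=days)

# Option 1: linear interpolation between the neighbors on either side.
linear_fill = temps.interpolate(method="linear")

# Option 2: mean of the two previous days (shift so a day never uses itself).
prev_two_mean = temps.shift(1).rolling(window=2).mean()
recent_fill = temps.fillna(prev_two_mean)

print(linear_fill.iloc[2])  # 24.0
print(recent_fill.iloc[2])  # 21.0
```

Both fills use only nearby days, which is the point of the slide; which one suits a dataset depends on whether later readings are available when the gap is filled.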
EXAMPLE
• You can use interpolation when the data has an order or
a sequence and you want to estimate a missing value in
that sequence. For example, suppose train tickets come
in several classes: first class, second class, and so on.
You would naturally expect a ticket in a higher class to
cost more than a ticket in a lower class.
• In that case, if the ticket price of an intermediate class
is missing, you can use interpolation to estimate the
missing value.
Using Interpolation to Fill Missing
Values in Series Data
import numpy as np
import pandas as pd

fare = {'first_class': 100,
        'second_class': np.nan,
        'third_class': 60,
        'open_class': 20}
ser = pd.Series(fare)
ser
OUTPUT
first_class     100.0
second_class      NaN
third_class      60.0
open_class       20.0
dtype: float64
Linear Interpolation
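• Linear interpolation estimates a missing value by
connecting the known points on either side with a
straight line and reading the value off that line.
By default, pandas treats the values as equally
spaced and ignores the index.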
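Applied to the ticket-fare Series from the earlier slide, linear interpolation might look like the sketch below. It relies on `Series.interpolate()` using `method="linear"` by default.

```python
import numpy as np
import pandas as pd

fare = {"first_class": 100, "second_class": np.nan,
        "third_class": 60, "open_class": 20}
ser = pd.Series(fare)

# The default method is "linear": the gap is filled with the value
# halfway between its two neighbors (100 and 60).
filled = ser.interpolate()

print(filled["second_class"])  # 80.0
```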
Polynomial Interpolation
• In polynomial interpolation you must specify an
order, which is the degree of the polynomial fitted
through the available data points. Missing values
are then read off the fitted curve. With order=2
the curve is a parabola; higher orders can bend
more, which may follow curved data better but can
also overshoot.
• In pandas this method requires SciPy and a numeric
or datetime index. Here a is a pandas Series or
DataFrame:
• a.interpolate(method="polynomial", order=2)
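To show what an order-2 fit does without depending on SciPy, the sketch below fits a parabola through three known points with NumPy and evaluates it at the gap. This is an illustration of the idea under that assumption, not a reproduction of how pandas computes `method="polynomial"` internally; the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical series lying on y = (x + 1)**2 with one gap at position 2.
s = pd.Series([1.0, 4.0, np.nan, 16.0])

# Fit a degree-2 polynomial through the known points (positions 0, 1, 3).
known = s.dropna()
coeffs = np.polyfit(known.index.to_numpy(dtype=float), known.to_numpy(), deg=2)

# Evaluate the fitted parabola at the missing positions.
missing_positions = s.index[s.isna()].to_numpy(dtype=float)
estimates = np.polyval(coeffs, missing_positions)

filled = s.copy()
filled[s.isna()] = estimates

print(round(filled[2], 6))  # 9.0
```

A straight line between 4 and 16 would give 10 at the gap; the parabola gives 9, which matches the curved shape of the data.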
Interpolation padding
• Padding (forward fill) fills each missing value with the
last valid value above it in the dataset. A missing value
in the first row stays missing, because nothing precedes
it. The optional limit argument caps how many
consecutive NaN values are filled.
• So, if you are working on a real-world project and want
every gap filled with the previous value regardless of its
length, leave limit unset or set it as large as the number
of rows in the dataset.
• a.interpolate(method="pad", limit=2)
• Recent pandas releases deprecate method="pad" in
interpolate(); a.ffill(limit=2) is the equivalent call.
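A small sketch of padding with a limit, using `Series.ffill`, the dedicated forward-fill method in pandas. The data are hypothetical and chosen so that both edge cases appear: a leading NaN and a gap longer than the limit.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0])

# Forward-fill at most two consecutive NaN values per gap.
padded = s.ffill(limit=2)

print(padded.tolist())  # [nan, 1.0, 1.0, 1.0, nan, 5.0]
```

The first value stays NaN because there is nothing above it, and the third NaN in the run stays NaN because the limit of 2 has been reached.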
Application
• Interpolation helps users estimate values that
were not directly collected but lie within the
range of their observations (estimating beyond
that range is extrapolation). Scientists,
engineers, photographers and mathematicians
use it to fit data and analyse trends.
Discretization
• Data discretization is the process of converting
continuous data into discrete buckets by grouping it.
Discretized data is also easier to maintain. Training a
model on discrete data can be faster than training on
the same data in continuous form. Continuous-valued
data contains more information, but a huge amount of
it can slow the model down, and discretization can
help strike a balance between the two. Two well-known
methods of data discretization are binning and using
a histogram. The challenge is choosing the range of
each bucket well.
• Here we make use of a function
called pandas.cut(). It assigns each value to a
bin (bucket), so continuous data can be
segmented into ordered groups.
• Perform bucketing using the pd.cut() function on
the marks column and display the first 10 rows.
The cut() function takes parameters such as x,
bins, and labels; here we pass only these three.
Add the following code to implement this:
df['bucket'] = pd.cut(df['marks'], 5,
                      labels=['Poor', 'Below_average', 'Average',
                              'Above_Average', 'Excellent'])
df.head(10)
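The slide does not show the DataFrame it operates on, so the sketch below builds a small hypothetical `marks` column and applies the same `pd.cut()` call. With an integer `bins` argument, pandas splits the observed range (here 0 to 100) into five equal-width intervals.

```python
import pandas as pd

# Hypothetical marks; the original slide's data is not shown.
df = pd.DataFrame({"marks": [0, 15, 35, 55, 75, 100]})

labels = ["Poor", "Below_average", "Average", "Above_Average", "Excellent"]

# Five equal-width bins over the range of marks, one label per bin.
df["bucket"] = pd.cut(df["marks"], 5, labels=labels)

print(df.head(10))
```

Because the bin edges are derived from the minimum and maximum of the column, the same mark can land in a different bucket on a different dataset; passing explicit edges to `bins` avoids that.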
Binning
• Data binning, also known as bucketing or
discretization, is a technique used in data
processing and statistics.
• Binning can be used, for example, when there
are more possible data points than observed
data points.
• Bins do not have to be numerical; they can be
categorical values of any kind, such as "dogs",
"cats", and so on.
• Binning is also used in image processing, where it
reduces the amount of data by combining
neighboring pixels into a single pixel. k x k binning
reduces each area of k x k pixels to a single pixel.
• Pandas provides easy ways to create bins and to
bin data.
Example of Binning
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Ray', 'Jane', 'Kate', 'Nik', 'Autumn',
             'Kasi', 'Mandeep', 'Evan', 'Kyra', 'Jim'],
    'Age': [12, 7, 33, 34, 45, 65, 77, 11, 32, 55]
})
print(df.head())
• pd.qcut() splits the ages into four quantile-based
bins, so each bin holds roughly the same number of rows:
df['Age Groups'] = pd.qcut(df['Age'], 4)
print(df.head())
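To make the difference between quantile-based and equal-width bins concrete, the sketch below counts how many of the ages from the example fall into each bin under `pd.qcut()` and `pd.cut()`. Passing `labels=False` returns integer bin numbers, which is a choice made here only to keep the counting simple.

```python
import numpy as np
import pandas as pd

ages = pd.Series([12, 7, 33, 34, 45, 65, 77, 11, 32, 55])

# Quantile-based bins: edges are placed at the quartiles of the data.
quantile_bins = pd.qcut(ages, 4, labels=False)

# Equal-width bins: the range 7..77 is cut into four intervals of equal width.
width_bins = pd.cut(ages, 4, labels=False)

print(np.bincount(quantile_bins.to_numpy()))  # [3 2 2 3]
print(np.bincount(width_bins.to_numpy()))     # [3 3 2 2]
```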
Outlier Detection
• Outliers are an important part of a dataset. They
can hold useful information about your data.
• In simple terms, an outlier is a data point that is
extremely high or extremely low relative to the other
values in the dataset or graph you're working with.
• Outliers are extreme values that stand out greatly
from the overall pattern of values in a dataset or
graph.
How to Identify an Outlier in a
Dataset
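• Let Q1 and Q3 be the first and third quartiles and
IQR = Q3 - Q1 the interquartile range. A value is
flagged as an outlier if either condition holds:
• outlier < Q1 - 1.5(IQR)
• outlier > Q3 + 1.5(IQR)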
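The two conditions can be applied with pandas as in the sketch below. The values are hypothetical, and the quartiles use the default linear interpolation of `Series.quantile()`; other quartile conventions can give slightly different fences.

```python
import pandas as pd

# Hypothetical measurements with one obviously extreme value.
values = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1  # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the values that fall outside the fences.
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(outliers.tolist())  # [102]
```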
Random Sampling With and Without
Replacement
• sample() is a built-in function of Python's random module
that returns a list of k items chosen from a sequence such
as a list, tuple, or string. It performs random sampling
without replacement.
• Syntax: random.sample(sequence, k)
• Parameters:
sequence: a list, tuple, or string. In Python 3.11 and
later a set must first be converted to a list or tuple.
k: an integer that specifies the length of the sample.
• Returns: a new list of k elements chosen from the
sequence.
• Randomly selecting records from a large data set may be
helpful if the data set is so large that it prevents or slows
processing, or if you are conducting a survey and need to
select a random sample from some master database. When
you select records randomly from a larger data set (or some
master database), you can do the sampling in a few
different ways, including:
• sampling without replacement, in which a subset of the
observations is selected randomly, and once an
observation is selected it cannot be selected again.
• sampling with replacement, in which a subset of
observations is selected randomly, and an observation
may be selected more than once.
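Both kinds of sampling are available in the standard library: `random.sample()` draws without replacement and `random.choices()` draws with replacement. The record IDs below are hypothetical, and the seed is fixed only to make the sketch repeatable.

```python
import random

random.seed(0)  # fixed seed only so repeated runs give the same draws

# Hypothetical record IDs standing in for rows of a master database.
record_ids = [101, 102, 103, 104, 105, 106, 107]

# Without replacement: each record can appear at most once.
without_replacement = random.sample(record_ids, k=3)

# With replacement: the same record may be drawn more than once.
with_replacement = random.choices(record_ids, k=10)

print(without_replacement)
print(with_replacement)
```

Drawing 10 items from 7 records is only possible with replacement; `random.sample(record_ids, k=10)` would raise a `ValueError` because the sample is larger than the population.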
• Sampling with replacement:
• Consider a population of potato sacks, each of which has
either 12, 13, 14, 15, 16, 17, or 18 potatoes, and all the
values are equally likely. Suppose that, in this population,
there is exactly one sack with each number. So the whole
population has seven sacks. If I sample two with
replacement, then I first pick one (say 14). I had a 1/7
probability of choosing that one. Then I replace it. Then I
pick another. Every one of them still has 1/7 probability of
being chosen. And there are exactly 49 different
possibilities here (assuming we distinguish between the
first and second.) They are: (12,12), (12,13), (12, 14),
(12,15), (12,16), (12,17), (12,18), (13,12), (13,13), (13,14),
etc.
• Sampling without replacement:
• Consider the same population of potato sacks, each of
which has either 12, 13, 14, 15, 16, 17, or 18 potatoes, and
all the values are equally likely. Suppose that, in this
population, there is exactly one sack with each number. So
the whole population has seven sacks. If I sample two
without replacement, then I first pick one (say 14). I had a
1/7 probability of choosing that one. Then I pick another. At
this point, there are only six possibilities: 12, 13, 15, 16, 17,
and 18. So there are only 42 different possibilities here
(again assuming that we distinguish between the first and
the second.) They are: (12,13), (12,14), (12,15), (12,16),
(12,17), (12,18), (13,12), (13,14), (13,15), etc.
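The counts of 49 and 42 ordered pairs can be checked by enumerating them. A minimal sketch with `itertools`, where `product` lists ordered pairs that may repeat a sack and `permutations` lists ordered pairs that may not:

```python
from itertools import permutations, product

sacks = [12, 13, 14, 15, 16, 17, 18]

# With replacement: the second pick may repeat the first (7 * 7 pairs).
pairs_with_replacement = list(product(sacks, repeat=2))

# Without replacement: the second pick must differ from the first (7 * 6 pairs).
pairs_without_replacement = list(permutations(sacks, 2))

print(len(pairs_with_replacement))     # 49
print(len(pairs_without_replacement))  # 42
print(pairs_with_replacement[:3])      # [(12, 12), (12, 13), (12, 14)]
```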
