1
Sachin Pathania
OUTLIER DETECTION AND TREATMENT
Introduction to Outlier Treatment
Outlier Treatment is one of the important part of data pre-processing is the
handling outlier. If the data contains outliers that can affect our result which
will depend on the data. So to remove these outliers from data Outlier
Treatment is used. At first, need to understand what outliers is.
What is Outliers?
An outlier is a value that behaves differently than other observations or we can
say
“A value that lies outside the data”
2
Example: A new coach has been working with the Long Jump team this month,
and the athletes' performance has changed.
 Augustus: +0.15m
 Tom: +0.11m
 June: +0.06m
 Carol: +0.06m
 Bob: + 0.12m
 Sam: -0.56m
So here, Sam is an outlier
Here are the results on the number line:
Following are two process to remove their outliers:-
 Interquartile Range ( IQR )
 Z-Score
But here I’m only using IQR.
Interquartile Range (IQR)
Interquartile Range (IQR) equally divides the distribution into four equal parts
called quartiles. It takes data into account the most of the value lies in that
region, it used a box plot to detect the outliers in data.
3
The following parameter is used to identify the IQR range:
 1st quartile (Q1) is 25%
 3rd quartile (Q3) is 75%
 2nd quartile (Q2) divides the distribution into two equal parts of 50%.
So, basically it is the same as Median.
The interquartile range is defined as the difference between the third and the
first quartile in other words, IQR equals Q3 minus Q1
Formula: - IQR = Q3 - Q1
4
Identify the Outliers Using IQR Method
As per a rule of thumb, observations can be qualified as outliers when they lie
more than 1.5 IQR below the first quartile or 1.5 IQR above the third quartile.
Outliers are values that “lie outside” the other values.
LB = Q1 – 1.5 * IQR
UB = Q3 + 1.5 * IQR
Outlier Treatment using IQR in Python:
5
Using IQR:
6
Calculate Lower Bound and Upper Bond values to remove outliers:
Removing Outliers:

DATA SCIENCE - Outlier detection and treatment_ sachin pathania

  • 1.
    1 Sachin Pathania OUTLIER DETECTIONAND TREATMENT Introduction to Outlier Treatment Outlier Treatment is one of the important part of data pre-processing is the handling outlier. If the data contains outliers that can affect our result which will depend on the data. So to remove these outliers from data Outlier Treatment is used. At first, need to understand what outliers is. What is Outliers? An outlier is a value that behaves differently than other observations or we can say “A value that lies outside the data”
  • 2.
    2 Example: A newcoach has been working with the Long Jump team this month, and the athletes' performance has changed.  Augustus: +0.15m  Tom: +0.11m  June: +0.06m  Carol: +0.06m  Bob: + 0.12m  Sam: -0.56m So here, Sam is an outlier Here are the results on the number line: Following are two process to remove their outliers:-  Interquartile Range ( IQR )  Z-Score But here I’m only using IQR. Interquartile Range (IQR) Interquartile Range (IQR) equally divides the distribution into four equal parts called quartiles. It takes data into account the most of the value lies in that region, it used a box plot to detect the outliers in data.
  • 3.
    3 The following parameteris used to identify the IQR range:  1st quartile (Q1) is 25%  3rd quartile (Q3) is 75%  2nd quartile (Q2) divides the distribution into two equal parts of 50%. So, basically it is the same as Median. The interquartile range is defined as the difference between the third and the first quartile in other words, IQR equals Q3 minus Q1 Formula: - IQR = Q3 - Q1
  • 4.
    4 Identify the OutliersUsing IQR Method As per a rule of thumb, observations can be qualified as outliers when they lie more than 1.5 IQR below the first quartile or 1.5 IQR above the third quartile. Outliers are values that “lie outside” the other values. LB = Q1 – 1.5 * IQR UB = Q3 + 1.5 * IQR Outlier Treatment using IQR in Python:
  • 5.
  • 6.
    6 Calculate Lower Boundand Upper Bond values to remove outliers: Removing Outliers: