Sachin Pathania
OUTLIER DETECTION AND TREATMENT
Introduction to Outlier Treatment
Handling outliers is an important part of data pre-processing. Outliers can
distort the results of an analysis, and how strongly they do so depends on the
data. Outlier treatment is used to remove these values from the data. First,
we need to understand what an outlier is.
What Is an Outlier?
An outlier is a value that differs markedly from the other observations, or,
put simply:
“A value that lies outside the data”
Example: A new coach has been working with the Long Jump team this month,
and the athletes' performance has changed.
• Augustus: +0.15 m
• Tom: +0.11 m
• June: +0.06 m
• Carol: +0.06 m
• Bob: +0.12 m
• Sam: -0.56 m
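So here, Sam is an outlier: his result dropped by 0.56 m, while every other
athlete improved by between 0.06 m and 0.15 m.
[Figure: the athletes' results on a number line]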
The following two methods can be used to detect and remove outliers:
• Interquartile Range (IQR)
• Z-Score
Only the IQR method is used here.
Interquartile Range (IQR)
The Interquartile Range (IQR) is based on quartiles, the cut points that divide
the sorted data into four equal parts. It measures the spread of the middle 50%
of the data, the region where most of the values lie, and it is the quantity a
box plot uses to detect outliers in the data.
The following quartiles are used to compute the IQR:
• 1st quartile (Q1): the value below which 25% of the data lies
• 3rd quartile (Q3): the value below which 75% of the data lies
• 2nd quartile (Q2): the value that divides the distribution into two equal
halves of 50% each, so it is the same as the median.
The interquartile range is defined as the difference between the third and the
first quartile. In other words, IQR equals Q3 minus Q1.
Formula: IQR = Q3 - Q1
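The quartiles and the IQR can be checked on the long jump example from earlier. The following is a minimal sketch that assumes Python's standard `statistics` module and its "inclusive" quartile method; other tools may use slightly different quartile conventions and give slightly different values on such a small sample. The variable name `changes` is only illustrative.

```python
import statistics

# Changes in long jump performance from the example, in metres.
changes = [0.15, 0.11, 0.06, 0.06, 0.12, -0.56]

# n=4 returns the three cut points Q1, Q2, Q3.
# method="inclusive" interpolates linearly between neighbouring sorted values.
q1, q2, q3 = statistics.quantiles(changes, n=4, method="inclusive")

# IQR = Q3 - Q1
iqr = q3 - q1

print(f"Q1={q1:.4f}  Q2={q2:.4f}  Q3={q3:.4f}  IQR={iqr:.4f}")

# Q2 is the median of the data.
print("Median:", statistics.median(changes))
```

With this convention Q2 coincides with the median, as stated above.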
Identify the Outliers Using IQR Method
As a rule of thumb, observations qualify as outliers when they lie more than
1.5 × IQR below the first quartile or more than 1.5 × IQR above the third
quartile. These two cut-offs are the lower bound (LB) and the upper bound (UB);
outliers are the values that “lie outside” them.
LB = Q1 - 1.5 * IQR
UB = Q3 + 1.5 * IQR
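Applied to the long jump example, the rule can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper name `iqr_bounds` and its parameter `k` are hypothetical, and the quartiles come from the standard library's "inclusive" method.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (LB, UB) where LB = Q1 - k * IQR and UB = Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Changes in long jump performance from the example, in metres.
changes = {
    "Augustus": 0.15,
    "Tom": 0.11,
    "June": 0.06,
    "Carol": 0.06,
    "Bob": 0.12,
    "Sam": -0.56,
}

lb, ub = iqr_bounds(list(changes.values()))

# A value is flagged when it falls below LB or above UB.
outliers = [name for name, value in changes.items() if value < lb or value > ub]

print(f"LB={lb:.5f}  UB={ub:.5f}")
print(outliers)  # ['Sam']
```

Only Sam's result falls outside the two bounds, which matches the earlier observation.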
Outlier Treatment using IQR in Python:
Using IQR:
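A minimal sketch of this step, assuming the pandas library and reusing the long jump data from the example. The DataFrame `df` and the column names `athlete` and `change_m` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Long jump example data; the column names are illustrative assumptions.
df = pd.DataFrame(
    {
        "athlete": ["Augustus", "Tom", "June", "Carol", "Bob", "Sam"],
        "change_m": [0.15, 0.11, 0.06, 0.06, 0.12, -0.56],
    }
)

# Series.quantile uses linear interpolation by default.
q1 = df["change_m"].quantile(0.25)
q3 = df["change_m"].quantile(0.75)

# IQR = Q3 - Q1
iqr = q3 - q1

print(f"Q1={q1:.4f}  Q3={q3:.4f}  IQR={iqr:.4f}")
```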
Calculate the lower bound and upper bound values to remove outliers:
Removing Outliers:
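Both steps, calculating the bounds and dropping the rows outside them, might look like the sketch below. It assumes pandas and the long jump data from the example; the names `df`, `change_m`, and `cleaned` are hypothetical.

```python
import pandas as pd

# Long jump example data; the column names are illustrative assumptions.
df = pd.DataFrame(
    {
        "athlete": ["Augustus", "Tom", "June", "Carol", "Bob", "Sam"],
        "change_m": [0.15, 0.11, 0.06, 0.06, 0.12, -0.56],
    }
)

q1 = df["change_m"].quantile(0.25)
q3 = df["change_m"].quantile(0.75)
iqr = q3 - q1

# LB = Q1 - 1.5 * IQR, UB = Q3 + 1.5 * IQR
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the rows whose value lies within [LB, UB].
within_bounds = df["change_m"].between(lower_bound, upper_bound)
cleaned = df[within_bounds]

print(cleaned)
```

Filtering with a boolean mask leaves `df` unchanged, so the removed rows can still be inspected with `df[~within_bounds]`.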
