Outlier Detection

Contents
Outlier Detection
Types of outliers
Common causes of outliers
Methods for outlier detection

Outlier Detection
• Observation which deviates so much from other observations
as to arouse suspicion it was generated by a different
mechanism
•
An outlier may be defined as a piece of data or observation
that deviates drastically from the given norm or average of the
data set.
• An outlier is a data point that differs
significantly from other observations.

Types of outliers
Outliers can be of two kinds:
univariate and multivariate.
Univariate outliers can be found when looking at a distribution
of values in a single feature space.
Multivariate outliers can be found in a n-dimensional space (of
n-features). Looking at distributions in n-dimensional spaces
can be very difficult for the human brain, that is why we need
to train a model

Common causes of outliers
• Data entry errors (human errors)
• Measurement errors (instrument errors)
• Experimental errors (data extraction or experiment planning/executing errors)
• Intentional (dummy outliers made to test detection methods)
• Data processing errors (data manipulation or data set unintended mutations)
• Sampling errors (extracting or mixing data from wrong or various sources)
• Natural (not an error, novelties in data)

Methods for outlier detection
• Z-Score or Extreme Value Analysis (parametric)
• Probabilistic and Statistical Modeling (parametric)
• Linear Regression Models (PCA, LMS)
• Proximity Based Models (non-parametric)
• Information Theory Models
• High Dimensional Outlier Detection Methods (high dimensional
sparse data)

References
https://towardsdatascience.com/
https://www.techopedia.com/

Outlier Detection

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Outlier Detection

Similar to Outlier Detection (20)

More from Dr. Abdul Ahad Abro

More from Dr. Abdul Ahad Abro (10)

Recently uploaded

Recently uploaded (20)

Outlier Detection