Yashwantrao Chavan Institute of Science,
Satara
Department of Statistics
M.Sc.1 2017-2018
Seminar on
Automatic detection of discordant
outliers via the Ueda’s method
Marmolejo-Ramos et al. Journal of Statistical Distributions and
Applications (2015) 2:8
Presented by,
Patil Pooja Rajaram
Roll No. 115
Content:
• Introduction
• What is an outlier?
• Methodology
• Procedure
• How to interpret the results
• Examples
• Conclusion
• References
Introduction:
The importance of identifying outliers in a data set is
well known. Although various outlier detection methods have
been proposed to enable reliable inferences regarding
a data set, a simple but less well-known method was proposed
by Ueda. This method, which assumes the underlying data to be
normally distributed and is based on Akaike's Information
Criterion (AIC), presents interesting features.
The aim of the paper is to study the performance and
robustness of Ueda's method in detecting outliers via
computer simulations, and to determine its applicability to
other types of data.
• What is an Outlier?
An outlier is an observation that lies at an
abnormal distance from other values in a random
sample from a population. Outliers are generally
viewed as data values that are numerically
distant from the bulk of the sample.
Importance of identifying outliers:
By definition, outliers are points that are
distant from the remaining observations. As a result,
they can skew or bias any analysis
performed on the data set. It is therefore
important to detect outliers and deal with them
adequately.
• Methodology:
Let X be a random variable with probability distribution f(x), and let
$x_1, x_2, \ldots, x_n, x_{n+1}, \ldots, x_{n+s-1}, x_N$ be a random sample of size N = n + s from this
distribution, with n and s = {1, 2, 3, . . .} being the number of regular and outlying
observations, respectively.
Ueda's method is a simplified version of the use of the Akaike
Information Criterion (AIC):
\[ \mathrm{AIC} = 2k - 2\ln(L) \tag{1} \]
In the expression above, k is the number of independently adjusted parameters and
L is the maximized likelihood for the n regular observations. For the
normal distribution, the AIC becomes:
\[ \mathrm{AIC} = 2n\ln\sigma - 2\ln n! + 2s \tag{2} \]
Ueda used the correction factor $\sqrt{2}\, s\, \frac{\ln n!}{n}$, which still depends on n but includes the
number of outliers s in the full sample. Based on this, the proposed test statistic to
identify outliers using this method is given by
\[ U_t = \tfrac{1}{2}\,\mathrm{AIC} \approx n\ln\sigma + \sqrt{2}\, s\, \frac{\ln n!}{n} = (N - s)\ln\sigma + \sqrt{2}\, s\, \frac{\ln n!}{n} \tag{3} \]
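A minimal sketch of Eq. (3) as a function may help make the roles of n, σ and s concrete. It assumes the penalty term enters with a plus sign and a factor √2, i.e. U_t = n ln σ + √2 · s · ln(n!)/n; the function name `ueda_statistic` and its argument names are hypothetical and not taken from the paper.

```python
import math


def ueda_statistic(n: int, sigma: float, s: int) -> float:
    """Illustrative U_t = n*ln(sigma) + sqrt(2)*s*ln(n!)/n (assumed form of Eq. 3).

    n     : number of observations kept as regular (n = N - s)
    sigma : standard deviation of the kept observations (must be > 0)
    s     : number of observations treated as outliers
    """
    if n < 1 or sigma <= 0 or s < 0:
        raise ValueError("need n >= 1, sigma > 0 and s >= 0")
    # math.lgamma(n + 1) equals ln(n!) and avoids overflow for large n.
    log_factorial = math.lgamma(n + 1)
    return n * math.log(sigma) + math.sqrt(2) * s * log_factorial / n


# With no outliers and unit standard deviation both terms vanish.
print(ueda_statistic(10, 1.0, 0))
```

Using `math.lgamma` rather than `math.factorial` keeps the sketch usable for large samples, where n! itself would be an enormous integer.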
• Calculating Ut:
The following procedure is suggested to determine outliers in a
full sample:
1. Order the full sample to obtain $x_{ord} = x_{(1)}, x_{(2)}, \ldots, x_{(N)}$, with $x_{(i)}$ being
the i-th order statistic (i = 1, 2, . . . , N).
2. Calculate the z-score for each $x_{(i)}$ as
\[ Z_{(i)} = \frac{x_{(i)} - \bar{x}}{s}, \qquad s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} \]
where $\bar{x}$ is the sample mean and s in this formula denotes the sample standard
deviation, not the number of outliers.
3. For a given number of candidate outliers s ≥ 0, calculate the test statistic $U_t$ as in
Eq. (3). Observe that s = 0 means that no outliers are present in the data.
Furthermore, as the original sample is ordered, s > 0 also implies that
the outliers are at either end of the sample.
4. Suppose that s = 1. The outlier could be located
either at the beginning of the ordered sample, i.e. $x_{(1)}$ is the outlier, or at
the end of it, i.e. $x_{(N)}$ is the outlier. In the former case, the test statistic $U_t$
is calculated as in Eq. (3) after removing $x_{(1)}$ from the ordered sample.
In the latter case, $x_{(N)}$ is removed and $U_t$ is then calculated.
5. Step 4 is repeated for other values of s > 1. As a result of this process,
a collection of $U_t$ values is obtained.
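The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: σ is taken to be the sample standard deviation of the z-scores that remain after trimming, the penalty is taken as +√2 · s · ln(n!)/n, and the cell with the smallest U_t is selected, as is usual for AIC-type criteria. The names `ueda_matrix`, `detect_outliers` and `max_per_tail` are hypothetical.

```python
import math
import statistics


def ueda_matrix(data, max_per_tail):
    """Return {(i, j): U_t} with i lower-tail and j upper-tail points removed."""
    x = sorted(data)                                   # step 1: order the sample
    mean, sd = statistics.mean(x), statistics.stdev(x)
    z = [(v - mean) / sd for v in x]                   # step 2: z-scores
    total = len(z)
    table = {}
    for i in range(max_per_tail + 1):                  # steps 3-5: try every (i, j)
        for j in range(max_per_tail + 1):
            kept = z[i:total - j]
            n, s = len(kept), i + j
            if n < 2:
                continue                               # too few points for a std. dev.
            sigma = statistics.stdev(kept)             # assumption: sigma of kept z-scores
            if sigma == 0:
                continue                               # ln(0) is undefined
            table[(i, j)] = (n * math.log(sigma)
                             + math.sqrt(2) * s * math.lgamma(n + 1) / n)
    return table


def detect_outliers(data, max_per_tail):
    """Return the (lower, upper) counts whose U_t is smallest."""
    table = ueda_matrix(data, max_per_tail)
    return min(table, key=table.get)


example1 = [2.02, 2.22, 3.04, 3.23, 3.59, 3.73, 3.94, 4.05, 4.11, 4.13]
print(detect_outliers(example1, max_per_tail=3))
```

Because the z-scores of the full sample have unit standard deviation, the (0, 0) entry is zero up to rounding, so any negative entry signals a trimmed sample that the criterion prefers. In this sketch the selected cell can depend on the chosen `max_per_tail`, so the limit should be reported alongside the result.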
• Interpreting the Ut statistic:
The previous section described how the test statistic $U_t$ is
calculated for a given value of s ≥ 0. One question that remains to be
answered is how many values of $U_t$ need to be calculated. Let C be that
number and suppose that up to s outliers are to be detected in the sample.
Then, from the collection of $U_t$ values obtained in step 5, the following
matrix is constructed:
\[
U_t =
\begin{pmatrix}
U_{00} & U_{01} & U_{02} & \cdots & U_{0s} \\
U_{10} & U_{11} & U_{12} & \cdots & U_{1s} \\
U_{20} & U_{21} & U_{22} & \cdots & U_{2s} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
U_{s0} & U_{s1} & U_{s2} & \cdots & U_{ss}
\end{pmatrix}
\]
where m = s + 1 is the number of rows and columns, $U_{00}$ is the $U_t$ statistic when
no outliers are detected, and $U_{ij}$ is the $U_t$ statistic when i and j outliers are
detected in the lower and upper tails of $x_{ord}$, respectively. The combination (i, j)
with the smallest $U_t$ indicates the number of outliers in each tail. From the
matrix it is clear that the number of calculations
needed to detect up to s outliers in $x_{ord}$ is at most $m^2$.
However, this number can be reduced to $m^2 - 2$ if $U_{00}$ and
$U_{ss}$, included for convenience and comparison purposes, are not
calculated. Hence,
\[ m^2 - 2 \le C \le m^2 . \]
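As a small worked instance of this count, take s = 2 purely for illustration, and assume the lower and upper counts i and j each run from 0 to s:

```latex
\[
m = s + 1 = 3, \qquad
U_t =
\begin{pmatrix}
U_{00} & U_{01} & U_{02} \\
U_{10} & U_{11} & U_{12} \\
U_{20} & U_{21} & U_{22}
\end{pmatrix},
\qquad
m^2 = 9, \qquad m^2 - 2 = 7 .
\]
```

So between 7 and 9 values of the statistic are computed, depending on whether the two corner entries kept for comparison are evaluated.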
• Example 1:
Data (sample size N = 10): 2.02, 2.22, 3.04, 3.23, 3.59,
3.73, 3.94, 4.05, 4.11, 4.13
• Conclusion:
Here, the first two observations (2.02 and 2.22) are outliers;
hence they should be removed from the data.
• Example 2:
A situation in which no outlier is present, with the Ut values obtained:
Data (sample size N = 10): 5.4, 5.4, 5.5, 5.7, 5.8, 5.9, 6.0, 6.1,
6.3, 6.4
• Conclusion:
From the table of Ut values, we conclude that no outlier
is present in the given data.
• Example 3: (Using simulated data)
Normally distributed data.
Consider a random sample $x_1, x_2, \ldots, x_{25}$ from a
$N(300, 10^2)$ distribution, and a second random sample
$y_1, y_2, \ldots, y_5$ of outlying observations from a $N(400, 5^2)$
distribution. The full sample of size N = n + s = 30
is then given by $x_1, x_2, \ldots, x_{25}, y_1, y_2, \ldots, y_5$.
Although the true number of outliers is s = 5,
$s_{max}$ is set to 10, so up to 10 observations are automatically
tested as potential outliers using Ueda's
method. As shown in Fig. a, the number of outliers
detected in the full sample is s = 5.
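The simulated sample described above can be generated along the following lines. This is only a sketch of the data-generating step: the seed, the function name `simulate_example3` and the use of Python's `random` module are assumptions, so the draws will not match those behind the original figure.

```python
import random


def simulate_example3(seed=0):
    """Draw 25 regular points from N(300, 10^2) and 5 outlying points from N(400, 5^2)."""
    rng = random.Random(seed)                            # fixed seed for repeatability
    regular = [rng.gauss(300, 10) for _ in range(25)]    # second argument is the std. dev.
    outliers = [rng.gauss(400, 5) for _ in range(5)]
    return regular, outliers


regular, outliers = simulate_example3()
full_sample = regular + outliers                         # N = n + s = 25 + 5
print(len(full_sample))
print(sorted(full_sample)[-5:])                          # the five largest values
```

The resulting list can then be passed to any implementation of the procedure with a search limit of 10.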
[Fig. a: plot with axis labels "s" and "Observations"]
• Conclusion:
The outlying observations were sampled from a normal
distribution and placed on the right tail of the distribution
of the regular observations.
• Graphical representation of example 3:
• Conclusion:
Ueda's method is an outlier detection method
that focuses on the symmetry and skewness of the
distribution in order to detect outlying data points. The
method is highly sensitive to outliers when the distribution
under analysis is negatively skewed, positively skewed or
asymmetric about its centre, and this
sensitivity increases with sample size.
• The advantages of Ueda's method are that:
i) It is easy to calculate.
ii) It does not require determining the number of potential
outliers in advance.
iii) It does not depend on tables or charts.
iv) It effectively signals the absence of outliers.
v) It can be used with large sample sizes.
References:
Marmolejo-Ramos et al. (2015). Automatic detection of discordant
outliers via the Ueda's method. Journal of Statistical
Distributions and Applications 2:8.
THANK YOU

Outlier detection by Ueda's method
