Statistics Netherlands
                 Division Research and Development
                 Department of Statistical Methods




                             ROBUST MULTIVARIATE OUTLIER DETECTION


                                      Peter de Boer and Vincent Feltkamp




                    Summary: Two robust multivariate outlier detection methods, based on the
                    Mahalanobis distance, are reported: the projection method and the Kosinski
                    method. The ability of those methods to detect outliers is exhaustively tested.
                    A comparison is made between the two methods as well as a comparison with
                    other robust outlier detection methods that are reported in the literature.




                 The opinions in this paper are those of the authors and do not necessarily reflect
                 those of Statistics Netherlands.

Project number:  RSM-80820
BPA number:      324-00-RSM/INTERN
Date:            19 July 2000

1. Introduction

The statistical process can be separated into three steps. The input phase involves the
collection of data by means of surveys and registers. The throughput phase involves
preparing the raw data for tabulation purposes, weighting and variance estimation.
The output phase involves the publication of population totals, means, correlations,
etc., which come out of the throughput phase.
Data editing is one of the first steps in the throughput process. It is the procedure for
detecting and adjusting individual errors in data. Editing also comprises the
detection and treatment of correct but influential records, i.e. records that have a
substantial contribution to the aggregates to be published.
The search for suspicious records, i.e. records that are possibly wrong or influential,
can be done in basically two ways. The first way is by examining each record and
looking for strange or wrong fields or combinations of fields. In this view a record
includes all fields referring to a particular unit, be it a person, household or business
unit, even if those fields are stored in separate files, like files containing survey data
and files containing auxiliary data.
The second way is by comparing each record with the other records. Even if the
fields of a particular record obey all the edit rules one has laid down, the record
could be an outlier. An outlier is a record that does not follow the bulk of the
records.
The data can be seen as a rectangular file, each row denoting a particular record and
each column a particular variable. The first way of searching for suspicious data can
be seen as searching in rows, the second way as searching in columns. It is remarked
that some, and possibly many, errors can be detected in both ways.
Records could be outliers while their outlyingness is not apparent from examining the
variables, or columns, one by one. For instance, a company that has a relatively
large turnover but has paid relatively little tax might not be an outlier in either of
these variables separately, but could be an outlier considering the combination. Outliers
involving more than one variable are multivariate outliers.
In order to quantify how far a record lies from the bulk of the data, one needs a
measure of distance. In the case of categorical data no useful distance measure
exists, but in the case of continuous data the so-called Mahalanobis distance is often
employed.
A distance measure should be robust against the presence of outliers. It is known
that the classical Mahalanobis distance is not. This means that the very outliers that
are to be detected seriously hamper their own detection. Hence, a robust
version of the Mahalanobis distance is needed.
In this report two robust multivariate outlier detection algorithms for continuous
data, based on the Mahalanobis distance, are presented. In the next section the
classical Mahalanobis distance is introduced and ways to robustify this distance
measure are discussed. In sections 3 and 4 the two algorithms, the Kosinski method
and the projection method, are presented in turn. In section 5 a
comparison between the two algorithms is made as well as a comparison with other
algorithms reported in the outlier literature. A practical example, and problems
involved with it, is the subject of section 6. In section 7 some concluding remarks
are made.



2. The Mahalanobis distance

The Mahalanobis distance is a measure of the distance between a point and the
center of all points, with respect to the scale of the data and, in the multivariate
case, with respect to the shape of the data as well. It is remarked that in regression
analysis another distance measure is more convenient: not the distance
between a point and the center of the data, but the distance between the point and the
regression plane (see also section 5).

Suppose we have a continuous data set $y_1, y_2, \ldots, y_n$. The vectors $y_i$ are p-dimensional, i.e. $y_i = (y_{i1}\; y_{i2}\; \ldots\; y_{ip})^t$, where $y_{iq}$ denotes a real number. The classical squared Mahalanobis distance is defined by

$$MD_i^2 = (y_i - \bar y)^t C^{-1} (y_i - \bar y)$$

where $\bar y$ and C denote the mean and the covariance matrix respectively:

$$\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad C = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar y)(y_i - \bar y)^t$$

In the case of one-dimensional data the covariance matrix reduces to the variance
and the Mahalanobis distance to $MD_i = |y_i - \bar y| / \sigma$, where $\sigma$ denotes the standard
deviation.
Another point of view results by noting that the Mahalanobis distance is the solution
of a maximization problem. The maximization problem is defined as follows. The
data points $y_i$ can be projected on a projection vector a. The outlyingness of the
point $y_i$ is the squared projected distance $(a^t(y_i - \bar y))^2$, with respect to the
projected variance $a^t C a$. Assuming that the covariance matrix C is positive
definite, there exists a non-singular matrix A such that $A^t C A = I$. Using the
Cauchy-Schwarz inequality we have





$$\begin{aligned}
\frac{(a^t(y_i - \bar y))^2}{a^t C a}
&= \frac{\left(a^t (A^t)^{-1} A^t (y_i - \bar y)\right)^2}{a^t C a} \\
&\le \frac{(A^{-1}a)^t(A^{-1}a)\;(y_i - \bar y)^t A A^t (y_i - \bar y)}{a^t C a} \\
&= \frac{a^t (A A^t)^{-1} a\;(y_i - \bar y)^t A A^t (y_i - \bar y)}{a^t C a} \\
&= (y_i - \bar y)^t C^{-1} (y_i - \bar y) \\
&= MD_i^2
\end{aligned}$$

with equality if and only if $A^{-1}a = c\,A^t(y_i - \bar y)$ for some constant c. Hence



$$MD_i^2 = \sup_{a^t a = 1} \frac{(a^t(y_i - \bar y))^2}{a^t C a}$$

i.e., the Mahalanobis distance is equal to the supremum of the outlyingness of $y_i$
over all possible projection vectors.
If the data set $y_i$ is multivariate normal the squared Mahalanobis distances $MD_i^2$
follow the $\chi^2$ distribution with p degrees of freedom.
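To make the definition concrete, the following short sketch (Python with NumPy/SciPy; the report's own prototypes were written in Borland Pascal and Visual Basic, so this is only an illustration with hypothetical names) computes the classical squared Mahalanobis distances and flags the points that exceed the $\chi^2_{p,0.99}$ cutoff.

import numpy as np
from scipy.stats import chi2

def classical_mahalanobis_sq(Y):
    """Squared classical Mahalanobis distance of every row of the n x p array Y."""
    ybar = Y.mean(axis=0)                              # mean vector
    C = np.atleast_2d(np.cov(Y, rowvar=False))         # covariance matrix (divisor n-1)
    diff = Y - ybar
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff)

# flag points whose squared distance exceeds the 99% chi-square quantile
Y = np.random.default_rng(0).normal(size=(100, 3))
md2 = classical_mahalanobis_sq(Y)
outliers = np.where(md2 > chi2.ppf(0.99, df=Y.shape[1]))[0]

For the standard normal data generated here roughly 1% of the points are expected to be flagged, which is exactly the swamping effect quantified in section 3.3.2.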

The classical Mahalanobis distance suffers however from the masking and
swamping effect. Outliers seriously affect the mean and the covariance matrix in
such a way that the Mahalanobis distance of outliers could be small (masking),
while the Mahalanobis distance of points which are not outliers could be large
(swamping).
Therefore, robust estimates of the center and the covariance matrix should be found
in order to calculate a useful Mahalanobis distance. In the univariate case the most
robust choice is the median (med) and the median of absolute deviations (mad)
replacing the mean and the standard deviation respectively. The med and mad have a
robustness of 50%. The robustness of a quantity is defined as the maximum
percentage of data points that can be moved arbitrarily far away while the change in
that quantity remains bounded.
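A small numerical sketch (Python, hypothetical data) of what this 50% robustness means in practice: moving 40% of the observations arbitrarily far away ruins the mean and the standard deviation, while the med and the mad remain of the same order as for the clean data.

import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(size=100)            # 100 well-behaved observations
bad = clean.copy()
bad[:40] = 1.0e6                        # move 40% of the points arbitrarily far away

for name, x in (("clean", clean), ("contaminated", bad)):
    med = np.median(x)
    mad = np.median(np.abs(x - med))    # median of absolute deviations
    print(f"{name:>12}: mean={x.mean():9.3g} std={x.std():9.3g} "
          f"med={med:6.3g} mad={mad:6.3g}")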
It is not trivial to generalize the robust one-dimensional Mahalanobis distance to the
multivariate case. Several robust estimators for the location and scale of multivariate
data have been developed. We have tested two methods, the projection method and
the Kosinski method. Other methods for robust outlier detection will be discussed in
section 5, where we will compare the different methods on their ability to detect
outliers.
In the next two sections the Kosinski method and the projection method will be
discussed in detail.





3. The Kosinski method

3.1 The principle of Kosinski
The method discussed in this section was quite recently published by Kosinski. The
idea of Kosinski is basically the following:
1) start with a few, say g, points, denoted the “good” part of the data set;
2) calculate the mean and the covariance matrix of those points;
3) calculate the Mahalanobis distances of the complete data set;
4) increase the good part of the data set by one point by selecting the g+1 points
   with the smallest Mahalanobis distances and define g=g+1;
5) return to step 2 or stop as soon as the good part contains more than half the data
   set and the smallest Mahalanobis distance of the remaining points is higher than
   a predefined cutoff value. At the end the remaining part, or the “bad” part,
   should contain the outliers.
In order to assure that the good part will contain no outliers at the end, it is important
to start the algorithm with points that are all good. In the paper by Kosinski this
problem is solved by repetitively choosing a small set of random points, and
performing the algorithm for each set. The number of sets of points to start with is
taken high enough to be sure that at least one set contains no outliers.
We made two major adjustments to the Kosinski algorithm. The first one is the
choice of the starting data set. The required property of the starting data set is that
it contains no outliers. It does not matter how these points are found. We choose the
starting data set by robustly estimating the center of the data set and selecting the
p+1 closest points. In the case of a p-dimensional data set, p+1 points are needed to
get a useful starting data set, since the covariance matrix of a set of at most p points
is always non-invertible. A separation of the data set in p+1 good points and n-p-1
bad points is called an elemental partition.
The center is estimated by calculating the mean of the data set, neglecting all
univariate robustly detected outliers. This is of course just a crude estimate, but it is
satisfactory for the purpose of selecting a good starting data set. Another crude
estimate of the center that was tried out was the coordinate-wise median. The
coordinate-wise median appeared to result in less satisfactory starting data sets.
The p+1 points closest to the mean are chosen, where closest is defined by an
ordinary distance measure. In order to take the different scales and units of the
different dimensions into account, the data set is scaled coordinate-wise before the
mean is calculated, i.e. each component of each point is divided by the median of
absolute deviations of the dimension concerned. It is remarked that, after the first
p+1 points are selected, the algorithm continues with the original unscaled data.
It is, of course, possible to construct a data set for which this algorithm fails to select
p+1 points that are all good points. However, in all the data sets used in this
report, artificial and real, this choice of a starting data set worked very well.





This adjustment results in a spectacular gain in computer time, since the algorithm
has to be run only once instead of many times. Kosinski estimates the required
number of random starting data sets in his own original algorithm to be
approximately 35 in the case of 2-dimensional data sets, and up to 10000 in 10
dimensions.
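Kosinski's numbers are presumably derived from the standard elemental-set argument (our reconstruction, not stated in the report): a random set of p+1 points is outlier-free with probability about $(1-\varepsilon)^{p+1}$ when a fraction $\varepsilon$ of the data are outliers, so the number m of random starting sets needed to obtain at least one clean set with probability P satisfies

$$1 - \left(1 - (1-\varepsilon)^{p+1}\right)^m \ge P
\quad\Longrightarrow\quad
m \ge \frac{\ln(1-P)}{\ln\!\left(1 - (1-\varepsilon)^{p+1}\right)} .$$

With $\varepsilon$ close to 0.5 and P = 0.99 this gives $m \approx 35$ for p = 2 and roughly $10^4$ for p = 10, in line with the figures quoted above; the single deterministic start used here avoids this cost entirely.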
The other adjustment is in the expansion of the good part. In the Kosinski paper the
increment is always one point. We implemented an increment proportional to the
good part already found, for instance 10%. This means that the good part is
increased by 10% at each step. This speeds up the algorithm as well,
especially in large data sets. The original algorithm with one-point increment scales
with $n^2$, where n is the number of data points, while the algorithm with proportional
increment scales with $n \ln n$. This adjustment, too, was tested and performed
very well.
In the remainder of this report, “the Kosinski method” denotes the adjusted Kosinski
method, unless otherwise noted.
3.2 The Kosinski algorithm
The purpose of the algorithm is, given a set of n multivariate data points
$y_1, y_2, \ldots, y_n$, to calculate the outlyingness $u_i$ for each point i. The algorithm can be
summarized as follows.
Step 0. In: data set

The algorithm is started with a set of continuous p-dimensional data $y_1, y_2, \ldots, y_n$,
where $y_i = (y_{i1}\; \ldots\; y_{ip})^t$.



Step 1. Choose an elemental partition
A good part of p+1 points is found as follows.

•    Calculate the med and mad for each dimension q:

$$M_q = \operatorname*{med}_k\, y_{kq}, \qquad S_q = \operatorname*{med}_l\, \left| y_{lq} - M_q \right|$$


•    Divide each component q of each data point i by the mad of the dimension
     concerned. The scaled data points are denoted by the superscript s:

$$y^s_{iq} = \frac{y_{iq}}{S_q}$$

•    Declare a point to be a univariate outlier if at least one component of the data
     point is farther than 2.5 standard deviations away from the scaled median. The
     standard deviation is approximated by 1.484 times the mad (see section 4.1 for
     the background of the factor 1.484). So calculate for each component q of each
     point i:






$$u_{iq} = \frac{1}{1.484}\left| y^s_{iq} - \frac{M_q}{S_q} \right|$$

     If $u_{iq} > 2.5$ for any q, then point i is a univariate outlier.

•    Calculate the mean of the data set, neglecting the univariate outliers:

$$\bar y^s = \frac{1}{n_0} \sum_{\substack{i=1 \\ y_i\ \text{not an outlier}}}^{n} y^s_i$$

     where $n_0$ denotes the number of points that are not univariate outliers.

•    Select the p+1 points that are closest to the mean. Define those points to be the
     good part of the data set. So calculate:

$$d_i = \left\| y^s_i - \bar y^s \right\|$$

     The g = p+1 points with the smallest $d_i$ form the good part, denoted by G.
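As an illustration, Step 1 could be coded as follows (a minimal Python sketch, not the Borland Pascal prototype; the function name and the use of NumPy are our own).

import numpy as np

def elemental_partition(Y):
    """Step 1 (sketch): return the indices of the p+1 starting points."""
    n, p = Y.shape
    M = np.median(Y, axis=0)                      # med per dimension
    S = np.median(np.abs(Y - M), axis=0)          # mad per dimension
    Ys = Y / S                                    # coordinate-wise scaled data
    u = np.abs(Ys - M / S) / 1.484                # univariate outlyingness u_iq
    keep = (u <= 2.5).all(axis=1)                 # points that are no univariate outlier
    ybar_s = Ys[keep].mean(axis=0)                # mean of the non-outlying points
    d = np.linalg.norm(Ys - ybar_s, axis=1)       # distance of every (scaled) point
    return np.argsort(d)[:p + 1]                  # the p+1 closest points form G

The mad S is assumed to be nonzero in every dimension, which holds for genuinely continuous data.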
Step 2. Iteratively increase the good part
The good part is increased until a certain stop criterion is fulfilled.

•    Continue with the original data set $y_i$, not with the scaled data set $y^s_i$.

•    Calculate the mean and the covariance matrix of the good part:

$$\bar y = \frac{1}{g}\sum_{i\in G} y_i, \qquad C = \frac{1}{g-1}\sum_{i\in G}(y_i - \bar y)(y_i - \bar y)^t$$

•    Calculate the Mahalanobis distance of all the data points:

$$MD_i^2 = (y_i - \bar y)^t C^{-1}(y_i - \bar y)$$



•    Calculate the number of points with a Mahalanobis distance smaller than a
     predefined cutoff value. A useful cutoff value is $\chi^2_{p,1-\alpha}$, with α = 1%.



•    Increase the good part with a predefined percentage (a useful percentage is 20%)
     by selecting the points with the smallest Mahalanobis distances, but not more
     than up to
     a) half the data set if the good part is smaller than half the data set
     (g < h = [½(n+p+1)]);
     b) the number of points with a Mahalanobis distance smaller than the cutoff if
     the good part is larger than half the data set.

•    Stop the algorithm if the good part was already larger than half the data set and
     no more points were added in the last iteration.
Step 3. Out: outlyingnesses
The outlyingness of each point is now simply the Mahalanobis distance of the point,
calculated with the mean and the covariance matrix of the good part of the data set.
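Steps 2 and 3 can then be sketched as a forward search on top of the helper above (again a Python sketch, with the report's suggested defaults of a 20% increment and α = 1%; this is our reading of the adjusted algorithm, not the original program).

import numpy as np
from scipy.stats import chi2

def kosinski_outlyingness(Y, increment=0.20, alpha=0.01):
    """Return the squared Mahalanobis distance of every point, computed from
    the mean and covariance matrix of the final good part."""
    n, p = Y.shape
    cutoff = chi2.ppf(1 - alpha, df=p)
    h = (n + p + 1) // 2                               # "half the data set"
    good = elemental_partition(Y)                      # p+1 starting points (Step 1)
    while True:
        g = len(good)
        ybar = Y[good].mean(axis=0)
        C = np.atleast_2d(np.cov(Y[good], rowvar=False))
        diff = Y - ybar
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff)
        grown = int(np.ceil(g * (1 + increment)))      # proportional increment
        limit = h if g < h else int((md2 < cutoff).sum())
        target = min(grown, limit)
        if g >= h and target <= g:                     # nothing left to add: stop
            return md2
        good = np.argsort(md2)[:target]                # points with smallest distances

Point i is then declared an outlier when the returned value exceeds the same $\chi^2_{p,1-\alpha}$ cutoff.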



3.3 Test results
A prototype/test program was implemented in a Borland Pascal 7.0 environment.
Documentation of the program is published elsewhere. We successively tested the
choice of the elemental partition by means of the mean, the number of swamped
observations in data sets containing no outliers, the number of masked and swamped
observations in data sets containing outliers, the algorithm with proportional
increment, and the time performance of the proportional increment of the good part
compared to the one-point increment. Finally, we tested the sensitivity of the
number of detected outliers to the cutoff value and the increment percentage in some
known data sets.
3.3.1 Elemental partition
First of all, the choice of the elemental partition was tested with the generated data
set published by Kosinski. The Kosinski data set is a kind of worst-case data set. It
contains a large fraction of outliers (40% of the data) and the outliers are distributed
with a variance much smaller than the variance of the good points.
Before using the mean, we calculated the coordinate-wise median as a robust
estimator of the center of the data, and selected the three closest points. This strategy
failed. Although the median has a 50%-robustness, the 40% outliers strongly shift
the median. Hence, one of the three selected points appeared to be an outlier. As a
consequence, the forward search algorithm indicated all points to be good points, i.e.
all the outliers were masked.
This was the reason we searched for another robust measure of the location of the
data. One of the simplest ideas is to search for univariate outliers first, and to
calculate the mean of the points that are outliers in none of the dimensions.
The selected points, the three points closest to the mean, all appeared to be good
points. Moreover, the forward search algorithm, applied with this elemental
partition, successfully distinguished the outliers from the good points.
All following tests were performed using this “mean” to select the first p+1 points.
For all tested data sets the selected p+1 points appeared to be good points, resulting
in a successful forward search. It is possible, in principle, to construct a data set for
which this selection algorithm still fails, for instance a data set with a large fraction
of outliers which are univariately invisible and with no unambiguous dividing line
between the group of outliers and the group of good points. This is, however, a very
hypothetical situation.
3.3.2 Swamping
A simulation study was performed in order to determine the average fraction of
swamped observations in normally distributed data sets. In large data sets a few
points are almost always indicated to be outliers, even if the whole data set nicely
follows a normal distribution. This is due to the cutoff value. If a cutoff value of
$\chi^2_{p,1-\alpha}$ is used as discriminator between good points and outliers in a
p-dimensional standard normal data set, a fraction of α data points will have a
Mahalanobis distance larger than the cutoff value.



For each dimension p between 1 and 8 we generated 100 standard normal data sets
of 100 points. The Kosinski algorithm was run twice on each data set, once with a
cutoff value $\chi^2_{p,0.99}$, and once with $\chi^2_{p,0.95}$. Each point that is indicated to be an

outlier is a swamped observation since there are no true outliers by construction. We
calculated the average fraction of swamped observations (i.e. the number of
swamped observations of each data set divided by 100, the number of points in the
data set, averaged over all 100 data sets). Results are shown in Table 3.1.



   α      p=1      2      3      4      5      6      7      8
  0.01   0.015  0.011  0.010  0.008  0.008  0.008  0.007  0.007
  0.05   0.239  0.112  0.081  0.070  0.059  0.052  0.045  0.042
Table 3.1. The average fraction of swamped observations of the simulations of 100
generated p-dimensional data sets of 100 points for each p between 1 and 8, with
cutoff value $\chi^2_{p,1-\alpha}$.




For α = 0.01 the fraction of swamped observations is very close to the value of α
itself. These results are very similar to the results of the original Kosinski algorithm.
For α = 0.05, however, the average fraction of swamped observations is much larger
than 0.05 for the lower dimensions, especially for p=1 and p=2. The reason for this
is the following. Consider a one-dimensional standard normal data set. If the
variance of all points is used, the outlyingness of a fraction of α points will be larger
than $\chi^2_{1,1-\alpha}$. However, in the Kosinski algorithm the variance is calculated from all
points except at least that fraction of α points with the largest outlyingnesses. This
variance is smaller than the variance of all points. Hence, the Mahalanobis distances
are overestimated and too many points are indicated to be outliers. This is a self-
magnifying effect: more outliers lead to a smaller variance, which leads to more
points being indicated as outliers, and so on.
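A rough back-of-the-envelope calculation (ours, not Kosinski's) illustrates the size of this effect for p = 1 and α = 0.05. If the 5% of points with the largest distances are excluded, the variance is essentially that of a standard normal truncated to $|z| < 1.96$,

$$\operatorname{Var} \approx 1 - \frac{2 \cdot 1.96\,\varphi(1.96)}{0.95} \approx 0.76 ,$$

so every point with $|z| > 1.96\sqrt{0.76} \approx 1.71$ (roughly 9% of the data instead of 5%) now exceeds the cutoff; iterating the argument drives the swamped fraction up further, consistent with the 0.239 observed for p = 1 in Table 3.1.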
The effect is strongest in one dimension. In higher dimensions the points with a
large Mahalanobis distance lie "all around", and therefore have less influence on the
variance in the separate directions.
Apparently, the effect is quite strong for α = 0.05, but almost negligible for α = 0.01. In
the remaining tests α = 0.01 is used, unless otherwise stated.
3.3.3 Masking and swamping
The ability of the algorithm to detect outliers was tested in another simulation. We
generated data sets in the same way as is done in the Kosinski paper in order to get a
fair comparison between the original and our adjusted Kosinski algorithm. Thus we
generated data sets of 100 points containing good points as well as outliers. Both the
good points and the outliers were generated from a multivariate normal distribution,
with $\sigma^2 = 40$ for the good points and $\sigma^2 = 1$ for the bad points. The distance between
the center of the good points and the bad points is denoted by d. The vector between
the centers is along the vector of 1’s.
We varied the dimension (p = 2, 5), the fraction of outliers (0.10 to 0.45), and the
distance (d = 20 to 60). We calculated the fraction of masked outliers (the number of
masked outliers of each data set divided by the number of outliers) and the fraction
of swamped points (the number of swamped points of each data set divided by the
number of good points), both averaged over 100 simulation runs for each set of
parameters p, d, and fraction of outliers. Results are shown in Table 3.2.



                     p = 2                                          p = 5
  fraction of    fraction of    fraction of       fraction of    fraction of    fraction of
    outliers     masked obs.   swamped obs.         outliers     masked obs.   swamped obs.
  d=20                                            d=25
    0.10            0.81          0.009             0.10            0.90          0.008
    0.20            0.89          0.014             0.20            0.91          0.021
    0.30            0.88          0.022             0.30            0.93          0.146
    0.40            0.86          0.146             0.40            0.97          0.551
    0.45            0.88          0.350             0.45            1.00          0.855
  d=30                                            d=40
    0.10            0.03          0.011             0.10            0.00          0.008
    0.20            0.00          0.011             0.20            0.04          0.008
    0.30            0.01          0.010             0.30            0.03          0.022
    0.40            0.05          0.043             0.40            0.02          0.020
    0.45            0.01          0.019             0.45            0.01          0.014
  d=40                                            d=60
    0.10            0.00          0.011             0.10            0.00          0.008
    0.20            0.00          0.011             0.20            0.00          0.007
    0.30            0.00          0.011             0.30            0.00          0.009
    0.40            0.00          0.009             0.40            0.00          0.010
    0.45            0.00          0.010             0.45            0.00          0.008
Table 3.2. Average fraction of masked and swamped observations of 2- and
5-dimensional data sets over 100 simulation runs. Each data set consisted of 100
points with a certain fraction of outliers. The good (bad) points were generated from
a multivariate normal distribution with $\sigma^2 = 40$ ($\sigma^2 = 1$) in each direction. The
distance between the center of the good points and the bad points is denoted by d.
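For readers who want to reproduce this design, the data generation can be sketched as follows (Python; the exact generator and seed used for the report are not documented, so this is an approximation in which the outlier center is placed at distance d along the vector of 1's).

import numpy as np

def contaminated_data(n=100, p=2, frac_out=0.30, d=30.0, seed=0):
    """Good points ~ N(0, 40*I), outliers ~ N(center, I), with ||center|| = d."""
    rng = np.random.default_rng(seed)
    n_out = int(round(frac_out * n))
    center = d * np.ones(p) / np.sqrt(p)              # along the vector of 1's
    good = rng.normal(0.0, np.sqrt(40.0), size=(n - n_out, p))
    bad = center + rng.normal(0.0, 1.0, size=(n_out, p))
    Y = np.vstack([good, bad])
    is_outlier = np.r_[np.zeros(n - n_out, bool), np.ones(n_out, bool)]
    return Y, is_outlier

The masked fraction is then the share of true outliers whose outlyingness stays below the cutoff, and the swamped fraction the share of good points above it.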


The following conclusions can be drawn from these results. The algorithm is said to
be performing well if the fraction of masked outliers is close to zero and the fraction
of swamped observations is close to α = 0.01. The first conclusion is: the larger the
distance between the good points and the bad points the better the algorithm
performs. This conclusion is not surprising and is in agreement with Kosinski’s
results. Secondly, the higher the dimension, the worse the performance of the
algorithm. In five dimensions the algorithm starts to perform well at d=40 and is close
to perfect at d=60, while in two dimensions the performance is good at d=30 and
perfect at d=40. The original algorithm did not show such a dependence
on the dimension. It is remarked, however, that the paper by Kosinski does not give
enough details for a good comparison on this point. Third, for both two and five
dimensions the adjusted algorithm performs worse than the original algorithm. The
original algorithm is almost perfect at d=25 for both p=2 and p=5, while the adjusted
algorithm is not perfect until d=40 or d=60. This is the price that is paid for the large
gain in computer time. The fourth conclusion is: the performance of the algorithm
hardly depends on the fraction of outliers, in agreement with Kosinski's
results. In some cases, the algorithm even seems to perform better for higher
fractions. This is, however, due to the relatively small number of points (100) per data
set. For very large data sets and very large numbers of simulation runs this artifact
will disappear.



  p     d     fr      inc     masked   swamped
  2    20    0.10      1p      0.79     0.010
  2    20    0.10     10%      0.80     0.009
  2    20    0.10    100%      0.80     0.009
  2    20    0.40      1p      0.86     0.225
  2    20    0.40     10%      0.86     0.146
  2    20    0.40    100%      0.89     0.093
  2    30    0.10      1p      0.00     0.011
  2    30    0.10     10%      0.03     0.011
  2    30    0.10    100%      0.02     0.011
  2    30    0.40      1p      0.05     0.042
  2    30    0.40     10%      0.05     0.043
  2    30    0.40    100%      0.08     0.038
  2    40    0.10      1p      0.00     0.011
  2    40    0.10     10%      0.00     0.011
  2    40    0.10    100%      0.00     0.011
  2    40    0.40      1p      0.00     0.010
  2    40    0.40     10%      0.00     0.009
  2    40    0.40    100%      0.02     0.009
  5    40    0.10      1p      0.00     0.008
  5    40    0.10     10%      0.00     0.008
  5    40    0.10    100%      0.01     0.008
  5    40    0.40      1p      0.01     0.016
  5    40    0.40     10%      0.01     0.016
  5    40    0.40    100%      0.06     0.035
Table 3.3. Average fraction of masked and swamped observations for p-dimensional
data sets with a fraction of fr outliers on a distance d from the good points (for more
details about the data sets see Table 3.2), calculated with runs with either one-point
increment (1p) or proportional increment (10% or 100% of the good part).


3.3.4 Proportional increment
Until now all tests have been performed using the one-point increment, i.e. at each
step of the algorithm the size of the good part is increased by just one point. In
section 3.1 it was already mentioned that a gain in computer time is possible by
increasing the size of the good part by more than one point per step. The
simulations on the masked and swamped observations were repeated with the
proportional increment algorithm. The increment was
tested for percentages up to 100% (which means that the size of the good part is
doubled at each step).
The results of Table 3.1, showing the average fraction of swamped observations in
outlier-free data sets, did not change. Small changes showed up for large
percentages in the presence of outliers. A summary of the results is shown in Table
3.3. In order to avoid an unnecessary profusion of data we only show the results for
p=2 in some relevant cases and, as an illustration, in a few cases for p=5.
A general conclusion from the table is that for a wide range of percentages the
proportional increment algorithm works satisfactorily. For a percentage of 100%
outliers are masked slightly more frequently than for lower percentages. The
differences between 10% increment and one-point increment are negligible.
3.3.5 Time dependence
To illustrate the possible gain with the proportional increment we measured the time
per run for p-dimensional data sets of n points, with p ranging from 1 to 8 and n
from 50 to 400. The simulations were performed with outlier-free generated data
sets so that the complete data sets had to be included in the good part. This was done
in order to obtain useful information about the dependence of the simulation times
on the number of points. Table 3.4 shows the results for the simulation runs with
one-point increment. The results for the runs with a proportional increment of 10%
are shown in Table 3.5.



   n      p=1      2      3      4      5      6      7      8
   50     0.09   0.18   0.29   0.45   0.64   0.84   1.08   1.35
  100     0.36   0.68   1.05   1.75   2.5    3.3    4.3    5.5
  200     1.46   2.8    4.6    7.0    10
  400     6.2    12
Table 3.4. Time (in seconds) per run on p-dimensional data sets of n points, using
the one-point increment.


    n       p=1        2        3       4       5         6         7         8
    50      0.05      0.10     0.16    0.23    0.31     0.39      0.52      0.62
   100      0.14      0.24     0.39    0.56    0.76     1.00      1.25      1.55
   200      0.33      0.60     0.92    1.35    1.90
   400      0.80      1.40
Table 3.5. Time (in seconds) per run on p-dimensional data sets of n points, using
the proportional increment (perc=10%).


Let us denote the time per run as a function of n for fixed p by $t_p$, and the time per
run as a function of p for fixed n by $t_n$. For the one-point increment simulations $t_p$ is
approximately proportional to $n^2$. This is as expected since there are O(n) steps with
an increment of one point and at each step the Mahalanobis distance has to be
calculated for each point (O(n)) and sorted (O(n ln n)). For the simulations with
proportional increment $t_p$ is approximately O(n ln n), due to the fact that only
O(ln n) steps are needed instead of O(n). As a consequence there is a substantial
gain in the time per run, ranging from a factor of 2 for 50 points up to a factor of 8
for 400 points.
The time per run for fixed n, $t_n$, is approximately proportional to $p^{1.5}$, for both one-
point and proportional increment runs. The exponent 1.5 is just an empirical average
over the range p = 1..8 and is the result of several O(p) and O(p²) steps. Since the
exponent is much smaller than 2 it is more efficient to search for outliers in one
p-dimensional run than in ½p(p−1) 2-dimensional runs, one for each pair of
dimensions, even if one is not interested in outliers in more than 2 dimensions.
Consider for instance p=8, n=50. One run takes 0.62 seconds. However, a total of 1.4
seconds would be needed for the 28 runs, one in each pair of dimensions, each run
taking 0.05 seconds.
3.3.6 Sensitivity to parameters
The Kosinski algorithm was tested on the twelve data sets described in section 5. A
full description of the outliers and a comparison of the results with the results of the
projection algorithm as well as with other methods described in the literature is
given in that section. In the present section we restrict the discussion to the
sensitivity of the number of outliers to the cutoff and the increment percentage.

The algorithm was run with a cutoff $\chi^2_{p,1-\alpha}$ for α = 1% as well as α = 5%.

Furthermore, both one-point increment and proportional increment (in the range 0-
40%) were used. The number of detected outliers of the twelve data sets is shown in
Table 3.6.
It is clear that the number of outliers for a specific data set is not the same for each
set of parameters. It is remarked that, in all cases, if different sets of parameters lead
to the same number of outliers, the outliers are exactly the same points. Moreover, if
one set of parameters leads to more outliers than another set, all outliers detected by
the latter are also detected by the former (these are empirical results).
Let us first discuss the differences between the detection with α = 1% and with α = 5%.
It is obvious that in many cases α = 5% results in slightly more outliers than α = 1%.
However, in two cases the differences are substantial, i.e. in the Stackloss data and
in the Factory data.
In the Stackloss data five outliers are found for α = 5% using moderate increments,
while α = 1% shows no outliers at all. The reason for this difference is the relatively
small number of points relative to the dimension of the data set. It has been argued
by Rousseeuw that the ratio n/p should be larger than 5 in order to be able to detect
outliers reliably. If n/p is smaller than 5 one comes to a point where it is not useful
to speak about outliers since there is no real bulk of data.
With n=21 and p=4 the Stackloss data lie on the edge of meaningful outlier
detection. Moreover, if the five points which are indicated as outliers with α = 5% are
left out, only 16 good points remain, resulting in a ratio n/p=4. In such a case any
outlier detection algorithm will presumably fail to find outliers consistently.








  Data set                    p     n      inc      α=5%   α=1%
 1. Kosinski                  2    100      1p        42     40
                                          ≤ 40%       42     40
 2. Brain mass                2     28      1p         5      3
                                          ≤ 10%        5      3
                                         15-20%        4      3
                                         30-40%        3      3
 3. Hertzsprung-Russel        2     47      1p         7      6
                                          ≤ 30%        7      6
                                            40%        6      6
 4. Hadi                      3     25      1p         3      3
                                           ≤ 5%        3      3
                                            10%        3      0
                                         15-25%        3      3
                                            30%        3      0
                                            40%        3      3
 5. Stackloss                 4     21      1p         5      0
                                          ≤ 17%        5      0
                                         18-24%        4      0
                                         25-30%        1      0
                                            40%        0      0
 6. Salinity                  4     28      1p         4      2
                                          ≤ 30%        4      2
                                            40%        2      2
 7. HBK                       4     75      1p        15     14
                                          ≤ 30%       15     14
                                            40%       14     14
 8. Factory                   5     50      1p        20      0
                                          ≤ 40%       20      0
 9. Bush fire                 5     38      1p        16     13
                                          ≤ 40%       16     13
 10. Wood gravity             6     20      1p         6      5
                                          ≤ 20%        6      5
                                            30%        6      6
                                            40%        6      5
 11. Coleman                  6     20      1p         7      7
                                          ≤ 40%        7      7
 12. Milk                     8     85      1p        20     17
                                          ≤ 30%       20     17
                                            40%       18     15
Table 3.6. Number of outliers detected by the Kosinski algorithm with a cutoff of
$\chi^2_{p,1-\alpha}$, for α = 5% and α = 1% respectively, with either one-point (1p) or proportional
increment in the range 0-40%.





The Factory data is an interesting case. For α = 5% twenty outliers are detected,
which is 40% of all points, while detection with α = 1% shows no outliers.
Explorative data analysis shows that about half the data set is quite narrowly
concentrated in a certain region, while the other half is distributed over a much
larger space. There is however no clear distinction between these two parts. The
more widely distributed part is rather a very thick tail of the other part. In such a
case the effect that the algorithm with α = 5% tends to detect too many outliers, which
was explained in the discussion of Table 3.1, is very strong. It is questionable whether
the indicated points should be considered outliers.
Let us now discuss the sensitivity of the number of detected outliers to the
increment. At low percentages the number of outliers is always the same as for the
one-point increment; in fact, at very low percentages the proportional increment
procedure leads to an increment of just one point per step, making the two
algorithms equal. For most data sets the number of outliers is constant for a wide
range of percentages and starts to differ slightly only at 30-40% or higher. Three of
the twelve data sets behave differently: the Brain mass data, the Hadi data, and the
Stackloss data.
The Brain mass data shows 5 outliers at low percentages for α = 5%. At percentages
around 15% the number of outliers is only 4 and at 30% only 3. So the number of
outliers changes earlier (at 15%) than in most other data sets (at 30% or higher). For α = 1% the
number of outliers is constant over the whole range. In fact, the three outliers which
are found at 30-40% for α = 5% are exactly the same as the three outliers found for
α = 1%. The two outliers which are missed at higher percentages for α = 5% both lie
just above the cutoff value. Therefore it is disputable whether they are real outliers at
all.
The Hadi data shows strange behavior. At all percentages for α = 5% and at most
percentages for α = 1% three outliers are found. However, near 10% and near 30% no
outliers are detected. Again, the three outliers are disputable. All have a
Mahalanobis distance just above the cutoff (see Table 5.2). Hence it is not strange
that sometimes these three points are included in the good part (the three points lie
close together; hence, the inclusion of one of them in the good part leads to low
Mahalanobis distances for the other two as well). On the other hand, it is also not a
big problem, since it is rather a matter of taste than a matter of science whether to
call these three points outliers or good points.
The Stackloss data shows a decreasing number of outliers for α = 5% at relatively low
percentages, as in the Brain mass data. Here, the sensitivity to the percentage is
related to the low ratio n/p, as discussed previously.
In conclusion, for increments up to 30% the same outliers are found as with the one-
point increment. In the cases where this is not true, the supposed outliers always have an
outlyingness slightly above or below the cutoff, so that missing such outliers has no
big consequences. Furthermore, relatively low cutoff values could lead to
disproportionate swamping.




4. The projection method

4.1 The principle of projection
The projection method is based on the idea that outliers in univariate data are easily
recognized, visually as well as by computational means. In one dimension the
Mahalanobis distance is simply $|y_i - \bar y| / \sigma$. A robust version of the univariate
outlyingness is found by replacing the mean by the med and replacing the standard
deviation by the mad. Denoting the robust outlyingness by $u_i$, this leads to

$$u_i = \frac{|y_i - M|}{S}$$

where M and S denote the med and the mad respectively:

$$M = \operatorname*{med}_k\, y_k, \qquad S = \operatorname*{med}_l\, |y_l - M|$$


In the case of multivariate data the idea is to “look” at the data set from all possible
directions and to “see” whether a particular data point lies far away from the bulk of
the data points. Looking in this context means projecting the data set on a projection
vector a; seeing means calculating the outlyingness as is done in univariate data. The
ultimate outlyingness of a point is just the maximum of the outlyingnesses over all
projection directions.
The outlyingness defined in this way corresponds to the multivariate Mahalanobis
distance as is shown in section 2. Recalling the expression for the Mahalanobis
distance:

$$MD_i^2 = \sup_{a^t a = 1} \frac{(a^t(y_i - \bar y))^2}{a^t C a}$$

Robustifying the Mahalanobis distance leads to

$$u_i = \sup_{a^t a = 1} \frac{|a^t y_i - M|}{S}$$

Now M and S are defined as follows:

$$M = \operatorname*{med}_k\, a^t y_k, \qquad S = \operatorname*{med}_l\, |a^t y_l - M|$$

It is remarked that $MD_i^2$ corresponds to $u_i^2$.

How is the maximum calculated? The outlyingness

$$\frac{|a^t y_i - M|}{S}$$


as a function of a could possess several local maxima, making gradient search
methods infeasible. Therefore the outlyingness is calculated on a grid of a finite
number of projection vectors. The grid should be fine enough to calculate
the maximum outlyingness with sufficient accuracy.
This robust measure of outlyingness was first developed by Stahel and Donoho.
More recent work on this subject has been reported by Maronna and Yohai. These
authors used the outlyingness in order to calculate a weighted mean and covariance
matrix. Outliers were given small weights so that the Stahel-Donoho estimator of the
mean was robust against the presence of outliers. It is of course possible to use the
weighted mean and covariance matrix to calculate a weighted Mahalanobis distance.
This is not done in the projection method discussed here.

The robust outlyingness $u_i$ was slightly adjusted for the following reason. The mad
of univariate standard normal data, which have a standard deviation of 1 by definition,
is 0.674 = 1/1.484. In order to assure that, in the limiting case of an infinitely large
multivariate normal data set, the outlyingness $u_i^2$ is equal to the squared
Mahalanobis distance, the mad in the denominator is multiplied by 1.484:

$$u_i = \sup_{a^t a = 1} \frac{|a^t y_i - M|}{1.484\, S}$$
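The factor is the usual consistency correction for the mad (a standard result, added here for completeness): for normally distributed data with standard deviation $\sigma$ the mad converges to $\Phi^{-1}(3/4)\,\sigma \approx 0.674\,\sigma$, so that

$$1.484\,S \approx \frac{S}{\Phi^{-1}(3/4)} \longrightarrow \sigma ,$$

which makes the denominator a consistent estimate of the standard deviation of the projected data and $u_i^2$ directly comparable to $MD_i^2$.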

4.2 The projection algorithm
The purpose of the algorithm is, given a set of n multivariate data points
$y_1, y_2, \ldots, y_n$, to calculate the outlyingness $u_i$ for each point i. The algorithm can be
summarized as follows.
Step 0. In: data set

The algorithm is started with a set of continuous p-dimensional data $y_1, y_2, \ldots, y_n$,
with $y_i = (y_{i1}\; \ldots\; y_{ip})^t$.



Step 1. Define a grid

There are $\binom{p}{q}$ subsets of q dimensions in the total set of p dimensions. The
“maximum search dimension” q is predefined. Projection vectors a in a certain
subset are parameterized by the angles $\theta_1, \theta_2, \ldots, \theta_{q-1}$:

$$a = \begin{pmatrix}
\cos\theta_1 \\
\cos\theta_2 \sin\theta_1 \\
\cos\theta_3 \sin\theta_2 \sin\theta_1 \\
\vdots \\
\cos\theta_{q-1} \sin\theta_{q-2} \cdots \sin\theta_1 \\
\sin\theta_{q-1} \sin\theta_{q-2} \cdots \sin\theta_1
\end{pmatrix}$$


A certain predefined step size step (in degrees) is used to define the grid.

The first angle $\theta_1$ can take the values $i \cdot step_1$, with $step_1$ the largest angle smaller
than or equal to $step$ for which $180/step_1$ is an integer value, and with $i = 1, 2, \ldots, 180/step_1$.

The second angle can take the values $j \cdot step_2$, with $step_2$ the largest angle smaller
than or equal to $step_1 / \cos\theta_1$ for which $180/step_2$ is an integer value, and with
$j = 1, 2, \ldots, 180/step_2$.

The r-th angle can take the values $k \cdot step_r$, with $step_r$ the largest angle smaller than
or equal to $step_{r-1} / \cos\theta_{r-1}$ for which $180/step_r$ is an integer value, and with
$k = 1, 2, \ldots, 180/step_r$.

Such a grid is defined in each subset of q dimensions.
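A sketch of this grid construction for a single q-dimensional subset is given below (Python; the handling of angles whose cosine is zero or negative is not specified in the description above, so the use of the absolute value and the 180-degree cap are our assumptions).

import math

def _round_step(limit):
    """Largest step <= limit (in degrees) such that 180/step is an integer."""
    return 180.0 / math.ceil(180.0 / limit)

def angle_grid(q, step):
    """All grid tuples (theta_1, .., theta_{q-1}) for one subset of q dimensions."""
    def extend(prefix, prev_step):
        if len(prefix) == q - 1:                          # all q-1 angles chosen
            return [prefix]
        if prefix:                                        # step depends on the previous angle
            c = abs(math.cos(math.radians(prefix[-1])))   # assumption: use |cos|
            limit = min(prev_step / c, 180.0) if c > 1e-12 else 180.0
        else:
            limit = step
        s = _round_step(limit)
        tuples = []
        for i in range(1, int(round(180.0 / s)) + 1):
            tuples += extend(prefix + (i * s,), s)
        return tuples
    return extend((), step)

def direction(angles):
    """Unit projection vector a corresponding to the spherical angles (degrees)."""
    a, sin_prod = [], 1.0
    for th in angles:
        a.append(math.cos(math.radians(th)) * sin_prod)
        sin_prod *= math.sin(math.radians(th))
    a.append(sin_prod)                                    # last component: product of sines
    return a

For the full method this grid is built in each of the $\binom{p}{q}$ coordinate subsets, with the resulting q-dimensional direction embedded in p dimensions by putting zeros in the remaining coordinates.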
Step 2. Outlyingness for each grid point

For each grid point a, calculate the outlyingness of each data point $y_i$:

•    Calculate the projections $a^t y_i$.

•    Calculate the median $M_a = \operatorname*{med}_k\, a^t y_k$.

•    Calculate the mad $L_a = \operatorname*{med}_l\, |a^t y_l - M_a|$.

•    Calculate the outlyingness $u_i(a) = \dfrac{|a^t y_i - M_a|}{1.484\, L_a}$.

Step 3. Out: outlyingness

The outlyingness $u_i$ is the maximum over the grid:

$$u_i = \sup_a u_i(a)$$
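Combining the pieces, the whole procedure for the simplest case q = p can be sketched as follows (Python; angle_grid and direction are the helper functions sketched in Step 1, and the guard against a zero mad is our own addition).

import numpy as np

def projection_outlyingness(Y, step=10.0):
    """Robust outlyingness of every row of Y, maximized over the grid (q = p)."""
    n, p = Y.shape
    u = np.zeros(n)
    for angles in angle_grid(p, step):
        a = np.array(direction(angles))
        proj = Y @ a                                 # projections a^t y_i
        M = np.median(proj)                          # med of the projected points
        L = np.median(np.abs(proj - M))              # mad of the projected points
        if L > 0.0:                                  # skip degenerate directions
            u = np.maximum(u, np.abs(proj - M) / (1.484 * L))
    return u

As in the Kosinski method, point i is flagged as an outlier when $u_i^2$ exceeds $\chi^2_{p,1-\alpha}$.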


4.3 Test results
A prototype/test program was implemented in an Excel/Visual Basic environment.
Documentation of the program is published elsewhere. We successively tested the
number of swamped observations in data sets containing no outliers, the number of
masked observations in data sets containing outliers, the time dependence of the
algorithm on the parameters step and q, and the sensitivity of the number of detected
outliers to these parameters in some known data sets.





4.3.1 Swamping
A simulation study was performed in order to determine the average fraction of
swamped observations in normally distributed data sets. See section 3.3.2 for more
detailed remarks about the swamping effect and about generating the data sets. The
results of the simulations are shown in Table 4.1.



   α     step    p=1      2      3      4      5
  1%      10    0.010  0.011  0.016  0.018  0.023
  5%      10    0.049  0.052  0.067  0.071  0.088
  1%      30    0.010  0.010  0.012  0.011  0.012
  5%      30    0.049  0.049  0.051  0.049  0.058
Table 4.1. The average fraction of swamped observations of the simulations on
several generated p-dimensional data sets of 100 points, with cutoff value $\chi^2_{p,1-\alpha}$
and step size step. The parameter q is equal to p.



        p=2, q=2                    p=5, q=2                    p=5, q=5
 fraction of  fraction of    fraction of  fraction of    fraction of  fraction of
   outliers   masked obs.      outliers   masked obs.      outliers   masked obs.
 d=20                        d=30                        d=30
   0.12          0.83          0.12          1.00          0.12          0.22
   0.23          1.00          0.23          1.00          0.23          0.54
   0.34          1.00          0.34          1.00          0.34          1.00
   0.45          1.00          0.45          1.00          0.45          1.00
 d=40                        d=50                        d=50
   0.12          0.00          0.12          0.00          0.12          0.00
   0.23          0.00          0.23          0.67          0.23          0.00
   0.34          0.62          0.34          1.00          0.34          0.65
   0.45          1.00          0.45          1.00          0.45          1.00
 d=50                        d=80                        d=60
   0.12          0.00          0.12          0.00          0.12          0.00
   0.23          0.00          0.23          0.00          0.23          0.00
   0.34          0.00          0.34          0.00          0.34          0.00
   0.45          1.00          0.45          1.00          0.45          1.00
 d=90                        d=140                       d=120
   0.12          0.00          0.12          0.00          0.12          0.00
   0.23          0.00          0.23          0.00          0.23          0.00
   0.34          0.00          0.34          0.00          0.34          0.00
   0.45          0.00          0.45          0.00          0.45          0.00
Table 4.2. Average fraction of masked outliers of 2- and 5-dimensional generated
data sets (see also section 3.3.3).


For low dimensions the average fraction of swamped observations tends to be almost
equal to α. The fraction increases, however, with increasing dimension. This is due to
the decreasing ratio n/p. It is remarkable that for step size 30 the fraction of
swamped observations seems to be much better than for step size 10. This is just a
coincidence. The fact that more observations are declared to be outliers is
compensated by the fact that the outlyingnesses are usually smaller when large step
sizes are used. In fact, the differences between step sizes 10 and 30 are so large for
the higher dimensions that they indicate that a step size of 30 could be too coarse to
result in reliable outlyingnesses.


4.3.2 Masking and swamping
The ability of the projection algorithm to detect outliers was tested by generating
data sets that contain good points as well as outliers. See section 3.3.3 for details on
how the data sets were generated.
Results are shown in Table 4.2. In all cases, the ability to detect the outliers is
strongly dependent on the fraction of contamination. If there are many outliers, they
can only be detected if they lie very far away from the cloud of good points. This is
due to the fact that, although the med and the mad have a robustness of 50%, a large
concentrated fraction of outliers strongly shifts the med towards the cloud of outliers
and enlarges the mad.
In higher dimensions it is more difficult to detect the outliers, as in the Kosinski
method. The ability to detect the outliers also depends on the maximum search
dimension q. If q is taken equal to p, fewer outliers are masked.
4.3.3 Time dependence
The time dependence of the projection algorithm on the step size step and the
maximum search dimension q is shown in Table 4.3.



   n    p    q    step       t             n    p    q    step       t
  400   2    2     36      13.0          100    2    2     9        8.0
  400   2    2     18      21.0          100    3    2     9       19.3
  400   2    2      9      32.7          100    4    2     9       33.5
  400   2    2      4.5    56.8          100    5    2     9       50.1
                                         100    6    2     9       71.4
  400   3    3     36      28.1          100    7    2     9       98.9
  400   3    3     18      68.6          100    8    2     9      128.0
  400   3    3      9     209.1
  400   3    3      4.5   719.3          100    5    1     9        5.9
                                         100    5    2     9       50.1
   50   5    2      9      26.3          100    5    3     9      479.8
  100   5    2      9      50.1          100    5    4     9     2489.1
  200   5    2      9     107.7          100    5    5     9     4692.1
  400   5    2      9     202.9
Table 4.3. Time t (in seconds) per run on p-dimensional data sets of n points using
maximum search dimension q and step size step (in degrees).


                                                                         p  180 q −1
Asymptotically the time per run should be proportional to (n ln n) (
                                                                                 ) ,
                                                                         q  step
                            p
since for each of the   subsets a grid is defined with a number of grid points of
                      q
                       

                                         19
Robust multivariate outlier detection

                 180 q −1
the order of (        ) , and at each grid point the median of the projected points has
                 step
to be calculated (n ln n). The results in the table roughly confirm this theoretical
estimation. The most important conclusion from the table is that the time per run
strongly increases with the search dimension q. This makes the algorithm only
useful for relatively low dimensions.
4.3.4 Sensitivity to parameters
The projection method was tested on the twelve data sets that are fully described
in section 5, as was done for the Kosinski method (see section 3.3.6). The results
are shown in Table 4.4.
Let us first discuss the differences between α = 5% and α = 1%. In almost all cases the
number of outliers detected with α = 5% is larger than with α = 1%. This is
completely due to stronger swamping. It is remarked that there is no algorithmic
dependence on the cutoff value, as there is in the Kosinski method. In the projection
method a set of outlyingnesses is calculated and only after the calculation is a certain
cutoff value used to discriminate between good and bad points. Hence, a smaller
cutoff value leads to more outliers, but all points still have the same outlyingness. In
the Kosinski method the cutoff value is already used during the algorithm: the cutoff
is used to decide whether more points should be added to the good part. A
smaller cutoff leads not only to more outliers but also to a different set of
outlyingnesses, since the mean and the covariance matrix are calculated with a
different set of points. As a consequence, where the Kosinski method can show a
rather strong sensitivity to the cutoff value, this sensitivity is absent in the
projection method.
Now let us discuss the dependence of the number of outliers on the maximum search
dimension q. In the Hertzsprung-Russel data set and in the HBK data set the number
of outliers found with q=1 is already as large as found with higher values of q. In the
Brain mass data set and in the Milk data set, the number of outliers for q=1 is,
however, much smaller than for large values of q. In those cases, many outliers are
truly multivariate.
In the Hadi data set, the Factory data set and the Bush fire data set there is also a
rather large discrepancy between q=2 and q=3. It is remarked that the Hadi data set
was constructed so that all outliers are invisible when only two dimensions are inspected
(see section 5.2.4). Also in the other two data sets it is clear that many outliers can
only be found by inspecting three or more dimensions at the same time.
If q is higher than three, only slightly more outliers are found than for q=3.
Differences can be explained by the fact that searching in higher dimensions with
the projection method leads to more outliers (see section 4.3.1).





    Data set                     p       n         q   step   α=5%  α=1%
1. Kosinski                      2      100        2    10      78    34
                                                   2    20      77    34
                                                   2    30      42    31
2. Brain mass                    2      28         2    5      9      6
                                                   2   10      9      4
                                                   2   30      8      4
                                                   1   n/a     3      1
3. Hertzsprung-Russel            2      47         2    1      7      6
                                                   2   30      6      5
                                                   2   90      6      5
                                                   1   n/a     6      5
4. Hadi                          3      25         3    5      11     5
                                                   3   10       8     0
                                                   2   10       0     0
5. Stackloss                     4      21         4    5      14     9
                                                   4   10      10     9
                                                   4   15       8     6
                                                   4   20       9     7
                                                   4   30       6     6
6. Salinity                      4      28         4   10      12     8
                                                   4   20       9     7
                                                   3   30       6     4
7. HBK                           4      75         4   10      15    14
                                                   4   20      14    14
                                                   1   n/a     14    14
8. Factory                       5      50         5   10      24    18
                                                   5   20      14     9
                                                   4   10      24    17
                                                   3   10      22    14
                                                   2   10       9     9
9. Bush fire                     5      38         5   10      24    19
                                                   5   20      19    17
                                                   4   10      22    19
                                                   3   10      21    17
                                                   2   10      13    12
10. Wood gravity                 6      20         5   20      14    14
                                                   5   30      12    11
                                                   3   10      15    14
11. Coleman                      6      20         5   20      10     8
                                                   5   30       4     4
12. Milk                         8      85        5    20      18      14
                                      5      30      15      13
                                      4      20      16      14
                                      4      30      15      13
                                      3      20      15      13
                                      3      30      15      12
                                      2      20      13      11
                                      2      30      12       7
                                      1      n/a      6       5
Table 4.4. Number of outliers detected by the projection algorithm with a cutoff of $\sqrt{\chi^2_{p,1-\alpha}}$, for α=1% respectively α=5%, with maximum search dimension q and angular step size step (in degrees).



The sensitivity to the step size is not large in most cases. In cases like the Hadi data,
the Stackloss data, the Salinity data and the Coleman data, the sensitivity can be
explained by the sparsity of the data sets. A step size of 10-20 degrees seems to work well
in most cases.
In conclusion, the number of outliers is not very sensitive to the parameters q and
step, although the sensitivity is not completely negligible. In most practical cases
q=3 and step=10 work well enough.



5. Comparison of methods

In this section the projection method and the Kosinski method are compared with
each other as well as with other robust outlier detection methods. In section 5.1 we
briefly describe some other methods reported in the literature. The comparison is
made by applying the projection method and the Kosinski method to data sets that
have been analyzed by at least one of the other methods. Those data sets and the results of
the said methods are described in section 5.2. In section 5.3 the results are discussed.
Unfortunately, most papers on outlier detection methods say very little about
the efficiency of the methods, i.e. how fast the algorithms are and how the running time depends on
the number of points and the dimension of the data set. Therefore we restrict the
discussion to the ability to detect outliers.

5.1 Other methods
It is important to note that two different types of outliers are distinguished in the outlier
literature. The first type, which is used in this report, is a point that lies far
away from the bulk of the data. The second type is a point that lies far away from the
regression plane formed by the bulk of the data. The two types will be denoted as
bulk outliers and regression outliers, respectively.
Of course, outliers are often outliers according to both points of view. That is why we
also compare the results of the projection method and the Kosinski method, which are
both bulk outlier methods, with regression outlier methods. A point that is
declared an outlier according to both definitions is called a bad leverage point. A
point that lies far away from the bulk of the points but close to the regression plane is
called a good leverage point.
Rousseeuw (1987, 1990) developed the minimum volume ellipsoid (MVE) estimator
in order to robustly detect bulk outliers. The principle is to search for the ellipsoid
of minimal volume that covers at least half the data points. The mean
and the covariance matrix of the points inside that ellipsoid are inserted in the
expression for the Mahalanobis distance. This method is costly due to the
complexity of the search for the minimum volume ellipsoid.
A related technique is based on the minimum covariance determinant (MCD)
estimator, which is employed by Rocke. The aim is to search for the set of points,
containing at least half the data, for which the determinant of the covariance matrix
is minimal. Again, the mean and the covariance matrix determined by that set of
points are inserted in the Mahalanobis distance expression. This method is also
rather complex, although it has been substantially optimized by Rocke.
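
For readers who wish to experiment with the MCD principle, the sketch below uses the off-the-shelf FAST-MCD implementation in scikit-learn rather than Rocke's optimized algorithm; the data set and cutoff are arbitrary choices for illustration only:

# Illustration of the MCD principle (off-the-shelf FAST-MCD from scikit-learn,
# not Rocke's algorithm): the robust location and scatter are plugged into the
# Mahalanobis distance and points are flagged against a chi-square cutoff.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(95, 4)),
               rng.normal(6, 0.5, size=(5, 4))])   # 5 planted outliers

mcd = MinCovDet(random_state=0).fit(X)
d = np.sqrt(mcd.mahalanobis(X))                    # robust Mahalanobis distances
cutoff = np.sqrt(chi2.ppf(0.99, df=X.shape[1]))
print(np.where(d > cutoff)[0])                     # indices of flagged points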
Hadi (1992) developed a bulk outlier method that is very similar to the Kosinski
method. He also starts with a set of p+1 “good” points and increases the good set
one point at a time. The difference lies in the choice of the first p+1 points. Hadi
orders the n points using another robust measure of outlyingness. The question
arises why that other outlyingness would not be appropriate for outlier detection. A
reason could be that an arbitrary robust measure of outlyingness deviates relatively
strongly from the “real” Mahalanobis distance.
Atkinson combines the MVE method of Rousseeuw and the forward search
technique also employed by Kosinski. A few sets of p+1 randomly chosen points are
used for a forward search. The set that results in the ellipsoid with minimal volume
is used for the calculation of the Mahalanobis distances.
Maronna employed a projection-like method that is slightly more complicated. The
outlyingnesses are calculated like in the projection method. Then, weights are
assigned to each point, with low weights for the outlying points, i.e. the influence of
outliers is restricted. The mean and the covariance matrix are calculated using these
weights. They form the Stahel-Donoho estimator for location and scatter. Finally,
Maronna inserts this mean and this covariance matrix in the expression for the
Mahalanobis distance.
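
The weighting step can be sketched as follows; since the weight function used by Maronna is not specified here, a simple Huber-type weight is assumed, and the function names are ours:

# Generic sketch of the Stahel-Donoho idea: downweight points with a large
# projection outlyingness, compute a weighted mean and covariance, and insert
# them into the Mahalanobis distance. The weight function is an assumption.
import numpy as np

def stahel_donoho_estimates(X, outlyingness, c=2.5):
    w = np.minimum(1.0, (c / outlyingness) ** 2)   # assumed Huber-type weights
    mu = np.average(X, axis=0, weights=w)
    Xc = X - mu
    cov = (w[:, None] * Xc).T @ Xc / w.sum()       # weighted covariance matrix
    return mu, cov

def mahalanobis(X, mu, cov):
    Xc = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov), Xc))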
Egan proposes resampling by the half-mean method (RHM) and the smallest half-
volume method (SHV). In the RHM method several randomly selected portions of
the data are generated. In each case the outlyingnesses are calculated. For each point
it is counted how many times it has a large outlyingness; a point is declared a true
outlier if this happens often. In the SHV method the distance between each pair of
points is calculated and put in a matrix. The column with the smallest sum of the
smallest n/2 distances is selected. The corresponding n/2 points form the smallest
half-volume. The mean and the covariance of those points are inserted in the
Mahalanobis distance expression.
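
A minimal sketch of the SHV idea as described above is given below; details such as whether a point's zero distance to itself counts among its smallest n/2 distances are assumptions:

# Sketch of the smallest half-volume (SHV) method: take the column of the
# pairwise distance matrix with the smallest sum of its n/2 smallest entries,
# use the corresponding n/2 points for the mean and covariance, and compute
# Mahalanobis distances for all points.
import numpy as np

def shv_mahalanobis(X):
    n, p = X.shape
    h = n // 2
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    sums = np.sort(D, axis=0)[:h, :].sum(axis=0)   # per column: sum of h smallest
    j = np.argmin(sums)                            # column with the smallest sum
    half = np.argsort(D[:, j])[:h]                 # the h points of that half-volume
    mu = X[half].mean(axis=0)
    cov = np.cov(X[half], rowvar=False)
    Xc = X - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov), Xc))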
The detection of regression outliers is mainly done with the least median of squares
(LMS) method. The LMS method was developed by Rousseeuw (1984, 1987, 1990).
Instead of minimizing the sum of the squares of the residuals in the least squares
method (which should rather be called the least sum of squares method in this
context) the median of the squares is minimized. Outliers are simply the points with
large residuals as calculated with the regression coefficients determined with the
LMS method.
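
The following is a simple randomized sketch of the LMS principle, not Rousseeuw's PROGRESS implementation; the number of random subsets is an arbitrary choice:

# Randomized sketch of least median of squares (LMS) regression: fit exact
# regressions on many random (p+1)-point subsets and keep the fit with the
# smallest median squared residual; points with large residuals under that
# fit are the regression outliers.
import numpy as np

def lms_fit(X, y, n_trials=3000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # add an intercept column
    best_coef, best_med = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(n, size=p + 1, replace=False)
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        med = np.median((y - A @ coef) ** 2)       # median of squared residuals
        if med < best_med:
            best_med, best_coef = med, coef
    return best_coef, best_med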
Hadi (1993) uses a forward search to detect the regression outliers. The regression
coefficients of a small good set are determined. The set is increased by subsequently
adding the points with the smallest residuals and recalculating the regression
coefficients until a certain stop criterion is fulfilled. A small good set has to be found
beforehand.




Atkinson combines forward search and LMS. A few sets of p+1 randomly chosen
points are used in a forward search. The set that results in the smallest LMS is used
for the final determination of the regression residuals.
A completely different approach is the genetic algorithm for detection of regression
outliers by Walczak. We will not describe this approach here since it lies beyond the
scope of deterministic calculation of outlyingnesses.
Fung developed an adding-back algorithm for confirmation of regression outliers.
Once points are declared to be outliers by any other robust method, the points are
added back to the data set in a stepwise way. The extent to which the estimates of
the regression coefficients are affected by the adding back of a point is used as a
diagnostic measure to decide whether that point is a real outlier. This method was
developed since robust outlier methods tend to declare too many points to be
outliers.
5.2 Known data sets
In this section the projection method and the Kosinski method are compared by
running both algorithms on the twelve data sets given in Table 5.1. Most of
these data sets are well described in the robust outlier detection literature. Hence, we
are able to compare the results of the two algorithms with known results.
The outlyingnesses as calculated by the projection method and the Kosinski method
are shown in Table 5.2, Table 5.4 and Table 5.5. In both methods the cutoff value
for α=1% is used. In the Kosinski method a proportional increment of 20% was
used. The outlyingnesses of the projection method were calculated with q=p (if p<6;
if p>5 then q=5) and the lowest step size that is shown in Table 4.4.
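
For reference, the cutoff values listed in the headers of Tables 5.2-5.5 (3.035 for p=2 up to 4.482 for p=8 at α=1%) can be reproduced as the square root of the corresponding χ² quantile, as the following check (not code from this report) shows:

# Reproduce the alpha = 1% cutoff values of Tables 5.2-5.5: they equal the
# square root of the chi-square quantile with p degrees of freedom.
from scipy.stats import chi2

for p in (2, 3, 4, 5, 6, 8):
    print(p, round(chi2.ppf(0.99, df=p) ** 0.5, 3))
# prints 3.035, 3.368, 3.644, 3.884, 4.1 and 4.482 respectively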
We will now discuss the data sets one by one.



      Data set            p   n Source
1. Kosinski               2 100 Ref. [1]
2. Brain mass             2 28 Ref. [3]
3. Hertzsprung-Russel     2 47 Ref. [3]
4. Hadi                   3 25 Ref. [4]
5. Stackloss              4 21 Ref. [3]
6. Salinity               4 28 Ref. [3]
7. HBK                    4 75 Ref. [3]
8. Factory                5 50 This work
9. Bush fire              5 38 Ref. [5]
10. Wood gravity          6 20 Ref. [6]
11. Coleman               6 20 Ref. [3]
12. Milk                  8 85 Ref. [7]
Table 5.1. The name, the dimension p, the number of points n, and the source of the
tested data sets.


5.2.1 Kosinski data
The Kosinski data form a data set that is difficult to handle from a point of view of
robust outlier detection. The two-dimensional data set contains 100 points. Points 1-40 are generated from a bivariate normal distribution with $\mu_1 = 18$, $\mu_2 = -18$, $\sigma_1^2 = \sigma_2^2 = 1$, $\rho = 0$, and are considered to be outliers. Points 41-100 are good points and are a sample from the bivariate normal distribution with $\mu_1 = 0$, $\mu_2 = 0$, $\sigma_1^2 = \sigma_2^2 = 40$, $\rho = 0.7$.
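
A data set with the same structure can be generated as follows (a sketch, not Kosinski's original code; the random seed is arbitrary):

# Generate a data set with the structure described above: 40 tightly
# clustered outliers around (18, -18) and 60 good points from a strongly
# correlated bivariate normal distribution.
import numpy as np

rng = np.random.default_rng(0)

# Points 1-40: outliers with mu = (18, -18), sigma1^2 = sigma2^2 = 1, rho = 0.
outliers = rng.multivariate_normal([18, -18], np.eye(2), size=40)

# Points 41-100: good points with sigma1^2 = sigma2^2 = 40 and rho = 0.7.
cov_good = 40 * np.array([[1.0, 0.7],
                          [0.7, 1.0]])
good = rng.multivariate_normal([0, 0], cov_good, size=60)

X = np.vstack([outliers, good])   # rows 0-39 are the planted outliers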



The Kosinski method correctly identifies all outliers (see Table 5.2). The projection
method identifies none of the outliers and declares many good points to be outliers.
The reason for this failure is the large contamination and the small scatter of the
outliers. Since there are so many outliers they strongly shift the median towards the
outliers. Hence, the outliers are not detected. Furthermore, since they are narrowly
distributed, they almost completely determine the median of absolute deviations in
the projection direction perpendicular to the vector pointing from the center of the
good points to the center of the outliers. Hence, many points, lying at the end points
of the ellipsoid of good points, have a large outlyingness.
It is remarked that this is not an arbitrarily chosen data set. It was generated
by Kosinski in order to demonstrate the superiority of his own method over other
methods.
5.2.2 Brain mass data
The Brain mass data contain three outliers according to the Kosinski method: points
6, 16 and 25. Those points are also indicated to be outliers by Rousseeuw (1990) and
Hadi (1992). Those authors also declare point 14 to be an outlier, but with an
outlyingness slightly above the cutoff. The projection method declares points 6, 14,
16, 17, 20 and 25 to be outliers.
5.2.3 Hertzsprung-Russel data
The two methods produce almost the same outlyingnesses for all points. Both
declare points 11, 20, 30 and 34 to be large outliers, in agreement with results by
Rousseeuw (1987) and Hadi (1993). However, the projection method and the
Kosinski method also declare points 7 and 14 to be outliers, and point 9 is an outlier
according to the Kosinski method. The outlyingness of these three points is
relatively small. Visual inspection of the data (see page 28 in Rousseeuw (1987))
shows that these points are indeed moderately outlying.
5.2.4 Hadi data
The Hadi data set is an artificial one. It contains three variables $x_1$, $x_2$ and $y$. The two predictors were originally created as uniform(0,15) and were then transformed to have a correlation of 0.5. The target variable was then created by $y = x_1 + x_2 + \varepsilon$ with $\varepsilon \sim N(0,1)$. Finally, cases 1-3 were perturbed to have predictor values around (15,15) and to satisfy $y = x_1 + x_2 + 4$.
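
A sketch of a data set in the same spirit is given below; the exact transformation used to induce the correlation of 0.5 and the exact perturbation of cases 1-3 are assumptions, since they are not specified in full detail here:

# Generate Hadi-like data (assumed construction details): two correlated
# uniform(0,15) predictors, y = x1 + x2 + N(0,1) noise, and cases 1-3
# perturbed to predictor values near (15,15) with y = x1 + x2 + 4.
import numpy as np

rng = np.random.default_rng(42)
n = 25
u = rng.uniform(0, 15, size=(n, 2))
x1 = u[:, 0]
x2 = 0.5 * u[:, 0] + np.sqrt(1 - 0.5**2) * u[:, 1]   # correlation of about 0.5
y = x1 + x2 + rng.normal(0, 1, size=n)

x1[:3] = rng.uniform(14, 16, size=3)                 # perturb cases 1-3
x2[:3] = rng.uniform(14, 16, size=3)
y[:3] = x1[:3] + x2[:3] + 4

X = np.column_stack([x1, x2, y])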

The Kosinski method finds the outliers, with a relatively small outlyingness. The
projection method finds these outliers too but also declares two good points to be
outliers.





  A:                Kosinski                     Brain mass   Hertzsprung-Russel        Hadi
  B:                 3,035                         3,035            3,035               3,368
  C:     Proj   Kos            Proj     Kos          Proj Kos         Proj Kos            Proj   Kos
     1   2,59   7,45    51     4,37     1,01     1 1,79 0,75      1 0,80 1,20       1    4,75    3,47
     2   2,80   7,96    52     1,53     0,98     2 1,05 1,13      2 1,39 1,46       2    4,75    3,47
     3   2,46   7,14    53     2,22     1,05     3 0,37 0,16      3 1,41 1,83       3    4,76    3,46
     4   2,87   8,21    54     4,69     1,32     4 0,65 0,13      4 1,39 1,46       4    2,86    1,84
     5   2,78   7,97    55     3,97     1,50     5 1,99 0,92      5 1,42 1,90       5    0,96    0,70
     6   2,59   7,48    56     3,47     1,44     6 8,40 6,19      6 0,80 1,04       6    3,43    1,57
     7   2,84   8,09    57     4,59     2,55     7 2,08 1,27      7 5,55 6,35       7    2,21    0,91
     8   2,75   7,89    58     2,27     0,37     8 0,66 0,55      8 1,44 1,38       8    0,46    0,36
     9   2,51   7,22    59     2,96     0,51     9 0,94 0,91      9 2,59 3,26       9    0,99    0,35
    10   2,45   7,12    60     2,22     0,54    10 1,93 0,99     10 0,61 0,93      10    1,74    1,34
    11   2,69   7,71    61     4,94     1,83    11 1,23 0,51     11 11,01 12,67    11    2,50    1,65
    12   2,84   8,12    62     5,07     1,29    12 0,96 0,90     12 0,91 1,21      12    1,54    1,13
    13   2,77   7,95    63     4,66     1,13    13 0,64 0,60     13 0,79 0,88      13    2,81    1,25
    14   2,68   7,72    64     1,68     1,17    14 3,87 2,21     14 3,04 3,51      14    0,98    0,68
    15   2,37   6,95    65     3,32     1,03    15 2,22 1,44     15 1,55 1,22      15    2,65    1,37
    16   2,46   7,17    66     2,25     1,03    16 7,54 5,63     16 1,23 0,99      16    0,97    0,84
    17   2,64   7,59    67     2,59     1,13    17 3,18 1,83     17 2,17 1,80      17    3,31    1,64
    18   2,40   6,96    68     3,89     1,04    18 0,90 0,92     18 2,17 2,04      18    3,17    1,39
    19   2,46   7,11    69     1,82     0,88    19 3,00 1,43     19 1,77 1,54      19    2,78    1,49
    20   2,45   7,15    70     5,96     1,59    20 3,59 1,71     20 11,26 13,01    20    2,94    1,37
    21   2,70   7,71    71     2,29     0,70    21 1,54 0,66     21 1,35 1,07      21    0,90    0,66
    22   2,62   7,54    72     3,91     0,86    22 0,50 0,25     22 1,62 1,28      22    1,61    1,27
    23   2,82   8,11    73     2,15     1,30    23 0,66 0,74     23 1,60 1,41      23    3,89    1,39
    24   2,68   7,67    74     6,76     2,00    24 2,18 1,11     24 1,21 1,10      24    2,80    1,22
    25   2,37   6,88    75     6,20     2,01    25 8,97 6,75     25 0,34 0,58      25    2,04    1,12
    26   2,75   7,86    76     3,37     0,77    26 2,61 1,24     26 1,04 0,78
    27   2,67   7,70    77     2,67     0,49    27 2,59 1,41     27 0,88 1,07
    28   2,85   8,14    78     1,83     0,50    28 1,13 1,17     28 0,36 0,33
    29   2,78   7,98    79     4,19     2,45                     29 1,43 1,60
    30   2,78   8,00    80     2,71     0,46                     30 11,61 13,48
    31   2,45   7,14    81     4,49     1,12                     31 1,36 1,09
    32   2,91   8,29    82     2,74     0,79                     32 1,59 1,48
    33   2,51   7,27    83     1,62     0,31                     33 0,49 0,52
    34   2,33   6,80    84     2,81     0,47                     34 11,87 13,88
    35   2,68   7,72    85     5,94     1,57                     35 1,50 1,50
    36   2,82   8,08    86     3,50     1,01                     36 1,57 1,70
    37   2,52   7,31    87     1,38     1,93                     37 1,27 1,13
    38   2,65   7,66    88     2,21     1,57                     38 0,49 0,52
    39   2,49   7,18    89     5,47     1,73                     39 1,14 1,03
    40   2,61   7,52    90     3,07     1,44                     40 1,17 1,52
    41   1,89   0,50    91     2,94     1,54                     41 0,88 0,60
    42   1,84   0,41    92     6,02     1,59                     42 0,46 0,30
    43   7,94   2,03    93     3,65     0,80                     43 0,81 0,77
    44   3,04   0,61    94     3,89     0,98                     44 0,61 0,80
    45   2,35   0,67    95     6,68     1,64                     45 1,17 1,19
    46   6,42   1,76    96     2,50     0,84                     46 0,58 0,37
    47   5,36   1,68    97     4,59     1,32                     47 1,41 1,20
    48   3,74   0,77    98     5,65     1,46
    49   3,92   0,92    99     2,12     1,64
    50   6,53   1,78 100       2,31     0,30
Table 5.2. The outlyingness of each point of the Kosinski, the Brain mass, the Hertzsprung-
Russel and the Hadi data. A: Name of data set. B: Cutoff value for α=1%; outlyingnesses
higher than the cutoff are shown in bold. C: Method (Proj: projection method; Kos: Kosinski
method).





The projection method finds consistently larger outlyingnesses than the Kosinski
method, roughly a factor of 2 for most points. This is related to the sparsity of the data
set. Consider for instance the extreme case of three points in two dimensions. Every
point will have an infinitely large outlyingness according to the projection method.
This can be understood by noting that the mad of the projected points is zero if the
projections of two points coincide; the remaining point then has an infinite
outlyingness. For data sets with more points the situation is less extreme, but as long
as there are relatively few points the projection outlyingnesses will be relatively
large. In such a case the cutoff values based on the $\chi^2$-distribution are in fact too
low, leading to the swamping effect.
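
The three-point example can be verified numerically as follows, assuming the outlyingness is computed as |x·v − med| / mad over projection directions v:

# Three points in two dimensions: projecting onto a direction perpendicular to
# the segment joining two of them makes those two projections coincide, so the
# mad of the projected values is zero and the third point has an infinite
# outlyingness.
import numpy as np

A, B, C = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 1.0])
v = np.array([0.0, 1.0])                 # direction perpendicular to A - B
proj = np.array([A @ v, B @ v, C @ v])   # projected values: [0, 0, 1]
med = np.median(proj)
mad = np.median(np.abs(proj - med))
print(mad)                               # 0.0 -> outlyingness of C is infinite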
5.2.5 Stackloss data
The Stackloss data outlyingnesses show large differences between the two methods.
One of the reasons is the sensitivity of the Kosinski results to the cutoff value in this
case, as is discussed in section 3. If a cutoff value $\sqrt{\chi^2_{4,0.95}} = 3.080$ is used instead of $\sqrt{\chi^2_{4,0.99}} = 3.644$, the Kosinski method shows outlyingnesses as in Table 5.3.




        outl.             outl.                  outl.
   1    4.73          8 0.98                15 1.07
   2    3.30          9 0.76                16 0.87
   3    4.42         10 0.98                17 1.14
   4    4.19         11 0.83                18 0.71
   5    0.63         12 0.93                19 0.80
   6    0.76         13 1.24                20 1.04
   7    0.87         14 1.04                21 3.80
Table 5.3. The outlyingnesses of the Stackloss data, calculated with the Kosinski method with cutoff value $\sqrt{\chi^2_{4,0.95}} = 3.080$. Outlyingnesses above this value are shown in bold; outlyingnesses that are even higher than $\sqrt{\chi^2_{4,0.99}} = 3.644$ are shown in bold italic.


Here 5 points have an outlyingness exceeding the cutoff value for α=5%, four of
them (points 1, 3, 4 and 21) even above the value for α=1%. Even in this case the
differences with the projection method are large. The projection outlyingnesses are
up to 5 times larger than the Kosinski ones.
For comparison, Walczak and Atkinson declared points 1, 3, 4 and 21 to be outliers,
Rocke also indicated point 2 as an outlier, while points 1, 2, 3 and 21 are outliers
according to Hadi (1992). These results are comparable with the results of the
Kosinski method with α=5%. Hence, considering the results in Table 5.4, the
Kosinski method finds too few outliers and the projection method too many. In
both cases the origin lies in the low n/p ratio.





  A:    Stackloss         Salinity                          HBK                          Factory
  B:      3,644            3,644                            3,644                         3,884
  C:    Proj Kos             Proj    Kos           Proj    Kos           Proj   Kos         Proj Kos
    1   8,42 1,62        1 2,67      1,29     1   30,38   32,34     51   1,99   1,64    1 5,23 2,12
    2   6,92 1,53        2 2,58      1,46     2   31,36   33,36     52   2,20   2,06    2 5,66 1,67
    3   8,14 1,45        3 4,65      1,84     3   32,81   34,90     53   3,18   2,80    3 5,55 1,91
    4   9,00 1,51        4 3,54      1,63     4   32,60   34,97     54   2,13   1,96    4 4,57 2,05
    5   1,74 0,41        5 6,06      4,06     5   32,71   34,92     55   1,57   1,22    5 3,28 2,34
    6   2,33 0,82        6 3,12      1,41     6   31,42   33,49     56   1,78   1,46    6 2,19 1,48
    7   3,45 1,31        7 2,62      1,25     7   32,34   34,33     57   1,81   1,61    7 2,27 1,49
    8   3,45 1,24        8 2,87      1,59     8   31,35   33,24     58   1,67   1,55    8 1,85 1,23
    9   2,15 1,11        9 3,31      1,90     9   32,13   34,35     59   0,89   1,13    9 2,15 1,17
   10   4,26 1,16       10 2,08      0,91    10   31,84   33,86     60   2,08   2,05   10 3,56 1,70
   11   3,01 1,11       11 2,76      1,24    11   28,95   32,68     61   1,78   1,99   11 3,64 1,87
   12   3,30 1,34       12 0,77      0,43    12   29,42   33,82     62   2,29   2,00   12 3,67 1,99
   13   3,25 1,01       13 2,36      1,28    13   29,42   33,82     63   1,70   1,70   13 2,24 1,43
   14   3,75 1,15       14 2,52      1,24    14   33,97   36,63     64   1,62   1,75   14 2,13 1,79
   15   3,90 1,20       15 3,71      2,16    15    1,99    1,89     65   1,90   1,85   15 1,84 1,29
   16   2,88 0,85       16 14,83     8,08    16    2,33    2,03     66   1,78   1,87   16 3,52 2,34
   17   7,09 1,78       17 3,68      1,60    17    1,65    1,74     67   1,34   1,20   17 2,42 1,79
   18   3,56 0,98       18 1,84      0,82    18    0,86    0,70     68   2,93   2,20   18 5,55 2,49
   19   3,07 1,04       19 2,93      1,79    19    1,54    1,18     69   1,97   1,56   19 5,65 1,76
   20   2,48 0,61       20 2,00      1,22    20    1,67    1,95     70   1,59   1,93   20 5,91 2,83
   21   8,85 2,11       21 2,50      0,95    21    1,57    1,76     71   0,75   1,01   21 4,35 1,90
                        22 3,34      1,23    22    1,90    1,70     72   1,00   0,83   22 2,20 1,63
                        23 5,20      2,07    23    1,72    1,72     73   1,70   1,53   23 2,77 1,62
                        24 4,62      1,90    24    1,70    1,56     74   1,77   1,80   24 2,14 0,90
                        25 0,77      0,42    25    2,06    1,83     75   2,44   1,98   25 3,11 2,13
                        26 1,80      0,87    26    1,73    1,80                        26 2,27 1,31
                        27 2,85      1,11    27    2,17    2,01                        27 4,88 2,02
                        28 3,72      1,48    28    1,41    1,13                        28 5,08 2,67
                                             29    1,33    1,13                        29 4,49 2,59
                                             30    2,04    1,86                        30 1,91 1,27
                                             31    1,61    1,53                        31 1,13 0,83
                                             32    1,78    1,70                        32 2,00 1,34
                                             33    1,55    1,45                        33 3,13 2,05
                                             34    2,10    2,07                        34 2,43 1,70
                                             35    1,41    1,80                        35 5,96 2,82
                                             36    1,63    1,61                        36 5,78 2,25
                                             37    1,75    1,87                        37 5,75 1,83
                                             38    2,01    1,86                        38 4,14 1,62
                                             39    2,16    1,93                        39 3,16 2,19
                                             40    1,25    1,17                        40 2,77 1,62
                                             41    1,65    1,81                        41 2,75 1,86
                                             42    1,91    1,72                        42 2,56 1,67
                                             43    2,50    2,17                        43 4,54 2,15
                                             44    2,04    1,91                        44 4,25 1,89
                                             45    2,07    1,86                        45 3,91 2,14
                                             46    2,04    1,91                        46 2,10 1,52
                                             47    2,92    2,56                        47 1,06 0,84
                                             48    1,40    1,70                        48 1,47 1,10
                                             49    1,73    2,01                        49 3,34 2,16
                                             50    1,05    1,36                        50 2,51 1,39
Table 5.4. The outlyingness of each point of the Stackloss, the Salinity, the HBK
and the Factory data. A, B, C: see Table 5.2.





  A:     Bush fire      Wood gravity        Coleman                      Milk
  B:       3,884           4,100             4,100                       4,482
  C:     Proj Kos            Proj Kos          Proj Kos         Proj    Kos            Proj    Kos
    1    3,48 1,38       1 4,72 2,65       1 3,56 2,84     1    9.06    9,46     51    2.62    1,98
    2    3,27 1,04       2 2,71 1,20       2 4,92 6,37     2   10.57   10,81     52    3.64    2,98
    3    2,76 1,11       3 3,68 2,19       3 6,76 2,94     3    4.04    5,09     53    2.38    2,22
    4    2,84 1,02       4 14,45 33,75     4 2,99 1,53     4    3.86    2,83     54    1.22    1,16
    5    3,85 1,40       5 3,02 2,80       5 2,70 1,43     5    2.23    2,52     55    1.68    1,69
    6    4,92 1,90       6 16,19 38,83     6 5,74 10,43    6    2.97    2,84     56    1.10    1,01
    7   11,79 4,37       7 7,90 5,00       7 3,11 2,23     7    2.36    2,35     57    1.96    2,19
    8   17,96 11,87      8 15,85 37,88     8 1,48 1,83     8    2.32    2,08     58    2.05    1,95
    9   18,36 12,18      9 6,12 2,72       9 2,49 5,95     9    2.58    2,49     59    1.47    2,21
   10   14,75 7,64      10 8,59 2,37      10 5,71 12,04   10    2.20    1,98     60    2.04    1,76
   11   12,31 6,76      11 5,38 3,04      11 5,07 7,70    11    5.28    4,60     61    1.48    1,42
   12    6,17 2,38      12 6,79 2,65      12 4,31 2,77    12    6.65    6,05     62    2.64    2,07
   13    5,83 1,77      13 7,14 1,98      13 3,49 2,92    13    5.63    5,38     63    2.33    2,60
   14    2,30 1,59      14 2,38 2,09      14 1,95 2,16    14    6.17    5,48     64    2.58    1,90
   15    4,70 1,55      15 2,40 1,47      15 6,11 6,56    15    5.47    5,73     65    1.85    1,56
   16    3,43 1,38      16 4,74 2,86      16 2,18 2,30    16    3.84    4,56     66    2.01    1,64
   17    3,06 0,92      17 6,07 2,12      17 3,78 5,95    17    3.59    4,76     67    3.28    2,59
   18    2,75 1,41      18 3,28 2,49      18 7,86 3,09    18    3.74    3,30     68    2.41    2,33
   19    2,82 1,38      19 18,33 44,49    19 3,48 2,11    19    2.43    2,85     69   46.45   44,61
   20    2,89 1,20      20 7,16 2,07      20 2,80 1,56    20    4.14    3,44     70    1.99    1,87
   21    2,47 1,13                                        21    2.26    2,08     71    2.19    2,27
   22    2,44 1,73                                        22    1.69    1,59     72    3.24    3,02
   23    2,46 1,04                                        23    1.81    2,04     73    6.89    6,99
   24    3,44 1,04                                        24    2.28    2,05     74    5.01    4,90
   25    1,90 0,91                                        25    2.81    2,83     75    2.02    2,03
   26    1,69 0,97                                        26    1.83    2,09     76    4.77    4,51
   27    2,27 0,99                                        27    4.24    3,71     77    1.35    1,43
   28    3,31 1,35                                        28    3.29    3,04     78    1.49    1,87
   29    4,82 1,83                                        29    3.19    2,57     79    2.93    2,66
   30    5,06 2,18                                        30    1.47    1,39     80    1.40    1,38
   31    6,00 5,66                                        31    2.87    2,29     81    2.59    2,34
   32   13,48 14,08                                       32    2.37    2,66     82    2.14    2,42
   33   15,34 16,35                                       33    1.78    1,33     83    3.00    2,56
   34   15,10 16,11                                       34    2.09    1,96     84    3.88    3,06
   35   15,33 16,43                                       35    2.73    2,10     85    2.19    2,36
   36   15,02 16,04                                       36    2.66    2,32
   37   15,17 16,30                                       37    2.61    2,23
   38   15,25 16,41                                       38    2.23    2,07
                                                          39    2.27    2,07
                                                          40    3.31    2,89
                                                          41   10.63   10,11
                                                          42    3.69    3,04
                                                          43    3.20    2,85
                                                          44    7.67    6,08
                                                          45    1.99    2,28
                                                          46    1.78    2,41
                                                          47    5.19    5,35
                                                          48    2.92    2,58
                                                          49    3.43    2,70
                                                          50    3.96    2,69
Table 5.5. The outlyingness of each point of the Bush fire, the Wood gravity, the
Coleman, and the Milk data. A, B, C: see Table 5.2.





5.2.6 Salinity data
The outlyingnesses of the Salinity data are roughly two times larger for the
projection method than for the Kosinski method. As a consequence, the latter
shows just 2 outliers (points 5 and 16), the former 8. Rousseeuw (1987) and
Walczak agree that points 5, 16, 23 and 24 are outliers, with points 23 and 24
lying just above the cutoff. Fung finds the same points at first, but after
applying his adding-back algorithm he concludes that point 16 is the only outlier.
The projection method shows too many outliers, while the Kosinski method misses
points 23 and 24.
5.2.7 HBK data
In the case of the HBK data the projection method and the Kosinski method agree
completely. Both indicate points 1-14 to be outliers. This is also in agreement with
the results of the original Kosinski method and of Egan, Hadi (1992,1993), Rocke,
Rousseeuw (1987,1990), Fung and Walczak. It is remarked that some of these
authors only find points 1-10 as outliers, but they use the “regression” definition of
an outlier. The HBK data set is an artificial one, in which the good points lie along a
regression plane. Points 1-10 are bad leverage points, i.e. they lie far away from the
center of the good points and from the regression plane as well. Points 11-14 are
good leverage points, i.e. although they lie far away from the bulk of the data they
still lie close to the regression plane. If one considers the distance from the
regression plane, the points 11-14 are not outliers.
5.2.8 Factory data
The Factory data set is a new one1. It is given in Table 5.6.
The outlyingnesses show a big discrepancy between the two methods. The
projection outlyingnesses are much larger than the Kosinski ones, resulting in 18
versus 0 outliers. The outlyingnesses are so large due to the shape of the data: about
half the data set is quite narrowly concentrated around the center, while the
other half forms a rather thick tail. Hence, in many projection directions the mad is
very small, leading to large outlyingnesses for the points in the tail. It is remarked
that the projection outliers agree well with the Kosinski outliers found with a
cutoff for α=5% (see also section 3.3.6).




1
  The Factory data is a generated data set, originally used in an exercise on regression
analysis in the CBS course “multivariate technics with SPSS”. It is interesting to note that
the regression coefficients change radically if the points that are indicated to be outliers by
the projection method and by the Kosinski method with a low cutoff are removed from the data
set. In other words, the regression coefficients are mainly determined by the “outlying”
points.


       x1      x2    x3 x4        x5           x1     x2    x3 x4      x5
   1 14.9 7.107 21 129 11.609 26 12.3 12.616 20 192 11.478
   2 8.4 6.373 22 141 10.704 27 4.1 14.019 20 177 14.261
   3 21.6 6.796 22 153 10.942 28 6.8 16.631 23 185 15.300
   4 25.2 9.208 20 166 11.332 29 6.2 14.521 19 216 10.181
   5 26.3 14.792 25 193 11.665 30 13.7 13.689 22 188 13.475
   6 27.2 14.564 23 189 14.754 31               18 14.525 21 192 14.155
   7 22.2 11.964 20 175 13.255 32 22.8 14.523 21 183 15.401
   8 17.7 13.526 23 186 11.582 33 26.5 18.473 22 205 14.891
   9 12.5 12.656 20 190 12.154 34 26.1 15.718 22 200 15.459
 10 4.2 14.119 20 187 12.438 35 14.8 7.008 21 124 10.768
 11 6.9 16.691 22 195 13.407 36 18.7 6.274 21 145 12.435
 12 6.4 14.571 19 206 11.828 37 21.2 6.711 22 153 9.655
 13 13.3 13.619 22 198 11.438 38 25.1 9.257 22 169 10.445
 14 18.2 14.575 22 192 11.060 39 26.3 14.832 25 191 13.150
 15 22.8 14.556 21 191 14.951 40 27.5 14.521 24 177 14.067
 16 26.1 18.573 21 200 16.987 41 17.6 13.533 24 186 12.184
 17 26.3 15.618 22 200 12.472 42 12.4 12.618 21 194 12.427
 18 14.8 7.003 22 130 9.920 43 4.3 14.178 20 181 14.863
 19 18.2 6.368 22 144 10.773 44                  6 16.612 21 192 14.274
 20 21.3 6.722 21 123 15.088 45 6.6 14.513 20 213 10.706
 21     25 9.258 20 157 13.510 46 13.1 13.656 22 192 13.191
 22 26.1 14.762 24 183 13.047 47 18.2 14.525 21 191 12.956
 23 27.4 14.464 23 177 15.745 48 22.8 14.486 21 189 13.690
 24 22.4 11.864 21 175 12.725 49 26.2 18.527 22 200 17.551
 25 17.9 13.576 23 167 12.119 50 26.1 15.578 22 204 13.530
Table 5.6. The Factory data (n=50, p=5). The average temperature (x1, in degrees
Celsius), the production (x2, in 1000 pieces), the number of working days (x3), the
number of employees (x4) and the water consumption (x5, in 1000 liters) at a factory
in 50 successive months.


5.2.9 Bushfire data
The outliers found by the adjusted Kosinski method (points 7-11, 31-38) agree
perfectly with those found by the original algorithm of Kosinski and with the results
by Rocke and Maronna. The projection method shows as additional outliers points 6,
12, 13, 15, 29 and 30. Due to the large contamination the projected median is shifted
strongly, leading to relatively large outlyingnesses for the good points and,
consequently, many swamped points.
5.2.10 Wood gravity data
Rousseeuw (1984), Hadi (1993), Atkinson, Rocke and Egan declare points 4, 6, 8
and 19 to be outliers. The Kosinski method finds these outliers too, but additionally
declares point 7 to be an outlier. The projection method shows strange results. Fourteen points have an
outlyingness above the cutoff, which is 70% of the data set. This is of course not
realistic. The reason is again the sparsity of the data set. Hence, it is rather surprising
that the Kosinski method and the methods by other authors perform relatively well
in this case.
5.2.11 Coleman data
The Coleman data contain 8 outliers according to the projection method, 7 according
to the Kosinski method. However, the two methods agree on only 5 points (2, 6, 10, 11, 15).

Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Recently uploaded (20)

Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

ROBUST MULTIVARIATE OUTLIER DETECTION

  • 2. Robust multivariate outlier detection 1. Introduction The statistical process can be separated in three steps. The input phase involves the collection of data by means of surveys and registers. The throughput phase involves preparing the raw data for tabulation purposes, weighting and variance estimating. The output phase involves the publication of population totals, means, correlations, etc., which have come out of the throughput phase. Data editing is one of the first steps in the throughput process. It is the procedure for detecting and adjusting individual errors in data. Editing also comprises the detection and treatment of correct but influential records, i.e. records that have a substantial contribution to the aggregates to be published. The search for suspicious records, i.e. records that are possibly wrong or influential, can be done in basically two ways. The first way is by examining each record and looking for strange or wrong fields or combinations of fields. In this view a record includes all fields referring to a particular unit, be it a person, household or business unit, even if those fields are stored in separate files, like files containing survey data and files containing auxiliary data. The second way is by comparing each record with the other records. Even if the fields of a particular record obey all the edit rules one has laid down, the record could be an outlier. An outlier is a record, which does not follow the bulk of the records. The data can be seen as a rectangular file, each row denoting a particular record and each column a particular variable. The first way of searching for suspicious data can be seen as searching in rows, the second way as searching in columns. It is remarked that some and possibly many errors can be detected by both ways. Records could be outliers while their outlyingness is not apparent by examining the variables, or columns, one by one. For instance, a company that has a relatively large turnover but that has paid relatively little taxes might be no outlier in either one of the variables, but could be an outlier considering the combination. Outliers involving more than one variable are multivariate outliers. In order to quantify how far a record lies from the bulk of the data, one needs a measure of distance. In the case of categorical data no useful distance measure exists, but in the case of continuous data the so-called Mahalanobis distance is often employed. A distance measure should be robust against the presence of outliers. It is known that the classical Mahalanobis distance is not. This means that the outliers, which are to be detected, seriously hamper the detection of those outliers. Hence, a robust version of the Mahalanobis distance is needed. In this report two robust multivariate outlier detection algorithms for continuous data, based on the Mahalanobis distance, are reported. In the next section the classical Mahalanobis distance is introduced and ways to robustify this distance measure are discussed. In sections 3 and 4 the two algorithms, successively the 1
Kosinski method and the projection method, are presented. In section 5 a comparison between the two algorithms is made, as well as a comparison with other algorithms reported in the outlier literature. A practical example, and the problems involved with it, is the subject of section 6. In section 7 some concluding remarks are made.

2. The Mahalanobis distance

The Mahalanobis distance is a measure of the distance between a point and the center of all points, with respect to the scale of the data and, in the multivariate case, with respect to the shape of the data as well. It is remarked that in regression analysis another distance measure is more convenient: instead of the distance between a point and the center of the data, one uses the distance between the point and the regression plane (see also section 5).

Suppose we have a continuous data set $y_1, y_2, .., y_n$. The vectors $y_i$ are p-dimensional, i.e. $y_i = (y_{i1}\; y_{i2}\; ..\; y_{ip})^t$, where $y_{iq}$ denotes a real number. The classical squared Mahalanobis distance is defined by

$MD_i^2 = (y_i - \bar{y})^t C^{-1} (y_i - \bar{y})$

where $\bar{y}$ and $C$ denote the mean and the covariance matrix respectively:

$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

$C = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^t$

In the case of one-dimensional data the covariance matrix reduces to the variance and the Mahalanobis distance to $MD_i = |y_i - \bar{y}| / \sigma$, where $\sigma$ denotes the standard deviation.
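As an illustration, the classical quantities above can be computed directly with numpy. This is a minimal sketch of our own, not the authors' implementation; the function name classical_mahalanobis_sq is ours.

```python
# Minimal sketch: classical squared Mahalanobis distances from the ordinary
# mean and covariance matrix (the non-robust version discussed above).
import numpy as np

def classical_mahalanobis_sq(Y):
    """Y is an (n, p) array; returns the n squared Mahalanobis distances."""
    center = Y.mean(axis=0)
    C = np.cov(Y, rowvar=False)              # 1/(n-1) normalisation, as in the text
    C_inv = np.linalg.inv(C)
    diffs = Y - center
    return np.einsum('ij,jk,ik->i', diffs, C_inv, diffs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(100, 3))
    print(classical_mahalanobis_sq(Y)[:5])
```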
Another point of view results by noting that the Mahalanobis distance is the solution of a maximization problem. The maximization problem is defined as follows. The data points $y_i$ can be projected on a projection vector $a$. The outlyingness of the point $y_i$ is the squared projected distance $(a^t(y_i - \bar{y}))^2$ with respect to the projected variance $a^t C a$. Assuming that the covariance matrix $C$ is positive definite, there exists a non-singular matrix $A$ such that $A^t C A = I$. Using the Cauchy-Schwarz inequality we have

$\frac{(a^t(y_i - \bar{y}))^2}{a^t C a} = \frac{\big(a^t (A^t)^{-1} A^t (y_i - \bar{y})\big)^2}{a^t C a} \le \frac{(A^{-1}a)^t (A^{-1}a)\,(y_i - \bar{y})^t A A^t (y_i - \bar{y})}{a^t C a} = \frac{a^t (AA^t)^{-1} a\,(y_i - \bar{y})^t A A^t (y_i - \bar{y})}{a^t C a} = (y_i - \bar{y})^t C^{-1} (y_i - \bar{y}) = MD_i^2$

with equality if and only if $A^{-1}a = c\,A^t(y_i - \bar{y})$ for some constant c. Hence

$MD_i^2 = \sup_{a^t a = 1} \frac{(a^t(y_i - \bar{y}))^2}{a^t C a}$

i.e., the Mahalanobis distance is equal to the supremum of the outlyingness of $y_i$ over all possible projection vectors.

If the data set $y_i$ is multivariate normal, the squared Mahalanobis distances $MD_i^2$ follow the $\chi^2$ distribution with p degrees of freedom.

The classical Mahalanobis distance suffers, however, from the masking and swamping effects. Outliers seriously affect the mean and the covariance matrix in such a way that the Mahalanobis distance of outliers could be small (masking), while the Mahalanobis distance of points which are not outliers could be large (swamping). Therefore, robust estimates of the center and the covariance matrix should be found in order to calculate a useful Mahalanobis distance. In the univariate case the most robust choice is the median (med) and the median of absolute deviations (mad), replacing the mean and the standard deviation respectively. The med and mad have a robustness of 50%. The robustness of a quantity is defined as the maximum percentage of data points that can be moved arbitrarily far away while the change in that quantity remains bounded.

It is not trivial to generalize the robust one-dimensional Mahalanobis distance to the multivariate case. Several robust estimators for the location and scale of multivariate data have been developed. We have tested two methods, the projection method and the Kosinski method. Other methods for robust outlier detection will be discussed in section 5, where we will compare the different methods on their ability to detect outliers. In the next two sections the Kosinski method and the projection method will be discussed in detail.
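The supremum identity derived above can be checked numerically: random unit vectors approximate the supremum from below and should come close to the squared Mahalanobis distance. This is our own quick check, reusing the classical_mahalanobis_sq sketch given earlier.

```python
# Numerical check of MD_i^2 = sup_a (a^t(y_i - ybar))^2 / (a^t C a) over unit
# vectors a, using random directions as an approximation of the supremum.
import numpy as np

rng = np.random.default_rng(2)
Y = rng.normal(size=(200, 3))
center, C = Y.mean(axis=0), np.cov(Y, rowvar=False)

A = rng.normal(size=(20000, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)           # unit projection vectors
proj = (Y - center) @ A.T                               # a^t (y_i - ybar) per direction
best = ((proj ** 2) / np.einsum('ij,jk,ik->i', A, C, A)).max(axis=1)

md2 = classical_mahalanobis_sq(Y)
print(np.max(np.abs(best - md2) / md2))                 # small relative deviation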
3. The Kosinski method

3.1 The principle of Kosinski

The method discussed in this section was quite recently published by Kosinski. The idea of Kosinski is basically the following:

1) start with a few, say g, points, denoted the "good" part of the data set;
2) calculate the mean and the covariance matrix of those points;
3) calculate the Mahalanobis distances of the complete data set;
4) increase the good part of the data set with one point by selecting the g+1 points with the smallest Mahalanobis distance and define g=g+1;
5) return to step 2, or stop as soon as the good part contains more than half the data set and the smallest Mahalanobis distance of the remaining points is higher than a predefined cutoff value.

At the end the remaining part, or the "bad" part, should contain the outliers. In order to ensure that the good part contains no outliers at the end, it is important to start the algorithm with points that are all good. In the paper by Kosinski this problem is solved by repeatedly choosing a small set of random points and performing the algorithm for each set. The number of starting sets is taken high enough to be sure that at least one set contains no outliers.

We made two major adjustments to the Kosinski algorithm. The first one is the choice of the starting data set. The required property of the starting data set is that it contains no outliers; it does not matter how these points are found. We choose the starting data set by robustly estimating the center of the data set and selecting the p+1 closest points. In the case of a p-dimensional data set, p+1 points are needed to get a useful starting data set, since the covariance matrix of a set of at most p points is always non-invertible. A separation of the data set into p+1 good points and n-p-1 bad points is called an elemental partition.

The center is estimated by calculating the mean of the data set, neglecting all univariate robustly detected outliers. This is of course just a crude estimate, but it is satisfactory for the purpose of selecting a good starting data set. Another crude estimate of the center that was tried out was the coordinate-wise median, which appeared to result in less satisfactory starting data sets. The p+1 points closest to the mean are chosen, where "closest" is defined by an ordinary distance measure. In order to take the different scales and units of the different dimensions into account, the data set is coordinate-wise scaled before the mean is calculated, i.e. each component of each point is divided by the median of absolute deviations of the dimension concerned. It is remarked that, after the first p+1 points are selected, the algorithm continues with the original unscaled data.

It is, of course, possible to construct a data set for which this algorithm fails to select p+1 points that are all good points. However, in all the data sets exploited in this report, artificial and real, this choice of a starting data set worked very well.
This adjustment results in a spectacular gain in computer time, since the algorithm has to be run only once instead of many times. Kosinski estimates the required number of random starting data sets in his own original algorithm to be approximately 35 in the case of 2-dimensional data sets, and up to 10000 in 10 dimensions.

The other adjustment is in the expansion of the good part. In the Kosinski paper the increment is always one point. We implemented an increment proportional to the good part already found, for instance 10%. This means that the good part is increased by a factor of 10% at each step. This speeds up the algorithm as well, especially in large data sets. The original algorithm with one-point increment scales with $n^2$, where n is the number of data points, while the algorithm with proportional increment scales with $n \ln n$. Also this adjustment was tested and appeared to work very well. In the remainder of this report, "the Kosinski method" denotes the adjusted Kosinski method, unless otherwise noted.

3.2 The Kosinski algorithm

The purpose of the algorithm is, given a set of n multivariate data points $y_1, y_2, .., y_n$, to calculate the outlyingness $u_i$ for each point i. The algorithm can be summarized as follows.

Step 0. In: data set
The algorithm is started with a set of continuous p-dimensional data $y_1, y_2, .., y_n$, where $y_i = (y_{i1}\; ..\; y_{ip})^t$.

Step 1. Choose an elemental partition
A good part of p+1 points is found as follows (a code sketch follows this list).
• Calculate the med and mad for each dimension q:
  $M_q = \mathrm{med}_k\, y_{kq}$
  $S_q = \mathrm{med}_l\, |y_{lq} - M_q|$
• Divide each component q of each data point i by the mad of the dimension concerned. The scaled data points are denoted by the superscript s:
  $y_{iq}^s = y_{iq} / S_q$
• Declare a point to be a univariate outlier if at least one component of the data point is farther than 2.5 standard deviations away from the scaled median. The standard deviation is approximated by 1.484 times the mad (see section 4.1 for the background of the factor 1.484). So calculate for each component q of each point i:
  $u_{iq} = \frac{1}{1.484}\left| y_{iq}^s - \frac{M_q}{S_q} \right|$
  If $u_{iq} > 2.5$ for any q, then point i is a univariate outlier.
• Calculate the mean of the data set, neglecting the univariate outliers:
  $\bar{y}^s = \frac{1}{n_0} \sum_{i:\, y_i \text{ is no outlier}} y_i^s$
  where $n_0$ denotes the number of points that are not univariate outliers.
• Select the p+1 points that are closest to the mean and define those points to be the good part of the data set. So calculate:
  $d_i = \| y_i^s - \bar{y}^s \|$
  The g = p+1 points with the smallest $d_i$ form the good part, denoted by G.
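The following is a compact sketch of Step 1 in Python, following the recipe above; the function name elemental_partition and the parameter z_cut are ours, and degenerate cases (for instance a zero mad in some dimension) are not handled.

```python
# Sketch of Step 1: choose the elemental partition of p+1 starting points.
import numpy as np

def elemental_partition(Y, z_cut=2.5):
    """Return the indices of the p+1 starting points for the forward search."""
    n, p = Y.shape
    med = np.median(Y, axis=0)                     # M_q per dimension
    mad = np.median(np.abs(Y - med), axis=0)       # S_q per dimension
    Ys = Y / mad                                   # coordinate-wise scaled data
    u = np.abs(Ys - med / mad) / 1.484             # approximate robust z-scores
    is_uni_outlier = (u > z_cut).any(axis=1)
    center = Ys[~is_uni_outlier].mean(axis=0)      # mean neglecting univariate outliers
    d = np.linalg.norm(Ys - center, axis=1)
    return np.argsort(d)[: p + 1]                  # the p+1 closest points
```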
Step 2. Iteratively increase the good part
The good part is increased until a certain stop criterion is fulfilled (a code sketch follows this list).
• Continue with the original data set $y_i$, not with the scaled data set $y_i^s$.
• Calculate the mean and the covariance matrix of the good part:
  $\bar{y} = \frac{1}{g} \sum_{i \in G} y_i$
  $C = \frac{1}{g-1} \sum_{i \in G} (y_i - \bar{y})(y_i - \bar{y})^t$
• Calculate the Mahalanobis distance of all the data points:
  $MD_i^2 = (y_i - \bar{y})^t C^{-1} (y_i - \bar{y})$
• Calculate the number of points with a Mahalanobis distance smaller than a predefined cutoff value. A useful cutoff value is $\chi^2_{p,1-\alpha}$, with α=1%.
• Increase the good part with a predefined percentage (a useful percentage is 20%) by selecting the points with the smallest Mahalanobis distances, but not more than up to
  a) half the data set, i.e. $h = [\tfrac{1}{2}(n+p+1)]$, if the good part is smaller than half the data set (g < h);
  b) the number of points with a Mahalanobis distance smaller than the cutoff, if the good part is larger than half the data set.
• Stop the algorithm if the good part was already larger than half the data set and no more points were added in the last iteration.

Step 3. Out: outlyingnesses
The outlyingness of each point is now simply the Mahalanobis distance of the point, calculated with the mean and the covariance matrix of the good part of the data set.
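A condensed sketch of Steps 2 and 3 is given below. It assumes the elemental_partition() helper from the previous sketch, uses one-point increments (the basic variant; the proportional increment of section 3.1 would grow the good part by a percentage instead), and interprets the stop criterion as "stop once no point outside the good part falls below the cutoff". The function name kosinski_outlyingness is ours.

```python
# Sketch of the forward search (Steps 2-3) with one-point increments.
import numpy as np
from scipy.stats import chi2

def kosinski_outlyingness(Y, alpha=0.01):
    n, p = Y.shape
    cutoff = chi2.ppf(1 - alpha, df=p)            # chi^2_{p,1-alpha}
    good = list(elemental_partition(Y))
    h = (n + p + 1) // 2                          # "half the data set"
    while True:
        G = Y[good]
        mean = G.mean(axis=0)
        C_inv = np.linalg.inv(np.cov(G, rowvar=False))
        diffs = Y - mean
        md2 = np.einsum('ij,jk,ik->i', diffs, C_inv, diffs)
        order = np.argsort(md2)
        if len(good) < h:
            good = list(order[: len(good) + 1])    # grow towards half the data set
        else:
            inside = int((md2 <= cutoff).sum())    # points below the cutoff
            new_size = max(len(good), min(inside, len(good) + 1))
            if new_size == len(good):
                return md2                         # outlyingness of every point
            good = list(order[:new_size])
```

The returned squared distances can be compared with the same chi-square cutoff to flag the outliers.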
3.3 Test results

A prototype/test program was implemented in a Borland Pascal 7.0 environment. Documentation of the program is published elsewhere. We successively tested the choice of the elemental partition by means of the mean, the amount of swamped observations in data sets containing no outliers, the amount of masked and swamped observations in data sets containing outliers, the algorithm with proportional increment, and the time performance of the proportional increment of the good part compared to the one-point increment. Finally, we tested the sensitivity of the number of detected outliers to the cutoff value and the increment percentage in some known data sets.

3.3.1 Elemental partition

First of all, the choice of the elemental partition was tested with the generated data set published by Kosinski. The Kosinski data set is a kind of worst-case data set. It contains a large fraction of outliers (40% of the data) and the outliers are distributed with a variance much smaller than the variance of the good points.

Before using the mean, we calculated the coordinate-wise median as a robust estimator of the center of the data, and selected the three closest points. This strategy failed. Although the median has a 50% robustness, the 40% outliers strongly shift the median. Hence, one of the three selected points appeared to be an outlier. As a consequence, the forward search algorithm indicated all points to be good points, i.e. all the outliers were masked. This was the reason we searched for another robust measure of the location of the data. One of the simplest ideas is to search for univariate outliers first, and to calculate the mean of the points that are outliers in none of the dimensions. The selected points, the three points closest to the mean, all appeared to be good points. Moreover, the forward search algorithm, applied with this elemental partition, successfully distinguished the outliers from the good points.

All following tests were performed using this "mean" to select the first p+1 points. For all tested data sets the selected p+1 points appeared to be good points, resulting in a successful forward search. It is possible, in principle, to construct a data set for which this selection algorithm still fails, for instance a data set with a large fraction of outliers which are univariately invisible and with no unambiguous dividing line between the group of outliers and the group of good points. This is, however, a very hypothetical situation.

3.3.2 Swamping

A simulation study was performed in order to determine the average fraction of swamped observations in normally distributed data sets. In large data sets almost always a few points are indicated to be outliers, even if the whole data set nicely follows a normal distribution. This is due to the cutoff value. If a cutoff value of $\chi^2_{p,1-\alpha}$ is used as discriminator between good points and outliers in a p-dimensional standard normal data set, a fraction of α of the data points will have a Mahalanobis distance larger than the cutoff value.
For each dimension p between 1 and 8 we generated 100 standard normal data sets of 100 points. The Kosinski algorithm was run twice on each data set, once with a cutoff value $\chi^2_{p,0.99}$ and once with $\chi^2_{p,0.95}$. Each point that is indicated to be an outlier is a swamped observation, since there are no true outliers by construction. We calculated the average fraction of swamped observations (i.e. the number of swamped observations of each data set divided by 100, the number of points in the data set, averaged over all 100 data sets). Results are shown in Table 3.1.

α       p=1     2       3       4       5       6       7       8
0.01    0.015   0.011   0.010   0.008   0.008   0.008   0.007   0.007
0.05    0.239   0.112   0.081   0.070   0.059   0.052   0.045   0.042

Table 3.1. The average fraction of swamped observations of the simulations of 100 generated p-dimensional data sets of 100 points for each p between 1 and 8, with cutoff value $\chi^2_{p,1-\alpha}$.

For α=0.01 the fraction of swamped observations is very close to the value of α itself. These results are very similar to the results of the original Kosinski algorithm. For α=0.05, however, the average fraction of swamped observations is much larger than 0.05 for the lower dimensions, especially for p=1 and p=2. The reason for this is the following. Consider a one-dimensional standard normal data set. If the variance of all points is used, the outlyingness of a fraction of α points will be larger than $\chi^2_{1,1-\alpha}$. However, in the Kosinski algorithm the variance is calculated from all points except at least that fraction of α points with the largest outlyingnesses. This variance is smaller than the variance of all points. Hence, the Mahalanobis distances are overestimated and too many points are indicated to be outliers. This is a self-magnifying effect: more outliers lead to a smaller variance, which leads to more points being indicated as outliers, and so on. The effect is strongest in one dimension. In higher dimensions the points with a large Mahalanobis distance lie "all around"; therefore they influence the variance in the separate directions less. Apparently, the effect is quite strong for α=0.05, but almost negligible for α=0.01. In the remaining tests α=0.01 is used, unless otherwise stated.
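The swamping experiment can be re-created along the following lines. This is our own small re-creation (not the authors' Pascal program), reusing the kosinski_outlyingness() sketch given earlier and restricting p to two values to keep the run short.

```python
# Swamping experiment: outlier-free standard normal data, so every flagged
# point is a swamped observation.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
alpha, n_sets, n_points = 0.01, 100, 100
for p in (2, 5):
    cutoff = chi2.ppf(1 - alpha, df=p)
    swamped = [
        (kosinski_outlyingness(rng.normal(size=(n_points, p)), alpha) > cutoff).mean()
        for _ in range(n_sets)
    ]
    print(p, np.mean(swamped))       # average fraction of swamped observations
```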
3.3.3 Masking and swamping

The ability of the algorithm to detect outliers was tested in another simulation. We generated data sets in the same way as is done in the Kosinski paper, in order to get a fair comparison between the original and our adjusted Kosinski algorithm. Thus we generated data sets of 100 points containing good points as well as outliers. Both the good points and the outliers were generated from a multivariate normal distribution, with $\sigma^2 = 40$ for the good points and $\sigma^2 = 1$ for the bad points. The distance between the center of the good points and the bad points is denoted by d. The vector between the centers is along the vector of 1's. We varied the dimension (p=2, 5), the fraction of outliers (0.10 to 0.45), and the distance (d=20 to 60). We calculated the fraction of masked outliers (the number of masked outliers of each data set divided by the number of outliers) and the fraction of swamped points (the number of swamped points of each data set divided by the number of good points), both averaged over 100 simulation runs for each set of parameters p, d, and fraction of outliers. Results are shown in Table 3.2.

p=2                fraction of outliers:  0.10    0.20    0.30    0.40    0.45
  d=20   fraction masked                  0.81    0.89    0.88    0.86    0.88
         fraction swamped                 0.009   0.014   0.022   0.146   0.350
  d=30   fraction masked                  0.03    0.00    0.01    0.05    0.01
         fraction swamped                 0.011   0.011   0.010   0.043   0.019
  d=40   fraction masked                  0.00    0.00    0.00    0.00    0.00
         fraction swamped                 0.011   0.011   0.011   0.009   0.010

p=5                fraction of outliers:  0.10    0.20    0.30    0.40    0.45
  d=25   fraction masked                  0.90    0.91    0.93    0.97    1.00
         fraction swamped                 0.008   0.021   0.146   0.551   0.855
  d=40   fraction masked                  0.00    0.04    0.03    0.02    0.01
         fraction swamped                 0.008   0.008   0.022   0.020   0.014
  d=60   fraction masked                  0.00    0.00    0.00    0.00    0.00
         fraction swamped                 0.008   0.007   0.009   0.010   0.008

Table 3.2. Average fraction of masked and swamped observations of 2- and 5-dimensional data sets over 100 simulation runs. Each data set consisted of 100 points with a certain fraction of outliers. The good (bad) points were generated from a multivariate normal distribution with $\sigma^2 = 40$ ($\sigma^2 = 1$) in each direction. The distance between the center of the good points and the bad points is denoted by d.
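Contaminated data sets of the kind described above can be generated as follows; this is a sketch under our reading of the setup (function and parameter names are ours), with the centers a Euclidean distance d apart along the all-ones direction.

```python
# Sketch of the contaminated test data: good points with variance 40 per
# coordinate, a tight cluster of outliers with variance 1, centers d apart.
import numpy as np

def contaminated_data(n=100, p=2, frac=0.20, d=30.0, seed=0):
    rng = np.random.default_rng(seed)
    n_bad = int(round(frac * n))
    n_good = n - n_bad
    good = rng.normal(scale=np.sqrt(40.0), size=(n_good, p))
    shift = d * np.ones(p) / np.sqrt(p)        # shift of length d along the 1-vector
    bad = shift + rng.normal(scale=1.0, size=(n_bad, p))
    labels = np.r_[np.zeros(n_good, dtype=int), np.ones(n_bad, dtype=int)]
    return np.vstack([good, bad]), labels
```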
The following conclusions can be drawn from these results. The algorithm is said to perform well if the fraction of masked outliers is close to zero and the fraction of swamped observations is close to α=0.01. The first conclusion is: the larger the distance between the good points and the bad points, the better the algorithm performs. This conclusion is not surprising and is in agreement with Kosinski's results. Secondly, the higher the dimension, the worse the performance of the algorithm. In five dimensions the algorithm starts to perform well at d=40 and is close to perfect at d=60, while in two dimensions the performance is good at d=30 and perfect at d=40. The original algorithm did not show such a dependence on the dimension. It is remarked, however, that the paper by Kosinski does not give enough details for a good comparison on this point. Third, for both two and five dimensions the adjusted algorithm performs worse than the original algorithm. The original algorithm is almost perfect at d=25 for both p=2 and p=5, while the adjusted algorithm is not perfect until d=40 or d=60. This is the price that is paid for the large gain in computer time. The fourth conclusion is: the performance of the algorithm is almost independent of the fraction of outliers, in agreement with Kosinski's results. In some cases the algorithm even seems to perform better for higher fractions. This is however due to the relatively small number of points (100) per data set. For very large data sets and very large numbers of simulation runs this artifact will disappear.

p   d    fr     inc    masked   swamped
2   20   0.10   1p     0.79     0.010
2   20   0.10   10%    0.80     0.009
2   20   0.10   100%   0.80     0.009
2   20   0.40   1p     0.86     0.225
2   20   0.40   10%    0.86     0.146
2   20   0.40   100%   0.89     0.093
2   30   0.10   1p     0.00     0.011
2   30   0.10   10%    0.03     0.011
2   30   0.10   100%   0.02     0.011
2   30   0.40   1p     0.05     0.042
2   30   0.40   10%    0.05     0.043
2   30   0.40   100%   0.08     0.038
2   40   0.10   1p     0.00     0.011
2   40   0.10   10%    0.00     0.011
2   40   0.10   100%   0.00     0.011
2   40   0.40   1p     0.00     0.010
2   40   0.40   10%    0.00     0.009
2   40   0.40   100%   0.02     0.009
5   40   0.10   1p     0.00     0.008
5   40   0.10   10%    0.00     0.008
5   40   0.10   100%   0.01     0.008
5   40   0.40   1p     0.01     0.016
5   40   0.40   10%    0.01     0.016
5   40   0.40   100%   0.06     0.035

Table 3.3. Average fraction of masked and swamped observations for p-dimensional data sets with a fraction fr of outliers at a distance d from the good points (for more details about the data sets see Table 3.2), calculated with runs with either one-point increment (1p) or proportional increment (10% or 100% of the good part).
3.3.4 Proportional increment

Until now all tests have been performed using the one-point increment, i.e. at each step of the algorithm the size of the good part is increased with just one point. In section 3.1 it was already mentioned that a gain in computer time is possible by increasing the size of the good part with more than one point per step. The simulations on the masked and swamped observations were repeated with the proportional increment algorithm. The increment with a certain percentage was tested for percentages up to 100% (which means that the size of the good part is doubled at each step). The results of Table 3.1, showing the average fraction of swamped observations in outlier-free data sets, did not change. Small changes showed up for large percentages in the presence of outliers. A summary of the results is shown in Table 3.3. In order to avoid an unnecessary profusion of data we only show the results for p=2 in some relevant cases and, as an illustration, in a few cases for p=5. A general conclusion from the table is that for a wide range of percentages the proportional increment algorithm works satisfactorily. For a percentage of 100% outliers are masked slightly more frequently than for lower percentages. The differences between the 10% increment and the one-point increment are negligible.
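The proportional increment rule, as we read it, amounts to growing the good part by a fixed percentage of its current size (at least one point) instead of one point per step; a small sketch with names of our own choosing:

```python
# Proportional increment: geometric growth of the good part, so roughly
# O(ln n) steps instead of O(n).
def next_size(g, perc=0.10, upper=None):
    g_new = g + max(1, int(perc * g))
    return g_new if upper is None else min(g_new, upper)

sizes, g = [], 4
while g < 100:
    sizes.append(g)
    g = next_size(g, perc=0.10, upper=100)
print(sizes)
```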
3.3.5 Time dependence

To illustrate the possible gain with the proportional increment we measured the time per run for p-dimensional data sets of n points, with p ranging from 1 to 8 and n from 50 to 400. The simulations were performed with outlier-free generated data sets, so that the complete data sets had to be included in the good part. This was done in order to obtain useful information about the dependence of the simulation times on the number of points. Table 3.4 shows the results for the simulation runs with one-point increment. The results for the runs with a proportional increment of 10% are shown in Table 3.5.

n      p=1     2      3      4      5      6      7      8
50     0.09    0.18   0.29   0.45   0.64   0.84   1.08   1.35
100    0.36    0.68   1.05   1.75   2.5    3.3    4.3    5.5
200    1.46    2.8    4.6    7.0    10
400    6.2     12

Table 3.4. Time (in seconds) per run on p-dimensional data sets of n points, using the one-point increment.

n      p=1     2      3      4      5      6      7      8
50     0.05    0.10   0.16   0.23   0.31   0.39   0.52   0.62
100    0.14    0.24   0.39   0.56   0.76   1.00   1.25   1.55
200    0.33    0.60   0.92   1.35   1.90
400    0.80    1.40

Table 3.5. Time (in seconds) per run on p-dimensional data sets of n points, using the proportional increment (perc=10%).

Let us denote the time per run as a function of n for fixed p by $t_p$, and the time per run as a function of p for fixed n by $t_n$. For the one-point increment simulations $t_p$ is approximately proportional to $n^2$. This is as expected, since there are O(n) steps with an increment of one point, and at each step the Mahalanobis distance has to be calculated for each point (O(n)) and sorted (O(n ln n)). For the simulations with proportional increment $t_p$ is approximately O(n ln n), due to the fact that only O(ln n) steps are needed instead of O(n). As a consequence there is a substantial gain in the time per run, ranging from a factor of 2 for 50 points up to a factor of 8 for 400 points.

The time per run for fixed n, $t_n$, is approximately proportional to $p^{1.5}$, for both one-point and proportional increment runs. The exponent 1.5 is just an empirical average over the range p=1..8 and is the result of several O(p) and O(p²) steps. Since the exponent is much smaller than 2, it is more efficient to search for outliers in one p-dimensional run than in ½p(p-1) 2-dimensional runs, one for each pair of dimensions, even if one is not interested in outliers in more than 2 dimensions. Consider for instance p=8, n=50. One run takes 0.62 seconds. However, a total of 1.4 seconds would be needed for the 28 runs over each pair of dimensions, each run taking 0.05 seconds.

3.3.6 Sensitivity to parameters

The Kosinski algorithm was tested on the twelve data sets described in section 5. A full description of the outliers and a comparison of the results with the results of the projection algorithm, as well as with other methods described in the literature, is given in that section. In the present section we restrict the discussion to the sensitivity of the number of outliers to the cutoff and the increment percentage.

The algorithm was run with a cutoff $\chi^2_{p,1-\alpha}$ for α=1% as well as α=5%. Furthermore, both one-point increment and proportional increment (in the range 0-40%) were used. The number of detected outliers of the twelve data sets is shown in Table 3.6. It is clear that the number of outliers for a specific data set is not the same for each set of parameters. It is remarked that, in all cases, if different sets of parameters lead to the same number of outliers, the outliers are exactly the same points. Moreover, if one set of parameters leads to more outliers than another set, all outliers detected by the latter are also detected by the former (these are empirical results).

Let us first discuss the differences between the detection with α=1% and with α=5%. It is obvious that in many cases α=5% results in slightly more outliers than α=1%. However, in two cases the differences are substantial, i.e. in the Stackloss data and in the Factory data. In the Stackloss data five outliers are found for α=5% using moderate increments, while α=1% shows no outliers at all. The reason for this difference is the small number of points relative to the dimension of the data set. It has been argued by Rousseeuw that the ratio n/p should be larger than 5 in order to be able to detect outliers reliably. If n/p is smaller than 5 one comes to a point where it is not useful to speak about outliers, since there is no real bulk of data. With n=21 and p=4 the Stackloss data lie on the edge of meaningful outlier detection. Moreover, if the five points which are indicated as outliers with α=5% are left out, only 16 good points remain, resulting in a ratio n/p=4. In such a case any outlier detection algorithm will presumably fail to find outliers consistently.
Data set               p   n    inc      α=5%   α=1%
1. Kosinski            2   100  1p       42     40
                                ≤40%     42     40
2. Brain mass          2   28   1p       5      3
                                ≤10%     5      3
                                15-20%   4      3
                                30-40%   3      3
3. Hertzsprung-Russel  2   47   1p       7      6
                                ≤30%     7      6
                                40%      6      6
4. Hadi                3   25   1p       3      3
                                ≤5%      3      3
                                10%      3      0
                                15-25%   3      3
                                30%      3      0
                                40%      3      3
5. Stackloss           4   21   1p       5      0
                                ≤17%     5      0
                                18-24%   4      0
                                25-30%   1      0
                                40%      0      0
6. Salinity            4   28   1p       4      2
                                ≤30%     4      2
                                40%      2      2
7. HBK                 4   75   1p       15     14
                                ≤30%     15     14
                                40%      14     14
8. Factory             5   50   1p       20     0
                                ≤40%     20     0
9. Bush fire           5   38   1p       16     13
                                ≤40%     16     13
10. Wood gravity       6   20   1p       6      5
                                ≤20%     6      5
                                30%      6      6
                                40%      6      5
11. Coleman            6   20   1p       7      7
                                ≤40%     7      7
12. Milk               8   85   1p       20     17
                                ≤30%     20     17
                                40%      18     15

Table 3.6. Number of outliers detected by the Kosinski algorithm with a cutoff of $\chi^2_{p,1-\alpha}$, for α=1% and α=5% respectively, with either one-point (1p) or proportional increment in the range 0-40%.
The Factory data is an interesting case. For α=5% twenty outliers are detected, which is 40% of all points, while detection with α=1% shows no outliers. Explorative data analysis shows that about half the data set is quite narrowly concentrated in a certain region, while the other half is distributed over a much larger space. There is however no clear distinction between these two parts. The more widely distributed part is rather a very thick tail of the other part. In such a case the effect that the algorithm with α=5% tends to detect too many outliers, which was explained in the discussion of Table 3.1, is very strong. It is questionable whether the indicated points should be considered outliers at all.

Let us now discuss the sensitivity of the number of detected outliers to the increment. At low percentages the number of outliers is always the same as for the one-point increment: in fact, at very low percentages the proportional increment procedure leads to an increment of just one point per step, making the two algorithms equal. For most data sets the number of outliers is constant over a wide range of percentages and starts to differ slightly only at 30-40% or higher. Three of the twelve data sets behave differently: the Brain mass data, the Hadi data, and the Stackloss data.

The Brain mass data shows 5 outliers at low percentages for α=5%. At percentages around 15% the number of outliers is only 4, and at 30% only 3. So the number of outliers changes earlier (at 15%) than in most other data sets (at 30% or higher). For α=1% the number of outliers is constant over the whole range. In fact, the three outliers which are found at 30-40% for α=5% are exactly the same as the three outliers found for α=1%. The two outliers which are missed at higher percentages for α=5% both lie just above the cutoff value. Therefore it is disputable whether they are real outliers at all.

The Hadi data shows strange behavior. At all percentages for α=5%, and at most percentages for α=1%, three outliers are found. However, near 10% and near 30% no outliers are detected. Again, the three outliers are disputable. All have a Mahalanobis distance just above the cutoff (see Table 5.2). Hence it is not strange that sometimes these three points are included in the good part (the three points lie close together; hence the inclusion of one of them in the good part leads to low Mahalanobis distances for the other two as well). On the other hand, it is also not a big problem, since it is rather a matter of taste than a matter of science whether to call the three points outliers or good points.

The Stackloss data shows a decreasing number of outliers for α=5% at relatively low percentages, as in the Brain mass data. Here, the sensitivity to the percentage is related to the low ratio n/p, as discussed previously.

In conclusion, for increments up to 30% the same outliers are found as with the one-point increment. In cases where this is not true, the supposed outliers always have an outlyingness slightly above or below the cutoff, so that missing such outliers has no big consequences. Furthermore, relatively low cutoff values could lead to disproportionate swamping.
4. The projection method

4.1 The principle of projection

The projection method is based on the idea that outliers in univariate data are easily recognized, visually as well as by computational means. In one dimension the Mahalanobis distance is simply $|y_i - \bar{y}| / \sigma$. A robust version of the univariate outlyingness is found by replacing the mean by the med and the standard deviation by the mad. Denoting the robust outlyingness by $u_i$, this leads to

$u_i = \frac{|y_i - M|}{S}$

where M and S denote the med and the mad respectively:

$M = \mathrm{med}_k\, y_k$

$S = \mathrm{med}_l\, |y_l - M|$

In the case of multivariate data the idea is to "look" at the data set from all possible directions and to "see" whether a particular data point lies far away from the bulk of the data points. Looking in this context means projecting the data set on a projection vector a; seeing means calculating the outlyingness as is done for univariate data. The ultimate outlyingness of a point is just the maximum of the outlyingnesses over all projection directions. The outlyingness defined in this way corresponds to the multivariate Mahalanobis distance, as is shown in section 2. Recalling the expression for the Mahalanobis distance:

$MD_i^2 = \sup_{a^t a = 1} \frac{(a^t(y_i - \bar{y}))^2}{a^t C a}$

Robustifying the Mahalanobis distance leads to

$u_i = \sup_{a^t a = 1} \frac{|a^t y_i - M|}{S}$

Now M and S are defined as follows:

$M = \mathrm{med}_k\, a^t y_k$

$S = \mathrm{med}_l\, |a^t y_l - M|$

It is remarked that $MD_i^2$ corresponds to $u_i^2$.
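As a toy illustration of this principle, the maximum can be approximated with randomly drawn unit vectors; the paper instead uses the angular grid of section 4.2 and a rescaled mad, as explained below. The function name and the n_dirs parameter are ours.

```python
# Toy sketch of the projection principle with random unit directions.
import numpy as np

def projection_outlyingness_random(Y, n_dirs=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    A = rng.normal(size=(n_dirs, p))
    A /= np.linalg.norm(A, axis=1, keepdims=True)      # unit projection vectors
    u = np.zeros(n)
    for a in A:
        z = Y @ a                                      # projected data points
        med = np.median(z)
        mad = np.median(np.abs(z - med))
        if mad > 0:
            u = np.maximum(u, np.abs(z - med) / mad)   # robust outlyingness |a^t y_i - M| / S
    return u
```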
How is the maximum calculated? The outlyingness $|a^t y_i - M| / S$ as a function of a could possess several local maxima, making gradient search methods unfeasible. Therefore the outlyingness is calculated on a grid of a finite number of projection vectors. The grid should be fine enough to calculate the maximum outlyingness with sufficient accuracy.

This robust measure of outlyingness was first developed by Stahel and Donoho. More recent work on this subject has been reported by Maronna and Yohai. These authors used the outlyingness to calculate a weighted mean and covariance matrix. Outliers were given small weights, so that the Stahel-Donoho estimator of the mean was robust against the presence of outliers. It is of course possible to use the weighted mean and covariance matrix to calculate a weighted Mahalanobis distance. This is not done in the projection method discussed here.

The robust outlyingness $u_i$ was slightly adjusted for the following reason. The mad of univariate standard normal data, which has a standard deviation of 1 by definition, is 0.674 = 1/1.484. In order to ensure that, in the limiting case of an infinitely large multivariate normal data set, the outlyingness $u_i^2$ is equal to the squared Mahalanobis distance, the mad in the denominator is multiplied by 1.484:

$u_i = \sup_{a^t a = 1} \frac{|a^t y_i - M|}{1.484\, S}$
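The consistency factor quoted above can be verified in one line: for a standard normal distribution the mad converges to the 0.75 quantile of N(0,1), whose reciprocal is approximately 1.4826.

```python
# Quick check of the 1.484 factor: 1 / Phi^{-1}(0.75) ~ 1.4826.
from scipy.stats import norm
print(1.0 / norm.ppf(0.75))
```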
4.2 The projection algorithm

The purpose of the algorithm is, given a set of n multivariate data points $y_1, y_2, .., y_n$, to calculate the outlyingness $u_i$ for each point i. The algorithm can be summarized as follows.

Step 0. In: data set
The algorithm is started with a set of continuous p-dimensional data $y_1, y_2, .., y_n$, with $y_i = (y_{i1}\; ..\; y_{ip})^t$.

Step 1. Define a grid
There are $\binom{p}{q}$ subsets of q dimensions in the total set of p dimensions. The "maximum search dimension" q is predefined. Projection vectors a in a certain subset are parameterized by the angles $\theta_1, \theta_2, .., \theta_{q-1}$ (a code sketch of the grid and of the full scan follows Step 3):

$a = \begin{pmatrix} \cos\theta_1 \\ \cos\theta_2 \sin\theta_1 \\ \cos\theta_3 \sin\theta_2 \sin\theta_1 \\ \vdots \\ \cos\theta_{q-1} \sin\theta_{q-2} \cdots \sin\theta_1 \\ \sin\theta_{q-1} \sin\theta_{q-2} \cdots \sin\theta_1 \end{pmatrix}$

A certain predefined step size step (in degrees) is used to define the grid. The first angle $\theta_1$ can take the values $i \cdot step_1$, with $step_1$ the largest angle smaller than or equal to step for which $180/step_1$ is an integer, and with $i = 1, 2, .., 180/step_1$. The second angle can take the values $j \cdot step_2$, with $step_2$ the largest angle smaller than or equal to $step_1/\cos\theta_1$ for which $180/step_2$ is an integer, and with $j = 1, 2, .., 180/step_2$. The r-th angle can take the values $k \cdot step_r$, with $step_r$ the largest angle smaller than or equal to $step_{r-1}/\cos\theta_{r-1}$ for which $180/step_r$ is an integer, and with $k = 1, 2, .., 180/step_r$. Such a grid is defined in each subset of q dimensions.

Step 2. Outlyingness for each grid point
For each grid point a, calculate the outlyingness of each data point $y_i$:
• Calculate the projections $a^t y_i$.
• Calculate the median $M_a = \mathrm{med}_k\, a^t y_k$.
• Calculate the mad $L_a = \mathrm{med}_l\, |a^t y_l - M_a|$.
• Calculate the outlyingness $u_i(a) = \frac{|a^t y_i - M_a|}{1.484\, L_a}$.

Step 3. Out: outlyingness
The outlyingness $u_i$ is the maximum over the grid: $u_i = \sup_a u_i(a)$.
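The following sketch puts Steps 1-3 together. It uses a regular angular grid with a fixed step per angle (the adaptive refinement of step_r described above is skipped for brevity) and scans every q-dimensional subset of coordinates; the function names are ours.

```python
# Sketch of the projection algorithm: angular grid plus scan over all
# q-dimensional coordinate subsets.
import itertools
import numpy as np

def angular_grid(q, step_deg=10):
    """Yield unit vectors a(theta_1, .., theta_{q-1}) on a regular angle grid."""
    angles = np.deg2rad(np.arange(step_deg, 180 + step_deg, step_deg))
    for thetas in itertools.product(angles, repeat=q - 1):
        a = np.empty(q)
        sin_prod = 1.0
        for r, th in enumerate(thetas):
            a[r] = np.cos(th) * sin_prod           # cos(theta_r) * sin(theta_{r-1}) .. sin(theta_1)
            sin_prod *= np.sin(th)
        a[q - 1] = sin_prod                        # last component: product of all sines
        yield a

def projection_outlyingness(Y, q=2, step_deg=10):
    """Maximum robust outlyingness u_i over all grid directions in all subsets."""
    n, p = Y.shape
    u = np.zeros(n)
    for dims in itertools.combinations(range(p), q):
        Ysub = Y[:, dims]
        for a in angular_grid(q, step_deg):
            z = Ysub @ a                           # projections a^t y_i
            med = np.median(z)
            mad = np.median(np.abs(z - med))
            if mad > 0:
                u = np.maximum(u, np.abs(z - med) / (1.484 * mad))
    return u                                       # compare u_i**2 with the chi^2 cutoff
```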
4.3 Test results

A prototype/test program was implemented in an Excel/Visual Basic environment. Documentation of the program is published elsewhere. We successively tested the amount of swamped observations in data sets containing no outliers, the amount of masked observations in data sets containing outliers, the time dependence of the algorithm on the parameters step and q, and the sensitivity of the number of detected outliers to these parameters in some known data sets.

4.3.1 Swamping

A simulation study was performed in order to determine the average fraction of swamped observations in normally distributed data sets. See section 3.3.2 for more detailed remarks about the swamping effect and about generating the data sets. The results of the simulations are shown in Table 4.1.

α     step   p=1     2       3       4       5
1%    10     0.010   0.011   0.016   0.018   0.023
5%    10     0.049   0.052   0.067   0.071   0.088
1%    30     0.010   0.010   0.012   0.011   0.012
5%    30     0.049   0.049   0.051   0.049   0.058

Table 4.1. The average fraction of swamped observations of the simulations on several generated p-dimensional data sets of 100 points, with cutoff value $\chi^2_{p,1-\alpha}$ and step size step. The parameter q is equal to p.

           fraction of outliers:  0.12   0.23   0.34   0.45
p=2, q=2   d=20                   0.83   1.00   1.00   1.00
           d=40                   0.00   0.00   0.62   1.00
           d=50                   0.00   0.00   0.00   1.00
           d=90                   0.00   0.00   0.00   0.00
p=5, q=2   d=30                   1.00   1.00   1.00   1.00
           d=50                   0.00   0.67   1.00   1.00
           d=80                   0.00   0.00   0.00   1.00
           d=140                  0.00   0.00   0.00   0.00
p=5, q=5   d=30                   0.22   0.54   1.00   1.00
           d=50                   0.00   0.00   0.65   1.00
           d=60                   0.00   0.00   0.00   1.00
           d=120                  0.00   0.00   0.00   0.00

Table 4.2. Average fraction of masked outliers of 2- and 5-dimensional generated data sets (see also section 3.3.3); entries are the fraction of masked outliers for each fraction of outliers.

For low dimensions the average fraction of swamped observations tends to be almost equal to α. The fraction increases, however, with increasing dimension. This is due to the decreasing ratio n/p. It is remarkable that with a step size of 30 the fraction of swamped observations seems to be much better than for step size 10. This is just a coincidence.
The fact that more observations are declared to be outliers is compensated by the fact that the outlyingnesses are usually smaller when large step sizes are used. In fact, the differences between step sizes 10 and 30 are so large for the higher dimensions that this is an indication that a step size of 30 could be too coarse to yield reliable outlyingnesses.

4.3.2 Masking and swamping

The ability of the projection algorithm to detect outliers was tested by generating data sets that contain good points as well as outliers. See section 3.3.3 for details on how the data sets were generated. Results are shown in Table 4.2. In all cases, the ability to detect the outliers is strongly dependent on the contamination of outliers. If there are many outliers, they can only be detected if they lie very far away from the cloud of good points. This is due to the fact that, although the med and the mad have a robustness of 50%, a large concentrated fraction of outliers strongly shifts the med towards the cloud of outliers and enlarges the mad. In higher dimensions it is more difficult to detect the outliers, as in the Kosinski method. The ability to detect the outliers also depends on the maximum search dimension q. If q is taken equal to p, fewer outliers are masked.

4.3.3 Time dependence

The time dependence of the projection algorithm on the step size step and the maximum search dimension q is shown in Table 4.3.

n     p   q   step   t
400   2   2   36     13.0
400   2   2   18     21.0
400   2   2   9      32.7
400   2   2   4.5    56.8
400   3   3   36     28.1
400   3   3   18     68.6
400   3   3   9      209.1
400   3   3   4.5    719.3
50    5   2   9      26.3
100   5   2   9      50.1
200   5   2   9      107.7
400   5   2   9      202.9
100   2   2   9      8.0
100   3   2   9      19.3
100   4   2   9      33.5
100   5   2   9      50.1
100   6   2   9      71.4
100   7   2   9      98.9
100   8   2   9      128.0
100   5   1   9      5.9
100   5   2   9      50.1
100   5   3   9      479.8
100   5   4   9      2489.1
100   5   5   9      4692.1

Table 4.3. Time t (in seconds) per run on p-dimensional data sets of n points, using maximum search dimension q and step size step (in degrees).

Asymptotically the time per run should be proportional to $(n \ln n)\,\binom{p}{q}\left(\frac{180}{step}\right)^{q-1}$, since for each of the $\binom{p}{q}$ subsets a grid is defined with a number of grid points of the order of $(180/step)^{q-1}$, and at each grid point the median of the projected points has to be calculated ($n \ln n$).
The results in the table roughly confirm this theoretical estimate. The most important conclusion from the table is that the time per run strongly increases with the search dimension q. This makes the algorithm only useful for relatively low dimensions.

4.3.4 Sensitivity to parameters

The projection method was tested with the twelve data sets that are fully described in section 5, as was done for the Kosinski method (see section 3.3.6). The results are shown in Table 4.4.

Let us first discuss the differences between α=5% and 1%. In almost all cases the number of outliers detected with α=5% is larger than with α=1%. This is completely due to stronger swamping. It is remarked that there is no algorithmic dependence on the cutoff value, unlike in the Kosinski method. In the projection method a set of outlyingnesses is calculated, and after the calculation a certain cutoff value is used to discriminate between good and bad points. Hence, a smaller cutoff value leads to more outliers, but all points still have the same outlyingness. In the Kosinski method the cutoff value is already used during the algorithm: the cutoff is used to decide whether more points should be added to the good part. A smaller cutoff leads not only to more outliers but also to a different set of outlyingnesses, since the mean and the covariance matrix are calculated with a different set of points. As a consequence, in cases where the Kosinski method possibly shows a rather strong sensitivity to the cutoff value, this sensitivity is missing in the projection method.

Now let us discuss the dependence of the number of outliers on the maximum search dimension q. In the Hertzsprung-Russel data set and in the HBK data set the number of outliers found with q=1 is already as large as that found with higher values of q. In the Brain mass data set and in the Milk data set, the number of outliers for q=1 is however much smaller than for large values of q. In those cases, many outliers are truly multivariate. In the Hadi data set, the Factory data set and the Bush fire data set there is also a rather large discrepancy between q=2 and q=3. It is remarked that the Hadi data set was constructed so that all outliers are invisible when looking at two dimensions only (see section 5.2.4). Also in the other two data sets it is clear that many outliers can only be found by inspecting three or more dimensions at the same time. If q is higher than three, only slightly more outliers are found than for q=3. The differences can be explained by the fact that searching in higher dimensions with the projection method leads to more outliers (see section 4.3.1).
Data set               p   n    q   step   α=5%   α=1%
1. Kosinski            2   100  2   10     78     34
                                2   20     77     34
                                2   30     42     31
2. Brain mass          2   28   2   5      9      6
                                2   10     9      4
                                2   30     8      4
                                1   n/a    3      1
3. Hertzsprung-Russel  2   47   2   1      7      6
                                2   30     6      5
                                2   90     6      5
                                1   n/a    6      5
4. Hadi                3   25   3   5      11     5
                                3   10     8      0
                                2   10     0      0
5. Stackloss           4   21   4   5      14     9
                                4   10     10     9
                                4   15     8      6
                                4   20     9      7
                                4   30     6      6
6. Salinity            4   28   4   10     12     8
                                4   20     9      7
                                3   30     6      4
7. HBK                 4   75   4   10     15     14
                                4   20     14     14
                                1   n/a    14     14
8. Factory             5   50   5   10     24     18
                                5   20     14     9
                                4   10     24     17
                                3   10     22     14
                                2   10     9      9
9. Bush fire           5   38   5   10     24     19
                                5   20     19     17
                                4   10     22     19
                                3   10     21     17
                                2   10     13     12
10. Wood gravity       6   20   5   20     14     14
                                5   30     12     11
                                3   10     15     14
11. Coleman            6   20   5   20     10     8
                                5   30     4      4
12. Milk               8   85   5   20     18     14
                                5   30     15     13
                                4   20     16     14
                                4   30     15     13
                                3   20     15     13
                                3   30     15     12
                                2   20     13     11
                                2   30     12     7
                                1   n/a    6      5

Table 4.4. Number of outliers detected by the projection algorithm with a cutoff of $\chi^2_{p,1-\alpha}$, for α=1% and α=5% respectively, with maximum search dimension q and angular step size step (in degrees).
The sensitivity to the step size is not large in most cases. In cases like the Hadi data, the Stackloss data, the Salinity data and the Coleman data, the sensitivity can be explained by the sparsity of the data sets. A step size of 10-20 degrees seems to work well in most cases.

In conclusion, the number of outliers is not very sensitive to the parameters q and step, although the sensitivity is not completely negligible. In most practical cases q=3 and step=10 work well enough.

5. Comparison of methods

In this section the projection method and the Kosinski method are compared with each other as well as with other robust outlier detection methods. In section 5.1 we briefly describe some other methods reported in the literature. The comparison is made by applying the projection method and the Kosinski method to data sets that have been analyzed by at least one of the other methods. Those data sets and the results of the other methods are described in section 5.2. In section 5.3 the results are discussed. Unfortunately, most papers on outlier detection methods say very little about the efficiency of the methods, i.e. how fast the algorithms are and how the running time depends on the number of points and the dimension of the data set. Therefore we restrict the discussion to the ability to detect outliers.

5.1 Other methods

It is important to note that two different types of outliers are distinguished in the outlier literature. The first type, which is the one used in this report, is a point that lies far away from the bulk of the data. The second type is a point that lies far away from the regression plane formed by the bulk of the data. The two types will be denoted by bulk outliers and regression outliers, respectively. Of course, outliers are often outliers according to both points of view. That is why we compare the results of the projection method and the Kosinski method, which are both bulk outlier methods, also with regression outlier methods. A point that is an outlier according to both points of view is called a bad leverage point. A point that lies far away from the bulk of the points but close to the regression plane is called a good leverage point.

Rousseeuw (1987, 1990) developed the minimum volume ellipsoid (MVE) estimator in order to detect bulk outliers robustly. The principle is to search for the ellipsoid, covering at least half the data points, for which the volume is minimal. The mean and the covariance matrix of the points inside the ellipsoid are inserted in the expression for the Mahalanobis distance. This method is costly due to the complexity of the algorithm that searches for the minimum volume ellipsoid.

A related technique is based on the minimum covariance determinant (MCD) estimator. This technique is employed by Rocke. The aim is to find the set of points, containing at least half the data, for which the determinant of the covariance matrix is minimal. Again, the mean and the covariance matrix determined by that set of points are inserted in the Mahalanobis distance expression. This method is also rather complex, although it has been substantially optimized by Rocke.
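As an illustration of the common pattern behind MVE and MCD, plugging a robust location and scatter estimate into the Mahalanobis distance, the sketch below uses the MCD implementation available in scikit-learn. The library choice, the simulated data and the χ²-based cutoff are assumptions for illustration; they are not part of the methods compared in this report.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
p = 4
X = rng.multivariate_normal(np.zeros(p), np.eye(p), size=70)
X[:5] += 8.0                               # plant a few obvious outliers

mcd = MinCovDet(random_state=0).fit(X)     # robust location and scatter (MCD)
dist = np.sqrt(mcd.mahalanobis(X))         # robust Mahalanobis distances
cutoff = np.sqrt(chi2.ppf(0.99, df=p))     # alpha = 1% cutoff
print(np.flatnonzero(dist > cutoff))       # indices of the flagged points
```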
Hadi (1992) developed a bulk outlier method that is very similar to the Kosinski method. He also starts with a set of p+1 "good" points and increases the good set one point at a time. The difference lies in the choice of the first p+1 points. Hadi orders the n points using another robust measure of outlyingness. The question arises why that other outlyingness measure would not itself be appropriate for outlier detection. A reason could be that an arbitrary robust measure of outlyingness deviates relatively strongly from the "real" Mahalanobis distance.

Atkinson combines the MVE method of Rousseeuw with the forward search technique also employed by Kosinski. A few sets of p+1 randomly chosen points are used for a forward search. The set that results in the ellipsoid with minimal volume is used for the calculation of the Mahalanobis distances.

Maronna employed a projection-like method, but a slightly more complicated one. The outlyingnesses are calculated as in the projection method. Then weights are assigned to each point, with low weights for the outlying points, i.e. the influence of outliers is restricted. The mean and the covariance matrix are calculated using these weights; they form the Stahel-Donoho estimator for location and scatter. Finally, Maronna inserts this mean and this covariance matrix into the expression for the Mahalanobis distance.

Egan proposes resampling by the half-mean method (RHM) and the smallest half-volume method (SHV). In the RHM method several randomly selected portions of the data are generated. In each case the outlyingnesses are calculated. For each point it is counted how many times it has a large outlyingness; it is declared a true outlier if this happens often. In the SHV method the distance between each pair of points is calculated and put in a matrix. The column with the smallest sum of the smallest n/2 distances is selected, and the corresponding n/2 points form the smallest half-volume. The mean and the covariance matrix of those points are inserted in the Mahalanobis distance expression. (A sketch of the SHV idea is given below.)

The detection of regression outliers is mainly done with the least median of squares (LMS) method, developed by Rousseeuw (1984, 1987, 1990). Instead of minimizing the sum of the squared residuals, as in the least squares method (which in this context should rather be called the least sum of squares method), the median of the squared residuals is minimized. Outliers are simply the points with large residuals, as calculated with the regression coefficients determined by the LMS method.

Hadi (1993) uses a forward search to detect regression outliers. The regression coefficients of a small good set are determined. The set is increased by subsequently adding the points with the smallest residuals and recalculating the regression coefficients until a certain stopping criterion is fulfilled. A small good set has to be found beforehand.
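The following is a minimal sketch of the SHV selection step as described above, assuming Euclidean distances between points; Egan's exact distance measure and any further refinements are not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import chi2

def shv_outlyingness(X, alpha=0.01):
    """Smallest half-volume sketch: choose the column of the pairwise distance
    matrix with the smallest sum of its n/2 smallest entries, take those n/2
    points as the clean half, and use their mean and covariance in the
    Mahalanobis distance."""
    n, p = X.shape
    h = n // 2
    D = cdist(X, X)                                  # pairwise distance matrix
    col_scores = np.sort(D, axis=0)[:h, :].sum(axis=0)
    best = np.argmin(col_scores)                     # column defining the half
    half = np.argsort(D[:, best])[:h]                # the smallest half-volume
    mean = X[half].mean(axis=0)
    cov = np.cov(X[half], rowvar=False)
    diff = X - mean
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    outlyingness = np.sqrt(d2)
    cutoff = np.sqrt(chi2.ppf(1.0 - alpha, df=p))
    return outlyingness, np.flatnonzero(outlyingness > cutoff)
```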
Atkinson combines forward search and LMS. A few sets of p+1 randomly chosen points are used in a forward search. The set that results in the smallest LMS is used for the final determination of the regression residuals.

A completely different approach is the genetic algorithm for the detection of regression outliers by Walczak. We will not describe this approach here, since it lies beyond the scope of deterministic calculation of outlyingnesses.

Fung developed an adding-back algorithm for the confirmation of regression outliers. Once points are declared to be outliers by any other robust method, the points are added back to the data set in a stepwise way. The extent to which the estimates of the regression coefficients are affected by adding a point back is used as a diagnostic measure to decide whether that point is a real outlier. This method was developed because robust outlier methods tend to declare too many points to be outliers.

5.2 Known data sets

In this section the projection method and the Kosinski method are compared by running both algorithms on the twelve data sets given in Table 5.1. Most of these data sets are well described in the robust outlier detection literature; hence we are able to compare the results of the two algorithms with known results. The outlyingnesses calculated by the projection method and the Kosinski method are shown in Table 5.2, Table 5.4 and Table 5.5. In both methods the cutoff value for α=1% is used. In the Kosinski method a proportional increment of 20% was used. The outlyingnesses of the projection method were calculated with q=p (if p<6; if p>5 then q=5) and the lowest step size shown in Table 4.4. We will now discuss the data sets one by one.

Data set                 p    n    Source
1.  Kosinski             2   100   Ref. [1]
2.  Brain mass           2    28   Ref. [3]
3.  Hertzsprung-Russel   2    47   Ref. [3]
4.  Hadi                 3    25   Ref. [4]
5.  Stackloss            4    21   Ref. [3]
6.  Salinity             4    28   Ref. [3]
7.  HBK                  4    75   Ref. [3]
8.  Factory              5    50   This work
9.  Bush fire            5    38   Ref. [5]
10. Wood gravity         6    20   Ref. [6]
11. Coleman              6    20   Ref. [3]
12. Milk                 8    85   Ref. [7]

Table 5.1. The name, the dimension p, the number of points n, and the source of the tested data sets.

5.2.1 Kosinski data

The Kosinski data form a data set that is difficult to handle from the point of view of robust outlier detection. The two-dimensional data set contains 100 points. Points 1-40 are generated from a bivariate normal distribution with µ1 = 18, µ2 = −18, σ1² = σ2² = 1, ρ = 0, and are considered to be outliers. Points 41-100 are good points, sampled from the bivariate normal distribution with µ1 = 0, µ2 = 0, σ1² = σ2² = 40, ρ = 0.7.
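For readers who want to reproduce a data set of this type, the following is a minimal sketch of the construction just described; the random seed and the exact sample values are of course not those of Kosinski.

```python
import numpy as np

rng = np.random.default_rng(42)

# Points 1-40: tightly clustered outliers around (18, -18), unit variances, rho = 0.
outliers = rng.multivariate_normal([18.0, -18.0], np.eye(2), size=40)

# Points 41-100: good points around (0, 0) with variances 40 and correlation 0.7.
rho, var = 0.7, 40.0
cov_good = np.array([[var, rho * var],
                     [rho * var, var]])
good = rng.multivariate_normal([0.0, 0.0], cov_good, size=60)

X = np.vstack([outliers, good])   # rows 0-39 correspond to points 1-40
```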
The Kosinski method correctly identifies all outliers (see Table 5.2). The projection method identifies none of the outliers and declares many good points to be outliers. The reason for this failure is the large contamination and the small scatter of the outliers. Since there are so many outliers, they strongly shift the median towards the outliers; hence the outliers are not detected. Furthermore, since the outliers are narrowly distributed, they almost completely determine the median of absolute deviations in the projection direction perpendicular to the vector pointing from the center of the good points to the center of the outliers. As a result, many points lying at the end points of the ellipsoid of good points have a large outlyingness. It is remarked that this is not an arbitrarily chosen data set: it was generated by Kosinski in order to demonstrate the superiority of his own method over other methods.

5.2.2 Brain mass data

The Brain mass data contain three outliers according to the Kosinski method: points 6, 16 and 25. Those points are also indicated to be outliers by Rousseeuw (1990) and Hadi (1992). Those authors also declare point 14 to be an outlier, but with an outlyingness only slightly above the cutoff. The projection method declares points 6, 14, 16, 17, 20 and 25 to be outliers.

5.2.3 Hertzsprung-Russel data

The two methods produce almost the same outlyingnesses for all points. Both declare points 11, 20, 30 and 34 to be large outliers, in agreement with the results of Rousseeuw (1987) and Hadi (1993). However, the projection method and the Kosinski method also declare points 7 and 14 to be outliers, and point 9 is an outlier according to the Kosinski method. The outlyingnesses of these three points are relatively small. Visual inspection of the data (see page 28 in Rousseeuw (1987)) shows that these points are indeed moderately outlying.

5.2.4 Hadi data

The Hadi data set is an artificial one. It contains three variables, x1, x2 and y. The two predictors were originally created as uniform(0,15) and were then transformed to have a correlation of 0.5. The target variable was then created as y = x1 + x2 + ε with ε ~ N(0,1). Finally, cases 1-3 were perturbed to have predictor values around (15,15) and to satisfy y = x1 + x2 + 4. The Kosinski method finds these outliers, with relatively small outlyingnesses. The projection method finds them too, but also declares two good points to be outliers.
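A rough reconstruction of this type of data set is sketched below. The way the correlation of 0.5 is induced and the size of the perturbation noise are assumptions, since the original transformation is not specified here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 25

# Two predictors, nominally uniform(0, 15); correlation 0.5 is induced by mixing
# (an assumption -- the original transformation is not specified).
u1, u2 = rng.uniform(0, 15, n), rng.uniform(0, 15, n)
x1 = u1
x2 = 0.5 * u1 + np.sqrt(1 - 0.5**2) * u2

y = x1 + x2 + rng.normal(0.0, 1.0, n)      # regression relation with N(0,1) noise

# Perturb cases 1-3: predictors near (15, 15) and y = x1 + x2 + 4.
x1[:3] = 15 + rng.normal(0, 0.2, 3)
x2[:3] = 15 + rng.normal(0, 0.2, 3)
y[:3] = x1[:3] + x2[:3] + 4

X = np.column_stack([x1, x2, y])           # 25 cases, the first three outlying
```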
  • 27. Robust multivariate outlier detection A: Kosinski Brain mass Hertzsprung-Russel Hadi B: 3,035 3,035 3,035 3,368 C: Proj Kos Proj Kos Proj Kos Proj Kos Proj Kos 1 2,59 7,45 51 4,37 1,01 1 1,79 0,75 1 0,80 1,20 1 4,75 3,47 2 2,80 7,96 52 1,53 0,98 2 1,05 1,13 2 1,39 1,46 2 4,75 3,47 3 2,46 7,14 53 2,22 1,05 3 0,37 0,16 3 1,41 1,83 3 4,76 3,46 4 2,87 8,21 54 4,69 1,32 4 0,65 0,13 4 1,39 1,46 4 2,86 1,84 5 2,78 7,97 55 3,97 1,50 5 1,99 0,92 5 1,42 1,90 5 0,96 0,70 6 2,59 7,48 56 3,47 1,44 6 8,40 6,19 6 0,80 1,04 6 3,43 1,57 7 2,84 8,09 57 4,59 2,55 7 2,08 1,27 7 5,55 6,35 7 2,21 0,91 8 2,75 7,89 58 2,27 0,37 8 0,66 0,55 8 1,44 1,38 8 0,46 0,36 9 2,51 7,22 59 2,96 0,51 9 0,94 0,91 9 2,59 3,26 9 0,99 0,35 10 2,45 7,12 60 2,22 0,54 10 1,93 0,99 10 0,61 0,93 10 1,74 1,34 11 2,69 7,71 61 4,94 1,83 11 1,23 0,51 11 11,01 12,67 11 2,50 1,65 12 2,84 8,12 62 5,07 1,29 12 0,96 0,90 12 0,91 1,21 12 1,54 1,13 13 2,77 7,95 63 4,66 1,13 13 0,64 0,60 13 0,79 0,88 13 2,81 1,25 14 2,68 7,72 64 1,68 1,17 14 3,87 2,21 14 3,04 3,51 14 0,98 0,68 15 2,37 6,95 65 3,32 1,03 15 2,22 1,44 15 1,55 1,22 15 2,65 1,37 16 2,46 7,17 66 2,25 1,03 16 7,54 5,63 16 1,23 0,99 16 0,97 0,84 17 2,64 7,59 67 2,59 1,13 17 3,18 1,83 17 2,17 1,80 17 3,31 1,64 18 2,40 6,96 68 3,89 1,04 18 0,90 0,92 18 2,17 2,04 18 3,17 1,39 19 2,46 7,11 69 1,82 0,88 19 3,00 1,43 19 1,77 1,54 19 2,78 1,49 20 2,45 7,15 70 5,96 1,59 20 3,59 1,71 20 11,26 13,01 20 2,94 1,37 21 2,70 7,71 71 2,29 0,70 21 1,54 0,66 21 1,35 1,07 21 0,90 0,66 22 2,62 7,54 72 3,91 0,86 22 0,50 0,25 22 1,62 1,28 22 1,61 1,27 23 2,82 8,11 73 2,15 1,30 23 0,66 0,74 23 1,60 1,41 23 3,89 1,39 24 2,68 7,67 74 6,76 2,00 24 2,18 1,11 24 1,21 1,10 24 2,80 1,22 25 2,37 6,88 75 6,20 2,01 25 8,97 6,75 25 0,34 0,58 25 2,04 1,12 26 2,75 7,86 76 3,37 0,77 26 2,61 1,24 26 1,04 0,78 27 2,67 7,70 77 2,67 0,49 27 2,59 1,41 27 0,88 1,07 28 2,85 8,14 78 1,83 0,50 28 1,13 1,17 28 0,36 0,33 29 2,78 7,98 79 4,19 2,45 29 1,43 1,60 30 2,78 8,00 80 2,71 0,46 30 11,61 13,48 31 2,45 7,14 81 4,49 1,12 31 1,36 1,09 32 2,91 8,29 82 2,74 0,79 32 1,59 1,48 33 2,51 7,27 83 1,62 0,31 33 0,49 0,52 34 2,33 6,80 84 2,81 0,47 34 11,87 13,88 35 2,68 7,72 85 5,94 1,57 35 1,50 1,50 36 2,82 8,08 86 3,50 1,01 36 1,57 1,70 37 2,52 7,31 87 1,38 1,93 37 1,27 1,13 38 2,65 7,66 88 2,21 1,57 38 0,49 0,52 39 2,49 7,18 89 5,47 1,73 39 1,14 1,03 40 2,61 7,52 90 3,07 1,44 40 1,17 1,52 41 1,89 0,50 91 2,94 1,54 41 0,88 0,60 42 1,84 0,41 92 6,02 1,59 42 0,46 0,30 43 7,94 2,03 93 3,65 0,80 43 0,81 0,77 44 3,04 0,61 94 3,89 0,98 44 0,61 0,80 45 2,35 0,67 95 6,68 1,64 45 1,17 1,19 46 6,42 1,76 96 2,50 0,84 46 0,58 0,37 47 5,36 1,68 97 4,59 1,32 47 1,41 1,20 48 3,74 0,77 98 5,65 1,46 49 3,92 0,92 99 2,12 1,64 50 6,53 1,78 100 2,31 0,30 Table 5.2. The outlyingness of each point of the Kosinski, the Brain mass, the Hertzsprung- Russel and the Hadi data. A: Name of data set. B: Cutoff value for • =1%; outlyingnesses higher than the cutoff are shown in bold. C: Method (Proj: projection method; Kos: Kosinski method). 26
The projection method finds consistently larger outlyingnesses than the Kosinski method, roughly a factor of 2 for most points. This is related to the sparsity of the data set. Consider for instance the extreme case of three points in two dimensions. Every point will have an infinitely large outlyingness according to the projection method. This can be understood by noting that the mad of the projected points is zero if the projection vector intersects two points; the remaining point then has an infinite outlyingness. For data sets with more points the situation is less extreme, but as long as there are relatively few points the projection outlyingnesses will be relatively large. In such a case the cutoff values based on the χ²-distribution are in fact too low, leading to the swamping effect.

5.2.5 Stackloss data

The Stackloss data outlyingnesses show large differences between the two methods. One of the reasons is the sensitivity of the Kosinski results to the cutoff value in this case, as is discussed in section 3. If a cutoff value √(χ²_{4,0.95}) = 3.080 is used instead of √(χ²_{4,0.99}) = 3.644, the Kosinski method shows the outlyingnesses given in Table 5.3.

      outl.         outl.         outl.
 1    4.73      8   0.98     15   1.07
 2    3.30      9   0.76     16   0.87
 3    4.42     10   0.98     17   1.14
 4    4.19     11   0.83     18   0.71
 5    0.63     12   0.93     19   0.80
 6    0.76     13   1.24     20   1.04
 7    0.87     14   1.04     21   3.80

Table 5.3. The outlyingnesses of the Stackloss data, calculated with the Kosinski method with cutoff value √(χ²_{4,0.95}) = 3.080. Outlyingnesses above this value are shown in bold; outlyingnesses that are even higher than √(χ²_{4,0.99}) = 3.644 are shown in bold italic.

Here 5 points have an outlyingness exceeding the cutoff value for α=5%, four of them (points 1, 3, 4 and 21) even above the value for α=1%. Even in this case the differences with the projection method are large: the projection outlyingnesses are up to 5 times larger than the Kosinski ones. For comparison, Walczak and Atkinson declared points 1, 3, 4 and 21 to be outliers, Rocke indicated point 2 as an outlier as well, while points 1, 2, 3 and 21 are outliers according to Hadi (1992). These results are comparable with the results of the Kosinski method with α=5%. Hence, considering the results in Table 5.4, the Kosinski method finds too few outliers and the projection method too many. In both cases the origin lies in the low n/p ratio.
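The cutoff values quoted above and in the table headers are consistent with the square root of a χ² quantile. The snippet below shows how such cutoffs can be computed; the use of scipy for the quantile is an assumption, not part of the report's implementation.

```python
from scipy.stats import chi2

def cutoff(p, alpha):
    """Outlyingness cutoff: square root of the (1 - alpha) quantile of the
    chi-square distribution with p degrees of freedom."""
    return chi2.ppf(1.0 - alpha, df=p) ** 0.5

print(f"{cutoff(4, 0.05):.3f}")   # 3.080, the alpha = 5% cutoff for p = 4
print(f"{cutoff(4, 0.01):.3f}")   # 3.644, the alpha = 1% cutoff for p = 4
```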
  • 29. Robust multivariate outlier detection A: Stackloss Salinity HBK Factory B: 3,644 3,644 3,644 3,884 C: Proj Kos Proj Kos Proj Kos Proj Kos Proj Kos 1 8,42 1,62 1 2,67 1,29 1 30,38 32,34 51 1,99 1,64 1 5,23 2,12 2 6,92 1,53 2 2,58 1,46 2 31,36 33,36 52 2,20 2,06 2 5,66 1,67 3 8,14 1,45 3 4,65 1,84 3 32,81 34,90 53 3,18 2,80 3 5,55 1,91 4 9,00 1,51 4 3,54 1,63 4 32,60 34,97 54 2,13 1,96 4 4,57 2,05 5 1,74 0,41 5 6,06 4,06 5 32,71 34,92 55 1,57 1,22 5 3,28 2,34 6 2,33 0,82 6 3,12 1,41 6 31,42 33,49 56 1,78 1,46 6 2,19 1,48 7 3,45 1,31 7 2,62 1,25 7 32,34 34,33 57 1,81 1,61 7 2,27 1,49 8 3,45 1,24 8 2,87 1,59 8 31,35 33,24 58 1,67 1,55 8 1,85 1,23 9 2,15 1,11 9 3,31 1,90 9 32,13 34,35 59 0,89 1,13 9 2,15 1,17 10 4,26 1,16 10 2,08 0,91 10 31,84 33,86 60 2,08 2,05 10 3,56 1,70 11 3,01 1,11 11 2,76 1,24 11 28,95 32,68 61 1,78 1,99 11 3,64 1,87 12 3,30 1,34 12 0,77 0,43 12 29,42 33,82 62 2,29 2,00 12 3,67 1,99 13 3,25 1,01 13 2,36 1,28 13 29,42 33,82 63 1,70 1,70 13 2,24 1,43 14 3,75 1,15 14 2,52 1,24 14 33,97 36,63 64 1,62 1,75 14 2,13 1,79 15 3,90 1,20 15 3,71 2,16 15 1,99 1,89 65 1,90 1,85 15 1,84 1,29 16 2,88 0,85 16 14,83 8,08 16 2,33 2,03 66 1,78 1,87 16 3,52 2,34 17 7,09 1,78 17 3,68 1,60 17 1,65 1,74 67 1,34 1,20 17 2,42 1,79 18 3,56 0,98 18 1,84 0,82 18 0,86 0,70 68 2,93 2,20 18 5,55 2,49 19 3,07 1,04 19 2,93 1,79 19 1,54 1,18 69 1,97 1,56 19 5,65 1,76 20 2,48 0,61 20 2,00 1,22 20 1,67 1,95 70 1,59 1,93 20 5,91 2,83 21 8,85 2,11 21 2,50 0,95 21 1,57 1,76 71 0,75 1,01 21 4,35 1,90 22 3,34 1,23 22 1,90 1,70 72 1,00 0,83 22 2,20 1,63 23 5,20 2,07 23 1,72 1,72 73 1,70 1,53 23 2,77 1,62 24 4,62 1,90 24 1,70 1,56 74 1,77 1,80 24 2,14 0,90 25 0,77 0,42 25 2,06 1,83 75 2,44 1,98 25 3,11 2,13 26 1,80 0,87 26 1,73 1,80 26 2,27 1,31 27 2,85 1,11 27 2,17 2,01 27 4,88 2,02 28 3,72 1,48 28 1,41 1,13 28 5,08 2,67 29 1,33 1,13 29 4,49 2,59 30 2,04 1,86 30 1,91 1,27 31 1,61 1,53 31 1,13 0,83 32 1,78 1,70 32 2,00 1,34 33 1,55 1,45 33 3,13 2,05 34 2,10 2,07 34 2,43 1,70 35 1,41 1,80 35 5,96 2,82 36 1,63 1,61 36 5,78 2,25 37 1,75 1,87 37 5,75 1,83 38 2,01 1,86 38 4,14 1,62 39 2,16 1,93 39 3,16 2,19 40 1,25 1,17 40 2,77 1,62 41 1,65 1,81 41 2,75 1,86 42 1,91 1,72 42 2,56 1,67 43 2,50 2,17 43 4,54 2,15 44 2,04 1,91 44 4,25 1,89 45 2,07 1,86 45 3,91 2,14 46 2,04 1,91 46 2,10 1,52 47 2,92 2,56 47 1,06 0,84 48 1,40 1,70 48 1,47 1,10 49 1,73 2,01 49 3,34 2,16 50 1,05 1,36 50 2,51 1,39 Table 5.4. The outlyingness of each point of the Stackloss, the Salinity, the HBK and the Factory data. A, B, C: see Table 5.2. 28
  • 30. Robust multivariate outlier detection A: Bush fire Wood gravity Coleman Milk B: 3,884 4,100 4,100 4,482 C: Proj Kos Proj Kos Proj Kos Proj Kos Proj Kos 1 3,48 1,38 1 4,72 2,65 1 3,56 2,84 1 9.06 9,46 51 2.62 1,98 2 3,27 1,04 2 2,71 1,20 2 4,92 6,37 2 10.57 10,81 52 3.64 2,98 3 2,76 1,11 3 3,68 2,19 3 6,76 2,94 3 4.04 5,09 53 2.38 2,22 4 2,84 1,02 4 14,45 33,75 4 2,99 1,53 4 3.86 2,83 54 1.22 1,16 5 3,85 1,40 5 3,02 2,80 5 2,70 1,43 5 2.23 2,52 55 1.68 1,69 6 4,92 1,90 6 16,19 38,83 6 5,74 10,43 6 2.97 2,84 56 1.10 1,01 7 11,79 4,37 7 7,90 5,00 7 3,11 2,23 7 2.36 2,35 57 1.96 2,19 8 17,96 11,87 8 15,85 37,88 8 1,48 1,83 8 2.32 2,08 58 2.05 1,95 9 18,36 12,18 9 6,12 2,72 9 2,49 5,95 9 2.58 2,49 59 1.47 2,21 10 14,75 7,64 10 8,59 2,37 10 5,71 12,04 10 2.20 1,98 60 2.04 1,76 11 12,31 6,76 11 5,38 3,04 11 5,07 7,70 11 5.28 4,60 61 1.48 1,42 12 6,17 2,38 12 6,79 2,65 12 4,31 2,77 12 6.65 6,05 62 2.64 2,07 13 5,83 1,77 13 7,14 1,98 13 3,49 2,92 13 5.63 5,38 63 2.33 2,60 14 2,30 1,59 14 2,38 2,09 14 1,95 2,16 14 6.17 5,48 64 2.58 1,90 15 4,70 1,55 15 2,40 1,47 15 6,11 6,56 15 5.47 5,73 65 1.85 1,56 16 3,43 1,38 16 4,74 2,86 16 2,18 2,30 16 3.84 4,56 66 2.01 1,64 17 3,06 0,92 17 6,07 2,12 17 3,78 5,95 17 3.59 4,76 67 3.28 2,59 18 2,75 1,41 18 3,28 2,49 18 7,86 3,09 18 3.74 3,30 68 2.41 2,33 19 2,82 1,38 19 18,33 44,49 19 3,48 2,11 19 2.43 2,85 69 46.45 44,61 20 2,89 1,20 20 7,16 2,07 20 2,80 1,56 20 4.14 3,44 70 1.99 1,87 21 2,47 1,13 21 2.26 2,08 71 2.19 2,27 22 2,44 1,73 22 1.69 1,59 72 3.24 3,02 23 2,46 1,04 23 1.81 2,04 73 6.89 6,99 24 3,44 1,04 24 2.28 2,05 74 5.01 4,90 25 1,90 0,91 25 2.81 2,83 75 2.02 2,03 26 1,69 0,97 26 1.83 2,09 76 4.77 4,51 27 2,27 0,99 27 4.24 3,71 77 1.35 1,43 28 3,31 1,35 28 3.29 3,04 78 1.49 1,87 29 4,82 1,83 29 3.19 2,57 79 2.93 2,66 30 5,06 2,18 30 1.47 1,39 80 1.40 1,38 31 6,00 5,66 31 2.87 2,29 81 2.59 2,34 32 13,48 14,08 32 2.37 2,66 82 2.14 2,42 33 15,34 16,35 33 1.78 1,33 83 3.00 2,56 34 15,10 16,11 34 2.09 1,96 84 3.88 3,06 35 15,33 16,43 35 2.73 2,10 85 2.19 2,36 36 15,02 16,04 36 2.66 2,32 37 15,17 16,30 37 2.61 2,23 38 15,25 16,41 38 2.23 2,07 39 2.27 2,07 40 3.31 2,89 41 10.63 10,11 42 3.69 3,04 43 3.20 2,85 44 7.67 6,08 45 1.99 2,28 46 1.78 2,41 47 5.19 5,35 48 2.92 2,58 49 3.43 2,70 50 3.96 2,69 Table 5.5. The outlyingness of each point of the Bush fire, the Wood gravity, the Coleman, and the Milk data. A, B, C: see Table 5.2. 29
5.2.6 Salinity data

The outlyingnesses of the Salinity data are roughly two times larger for the projection method than for the Kosinski method. As a consequence, the latter shows just 2 outliers (points 5 and 16), the former 8. Rousseeuw (1987) and Walczak agree that points 5, 16, 23 and 24 are outliers, with points 23 and 24 lying just above the cutoff. Fung finds the same points at first instance, but after applying his adding-back algorithm he concludes that point 16 is the only outlier. The projection method shows too many outliers, while the Kosinski method misses points 23 and 24.

5.2.7 HBK data

In the case of the HBK data the projection method and the Kosinski method agree completely. Both indicate points 1-14 to be outliers. This is also in agreement with the results of the original Kosinski method and of Egan, Hadi (1992, 1993), Rocke, Rousseeuw (1987, 1990), Fung and Walczak. It is remarked that some of these authors only find points 1-10 as outliers, but they use the "regression" definition of an outlier. The HBK data set is an artificial one, in which the good points lie along a regression plane. Points 1-10 are bad leverage points, i.e. they lie far away from the center of the good points and from the regression plane as well. Points 11-14 are good leverage points, i.e. although they lie far away from the bulk of the data they still lie close to the regression plane. If one considers the distance from the regression plane, points 11-14 are not outliers.

5.2.8 Factory data

The Factory data set is a new one.¹ It is given in Table 5.6. The outlyingnesses show a big discrepancy between the two methods. The projection outlyingnesses are much larger than the Kosinski ones, resulting in 18 versus 0 outliers. The outlyingnesses are so large due to the shape of the data: about half the data set is quite narrowly concentrated around the center of the data, while the other half forms a rather thick tail. Hence, in many projection directions the mad is very small, leading to large outlyingnesses for the points in the tail. It is remarked that the projection outliers compare well with the Kosinski outliers found with the cutoff for α=5% (see also section 3.3.6).

¹ The Factory data is a generated data set, originally used in an exercise on regression analysis in the CBS course "multivariate techniques with SPSS". It is interesting to note that the regression coefficients change radically if the points that are indicated to be outliers by the projection method and by the Kosinski method with the low cutoff are removed from the data set. In other words, the regression coefficients are mainly determined by the "outlying" points.
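The footnote's observation can be checked with an ordinary least-squares fit of the water consumption on the other variables, once with and once without the flagged months. The sketch below assumes the Factory data of Table 5.6 have been loaded as a NumPy array `factory` with columns x1-x5, and that `flagged` holds the indices of the points declared to be outliers; both names are hypothetical.

```python
import numpy as np

def ols_coefficients(X, y):
    """Ordinary least-squares coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def compare_fits(factory, flagged):
    """Fit x5 on x1..x4 with all 50 months and with the flagged months removed."""
    X, y = factory[:, :4], factory[:, 4]
    keep = np.setdiff1d(np.arange(len(y)), flagged)
    full = ols_coefficients(X, y)
    reduced = ols_coefficients(X[keep], y[keep])
    return full, reduced   # per the footnote, these should differ markedly
```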
  • 32. Robust multivariate outlier detection x1 x2 x3 x4 x5 x1 x2 x3 x4 x5 1 14.9 7.107 21 129 11.609 26 12.3 12.616 20 192 11.478 2 8.4 6.373 22 141 10.704 27 4.1 14.019 20 177 14.261 3 21.6 6.796 22 153 10.942 28 6.8 16.631 23 185 15.300 4 25.2 9.208 20 166 11.332 29 6.2 14.521 19 216 10.181 5 26.3 14.792 25 193 11.665 30 13.7 13.689 22 188 13.475 6 27.2 14.564 23 189 14.754 31 18 14.525 21 192 14.155 7 22.2 11.964 20 175 13.255 32 22.8 14.523 21 183 15.401 8 17.7 13.526 23 186 11.582 33 26.5 18.473 22 205 14.891 9 12.5 12.656 20 190 12.154 34 26.1 15.718 22 200 15.459 10 4.2 14.119 20 187 12.438 35 14.8 7.008 21 124 10.768 11 6.9 16.691 22 195 13.407 36 18.7 6.274 21 145 12.435 12 6.4 14.571 19 206 11.828 37 21.2 6.711 22 153 9.655 13 13.3 13.619 22 198 11.438 38 25.1 9.257 22 169 10.445 14 18.2 14.575 22 192 11.060 39 26.3 14.832 25 191 13.150 15 22.8 14.556 21 191 14.951 40 27.5 14.521 24 177 14.067 16 26.1 18.573 21 200 16.987 41 17.6 13.533 24 186 12.184 17 26.3 15.618 22 200 12.472 42 12.4 12.618 21 194 12.427 18 14.8 7.003 22 130 9.920 43 4.3 14.178 20 181 14.863 19 18.2 6.368 22 144 10.773 44 6 16.612 21 192 14.274 20 21.3 6.722 21 123 15.088 45 6.6 14.513 20 213 10.706 21 25 9.258 20 157 13.510 46 13.1 13.656 22 192 13.191 22 26.1 14.762 24 183 13.047 47 18.2 14.525 21 191 12.956 23 27.4 14.464 23 177 15.745 48 22.8 14.486 21 189 13.690 24 22.4 11.864 21 175 12.725 49 26.2 18.527 22 200 17.551 25 17.9 13.576 23 167 12.119 50 26.1 15.578 22 204 13.530 Table 5.6. The Factory data (n=50, p=5). The average temperature (x1, in degrees Celsius), the production (x2, in 1000 pieces), the number of working days (x3), the number of employees (x4) and the water consumption (x5, in 1000 liters) at a factory in 50 successive months. 5.2.9 Bushfire data The outliers found by the adjusted Kosinski method (points 7-11, 31-38) agree perfectly with those found by the original algorithm of Kosinski and with the results by Rocke and Maronna. The projection method shows as additional outliers points 6, 12, 13, 15, 29 and 30. Due to the large contamination the projected median is shifted strongly, leading to relatively large outlyingnesses for the good points and, consequently, many swamped points. 5.2.10 Wood gravity data Rousseeuw (1984), Hadi (1993), Atkinson, Rocke and Egan declare points 4, 6, 8 and 19 to be outliers. The Kosinski method finds these outliers too, but outlier 7 is additional. The projection method shows strange results. Fourteen points have an outlyingness above the cutoff, which is 70% of the data set. This is of course not realistic. The reason is again the sparsity of the data set. Hence, it is rather surprising that the Kosinski method and the methods by other authors perform relatively well in this case. 5.2.11 Coleman data The Coleman data contain 8 outliers according to the projection method, 7 according to the Kosinski method. However, they agree only upon 5 points (2, 6, 10, 11, 15). 31