Robustness under Independent Contamination
November 21, 2009
1 / 17
Definition of contamination
Why traditional robust estimates don’t work
2 / 17
The Problem (aka Disclaimer) and Terminology
Estimation of mean vector µ and covariance matrix Σ of
supposedly i.i.d. multivariate sample: x1 , . . . , xn ∈ Rp .
    ( x1 )   ( x11  x12  ...  x1p )
X = ( x2 ) = ( x21  x22  ...  x2p )
    ( .. )   (  ..   ..        .. )
    ( xn )   ( xn1  xn2  ...  xnp )
Vectors xi ∈ Rp – data cases
Values xij ∈ R – data values or cells
3 / 17
Types of error in Statistics
1. Usual statistical error.
Every observation is moderately affected:
Xobs = Xmean + e,  with e ~ N(0, σ²),
where the variance of e defines the quality of the data.
2. Gross errors (outliers).
Some observations are ruined:
Xgood , usually
Xhorrible , sometimes.
Typically comes on top of the usual error: Xgood = Xmean + e.
4 / 17
Mixture contamination model
Observed data come from the mixture distribution
F = (1 − ε)F0 (θ) + εH
F0 (θ) is the distribution of interest
H is an arbitrary unknown nuisance distribution.
X = (1 − B)Xgood + BXhorrible ,
where B is a Bernoulli(ε) indicator.
Estimate T (F ): feed data from F , obtain estimates for θ.
εBP(T) = sup { ε : supH ‖T(F(θ, ε, H))‖ < ∞ },
that is, the largest ε such that T can still isolate F0 from H.
Maximum achievable (and desirable)
εBP(T) ≤ 0.5.
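The breakdown bound can be illustrated numerically: the median tolerates any contamination fraction below 50%, while a single gross error already ruins the mean. A minimal sketch (the 1e9 contamination value is an arbitrary choice):

```python
import numpy as np

x = np.arange(100, dtype=float)       # clean sample: 0, 1, ..., 99
x_bad = x.copy()
x_bad[:40] = 1e9                      # contaminate 40% of the cases (< 50%)

print(np.mean(x_bad))                 # ruined: dominated by the outliers
print(np.median(x_bad))               # still a sensible value (89.5)
```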
5 / 17
Examples: simple robust estimates
Trimmed mean: (1 / (n(1 − 2δ))) Σ_{i = nδ+1}^{n(1−δ)} x(i) , with δ ∈ (0, 1/2)
MAD: Median_i |xi − Median_j xj|
IQR: x(3n/4) − x(n/4)
LMS: arg min_β Median_i (yi − βᵀxi)²
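Minimal numpy sketches of the univariate estimates above (LMS omitted; the function names are mine, and MAD is left unscaled, matching the slide's definition):

```python
import numpy as np

def trimmed_mean(x, delta=0.1):
    """delta-trimmed mean: drop the n*delta smallest and largest values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    k = int(np.floor(n * delta))
    return x[k:n - k].mean()

def mad(x):
    """Median absolute deviation: Median_i |x_i - Median_j x_j|."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def iqr(x):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, q3 = np.percentile(x, [25, 75])
    return q3 - q1

x = np.concatenate([np.arange(1.0, 20.0), [1000.0]])   # one gross outlier
print(trimmed_mean(x, 0.1), mad(x), iqr(x))
```

All three shrug off the single gross outlier, while the sample mean would be pulled above 60.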
6 / 17
Examples: multivariate robust estimates
Minimum Covariance Determinant (MCD) by Rousseeuw (1985):
minimize the determinant of the sample covariance over 50% of the data points.
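The MCD objective can be shown with an exact brute-force search, feasible only for tiny n; the function name is mine, and h = (n + p + 1) // 2 follows the usual "roughly half the data" choice:

```python
import numpy as np
from itertools import combinations

def mcd_brute_force(X, h=None):
    """Exact MCD for tiny n: among all size-h subsets, pick the one
    whose sample covariance has the smallest determinant."""
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h
    best_det, best = np.inf, None
    for idx in combinations(range(n), h):
        d = np.linalg.det(np.cov(X[list(idx)], rowvar=False))
        if d < best_det:
            best_det, best = d, idx
    sub = X[list(best)]
    return sub.mean(axis=0), np.cov(sub, rowvar=False)

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))
X[:2] += 20.0                       # two gross outliers
mu, Sigma = mcd_brute_force(X)      # outliers excluded from the best subset
```

Practical implementations (e.g. FAST-MCD) avoid the combinatorial search; this sketch only illustrates the definition.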
7 / 17
Many robust estimates can be represented as weighted versions of the classical ones:
μ̂ = Σ_{i=1}^n wi xi / Σ_{i=1}^n wi ,
Σ̂ = Σ_{i=1}^n wi (xi − μ̂)(xi − μ̂)ᵀ / Σ_{i=1}^n wi ,
with weights depending on the estimates themselves,
wi = w(MD(xi ; μ̂, Σ̂)),
where Mahalanobis distances are given by
MD(xi ; μ̂, Σ̂) = (xi − μ̂)ᵀ Σ̂⁻¹ (xi − μ̂).
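A minimal sketch of such an iteratively reweighted estimate, assuming simple hard-rejection weights (w = 1 inside a chi-square cutoff, 0 outside); the weight function and cutoff are illustrative choices, not those of any particular named estimator:

```python
import numpy as np

def weighted_location_scatter(X, c=9.21, n_iter=20):
    """Iteratively reweighted mean/covariance with hard-rejection weights
    w_i = 1{MD^2(x_i) <= c}; c = 9.21 is roughly the chi-square(2) 0.99
    quantile, an illustrative cutoff for p = 2."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        diff = X - mu
        md2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (md2 <= c).astype(float)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / w.sum()
    return mu, Sigma, w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
X[:3] = 50.0                        # three cases replaced by gross outliers
mu, Sigma, w = weighted_location_scatter(X)
```

After a few iterations the three outlying cases receive weight zero and the estimates are driven by the clean cases only.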
8 / 17
Contaminated cells, not cases
[Figure: traditional contamination (whole cases) vs. independent contamination (individual cells), ε = 10%]
9 / 17
Data entry errors, hardware malfunction, etc
Can express as
Xj = (1 − Bj)(Xgood)j + Bj(Xhorrible)j ,  for j = 1, . . . , p,
or, in matrix form, as
X = (I − B)Xgood + B Xhorrible ,
where B = diag(B1, . . . , Bp) is a diagonal matrix of Bernoulli r.v.'s.
B's dependence structure is important.
Will assume Independent Contamination: all Bj are independent, and independent of the X's.
Also: P[Bj = 1] = ε for simplicity.
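The model above can be simulated directly; using N(50, 1) for X_horrible is an arbitrary assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, eps = 1000, 20, 0.05

X_good = rng.normal(size=(n, p))                 # clean data
X_horrible = rng.normal(loc=50.0, size=(n, p))   # gross errors (assumed N(50,1))
B = rng.random(size=(n, p)) < eps                # independent Bernoulli(eps) per cell

X = np.where(B, X_horrible, X_good)              # cellwise: (1 - B) X_good + B X_horrible

clean_cases = (~B).all(axis=1).mean()
print(f"fraction of fully clean cases: {clean_cases:.3f}  (theory: {(1 - eps) ** p:.3f})")
```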
10 / 17
Number of clean cases
a case with even a single contaminated cell will appear as an outlier when diagnosed with MDs
P[case is fully clean] = (1 − ε)^p
e.g. with ε = 0.05 and p = 20, only (0.95)^20 ≈ 36% of cases are clean
waste of data
the fraction of contaminated cases exceeds the breakdown point of traditional robust estimates.
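The probability (1 − ε)^p shrinks quickly with dimension:

```python
eps = 0.05
for p in (5, 10, 20, 50):
    print(p, round((1 - eps) ** p, 3))   # fraction of fully clean cases
```

Already at p = 50, fewer than 10% of cases are entirely clean.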
11 / 17
Definition: an estimate is affine equivariant (A-E) if, for the transformed data set Y = A + XB (rowwise: yiᵀ = aᵀ + xiᵀB),
μ̂(Y) = a + Bᵀ μ̂(X),
Σ̂(Y) = Bᵀ Σ̂(X) B.
Desirable: if we know how the estimate behaves on X, we know how it behaves on Y, and vice versa; makes it easy to study, etc.
Most "respectable" robust estimates are A-E.
Alqallaf et al (2009) prove that reasonable A-E estimates cannot be robust against IC.
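For the classical sample mean and covariance, affine equivariance can be verified numerically (B and a are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
B = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])        # arbitrary nonsingular B
a = np.array([5.0, -1.0, 2.0])
Y = X @ B + a                          # rowwise: y_i^T = x_i^T B + a^T

# sample mean and covariance transform exactly as the definition requires
assert np.allclose(Y.mean(axis=0), X.mean(axis=0) @ B + a)
assert np.allclose(np.cov(Y, rowvar=False), B.T @ np.cov(X, rowvar=False) @ B)
```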
12 / 17
Affine Transformation of Contaminated Data
[Figure: original, contaminated, and transformed data, X → Y = XB]
13 / 17
P[a pair of cells is clean] = (1 − ε)² ≫ (1 − ε)^p
Estimate each element Σab , a, b = 1, . . . , p, separately from the corresponding pair of variables
Problem: the multivariate structure is damaged or destroyed
Particular problem: the result may not be positive-definite.
May or may not be a problem. Usually is.
Studied to some extent by Alqallaf (2003, PhD thesis)
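One concrete pairwise construction is the Gnanadesikan-Kettenring identity cov(a, b) = (S(a+b)² − S(a−b)²) / 4 with a robust scale S (here the normal-consistent MAD); this is an illustrative sketch, not necessarily the estimator studied by Alqallaf (2003). Nothing in the construction forces the assembled matrix to be positive semi-definite:

```python
import numpy as np

def mad(x):
    """MAD scaled by 1.4826 to be consistent at the normal."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def pairwise_cov(X):
    """Gnanadesikan-Kettenring pairwise covariance, one pair at a time:
    cov(a, b) = (S(a+b)^2 - S(a-b)^2) / 4 with S = MAD."""
    p = X.shape[1]
    S = np.empty((p, p))
    for a in range(p):
        for b in range(p):
            S[a, b] = (mad(X[:, a] + X[:, b]) ** 2
                       - mad(X[:, a] - X[:, b]) ** 2) / 4
    return S

rng = np.random.default_rng(3)
C = np.array([[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(3), C, size=300)
S = pairwise_cov(X)
print(np.linalg.eigvalsh(S))   # nothing guarantees these are nonnegative
```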
14 / 17
Some contaminated cells are obvious: univariate outliers
Some only show up with respect to other cells: structural outliers
Van Aelst et al (2009) use Stahel-Donoho projections
Little and Smith (1987) used partial Mahalanobis distances:
if MD(x; μ̂, Σ̂) is large,
consider MD(x−j ; μ̂−j , Σ̂−j) for all j = 1, . . . , p (j-th coordinate removed).
Mike explores the MD approach and iterative estimation of
covariances in his thesis.
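A sketch of the partial-MD diagnostic, assuming known μ and Σ for simplicity: deleting the contaminated coordinate produces the largest drop in squared MD.

```python
import numpy as np

def partial_md2(x, mu, Sigma):
    """Squared MD of x with coordinate j deleted, for each j
    (Little & Smith style diagnostic)."""
    p = len(x)
    out = np.empty(p)
    for j in range(p):
        keep = [k for k in range(p) if k != j]
        d = x[keep] - mu[keep]
        out[j] = d @ np.linalg.inv(Sigma[np.ix_(keep, keep)]) @ d
    return out

mu = np.zeros(3)
Sigma = np.eye(3)
x = np.array([0.5, 10.0, -0.2])         # cell 1 is contaminated
full = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
drops = full - partial_md2(x, mu, Sigma)
print(int(np.argmax(drops)))            # -> flags coordinate 1
```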
15 / 17
Weighted estimate with cell weights
Van Aelst et al (2009) proposed a weighted estimate, but it is
pairwise and not guaranteed SPD
Mike knows how to deal with zero weights: remove those values
and treat them as MCAR. Then do MLE via EM, for example.
A proper cell-weighted estimate is still to be developed.
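A minimal EM sketch for the MCAR route just mentioned, assuming multivariate normal data with the zero-weight cells marked as np.nan; the E-step uses conditional-mean imputation plus the conditional-covariance correction:

```python
import numpy as np

def em_mvn(X, n_iter=100):
    """MLE of (mu, Sigma) for multivariate normal data with np.nan
    cells treated as MCAR, via EM."""
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    Sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        Xhat = np.where(miss, mu, X)
        C = np.zeros((p, p))            # accumulated conditional covariances
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            if m.all():                 # nothing observed in this case
                Xhat[i] = mu
                C += Sigma
                continue
            o = ~m
            Soo_inv = np.linalg.inv(Sigma[np.ix_(o, o)])
            Smo = Sigma[np.ix_(m, o)]
            # E[x_m | x_o] and Cov[x_m | x_o]
            Xhat[i, m] = mu[m] + Smo @ Soo_inv @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += Sigma[np.ix_(m, m)] - Smo @ Soo_inv @ Smo.T
        mu = Xhat.mean(axis=0)
        d = Xhat - mu
        Sigma = (d.T @ d + C) / n
    return mu, Sigma

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=500)
X[rng.random(X.shape) < 0.05] = np.nan   # knock out ~5% of cells
mu, S = em_mvn(X)
```

With roughly 5% of cells missing, the recovered mean and covariance stay close to the true values; a cell-weighted estimate would feed downweighted cells into this machinery instead of hard-deleting them.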
16 / 17