# Robustness under Independent Contamination Model

Joint UBC/SFU student seminar presentation by Mike Danilov, PhD student at UBC

### Transcript

• 1. Robustness under Independent Contamination. Mike Danilov, November 21, 2009.
• 2. Traditional robustness: definition of contamination; simple examples; weighted representation. Independent contamination: the idea; why traditional robust estimates don't work; naive approaches; cell-weighting approach.
• 3. The Problem (aka Disclaimer) and Terminology. Estimation of the mean vector $\mu$ and covariance matrix $\Sigma$ of a supposedly i.i.d. multivariate sample $x_1, \dots, x_n \in \mathbb{R}^p$. Data matrix:

  $$X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{np} \end{pmatrix}$$

  Vectors $x_i \in \mathbb{R}^p$ are data cases. Values $x_{ij} \in \mathbb{R}$ are data values, or cells.
• 4. Types of error in Statistics.
  1. Usual statistical error. Every observation is moderately affected: $X_{\text{obs}} = X_{\text{mean}} + e$, with $e \sim N(0, \sigma^2)$, where the variance of $e$ defines the quality of the data.
  2. Contamination. Some observations are ruined:

  $$X_{\text{obs}} = \begin{cases} X_{\text{good}}, & \text{usually} \\ X_{\text{horrible}}, & \text{sometimes.} \end{cases}$$

  Contamination typically comes on top of the usual error: $X_{\text{good}} = X_{\text{mean}} + e$.
• 5. Mixture contamination model. Observed data come from the mixture distribution

  $$F = (1 - \varepsilon) F_0(\theta) + \varepsilon H,$$

  where $F_0(\theta)$ is the distribution of interest and $H$ is an arbitrary unknown nuisance distribution. Equivalently, $X = (1 - B) X_{\text{good}} + B X_{\text{horrible}}$, where $B$ is a Bernoulli($\varepsilon$) indicator. Estimate $T(F)$: feed data from $F$, obtain estimates for $\theta$. Breakdown point:

  $$\varepsilon_{BP}(T) = \sup \{ \varepsilon : \sup_{H} T(F(\theta, \varepsilon, H)) < \infty \},$$

  that is, the maximum $\varepsilon$ such that $T$ can still isolate $F_0$ from $H$. The maximum achievable (and desirable) breakdown point is 0.5: $\varepsilon_{BP}(T) \le 0.5$.
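The mixture model on this slide can be simulated in a few lines. The sketch below is illustrative only: it assumes $F_0 = N(0, 1)$ and takes $H$ to be a point mass at 100, and the function name `sample_mixture` is hypothetical (none of these choices come from the slides). It shows the sample mean being dragged by the contamination while the median stays near the center of $F_0$.

```python
import random
import statistics

def sample_mixture(n, eps, rng):
    """Draw n values from F = (1 - eps) * F0 + eps * H.

    Assumptions for illustration only: F0 = N(0, 1), H = point mass at 100.
    """
    sample = []
    for _ in range(n):
        b = rng.random() < eps          # B ~ Bernoulli(eps)
        x_good = rng.gauss(0.0, 1.0)    # draw from F0
        x_horrible = 100.0              # draw from H
        sample.append(x_horrible if b else x_good)
    return sample

rng = random.Random(0)
x = sample_mixture(2000, 0.1, rng)
mean_x = statistics.fmean(x)        # pulled toward the contamination
median_x = statistics.median(x)     # stays close to the center of F0
print(f"mean = {mean_x:.2f}, median = {median_x:.2f}")
```

With 10% contamination the mean lands far from 0, while the median moves only slightly.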
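• 6. Examples: simple robust estimates.
  Location. Median: $x_{(n/2)}$. Trimmed mean: $\dfrac{1}{n(1 - \delta)} \sum_{i = n\delta/2}^{n(1 - \delta/2)} x_{(i)}$, with $\delta \in (0, 1)$.
  Scale. MAD: $\operatorname{Median}_i \, |x_i - \operatorname{Median}_j \, x_j|$. IQR: $x_{(3n/4)} - x_{(n/4)}$.
  Regression. LMS: $\arg\min_{\beta} \operatorname{Median}_i \, (y_i - \beta^\top x_i)^2$.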
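The location and scale estimates on this slide can be sketched with the Python standard library. The helper names (`trimmed_mean`, `mad`, `iqr`) and the toy data are assumptions for illustration; the trimming rule drops a fraction $\delta/2$ of the sorted points from each tail, and the quartiles use the default interpolation of `statistics.quantiles`, which may differ slightly from the order-statistic notation on the slide.

```python
import statistics

def trimmed_mean(values, delta):
    """Mean after dropping a fraction delta/2 of the points from each tail."""
    xs = sorted(values)
    n = len(xs)
    k = int(n * delta / 2)              # number of points trimmed from each end
    return statistics.fmean(xs[k:n - k])

def mad(values):
    """Median absolute deviation from the median (no consistency factor)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Toy sample with one gross outlier (illustrative values).
data = [2.1, 1.9, 2.0, 2.3, 1.8, 2.2, 2.0, 50.0]

print("mean        ", statistics.fmean(data))
print("median      ", statistics.median(data))
print("trimmed mean", trimmed_mean(data, 0.25))
print("MAD         ", mad(data))
print("IQR         ", iqr(data))
```

The single value 50.0 moves the mean to about 8, while the median, trimmed mean, MAD and IQR all stay on the scale of the remaining points.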
• 7. Examples: multivariate robust estimates. Minimum Covariance Determinant (MCD) by Rousseeuw (1985): minimize the determinant of the sample covariance of 50% of the data points. [Figure: comparison of the Sample Covariance, MCD, and Clean estimates.]
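• 8. Weighted representation. Many robust estimates can be represented as weighted versions of familiar estimates:

  $$\hat{\mu} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \hat{\Sigma} = \frac{\sum_{i=1}^{n} w_i (x_i - \hat{\mu})(x_i - \hat{\mu})^\top}{\sum_{i=1}^{n} w_i},$$

  with weights depending on the estimates themselves, $w_i = w(\mathrm{MD}(x_i; \hat{\mu}, \hat{\Sigma}))$, where the Mahalanobis distances are given by

  $$\mathrm{MD}(x_i; \hat{\mu}, \hat{\Sigma}) = (x_i - \hat{\mu})^\top \hat{\Sigma}^{-1} (x_i - \hat{\mu}).$$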
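Because the weights depend on the estimates and the estimates depend on the weights, one natural way to use this representation is to iterate it. The sketch below is one possible instance, not a method from the slides: it assumes a hard-rejection weight function ($w = 1$ if the distance is at most a cutoff, else 0), starts from the coordinatewise median and the sample covariance, and uses a cutoff of 7.38 (roughly the 97.5% quantile of a chi-square distribution with 2 degrees of freedom). The function name `reweighted_estimate` and the toy data are hypothetical.

```python
import numpy as np

def reweighted_estimate(X, cutoff, n_iter=20):
    """Iterate the weighted mean/covariance with hard 0/1 weights.

    Assumption (not from the slides): w(d) = 1 if d <= cutoff else 0,
    where d = (x - mu)' Sigma^{-1} (x - mu).
    """
    mu = np.median(X, axis=0)               # starting location
    sigma = np.cov(X, rowvar=False)         # starting scatter
    for _ in range(n_iter):
        diff = X - mu
        # Mahalanobis distance of every row under the current estimates.
        md = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)
        w = (md <= cutoff).astype(float)
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        sigma = (w[:, None] * diff).T @ diff / w.sum()
    return mu, sigma, w

# Toy data: 200 clean bivariate normal cases plus 20 outlying cases at (10, 10).
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 2))
outliers = np.full((20, 2), 10.0)
X = np.vstack([clean, outliers])

mu, sigma, w = reweighted_estimate(X, cutoff=7.38)
print("sample mean    ", X.mean(axis=0))
print("reweighted mean", mu)
print("weight given to outliers", w[-20:].sum())
```

Hard rejection without a consistency correction shrinks the covariance somewhat, so this is a demonstration of the weighting idea rather than a calibrated estimator.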
• 9. Contaminated cells, not cases. [Figure: two panels, Traditional Contamination and Independent Contamination, with ε = 10%.]
• 10. Generalized Contamination. Data entry errors, hardware malfunction, etc. Can express as

  $$X_j = (1 - B_j)(X_{\text{Good}})_j + B_j (X_{\text{Horrible}})_j, \quad j = 1, \dots, p,$$

  or, in matrix form, as $X = (1 - B) X^{\text{Good}} + B X^{\text{Horrible}}$, where $B$ is a vector of Bernoulli r.v.'s. The dependence structure of $B$ is important. Will assume Independent Contamination: all $B_j$ are independent of each other and of the $X$'s. Also, $P[B_j = 1] = \varepsilon$ for simplicity.
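The independent contamination model can be simulated cell by cell. In the sketch below the clean data are standard normal and every "horrible" value is 100; both choices, and the function name `contaminate_cells`, are assumptions made only for illustration.

```python
import numpy as np

def contaminate_cells(X_good, X_horrible, eps, rng):
    """Apply X = (1 - B) * X_good + B * X_horrible elementwise,
    with an independent B ~ Bernoulli(eps) drawn for every cell."""
    B = rng.random(X_good.shape) < eps
    return np.where(B, X_horrible, X_good), B

rng = np.random.default_rng(1)
n, p, eps = 1000, 20, 0.05
X_good = rng.standard_normal((n, p))
X_horrible = np.full((n, p), 100.0)     # illustrative "horrible" values
X, B = contaminate_cells(X_good, X_horrible, eps, rng)

print("fraction of contaminated cells:", B.mean())
```

Returning the indicator matrix `B` alongside the data makes it easy to check afterwards which cells and which cases were affected.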
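• 11. Number of clean cases. A case with any contaminated cell will appear as an outlier if diagnosed with MDs. $P[\text{case is clean}] = (1 - \varepsilon)^p$; e.g. with $\varepsilon = 0.05$ and $p = 20$, only about 36% of cases are clean. Discarding the rest is a waste of data, and the fraction of contaminated cases exceeds the breakdown point of traditional robust estimates.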
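The formula $(1 - \varepsilon)^p$ is easy to evaluate for any $\varepsilon$ and $p$. The sketch below also computes the largest dimension at which at least half of the cases are still expected to be clean, which is where a case-wise breakdown point of 0.5 stops being enough. The function names are hypothetical.

```python
import math

def prob_clean_case(eps, p):
    """P[case is clean] = (1 - eps)^p under independent contamination."""
    return (1.0 - eps) ** p

def max_dim_with_majority_clean(eps):
    """Largest p for which (1 - eps)^p >= 0.5."""
    return math.floor(math.log(0.5) / math.log(1.0 - eps))

for p in (2, 5, 10, 20, 50):
    print(f"p = {p:2d}: P[clean] = {prob_clean_case(0.05, p):.3f}")

print("largest p with a clean majority at eps = 0.05:",
      max_dim_with_majority_clean(0.05))
```

At $\varepsilon = 0.05$ the clean fraction falls below one half once $p$ exceeds 13.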
• 12. Affine-equivariance. Definition: if the data set is transformed as $Y = A + XB$, then

  $$\hat{\mu}(Y) = A + B^\top \hat{\mu}(X), \qquad \hat{\Sigma}(Y) = B^\top \hat{\Sigma}(X) B.$$

  Desirable: easy to study, since if we know how the estimate behaves on $X$, then we know how it behaves on $Y$, and vice versa. Most "respectable" robust estimates are affine-equivariant (A-E). Alqallaf et al (2009) have a proof that reasonable A-E estimates cannot be robust against independent contamination (IC).
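The definition can be checked numerically. The sketch below verifies that the sample mean and sample covariance satisfy both identities, and that the coordinatewise median (a simple estimate that is not affine-equivariant) does not. The data, shift `A` and matrix `B` are illustrative values, with rows of `X` as cases.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.exponential(size=(2000, 3))          # skewed data, rows are cases
A = np.array([1.0, -2.0, 0.5])               # shift (illustrative values)
B = np.array([[1.0, 1.0, 0.0],
              [1.0, -1.0, 0.0],
              [0.0, 0.0, 2.0]])              # nonsingular matrix (illustrative)
Y = A + X @ B

# Sample mean and covariance: both identities hold up to rounding.
mean_ok = np.allclose(Y.mean(axis=0), A + B.T @ X.mean(axis=0))
cov_ok = np.allclose(np.cov(Y, rowvar=False),
                     B.T @ np.cov(X, rowvar=False) @ B)

# Coordinatewise median: the location identity fails for skewed data.
median_ok = np.allclose(np.median(Y, axis=0),
                        A + B.T @ np.median(X, axis=0))

print(mean_ok, cov_ok, median_ok)
```

The median of a sum of skewed variables is not the sum of their medians, which is why the last check fails once `B` mixes coordinates.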
• 13. Affine Transformation of Contaminated Data. [Figure: three panels, Original, Contaminated, and Transformed, with the transformation X → Y = XB.]
• 14. Pairwise approach. $P[\text{pair of variables is clean}] = (1 - \varepsilon)^2 \gg (1 - \varepsilon)^p$. Estimate all elements $\hat{\Sigma}_{ab}$, for $a, b = 1, \dots, p$, separately. Problem: the multivariate structure is damaged or destroyed. Particular problem: the resulting matrix may not be positive-definite. This may or may not be a problem; usually it is. Studied to some extent by Alqallaf (2003, PhD thesis).
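A tiny constructed example shows how entry-by-entry estimation can lose positive-definiteness. The sketch assumes that flagged cells are simply dropped (coded as NaN) and that each $\hat{\Sigma}_{ab}$ is the ordinary sample covariance over the rows where both cells are available; this stands in for whatever robust bivariate estimate would be used in practice. The function name `pairwise_cov` and the data are hypothetical.

```python
import numpy as np

def pairwise_cov(X):
    """Assemble a covariance matrix one entry at a time, using for each
    pair (a, b) only the rows where both cells are available (not NaN)."""
    p = X.shape[1]
    S = np.empty((p, p))
    for a in range(p):
        for b in range(a, p):
            ok = ~np.isnan(X[:, a]) & ~np.isnan(X[:, b])
            xa, xb = X[ok, a], X[ok, b]
            cov_ab = np.sum((xa - xa.mean()) * (xb - xb.mean())) / (ok.sum() - 1)
            S[a, b] = S[b, a] = cov_ab
    return S

nan = np.nan
# Each pair of variables is observed on a different block of rows, and the
# three blocks tell incompatible stories about the correlations.
X = np.array([
    [1, 1, nan], [2, 2, nan], [3, 3, nan],   # variables 1, 2 move together
    [nan, 1, 1], [nan, 2, 2], [nan, 3, 3],   # variables 2, 3 move together
    [1, nan, 3], [2, nan, 2], [3, nan, 1],   # variables 1, 3 move oppositely
], dtype=float)

S = pairwise_cov(X)
min_eig = np.linalg.eigvalsh(S).min()
print(S)
print("smallest eigenvalue:", min_eig)
```

Every entry is a sensible estimate on its own, yet the assembled matrix has a negative eigenvalue, so it is not a valid covariance matrix.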
• 15. Detecting cells. Some are obvious: univariate outliers. Some only show up with respect to other cells: structural outliers. Van Aelst et al (2009) use Stahel-Donoho projections. Little and Smith (1987) used partial Mahalanobis distances: if $\mathrm{MD}(x; \hat{\mu}, \hat{\Sigma})$ is large, consider $\mathrm{MD}(x_{-j}; \hat{\mu}, \hat{\Sigma})$ for all $j = 1, \dots, p$. Mike explores the MD approach and iterative estimation of covariances in his thesis.
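The partial Mahalanobis distance idea can be sketched as follows: for a case with a large distance, recompute the distance with each coordinate left out in turn. The rule used here for picking the suspect cell (the $j$ whose removal gives the smallest remaining distance) is an assumption of this sketch, as are the function names and the example values of $\hat{\mu}$ and $\hat{\Sigma}$.

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """MD(x; mu, Sigma) = (x - mu)' Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(sigma, d))

def partial_mds(x, mu, sigma):
    """MD(x_{-j}; mu, Sigma) for every j: the distance with coordinate j removed."""
    p = len(x)
    out = np.empty(p)
    for j in range(p):
        keep = np.arange(p) != j
        out[j] = mahalanobis(x[keep], mu[keep], sigma[np.ix_(keep, keep)])
    return out

# Three strongly correlated variables (illustrative estimates).
mu = np.zeros(3)
sigma = np.array([[1.0, 0.9, 0.9],
                  [0.9, 1.0, 0.9],
                  [0.9, 0.9, 1.0]])

# No coordinate is extreme on its own, but the third breaks the correlation
# pattern: a structural outlier.
x = np.array([1.0, 1.0, -1.0])

full = mahalanobis(x, mu, sigma)
partial = partial_mds(x, mu, sigma)
suspect = int(np.argmin(partial))
print("full MD:", full)
print("partial MDs:", partial)
print("suspect cell index:", suspect)
```

Dropping the third coordinate collapses the distance, while dropping either of the others leaves it large, which singles out the third cell.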
• 16. Weighted estimate with cell weights. Van Aelst et al (2009) proposed a weighted estimate, but it is pairwise and not symmetric positive-definite (SPD). Mike knows how to deal with zero weights: remove the values and treat them as missing completely at random (MCAR), then do MLE via EM, for example. A proper cell-weighted estimate is still to be developed.
• 17. The End