ROBUST REGRESSION METHOD
By,
SUMON JOSE
A Seminar Presentation
Under the Guidance of Dr. Jessy John
February 24, 2015
SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 1 / 69
CONTENTS
1 INTRODUCTION
2 REVIEW
3 ROBUSTNESS & RESISTANCE
4 APPROACH
5 STRENGTHS & WEAKNESSES
6 M-ESTIMATORS
7 DELIVERY TIME PROBLEM
8 ANALYSIS
9 PROPERTIES
10 SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
11 REFERENCE
INTRODUCTION
Performance Evaluation - Geethu Anna Jose
REVIEW
The classical linear regression model relates the dependent or response variables y_i to the independent explanatory variables x_i1, x_i2, ..., x_ip for i = 1, ..., n, such that

y_i = x_i^T β + ε_i    (1)

where x_i^T = (x_i1, x_i2, ..., x_ip), ε_i denotes the error term and β = (β_1, β_2, ..., β_p)^T.
REVIEW
The expected value of y_i, called the fitted value, is

ŷ_i = x_i^T β̂    (2)

and one can use this to calculate the residual for the i-th case,

r_i = y_i − ŷ_i    (3)

In the case of the simple linear regression model we may calculate the values of β̂_0 and β̂_1 using the following formulae:
REVIEW
β̂_1 = [ Σ_{i=1}^n y_i x_i − (Σ_{i=1}^n y_i)(Σ_{i=1}^n x_i)/n ] / [ Σ_{i=1}^n x_i² − (Σ_{i=1}^n x_i)²/n ]    (4)

β̂_0 = ȳ − β̂_1 x̄    (5)

The vector of fitted values ŷ corresponding to the observed values y may be expressed as follows:

ŷ = X β̂    (6)
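For concreteness, equations (4)-(6) can be sketched in a few lines (an illustrative numpy sketch with made-up data, not taken from the slides):

```python
import numpy as np

# Simple linear regression via equations (4) and (5), on invented data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Equation (4): slope from the raw sums.
beta1 = (np.sum(y * x) - np.sum(y) * np.sum(x) / n) / \
        (np.sum(x**2) - np.sum(x)**2 / n)
# Equation (5): intercept from the sample means.
beta0 = y.mean() - beta1 * x.mean()

# Fitted values and residuals, equations (2), (3) and (6).
X = np.column_stack([np.ones(n), x])
y_hat = X @ np.array([beta0, beta1])
residuals = y - y_hat
```

With an intercept in the model, the residuals of a least squares fit sum to zero, which is a quick sanity check on the computation.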
REVIEW
Limitations of the Least Squares Estimator
Extremely sensitive to deviations from the model assumptions (a normal distribution is assumed for the errors).
Drastically changed by the effect of outliers.
REVIEW
What About Deleting Outliers Before Analysis?
Not all outliers are erroneous data; they could be exceptional occurrences.
Some such outliers could be the result of factors not considered in the current study.
So in general, unusual observations are not always bad observations. Moreover, in large data sets it is often very difficult to spot the outlying data.
ROBUSTNESS AND RESISTANCE
Resistant Regression Estimators
Definition
Resistant regression estimators are primarily concerned with robustness of validity, meaning that their main concern is to prevent unusual observations from affecting the estimates produced.
ROBUSTNESS AND RESISTANCE
Robust Regression Estimators
Definition
They are concerned with both robustness of efficiency and
robustness of validity, meaning that they should also
maintain a small sampling variance, even when the data
does not fit the assumed distribution.
ROBUSTNESS AND RESISTANCE
⇒ In general, robust regression estimators aim to fit a model that describes the majority of a sample.
⇒ Their robustness is achieved by giving the data different weights,
⇒ whereas in least squares estimation all data are treated equally.
APPROACH
Robust estimation methods are powerful tools in the detection of outliers in complicated data sets.
But unless the data are very well behaved, different estimators will give different estimates.
On their own, they do not provide a final model.
A healthy approach is to employ both robust regression methods and the least squares method and compare the results.
STRENGTHS & WEAKNESSES
Finite Sample Breakdown Point
Definition
The breakdown point (BDP) is a measure of the resistance of an estimator. The BDP of a regression estimator is the smallest fraction of contamination that can cause the estimator to 'break down' and no longer represent the trend of the data.
STRENGTHS & WEAKNESSES
When an estimator breaks down, the estimate it produces from the contaminated data can be arbitrarily far from the estimate it would give when the data were uncontaminated.
STRENGTHS & WEAKNESSES
In order to describe the BDP mathematically, define T as a regression estimator, Z as a sample of n data points, and T(Z) = β̂. Let Z′ be the corrupted sample in which m of the original data points are replaced with arbitrary values. The maximum effect that could be caused by such contamination is

effect(m; T, Z) = sup_{Z′} |T(Z′) − T(Z)|    (7)
STRENGTHS & WEAKNESSES
When (7) is infinite, an outlier can have an arbitrarily large effect on T. The BDP of T at the sample Z is therefore defined as

BDP(T, Z) = min{ m/n : effect(m; T, Z) is infinite }    (8)
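A toy illustration of (7)-(8) for location estimators (not from the slides): corrupting a single observation (m = 1) can move the sample mean arbitrarily far, while the sample median, whose BDP is 50%, barely moves.

```python
import numpy as np

z = np.array([9.8, 10.1, 10.0, 9.9, 10.2])
z_bad = z.copy()
z_bad[0] = 1e6               # replace m = 1 observation with an arbitrary value

# The mean's effect(1; T, Z) grows without bound with the corruption...
mean_shift = abs(z_bad.mean() - z.mean())
# ...while the median moves only within the range of the clean data.
median_shift = abs(np.median(z_bad) - np.median(z))
```

Making the corrupted value larger makes mean_shift as large as we like, so the mean's BDP is 1/n; the median's shift stays bounded until more than half the sample is corrupted.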
STRENGTHS & WEAKNESSES
The least squares estimator, for example, has a breakdown point of 1/n, because just one leverage point can cause it to break down. As the number of data points increases, the breakdown point tends to 0, and so the least squares estimator is said to have a BDP of 0%.
STRENGTHS & WEAKNESSES
Remark
The highest breakdown point one can hope for is 50%, since if more than half the data are contaminated one cannot differentiate between 'good' and 'bad' data.
STRENGTHS & WEAKNESSES
Relative Efficiency of an Estimator
Definition
The efficiency of an estimator for a particular parameter is
defined as the ratio of its minimum possible variance to
its actual variance. Strictly, an estimator is considered
’efficient’ when this ratio is one.
STRENGTHS & WEAKNESSES
High efficiency is crucial for an estimator if the intention is to use an estimate from sample data to make inferences about the larger population from which the sample was drawn.
STRENGTHS & WEAKNESSES
Relative Efficiency
Relative efficiency compares the efficiency of an estimator to that of a well-known method.
In the context of regression, estimators are compared to the least squares estimator, which is the most efficient estimator when the errors are normally distributed.
STRENGTHS & WEAKNESSES
Given two estimators T1 and T2 for a population parameter β, where T1 is the most efficient estimator possible and T2 is less efficient, the relative efficiency of T2 is calculated as the ratio of the mean squared error of T1 to the mean squared error of T2:

Efficiency(T1, T2) = E[(T1 − β)²] / E[(T2 − β)²]    (9)
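Equation (9) can be illustrated with a small simulation (an illustrative sketch, not from the slides): under normal errors the sample median is a less efficient location estimator than the sample mean, with asymptotic relative efficiency 2/π ≈ 0.64.

```python
import numpy as np

# T1 = sample mean (the efficient estimator under normality),
# T2 = sample median; the true parameter beta is 0.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=(20000, 51))

mean_est = samples.mean(axis=1)
median_est = np.median(samples, axis=1)

# Relative efficiency of T2 from equation (9): E[(T1 - 0)^2] / E[(T2 - 0)^2]
rel_eff = np.mean(mean_est**2) / np.mean(median_est**2)
```

rel_eff comes out near 2/π, in line with the asymptotic efficiency of the median relative to the mean under normality.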
M-ESTIMATORS
Introduction
1 First proposed by Huber (1973).
2 But the early ones had weaknesses in terms of one or more of the desired properties.
3 From them the modern methods developed.
M-ESTIMATORS
Maximum Likelihood Type Estimators
M-estimation is based on the idea that while we still want a maximum likelihood estimator, the errors might be better represented by a different, heavier-tailed distribution.
M-ESTIMATORS
If the probability density function of the errors is f(ε_i), then the maximum likelihood estimator for β is that which maximizes the likelihood function

∏_{i=1}^n f(ε_i) = ∏_{i=1}^n f(y_i − x_i^T β)    (10)
M-ESTIMATORS
This means it also maximizes the log-likelihood function

Σ_{i=1}^n ln f(ε_i) = Σ_{i=1}^n ln f(y_i − x_i^T β)    (11)

When the errors are normally distributed, it has been shown that this leads to minimising the sum of squared residuals, which is the ordinary least squares method.
M-ESTIMATORS
Assuming the errors are differently distributed leads to the maximum likelihood estimator minimising a different function. Using this idea, an M-estimator β̂ minimizes

Σ_{i=1}^n ρ(ε_i) = Σ_{i=1}^n ρ(y_i − x_i^T β)    (12)

where ρ(u) is a continuous, symmetric function called the objective function, with a unique minimum at 0.
M-ESTIMATORS
1 Knowing the appropriate ρ(u) to use requires knowledge of how the errors are really distributed.
2 Functions are usually chosen through consideration of how the resulting estimator down-weights the larger residuals.
3 A robust M-estimator achieves this by minimizing the sum of a less rapidly increasing objective function than the ρ(u) = u² of least squares.
M-ESTIMATORS
Constructing a Scale Equivariant Estimator
The M-estimator is not necessarily scale invariant, i.e. if the errors y_i − x_i^T β were multiplied by a constant, the new solution to the above equation might not be the same as the scaled version of the old one.
M-ESTIMATORS
To obtain a scale invariant version of this estimator we usually solve

Σ_{i=1}^n ρ(ε_i / s) = Σ_{i=1}^n ρ((y_i − x_i^T β) / s)    (13)

where s is a robust estimate of scale.
M-ESTIMATORS
A popular choice for s is the re-scaled median absolute deviation

s = 1.4826 × MAD    (14)

where MAD is the median absolute deviation

MAD = median|y_i − x_i^T β̂| = median|ε̂_i|    (15)
M-ESTIMATORS
s is highly resistant to outlying observations, with a BDP of 50%, as it is based on the median rather than the mean. The MAD is rescaled by the factor 1.4826 so that, when the sample is large and the ε_i are really distributed as N(0, σ²), s estimates the standard deviation σ.
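A minimal sketch of (14)-(15) (an assumed numpy implementation; the MAD here is taken about zero, following equation (15)):

```python
import numpy as np

def robust_scale(residuals):
    # Equation (15): MAD of the residuals (about zero, as in the slides),
    # then the 1.4826 rescaling of equation (14).
    mad = np.median(np.abs(residuals))
    return 1.4826 * mad

rng = np.random.default_rng(1)
eps = rng.normal(0.0, 2.0, size=100000)
eps[:100] = 1e3              # 0.1% gross outliers barely affect s
s = robust_scale(eps)        # close to the true sigma = 2.0
```

Even with gross contamination, s stays near the true σ, whereas the sample standard deviation of the same contaminated residuals would be enormous.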
M-ESTIMATORS
With a large sample and ε_i ∼ N(0, σ²):

P(|ε_i| < MAD) ≈ 0.5
⇒ P(|ε_i − 0|/σ < MAD/σ) ≈ 0.5
⇒ P(|Z| < MAD/σ) ≈ 0.5
⇒ MAD/σ ≈ Φ⁻¹(0.75)
M-ESTIMATORS
⇒ MAD/Φ⁻¹(0.75) ≈ σ
⇒ 1.4826 × MAD ≈ σ

Thus the tuning constant 1.4826 ≈ 1/Φ⁻¹(0.75) makes s an approximately unbiased estimator of σ if n is large and the error distribution is normal.
M-ESTIMATORS
Finding an M-Estimator
To obtain an M-estimate we solve

min_β Σ_{i=1}^n ρ(ε_i / s) = min_β Σ_{i=1}^n ρ((y_i − x_i^T β) / s)    (16)

For that we equate the first partial derivatives of ρ with respect to β_j (j = 0, 1, ..., k) to zero, yielding a necessary condition for a minimum.
M-ESTIMATORS
This gives a system of p = k + 1 equations

Σ_{i=1}^n x_ij ψ((y_i − x_i^T β)/s) = 0,  j = 0, 1, ..., k    (17)

where ψ = ρ′, x_ij is the i-th observation on the j-th regressor, and x_i0 = 1. In general ψ is a non-linear function, so equation (17) must be solved iteratively. The most widely used method is iteratively reweighted least squares.
M-ESTIMATORS
To use iteratively reweighted least squares, suppose that an initial estimate β̂_0 is available and that s is an estimate of scale. Then we write the p = k + 1 equations as

Σ_{i=1}^n x_ij ψ((y_i − x_i^T β)/s) = Σ_{i=1}^n x_ij { ψ[(y_i − x_i^T β)/s] / [(y_i − x_i^T β)/s] } (y_i − x_i^T β)/s = 0    (18)
M-ESTIMATORS
or, equivalently, as

Σ_{i=1}^n x_ij W_i^0 (y_i − x_i^T β) = 0,  j = 0, 1, 2, ..., k    (19)

where

W_i^0 = { ψ[(y_i − x_i^T β̂_0)/s] / [(y_i − x_i^T β̂_0)/s],  if y_i ≠ x_i^T β̂_0
        { 1,                                                 if y_i = x_i^T β̂_0    (20)
M-ESTIMATORS
We may write the above equations in matrix form as follows:

X^T W^0 X β = X^T W^0 y    (21)

where W^0 is an n × n diagonal matrix of weights whose diagonal elements W_i^0 are given by expression (20).
M-ESTIMATORS
From the matrix form we see that this is the same as the usual weighted least squares normal equations. Consequently the one-step estimator is

β̂_1 = (X^T W^0 X)⁻¹ X^T W^0 y    (23)

At the next step we recompute the weights from the equation for W_i, but using β̂_1 instead of β̂_0.
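The full iteration (16)-(23) can be sketched as follows (a minimal illustration with Huber's ψ on invented data, not the delivery time data; the iteration count and scale floor are arbitrary choices):

```python
import numpy as np

def huber_psi(u, t=2.0):
    # psi(u) = u for |u| <= t, t*sign(u) otherwise (Huber's t-function)
    return np.clip(u, -t, t)

def irls_m_estimate(X, y, t=2.0, n_iter=50):
    # beta_0: start from the ordinary least squares fit
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = max(1.4826 * np.median(np.abs(r)), 1e-8)  # robust scale, eqs (14)-(15)
        u = r / s
        w = np.ones_like(u)                           # weights, eq. (20)
        nz = u != 0
        w[nz] = huber_psi(u[nz], t) / u[nz]
        W = np.diag(w)
        beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # one step, eq. (23)
    return beta

# A line y = 1 + 2x with noise and one gross outlier.
rng = np.random.default_rng(2)
x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=10)
y[9] += 80.0                      # contaminate one observation
X = np.column_stack([np.ones_like(x), x])
beta_robust = irls_m_estimate(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

The robust fit stays near (1, 2), while the ordinary least squares slope is dragged upward by the single outlier.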
M-ESTIMATORS
NOTE:
Usually only a few iterations are required to obtain convergence.
It can easily be implemented in a computer programme.
M-ESTIMATORS
Re-Descending Estimators
Re- descending M estimators are those which have
influence functions that are non decreasing near the origin
but decreasing towards zero far from the origin.
Their ψ can be chosen to redescend smoothly to zero, so
that they usually satisfy ψ(x) = 0 for all |x| > r where r
is referred to as the minimum rejection point.
SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 42 / 69
M-ESTIMATORS
(Figures, slides 43-45: plots of re-descending ψ functions.)
M-ESTIMATORS
Robust Criterion Functions

Criterion                 ρ(z)               ψ(z)         w(z)              Range
Least squares             z²/2               z            1.0               |z| < ∞
Huber's t-function        z²/2               z            1.0               |z| ≤ t
(t = 2)                   t|z| − t²/2        t·sign(z)    t/|z|             |z| > t
Andrew's wave function    a(1 − cos(z/a))    sin(z/a)     sin(z/a)/(z/a)    |z| ≤ aπ
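As an illustrative sketch (not part of the slides), the weight functions in the table can be coded directly; the tuning constants t = 2 and a = 1.48 match the values used in the delivery time example that follows:

```python
import numpy as np

def huber_weight(z, t=2.0):
    # w(z) = 1 for |z| <= t and t/|z| beyond, written as one expression:
    # t / max(|z|, t) equals 1 when |z| <= t and t/|z| otherwise.
    z = np.asarray(z, dtype=float)
    return t / np.maximum(np.abs(z), t)

def andrews_weight(z, a=1.48):
    # w(z) = sin(z/a)/(z/a) inside the rejection point |z| <= a*pi, 0 beyond.
    # np.sinc(x) = sin(pi*x)/(pi*x), so np.sinc(u/pi) = sin(u)/u.
    z = np.asarray(z, dtype=float)
    u = z / a
    return np.where(np.abs(z) <= a * np.pi, np.sinc(u / np.pi), 0.0)
```

The Huber weight never reaches zero (outliers keep some influence), while Andrew's wave weight vanishes entirely past aπ, which is what makes it a re-descending estimator.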
DELIVERY TIME PROBLEM
Problem
A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. Fit a regression model to the data.
DELIVERY TIME PROBLEM
Table of Data
Observation Delivery time Number of cases Distance in feet
i (in minutes) y x1 x2
1 16.8 7 560
2 11.50 3 320
3 12.03 3 340
4 14.88 4 80
5 13.75 6 150
6 18.11 7 330
7 8 2 110
8 17.83 7 210
9 79.24 30 1460
10 21.50 5 605
11 40.33 16 688
12 21 10 215
13 13.50 4 255
DELIVERY TIME PROBLEM
Observation Delivery time Number of cases Distance in feet
(in minutes) y x1 x2
14 19.75 6 462
15 24.00 9 448
16 29.00 10 776
17 15.35 6 200
18 19.00 7 132
19 9.50 3 36
20 35.10 17 770
21 17.90 10 140
22 52.32 26 810
23 18.75 9 450
24 19.83 8 635
25 10.75 4 150
DELIVERY TIME PROBLEM
Least Squares Fit of the Delivery Time Data
Obs. yi ˆyi ei Weight
1 .166800E+02 .217081E+02 -.502808E+01 .100000E+01
2 .115000E+02 .103536E+02 .114639E+01 .100000E+01
3 .120300E+02 .120798E+02 -.497937E-01 .100000E+01
4 .148800E+02 .995565E+01 .492435E+01 .100000E+01
5 .137500E+02 .141944E+02 -.444398E+00 .100000E+01
6 .181100E+02 .183996E+02 -.289574E+00 .100000E+01
7 .800000E+01 .715538E+01 .844624E+00 .100000E+01
8 .178300E+02 .166734E+02 .115660E+01 .100000E+01
9 .792400E+02 .718203E+02 .741971E+01 .100000E+01
10 .215000E+02 .191236E+02 .237641E+01 .100000E+01
11 .403300E+02 .380925E+02 .223749E+01 .100000E+01
12 .210000E+02 .215930E+02 -.593041E+00 .100000E+01
13 .135000E+02 .124730E+02 .102701E+01 .100000E+01
DELIVERY TIME PROBLEM
Obs. yi ˆyi ei Weight
14 .197500E+02 .186825E+02 .106754E+01 .100000E+01
15 .240000E+02 .233288E+02 .671202E+00 .100000E+01
16 .290000E+02 .296629E+02 -.662928E+00 .100000E+01
17 .153500E+02 .149136E+02 .436360E+00 .100000E+01
18 .190000E+02 .155514E+02 .344862E+01 .100000E+01
19 .950000E+01 .770681E+01 .179319E+01 .100000E+01
20 .351000E+02 .408880E+02 -.578797E+01 .100000E+01
21 .179000E+02 .205142E+02 -.261418E+01 .100000E+01
22 .523200E+02 .560065E+02 -.368653E+01 .100000E+01
23 .187500E+02 .233576E+02 -.460757E+01 .100000E+01
24 .198300E+02 .244029E+02 -.457285E+01 .100000E+01
25 .107500E+02 .109626E+02 -.212584E+00 .100000E+01
DELIVERY TIME PROBLEM
Accordingly we have the following values for the parameters:
β̂0 = 2.3412
β̂1 = 1.6159
β̂2 = 0.014385
Thus we have the fitted regression equation:

ŷ = 2.3412 + 1.6159 x1 + 0.014385 x2    (24)
DELIVERY TIME PROBLEM
Huber’s t-Function, t=2
Obs. yi ˆyi ei Weight
1 .166800E+02 .217651E+02 -.508511E+01 .639744E+00
2 .115000E+02 .109809E+02 .519115E+00 .100000E+01
3 .120300E+02 .126296E+02 -.599594E+00 .100000E+01
4 .148800E+02 .105856E+02 .429439E+01 .757165E+00
5 .137500E+02 .146038E+02 -.853800E+00 .100000E+01
6 .181100E+02 .186051E+02 -.495085E+00 .100000E+01
7 .800000E+01 .794135E+01 .586521E-01 .100000E+01
8 .178300E+02 .169564E+02 .873625E+00 .100000E+01
9 .792400E+02 .692795E+02 .996050E+01 .327017E+00
10 .215000E+02 .193269E+02 .217307E+01 .100000E+01
11 .403300E+02 .372777E+02 .305228E+01 .100000E+01
12 .210000E+02 .216097E+02 -.609734E+00 .100000E+01
13 .135000E+02 .129900E+02 .510021E+00 .100000E+01
DELIVERY TIME PROBLEM
Obs. yi ŷi ei Weight
14 .197500E+02 .188904E+02 .859556E+00 .100000E+01
15 .240000E+02 .232828E+02 .717244E+00 .100000E+01
16 .290000E+02 .293174E+02 -.317449E+00 .100000E+01
17 .153500E+02 .152908E+02 .592377E-01 .100000E+01
18 .190000E+02 .158847E+02 .311529E+01 .100000E+01
19 .950000E+01 .845286E+01 .104714E+01 .100000E+01
20 .351000E+02 .399326E+02 -.483256E+01 .672828E+00
21 .179000E+02 .205793E+02 -.267929E+01 .100000E+01
22 .523200E+02 .542361E+02 -.191611E+01 .100000E+01
23 .187500E+02 .233102E+02 -.456023E+01 .713481E+00
24 .198300E+02 .243238E+02 .449377E+01 .723794E+00
25 .107500E+02 .115474E+02 -.797359E+00 .100000E+01
DELIVERY TIME PROBLEM
Accordingly we get the values of the parameters as follows:
β̂0 = 3.3736
β̂1 = 1.5282
β̂2 = 0.013739
Thus we get the fitted regression equation:

ŷ = 3.3736 + 1.5282 x1 + 0.013739 x2    (25)
DELIVERY TIME PROBLEM
Andrew’s Wave Function with a = 1.48
Obs. yi ŷi ei Weight
1 .166800E+02 .216430E+02 -.496300E+01 .427594E+00
2 .115000E+02 .116923E+02 -.192338E+00 .998944E+00
3 .120300E+02 .131457E+02 -.111570E+01 .964551E+00
4 .148800E+02 .114549E+02 .342506E+01 .694894E+00
5 .137500E+02 .152191E+02 -.146914E+01 .939284E+00
6 .181100E+02 .188574E+02 -.747381E+00 .984039E+00
7 .800000E+01 .890189E+01 -.901888E+00 .976864E+00
8 .178300E+02 .174040E+02 .425984E+00 .994747E+00
9 .792400E+02 .660818E+02 .131582E+02 .000000E+00
10 .215000E+02 .192716E+02 .222839E+01 .863633E+00
11 .403300E+02 .363170E+02 .401296E+01 .597491E+00
12 .210000E+02 .218392E+02 -.839167E+00 .980003E+00
13 .135000E+02 .135744E+02 -.744338E-01 .999843E+00
DELIVERY TIME PROBLEM
Obs. yi ŷi ei Weight
14 .197500E+02 .189979E+02 .752115E+00 .983877E+00
15 .240000E+02 .232029E+02 .797080E+00 .981854E+00
16 .290000E+02 .286336E+02 .366350E+00 .996228E+00
17 .153500E+02 .158247E+02 -.474704E+00 .993580E+00
18 .190000E+02 .164593E+02 .254067E+01 .824146E+00
19 .950000E+01 .946384E+01 .361558E-01 .999936E+00
20 .351000E+02 .387684E+02 -.366837E+01 .655336E+00
21 .179000E+02 .209308E+02 -.303081E+01 .756603E+00
22 .523200E+02 .523766E+02 -.566063E-01 .999908E+00
23 .187500E+02 .232271E+02 -.447714E+01 .515506E+00
24 .198300E+02 .240095E+02 -.417955E+01 .567792E+00
25 .107500E+02 .123027E+02 -.155274E+01 .932266E+00
DELIVERY TIME PROBLEM
Thus we have the estimates:
β̂0 = 4.6532
β̂1 = 1.4582
β̂2 = 0.012111
and the fitted regression equation:

ŷ = 4.6532 + 1.4582 x1 + 0.012111 x2    (26)
ANALYSIS
Computing M-Estimators
Robust regression methods are not a standard option in most statistical software.
SAS PROC NLIN, among others, can be used to implement the iteratively reweighted least squares procedure.
There are also robust procedures available in S-PLUS.
ANALYSIS
Robust Regression Methods...
Robust regression methods have much to offer a data analyst.
They are extremely helpful in locating outliers and highly influential observations.
Whenever a least squares analysis is performed, it would be useful to perform a robust fit as well.
ANALYSIS
If the results of both fits are in substantial agreement, the least squares procedure offers a good estimation of the parameters.
If the results of the two procedures are not in agreement, the reason for the difference should be identified and corrected.
Special attention needs to be given to observations that are down-weighted in the robust fit.
PROPERTIES
Breakdown Point
The finite sample breakdown point is the smallest fraction of anomalous data that can cause the estimator to be useless. The smallest possible breakdown point is 1/n, i.e. a single observation can distort the estimator so badly that it is of no practical use to the regression model builder. The breakdown point of OLS is 1/n.
PROPERTIES
M-estimators can be affected by x-space outliers in an identical manner to OLS.
Consequently, the breakdown point of the class of M-estimators is 1/n as well.
We would generally want the breakdown point of an estimator to exceed 10%.
This has led to the development of high breakdown point estimators.
PROPERTIES
Efficiency
M-estimators have high relative efficiency: they lose little efficiency compared with least squares when the errors are normal, while behaving far better when the error distribution is heavy-tailed.
SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
High Breakdown Point Estimators
Because both OLS and M-estimators suffer from a low breakdown point of 1/n, considerable effort has been devoted to finding estimators that perform better with respect to this property. Often a breakdown point of 50% is desirable.
SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS
There are various other estimation procedures like
Least Median of Squares
Least Trimmed Sum of Squares
S Estimators
R and L Estimators
Robust Ridge Regression
MM Estimation etc.
ABSTRACT & CONCLUSION
Review ⇒ Robustness and Resistance ⇒
Our Approach ⇒ Strengths and Weaknesses
⇒ M-Estimators ⇒ Delivery time
problem ⇒ Analysis ⇒ Properties ⇒
Survey of other Robust Regression Estimators
REFERENCE
1 Draper, Norman R. & Smith, Harry. "Applied Regression Analysis", 3rd edn., John Wiley and Sons, New York, 1998.
2 Montgomery, Douglas C., Peck, Elizabeth A. & Vining, G. Geoffrey. "Introduction to Linear Regression Analysis", 3rd edn., Wiley India, 2003.
3 Brook, Richard J. "Applied Regression Analysis and Experimental Design", Chapman & Hall, London, 1985.
4 Rawlings, John O. "Applied Regression Analysis: A Research Tool", Springer, New York, 1989.
5 Pedhazur, Elazar J. "Multiple Regression in Behavioural Research: Explanation and Prediction", Wadsworth, Australia, 1997.
THANK YOU
 
Slideshare Powerpoint presentation
Slideshare Powerpoint presentationSlideshare Powerpoint presentation
Slideshare Powerpoint presentation
 

Similar to Seminar on Robust Regression Methods

Chapter 07 - Autocorrelation.pptx
Chapter 07 - Autocorrelation.pptxChapter 07 - Autocorrelation.pptx
Chapter 07 - Autocorrelation.pptxFarah Amir
 
Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1AbdelmonsifFadl
 
LESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfLESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfICOMICOM4
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Studyiosrjce
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical MethodsJavier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical MethodsJ. García - Verdugo
 
Ch5_slides Qwertr12234543234433444344.ppt
Ch5_slides Qwertr12234543234433444344.pptCh5_slides Qwertr12234543234433444344.ppt
Ch5_slides Qwertr12234543234433444344.pptsadafshahbaz7777
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisHARISH Kumar H R
 
Media attention in Belgium: How much influence do citizens and politicians ha...
Media attention in Belgium: How much influence do citizens and politicians ha...Media attention in Belgium: How much influence do citizens and politicians ha...
Media attention in Belgium: How much influence do citizens and politicians ha...Mark Boukes (University of Amsterdam)
 
20150404 rm - autocorrelation
20150404   rm - autocorrelation20150404   rm - autocorrelation
20150404 rm - autocorrelationQatar University
 
Measuring credit risk in a large banking system: econometric modeling and emp...
Measuring credit risk in a large banking system: econometric modeling and emp...Measuring credit risk in a large banking system: econometric modeling and emp...
Measuring credit risk in a large banking system: econometric modeling and emp...SYRTO Project
 
Desinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based onDesinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based oneSAT Publishing House
 

Similar to Seminar on Robust Regression Methods (20)

Chapter 07 - Autocorrelation.pptx
Chapter 07 - Autocorrelation.pptxChapter 07 - Autocorrelation.pptx
Chapter 07 - Autocorrelation.pptx
 
Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1Applied Business Statistics ,ken black , ch 3 part 1
Applied Business Statistics ,ken black , ch 3 part 1
 
LESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdfLESSON 04 - Descriptive Satatistics.pdf
LESSON 04 - Descriptive Satatistics.pdf
 
Change Point Analysis
Change Point AnalysisChange Point Analysis
Change Point Analysis
 
report
reportreport
report
 
Identification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation StudyIdentification of Outliersin Time Series Data via Simulation Study
Identification of Outliersin Time Series Data via Simulation Study
 
Biostatistics in Bioequivalence
Biostatistics in BioequivalenceBiostatistics in Bioequivalence
Biostatistics in Bioequivalence
 
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical MethodsJavier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
Javier Garcia - Verdugo Sanchez - Six Sigma Training - W1 Statistical Methods
 
Ch5_slides.ppt
Ch5_slides.pptCh5_slides.ppt
Ch5_slides.ppt
 
Ch5_slides Qwertr12234543234433444344.ppt
Ch5_slides Qwertr12234543234433444344.pptCh5_slides Qwertr12234543234433444344.ppt
Ch5_slides Qwertr12234543234433444344.ppt
 
Ch5_slides.ppt
Ch5_slides.pptCh5_slides.ppt
Ch5_slides.ppt
 
Ch5 slides
Ch5 slidesCh5 slides
Ch5 slides
 
Multinomial Logistic Regression Analysis
Multinomial Logistic Regression AnalysisMultinomial Logistic Regression Analysis
Multinomial Logistic Regression Analysis
 
FMI output gap
FMI output gapFMI output gap
FMI output gap
 
Wp13105
Wp13105Wp13105
Wp13105
 
Media attention in Belgium: How much influence do citizens and politicians ha...
Media attention in Belgium: How much influence do citizens and politicians ha...Media attention in Belgium: How much influence do citizens and politicians ha...
Media attention in Belgium: How much influence do citizens and politicians ha...
 
G0211056062
G0211056062G0211056062
G0211056062
 
20150404 rm - autocorrelation
20150404   rm - autocorrelation20150404   rm - autocorrelation
20150404 rm - autocorrelation
 
Measuring credit risk in a large banking system: econometric modeling and emp...
Measuring credit risk in a large banking system: econometric modeling and emp...Measuring credit risk in a large banking system: econometric modeling and emp...
Measuring credit risk in a large banking system: econometric modeling and emp...
 
Desinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based onDesinging dsp (0, 1) acceptance sampling plans based on
Desinging dsp (0, 1) acceptance sampling plans based on
 

Seminar on Robust Regression Methods

  • 1. ROBUST REGRESSION METHOD By SUMON JOSE A Seminar Presentation Under the Guidance of Dr. Jessy John February 24, 2015 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 1 / 69
  • 2. CONTENTS 1 INTRODUCTION 2 REVIEW 3 ROBUSTNESS & RESISTANCE 4 APPROACH 5 STRENGTHS & WEAKNESSES 6 M- ESTIMATORS 7 DELIVERY TIME PROBLEM 8 ANALYSIS 9 PROPERTIES 10 SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS 11 REFERENCE SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 2 / 69
  • 3. INTRODUCTION Performance Evaluation - Geethu Anna Jose SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 3 / 69
  • 4. REVIEW The classical linear regression model relates the dependent or response variable yi to the independent explanatory variables xi1, xi2, ..., xip for i = 1, ..., n, such that yi = xi^T β + εi, (1) where xi^T = (xi1, xi2, ..., xip), εi denotes the error term and β = (β1, β2, ..., βp)^T SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 4 / 69
  • 5. REVIEW The expected value of yi, called the fitted value, is ŷi = xi^T β (2) and one can use this to calculate the residual for the ith case, ri = yi − ŷi (3) In the case of the simple linear regression model we may calculate the values of β0 and β1 using the following formulae: SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 5 / 69
  • 6. REVIEW β̂1 = [Σ yi xi − (Σ yi)(Σ xi)/n] / [Σ xi² − (Σ xi)²/n] (4) β̂0 = ȳ − β̂1 x̄ (5) The vector of fitted values ŷi corresponding to the observed values yi may be expressed as follows: ŷ = X β̂ (6) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 6 / 69
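As a numerical check on formulae (4) and (5), here is a minimal sketch in Python (NumPy assumed available; the function name `simple_ols` is my own, not from the slides):

```python
import numpy as np

def simple_ols(x, y):
    """Closed-form least-squares estimates for y = b0 + b1*x, Eqs. (4)-(5)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    # Eq. (4): slope from sums of products and squares
    b1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
    # Eq. (5): intercept from the means
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Noise-free data generated from y = 1 + 2x, so the fit recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
b0, b1 = simple_ols(x, y)   # b0 -> 1.0, b1 -> 2.0
```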
  • 7. REVIEW Limitations of the Least Squares Estimator Extremely sensitive to deviations from the model assumptions (as a normal distribution is assumed for the errors). Drastically changed by the effect of outliers. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 7 / 69
  • 8. REVIEW What About Deleting Outliers Before Analysis? All the outliers need not be erroneous data; they could be exceptional occurrences. Some such outliers could be the result of factors not considered in the current study. So, in general, unusual observations are not always bad observations. Moreover, in large data sets it is often very difficult to spot the outlying data. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 8 / 69
  • 9. ROBUSTNESS AND RESISTANCE Resistant Regression Estimators Definition The resistant regression estimators are primarily concerned with robustness of validity: meaning that their main concern is to prevent unusual observations from affecting the estimates produced. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 9 / 69
  • 10. ROBUSTNESS AND RESISTANCE Robust Regression Estimators Definition They are concerned with both robustness of efficiency and robustness of validity, meaning that they should also maintain a small sampling variance, even when the data does not fit the assumed distribution. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 10 / 69
  • 11. ROBUSTNESS AND RESISTANCE ⇒ In general, robust regression estimators aim to fit a model that describes the majority of a sample. ⇒ Their robustness is achieved by giving the data points different weights, ⇒ whereas in least squares approximation all data points are treated equally. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 11 / 69
  • 12. APPROACH Robust estimation methods are powerful tools for the detection of outliers in complicated data sets. But unless the data is very well behaved, different estimators will give different estimates. On their own, they do not provide a final model. A healthy approach is to employ both robust regression methods and the least squares method and compare the results. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 12 / 69
  • 13. STRENGTHS & WEAKNESSES Finite Sample Breakdown Point Definition Breakdown Point (BDP) is the measure of the resistance of an estimator. The BDP of a regression estimator is the smallest fraction of contamination that can cause the estimator to ’break down’ and no longer represent the trend of the data. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 13 / 69
  • 14. STRENGTHS & WEAKNESSES When an estimator breaks down, the estimate it produces from the contaminated data can become arbitrarily far from the estimate it would give when the data was uncontaminated. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 14 / 69
  • 15. STRENGTHS & WEAKNESSES In order to describe the BDP mathematically, define T as a regression estimator, Z as a sample of n data points and T(Z) = β̂. Let Z′ be the corrupted sample where m of the original data points are replaced with arbitrary values. The maximum effect that could be caused by such contamination is effect(m; T, Z) = sup over Z′ of |T(Z′) − T(Z)| (7) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 15 / 69
  • 16. STRENGTHS & WEAKNESSES When (7) is infinite, an outlier can have an arbitrarily large effect on T. The BDP of T at the sample Z is therefore defined as: BDP(T, Z) = min{m/n : effect(m; T, Z) is infinite} (8) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 16 / 69
  • 17. STRENGTHS & WEAKNESSES The least squares estimator, for example, has a breakdown point of 1/n because just one leverage point can cause it to break down. As the number of data points increases, the breakdown point tends to 0, and so the least squares estimator is said to have BDP 0%. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 17 / 69
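The 1/n breakdown of least squares is easy to see numerically: corrupting a single response is enough to drag the fitted slope far from the truth. A small sketch (simulated data and variable names are my own, not from the slides):

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line through (x, y)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + rng.normal(0.0, 0.5, size=50)   # true slope = 2

clean_slope = ols_slope(x, y)                 # close to 2
y_bad = y.copy()
y_bad[-1] = 500.0                             # corrupt just one of 50 observations
bad_slope = ols_slope(x, y_bad)               # pulled far away from 2
```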
  • 18. STRENGTHS & WEAKNESSES Remark The highest breakdown point one can hope for is 50%, since if more than half the data is contaminated one cannot differentiate between ’good’ and ’bad’ data. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 18 / 69
  • 19. STRENGTH & WEAKNESSES Relative Efficiency of an Estimator Definition The efficiency of an estimator for a particular parameter is defined as the ratio of its minimum possible variance to its actual variance. Strictly, an estimator is considered ’efficient’ when this ratio is one. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 19 / 69
  • 20. STRENGTHS & WEAKNESSES High efficiency is crucial for an estimator if the intention is to use an estimate from sample data to make inference about the larger population from which the sample was drawn. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 20 / 69
  • 21. STRENGTH & WEAKNESSES Relative Efficiency Relative efficiency compares the efficiency of an estimator to that of a well known method. In the context of regression, estimators are compared to the least squares estimator which is the most efficient estimator known. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 21 / 69
  • 22. STRENGTHS & WEAKNESSES Given two estimators T1 and T2 for a population parameter β, where T1 is the most efficient estimator possible and T2 is less efficient, the relative efficiency of T2 is calculated as the ratio of the mean squared error of T1 to its own mean squared error: Efficiency(T1, T2) = E[(T1 − β)²] / E[(T2 − β)²] (9) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 22 / 69
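Equation (9) can be illustrated by Monte Carlo, taking the sample mean as T1 and the sample median as T2 for a normal location parameter; for normal data the ratio comes out near 2/π ≈ 0.64 (a sketch under my own simulation setup, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 101, 20_000
samples = rng.normal(0.0, 1.0, size=(reps, n))   # true parameter beta = 0

mse_mean = np.mean(samples.mean(axis=1) ** 2)        # T1: the efficient estimator
mse_median = np.mean(np.median(samples, axis=1) ** 2)  # T2: robust but less efficient

rel_eff = mse_mean / mse_median   # close to 2/pi ~ 0.64 for normal data
```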
  • 23. M-ESTIMATORS Introduction 1 First proposed by Huber (1973) 2 But the early ones had weaknesses in terms of one or more of the desired properties 3 From them the modern methods were developed SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 23 / 69
  • 24. M-ESTIMATORS Maximum Likelihood Type Estimators M-estimation is based on the idea that while we still want a maximum likelihood estimator, the errors might be better represented by a different, heavier-tailed distribution. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 24 / 69
  • 25. M-ESTIMATORS If the probability density function of the errors is f(εi), then the maximum likelihood estimator for β is that which maximizes the likelihood function ∏i f(εi) = ∏i f(yi − xi^T β) (10) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 25 / 69
  • 26. M-ESTIMATORS This means it also maximizes the log-likelihood function Σi ln f(εi) = Σi ln f(yi − xi^T β) (11) When the errors are normally distributed it has been shown that this leads to minimising the sum of squared residuals, which is the ordinary least squares method. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 26 / 69
  • 27. M-ESTIMATORS Assuming that the errors are differently distributed leads to the maximum likelihood estimator minimising a different function. Using this idea, an M-estimator β̂ minimizes Σi ρ(εi) = Σi ρ(yi − xi^T β) (12) where ρ(u) is a continuous, symmetric function called the objective function with a unique minimum at 0. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 27 / 69
  • 28. M-ESTIMATORS 1 Knowing the appropriate ρ(u) to use requires knowledge of how the errors are really distributed. 2 Functions are usually chosen through consideration of how the resulting estimator down-weights the larger residuals 3 A robust M-estimator achieves this by minimizing the sum of a less rapidly increasing objective function than the ρ(u) = u² of least squares SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 28 / 69
  • 29. M-ESTIMATORS Constructing a Scale Equivariant Estimator The M-estimator is not necessarily scale invariant i.e. if the errors yi − xi^T β were multiplied by a constant, the new solution to the above equation might not be the same as the scaled version of the old one. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 29 / 69
  • 30. M-ESTIMATORS To obtain a scale invariant version of this estimator we usually solve, Σi ρ(εi/s) = Σi ρ((yi − xi^T β)/s) (13) where s is a robust estimate of scale. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 30 / 69
  • 31. M-ESTIMATORS A popular choice for s is the re-scaled median absolute deviation s = 1.4826 × MAD (14) where MAD is the Median Absolute Deviation MAD = Median|yi − xi^T β̂| = Median|εi| (15) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 31 / 69
  • 32. M-ESTIMATORS ’s’ is highly resistant to outlying observations, with BDP 50%, as it is based on the median rather than the mean. The estimator rescales MAD by the factor 1.4826 so that when the sample is large and εi is really distributed as N(0, σ²), s estimates the standard deviation. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 32 / 69
  • 33. M-ESTIMATORS With a large sample and εi ∼ N(0, σ²): P(|εi| < MAD) ≈ 0.5 ⇒ P(|εi − 0|/σ < MAD/σ) ≈ 0.5 ⇒ P(|Z| < MAD/σ) ≈ 0.5 ⇒ MAD/σ ≈ Φ⁻¹(0.75) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 33 / 69
  • 34. M-ESTIMATORS ⇒ MAD/Φ⁻¹(0.75) ≈ σ ⇒ 1.4826 × MAD ≈ σ Thus the tuning constant 1.4826 makes s an approximately unbiased estimator of σ if n is large and the error distribution is normal. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 34 / 69
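The scale estimate of equations (14)-(15) is a one-liner, and its two advertised properties — it approximates σ for large normal samples, and it barely moves under contamination — can be checked directly (a sketch; residuals are taken as roughly centred at zero, as on the slide):

```python
import numpy as np

def mad_scale(residuals):
    """s = 1.4826 * Median|e_i|, the re-scaled MAD of Eqs. (14)-(15)."""
    return 1.4826 * np.median(np.abs(np.asarray(residuals, float)))

rng = np.random.default_rng(2)
e = rng.normal(0.0, 2.0, size=200_000)   # large sample with sigma = 2
s = mad_scale(e)                         # close to 2.0

e_bad = e.copy()
e_bad[:1000] = 100.0                     # 0.5% gross outliers
s_bad = mad_scale(e_bad)                 # still close to 2.0 (BDP 50%)
```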
  • 35. M-ESTIMATORS Finding an M-Estimator To obtain an M-estimate we solve, Minimize over β: Σi ρ(εi/s) = Σi ρ((yi − xi^T β)/s) (16) For that we equate the first partial derivatives of ρ with respect to βj (j = 0, 1, 2, ..., k) to zero, yielding a necessary condition for a minimum. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 35 / 69
  • 36. M-ESTIMATORS This gives a system of p = k + 1 equations Σi xij ψ((yi − xi^T β)/s) = 0, j = 0, 1, 2, ..., k (17) where ψ = ρ′ and xij is the ith observation on the jth regressor, with xi0 = 1. In general ψ is a non-linear function and so equation (17) must be solved iteratively. The most widely used method is the method of iteratively reweighted least squares. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 36 / 69
  • 37. M-ESTIMATORS To use iteratively reweighted least squares suppose that an initial estimate β̂0 is available and that s is an estimate of scale. Then we write the p = k + 1 equations as: Σi xij ψ((yi − xi^T β)/s) = Σi xij {ψ[(yi − xi^T β)/s] / [(yi − xi^T β)/s]} (yi − xi^T β)/s = 0 (18) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 37 / 69
  • 38. M-ESTIMATORS as Σi xij Wi0 (yi − xi^T β) = 0, j = 0, 1, 2, ..., k (19) where Wi0 = ψ[(yi − xi^T β̂0)/s] / [(yi − xi^T β̂0)/s] if yi ≠ xi^T β̂0, and Wi0 = 1 if yi = xi^T β̂0 (20) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 38 / 69
  • 39. M-ESTIMATORS We may write the above equations in matrix form as follows: X′W0Xβ = X′W0y (21) where W0 is an n × n diagonal matrix of weights with diagonal elements given by the expression Wi0 = ψ[(yi − xi^T β̂0)/s] / [(yi − xi^T β̂0)/s] if yi ≠ xi^T β̂0, and Wi0 = 1 if yi = xi^T β̂0 (22) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 39 / 69
  • 40. M-ESTIMATORS From the matrix form we realize that the expression is the same as the usual weighted least squares normal equations. Consequently the one-step estimator is β̂1 = (X′W0X)⁻¹X′W0y (23) At the next step we recompute the weights from the equation for W, but using β̂1 and not β̂0 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 40 / 69
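The iteration of equations (18)-(23) can be sketched in a few lines, here with Huber's ψ (t = 2) and the re-scaled MAD as the scale estimate. The helper names are mine, and this is only an illustration of the scheme — a real analysis would use a vetted implementation such as statsmodels' RLM:

```python
import numpy as np

def huber_psi(u, t=2.0):
    """Huber's psi: psi(u) = u for |u| <= t, t*sign(u) otherwise."""
    return np.clip(u, -t, t)

def irls_huber(X, y, t=2.0, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares for a Huber M-estimate (cf. Eqs. 18-23).
    X must already contain a column of ones for the intercept."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # beta^0: start from OLS
    for _ in range(max_iter):
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r))                # robust scale, Eq. (14)
        u = r / s
        # Weights of Eq. (20), with w = 1 where the residual is exactly zero
        w = np.where(u == 0.0, 1.0, huber_psi(u, t) / np.where(u == 0.0, 1.0, u))
        sw = np.sqrt(w)
        # Weighted least squares step, Eq. (23)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta

# Demo: line y = 1 + 2x with three gross outliers; the M-estimate resists them.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, size=40)
y[:3] += 30.0
X = np.column_stack([np.ones_like(x), x])
beta_huber = irls_huber(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

As the slides note, only a few reweighting passes are usually needed before the estimates stop changing.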
  • 41. M-ESTIMATORS NOTE: Usually only a few iterations are required to obtain convergence It can easily be implemented in a computer programme. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 41 / 69
  • 42. M-ESTIMATORS Re-Descending Estimators Re-descending M-estimators are those which have influence functions that are non-decreasing near the origin but decreasing towards zero far from the origin. Their ψ can be chosen to redescend smoothly to zero, so that they usually satisfy ψ(x) = 0 for all |x| > r, where r is referred to as the minimum rejection point. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 42 / 69
  • 43. M-ESTIMATORS SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 43 / 69
  • 44. M-ESTIMATORS SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 44 / 69
  • 45. M-ESTIMATORS SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 45 / 69
  • 46. M-ESTIMATORS Robust Criterion Functions
  Criterion | ρ(z) | ψ(z) | w(z) | range
  Least Squares | z²/2 | z | 1.0 | |z| < ∞
  Huber’s t-function (t = 2) | z²/2 | z | 1.0 | |z| ≤ t
  | t|z| − t²/2 | t·sign(z) | t/|z| | |z| > t
  Andrew’s wave function | a(1 − cos(z/a)) | sin(z/a) | sin(z/a)/(z/a) | |z| ≤ aπ
  SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 46 / 69
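The ψ and weight columns of the table can be written out directly; a sketch with t = 2 and a = 1.48 as in the table (function names are my own):

```python
import numpy as np

def psi_huber(z, t=2.0):
    """Huber's t-function: psi(z) = z for |z| <= t, t*sign(z) otherwise."""
    z = np.asarray(z, float)
    return np.where(np.abs(z) <= t, z, t * np.sign(z))

def psi_andrews(z, a=1.48):
    """Andrew's wave function: psi(z) = sin(z/a) for |z| <= a*pi, 0 outside
    (re-descending: gross outliers get influence exactly zero)."""
    z = np.asarray(z, float)
    return np.where(np.abs(z) <= a * np.pi, np.sin(z / a), 0.0)

def weight(psi, z):
    """IRLS weight w(z) = psi(z)/z, with w(0) = 1 by continuity."""
    z = np.asarray(z, float)
    return np.where(z == 0.0, 1.0, psi(z) / np.where(z == 0.0, 1.0, z))
```

For example, `weight(psi_huber, 4.0)` gives 0.5 (the residual is halved in influence), while `psi_andrews(10.0)` is 0: beyond the rejection point the observation is ignored entirely.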
  • 47. DELIVERY TIME PROBLEM Problem A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in the following table. Fit a regression model to it. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 47 / 69
  • 48. DELIVERY TIME PROBLEM Table of Data Observation Delivery time Number of cases Distance in Feet i (in minutes) y x1 x2 1 16.68 7 560 2 11.50 3 220 3 12.03 3 340 4 14.88 4 80 5 13.75 6 150 6 18.11 7 330 7 8 2 110 8 17.83 7 210 9 79.24 30 1460 10 21.50 5 605 11 40.33 16 688 12 21 10 215 13 13.50 4 255 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 48 / 69
  • 49. DELIVERY TIME PROBLEM Observation Delivery time Number of cases Distance in Feet (in minutes) y x1 x2 14 19.75 6 462 15 24.00 9 448 16 29.00 10 776 17 15.35 6 200 18 19.00 7 132 19 9.50 3 36 20 35.10 17 770 21 17.90 10 140 22 52.32 26 810 23 18.75 9 450 24 19.83 8 635 25 10.75 4 150 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 49 / 69
  • 50. DELIVERY TIME PROBLEM Least Squares Fit of the Delivery Time Data Obs. yi ˆyi ei Weight 1 .166800E+02 .217081E+02 -.502808E+01 .100000E+01 2 .115000E+02 .103536E+02 .114639E+01 .100000E+01 3 .120300E+02 .120798E+02 -.497937E-01 .100000E+01 4 .148800E+02 .995565E+01 .492435E+01 .100000E+01 5 .137500E+02 .141944E+02 -.444398E+00 .100000E+01 6 .181100E+02 .183996E+02 -.289574E+00 .100000E+01 7 .800000E+01 .715538E+01 .844624E+00 .100000E+01 8 .178300E+02 .166734E+02 .115660E+01 .100000E+01 9 .792400E+02 .718203E+02 .741971E+01 .100000E+01 10 .215000E+02 .191236E+02 .237641E+01 .100000E+01 11 .403300E+02 .380925E+02 .223749E+01 .100000E+01 12 .210000E+02 .215930E+02 -.593041E+00 .100000E+01 13 .135000E+02 .124730E+02 .102701E+01 .100000E+01 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 50 / 69
  • 51. DELIVERY TIME PROBLEM Obs. yi ˆyi ei Weight 14 .197500E+02 .186825E+02 .106754E+01 .100000E+01 15 .240000E+02 .233288E+02 .671202E+00 .100000E+01 16 .290000E+02 .296629E+02 -.662928E+00 .100000E+01 17 .153500E+02 .149136E+02 .436360E+00 .100000E+01 18 .190000E+02 .155514E+02 .344862E+01 .100000E+01 19 .950000E+01 .770681E+01 .179319E+01 .100000E+01 20 .351000E+02 .408880E+02 -.578797E+01 .100000E+01 21 .179000E+02 .205142E+02 -.261418E+01 .100000E+01 22 .523200E+02 .560065E+02 -.368653E+01 .100000E+01 23 .187500E+02 .233576E+02 -.460757E+01 .100000E+01 24 .198300E+02 .244029E+02 -.457285E+01 .100000E+01 25 .107500E+02 .109626E+02 -.212584E+00 .100000E+01 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 51 / 69
  • 52. DELIVERY TIME PROBLEM Accordingly we have the following values for the parameters: ˆβ0 = 2.3412 ˆβ1 = 1.6159 ˆβ2 = 0.014385 Thus we have the regression line as follows: yi = 2.3412 + 1.6159x1 + 0.014385x2 (24) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 52 / 69
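The least squares fit above can be reproduced directly from the tabulated data (with observation 1 read as 16.68 minutes and observation 2's distance as 220 ft, consistent with the residual table). A sketch, not the original computation:

```python
import numpy as np

# Delivery time data: y in minutes, x1 = cases stocked, x2 = distance in feet
y = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24,
              21.50, 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,
              9.50, 35.10, 17.90, 52.32, 18.75, 19.83, 10.75])
x1 = np.array([7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4, 6, 9, 10, 6, 7,
               3, 17, 10, 26, 9, 8, 4], dtype=float)
x2 = np.array([560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255,
               462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # approx. (2.3412, 1.6159, 0.014385)
resid = y - X @ beta                           # obs. 9 has the largest residual, as in the table
```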
  • 53. DELIVERY TIME PROBLEM Huber’s t-Function, t=2 Obs. yi ˆyi ei Weight 1 .166800E+02 .217651E+02 -.508511E+01 .639744E+00 2 .115000E+02 .109809E+02 .519115E+00 .100000E+01 3 .120300E+02 .126296E+02 -.599594E+00 .100000E+01 4 .148800E+02 .105856E+02 .429439E+01 .757165E+00 5 .137500E+02 .146038E+02 -.853800E+00 .100000E+01 6 .181100E+02 .186051E+02 -.495085E+00 .100000E+01 7 .800000E+01 .794135E+01 .586521E-01 .100000E+01 8 .178300E+02 .169564E+02 .873625E+00 .100000E+01 9 .792400E+02 .692795E+02 .996050E+01 .327017E+00 10 .215000E+02 .193269E+02 .217307E+01 .100000E+01 11 .403300E+02 .372777E+02 .305228E+01 .100000E+01 12 .210000E+02 .216097E+02 -.609734E+00 .100000E+01 13 .135000E+02 .129900E+02 .510021E+00 .100000E+01 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 53 / 69
  • 54. DELIVERY TIME PROBLEM Obs. yi ˆyi ei Weight i 14 .197500E+02 .188904E+02 .859556E+00 .100000E+01 15 .240000E+02 .232828E+02 .717244E+00 .100000E+01 16 .290000E+02 .293174E+02 -.317449E+00 .100000E+01 17 .153500E+02 .152908E+02 .592377E-01 .100000E+01 18 .190000E+02 .158847E+02 .311529E+01 .100000E+01 19 .950000E+01 .845286E+01 .104714E+01 .100000E+01 20 .351000E+02 .399326E+02 -.483256E+01 .672828E+00 21 .179000E+02 .205793E+02 -.267929E+01 .100000E+01 22 .523200E+02 .542361E+02 -.191611E+01 .100000E+01 23 .187500E+02 .233102E+02 -.456023E+01 .713481E+00 24 .198300E+02 .243238E+02 .449377E+01 .723794E+00 25 .107500E+02 .115474E+02 -.797359E+00 .100000E+01 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 54 / 69
  • 55. DELIVERY TIME PROBLEM Accordingly we get the values of the parameters as follows: ˆβ0 = 3.3736 ˆβ1 = 1.5282 ˆβ2 = 0.013739 Thus we get the regression line as follows: yi = 3.3736 + 1.5282x1 + 0.013739x2 (25) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 55 / 69
  • 56. DELIVERY TIME PROBLEM Andrew’s Wave Function with a = 1.48 Obs. yi ˆyi ei Weight i 1 .166800E+02 .216430E+02 -.496300E+01 .427594E+00 2 .115000E+02 .116923E+02 -.192338E+00 .998944E+00 3 .120300E+02 .131457E+02 -.111570E+01 .964551E+00 4 .148800E+02 .114549E+02 .342506E+01 .694894E+00 5 .137500E+02 .152191E+02 -.146914E+01 .939284E+00 6 .181100E+02 .188574E+02 -.747381E+00 .984039E+00 7 .800000E+01 .890189E+01 .901888E+00 .976864E+00 8 .178300E+02 .174040E+02 .425984E+00 .994747E+00 9 .792400E+02 .660818E+02 .131582E+02 .0 10 .215000E+02 .192716E+02 .222839E+01 .863633E+00 11 .403300E+02 .363170E+02 .401296E+01 .597491E+00 12 .210000E+02 .218392E+02 -.839167E+00 .980003E+00 13 .135000E+02 .135744E+02 -.744338E-01 .999843E+00 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 56 / 69
  • 57. DELIVERY TIME PROBLEM Obs. yi ˆyi ei Weight i 14 .197500E+02 .198979E+02 .752115E+00 .983877E+00 15 .240000E+02 .232029E+02 .797080E+00 .981854E+00 16 .290000E+02 .286336E+02 .366350E+00 .996228E+00 17 .153500E+02 .158247E+02 -.474704E+00 .993580E+00 18 .190000E+02 .164593E+02 .254067E+01 .824146E+00 19 .950000E+01 .946384E+01 .361558E-01 .999936E+00 20 .351000E+02 .387684E+02 -.366837E+01 .655336E+00 21 .179000E+02 .209308E+02 -.303081E+01 .756603E+00 22 .523200E+02 .523766E+02 -.566063E-01 .999908E+00 23 .187500E+02 .232271E+02 -.447714E+01 .515506E+00 24 .198300E+02 .240095E+02 -.417955E+01 .567792E+00 25 .107500E+02 .123027E+02 -.155274E+01 .932266E+00 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 57 / 69
  • 58. DELIVERY TIME PROBLEM Thus we have the estimates as follows: ˆβ0 = 4.6532 ˆβ1 = 1.4582 ˆβ2 = 0.012111 Thus we get the regression line as follows: yi = 4.6532 + 1.4582x1 + 0.012111x2 (26) SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 58 / 69
  • 59. ANALYSIS Computing M-Estimators Robust regression methods are not an option in most statistical software today. SAS PROC NLIN, among others, can be used to implement the iteratively reweighted least squares procedure. There are also robust procedures available in S-Plus. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 59 / 69
  • 60. ANALYSIS Robust Regression Methods... Robust regression methods have much to offer a data analyst. They will be extremely helpful in locating outliers and highly influential observations. Whenever a least squares analysis is performed it would be useful to perform a robust fit as well. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 60 / 69
  • 61. ANALYSIS If the results of both fits are in substantial agreement, the use of the least squares procedure offers a good estimation of the parameters. If the results of the two procedures are not in agreement, the reason for the difference should be identified and corrected. Special attention needs to be given to observations that are down-weighted in the robust fit. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 61 / 69
  • 62. PROPERTIES Breakdown Point The finite sample breakdown point is the smallest fraction of anomalous data that can cause the estimator to be useless. The smallest possible breakdown point is 1/n, i.e. a single observation can distort the estimator so badly that it is of no practical use to the regression model builder. The breakdown point of OLS is 1/n. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 62 / 69
  • 63. PROPERTIES M-estimators can be affected by x-space outliers in an identical manner to OLS. Consequently, the breakdown point of the class of M-estimators is 1/n as well. We would generally want the breakdown point of an estimator to exceed 10%. This has led to the development of high breakdown point estimators. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 63 / 69
  • 64. PROPERTIES Efficiency The M-estimators have higher efficiency than least squares when the errors are heavy-tailed, and they remain well behaved as the size of the sample increases to ∞. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 64 / 69
  • 65. SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS High Breakdown Point Estimators Because both the OLS and M-estimators suffer from a low breakdown point of 1/n, considerable effort has been devoted to finding estimators that perform better with respect to this property. Often a breakdown point of 50% is desirable. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 65 / 69
  • 66. SURVEY OF OTHER ROBUST REGRESSION ESTIMATORS There are various other estimation procedures like Least Median of Squares Least Trimmed Sum of Squares S Estimators R and L Estimators Robust Ridge Regression MM Estimation etc. SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 66 / 69
  • 67. ABSTRACT & CONCLUSION Review ⇒ Robustness and Resistance ⇒ Our Approach ⇒ Strengths and Weaknesses ⇒ M-Estimators ⇒ Delivery time problem ⇒ Analysis ⇒ Properties ⇒ Survey of other Robust Regression Estimators SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 67 / 69
  • 68. REFERENCE 1 Draper, R Norman. & Smith, Harry. “Applied Regression Analysis”, 3rd edn., John Wiley and Sons, New York, 1998. 2 Montgomery, C Douglas. Peck, A Elizabeth. & Vining, Geoffrey G. “Introduction to Linear Regression Analysis”, 3rd edn., Wiley India, 2003. 3 Brook J, Richard. “Applied Regression Analysis and Experimental Design”, Chapman & Hall, London, 1985. 4 Rawlings O, John. “Applied Regression Analysis: A Research Tool”, Springer, New York, 1989. 5 Pedhazur, Elazar J. “Multiple Regression in Behavioural Research: Explanation and Prediction”, Wadsworth, Australia, 1997 SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 68 / 69
  • 69. THANK YOU SUMON JOSE (NIT CALICUT) ROBUST REGRESSION METHOD February 24, 2015 69 / 69