This document summarizes Hidehiko Ichimura's 1993 paper on semiparametric least squares estimation of single-index models. It provides an overview of single-index models and assumptions required for identification and estimation. Key results discussed include:
1) The SLS estimator is consistent under certain regularity conditions on the model and kernel estimator.
2) Under additional moment and identification assumptions, the SLS estimator is asymptotically normal with the standard sandwich variance formula.
3) Optimal weighting of the SLS estimator achieves the semiparametric efficiency bound.
1. Hidehiko Ichimura
Semiparametric Least Squares (SLS) and weighted SLS estimation of single-index models¹
Péter Tóth
toth.peter@utexas.edu
UT Austin
¹ Journal of Econometrics 58 (1993), pp. 71-120.
3. Index models
- Semiparametric model: the index parametrizing the DGP (a distribution) consists of two parts, θ and φ, where θ ∈ Θ lies in a finite-dimensional space, while φ ∈ Φ lies in an infinite-dimensional space
- Single-index model: the DGP is assumed to be

  y_i = φ(h(x_i; θ_0)) + ε_i   for i = 1, 2, 3, ..., n

- where {(x_i, y_i)} is the observed iid sample,
- θ_0 ∈ R^M is the true (finite-dimensional) parameter vector,
- E[ε_i | x_i] = 0,
- h(·) is known up to the parameter θ, but φ(·) is not.
- Examples.
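As a concrete illustration of the "Examples" bullet above (my own example, not from the slides): a binary-choice model y_i = 1{x_i'θ_0 ≥ ε_i} with ε_i independent of x_i gives E[y_i | x_i] = F_ε(x_i'θ_0), i.e. a linear single-index model in which φ = F_ε is unknown whenever the error distribution is left unspecified. Below is a minimal Python sketch of simulating from such a single-index DGP; the link np.tanh, the sample size, and all numbers are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_single_index(n, theta0, phi):
    """Draw (x, y) from y_i = phi(x_i' theta0) + eps_i with E[eps_i | x_i] = 0."""
    x = rng.normal(size=(n, len(theta0)))          # regressors
    index = x @ theta0                             # linear index h(x, theta0) = x' theta0
    eps = rng.normal(scale=0.3, size=n)            # mean-zero error, independent of x
    return x, phi(index) + eps

theta0 = np.array([1.0, -0.5])                     # true coefficients (identified only up to scale)
x, y = simulate_single_index(500, theta0, phi=np.tanh)
```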
5. Three observations
- 1. The variation in y results from the variation in ε and the variation in x (or in h(x; θ_0))
- 2. On the 'contour line', where h(x; θ_0) = const, all the variation in y comes from ε
- 3. (2) does not (necessarily) hold for θ ≠ θ_0
- So what will be the identification strategy for θ_0?
- What would be the estimation strategy?
- Caveat: we have conditional moments
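Observations (1)-(3) can be summarized in a variance decomposition (my own rephrasing of the standard argument): for any θ, using E[ε | x] = 0,

\[
E\big[(y - E[y \mid h(x,\theta)])^2\big]
  \;=\; E\big[\mathrm{Var}(y \mid x)\big]
  \;+\; E\Big[\big(\varphi(h(x,\theta_0)) - E[\varphi(h(x,\theta_0)) \mid h(x,\theta)]\big)^2\Big].
\]

The first term does not depend on θ; the second term is zero at θ = θ_0, where conditioning on the index pins down φ(h(x, θ_0)) exactly, and the identification assumptions are there to make it strictly positive away from θ_0 (up to the scale normalization discussed later). This is the population version of the SLS objective introduced on the next slides.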
6. SLS estimator 1
- an extremum estimator
- if we knew the conditional mean, the objective function would be the sample analogue of some measure of variation, in particular here of the variance:

  J_n^h(θ) = (1/n) Σ_i [y_i − E(y_i | h(x_i; θ))]²

- you can also just sum over all the h-values
- if we do not know the conditional mean, we estimate it nonparametrically with a smooth kernel estimator (why?), so

  J_n(θ) = (1/n) Σ_i I(x_i ∈ X) [y_i − Ê(x_i; θ)]² + o_p(n⁻¹)
7. SLS estimator 2
- here x ∈ R^L

  Ê(x_i; θ) = [ Σ_{j≠i} y_j I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ] / [ Σ_{j≠i} I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ]

- if Σ_{j≠i} I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ≠ 0
- otherwise: if y_i ≤ (y_min + y_max)/2, then Ê(x_i; θ) = y_min,
- otherwise Ê(x_i; θ) = y_max
- where X_n = {x : ||x − x'|| ≤ 2a_n for some x' ∈ X}
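Below is a minimal Python sketch of the leave-one-out kernel regression Ê(x_i; θ) defined above, assuming a linear index h(x, θ) = x'θ and a Gaussian kernel; the trimming indicator and the y_min/y_max fallback are simplified, and all function and variable names are my own, not the paper's.

```python
import numpy as np

def loo_kernel_mean(x, y, theta, a_n, in_Xn=None):
    """Leave-one-out kernel estimate of E[y_i | h(x_i, theta)], with h(x, theta) = x' theta.

    Sketch only: Gaussian kernel, boolean mask `in_Xn` standing in for I(x_j in X_n),
    and a simple y_min/y_max fallback when a denominator is (numerically) zero."""
    n = len(y)
    if in_Xn is None:
        in_Xn = np.ones(n, dtype=bool)
    t = x @ theta                                   # h(x_j, theta) for all j
    u = (t[:, None] - t[None, :]) / a_n             # (h(x_i) - h(x_j)) / a_n
    K = np.exp(-0.5 * u**2)                         # smooth kernel K
    K *= in_Xn[None, :]                             # trimming indicator I(x_j in X_n)
    np.fill_diagonal(K, 0.0)                        # leave-one-out: drop j = i
    num = K @ y
    den = K.sum(axis=1)
    fallback = np.where(y <= (y.min() + y.max()) / 2, y.min(), y.max())
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), fallback)
```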
8. SLS estimator 3
- cont'd:
- a_n → 0, but √n · a_n → ∞, a positive sequence
- K : R → R is a smooth kernel function
- if all denominators of the kernel regression function are zero, θ̂ is defined to be 0
- NLS vs SLS
- WSLS: a weighting function W(x) will be introduced in the kernel numerator and denominator and in the objective function itself (in front of the quadratic term)
- 0 ≤ W(x) ≤ W̄
- From now on we will restrict our attention to linear h(·)'s
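Putting the pieces together, here is a sketch of the (unweighted) SLS objective and its minimization, reusing simulate_single_index and loo_kernel_mean from the sketches above. The scale normalization θ_1 = 1, the bandwidth rule, and the use of Nelder-Mead are ad hoc choices for illustration, not the paper's recommendations.

```python
import numpy as np
from scipy.optimize import minimize

def sls_objective(theta_rest, x, y, a_n, trim):
    """J_n(theta) = (1/n) sum_i I(x_i in X) [y_i - E_hat(x_i; theta)]^2,
    with theta = (1, theta_rest) imposing the scale normalization."""
    theta = np.concatenate(([1.0], theta_rest))
    e_hat = loo_kernel_mean(x, y, theta, a_n, in_Xn=trim)
    return np.mean(trim * (y - e_hat) ** 2)

trim = np.ones(len(y), dtype=bool)                 # no trimming in this toy example
a_n = len(y) ** (-1 / 5)                           # ad hoc bandwidth with a_n -> 0
res = minimize(sls_objective, x0=np.zeros(x.shape[1] - 1),
               args=(x, y, a_n, trim), method="Nelder-Mead")
theta_hat = np.concatenate(([1.0], res.x))         # estimate of theta_0, up to scale
```

For WSLS one would additionally multiply the kernel weights by W(x_j) and the squared residuals by W(x_i), as described on this slide.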
10. Assumption 4.1.-2.
- Assumption 4.1: the φ(·) function is differentiable and not constant on the whole support of x'θ_0
11. (cont'd)
- nominal regressors: these are the x vectors with l-th member (variable) x_l, which can actually be thought of as functions x_l(z) of underlying regressors (z_1, z_2, ..., z_{L^0})
- we assume the underlying regressors are either continuous or discrete
- the first L_1 nominal and L^0_1 underlying regressors have continuous marginals (the rest are discrete)
- Assumption 4.2
- (1) x_l(·) for all l has partial derivatives wrt the continuous underlying regressors
- (2) for the discrete nominal regressors, ∂x_l/∂z_{l'} = 0 for l = L_1+1, L_1+2, ..., L and l' = 1, 2, ..., L^0_1, almost everywhere in z
12. (cont'd)
- (3) ⋂_{l'=1}^{L^0_1} { (s^{l'}_1, ..., s^{l'}_{L_1}) : s^{l'}_l = ∂x_l/∂z_{l'} at some z ∈ Z }^⊥ = {0}
- (4) (i) for each θ ∈ Θ there is an open set T and at least L − L_1 + 1 constant vectors c^l = (c^l_{L_1+1}, ..., c^l_L) for l = 0, 1, ..., L − L_1 such that
- c^l − c^0 are linearly independent (for l = 1, 2, ..., L − L_1)
- T is contained in ⋂_{l=0}^{L−L_1} { t : t = θ_1 x_1(z) + ... + θ_{L_1} x_{L_1}(z) + θ_{L_1+1} c^l_{L_1+1} + ... + θ_L c^l_L, z ∈ Z(c^l) },
  where Z(c^l) = { z ∈ Z : x_{L_1+1}(z) = c^l_{L_1+1}, ..., x_L(z) = c^l_L }
- (ii) and φ(·) is not periodic on T
13. Identification: Theorem 4.1.
Consider the linear single-index model defined above. If there is a continuous regressor that has a non-zero coefficient, then Assumptions 4.1 and 4.2 (1)-(3) imply that θ_0 is identified up to a scalar constant for all continuous regressors. In addition, if 4.2 (4) is satisfied, then the coefficients of the discrete regressors are also identified (up to a scalar constant).
- Intuition for the proof: assume there is another θ ≠ θ_0 that minimizes the objective function and derive a contradiction.
14. What does Assumption 4.2 rule out?
- Ex. 4.2: when x_1 = z and x_2 = z²
- Ex. 4.3: x_1 = z_1 ∈ [0, 1] and x_2 = z_2 ∈ {0, 1}
- so 4.2 (3) is essentially a non-constancy/invertibility condition
- ... while 4.2 (4) is a support-like condition
- we need either a small enough or a large enough (though not necessarily full!) support for the continuous variable
- what is the problem with a periodic φ(·)?
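To make Ex. 4.2 concrete (my own worked version of the usual argument): with x_1 = z and x_2 = z², take any (θ_1, θ_2) such that t(z) = θ_1 z + θ_2 z² is strictly monotone on Z (for instance θ_1 + 2θ_2 z > 0 on Z), with inverse z(t). Then

\[
E[y \mid z] \;=\; \varphi\big(\theta_{01} z + \theta_{02} z^2\big)
           \;=\; \tilde{\varphi}\big(\theta_1 z + \theta_2 z^2\big),
\qquad
\tilde{\varphi}(t) \;:=\; \varphi\big(\theta_{01}\, z(t) + \theta_{02}\, z(t)^2\big),
\]

so (θ_1, θ_2) with link φ̃ is observationally equivalent to (θ_{01}, θ_{02}) with link φ once φ is treated as unknown, and the coefficients are not identified even up to scale; functionally dependent regressors like these are exactly what Assumption 4.2 rules out.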
16. Assumptions 5.1-2
- Some objects:
- X is a subset of x's support
- T(X; θ) = {t ∈ R : t = h(x; θ) for some x ∈ X}
- f(t; θ) is the Lebesgue density of t = h(x; θ) (aww Lord)
- Assumption 5.1: iid sample
- Assumption 5.2: θ_0 ∈ Int(Θ), where Θ ⊂ R^M is a compact set
17. Assumptions 5.3-6
- Assumption 5.3:
- 1. X is compact
- 2. inf_{x∈X} f(h(x; θ); θ) > 0
- 3. f(t; θ) and E[y | h(x; θ) = t] are three times differentiable wrt t, and the third derivative is Lipschitz jointly in both arguments
- Assumption 5.4: y is in L^m (for some m ≥ 2), and the conditional variance of y (given x) is uniformly bounded and bounded away from 0 on X
- Assumption 5.5: h(x; θ) is Lipschitz jointly on X × Θ
- Assumption 5.6: on the kernel K, besides the usual conditions we require that its second derivative is Lipschitz
18. Consistency: Theorem 5.1
If Assumptions 5.1-6 hold, the (W)SLS estimator defined above is consistent.
- the proof uses that P[J_n(θ̂) ≤ J_n(θ_0)] = 1, and then observes that

  P[J_n(θ̂) ≤ J_n(θ_0)] = P[J_n(θ̂) ≤ J_n(θ_0), θ̂ ∈ B(θ_0)] + P[J_n(θ̂) ≤ J_n(θ_0), θ̂ ∉ B(θ_0)]
                        ≤ P[θ̂ ∈ B(θ_0)] + P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ]

- where B(θ_0) is an open ball around θ_0 with some radius, and

  P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ] → 0.

- Alternatively, I think after establishing uniform consistency of the kernel estimator one could use the continuous mapping theorem and the consistency theorem for extremum estimators
19. Asymptotic normality: Theorem 5.2
Under Assumptions 5.1-6, if y has at least 3 absolute moments, θ_0 is identified, and regularity conditions on a_n are satisfied, then

  √n (θ̂ − θ_0) → N(0, V⁻¹ Σ V⁻¹),

where the variance is just the usual sandwich form.
- Note that the usual sandwich formula is not directly feasible here, since it contains the derivative of the φ(·) function, which is unknown...
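For orientation only (my recollection of the standard statement for the linear index x'θ, ignoring trimming; this should be checked against Theorem 5.2 in the paper): the matrices in the sandwich take the form

\[
V \;=\; E\Big[\,W(x)\,\varphi'(x'\theta_0)^2\,\big(x - E_W[x \mid x'\theta_0]\big)\big(x - E_W[x \mid x'\theta_0]\big)'\,\Big],
\]
\[
\Sigma \;=\; E\Big[\,W(x)^2\,\sigma^2(x)\,\varphi'(x'\theta_0)^2\,\big(x - E_W[x \mid x'\theta_0]\big)\big(x - E_W[x \mid x'\theta_0]\big)'\,\Big],
\]

where σ²(x) = Var(y | x) and E_W denotes the W-weighted conditional expectation; the unknown derivative φ' is precisely why the formula has to be estimated rather than plugged in directly.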
20. Some remarks
- Optimal reweighting
- 'inner weighting'
- the weights reduce variance AND bias
- the optimal weighting is the usual σ²(x)⁻¹
- one can show it achieves the semiparametric lower bound of Newey (1990)
- Estimation of the covariance matrix
- he introduces a kernel estimate for ∂Ê_W(x_i; θ̂)/∂θ
- Small sample properties (example): comparable with MRC, better than MS
- the further we go from normality, the better this performs relatively
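Given the sandwich components sketched after Theorem 5.2 (again up to notation), the effect of the optimal weighting is a one-line calculation: setting W(x) = σ²(x)⁻¹ makes Σ = V, so

\[
V^{-1}\,\Sigma\,V^{-1} \;=\; V^{-1}
  \;=\; \Big(E\big[\,\sigma^{-2}(x)\,\varphi'(x'\theta_0)^2\,(x - E_W[x \mid x'\theta_0])(x - E_W[x \mid x'\theta_0])'\,\big]\Big)^{-1},
\]

which is the semiparametric lower bound of Newey (1990) referred to above.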
22. Identification 1
- Suppose there is a θ ≠ θ_0 that minimizes the objective function, so

  E{ W(x) [φ(x'θ_0) − E(φ(x'θ_0) | x'θ)]² } = 0.

- Moreover, since W(x) > 0 for all x ∈ X,

  E(φ(x'θ_0) | x'θ = t) = φ(x(z)'θ_0),

- but then, after taking derivatives wrt z (normalize θ_{01} = θ_1 = 1),

  φ'(x'θ_0) [ γ_2 ∂x_2/∂z_{l'} + ... + γ_{L_1} ∂x_{L_1}/∂z_{l'} ] = 0

- for all l' ∈ {1, ..., L^0_1}, a.s. for z ∈ Z
- where γ_l = θ_{0l} − θ_{01} θ_l
- Now we need that Assumption 4.2 (3) holds for the z's for which φ'(x(z)'θ_0) ≠ 0 - then we see the first statement proved.
- Common trick: t = x'θ (not x'θ_0), this is what you really condition on...
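Writing out the differentiation step above (my own expansion): the right-hand side E(φ(·) | x'θ) is some function g of x'θ alone, so the equality φ(x(z)'θ_0) = g(x(z)'θ) holds for almost all z and, assuming g is differentiable, can be differentiated with respect to each continuous underlying regressor z_{l'}:

\[
\varphi'\big(x(z)'\theta_0\big)\sum_{l=1}^{L}\theta_{0l}\,\frac{\partial x_l}{\partial z_{l'}}
  \;=\;
g'\big(x(z)'\theta\big)\sum_{l=1}^{L}\theta_{l}\,\frac{\partial x_l}{\partial z_{l'}}.
\]

The discrete nominal regressors drop out by Assumption 4.2 (2), so only l = 1, ..., L_1 survive; eliminating g' and imposing the normalization θ_{01} = θ_1 = 1 then leads to the displayed restriction on the γ_l.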
23. Identification 2
- So now we have identified the coefficients θ_{01}, ..., θ_{0 L_1} up to a constant r
- This leaves us with

  φ(x'θ_0) = E[φ(·) | x'θ] = φ( t/r + (θ_{0,L_1+1}/r − θ_{L_1+1}) x_{L_1+1} + ... + (θ_{0L}/r − θ_L) x_L )

- after staring at it for a minute or two you realize that Assumption 4.2 (4) is ready-made for this...
24. Consistency
- We only have to show

  P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ] → 0.

- tedious algebra; the only idea: build a bridge of intermediate terms by using E_W(·) instead of Ê_W(·), then the triangle inequality and identification give the desired result
- intuition
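A schematic of the 'bridge' (my own compression of the argument): let J̃_n(θ) denote the infeasible objective that uses E_W(·) in place of Ê_W(·). Then for θ outside B(θ_0),

\[
J_n(\theta) - J_n(\theta_0)
  \;=\; \big[J_n(\theta) - \tilde{J}_n(\theta)\big]
  \;+\; \big[\tilde{J}_n(\theta) - \tilde{J}_n(\theta_0)\big]
  \;+\; \big[\tilde{J}_n(\theta_0) - J_n(\theta_0)\big].
\]

The first and third terms are kernel-estimation errors that vanish uniformly over Θ; the middle term converges to J(θ) − J(θ_0), which identification keeps bounded away from zero on Θ∖B(θ_0). Together these make P[inf_{θ∈Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0)] → 0.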
25. Asymptotic Normality
- basically the standard proof from Newey-McFadden
- assuming the kernel estimator is uniformly consistent, with some restriction (Lipschitz)