This document summarizes Hidehiko Ichimura's 1993 paper on semiparametric least squares estimation of single-index models. It provides an overview of single-index models and assumptions required for identification and estimation. Key results discussed include:
1) The SLS estimator is consistent under certain regularity conditions on the model and kernel estimator.
2) Under additional moment and identification assumptions, the SLS estimator is asymptotically normal with the standard sandwich variance formula.
3) Optimal weighting of the SLS estimator achieves the semiparametric efficiency bound.
1. Hidehiko Ichimura
Semiparametric Least Squares (SLS) and weighted SLS estimation of single-index models¹
Péter Tóth
toth.peter@utexas.edu
UT Austin
¹ Journal of Econometrics 58 (1993), pp. 71-120.
3. Index models
- Semiparametric model: the index parametrizing the DGP (a distribution) consists of two parts, θ and φ, where θ ∈ Θ lies in a finite-dimensional space, while φ ∈ Φ lies in an infinite-dimensional space
- Single-index model: the DGP is assumed to be

  y_i = φ(h(x_i; θ_0)) + ε_i   for i = 1, 2, 3, ..., n

- where {(x_i, y_i)} is the observed iid sample,
- θ_0 ∈ R^M is the true (finite-dimensional) parameter vector,
- E[ε_i | x_i] = 0,
- h(·) is known up to the parameter θ, but φ(·) is not.
- Examples.
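As a concrete illustration of the "Examples" bullet above (my own example, not from the slides): a binary-choice model y_i = 1{x_i'θ_0 ≥ ε_i} with ε_i independent of x_i gives E[y_i | x_i] = F_ε(x_i'θ_0), i.e. a linear single-index model in which φ = F_ε is unknown whenever the error distribution is left unspecified. Below is a minimal Python sketch of simulating from such a single-index DGP; the link np.tanh, the sample size, and all numbers are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_single_index(n, theta0, phi):
    """Draw (x, y) from y_i = phi(x_i' theta0) + eps_i with E[eps_i | x_i] = 0."""
    x = rng.normal(size=(n, len(theta0)))          # regressors
    index = x @ theta0                             # linear index h(x, theta0) = x' theta0
    eps = rng.normal(scale=0.3, size=n)            # mean-zero error, independent of x
    return x, phi(index) + eps

theta0 = np.array([1.0, -0.5])                     # true coefficients (identified only up to scale)
x, y = simulate_single_index(500, theta0, phi=np.tanh)
```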
5. Three observations
- 1. The variation in y results from the variation in ε and the variation in x (or in h(x; θ_0))
- 2. On the 'contour line', where h(x; θ_0) = const, all the variation in y comes from ε
- 3. (2) does not (necessarily) hold for θ ≠ θ_0
- So what will be the identification strategy for θ_0?
- What would be the estimation strategy?
- Caveat: we have conditional moments
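Observations (1)-(3) can be summarized in a variance decomposition (my own rephrasing of the standard argument): for any θ, using E[ε | x] = 0,

\[
E\big[(y - E[y \mid h(x,\theta)])^2\big]
  \;=\; E\big[\mathrm{Var}(y \mid x)\big]
  \;+\; E\Big[\big(\varphi(h(x,\theta_0)) - E[\varphi(h(x,\theta_0)) \mid h(x,\theta)]\big)^2\Big].
\]

The first term does not depend on θ; the second term is zero at θ = θ_0, where conditioning on the index pins down φ(h(x, θ_0)) exactly, and the identification assumptions are there to make it strictly positive away from θ_0 (up to the scale normalization discussed later). This is the population version of the SLS objective introduced on the next slides.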
6. SLS estimator 1
- an extremum estimator
- if we knew the conditional mean, the objective function would be the sample analogue of some measure of variation, in particular here of the variance:

  J_n^h(θ) = (1/n) Σ_i [y_i − E(y_i | h(x_i; θ))]²

- you can also just sum over all the h-values
- if we do not know the conditional mean, we estimate it nonparametrically with a smooth kernel estimator (why?), so

  J_n(θ) = (1/n) Σ_i I(x_i ∈ X) [y_i − Ê(x_i; θ)]² + o_p(n⁻¹)
7. SLS estimator 2
- here x ∈ R^L

  Ê(x_i; θ) = [ Σ_{j≠i} y_j I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ] / [ Σ_{j≠i} I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ]

- if Σ_{j≠i} I(x_j ∈ X_n) K((h(x_i; θ) − h(x_j; θ))/a_n) ≠ 0
- otherwise: if y_i ≤ (y_min + y_max)/2, then Ê(x_i; θ) = y_min,
- otherwise Ê(x_i; θ) = y_max
- where X_n = {x : ||x − x'|| ≤ 2a_n for some x' ∈ X}
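Below is a minimal Python sketch of the leave-one-out kernel regression Ê(x_i; θ) defined above, assuming a linear index h(x, θ) = x'θ and a Gaussian kernel; the trimming indicator and the y_min/y_max fallback are simplified, and all function and variable names are my own, not the paper's.

```python
import numpy as np

def loo_kernel_mean(x, y, theta, a_n, in_Xn=None):
    """Leave-one-out kernel estimate of E[y_i | h(x_i, theta)], with h(x, theta) = x' theta.

    Sketch only: Gaussian kernel, boolean mask `in_Xn` standing in for I(x_j in X_n),
    and a simple y_min/y_max fallback when a denominator is (numerically) zero."""
    n = len(y)
    if in_Xn is None:
        in_Xn = np.ones(n, dtype=bool)
    t = x @ theta                                   # h(x_j, theta) for all j
    u = (t[:, None] - t[None, :]) / a_n             # (h(x_i) - h(x_j)) / a_n
    K = np.exp(-0.5 * u**2)                         # smooth kernel K
    K *= in_Xn[None, :]                             # trimming indicator I(x_j in X_n)
    np.fill_diagonal(K, 0.0)                        # leave-one-out: drop j = i
    num = K @ y
    den = K.sum(axis=1)
    fallback = np.where(y <= (y.min() + y.max()) / 2, y.min(), y.max())
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), fallback)
```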
8. SLS estimator 3
- cont'd:
- a_n → 0, but √n · a_n → ∞, a positive sequence
- K : R → R is a smooth kernel function
- if all denominators of the kernel regression function are zero, θ̂ is defined to be 0
- NLS vs SLS
- WSLS: a weighting function W(x) will be introduced in the kernel numerator and denominator and in the objective function itself (in front of the quadratic term)
- 0 ≤ W(x) ≤ W̄
- From now on we will restrict our attention to linear h(·)'s
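Putting the pieces together, here is a sketch of the (unweighted) SLS objective and its minimization, reusing simulate_single_index and loo_kernel_mean from the sketches above. The scale normalization θ_1 = 1, the bandwidth rule, and the use of Nelder-Mead are ad hoc choices for illustration, not the paper's recommendations.

```python
import numpy as np
from scipy.optimize import minimize

def sls_objective(theta_rest, x, y, a_n, trim):
    """J_n(theta) = (1/n) sum_i I(x_i in X) [y_i - E_hat(x_i; theta)]^2,
    with theta = (1, theta_rest) imposing the scale normalization."""
    theta = np.concatenate(([1.0], theta_rest))
    e_hat = loo_kernel_mean(x, y, theta, a_n, in_Xn=trim)
    return np.mean(trim * (y - e_hat) ** 2)

trim = np.ones(len(y), dtype=bool)                 # no trimming in this toy example
a_n = len(y) ** (-1 / 5)                           # ad hoc bandwidth with a_n -> 0
res = minimize(sls_objective, x0=np.zeros(x.shape[1] - 1),
               args=(x, y, a_n, trim), method="Nelder-Mead")
theta_hat = np.concatenate(([1.0], res.x))         # estimate of theta_0, up to scale
```

For WSLS one would additionally multiply the kernel weights by W(x_j) and the squared residuals by W(x_i), as described on this slide.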
10. Assumption 4.1.-2.
- Assumption 4.1: the φ(·) function is differentiable and not constant on the whole support of x'θ_0
11. (cont'd)
- nominal regressors: these are the x vectors with l-th member (variable) x_l, which can actually be thought of as functions x_l(z) of underlying regressors (z_1, z_2, ..., z_{L^0})
- we assume the underlying regressors are either continuous or discrete
- the first L_1 nominal and L^0_1 underlying regressors have continuous marginals (the rest are discrete)
- Assumption 4.2
- (1) x_l(·) for all l has partial derivatives wrt the continuous underlying regressors
- (2) for the discrete nominal regressors, ∂x_l/∂z_{l'} = 0 for l = L_1+1, L_1+2, ..., L and l' = 1, 2, ..., L^0_1, almost everywhere in z
12. (cont'd)
- (3) ⋂_{l'=1}^{L^0_1} { (s^{l'}_1, ..., s^{l'}_{L_1}) : s^{l'}_l = ∂x_l/∂z_{l'} at some z ∈ Z }^⊥ = {0}
- (4) (i) for each θ ∈ Θ there is an open set T and at least L − L_1 + 1 constant vectors c^l = (c^l_{L_1+1}, ..., c^l_L) for l = 0, 1, ..., L − L_1 such that
- c^l − c^0 are linearly independent (for l = 1, 2, ..., L − L_1)
- T is contained in ⋂_{l=0}^{L−L_1} { t : t = θ_1 x_1(z) + ... + θ_{L_1} x_{L_1}(z) + θ_{L_1+1} c^l_{L_1+1} + ... + θ_L c^l_L, z ∈ Z(c^l) },
  where Z(c^l) = { z ∈ Z : x_{L_1+1}(z) = c^l_{L_1+1}, ..., x_L(z) = c^l_L }
- (ii) and φ(·) is not periodic on T
13. Identification: Theorem 4.1.
Consider the linear single-index model defined above. If there is a continuous regressor that has a non-zero coefficient, then Assumptions 4.1 and 4.2 (1)-(3) imply that θ_0 is identified up to a scalar constant for all continuous regressors. In addition, if 4.2 (4) is satisfied, then the coefficients of the discrete regressors are also identified (up to a scalar constant).
- Intuition for the proof: assume there is another θ ≠ θ_0 that minimizes the objective function and derive a contradiction.
14. What does Assumption 4.2 rule out?
- Ex. 4.2: when x_1 = z and x_2 = z²
- Ex. 4.3: x_1 = z_1 ∈ [0, 1] and x_2 = z_2 ∈ {0, 1}
- so 4.2 (3) is essentially a non-constancy/invertibility condition
- ... while 4.2 (4) is a support-like condition
- we need either a small enough or a large enough (though not necessarily full!) support for the continuous variable
- what is the problem with a periodic φ(·)?
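To make Ex. 4.2 concrete (my own worked version of the usual argument): with x_1 = z and x_2 = z², take any (θ_1, θ_2) such that t(z) = θ_1 z + θ_2 z² is strictly monotone on Z (for instance θ_1 + 2θ_2 z > 0 on Z), with inverse z(t). Then

\[
E[y \mid z] \;=\; \varphi\big(\theta_{01} z + \theta_{02} z^2\big)
           \;=\; \tilde{\varphi}\big(\theta_1 z + \theta_2 z^2\big),
\qquad
\tilde{\varphi}(t) \;:=\; \varphi\big(\theta_{01}\, z(t) + \theta_{02}\, z(t)^2\big),
\]

so (θ_1, θ_2) with link φ̃ is observationally equivalent to (θ_{01}, θ_{02}) with link φ once φ is treated as unknown, and the coefficients are not identified even up to scale; functionally dependent regressors like these are exactly what Assumption 4.2 rules out.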
16. Assumptions 5.1-2
- Some objects:
- X is a subset of x's support
- T(X; θ) = {t ∈ R : t = h(x; θ) for some x ∈ X}
- f(t; θ) is the Lebesgue density of t = h(x; θ) (aww Lord)
- Assumption 5.1: iid sample
- Assumption 5.2: θ_0 ∈ Int(Θ), where Θ ⊂ R^M is a compact set
17. Assumptions 5.3-6
- Assumption 5.3:
- 1. X is compact
- 2. inf_{x∈X} f(h(x; θ); θ) > 0
- 3. f(t; θ) and E[y | h(x; θ) = t] are three times differentiable wrt t, and the third derivative is Lipschitz jointly in both arguments
- Assumption 5.4: y is in L^m (for some m ≥ 2), and the conditional variance of y (given x) is uniformly bounded and bounded away from 0 on X
- Assumption 5.5: h(x; θ) is Lipschitz jointly on X × Θ
- Assumption 5.6: on the kernel K, besides the usual conditions we require that its second derivative is Lipschitz
18. Consistency: Theorem 5.1
If Assumptions 5.1-6 hold, the (W)SLS estimator defined above is consistent.
- the proof uses that P[J_n(θ̂) ≤ J_n(θ_0)] = 1, and then observes that

  P[J_n(θ̂) ≤ J_n(θ_0)] = P[J_n(θ̂) ≤ J_n(θ_0), θ̂ ∈ B(θ_0)] + P[J_n(θ̂) ≤ J_n(θ_0), θ̂ ∉ B(θ_0)]
                        ≤ P[θ̂ ∈ B(θ_0)] + P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ]

- where B(θ_0) is an open ball around θ_0 with some radius, and

  P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ] → 0.

- Alternatively, I think after establishing uniform consistency of the kernel estimator one could use the continuous mapping theorem and the consistency theorem for extremum estimators
19. Asymptotic normality: Theorem 5.2
Under Assumptions 5.1-6, if y has at least 3 absolute moments, θ_0 is identified, and regularity conditions on a_n are satisfied, then

  √n (θ̂ − θ_0) → N(0, V⁻¹ Σ V⁻¹),

where the variance is just the usual sandwich form.
- Note that the usual sandwich formula is not directly feasible here, since it contains the derivative of the φ(·) function, which is unknown...
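For orientation only (my recollection of the standard statement for the linear index x'θ, ignoring trimming; this should be checked against Theorem 5.2 in the paper): the matrices in the sandwich take the form

\[
V \;=\; E\Big[\,W(x)\,\varphi'(x'\theta_0)^2\,\big(x - E_W[x \mid x'\theta_0]\big)\big(x - E_W[x \mid x'\theta_0]\big)'\,\Big],
\]
\[
\Sigma \;=\; E\Big[\,W(x)^2\,\sigma^2(x)\,\varphi'(x'\theta_0)^2\,\big(x - E_W[x \mid x'\theta_0]\big)\big(x - E_W[x \mid x'\theta_0]\big)'\,\Big],
\]

where σ²(x) = Var(y | x) and E_W denotes the W-weighted conditional expectation; the unknown derivative φ' is precisely why the formula has to be estimated rather than plugged in directly.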
20. Some remarks
- Optimal reweighting
- 'inner weighting'
- the weights reduce variance AND bias
- the optimal weighting is the usual σ²(x)⁻¹
- one can show it achieves the semiparametric lower bound of Newey (1990)
- Estimation of the covariance matrix
- he introduces a kernel estimate for ∂Ê_W(x_i; θ̂)/∂θ
- Small sample properties (example): comparable with MRC, better than MS
- the further we go from normality, the better this performs relatively
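Given the sandwich components sketched after Theorem 5.2 (again up to notation), the effect of the optimal weighting is a one-line calculation: setting W(x) = σ²(x)⁻¹ makes Σ = V, so

\[
V^{-1}\,\Sigma\,V^{-1} \;=\; V^{-1}
  \;=\; \Big(E\big[\,\sigma^{-2}(x)\,\varphi'(x'\theta_0)^2\,(x - E_W[x \mid x'\theta_0])(x - E_W[x \mid x'\theta_0])'\,\big]\Big)^{-1},
\]

which is the semiparametric lower bound of Newey (1990) referred to above.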
22. Identification 1
- Suppose there is a θ ≠ θ_0 that minimizes the objective function, so

  E{ W(x) [φ(x'θ_0) − E(φ(x'θ_0) | x'θ)]² } = 0.

- Moreover, since W(x) > 0 for all x ∈ X,

  E(φ(x'θ_0) | x'θ = t) = φ(x(z)'θ_0),

- but then, after taking derivatives wrt z (normalize θ_{01} = θ_1 = 1),

  φ'(x'θ_0) [ γ_2 ∂x_2/∂z_{l'} + ... + γ_{L_1} ∂x_{L_1}/∂z_{l'} ] = 0

- for all l' ∈ {1, ..., L^0_1}, a.s. for z ∈ Z
- where γ_l = θ_{0l} − θ_{01} θ_l
- Now we need that Assumption 4.2 (3) holds for the z's for which φ'(x(z)'θ_0) ≠ 0 - then we see the first statement proved.
- Common trick: t = x'θ (not x'θ_0), this is what you really condition on...
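Writing out the differentiation step above (my own expansion): the right-hand side E(φ(·) | x'θ) is some function g of x'θ alone, so the equality φ(x(z)'θ_0) = g(x(z)'θ) holds for almost all z and, assuming g is differentiable, can be differentiated with respect to each continuous underlying regressor z_{l'}:

\[
\varphi'\big(x(z)'\theta_0\big)\sum_{l=1}^{L}\theta_{0l}\,\frac{\partial x_l}{\partial z_{l'}}
  \;=\;
g'\big(x(z)'\theta\big)\sum_{l=1}^{L}\theta_{l}\,\frac{\partial x_l}{\partial z_{l'}}.
\]

The discrete nominal regressors drop out by Assumption 4.2 (2), so only l = 1, ..., L_1 survive; eliminating g' and imposing the normalization θ_{01} = θ_1 = 1 then leads to the displayed restriction on the γ_l.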
23. Identification 2
- So now we have identified the coefficients θ_{01}, ..., θ_{0 L_1} up to a constant r
- This leaves us with

  φ(x'θ_0) = E[φ(·) | x'θ] = φ( t/r + (θ_{0,L_1+1}/r − θ_{L_1+1}) x_{L_1+1} + ... + (θ_{0L}/r − θ_L) x_L )

- after staring at it for a minute or two you realize that Assumption 4.2 (4) is ready-made for this...
24. Consistency
- We only have to show

  P[ inf_{θ ∈ Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0) ] → 0.

- tedious algebra; the only idea: build a bridge of intermediate terms by using E_W(·) instead of Ê_W(·), then the triangle inequality and identification give the desired result
- intuition
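A schematic of the 'bridge' (my own compression of the argument): let J̃_n(θ) denote the infeasible objective that uses E_W(·) in place of Ê_W(·). Then for θ outside B(θ_0),

\[
J_n(\theta) - J_n(\theta_0)
  \;=\; \big[J_n(\theta) - \tilde{J}_n(\theta)\big]
  \;+\; \big[\tilde{J}_n(\theta) - \tilde{J}_n(\theta_0)\big]
  \;+\; \big[\tilde{J}_n(\theta_0) - J_n(\theta_0)\big].
\]

The first and third terms are kernel-estimation errors that vanish uniformly over Θ; the middle term converges to J(θ) − J(θ_0), which identification keeps bounded away from zero on Θ∖B(θ_0). Together these make P[inf_{θ∈Θ∖B(θ_0)} J_n(θ) ≤ J_n(θ_0)] → 0.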
25. Asymptotic Normality
- basically the standard proof from Newey-McFadden
- assuming the kernel estimator is uniformly consistent, with some restriction (Lipschitz)