The document outlines the main points of a paper on partial identification with missing data:
1. It introduces the problem of partial identification in missing data problems and surveys related literature.
2. It formalizes the general framework as estimating a parameter θ0 that depends on an unobserved variable U based on an observed variable O that is related to U.
3. The main result shows that, for a large class of missing data problems, bounds on the identified set Θ0 can be obtained by optimizing over the extreme points of the restriction set Rθ rather than the full set, which makes the optimization problem tractable.
Partial Identification Bounds with Missing Data and Unobserved Heterogeneity
Partial Identification with Missing Data
Laurent Davezies and Xavier d’Haultfœuille
CREST, Paris
Outline
Introduction
Formalization of the problem
Main result
Application
Conclusion
Partial Identification
The literature on missing data has traditionally focused on point
identification, at the price of imposing often implausible
assumptions (Missing at Random, exclusion restrictions, parametric
models...).
In the 90’s and 00’s, Manski showed that it is possible to weaken these conditions and still obtain informative bounds on parameters of interest in many missing data problems.
The literature on partial identification is now large and applies to
many other settings:
limited dependent data models: Chesher (2010), Chesher et
al. (2011), Bontemps et al. (2012)...
panel data models: Honore and Tamer (2006), Chernozhukov
et al. (2012), Rosen (2012)...
incomplete models: Ciliberto and Tamer (2009), Galichon and
Henry (2011), Beresteanu et al. (2012)...
Goal of this work
In missing data problems, partial identification often involves
infinite dimensional optimization, which may be impossible to solve
both in theory and computationally.
For specific models and parameters, closed forms of the bounds of the identified set have been derived by the method of "guess and verify". But these methods are often specific to the model and parameter at hand.
We show that for a large class of missing data problems (including
models with unobserved heterogeneity) and parameters, bounds
can be obtained by an optimization on a far smaller set than the
initial one, making the optimization often tractable.
This generalizes results of Chernozhukov et al. (2012) and
D’Haultfœuille and Rathelot (2012). Also related to Balke and Pearl (1997), Honore and Tamer (2006) and Freyberger and Horowitz (2012), but in an infinite dimensional setting.
General framework
We are interested in a parameter θ0 that depends on P0, the probability distribution of a (partly) unobserved r.v. U. Instead of U, we observe the r.v. O (which is related to U), whose probability measure is Q0. This restricts the set of distributions of U that are compatible with Q0. Moreover, one can impose additional restrictions (coming from a theory) on the distribution of U; some of these restrictions can depend on the value of the parameter θ.
q(θ0, P0) = 0: definition of the parameter θ0 + restrictions on P0 that depend on the value of θ0.
P0 ∈ R: restrictions on P0 that do not depend on θ0.
Assumption 1 (Framework)
The true parameter θ0 and distribution P0 satisfy q(θ0, P0) = 0, where q is known, and P0 ∈ R. These restrictions exhaust the information on (θ0, P0).
General framework
We are interested in Θ0 , the identification region of θ0 :
Θ0 = cl{θ ∈ Θ : ∃P ∈ R : q(θ, P) = 0}
We restrict our framework to the following assumption:
Assumption 2 (Convex restriction)
Rθ = {P ∈ R : q(θ, P) = 0} is convex for every θ ∈ Θ.
True for every problem considered in practice (to the best of our knowledge).
We also provide more precise results when Assumption 2 is
replaced by the following condition.
Assumption 3 (Convex restriction and linear parameter)
R is convex and closed for the weak convergence. Moreover, q(θ, P) = θ − ∫ f(u) dP(u), with f a known (or identifiable) real function satisfying ∫ |f(u)| dP0(u) < ∞.
General framework
Example 1: missing data with a known link.
We are interested in a moment of U, so θ0 = ∫ f(u) dP0(u), but we do not observe U, only O = s(U), where s is known and, in general, noninjective (loss of information).
This case covers for instance:
sample selection model: U = (D, Y , X ) and O = (D, DY , X )
treatment effects/Roy models/Ecological inference:
U = (T , Y0 , Y1 , X ) and O = (T , YT , X )
nonresponse on X : U = (D, Y , X ) and O = (D, Y , DX )
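As a concrete illustration of the first case (not taken from the slides), worst-case bounds on E(Y) in the sample selection model can be computed directly when Y is known to lie in a bounded interval: fill the missing outcomes with the worst possible values. The function name and toy data below are ours:

```python
import numpy as np

def worst_case_bounds(d, y, y_lo, y_hi):
    """Worst-case bounds on E(Y) when Y is observed only if D == 1
    and Y is known to lie in [y_lo, y_hi].
    Lower bound: fill missing Y with y_lo; upper bound: fill with y_hi."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    lower = np.mean(d * y + (1.0 - d) * y_lo)
    upper = np.mean(d * y + (1.0 - d) * y_hi)
    return lower, upper

# toy data: Y in [0, 1], two of four observations missing
d = np.array([1, 1, 0, 0])
y = np.array([0.2, 0.8, 0.0, 0.0])  # entries where D == 0 are ignored
lo, hi = worst_case_bounds(d, y, 0.0, 1.0)  # -> (0.25, 0.75)
```

The width of the interval equals (y_hi − y_lo) times the missingness rate, which is why informative bounds require a bounded outcome or additional restrictions.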
General framework
Example 1: missing data with a known link (continued).
In this case:
Q0(A) = P(O ∈ A) = P(s(U) ∈ A) = ∫ 1{s(u) ∈ A} dP0(u).
Then q(θ, P) = θ − ∫ f(u) dP(u) and R is the following set of probability distributions:
R = { P : Q0(A) = ∫ 1{s(u) ∈ A} dP(u) for all measurable A, and ∫ |f(u)| dP(u) < ∞ }.
Alternatively, q and R can be adapted to other definitions of θ0 (quantile, inequality index, regression coefficient...).
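When U and O are discrete, the extreme points of R are easy to describe: within each cell {u : s(u) = o} they put all conditional mass on a single point, so bounds on θ0 = ∫ f(u) dP(u) reduce to per-cell minima and maxima. A minimal sketch under this discreteness assumption (the function and example are ours, not the paper's):

```python
def bounds_known_link(support_u, f, s, q_obs):
    """Bounds on E[f(U)] when only O = s(U) is observed, with O discrete
    and distributed according to q_obs (dict: value of O -> probability).
    Extreme points of the restriction set are degenerate within each
    cell {u : s(u) = o}, so optimizing over them is a per-cell min/max."""
    lo = hi = 0.0
    for o, q in q_obs.items():
        cell = [f(u) for u in support_u if s(u) == o]
        lo += q * min(cell)
        hi += q * max(cell)
    return lo, hi

# toy example: O tells us whether U < 2, each value of O has probability
# 1/2 under Q0; f is the identity, so we bound E(U)
lo, hi = bounds_known_link([0, 1, 2, 3], lambda u: u, lambda u: u // 2,
                           {0: 0.5, 1: 0.5})  # -> (1.0, 2.0)
```

This is the tractability the main result delivers in general: the optimization runs over a small parametrized family of degenerate distributions instead of the full infinite dimensional restriction set.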
General framework
Example 2: unobserved heterogeneity (details in the supplementary material).
Example 3: incomplete models and games with multiple equilibria (details in the supplementary material).
Extreme points of convex sets of distributions
It is difficult to compute Θ0 = {θ ∈ Θ : Rθ ≠ ∅} directly. We try to simplify this problem.
For a closed and convex set C, let ext(C) denote the set of extreme points of C, i.e. the elements of C that are not a nontrivial mixture of elements of C.
Theorem 1 (Main result)
1. Under Assumptions 1 and 2,
Θ0 = {θ ∈ Θ : ext(Rθ) ≠ ∅}.
2. Moreover, if Assumption 3 also holds, then:
θ̲ = inf Θ0 = inf_{P ∈ ext(R) ∩ I(f)} ∫ f(u) dP(u),
θ̄ = sup Θ0 = sup_{P ∈ ext(R) ∩ I(f)} ∫ f(u) dP(u).
Extreme points of convex sets of distributions
In finite dimension, closed, bounded and convex sets are the convex hull of their extreme points.
When the distribution P0 is known to be concentrated on a finite number of elements of R^k, Rθ is included in a finite dimensional vector space, and in this case the result is straightforward.
We extend this result to the case where P0 is concentrated on any closed subset of R^k, in which case Rθ is infinite dimensional.
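The finite-dimensional case is the familiar linear programming fact: a linear objective over {p ≥ 0 : Ap = b} attains its maximum at a basic feasible solution, i.e. at an extreme point of the polytope. A brute-force sketch of this (ours, and practical only in small dimensions):

```python
import itertools
import numpy as np

def maximize_over_extreme_points(A, b, c):
    """Maximize c @ p over the polytope {p >= 0 : A p = b} by enumerating
    basic feasible solutions, i.e. its extreme points. Assumes the
    polytope is bounded and nonempty; mirrors the structure of Theorem 1."""
    m, n = A.shape
    best_val, best_p = -np.inf, None
    for cols in itertools.combinations(range(n), m):
        sub = A[:, cols]
        if np.linalg.matrix_rank(sub) < m:
            continue  # these columns do not form a basis
        basic = np.linalg.solve(sub, b)
        if (basic < -1e-9).any():
            continue  # infeasible: negative probabilities
        p = np.zeros(n)
        p[list(cols)] = basic
        val = float(c @ p)
        if val > best_val:
            best_val, best_p = val, p
    return best_val, best_p

# probabilities on 3 support points summing to 1: the extreme points are
# the point masses, so the bound is simply the largest coefficient of c
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
val, p = maximize_over_extreme_points(A, b, np.array([0.2, 0.5, 0.1]))
# val == 0.5, attained at the point mass p = (0, 1, 0)
```

The infinite dimensional extension in the paper replaces vertex enumeration with a characterization of ext(Rθ) via Choquet's theorem.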
Extreme points of convex sets of distributions
In infinite dimension, closed, bounded and convex sets are not characterized by their extreme points:
[No extreme points] Let K denote the set of real-valued continuous functions f on [0;1] such that sup_{x∈[0;1]} |f(x)| ≤ 1 and f(0) = 0. K is a bounded, closed and convex set for the supremum norm in the Banach space of continuous functions from [0;1] to R. However, ext(K) is empty.
[No closure of convex hull] Let K be the set of real-valued continuous functions f on [−1;1] such that sup_{x∈[−1;1]} |f(x)| ≤ 1. K is a bounded, closed and convex set of a Banach space, and ext(K) = {f : f(x) = 1 for all x ∈ [−1;1], or f(x) = −1 for all x ∈ [−1;1]}, yet cl(co(ext(K))) ≠ K.
[No continuity of linear forms] Linear forms are not necessarily continuous in infinite dimensional spaces, so even if K = cl(co(ext(K))) and l is a linear form on K, one can have
sup_{x∈K} l(x) ≠ sup_{x∈ext(K)} l(x).
SSM without exclusion restriction
We are interested in the distribution of (Y, X), but we only observe a sample of the variables (D, DY, X), with D = 1 if Y is observed and D = 0 otherwise.
In such a case:
MAR: D ⊥⊥ Y | X ⇒ point identification of P_{Y,X,D}.
Standard exclusion restriction: Y ⊥⊥ X. If there exists x such that P(D = 1|X = x) = 1 ⇒ point identification of P_{Y,X,D}; otherwise, only partial identification.
Nonstandard restriction: D ⊥⊥ X | Y. If Y and X are sufficiently dependent (rank condition or completeness condition) ⇒ point identification; otherwise, partial identification.
SSM without exclusion restriction
Instead of exclusion restrictions, we assume monotonicity conditions:
Assumption 4 (Monotonicity in X : MX)
x → E(D|Y , X = x) is increasing almost surely.
Assumption 5 (Monotonicity in Y : MY)
y → E(D|Y = y , X ) is increasing almost surely.
θ0 is defined by a finite number of moments of (Y, X) (e.g., regression coefficients, quantiles...). In this case, Rθ can be deduced from R.
We first assume that Supp(X) = {x1, ..., xJ}, while Y may have any support.
SSM with monotonicity on X
Instead of R, we can consider C, the set of possible probability distributions of Y | D = 0, X = xj for j = 1, ..., J.
We show that no constraint is imposed on P_{Y|D=0,X=xJ}. Thus its extreme elements are simply Dirac measures. Then we show that
f_{Y|D=0,X=xj}(y) = r_{xj+1,xj}(y) [ρ(y, xj)/ρ(y, xj+1)] f_{Y|D=0,X=xj+1}(y),   (1)
with ρ(y, x) = 1/P(D = 1|Y = y, X = x) − 1 and
r_{xi,xj}(y) = [P(D = 1|X = xj) P(D = 0|X = xi) f_{Y|D=1,X=xj}(y)] / [P(D = 1|X = xi) P(D = 0|X = xj) f_{Y|D=1,X=xi}(y)].
By MX, the ratio ρ(y, xj)/ρ(y, xj+1) in (1) is greater than one. Thus,
f_{Y|D=0,X=xj}(y) = r_{xj,xj+1}(y) f_{Y|D=0,X=xj+1}(y) + qj(y), with qj ≥ 0.   (2)
SSM with monotonicity on X
qj may be seen as the density of a (non-probability) measure Qj. Because Qj admits no restriction, the extreme elements for Qj are weighted Dirac measures.
Then, by induction, the extreme elements of P_{Y|D=0,X=xj} admit at most J − j + 1 support points (and among them, J − j are common with P_{Y|D=0,X=xj+1}).
Once the support points (y1, ..., yJ) ∈ Y^J have been determined, the corresponding weights are given by (2). For instance,
P_{Y|D=0,X=x_{J−1}} = r_{x_{J−1},x_J}(y_J) δ_{y_J} + (1 − r_{x_{J−1},x_J}(y_J)) δ_{y_{J−1}}.
Thus ext(C) is parametrized by Y^J. Moreover, some support points can be discarded because they lead to negative weights.
SSM with monotonicity on Y
In this case, there is no constraint on the distribution of X so one
can reason conditional on X = x.
Because
dP_{Y|D=0,X=x}(y) = [P(D = 1|X = x)/P(D = 0|X = x)] ρ(y, x) dP_{Y|D=1,X=x}(y),
it suffices to find the extreme elements of ρ(·, x).
ρ(·, x) is decreasing and satisfies the integral equation
∫ ρ(y, x) dP_{Y|D=1,X=x}(y) = 1/P(D = 1|X = x) − 1.
The extreme elements of ρ(·, x) are Heaviside functions satisfying an "area restriction".
SSM with monotonicity on Y
1
( y, x) 1
P ( D 1Y y , X x )
1
( y, x)dF
Y D 1, X x
( y)
P( D 1 X x)
1
y
Figure: An example of extremal element under MY.
Proposition 1
Under MY in the sample selection model, we have
ext(C) = { (P_{Y|D=1,X=x1,Y≤y1}, ..., P_{Y|D=1,X=xJ,Y≤yJ}) : (y1, ..., yJ) ∈ Y^J }.
SSM with double monotonicity
Still reasoning on ρ(·, ·), we must find the extreme parts of the set of functions such that:
for all x, ρ(·, x) is decreasing;
E(ρ(Y, x)|D = 1, X = x) = 1/P(D = 1|X = x) − 1;
for all y, ρ(y, x) ≤ ρ(y, x′) if x ≥ x′.
The extreme elements are similar to the previous ones but more difficult to characterize.
One can show, for instance, that if X takes J values, then each ρ(·, xj) takes at most J values, but (ρ(·, x1), ..., ρ(·, xJ)) taken together take no more than 2J − 1 values.
SSM with double monotonicity
[Figure omitted: an example of extreme elements under MX, MY and with J = 2 — two step functions ρ(·, x0) and ρ(·, x1), each satisfying its own area restriction ∫ ρ(y, xk) dF_{Y|D=1,X=xk}(y) = 1/P(D = 1|X = xk) − 1.]
In the end, ext(C) is parametrized by R^{2J−1} × Y^{J(J−1)}.
Extensions
If #Supp(X) = +∞ and Assumption MX holds for X, it also holds for any monotone coarsening Xn = Σ_{i=1}^{n} i · 1{X ∈ [σ(i); σ(i+1))}, with −∞ = σ(1) < ... < σ(n+1) = +∞. We then obtain Θ0n, an outer region for Θ0.
In such a case, we give technical conditions under which Θ0 = ∩_{n∈N} Θ0n.
With several covariates, the results can be extended.
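The coarsening step can be sketched as follows: any increasing labeling of the bins preserves monotonicity in X, so bounds computed on the coarsened covariate give an outer region. Finite cut points stand in here for σ(1) = −∞ and σ(n + 1) = +∞, and the function name is ours:

```python
import numpy as np

def coarsen_x(x, cut_points):
    """Map X to the index of the interval [sigma(i), sigma(i+1)) that
    contains it. The labeling 0, 1, 2, ... is increasing in X, so the
    monotonicity-in-X assumption carries over to the coarsened variable."""
    x = np.asarray(x, dtype=float)
    # values below the first cut point fall in bin 0, values above the
    # last in the final bin (mimicking the unbounded end intervals)
    return np.clip(np.searchsorted(cut_points, x, side="right") - 1,
                   0, len(cut_points) - 1)

bins = coarsen_x([-3.0, 0.5, 1.7, 9.0], [0.0, 1.0, 2.0])  # -> [0, 0, 1, 2]
```

Refining the grid of cut points as n grows yields the decreasing sequence of outer regions Θ0n mentioned above.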
Conclusion
Still work in progress; comments are welcome.
Additional results: link with the methodologies used by Beresteanu et al. (random sets theory) and by Galichon et al. (optimal transport) for problems where all the constraints can be written as moment conditions.
More general: we can also use our result when the constraints are not given by moment conditions, as in our application.
Supplementary material
Example 2: unobserved heterogeneity. In this case, we suppose
that the distribution of O conditional on the unobserved
heterogeneity U is known. Then P O|U (A|U = u, θ) is known (by
the model) and P O is known (by the data).
q(θ, P) = max{ sup_A | ∫ P^{O|U}(A|u, θ) dP(u) − P(O ∈ A) | , | ∫ g(u, θ) dP(u) | }
This covers the semiparametric nonlinear panel model: O = ((Yt)_{t=1,...,T}, (Xt)_{t=1,...,T}), U = ((Xt)_{t=1,...,T}, α), with Yit = 1{Xit β0 + αi + εit ≥ 0}, where the (εt)_{t=1,...,T} are i.i.d., independent of (X, α), with a known distribution, and β0 is a subvector of θ0.
If θ0 = β0, then g(u, θ) = 0; if θ0 = (β0, ∆0), where ∆0 is the average marginal effect of one binary covariate X1, then:
g(x1, x2, a, β, ∆) = E(Yt|X1t = 1, X2t = x2, α = a, β) − E(Yt|X1t = 0, X2t = x2, α = a, β) − ∆.
Applies to many other settings (see also Chernozhukov et al., 2012).
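To make the panel example concrete, here is a minimal sketch of the conditional choice probability and the moment function g, taking εt to be standard logistic purely for illustration (the slides require only a known error distribution; all names here are ours):

```python
import math

def choice_prob(x_index, alpha):
    """P(Y_t = 1 | X_t, alpha) for Y_t = 1{X_t' beta + alpha + eps_t >= 0}
    when eps_t is standard logistic (illustrative distributional choice)."""
    return 1.0 / (1.0 + math.exp(-(x_index + alpha)))

def g_marginal_effect(x2_index, alpha, beta1, delta):
    """The moment function g for theta = (beta, Delta): marginal effect of
    the binary covariate X1 at (X2, alpha), minus the candidate Delta."""
    effect = choice_prob(beta1 + x2_index, alpha) - choice_prob(x2_index, alpha)
    return effect - delta

# at beta1 = 0 the covariate has no effect, so g vanishes at delta = 0
val = g_marginal_effect(0.0, 0.0, 0.0, 0.0)  # -> 0.0
```

Averaging g over candidate distributions P of the heterogeneity α is exactly the second term inside the max defining q(θ, P) above.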
Example 3: incomplete models and games with multiple equilibria.
Y2 = 1 Y2 = 0
Y1 = 1 (θ + ε1 , θ + ε2 ) (ε1 , 0)
Y1 = 0 (0, ε2 ) (0, 0)
Figure: payoffs of the entry game (with θ < 0)
Payoff shifters are known by the players, but the econometrician only knows that (ε1, ε2) ∼ N(0, I2).
When (ε1, ε2) ∈ [0; −θ]^2, there are two pure strategy equilibria, (Y1, Y2) ∈ {(0, 1); (1, 0)} (and one mixed strategy equilibrium).
Steps of proof of main result.
The vector space of signed measures (M, |·|_TV) is the dual of the space of continuous functions with compact support (Cb, ||·||_∞).
The Banach-Alaoglu theorem ensures that Rθ (as a closed subset of the unit ball) is compact for the weak-* topology. Moreover, the weak-* topology is metrizable by the Levy-Prokhorov metric.
Applying Choquet's theorem: ∀P ∈ Rθ, there exists a probability measure µP such that
∫ g dP = ∫_{ext(Rθ)} ( ∫ g dQ ) dµP(Q)   (3)
for every g ∈ Cb.
Considering gn → 1, one can extend the previous relation to g = 1:
1 = ∫_{ext(Rθ)} ( ∫ 1 dQ ) dµP(Q).
Then, ∀Q ∈ Supp(µP), Q ∈ ext(Rθ) ∩ 𝒫 = ext(Rθ), where 𝒫 denotes the set of probability measures.
It follows that Rθ ≠ ∅ ⇒ ext(Rθ) ≠ ∅.
For the linear parameter: apply Choquet's theorem to R instead of Rθ and consider gn ∈ Cb → f to conclude that
∫ f dP = ∫_{ext(R)∩I(f)} ( ∫ f dQ ) dµP(Q).
This ensures that
sup_{P∈R∩I(f)} ∫ f dP ≤ sup_{Q∈ext(R)∩I(f)} ∫ f dQ.
The reverse inequality is straightforward.