The document outlines the main points of a paper on partial identification with missing data:
1. It introduces the problem of partial identification in missing data problems and surveys related literature.
2. It formalizes the general framework as estimating a parameter θ0 that depends on an unobserved variable U based on an observed variable O that is related to U.
3. The main result shows that, for a large class of missing data problems, bounds on the identified set Θ0 can be obtained by optimizing over the extreme points of the restriction set Rθ rather than the full set, which makes the optimization problem tractable.
Partial Identification Bounds with Missing Data and Unobserved Heterogeneity
Partial Identification with Missing Data
Laurent Davezies and Xavier d’Haultfœuille
CREST, Paris
Outline
Introduction
Formalization of the problem
Main result
Application
Conclusion
Partial Identification
The literature on missing data has traditionally focused on point
identification, at the price of imposing often implausible
assumptions (Missing at Random, exclusion restrictions, parametric
models...).
In the 90’s and 00’s, Manski showed that it is possible to weaken these conditions and still obtain informative bounds on parameters of interest in many missing data problems.
The literature on partial identification is now large and applies to
many other settings:
limited dependent data models: Chesher (2010), Chesher et
al. (2011), Bontemps et al. (2012)...
panel data models: Honore and Tamer (2006), Chernozhukov
et al. (2012), Rosen (2012)...
incomplete models: Ciliberto and Tamer (2009), Galichon and
Henry (2011), Beresteanu et al. (2012)...
Goal of this work
In missing data problems, partial identification often involves
infinite dimensional optimization, which may be impossible to solve
both in theory and computationally.
For specific models and parameters, closed forms of the bounds of the identified set have been derived by the method of "guess and verify". But these methods are often specific to the model and parameter at hand.
We show that for a large class of missing data problems (including
models with unobserved heterogeneity) and parameters, bounds
can be obtained by an optimization on a far smaller set than the
initial one, making the optimization often tractable.
This generalizes results of Chernozhukov et al. (2012) and
D’Haultfœuille and Rathelot (2012). Also related to Balke and Pearl (1997), Honore and Tamer (2006) and Freyberger and Horowitz (2012), but in an infinite dimensional setting.
General framework
We are interested in a parameter θ0 that depends on P0, the probability distribution of a (partly) unobserved r.v. U. Instead of U, we observe the r.v. O (which is related to U), whose probability measure is Q0. This restricts the set of distributions of U that are compatible with Q0. Moreover, one can impose additional restrictions (coming from a theory) on the distribution of U; some of these restrictions can depend on the value of the parameter θ.
q(θ0, P0) = 0: definition of the parameter θ0 + restrictions on P0 that depend on the value of θ0.
P0 ∈ R: restrictions on P0 that do not depend on θ0.
Assumption 1 (Framework)
The true parameter θ0 and distribution P0 satisfy q(θ0, P0) = 0, where q is known, and P0 ∈ R. These restrictions exhaust the information on (θ0, P0).
General framework
We are interested in Θ0 , the identification region of θ0 :
Θ0 = cl{θ ∈ Θ : ∃P ∈ R : q(θ, P) = 0}
We restrict our framework to the following assumption:
Assumption 2 (Convex restriction)
Rθ = {P ∈ R : q(θ, P) = 0} is convex for every θ ∈ Θ.
True for every problem considered in practice (to the best of our knowledge).
We also provide more precise results when Assumption 2 is
replaced by the following condition.
Assumption 3 (Convex restriction and linear parameter)
R is convex and closed for the weak convergence. Moreover, q(θ, P) = θ − ∫ f(u) dP(u), with f a known (or identifiable) real function satisfying ∫ |f(u)| dP0(u) < ∞.
General framework
Example 1: missing data with a known link.
We are interested in a moment of U, so θ0 = ∫ f(u) dP0(u), but we do not observe U, only O = s(U), where s is known and, in general, noninjective (loss of information).
This case covers for instance:
sample selection model: U = (D, Y , X ) and O = (D, DY , X )
treatment effects/Roy models/Ecological inference:
U = (T , Y0 , Y1 , X ) and O = (T , YT , X )
nonresponse on X : U = (D, Y , X ) and O = (D, Y , DX )
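As a concrete illustration of the first case (not taken from the slides), worst-case bounds on E(Y) in the sample selection model can be computed directly when Y is known to lie in a bounded interval: fill the missing outcomes with the worst possible values. The function name and toy data below are ours:

```python
import numpy as np

def worst_case_bounds(d, y, y_lo, y_hi):
    """Worst-case bounds on E(Y) when Y is observed only if D == 1
    and Y is known to lie in [y_lo, y_hi].
    Lower bound: fill missing Y with y_lo; upper bound: fill with y_hi."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    lower = np.mean(d * y + (1.0 - d) * y_lo)
    upper = np.mean(d * y + (1.0 - d) * y_hi)
    return lower, upper

# toy data: Y in [0, 1], two of four observations missing
d = np.array([1, 1, 0, 0])
y = np.array([0.2, 0.8, 0.0, 0.0])  # entries where D == 0 are ignored
lo, hi = worst_case_bounds(d, y, 0.0, 1.0)  # -> (0.25, 0.75)
```

The width of the interval equals (y_hi − y_lo) times the missingness rate, which is why informative bounds require a bounded outcome or additional restrictions.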
General framework
Example 1: missing data with a known link (continued).
In this case:
Q0(A) = P(O ∈ A) = P(s(U) ∈ A) = ∫ 1{s(u) ∈ A} dP0(u).
Then q(θ, P) = θ − ∫ f(u) dP(u) and R is the following set of probability distributions:
R = { P : Q0(A) = ∫ 1{s(u) ∈ A} dP(u) for all measurable A, and ∫ |f(u)| dP(u) < ∞ }.
Alternatively, q and R can be adapted to other definitions of θ0 (quantile, inequality index, regression coefficient...).
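When U and O are discrete, the extreme points of R are easy to describe: within each cell {u : s(u) = o} they put all conditional mass on a single point, so bounds on θ0 = ∫ f(u) dP(u) reduce to per-cell minima and maxima. A minimal sketch under this discreteness assumption (the function and example are ours, not the paper's):

```python
def bounds_known_link(support_u, f, s, q_obs):
    """Bounds on E[f(U)] when only O = s(U) is observed, with O discrete
    and distributed according to q_obs (dict: value of O -> probability).
    Extreme points of the restriction set are degenerate within each
    cell {u : s(u) = o}, so optimizing over them is a per-cell min/max."""
    lo = hi = 0.0
    for o, q in q_obs.items():
        cell = [f(u) for u in support_u if s(u) == o]
        lo += q * min(cell)
        hi += q * max(cell)
    return lo, hi

# toy example: O tells us whether U < 2, each value of O has probability
# 1/2 under Q0; f is the identity, so we bound E(U)
lo, hi = bounds_known_link([0, 1, 2, 3], lambda u: u, lambda u: u // 2,
                           {0: 0.5, 1: 0.5})  # -> (1.0, 2.0)
```

This is the tractability the main result delivers in general: the optimization runs over a small parametrized family of degenerate distributions instead of the full infinite dimensional restriction set.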
General framework
Example 2: unobserved heterogeneity (details in the supplementary material).
Example 3: incomplete models and games with multiple equilibria (details in the supplementary material).
Extreme points of convex sets of distributions
It is difficult to compute Θ0 = {θ ∈ Θ : Rθ ≠ ∅} directly. We try to simplify this problem.
For a closed and convex set C, let ext(C) denote the set of extreme points of C, i.e. the elements of C that are not a nontrivial mixture of elements of C.
Theorem 1 (Main result)
1. Under Assumptions 1 and 2,
Θ0 = {θ ∈ Θ : ext(Rθ) ≠ ∅}.
2. Moreover, if Assumption 3 also holds, then:
θ̲ = inf Θ0 = inf_{P ∈ ext(R) ∩ I(f)} ∫ f(u) dP(u),
θ̄ = sup Θ0 = sup_{P ∈ ext(R) ∩ I(f)} ∫ f(u) dP(u).
Extreme points of convex sets of distributions
In finite dimension, closed, bounded and convex sets are the convex hull of their extreme points.
When the distribution P0 is known to be concentrated on a finite number of elements of R^k, Rθ is included in a finite dimensional vector space, and in this case the result is straightforward.
We extend this result to the case where P0 is concentrated on any closed subset of R^k, in which case Rθ is infinite dimensional.
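The finite-dimensional case is the familiar linear programming fact: a linear objective over {p ≥ 0 : Ap = b} attains its maximum at a basic feasible solution, i.e. at an extreme point of the polytope. A brute-force sketch of this (ours, and practical only in small dimensions):

```python
import itertools
import numpy as np

def maximize_over_extreme_points(A, b, c):
    """Maximize c @ p over the polytope {p >= 0 : A p = b} by enumerating
    basic feasible solutions, i.e. its extreme points. Assumes the
    polytope is bounded and nonempty; mirrors the structure of Theorem 1."""
    m, n = A.shape
    best_val, best_p = -np.inf, None
    for cols in itertools.combinations(range(n), m):
        sub = A[:, cols]
        if np.linalg.matrix_rank(sub) < m:
            continue  # these columns do not form a basis
        basic = np.linalg.solve(sub, b)
        if (basic < -1e-9).any():
            continue  # infeasible: negative probabilities
        p = np.zeros(n)
        p[list(cols)] = basic
        val = float(c @ p)
        if val > best_val:
            best_val, best_p = val, p
    return best_val, best_p

# probabilities on 3 support points summing to 1: the extreme points are
# the point masses, so the bound is simply the largest coefficient of c
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
val, p = maximize_over_extreme_points(A, b, np.array([0.2, 0.5, 0.1]))
# val == 0.5, attained at the point mass p = (0, 1, 0)
```

The infinite dimensional extension in the paper replaces vertex enumeration with a characterization of ext(Rθ) via Choquet's theorem.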
Extreme points of convex sets of distributions
In infinite dimension, closed, bounded and convex sets are not characterized by their extreme points:
[No extreme points] Let K denote the set of real-valued continuous functions f on [0;1] such that sup_{x∈[0;1]} |f(x)| ≤ 1 and f(0) = 0. K is a bounded, closed and convex set for the supremum norm in the Banach space of continuous functions from [0;1] to R. However, ext(K) is empty.
[No closure of convex hull] Let K be the set of real-valued continuous functions f on [−1;1] such that sup_{x∈[−1;1]} |f(x)| ≤ 1. K is a bounded, closed and convex set of a Banach space, and ext(K) = {f : f(x) = 1 for all x ∈ [−1;1], or f(x) = −1 for all x ∈ [−1;1]}, yet cl(co(ext(K))) ≠ K.
[No continuity of linear forms] Linear forms are not necessarily continuous in infinite dimensional spaces, so even if K = cl(co(ext(K))) and l is a linear form on K, one can have
sup_{x∈K} l(x) ≠ sup_{x∈ext(K)} l(x).
SSM without exclusion restriction
We are interested in the distribution of (Y, X), but we only observe a sample of the variables (D, DY, X), with D = 1 if Y is observed and D = 0 otherwise.
In such a case:
MAR: D ⊥⊥ Y | X ⇒ point identification of P_{Y,X,D}.
Standard exclusion restriction: Y ⊥⊥ X. If there exists x such that P(D = 1|X = x) = 1 ⇒ point identification of P_{Y,X,D}; otherwise, only partial identification.
Nonstandard restriction: D ⊥⊥ X | Y. If Y and X are sufficiently dependent (rank condition or completeness condition) ⇒ point identification; otherwise, partial identification.
SSM without exclusion restriction
Instead of exclusion restrictions, we assume monotonicity conditions:
Assumption 4 (Monotonicity in X : MX)
x → E(D|Y , X = x) is increasing almost surely.
Assumption 5 (Monotonicity in Y : MY)
y → E(D|Y = y , X ) is increasing almost surely.
θ0 is defined by a finite number of moments of (Y, X) (e.g., regression coefficients, quantiles...). In this case, Rθ can be deduced from R.
We first assume that Supp(X) = {x1, ..., xJ}, while Y may have any support.
SSM with monotonicity on X
Instead of R, we can consider C, the set of possible probability distributions of Y | D = 0, X = xj for j = 1, ..., J.
We show that no constraint is imposed on P_{Y|D=0,X=xJ}. Thus its extreme elements are simply Dirac measures. Then we show that
f_{Y|D=0,X=xj}(y) = r_{xj+1,xj}(y) [ρ(y, xj)/ρ(y, xj+1)] f_{Y|D=0,X=xj+1}(y),   (1)
with ρ(y, x) = 1/P(D = 1|Y = y, X = x) − 1 and
r_{xi,xj}(y) = [P(D = 1|X = xj) P(D = 0|X = xi) f_{Y|D=1,X=xj}(y)] / [P(D = 1|X = xi) P(D = 0|X = xj) f_{Y|D=1,X=xi}(y)].
By MX, the ratio ρ(y, xj)/ρ(y, xj+1) in (1) is greater than one. Thus,
f_{Y|D=0,X=xj}(y) = r_{xj,xj+1}(y) f_{Y|D=0,X=xj+1}(y) + qj(y), with qj ≥ 0.   (2)
SSM with monotonicity on X
qj may be seen as the density of a (non-probability) measure Qj. Because Qj admits no restriction, the extreme elements for Qj are weighted Dirac measures.
Then, by induction, the extreme elements of P_{Y|D=0,X=xj} admit at most J − j + 1 support points (and among them, J − j are common with P_{Y|D=0,X=xj+1}).
Once the support points (y1, ..., yJ) ∈ Y^J have been determined, the corresponding weights are given by (2). For instance,
P_{Y|D=0,X=x_{J−1}} = r_{x_{J−1},x_J}(y_J) δ_{y_J} + (1 − r_{x_{J−1},x_J}(y_J)) δ_{y_{J−1}}.
Thus ext(C) is parametrized by Y^J. Moreover, some support points can be discarded because they lead to negative weights.
SSM with monotonicity on Y
In this case, there is no constraint on the distribution of X so one
can reason conditional on X = x.
Because
dP_{Y|D=0,X=x}(y) = [P(D = 1|X = x)/P(D = 0|X = x)] ρ(y, x) dP_{Y|D=1,X=x}(y),
it suffices to find the extreme elements of ρ(·, x).
ρ(·, x) is decreasing and satisfies the integral equation
∫ ρ(y, x) dP_{Y|D=1,X=x}(y) = 1/P(D = 1|X = x) − 1.
The extreme elements of ρ(·, x) are Heaviside functions satisfying an "area restriction".
SSM with monotonicity on Y
1
( y, x) 1
P ( D 1Y y , X x )
1
( y, x)dF
Y D 1, X x
( y)
P( D 1 X x)
1
y
Figure: An example of extremal element under MY.
Proposition 1
Under MY in the sample selection model, we have
ext(C) = { (P_{Y|D=1,X=x1,Y≤y1}, ..., P_{Y|D=1,X=xJ,Y≤yJ}) : (y1, ..., yJ) ∈ Y^J }.
SSM with double monotonicity
Still reasoning on ρ(·, ·), we must find the extreme parts of the set of functions such that:
for all x, ρ(·, x) is decreasing;
E(ρ(Y, x)|D = 1, X = x) = 1/P(D = 1|X = x) − 1;
for all y, ρ(y, x) ≤ ρ(y, x′) if x ≥ x′.
The extreme elements are similar to the previous ones but more difficult to characterize.
One can show, for instance, that if X takes J values, then each ρ(·, xj) takes at most J values, but (ρ(·, x1), ..., ρ(·, xJ)) taken together take no more than 2J − 1 values.
SSM with double monotonicity
[Figure omitted: an example of extreme elements under MX, MY and with J = 2 — two step functions ρ(·, x0) and ρ(·, x1), each satisfying its own area restriction ∫ ρ(y, xk) dF_{Y|D=1,X=xk}(y) = 1/P(D = 1|X = xk) − 1.]
In the end, ext(C) is parametrized by R^{2J−1} × Y^{J(J−1)}.
Extensions
If #Supp(X) = +∞ and Assumption MX holds for X, it also holds for any monotone coarsening Xn = Σ_{i=1}^{n} i · 1{X ∈ [σ(i); σ(i+1))}, with −∞ = σ(1) < ... < σ(n+1) = +∞. We then obtain Θ0n, an outer region for Θ0.
In such a case, we give technical conditions under which Θ0 = ∩_{n∈N} Θ0n.
With several covariates, the results can be extended.
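The coarsening step can be sketched as follows: any increasing labeling of the bins preserves monotonicity in X, so bounds computed on the coarsened covariate give an outer region. Finite cut points stand in here for σ(1) = −∞ and σ(n + 1) = +∞, and the function name is ours:

```python
import numpy as np

def coarsen_x(x, cut_points):
    """Map X to the index of the interval [sigma(i), sigma(i+1)) that
    contains it. The labeling 0, 1, 2, ... is increasing in X, so the
    monotonicity-in-X assumption carries over to the coarsened variable."""
    x = np.asarray(x, dtype=float)
    # values below the first cut point fall in bin 0, values above the
    # last in the final bin (mimicking the unbounded end intervals)
    return np.clip(np.searchsorted(cut_points, x, side="right") - 1,
                   0, len(cut_points) - 1)

bins = coarsen_x([-3.0, 0.5, 1.7, 9.0], [0.0, 1.0, 2.0])  # -> [0, 0, 1, 2]
```

Refining the grid of cut points as n grows yields the decreasing sequence of outer regions Θ0n mentioned above.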
Conclusion
Still work in progress; comments are welcome.
Additional results: link with the methodologies used by Beresteanu et al. (random sets theory) and by Galichon et al. (optimal transport) for problems where all the constraints can be written as moment conditions.
More general: we can also use our result when the constraints are not given by moment conditions, as in our application.
Supplementary material
Example 2: unobserved heterogeneity. In this case, we suppose
that the distribution of O conditional on the unobserved
heterogeneity U is known. Then P O|U (A|U = u, θ) is known (by
the model) and P O is known (by the data).
q(θ, P) = max{ sup_A | ∫ P^{O|U}(A|u, θ) dP(u) − P(O ∈ A) | , | ∫ g(u, θ) dP(u) | }
This covers the semiparametric nonlinear panel model: O = ((Yt)_{t=1,...,T}, (Xt)_{t=1,...,T}), U = ((Xt)_{t=1,...,T}, α), with Yit = 1{Xit β0 + αi + εit ≥ 0}, where the (εt)_{t=1,...,T} are i.i.d., independent of (X, α), with a known distribution, and β0 is a subvector of θ0.
If θ0 = β0, then g(u, θ) = 0; if θ0 = (β0, ∆0), where ∆0 is the average marginal effect of one binary covariate X1, then:
g(x1, x2, a, β, ∆) = E(Yt|X1t = 1, X2t = x2, α = a, β) − E(Yt|X1t = 0, X2t = x2, α = a, β) − ∆.
Applies to many other settings (see also Chernozhukov et al., 2012).
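To make the panel example concrete, here is a minimal sketch of the conditional choice probability and the moment function g, taking εt to be standard logistic purely for illustration (the slides require only a known error distribution; all names here are ours):

```python
import math

def choice_prob(x_index, alpha):
    """P(Y_t = 1 | X_t, alpha) for Y_t = 1{X_t' beta + alpha + eps_t >= 0}
    when eps_t is standard logistic (illustrative distributional choice)."""
    return 1.0 / (1.0 + math.exp(-(x_index + alpha)))

def g_marginal_effect(x2_index, alpha, beta1, delta):
    """The moment function g for theta = (beta, Delta): marginal effect of
    the binary covariate X1 at (X2, alpha), minus the candidate Delta."""
    effect = choice_prob(beta1 + x2_index, alpha) - choice_prob(x2_index, alpha)
    return effect - delta

# at beta1 = 0 the covariate has no effect, so g vanishes at delta = 0
val = g_marginal_effect(0.0, 0.0, 0.0, 0.0)  # -> 0.0
```

Averaging g over candidate distributions P of the heterogeneity α is exactly the second term inside the max defining q(θ, P) above.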
Example 3: incomplete models and games with multiple equilibria.
Y2 = 1 Y2 = 0
Y1 = 1 (θ + ε1 , θ + ε2 ) (ε1 , 0)
Y1 = 0 (0, ε2 ) (0, 0)
Figure: payoffs of the entry game (with θ < 0)
Payoff shifters are known by the players, but the econometrician only knows that (ε1, ε2) ∼ N(0, I2).
When (ε1, ε2) ∈ [0; −θ]^2, there are two pure strategy equilibria, (Y1, Y2) ∈ {(0, 1); (1, 0)} (and one mixed strategy equilibrium).
Steps of proof of main result.
The vector space of signed measures (M, |·|_TV) is the dual of the space of continuous functions with compact support (Cb, ||·||_∞).
The Banach-Alaoglu theorem ensures that Rθ (as a closed subset of the unit ball) is compact for the weak-* topology. Moreover, the weak-* topology is metrizable by the Levy-Prokhorov metric.
Applying Choquet's theorem: ∀P ∈ Rθ, there exists a probability measure µP such that
∫ g dP = ∫_{ext(Rθ)} ( ∫ g dQ ) dµP(Q)   (3)
for every g ∈ Cb.
Considering gn → 1, one can extend the previous relation to g = 1:
1 = ∫_{ext(Rθ)} ( ∫ 1 dQ ) dµP(Q).
Then, ∀Q ∈ Supp(µP), Q ∈ ext(Rθ) ∩ 𝒫 = ext(Rθ), where 𝒫 denotes the set of probability measures.
It follows that Rθ ≠ ∅ ⇒ ext(Rθ) ≠ ∅.
For the linear parameter: apply Choquet's theorem to R instead of Rθ and consider gn ∈ Cb → f to conclude that
∫ f dP = ∫_{ext(R)∩I(f)} ( ∫ f dQ ) dµP(Q).
This ensures that
sup_{P∈R∩I(f)} ∫ f dP ≤ sup_{Q∈ext(R)∩I(f)} ∫ f dQ.
The reverse inequality is straightforward.