1.
Introduction Formalization Main result Application Conclusion
Partial Identification with Missing Data
Laurent Davezies and Xavier d’Haultfœuille
CREST, Paris
2.
Outline
- Introduction
- Formalization of the problem
- Main result
- Application
- Conclusion
4.
Partial Identification
The literature on missing data has traditionally focused on point identification, at the price of imposing often implausible assumptions (Missing at Random, exclusion restrictions, parametric models...).
In the 1990s and 2000s, Manski showed that it is possible to weaken these conditions and still get informative bounds on parameters of interest in many missing data problems.
The literature on partial identification is now large and applies to many other settings:
- limited dependent variable models: Chesher (2010), Chesher et al. (2011), Bontemps et al. (2012)...
- panel data models: Honore and Tamer (2006), Chernozhukov et al. (2012), Rosen (2012)...
- incomplete models: Ciliberto and Tamer (2009), Galichon and Henry (2011), Beresteanu et al. (2012)...
5.
Goal of this work
In missing data problems, partial identification often involves an infinite dimensional optimization, which may be intractable both analytically and computationally.
For specific models and parameters, closed forms of the bounds of the identified set have been derived by the method of "guess and verify". But the methods used to find bounds are often specific to the problem at hand.
We show that for a large class of missing data problems (including models with unobserved heterogeneity) and parameters, bounds can be obtained by an optimization on a far smaller set than the initial one, which often makes the optimization tractable.
This generalizes results of Chernozhukov et al. (2012) and D’Haultfœuille and Rathelot (2012). It is also related to Balke and Pearl (1997), Honore and Tamer (2006) and Freyberger and Horowitz (2012), but in an infinite dimensional setting.
7.
General framework
We are interested in a parameter θ0 that depends on P0, the probability distribution of a (partly) unobserved r.v. U.
Instead of U, we observe the r.v. O, which is related to U and whose probability distribution is Q0. This restricts the set of distributions of U that are compatible with Q0.
Moreover, one can impose additional restrictions (coming from a theory) on the distribution of U. Some of these restrictions can depend on the value of the parameter θ.
- q(θ0, P0) = 0: definition of the parameter θ0 + restrictions on P0 that depend on the value of θ0
- P0 ∈ R: restrictions on P0 that do not depend on θ0.
Assumption 1 (Framework)
The true parameter θ0 and distribution P0 satisfy q(θ0, P0) = 0, where q is known, and P0 ∈ R. These restrictions exhaust the information on (θ0, P0).
8.
General framework
We are interested in Θ0, the identification region of θ0:
Θ0 = cl{θ ∈ Θ : ∃P ∈ R : q(θ, P) = 0}
We restrict our framework to the following assumption:
Assumption 2 (Convex restriction)
Rθ = {P ∈ R : q(θ, P) = 0} is convex for every θ ∈ Θ.
True for every problem considered in practice (to the best of our knowledge).
We also provide more precise results when Assumption 2 is replaced by the following condition.
Assumption 3 (Convex restriction and linear parameter)
R is convex and closed for the weak convergence. Moreover, $q(\theta, P) = \theta - \int f(u)\,dP(u)$ with f a known (or identifiable) real function satisfying $\int |f(u)|\,dP_0(u) < \infty$.
9.
General framework
Example 1: missing data with a known link.
We are interested in a moment of U, so $\theta_0 = \int f(u)\,dP_0(u)$, but we do not observe U, only O = s(U), where s is known and noninjective in general (loss of information).
This case covers for instance:
- sample selection model: U = (D, Y, X) and O = (D, DY, X)
- treatment effects/Roy models/ecological inference: U = (T, Y_0, Y_1, X) and O = (T, Y_T, X)
- nonresponse on X: U = (D, Y, X) and O = (D, Y, DX)
10.
General framework
Example 1: missing data with a known link (continued).
In this case:
$Q_0(A) = P(O \in A) = P(s(U) \in A) = \int 1\{s(u) \in A\}\,dP_0(u).$
Then $q(\theta, P) = \theta - \int f(u)\,dP(u)$ and R is the following set of probability distributions:
$R = \left\{ P : Q_0(A) = \int 1\{s(u) \in A\}\,dP(u) \text{ for every measurable } A, \text{ and } \int |f(u)|\,dP(u) < \infty \right\}.$
Alternatively, q and R can be adapted to other definitions of θ0 (quantile, inequality index, regression coefficient...).
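To make Example 1 concrete, here is a minimal Python sketch of the classical worst-case bounds on E(Y) in the sample selection case O = (D, DY), with f(u) = y. It assumes that Y has a known bounded support [y_min, y_max]; this assumption, the function name and the toy data are illustrative and do not come from the slides.

```python
def worst_case_mean_bounds(d, y, y_min, y_max):
    """Worst-case bounds on E[Y] when Y is observed only if D = 1.

    d: list of 0/1 response indicators.
    y: list of outcomes; entries with d == 0 are ignored (they are missing).
    y_min, y_max: assumed known bounds on the support of Y (an assumption).
    """
    n = len(d)
    p_obs = sum(d) / n
    # E[D * Y]: the part of E[Y] that is identified by the data.
    observed_part = sum(yi for di, yi in zip(d, y) if di == 1) / n
    # The missing part is bounded by putting all missing mass at y_min or y_max.
    lower = observed_part + (1 - p_obs) * y_min
    upper = observed_part + (1 - p_obs) * y_max
    return lower, upper


# Toy sample (hypothetical): two of the five outcomes are missing.
d = [1, 1, 0, 1, 0]
y = [0.2, 0.8, None, 0.5, None]
lower, upper = worst_case_mean_bounds(d, y, y_min=0.0, y_max=1.0)
```

Both bounds are attained by distributions that put all the missing mass on a single point, which is the simplest instance of the extreme-point logic developed in the main result.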
11.
General framework
Example 2: unobserved heterogeneity (details in the supplementary material).
Example 3: incomplete models and games with multiple equilibria (details in the supplementary material).
13.
Extreme points of convex sets of distributions
It is difficult to compute Θ0 = {θ ∈ Θ : Rθ ≠ ∅} directly. We try to simplify this problem.
For a closed and convex set C, let ext(C) denote the set of extreme points of C, i.e. the elements of C that are not a nontrivial mixture of other elements of C.
Theorem 1 (Main result)
1. Under Assumptions 1 and 2, Θ0 = {θ ∈ Θ : ext(Rθ) ≠ ∅}.
2. Moreover, if Assumption 3 also holds, then:
$\underline{\theta} = \inf \Theta_0 = \inf_{P \in \mathrm{ext}(R) \cap I(f)} \int f(u)\,dP(u)$
$\overline{\theta} = \sup \Theta_0 = \sup_{P \in \mathrm{ext}(R) \cap I(f)} \int f(u)\,dP(u).$
14.
Extreme points of convex sets of distributions
In finite dimension, closed, bounded and convex sets are the convex hull of their extreme points.
When we know that the distribution P0 is concentrated on a finite number of elements of R^k, Rθ is included in a finite dimensional vector space, and the result is straightforward.
We extend this result to the case where P0 is concentrated on any closed subset of R^k, in which case Rθ is infinite dimensional.
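The finite-support case can be illustrated with a short Python sketch for Example 1 (O = s(U) with a known link). It assumes that U takes finitely many values; the set R is then a product of scaled simplices, one per observed value o, and its extreme points put the whole mass Q0({o}) on a single u with s(u) = o. The function name and the toy numbers are hypothetical.

```python
from itertools import product


def bounds_via_extreme_points(support_u, s, f, q0):
    """Bounds on E[f(U)] when only O = s(U) is observed and U is finite.

    support_u: list of possible values of U (finiteness is an assumption).
    s: known link, O = s(U).
    f: function defining theta = E[f(U)].
    q0: dict mapping each value o of O to its probability Q0({o}).
    """
    # Group the support of U by the observed value s(u).
    fibers = {}
    for u in support_u:
        fibers.setdefault(s(u), []).append(u)
    observed = [o for o in q0 if q0[o] > 0]
    # An extreme point puts all the mass Q0({o}) on one single u with s(u) = o.
    values = []
    for choice in product(*(fibers[o] for o in observed)):
        values.append(sum(q0[o] * f(u) for o, u in zip(observed, choice)))
    return min(values), max(values)


# Toy sample selection model: U = (D, Y) with Y in {0, 1, 2}, O = (D, D * Y).
support_u = [(d, y) for d in (0, 1) for y in (0, 1, 2)]
q0 = {(0, 0): 0.25, (1, 0): 0.25, (1, 1): 0.25, (1, 2): 0.25}
lower, upper = bounds_via_extreme_points(
    support_u, s=lambda u: (u[0], u[0] * u[1]), f=lambda u: u[1], q0=q0
)
```

In this toy case only the value of Y among nonrespondents is free, so the enumeration visits three extreme points and returns the bounds 0.75 and 1.25.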
15.
Extreme points of convex sets of distributions
In infinite dimension, closed, bounded and convex sets are not characterized by their extreme points.
[No extreme points] Let K denote the set of real valued continuous functions f on [0; 1] such that $\sup_{x \in [0;1]} |f(x)| \le 1$ and f(0) = 0. K is a bounded, closed and convex set for the supremum norm in the Banach space of continuous functions from [0; 1] to R. However, ext(K) is empty.
[No closure of convex hull] Let K be the set of real valued continuous functions f on [−1; 1] such that $\sup_{x \in [-1;1]} |f(x)| \le 1$. K is a bounded, closed and convex set of a Banach space, and ext(K) = {f : f(x) = 1 for all x ∈ [−1; 1], or f(x) = −1 for all x ∈ [−1; 1]}, so cl(co(ext(K))) ≠ K.
[No continuity of linear forms] Linear forms are not necessarily continuous in infinite dimensional spaces, so even if K = cl(co(ext(K))) and l is a linear form on K, one can have:
$\sup_{x \in K} l(x) \ne \sup_{x \in \mathrm{ext}(K)} l(x)$
Proof: see the supplementary material.
17.
Sample selection model (SSM) without exclusion restriction
We are interested in the distribution of (Y, X), but we only observe a sample of the variables (D, DY, X), with D = 1 if Y is observed and D = 0 if not. In this case:
- MAR: D ⊥⊥ Y | X ⇒ point identification of P_{Y,X,D}
- Standard exclusion restriction: Y ⊥⊥ X. If there exists x such that P(D = 1|X = x) = 1 ⇒ point identification of P_{Y,X,D}; otherwise only partial identification
- Nonstandard restriction: D ⊥⊥ X | Y. If Y and X are sufficiently dependent (rank condition or completeness condition) ⇒ point identification; otherwise partial identification
18.
SSM without exclusion restriction
Instead of an exclusion restriction, we assume monotonicity conditions:
Assumption 4 (Monotonicity in X: MX)
x → E(D|Y, X = x) is increasing almost surely.
Assumption 5 (Monotonicity in Y: MY)
y → E(D|Y = y, X) is increasing almost surely.
θ0 is defined by a finite number of moments of (Y, X) (e.g. regression, quantile...). In this case Rθ can be deduced from R.
As a first step, we assume that Supp(X) = {x_1, ..., x_J}, while Y may have any support.
19.
SSM with monotonicity on X
Instead of R, we can consider C, the set of possible probability distributions of Y | D = 0, X = x_j for j = 1...J.
We show that no constraint is imposed on P_{Y|D=0,X=x_J}. Thus its extremal elements are simply Dirac measures.
Then we show that
$f_{Y|D=0,X=x_j}(y) = \frac{\rho(y, x_j)}{\rho(y, x_{j+1})}\, r_{x_{j+1},x_j}(y)\, f_{Y|D=0,X=x_{j+1}}(y), \quad (1)$
with $\rho(y, x) = 1/P(D = 1 \mid Y = y, X = x) - 1$ and
$r_{x_i,x_j}(y) = \frac{P(D = 1 \mid X = x_j)\, P(D = 0 \mid X = x_i)\, f_{Y|D=1,X=x_j}(y)}{P(D = 1 \mid X = x_i)\, P(D = 0 \mid X = x_j)\, f_{Y|D=1,X=x_i}(y)}.$
By MX, the ratio $\rho(y, x_j)/\rho(y, x_{j+1})$ in (1) is at least one. Thus,
$f_{Y|D=0,X=x_j}(y) = r_{x_{j+1},x_j}(y)\, f_{Y|D=0,X=x_{j+1}}(y) + q_j(y), \text{ with } q_j \ge 0. \quad (2)$
20.
SSM with monotonicity on X
q_j may be seen as the density of a (nonprobability) measure Q_j. Because no restriction is imposed on Q_j, the extremal elements for Q_j are weighted Dirac measures.
Then, by induction, extremal elements of P_{Y|D=0,X=x_j} admit at most J − j + 1 support points (and among them J − j are common with P_{Y|D=0,X=x_{j+1}}).
Once the support points $(y_1, ..., y_J) \in \mathcal{Y}^J$ have been determined, the corresponding weights are given by (2). For instance
$P_{Y|D=0,X=x_{J-1}} = r_{x_J,x_{J-1}}(y_J)\, \delta_{y_J} + \left(1 - r_{x_J,x_{J-1}}(y_J)\right) \delta_{y_{J-1}}.$
Thus ext(C) is parametrized by $\mathcal{Y}^J$. Moreover, some support points can be discarded because they lead to negative weights.
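As an illustration of this construction, the following Python sketch enumerates candidate extremal elements when X takes J = 2 values. It is only a sketch under added assumptions: Y is taken to be discrete with a strictly positive probability mass function among respondents, so that pmfs replace the densities, and r is computed from its definition with x_i = x_2 and x_j = x_1. The function name and the numbers are hypothetical.

```python
def extremal_pairs_mx(grid, p1, f1):
    """Enumerate candidate extremal elements under MX when X takes two values.

    grid: finite list of values of Y (discreteness is an assumption here).
    p1: (P(D=1|X=x_1), P(D=1|X=x_2)).
    f1: (pmf of Y given D=1, X=x_1, pmf of Y given D=1, X=x_2), as dicts
        over grid with strictly positive values (an assumption).
    Returns tuples (y_1, y_2, weight on y_2, mean of Y given D=0, X=x_1),
    where Y | D=0, X=x_2 is the Dirac measure at y_2.
    """
    out = []
    for y2 in grid:
        # r_{x_2, x_1}(y_2), with pmfs in place of densities.
        r = (p1[0] * (1 - p1[1]) * f1[0][y2]) / (p1[1] * (1 - p1[0]) * f1[1][y2])
        if r > 1:
            continue  # the weight 1 - r on y_1 would be negative: discard y_2
        for y1 in grid:
            out.append((y1, y2, r, r * y2 + (1 - r) * y1))
    return out


# Toy inputs (hypothetical): response rates 0.5 and 0.8, uniform observed pmfs.
uniform = {0: 0.5, 1: 0.5}
pairs = extremal_pairs_mx(grid=[0, 1], p1=(0.5, 0.8), f1=(uniform, uniform))
```

Each returned tuple describes one extremal pair, and a bound on a linear parameter is then obtained by optimizing over this finite list instead of over all pairs of distributions.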
21.
SSM with monotonicity on Y
In this case, there is no constraint on the distribution of X, so one can reason conditional on X = x.
Because
$dP_{Y|D=0,X=x}(y) = \frac{P(D = 1 \mid X = x)}{P(D = 0 \mid X = x)}\, \rho(y, x)\, dP_{Y|D=1,X=x}(y),$
it suffices to find the extremal elements for ρ(., x).
ρ(., x) is decreasing and satisfies the integral equation
$\int \rho(y, x)\, dP_{Y|D=1,X=x}(y) = \frac{1}{P(D = 1 \mid X = x)} - 1.$
Extremal elements for ρ(., x) are Heaviside functions satisfying an "area restriction".
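Both displayed relations follow from Bayes' rule. A short derivation, assuming P(D = 1|Y = y, X = x) > 0 and writing π(y, x) = P(D = 1|Y = y, X = x) and p(x) = P(D = 1|X = x) (shorthand introduced here, not in the slides):

```latex
\begin{align*}
dP_{Y|D=d,X=x}(y) &= \frac{P(D=d \mid Y=y, X=x)}{P(D=d \mid X=x)}\, dP_{Y|X=x}(y),
  \qquad d \in \{0,1\},\\
\frac{dP_{Y|D=0,X=x}}{dP_{Y|D=1,X=x}}(y)
  &= \frac{1-\pi(y,x)}{\pi(y,x)} \cdot \frac{p(x)}{1-p(x)}
   = \rho(y,x)\, \frac{P(D=1 \mid X=x)}{P(D=0 \mid X=x)},\\
\int \rho(y,x)\, dP_{Y|D=1,X=x}(y)
  &= \frac{P(D=0 \mid X=x)}{P(D=1 \mid X=x)} \int dP_{Y|D=0,X=x}(y)
   = \frac{1}{P(D=1 \mid X=x)} - 1.
\end{align*}
```

The last line uses only that P_{Y|D=0,X=x} has total mass one, which is why the restriction on ρ(., x) takes the form of a fixed area.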
22.
SSM with monotonicity on Y
[Figure: An example of extremal element under MY. The plot shows ρ(y, x) = 1/P(D = 1|Y = y, X = x) − 1 as a function of y, with the area restriction $\int \rho(y, x)\, dF_{Y|D=1,X=x}(y) = 1/P(D = 1 \mid X = x) - 1$.]
Proposition 1
Under MY in the sample selection model, we have
$\mathrm{ext}(C) = \left\{ \left(P_{Y|D=1,X=x_1,Y \le y_1}, ..., P_{Y|D=1,X=x_J,Y \le y_J}\right) : (y_1, ..., y_J) \in \mathcal{Y}^J \right\}$
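Proposition 1 turns the computation of bounds on a linear parameter into a search over truncation points. The Python sketch below does this for E(Y|X = x), conditional on a single value x. It assumes a discrete Y among respondents, which is a simplification made here; the function name and the numbers are hypothetical.

```python
def my_bounds_on_mean(p_obs, pmf_obs):
    """Bounds on E[Y | X = x] under MY, from the extremal elements of Proposition 1.

    p_obs: P(D = 1 | X = x).
    pmf_obs: dict y -> P(Y = y | D = 1, X = x) (a discrete Y is an assumption).
    """
    ys = sorted(y for y, p in pmf_obs.items() if p > 0)
    mean_obs = sum(y * pmf_obs[y] for y in ys)
    candidates = []
    mass = 0.0
    partial = 0.0
    for y0 in ys:
        # Extremal element: Y | D = 0 is distributed as Y | D = 1, Y <= y0.
        mass += pmf_obs[y0]
        partial += y0 * pmf_obs[y0]
        mean_missing = partial / mass
        candidates.append(p_obs * mean_obs + (1 - p_obs) * mean_missing)
    return min(candidates), max(candidates)


# Toy inputs (hypothetical): half of the outcomes are observed.
lower, upper = my_bounds_on_mean(0.5, {0: 0.5, 1: 0.25, 2: 0.25})
```

The upper bound is reached at the largest truncation point, where nonrespondents have the same distribution as respondents, and the lower bound at the smallest one, where all nonrespondents sit at the lowest observed value.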
23.
SSM with double monotonicity
Still reasoning on ρ(., .), we should find the extreme points of the set of functions such that
- for all x, ρ(., x) is decreasing;
- E(ρ(Y, x)|D = 1, X = x) = 1/P(D = 1|X = x) − 1;
- for all y, ρ(y, x) ≤ ρ(y, x') if x ≥ x'.
Extremal elements are similar to those obtained before but more difficult to characterize. One can show for instance that if X takes J values, then each ρ(., x_j) takes at most J values, but (ρ(., x_1), ..., ρ(., x_J)) taken together do not take more than 2J − 1 values.
24.
SSM with double monotonicity
[Figure: An example of extremal elements under MX, MY and with J = 2. The plot shows ρ(y, x) = 1/P(D = 1|Y = y, X = x) − 1 as a function of y for x = x_0 and x = x_1, each with its area restriction $\int \rho(y, x_j)\, dF_{Y|D=1,X=x_j}(y) = 1/P(D = 1 \mid X = x_j) - 1$.]
In the end, ext(C) is parametrized by $\mathbb{R}^{2J-1} \times \mathcal{Y}^{J(J-1)}$.
25.
Extensions
If #Supp(X) = +∞ and Assumption MX holds for X, it also holds for
$X_n = \sum_{i=1}^{n} i\, 1\{X \in [\sigma(i); \sigma(i+1)[\}$ with $-\infty = \sigma(1) < ... < \sigma(n+1) = +\infty$.
We then get Θ0n, an outer region for Θ0.
In this case, we give technical conditions under which $\Theta_0 = \bigcap_{n \in \mathbb{N}} \Theta_{0n}$.
With several covariates, the results can be extended.
27.
Conclusion
This is still work in progress, comments are welcome...
Additional results: link with the methodologies used by Beresteanu et al. (random sets theory) or by Galichon et al. (optimal transport) for problems where all the constraints can be written as moment conditions.
More general: our result can also be used when the constraints are not given by moment conditions, as in our application.
28.
Supplementary material
Outline
- Supplementary material
29.
Example 2: unobserved heterogeneity.
In this case, we suppose that the distribution of O conditional on the unobserved heterogeneity U is known. Then P^{O|U}(A|U = u, θ) is known (by the model) and P^O is known (by the data).
$q(\theta, P) = \max\left( \sup_A \left| \int P^{O|U}(A \mid u, \theta)\, dP(u) - P(O \in A) \right|,\ \left| \int g(u, \theta)\, dP(u) \right| \right)$
This covers semiparametric nonlinear panel models: O = ((Y_t)_{t=1...T}, (X_t)_{t=1...T}), U = ((X_t)_{t=1...T}, α), Y_it = 1{X_it β0 + α_i + ε_it ≥ 0}, where the (ε_t)_{t=1...T} are i.i.d., independent of (X, α) and with a known distribution, and β0 is a subvector of θ0.
If θ0 = β0 then g(u, θ) = 0. If θ0 = (β0, ∆0), where ∆0 is the average marginal effect of one binary covariate X_1, then:
g(x_1, x_2, a, β, ∆) = E(Y_t|X_1t = 1, X_2t = x_2, α = a, β) − E(Y_t|X_1t = 0, X_2t = x_2, α = a, β) − ∆.
This applies to many other settings (see also Chernozhukov et al., 2012).
30.
Example 3: incomplete models and games with multiple equilibria.

          Y2 = 1                Y2 = 0
Y1 = 1    (θ + ε1, θ + ε2)      (ε1, 0)
Y1 = 0    (0, ε2)               (0, 0)

Figure: Payoffs of the entry game (with θ < 0)
Payoff shifters are known by the players, but the econometrician only knows that (ε1, ε2) ∼ N(0, I_2).
When (ε1, ε2) ∈ [0; −θ]^2, there are two pure strategy equilibria, (Y1, Y2) ∈ {(0, 1); (1, 0)} (and one mixed strategy equilibrium).
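The size of the multiplicity region and its consequence for an outcome probability can be computed directly. The Python sketch below does so for a given θ < 0. It adds one assumption to the slide, namely that only pure strategy equilibria are played, and it uses the fact, derived from the payoff table, that (1, 0) is an equilibrium exactly when ε1 ≥ 0 and θ + ε2 ≤ 0. The function names are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def entry_game_bounds(theta):
    """Probability of the multiplicity region and bounds on P(Y = (1, 0)).

    Assumes theta < 0, (eps1, eps2) ~ N(0, I_2), and that only pure strategy
    equilibria are played (the last point is an assumption of this sketch).
    """
    if theta >= 0:
        raise ValueError("theta must be negative")
    # Both (1, 0) and (0, 1) are equilibria when (eps1, eps2) is in [0, -theta]^2.
    p_multiple = (std_normal_cdf(-theta) - std_normal_cdf(0.0)) ** 2
    # (1, 0) is an equilibrium when eps1 >= 0 and theta + eps2 <= 0.
    p_is_equilibrium = (1.0 - std_normal_cdf(0.0)) * std_normal_cdf(-theta)
    # Lower bound: (1, 0) is never selected in the multiplicity region.
    # Upper bound: (1, 0) is always selected there.
    return p_multiple, p_is_equilibrium - p_multiple, p_is_equilibrium


p_multiple, lower, upper = entry_game_bounds(-1.0)
```

The width of the interval equals the probability of the multiplicity region, which shows how the incompleteness of the model translates into partial identification.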
31.
Steps of the proof of the main result.
The vector space of signed measures (M, |.|_TV) is the dual of the space of continuous functions with compact support (C_b, ||.||_∞).
The Banach-Alaoglu theorem ensures that Rθ (as a closed subset of the unit ball) is compact for the weak-* topology.
Moreover, the weak-* topology is metrizable by the Lévy-Prokhorov metric.
Applying the Choquet theorem: ∀P ∈ Rθ, there exists a probability measure µ_P such that
$\int g\,dP = \int_{\mathrm{ext}(R_\theta)} \left( \int g\,dQ \right) d\mu_P(Q) \quad (3)$
for every g ∈ C_b.
32.
Steps of the proof of the main result.
Considering g_n → 1, one can extend the previous relation to g = 1:
$1 = \int_{\mathrm{ext}(R_\theta)} \left( \int 1\,dQ \right) d\mu_P(Q)$
And then ∀Q ∈ Supp(µ_P), $Q \in \mathrm{ext}(R_\theta) \cap \mathcal{P} = \mathrm{ext}(R_\theta)$.
It follows that Rθ ≠ ∅ ⇒ ext(Rθ) ≠ ∅.
For a linear parameter: apply the Choquet theorem to R instead of Rθ and consider g_n ∈ C_b → f to conclude that
$\int f\,dP = \int_{\mathrm{ext}(R) \cap I(f)} \left( \int f\,dQ \right) d\mu_P(Q)$
This ensures that:
$\sup_{P \in R \cap I(f)} \int f\,dP \le \sup_{Q \in \mathrm{ext}(R) \cap I(f)} \int f\,dQ$
The reverse inequality is straightforward.