Let's Practice What We Preach: Likelihood Methods for Monte Carlo Data

Xiao-Li Meng
Department of Statistics, Harvard University
September 24, 2011

Based on:
Kong, McCullagh, Meng, Nicolae, and Tan (2003, JRSS-B, with discussions);
Kong, McCullagh, Meng, and Nicolae (2006, Doksum Festschrift);
Tan (2004, JASA); ..., Meng and Tan (201X).
Importance sampling (IS)

Estimand:
$$c_1 = \int_\Gamma q_1(x)\,\mu(dx) = \int_\Gamma \frac{q_1(x)}{p_2(x)}\,p_2(x)\,\mu(dx).$$

Data: $\{X_{i2},\ i = 1, \ldots, n_2\} \sim p_2 = q_2/c_2$.

Estimating Equation (EE):
$$r \equiv \frac{c_1}{c_2} = E_2\left[\frac{q_1(X)}{q_2(X)}\right].$$

The EE estimator:
$$\hat r = \frac{1}{n_2}\sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})},$$
the standard IS estimator for $c_1$ when $c_2 = 1$.
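As a quick numerical check of the EE estimator, here is a minimal sketch with my own toy densities (not from the slides): take $q_1$ and $q_2$ to be unnormalized Gaussian kernels, so $r = c_1/c_2$ is known exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy choice (mine): q1(x) = exp(-x^2/2)  => c1 = sqrt(2*pi)
#                    q2(x) = exp(-x^2/8)  => c2 = sqrt(8*pi), so r = 1/2
q1 = lambda x: np.exp(-x**2 / 2.0)
q2 = lambda x: np.exp(-x**2 / 8.0)

# Data: n2 i.i.d. draws from p2 = q2/c2, i.e. N(0, 4).
n2 = 100_000
x2 = rng.normal(0.0, 2.0, size=n2)

# EE estimator: average of the kernel ratios.
r_hat = np.mean(q1(x2) / q2(x2))
print(r_hat)  # close to the truth r = 0.5
```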
What about MLE?

The "likelihood" is
$$f(X_{12}, \ldots, X_{n_2 2}) = \prod_{i=1}^{n_2} p_2(X_{i2})$$
— free of the estimand $c_1$!

So why are $\{X_{i2},\ i = 1, \ldots, n_2\}$ even relevant?
Violation of the likelihood principle?
What are we "inferring"?
What is the "unknown" model parameter?
Bridge sampling (BS)
What about MLE?

The "likelihood" is
$$\prod_{j=1}^{2} \prod_{i=1}^{n_j} \frac{q_j(X_{ij})}{c_j} \propto c_1^{-n_1} c_2^{-n_2}$$
— free of data!

What went wrong: $c_j$ is not a "free parameter," because $c_j = \int_\Gamma q_j(x)\,\mu(dx)$ and $q_j$ is known.

So what is the "unknown" model parameter?

It turns out that $\hat r_O$ is the same as Bennett's (1976) optimal acceptance-ratio estimator, as well as Geyer's (1994) reverse logistic regression estimator.

So why is that? Can it be improved upon without any "sleight of hand"?
Pretending the measure is unknown!

Because
$$c = \int_\Gamma q(x)\,\mu(dx),$$
and $q$ is known in the sense that we can evaluate it at any sample value, the only way to make $c$ "unknown" is to assume the underlying measure $\mu$ is "unknown."

This is natural, because Monte Carlo simulation means we use samples to represent, and thus estimate/infer, the underlying population $q(x)\mu(dx)$, and hence estimate/infer $\mu$, since $q$ is known.

Monte Carlo integration is about finding a tractable discrete $\hat\mu$ to approximate the intractable $\mu$.
Importance Sampling Likelihood

Thus the MLE for $r \equiv c_1/c_2$ is
$$\hat r = \int q_1(x)\,\hat\mu(dx) = \frac{1}{n_2}\sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})}.$$

When $c_2 = 1$, $q_2 = p_2$, the standard IS estimator for $c_1$ is obtained.

$\{X_{i2},\ i = 1, \ldots, n_2\}$ is (minimal) sufficient for $\mu$ on $S_2 = \{x : q_2(x) > 0\}$, and hence $\hat c_1$ is guaranteed to be consistent only when $S_1 \subset S_2$.
Bridge Sampling Likelihood

The MLE for $\mu$ is given by equating the canonical sufficient statistic $\hat P$ to its expectation:
$$n\hat P(dx) = \sum_{j=1}^{J} n_j \hat c_j^{-1} q_j(x)\,\hat\mu(dx), \qquad \text{i.e.,} \qquad \hat\mu(dx) = \frac{n\hat P(dx)}{\sum_{j=1}^{J} n_j \hat c_j^{-1} q_j(x)}. \tag{A}$$

Consequently, the MLE for $\{c_1, \ldots, c_J\}$ must satisfy
$$\hat c_r = \int_\Gamma q_r(x)\,d\hat\mu = \sum_{j=1}^{J}\sum_{i=1}^{n_j} \frac{q_r(x_{ij})}{\sum_{s=1}^{J} n_s \hat c_s^{-1} q_s(x_{ij})}. \tag{B}$$

(B) is the "dual" equation of (A), and is also the same as the equation for the optimal multiple bridge sampling estimator (Tan 2004).
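Equation (B) can be solved by fixed-point iteration, treating its right-hand side as the update map. Below is a minimal sketch under my own conventions (a pooled matrix of kernel evaluations; $c_1$ is pinned to 1, since only the ratios are identified):

```python
import numpy as np

def bridge_mle(Q, n, iters=200):
    """Fixed-point iteration for equation (B) -- a sketch.

    Q : (N, J) array with Q[k, s] = q_s(x_k) over all N pooled draws.
    n : (J,) array of per-sampler sample sizes, with sum(n) == N.
    Only the ratios c_r / c_1 are identified, so c_1 is pinned to 1.
    """
    c = np.ones(Q.shape[1])
    for _ in range(iters):
        denom = Q @ (n / c)                    # sum_s n_s c_s^{-1} q_s(x_k), per draw
        c = (Q / denom[:, None]).sum(axis=0)   # right-hand side of (B), one entry per r
        c = c / c[0]                           # re-pin c_1 = 1
    return c

# Toy check (my choice): two unnormalized Gaussian kernels with equal mass,
# so the true ratio c_2/c_1 is 1.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 5000), rng.normal(1, 1, 5000)])
Q = np.column_stack([np.exp(-x**2 / 2), np.exp(-(x - 1)**2 / 2)])
print(bridge_mle(Q, np.array([5000, 5000])))   # second entry close to 1
```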
But We Can Ignore Less ...

To restrict the parameter space for $\mu$ by using some knowledge of the known $\mu$, that is, to set up a sub-model.

The new MLE has a smaller asymptotic variance under the sub-model than under the full model.

Examples:
Group-invariance submodel
Linear submodel
Log-linear submodel
A Universally Improved IS

Estimand: $r = c_1/c_2$, where $c_j = \int_{\mathbb{R}^d} q_j(x)\,\mu(dx)$.

Data: $\{X_{i2},\ i = 1, \ldots, n_2\}$ i.i.d. $\sim c_2^{-1} q_2(x)\,\mu(dx)$.

Taking $G = \{I_d, -I_d\}$ leads to
$$\hat r_G = \frac{1}{n_2}\sum_{i=1}^{n_2} \frac{q_1(X_{i2}) + q_1(-X_{i2})}{q_2(X_{i2}) + q_2(-X_{i2})}.$$

Because of the Rao-Blackwellization, $V(\hat r_G) \le V(\hat r)$.

It needs twice as many function evaluations, but typically this is a small insurance premium.

Consider $S_1 = \mathbb{R}$ and $S_2 = \mathbb{R}^+$. Then $\hat r_G$ is consistent for $r$:
$$\hat r_G = \frac{1}{n_2}\sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})} + \frac{1}{n_2}\sum_{i=1}^{n_2} \frac{q_1(-X_{i2})}{q_2(X_{i2})},$$
but the standard IS estimator $\hat r$ only estimates $\int_0^\infty q_1(x)\,\mu(dx)/c_2$.
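A numerical sketch of this last point, with my own toy kernels: take $q_1$ to be the unnormalized N(0, 1) kernel on all of $\mathbb{R}$ and $q_2$ the same kernel restricted to $\mathbb{R}^+$, so that $r = c_1/c_2 = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy choice (mine): S1 = R, S2 = R^+.
# q1(x) = exp(-x^2/2) on R          => c1 = sqrt(2*pi)
# q2(x) = exp(-x^2/2) * 1{x > 0}    => c2 = sqrt(2*pi)/2, so r = 2
q1 = lambda x: np.exp(-x**2 / 2.0)
q2 = lambda x: np.where(x > 0, np.exp(-x**2 / 2.0), 0.0)

n2 = 50_000
x = np.abs(rng.normal(size=n2))   # draws from p2 = q2/c2 (half-normal)

# Standard IS: inconsistent here; it only "sees" S2 = R^+.
r_is = np.mean(q1(x) / q2(x))                         # -> 1, not 2

# Group-averaged (Rao-Blackwellized) IS with G = {Id, -Id}.
r_g = np.mean((q1(x) + q1(-x)) / (q2(x) + q2(-x)))    # -> 2
print(r_is, r_g)
```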
There are many more improvements ...

Define a sub-model by requiring $\mu$ to be $G$-invariant, where $G$ is a finite group on $\Gamma$.

The new MLE of $\mu$ is
$$\hat\mu_G(dx) = \frac{n\hat P^G(dx)}{\sum_{j=1}^{J} n_j \hat c_j^{-1} \bar q_j^{\,G}(x)},$$
where $\hat P^G(A) = \mathrm{ave}_{g \in G}\,\hat P(gA)$ and $\bar q_j^{\,G}(x) = \mathrm{ave}_{g \in G}\, q_j(gx)$.

When the draws are i.i.d. within each $p_s\,d\mu$,
$$\hat\mu_G = E[\hat\mu \mid GX],$$
i.e., the Rao-Blackwellization of $\hat\mu$ given the orbit.

Consequently,
$$\hat c_j^{\,G} = \int_\Gamma q_j(x)\,\hat\mu_G(dx) = E[\hat c_j \mid GX].$$
Using Groups to model trade-off

If $G_1 \supseteq G_2$, then
$$\mathrm{Var}\big(\hat c^{\,G_1}\big) \le \mathrm{Var}\big(\hat c^{\,G_2}\big).$$

The statistical efficiency increases with the size of $G$, but so does the computational cost of the function evaluations (though not of the sampling, because no additional samples are involved).
Linear submodel: stratified sampling (Tan 2004)

Data: $\{X_{ij},\ 1 \le i \le n_j\}$ i.i.d. $\sim p_j(x)\,\mu(dx)$, $1 \le j \le J$.

The sub-model has parameter space
$$\Big\{\mu : \int_\Gamma p_j(x)\,\mu(dx),\ 1 \le j \le J, \text{ are equal (to 1)}\Big\}.$$

Likelihood for $\mu$: $L(\mu) = \prod_{j=1}^{J}\prod_{i=1}^{n_j} p_j(X_{ij})\,\mu(X_{ij})$.

The MLE is
$$\hat\mu_{\mathrm{lin}}(dx) = \frac{\hat P(dx)}{\sum_{j=1}^{J} \hat\pi_j\, p_j(x)},$$
where the $\hat\pi_j$'s are MLEs from a mixture model: the data i.i.d. $\sim \sum_{j=1}^{J} \pi_j\, p_j(\cdot)$ with the $\pi_j$'s unknown.
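The $\hat\pi_j$'s can be computed with a small EM loop. A minimal sketch, under my own interface (a matrix of density evaluations at the pooled draws), not code from the paper:

```python
import numpy as np

def mixture_weights_em(P, iters=500):
    """MLE of the mixture proportions pi_j via EM -- a sketch.

    P : (N, J) array with P[i, j] = p_j(x_i) at all N pooled draws.
    Fits x_i ~ sum_j pi_j p_j(.) with the pi_j unknown, ignoring
    which sampler each draw actually came from.
    """
    N, J = P.shape
    pi = np.full(J, 1.0 / J)
    for _ in range(iters):
        W = P * pi                            # unnormalized responsibilities
        W /= W.sum(axis=1, keepdims=True)     # E-step: normalize each row
        pi = W.mean(axis=0)                   # M-step: average responsibilities
    return pi
```

The resulting $\hat\mu_{\mathrm{lin}}$ then puts mass proportional to $1/\sum_j \hat\pi_j p_j(x_i)$ on each pooled draw $x_i$.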
So why MLE?

Goal: to estimate $c = \int_\Gamma q(x)\,\mu(dx)$.

For an arbitrary vector $b$, consider the control-variate estimator (Owen and Zhou 2000)
$$\hat c_b \equiv \sum_{j=1}^{J}\sum_{i=1}^{n_j} \frac{q(x_{ji}) - b^\top g(x_{ji})}{\sum_{s=1}^{J} n_s\, p_s(x_{ji})},$$
where $g = (p_2 - p_1, \ldots, p_J - p_1)^\top$.

A more general class: for $\sum_{j=1}^{J} \lambda_j(x) \equiv 1$ and $\sum_{j=1}^{J} \lambda_j(x) b_j(x) \equiv b$, consider (Veach and Guibas 1995 for $b_j \equiv 0$; Tan 2004)
$$\hat c_{\lambda,B} = \sum_{j=1}^{J} \frac{1}{n_j} \sum_{i=1}^{n_j} \lambda_j(x_{ji})\, \frac{q(x_{ji}) - b_j^\top(x_{ji})\, g(x_{ji})}{p_j(x_{ji})}.$$

Should $\hat c_{\lambda,B}$ be more efficient than $\hat c_b$? Could there be something even more efficient?
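For concreteness, here is a sketch of $\hat c_b$ with this $g$ (my own vectorized interface; $b$ is any fixed coefficient vector). Each component of $g$ integrates to 0 against $\mu$, so subtracting $b^\top g$ leaves the estimand unchanged and can only affect the variance.

```python
import numpy as np

def c_hat_b(q_vals, P, n, b):
    """Owen-Zhou-style control-variate estimator -- a sketch.

    q_vals : (N,)   q(x_k) at all pooled draws
    P      : (N, J) with P[k, j] = p_j(x_k)
    n      : (J,)   per-sampler sample sizes
    b      : (J-1,) fixed coefficient vector
    """
    g = P[:, 1:] - P[:, [0]]       # columns of g = (p_2 - p_1, ..., p_J - p_1)
    denom = P @ n                  # sum_s n_s p_s(x_k), per draw
    return np.sum((q_vals - g @ b) / denom)
```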
Three estimators for $c = \int_\Gamma q(x)\,\mu(dx)$:

IS:
$$\frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{J} \pi_j\, p_j(x_i)},$$
where the $\pi_j = n_j/n$ are the true proportions.

Reg:
$$\frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i) - \hat\beta^\top g(x_i)}{\sum_{j=1}^{J} \pi_j\, p_j(x_i)},$$
where $\hat\beta$ is the estimated regression coefficient, ignoring stratification.

Lik:
$$\frac{1}{n}\sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{J} \hat\pi_j\, p_j(x_i)},$$
where the $\hat\pi_j$'s are the estimated proportions, ignoring stratification.

Which one is most efficient? Least efficient?
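Before answering, here is a sketch of all three estimators in one place (my own interface and naming; the small EM loop provides the "estimated proportions, ignoring stratification"):

```python
import numpy as np

def three_estimators(q_vals, P, n_j, em_iters=500):
    """IS, Reg, and Lik estimators of c -- a sketch.

    q_vals : (N,)   q(x_i) at the pooled draws
    P      : (N, J) with P[i, j] = p_j(x_i)
    n_j    : (J,)   design sample sizes, so pi_j = n_j / n
    """
    n = n_j.sum()
    mix = P @ (n_j / n)                       # sum_j pi_j p_j(x_i)
    y = q_vals / mix

    # IS: plain average of the ratios, using the true proportions.
    c_is = y.mean()

    # Reg: control variates g = (p_2 - p_1, ...), coefficient fit by OLS.
    Z = (P[:, 1:] - P[:, [0]]) / mix[:, None]   # mean 0 under the mixture
    beta = np.linalg.lstsq(Z - Z.mean(0), y - y.mean(), rcond=None)[0]
    c_reg = (y - Z @ beta).mean()

    # Lik: replace the pi_j by their mixture MLE (EM), ignoring the labels.
    pi_hat = np.full(len(n_j), 1.0 / len(n_j))
    for _ in range(em_iters):
        W = P * pi_hat
        W /= W.sum(axis=1, keepdims=True)
        pi_hat = W.mean(axis=0)
    c_lik = (q_vals / (P @ pi_hat)).mean()

    return c_is, c_reg, c_lik
```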
Let's find out ...

$\Gamma = \mathbb{R}^{10}$ and $\mu$ is Lebesgue measure.

The integrand is
$$q(x) = 0.8 \prod_{j=1}^{10} \phi(x_j) + 0.2 \prod_{j=1}^{10} \psi(x_j; 4),$$
where $\phi(\cdot)$ is the standard normal density and $\psi(\cdot\,; 4)$ is the $t_4$ density.

Two sampling designs:
(i) $q_2(x)$ with $n$ draws, or
(ii) $q_1(x)$ and $q_2(x)$ each with $n/2$ draws,
where
$$q_1(x) = \prod_{j=1}^{10} \phi(x_j), \qquad q_2(x) = \prod_{j=1}^{10} \psi(x_j; 1).$$
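This setup is straightforward to reproduce; a sketch follows (the densities and designs come from the slide, while the seed and helper names are mine). Note that $q$ integrates to $0.8 + 0.2 = 1$, so the truth is $c = 1$:

```python
import numpy as np
from scipy.stats import norm, t, cauchy

rng = np.random.default_rng(0)
d, n = 10, 500

def q(x):    # integrand: 0.8 * prod N(0,1) + 0.2 * prod t_4 densities
    return 0.8 * norm.pdf(x).prod(axis=-1) + 0.2 * t.pdf(x, df=4).prod(axis=-1)

def p1(x):   # q1: product of standard normal densities (already normalized)
    return norm.pdf(x).prod(axis=-1)

def p2(x):   # q2: product of standard Cauchy (= t_1) densities
    return cauchy.pdf(x).prod(axis=-1)

# Design (i): one sampler, n draws from q2.
x_one = cauchy.rvs(size=(n, d), random_state=rng)

# Design (ii): two samplers, n/2 draws from each of q1 and q2.
x_two = np.vstack([norm.rvs(size=(n // 2, d), random_state=rng),
                   cauchy.rvs(size=(n // 2, d), random_state=rng)])

# Feed either design into the three-estimator sketch above, e.g.:
# P = np.column_stack([p1(x_two), p2(x_two)])
# print(three_estimators(q(x_two), P, np.array([n // 2, n // 2])))
```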
A little surprise?

Table: Comparison of design and estimator

                 one sampler               two samplers
              IS      Reg      Lik      IS      Reg      Lik
√MSE         .162   .00942   .00931   .0175   .00881   .00881
√Std Err     .162   .00919   .00920   .0174   .00885   .00884

Note: √MSE is the square root of the mean squared error of the point estimates, and √Std Err is the square root of the mean of the variance estimates, from 10,000 repeated simulations of size n = 500.
Comparison of efficiency:

Statistical efficiency: IS < Reg ≈ Lik

IS is a stratified estimator, which uses only the labels.
Reg is the conventional method of control variates.
Lik is the constrained MLE, which uses the $p_j$'s but ignores the labels; it is exact if $q = p_j$ for any particular $j$.
Building intuition ...

Suppose we make n = 2 draws, one from N(0, 1) and one from Cauchy(0, 1); hence $\pi_1 = \pi_2 = 50\%$.

Suppose the draws are {1, 1}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
Suppose the draws are {1, 3}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
Suppose the draws are {3, 3}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
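These can be checked numerically. A short sketch (mine, not from the slides) maximizes the two-component mixture log-likelihood over $\pi_1$:

```python
import numpy as np
from scipy.stats import norm, cauchy
from scipy.optimize import minimize_scalar

def pi1_mle(draws):
    """MLE of pi_1 in the mixture pi_1 * N(0,1) + (1 - pi_1) * Cauchy(0,1)."""
    x = np.asarray(draws, dtype=float)
    nll = lambda p: -np.sum(np.log(p * norm.pdf(x) + (1 - p) * cauchy.pdf(x)))
    return minimize_scalar(nll, bounds=(0.0, 1.0), method="bounded").x

for draws in ([1, 1], [1, 3], [3, 3]):
    p1 = pi1_mle(draws)
    print(draws, "->", (round(p1, 3), round(1 - p1, 3)))
# {1, 1} pushes pi_1_hat to 1 (the normal density dominates at x = 1),
# while {3, 3} pushes it to 0 (the Cauchy's heavy tails win at x = 3).
```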
What Did I Learn?

Model what we ignore, not what we know!

Model comparison/selection is not about which model is true (as all of them are "true"), but about which model represents a better compromise among human, computational, and statistical efficiency.

There is a cure for our "schizophrenia": we can now analyze Monte Carlo data using the same sound statistical principles and methods we use for analyzing real data.
If you are looking for theoretical research topics ...

RE-EXAMINE OLD ONES AND DERIVE NEW ONES!

Prove it is the MLE, or a good approximation to the MLE.
Or derive the MLE, or a cost-effective approximation to it.
Markov chain Monte Carlo (Tan 2006, 2008)
More ......