Let’s Practice What We Preach:
Likelihood Methods for Monte Carlo Data

Xiao-Li Meng
Department of Statistics, Harvard University

September 24, 2011

Based on
Kong, McCullagh, Meng, Nicolae, and Tan (2003, JRSS-B, with discussions);
Kong, McCullagh, Meng, and Nicolae (2006, Doksum Festschrift);
Tan (2004, JASA); ..., Meng and Tan (201X)

Importance sampling (IS)

Estimand:
    c_1 = \int_\Gamma q_1(x)\,\mu(dx) = \int_\Gamma \frac{q_1(x)}{p_2(x)}\, p_2(x)\,\mu(dx).

Data: {X_{i2}, i = 1, ..., n_2} ~ p_2 = q_2/c_2

Estimating Equation (EE):
    r \equiv \frac{c_1}{c_2} = E_2\left[\frac{q_1(X)}{q_2(X)}\right].

The EE estimator:
    \hat{r} = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})}

Standard IS estimator for c_1 when c_2 = 1.

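For concreteness, here is a minimal numerical sketch of the EE/IS estimator above. The setup is a hypothetical toy example (not from the talk): q_1 is an unnormalized N(0,1) density, so c_1 = sqrt(2*pi), and the trial density p_2 = q_2 is the N(0, 2^2) density with c_2 = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the talk): q1 is an unnormalized N(0,1)
# density, so c1 = sqrt(2*pi); the trial density p2 = q2 is N(0, 2^2), c2 = 1.
def q1(x):
    return np.exp(-0.5 * x**2)

sigma2 = 2.0
def q2(x):
    return np.exp(-0.5 * (x / sigma2)**2) / (sigma2 * np.sqrt(2 * np.pi))

n2 = 10_000
x2 = rng.normal(0.0, sigma2, size=n2)     # draws from p2

r_hat = np.mean(q1(x2) / q2(x2))          # the EE / standard IS estimator
print(r_hat, np.sqrt(2 * np.pi))          # the two numbers should be close
```
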
What about MLE?

The “likelihood” is:
    f(X_{12}, ..., X_{n_2 2}) = \prod_{i=1}^{n_2} p_2(X_{i2})   — free of the estimand c_1!

So why are {X_{i2}, i = 1, ..., n_2} even relevant?
Violation of the likelihood principle?
What are we “inferring”?
What is the “unknown” model parameter?

Bridge sampling (BS)

Data: {X_{ij}, i = 1, ..., n_j} ~ p_j = q_j/c_j,  j = 1, 2

Estimating Equation (Meng and Wong, 1996):
    r \equiv \frac{c_1}{c_2} = \frac{E_2[\alpha(X)\, q_1(X)]}{E_1[\alpha(X)\, q_2(X)]},   \forall \alpha : 0 < \left|\int \alpha\, q_1 q_2\, d\mu\right| < \infty

Optimal choice: \alpha_O(x) \propto [n_1 q_1(x) + n_2 r\, q_2(x)]^{-1}

Optimal estimator \hat{r}_O, the limit of
    \hat{r}_O^{(t+1)} = \frac{\frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{s_1 q_1(X_{i2}) + s_2 \hat{r}_O^{(t)} q_2(X_{i2})}}{\frac{1}{n_1} \sum_{i=1}^{n_1} \frac{q_2(X_{i1})}{s_1 q_1(X_{i1}) + s_2 \hat{r}_O^{(t)} q_2(X_{i1})}},
where s_j = n_j/(n_1 + n_2).

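As an illustration, here is a minimal sketch of that fixed-point iteration, assuming a hypothetical pair of unnormalized Gaussian q_1, q_2 (so the true r is 1) and taking s_j = n_j/(n_1 + n_2) as above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical example (not from the talk): q1, q2 are unnormalized N(0,1) and
# N(1,1) densities, so c1 = c2 = sqrt(2*pi) and the true ratio r = 1.
def q1(x):
    return np.exp(-0.5 * x**2)

def q2(x):
    return np.exp(-0.5 * (x - 1.0)**2)

n1 = n2 = 5_000
x1 = rng.normal(0.0, 1.0, size=n1)   # draws from p1 = q1 / c1
x2 = rng.normal(1.0, 1.0, size=n2)   # draws from p2 = q2 / c2
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

r = np.mean(q1(x2) / q2(x2))         # crude IS starting value
for _ in range(50):                  # iterate the optimal-bridge fixed point
    num = np.mean(q1(x2) / (s1 * q1(x2) + s2 * r * q2(x2)))
    den = np.mean(q2(x1) / (s1 * q1(x1) + s2 * r * q2(x1)))
    r = num / den
print(r)                             # should be close to 1
```
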
What about MLE?

The “likelihood” is:
    \prod_{j=1}^{2} \prod_{i=1}^{n_j} \frac{q_j(X_{ij})}{c_j} \propto c_1^{-n_1} c_2^{-n_2}   — free of data!

What went wrong: c_j is not a “free parameter”, because c_j = \int_\Gamma q_j(x)\,\mu(dx) and q_j is known.
So what is the “unknown” model parameter?
It turns out that \hat{r}_O is the same as Bennett’s (1976) optimal acceptance ratio estimator, as well as Geyer’s (1994) reversed logistic regression estimator.
So why is that? Can it be improved upon without any “sleight of hand”?

Pretending the measure is unknown!

Because
    c = \int_\Gamma q(x)\,\mu(dx),
and q is known in the sense that we can evaluate it at any sample value, the only way to make c “unknown” is to assume the underlying measure µ is “unknown”.

This is natural because Monte Carlo simulation means we use samples to represent, and thus estimate/infer, the underlying population q(x)µ(dx), and hence estimate/infer µ since q is known.

Monte Carlo integration is about finding a tractable discrete \hat{\mu} to approximate the intractable µ.

Importance Sampling Likelihood

Estimand: c_1 = \int_\Gamma q_1(x)\,\mu(dx)

Data: {X_{i2}, i = 1, ..., n_2} ~ i.i.d. c_2^{-1} q_2(x)\mu(dx)

Likelihood for µ:
    L(\mu) = \prod_{i=1}^{n_2} c_2^{-1} q_2(X_{i2})\,\mu(X_{i2})
Note that c_2 is a functional of µ.

The nonparametric MLE of µ is
    \hat{\mu}(dx) = \frac{\hat{P}(dx)}{q_2(x)},   where \hat{P} is the empirical measure.

Importance Sampling Likelihood

Thus the MLE for r \equiv c_1/c_2 is
    \hat{r} = \int q_1(x)\,\hat{\mu}(dx) = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})}

When c_2 = 1, q_2 = p_2, the standard IS estimator for c_1 is obtained.

{X_{(i)2}, i = 1, ..., n_2} is (minimal) sufficient for µ on x ∈ S_2 = {x : q_2(x) > 0}, and hence \hat{c}_1 is guaranteed to be consistent only when S_1 ⊂ S_2.

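The discrete \hat{\mu} is easy to build explicitly. Here is a small sketch under the same hypothetical Gaussian setup as before (q_1 an unnormalized N(0,1) density, q_2 the normalized N(0, 2^2) density), showing that integrating q_1 against \hat{\mu} reproduces the standard IS estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: q2 is the normalized N(0, 2^2) density (c2 = 1) and
# q1 is an unnormalized N(0,1) density, so c1 = sqrt(2*pi) and r = c1/c2.
sigma2 = 2.0
def q2(x):
    return np.exp(-0.5 * (x / sigma2)**2) / (sigma2 * np.sqrt(2 * np.pi))

def q1(x):
    return np.exp(-0.5 * x**2)

n2 = 10_000
x2 = rng.normal(0.0, sigma2, size=n2)

# Nonparametric MLE of mu: an atom at each draw with mass (1/n2) / q2(draw).
mu_hat = (1.0 / n2) / q2(x2)

c2_hat = np.sum(q2(x2) * mu_hat)   # equals 1 by construction
r_hat = np.sum(q1(x2) * mu_hat)    # = (1/n2) * sum q1/q2, the IS estimator
print(c2_hat, r_hat, np.sqrt(2 * np.pi))
```
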
Bridge Sampling Likelihood

Estimand: ∝ c_j = \int_\Gamma q_j(x)\,\mu(dx),  j = 1, ..., J.

Data: {X_{ij}, 1 ≤ i ≤ n_j} ~ c_j^{-1} q_j(x)\mu(dx),  1 ≤ j ≤ J

Likelihood for µ:  L(\mu) = \prod_{j=1}^{J} \prod_{i=1}^{n_j} c_j^{-1} q_j(X_{ij})\,\mu(X_{ij})

Writing θ(x) = log µ(x), then
    \log L(\mu) = n \int_\Gamma \theta(x)\, d\hat{P} - \sum_{j=1}^{J} n_j \log c_j(\theta),
where \hat{P} is the empirical measure on {X_{ij}, 1 ≤ i ≤ n_j, 1 ≤ j ≤ J}.

Bridge Sampling Likelihood

The MLE for µ is given by equating the canonical sufficient statistic \hat{P} to its expectation:
    n\hat{P}(dx) = \sum_{j=1}^{J} n_j \hat{c}_j^{-1} q_j(x)\,\hat{\mu}(dx),
that is,
    \hat{\mu}(dx) = \frac{n\hat{P}(dx)}{\sum_{j=1}^{J} n_j \hat{c}_j^{-1} q_j(x)}.                (A)

Consequently, the MLE for {c_1, ..., c_J} must satisfy
    \hat{c}_r = \int_\Gamma q_r(x)\, d\hat{\mu} = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \frac{q_r(x_{ij})}{\sum_{s=1}^{J} n_s \hat{c}_s^{-1} q_s(x_{ij})}.                (B)

(B) is the “dual” equation of (A), and is also the same as the equation for the optimal multiple bridge sampling estimator (Tan 2004).

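Here is a minimal sketch of solving the dual equation (B) by fixed-point iteration, assuming a hypothetical J = 2 example with unnormalized Gaussian q_j's; only ratios of the c_j's are identified, so the iteration renormalizes \hat{c}_1 = 1 at each step.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example (not from the talk): J = 2 unnormalized Gaussian densities.
# q1 ~ unnormalized N(0,1), q2 ~ unnormalized N(3,1); c1 = c2 = sqrt(2*pi),
# so the identified ratio c2/c1 equals 1.
def q1(x):
    return np.exp(-0.5 * x**2)

def q2(x):
    return np.exp(-0.5 * (x - 3.0)**2)

n = np.array([4000, 6000])
x = np.concatenate([rng.normal(0.0, 1.0, n[0]),    # draws from p1
                    rng.normal(3.0, 1.0, n[1])])   # draws from p2
Q = np.vstack([q1(x), q2(x)])                      # q_j(x_i), shape (J, n_total)

c = np.ones(2)                                     # start with c1 = c2 = 1
for _ in range(200):                               # fixed-point iteration for (B)
    denom = (n[:, None] / c[:, None] * Q).sum(axis=0)   # sum_s n_s c_s^{-1} q_s(x)
    c = (Q / denom).sum(axis=1)
    c = c / c[0]                                   # ratios only: normalize c1 = 1
print(c)                                           # c[1] should be close to 1
```
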
But We Can Ignore Less ...

To restrict the parameter space for µ by using some knowledge of the known µ, that is, to set up a sub-model.

The new MLE has a smaller asymptotic variance under the sub-model than under the full model.

Examples:
    Group-invariance sub-model
    Linear sub-model
    Log-linear sub-model

A Universally Improved IS

Estimand: r = c_1/c_2;  c_j = \int_{R^d} q_j(x)\,\mu(dx)

Data: {X_{i2}, i = 1, ..., n_2} i.i.d. ~ c_2^{-1} q_2(x)\mu(dx)

Taking G = {I_d, -I_d} leads to
    \hat{r}_G = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2}) + q_1(-X_{i2})}{q_2(X_{i2}) + q_2(-X_{i2})}.

Because of the Rao-Blackwellization, V(\hat{r}_G) ≤ V(\hat{r}).

It needs twice as many function evaluations, but typically this is a small insurance premium.

Consider S_1 = R and S_2 = R_+. Then \hat{r}_G is consistent for r:
    \hat{r}_G = \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(X_{i2})}{q_2(X_{i2})} + \frac{1}{n_2} \sum_{i=1}^{n_2} \frac{q_1(-X_{i2})}{q_2(X_{i2})}.

But the standard IS \hat{r} only estimates \int_0^{\infty} q_1(x)\,\mu(dx)/c_2.

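A small sketch of this group-averaged estimator under a hypothetical setup in which the support mismatch matters: q_1 is an unnormalized N(0,1) density on R, while q_2 is an unnormalized Exp(1) density supported only on R_+.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: q1 = unnormalized N(0,1) on R (c1 = sqrt(2*pi));
# q2 = unnormalized Exp(1) on R_+ (c2 = 1), so r = c1/c2 = sqrt(2*pi).
def q1(x):
    return np.exp(-0.5 * x**2)

def q2(x):
    return np.where(x > 0, np.exp(-x), 0.0)

n2 = 100_000
x2 = rng.exponential(1.0, size=n2)       # draws from p2 = q2 / c2

r_plain = np.mean(q1(x2) / q2(x2))       # standard IS: misses the negative half-line

# Group-averaged IS with G = {I, -I}: average q1 and q2 over the orbit {x, -x}.
num = q1(x2) + q1(-x2)
den = q2(x2) + q2(-x2)                   # q2(-x) = 0 here, kept for generality
r_G = np.mean(num / den)

print(r_plain, r_G, np.sqrt(2 * np.pi))
# r_plain targets only int_0^inf q1 d(mu) = sqrt(2*pi)/2; r_G is consistent for r.
```
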
There are many more improvements ...

Define a sub-model by requiring µ to be G-invariant, where G is a finite group on Γ.

The new MLE of µ is
    \hat{\mu}_G(dx) = \frac{n \hat{P}^G(dx)}{\sum_{j=1}^{J} n_j \hat{c}_j^{-1} \bar{q}_j^G(x)},
where \hat{P}^G(A) = ave_{g \in G} \hat{P}(gA) and \bar{q}_j^G(x) = ave_{g \in G} q_j(gx).

When the draws are i.i.d. within each p_s dµ,
    \hat{\mu}_G = E[\hat{\mu} \mid GX],
i.e., the Rao-Blackwellization of \hat{\mu} given the orbit.

Consequently,
    \hat{c}_j^G = \int_\Gamma q_j(x)\,\hat{\mu}_G(dx) = E[\hat{c}_j \mid GX].

Using Groups to model trade-off

If G_1 ⊇ G_2, then
    Var(\hat{c}^{G_1}) ≤ Var(\hat{c}^{G_2}).

The statistical efficiency increases with the size of G_i, but so does the computational cost needed for function evaluation (but not for sampling, because there are no additional samples involved).

Linear submodel: stratified sampling (Tan 2004)

Data: {X_{ij}, 1 ≤ i ≤ n_j} ~ i.i.d. p_j(x)\mu(dx),  1 ≤ j ≤ J.

The sub-model has parameter space
    { µ : \int_\Gamma p_j(x)\,\mu(dx), 1 ≤ j ≤ J, are equal (to 1) }.

Likelihood for µ:  L(\mu) = \prod_{j=1}^{J} \prod_{i=1}^{n_j} p_j(X_{ij})\,\mu(X_{ij})

The MLE is
    \hat{\mu}_{lin}(dx) = \frac{\hat{P}(dx)}{\sum_{j=1}^{J} \hat{\pi}_j p_j(x)},
where the \hat{\pi}_j's are MLEs from a mixture model:
    the data ~ i.i.d. \sum_{j=1}^{J} \pi_j p_j(\cdot),   with the \pi_j's unknown.

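A minimal sketch of this sub-model MLE, assuming a hypothetical one-dimensional example with J = 2 known normal components: the weights \hat{\pi}_j are fitted by EM while ignoring the stratification labels, and \hat{\mu}_{lin} is then used to integrate a target q (this is the estimator labeled "Lik" on the "Three estimators" slide below).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical example: p1 = N(0,1), p2 = N(3,1) densities (both normalized),
# target q = unnormalized N(1,1), so the true integral is c = sqrt(2*pi).
def norm_pdf(x, m):
    return np.exp(-0.5 * (x - m)**2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-0.5 * (x - 1.0)**2)

n1 = n2 = 2500
x = np.concatenate([rng.normal(0.0, 1.0, n1), rng.normal(3.0, 1.0, n2)])
P = np.vstack([norm_pdf(x, 0.0), norm_pdf(x, 3.0)])   # p_j(x_i), shape (J, n)

# EM for the mixture weights pi_j, treating the stratum labels as unknown.
pi = np.array([0.5, 0.5])
for _ in range(200):
    resp = pi[:, None] * P
    resp /= resp.sum(axis=0, keepdims=True)            # posterior membership probs
    pi = resp.mean(axis=1)

# "Lik" estimator of c under the linear sub-model.
c_lik = np.mean(q(x) / (pi @ P))
print(pi, c_lik, np.sqrt(2 * np.pi))
```
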
So why MLE?

Goal: to estimate c = \int_\Gamma q(x)\,\mu(dx).

For an arbitrary vector b, consider the control-variate estimator (Owen and Zhou 2000)
    \hat{c}_b \equiv \sum_{j=1}^{J} \sum_{i=1}^{n_j} \frac{q(x_{ji}) - b^\top g(x_{ji})}{\sum_{s=1}^{J} n_s p_s(x_{ji})},
where g = (p_2 - p_1, ..., p_J - p_1)^\top.

A more general class: for \sum_{j=1}^{J} \lambda_j(x) \equiv 1 and \sum_{j=1}^{J} \lambda_j(x) b_j(x) \equiv b, consider (Veach and Guibas 1995 for b_j \equiv 0; Tan 2004)
    \hat{c}_{\lambda,B} = \sum_{j=1}^{J} \frac{1}{n_j} \sum_{i=1}^{n_j} \lambda_j(x_{ji}) \frac{q(x_{ji}) - b_j(x_{ji})^\top g(x_{ji})}{p_j(x_{ji})}.

Should \hat{c}_{\lambda,B} be more efficient than \hat{c}_b? Could there be something even more efficient?

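To make the control-variate construction concrete, here is a small sketch of \hat{c}_b with J = 2 under a hypothetical normal setup; b is chosen by a simple least-squares fit here, but any fixed b keeps \hat{c}_b consistent, because the pooled average of g over \sum_s n_s p_s estimates \int g\,d\mu = 0.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical example with J = 2: p1 = N(0,1), p2 = N(3,1) (both normalized),
# target q = unnormalized N(1,1), true c = sqrt(2*pi).
def norm_pdf(x, m):
    return np.exp(-0.5 * (x - m)**2) / np.sqrt(2 * np.pi)

def q(x):
    return np.exp(-0.5 * (x - 1.0)**2)

n1 = n2 = 2500
x = np.concatenate([rng.normal(0.0, 1.0, n1), rng.normal(3.0, 1.0, n2)])
p1, p2 = norm_pdf(x, 0.0), norm_pdf(x, 3.0)

denom = n1 * p1 + n2 * p2            # sum_s n_s p_s(x)
g = p2 - p1                          # control variate: integrates to 0 under mu

y = q(x) / denom
z = g / denom
b = np.sum(y * z) / np.sum(z * z)    # one-dimensional least-squares choice of b

c_plain = np.sum(y)                  # b = 0: pooled multiple-IS estimator
c_b = np.sum(y - b * z)              # control-variate estimator c_hat_b
print(c_plain, c_b, np.sqrt(2 * np.pi))
```
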
Three estimators for c = \int_\Gamma q(x)\,\mu(dx):

IS:
    \frac{1}{n} \sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{J} \pi_j p_j(x_i)},
where \pi_j = n_j/n are the true proportions.

Reg:
    \frac{1}{n} \sum_{i=1}^{n} \frac{q(x_i) - \hat{\beta}^\top g(x_i)}{\sum_{j=1}^{J} \pi_j p_j(x_i)},
where \hat{\beta} is the estimated regression coefficient, ignoring stratification.

Lik:
    \frac{1}{n} \sum_{i=1}^{n} \frac{q(x_i)}{\sum_{j=1}^{J} \hat{\pi}_j p_j(x_i)},
where the \hat{\pi}_j's are the estimated proportions, ignoring stratification.

Which one is most efficient? Least efficient?

Let’s find it out ...

Γ = R^{10} and µ is Lebesgue measure.

The integrand is
    q(x) = 0.8 \prod_{j=1}^{10} \phi(x^j) + 0.2 \prod_{j=1}^{10} \psi(x^j; 4),
where \phi(\cdot) is the standard normal density and \psi(\cdot; 4) is the t_4 density.

Two sampling designs:
    (i) q_2(x) with n draws, or
    (ii) q_1(x) and q_2(x) each with n/2 draws,
where
    q_1(x) = \prod_{j=1}^{10} \phi(x^j),    q_2(x) = \prod_{j=1}^{10} \psi(x^j; 1).

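For concreteness, here is a minimal sketch of this simulation setup under design (ii), computing only the plain IS estimator from the previous slide; the true value of the integral is 1 because both products in q are normalized densities.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
d, n = 10, 500

def q1(x):                       # product of standard normal densities
    return stats.norm.pdf(x).prod(axis=-1)

def q2(x):                       # product of t_1 (Cauchy) densities
    return stats.t.pdf(x, 1).prod(axis=-1)

def q(x):                        # the integrand; its integral over R^10 is 1
    return 0.8 * stats.norm.pdf(x).prod(axis=-1) + 0.2 * stats.t.pdf(x, 4).prod(axis=-1)

# Design (ii): n/2 draws from each of p1 = q1 and p2 = q2 (both normalized).
x1 = rng.standard_normal((n // 2, d))
x2 = rng.standard_t(1, size=(n // 2, d))
x = np.vstack([x1, x2])

pi = np.array([0.5, 0.5])        # true proportions n_j / n
mix = pi[0] * q1(x) + pi[1] * q2(x)
c_is = np.mean(q(x) / mix)       # the "IS" estimator of the previous slide
print(c_is)                      # true value is 1
```
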
A little surprise?



                       Table: Comparison of design and estimator

                           one sampler                     two samplers
                     IS        Reg        Lik         IS        Reg        Lik
    Sqrt MSE       .162      .00942     .00931      .0175     .00881     .00881
    Std Err        .162      .00919     .00920      .0174     .00885     .00884

    Note: Sqrt MSE is the square root of the mean squared error of the point
    estimates, and Std Err is the square root of the mean of the variance
    estimates, over 10000 repeated simulations of size n = 500.
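
To see roughly where such numbers come from, the sketch below continues the one above (it reuses np, q, q1, q2, n, d, and rng defined there) and checks only the plain IS column; the true value is c = 1 because q is a mixture of two normalized densities, and the repetition count is kept smaller than the 10000 used for the table.

```python
def is_one_sampler(rng):
    """Plain IS estimate of c under design (i); q2 is already normalized."""
    x = rng.standard_t(df=1, size=(n, d))
    return np.mean(q(x) / q2(x))

def is_two_samplers(rng):
    """Plain IS estimate under design (ii); denominator is the 50/50 sampler mixture."""
    x = np.vstack([rng.standard_normal((n // 2, d)),
                   rng.standard_t(df=1, size=(n // 2, d))])
    return np.mean(q(x) / (0.5 * q1(x) + 0.5 * q2(x)))

reps = 1000                                   # 10000 in the talk's table
est_one = np.array([is_one_sampler(rng) for _ in range(reps)])
est_two = np.array([is_two_samplers(rng) for _ in range(reps)])
print("Sqrt MSE, one sampler :", np.sqrt(np.mean((est_one - 1.0) ** 2)))
print("Sqrt MSE, two samplers:", np.sqrt(np.mean((est_two - 1.0) ** 2)))
```
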
Comparison of efficiency:




    Statistical efficiency: IS < Reg ≈ Lik.
    IS is a stratified estimator, which uses only the labels.
    Reg is the conventional method of control variates.
    Lik is the constrained MLE, which uses the p_j's but ignores the labels;
    it is exact if q = p_j for any particular j.
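
The exactness claim can be checked numerically with the lik_estimator sketch given earlier (again purely illustrative; it reuses q1, q2, d, and rng from the sketches above): when the integrand coincides with one of the sampler densities and its estimated weight is positive, the plug-in estimate equals 1 up to EM convergence error.

```python
# Take the integrand to be p_1 = q1 itself (both normalized); at an interior
# EM fixed point the Lik plug-in estimate of its integral is exactly 1.
x_pool = np.vstack([rng.standard_normal((50, d)),
                    rng.standard_t(df=1, size=(50, d))])
c_hat, pi_hat = lik_estimator(x_pool, q1, [q1, q2])
print(c_hat, pi_hat)    # c_hat should be ~1 to several decimals
```
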
Building intuition ...




    Suppose we make n = 2 draws, one from N(0, 1) and one from
    Cauchy(0, 1), hence π_1 = π_2 = 50%.
    Suppose the draws are {1, 1}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
    Suppose the draws are {1, 3}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
    Suppose the draws are {3, 3}: what would be the MLE $(\hat\pi_1, \hat\pi_2)$?
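
A small sketch (the function name and the use of a bounded one-dimensional optimizer are my own choices) that answers these three questions by directly maximizing the label-free mixture likelihood over the single proportion π_1:

```python
import numpy as np
from scipy.stats import norm, cauchy
from scipy.optimize import minimize_scalar

def pi1_mle(draws):
    """MLE of pi_1 in pi_1 * N(0,1) + (1 - pi_1) * Cauchy(0,1), labels ignored."""
    def neg_loglik(p):
        return -np.sum(np.log(p * norm.pdf(draws) + (1 - p) * cauchy.pdf(draws)))
    return minimize_scalar(neg_loglik, bounds=(0.0, 1.0), method="bounded").x

for draws in ([1.0, 1.0], [1.0, 3.0], [3.0, 3.0]):
    p1 = pi1_mle(np.array(draws))
    print(draws, "->", (round(p1, 3), round(1.0 - p1, 3)))
```

For example, when both draws equal 1, the N(0, 1) density exceeds the Cauchy(0, 1) density at every draw, so the label-free likelihood is maximized at $\hat\pi_1 = 1$ even though the true sampling proportions are 50/50.
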
What Did I Learn?




   Model what we ignore, not what we know!
   Model comparison/selection is not about which model is true (as all
   of them are “true”), but which model represents a better compromise
   among human, computational, and statistical efficiency.
    There is a cure for our “schizophrenia”: we can now analyze Monte
    Carlo data using the same sound statistical principles and methods for
    analyzing real data.




If you are looking for theoretical research topics ...




    RE-EXAMINE OLD ONES AND DERIVE NEW ONES!
            Prove it is the MLE, or a good approximation to the MLE.
            Or derive the MLE, or a cost-effective approximation to it.
    Markov chain Monte Carlo (Tan 2006, 2008)
    More ...
