Viva extended final
Transcript

  • 1. ATHENS UNIVERSITY OF ECONOMICS AND BUSINESS, DEPARTMENT OF STATISTICS. Efficient Bayesian Marginal Likelihood Estimation in Generalised Linear Latent Variable Models. Thesis submitted by Silia Vitoratou. Advisors: Ioannis Ntzoufras, Irini Moustaki. Athens, 2013.
  • 2. Overview: thesis structure, Chapters 1 to 7.
  • 3. Chapter 1 Key ideas and origins of latent variable models (LVM). “...co-relation must be the consequence of the variations of the two organs being partly due to common causes...” Francis Galton, 1888. • Suppose we want to make inferences about concepts that cannot be measured directly (such as emotions, attitudes, perceptions, proficiency, etc.). • We assume that they can be measured indirectly through other, observed items. • The key idea is that all dependencies among the p manifest variables (observed items) are attributed to k latent (unobserved) ones. • In principle, k << p. Hence, at the same time, the LVM methodology is a multivariate analysis technique which aims to reduce dimensionality with as little loss of information as possible.
  • 4. Chapter 1 A unified approach: generalised linear latent variable models (GLLVM). The generalised linear latent variable model (GLLVM; Bartholomew & Knott, 1999; Skrondal and Rabe-Hesketh, 2004) links the response variables to linear combinations of the latent ones, and it consists of three components: (a) the multivariate random component, where each observed item Yj (j = 1, ..., p) has a distribution from the exponential family (Bernoulli, Multinomial, Normal, Gamma); (b) the systematic component, where the latent variables Zℓ (ℓ = 1, ..., k) produce the linear predictor for each Yj, η_j = α_{j0} + Σ_{ℓ=1}^{k} α_{jℓ} Z_ℓ; and (c) the link function g_j, which connects the previous two components via g_j(E(Yj | Z)) = η_j.
  • 5. Chapter 1 A unified approach: generalised linear latent variable models (GLLVM). Special case: the generalised linear latent trait model with binary items (Moustaki & Knott, 2000). The conditionals are in this case Bernoulli(π_j(z)), where π_j(z) = P(Yj = 1 | Z = z) is the conditional probability of a positive response to the observed item. The logistic model is used for the response probabilities: logit π_j(z) = α_{j0} + Σ_{ℓ=1}^{k} α_{jℓ} z_ℓ. • The item parameters α_{j0} and α_{jℓ} are often referred to as the difficulty and the discrimination parameters (respectively) of item j. All examples considered in this thesis refer to multivariate IRT (2-PL) models; the current findings apply directly, or can be extended, to any type of GLLVM.
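As a quick illustration (not part of the thesis code; the parameter values below are hypothetical), the 2-PL response probability defined above can be computed as:

```r
# Illustrative only (hypothetical parameter values): the 2-PL response
# probability pi_j(z) for one item, with logit pi_j(z) = a_j0 + sum_l a_jl z_l.
prob_2pl <- function(z, alpha0, alpha) {
  eta <- alpha0 + sum(alpha * z)   # linear predictor eta_j
  1 / (1 + exp(-eta))              # inverse logit link
}
prob_2pl(z = c(0.3, -1.2), alpha0 = -0.5, alpha = c(1.1, 0.8))
```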
  • 6. Chapter 1 A unified approach: generalised linear latent variable models (GLLVM). As only the p items can be observed, any inference must be based on their joint distribution. All data dependencies are attributed to the existence of the latent variables; hence, the observed variables are assumed independent given the latent ones (local independence assumption): f(y) = ∫ Π_{j=1}^{p} f(y_j | z) h(z) dz, where h(z) is the prior distribution for the latent variables. A fully Bayesian approach requires that the item parameter vector is also stochastic, associated with a prior probability.
  • 7. Chapter 2 The fully Bayesian analogue: GLLTM with binary items. A) Priors. All model parameters are assumed a priori independent, with priors for the item parameters in the spirit of Ntzoufras et al. (2000) and Fouskakis et al. (2009). For a unique solution (identifiability of the k-factor model), the Cholesky decomposition is used on the matrix B of discrimination parameters, restricting it to be lower triangular with positive diagonal elements.
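A minimal sketch of such a constraint, under the assumption (not stated on the slide) that B collects the discrimination parameters with the p items in rows and the k factors in columns:

```r
# A sketch of the Cholesky-type identifiability constraint, assuming B holds
# the discrimination parameters (p items in rows, k factors in columns):
# make B lower triangular with a positive leading diagonal.
constrain_loadings <- function(B) {
  B[upper.tri(B)] <- 0                     # zero loadings above the diagonal
  d <- seq_len(min(dim(B)))
  B[cbind(d, d)] <- abs(B[cbind(d, d)])    # force a positive diagonal
  B
}
constrain_loadings(matrix(rnorm(6 * 2), nrow = 6))  # p = 6, k = 2
```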
  • 8. Chapter 2 The fully Bayesian analogue: GLLTM with binary items. B) Sampling from the posterior. • A Metropolis-within-Gibbs algorithm, initially presented for IRT models by Patz and Junker (1999), was used here for the multivariate case (k > 1). • Each item is updated in one block, and so are the latent variables for each person. C) Model evaluation. • In this thesis, the Bayes factor (BF; Jeffreys, 1961; Kass and Raftery, 1995) was used for model comparison. • The BF is defined as the ratio of the posterior odds of two competing models (say m1 and m2) to their corresponding prior odds. Provided that the models have equal prior probabilities, it is given by BF12 = f(y | m1) / f(y | m2), that is, the ratio of the two models' marginal or integrated likelihoods (hereafter Bayesian marginal likelihood; BML).
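On the log scale this ratio is immediate once the two BMLs are estimated; a tiny sketch with assumed, purely illustrative log-BML values:

```r
# With equal prior model probabilities the BF is the ratio of the two BMLs;
# on the log scale, log BF12 = log f(y|m1) - log f(y|m2). The values below
# are hypothetical, purely for illustration.
log_bml <- c(m1 = -215.3, m2 = -219.8)
exp(log_bml["m1"] - log_bml["m2"])     # BF of m1 against m2
2 * (log_bml["m1"] - log_bml["m2"])    # the 2 log BF scale of Kass & Raftery (1995)
```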
  • 9. Chapter 2 Estimating the Bayesian marginal likelihood. The BML (also known as the prior predictive distribution) is defined as the expected model likelihood over the prior of the model parameters, f(y) = ∫ f(y | θ) p(θ) dθ, which quite often is a high-dimensional integral, not available in closed form. Monte Carlo integration is often used to estimate it, for instance via the arithmetic mean over R prior draws: f̂(y) = (1/R) Σ_{r=1}^{R} f(y | θ^(r)), with θ^(r) ~ p(θ). This simple estimator does not work adequately when the posterior is far more concentrated than the prior, and a plethora of Markov chain Monte Carlo (MCMC) techniques are employed instead in the literature.
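A minimal sketch of the arithmetic-mean estimator on a toy conjugate model (an assumed setup, chosen only because its exact marginal likelihood is available for comparison; requires the mvtnorm package):

```r
# Toy conjugate model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).
set.seed(1)
y <- rnorm(20, mean = 0.5, sd = 1)
n <- length(y)
log_lik <- function(th) sum(dnorm(y, mean = th, sd = 1, log = TRUE))

R  <- 1e5
ll <- vapply(rnorm(R, 0, 1), log_lik, numeric(1))     # theta drawn from the prior
log_ml_hat <- max(ll) + log(mean(exp(ll - max(ll))))  # log-sum-exp for stability

# Exact value: marginally y ~ N_n(0, I_n + J_n) under this model
Sigma <- diag(n) + matrix(1, n, n)
c(estimate = log_ml_hat, exact = mvtnorm::dmvnorm(y, sigma = Sigma, log = TRUE))
```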
  • 10. Chapter 2 Estimating the Bayesian marginal likelihood.  The point-based estimators (PBE) employ the candidate's formula (Besag, 1989) at a point of high density: • Laplace-Metropolis (LM; Lewis & Raftery, 1997) • Gaussian copula (GC; Nott et al., 2008) • Chib & Jeliazkov (CJ; Chib & Jeliazkov, 2001).  The bridge sampling estimators (BSE) employ a bridge function, from the form of which several BML identities (including pre-existing ones) can be derived: • Harmonic mean (HM; Newton & Raftery, 1994) • Reciprocal mean (RM; Gelfand & Dey, 1994) • Bridge harmonic (BH; Meng & Wong, 1996) • Bridge geometric (BG; Meng & Wong, 1996).  The path sampling estimators (PSE) employ a continuous and differentiable path to link two unnormalised densities and compute the ratio of the corresponding normalising constants: • Power posteriors (PPT; Friel & Pettitt, 2008; Lartillot & Philippe, 2006) • Steppingstone (PPS; Xie et al., 2011) • Generalised steppingstone (IPS; Fan et al., 2011).
  • 11. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration: the case of GLLVM. From the earliest literature, the methods applied for parameter estimation in models with latent variables relied either on the joint likelihood (Lord and Novick, 1968; Lord, 1980) or on the marginal likelihood (Bock and Aitkin, 1981; Moustaki and Knott, 2000). Under the conditional independence assumptions of the GLLVMs, there are likewise two equivalent formulations of the BML, which lead to different MC estimators: the joint BML, f(y) = ∫∫ f(y | z, θ) h(z) p(θ) dz dθ, and the marginal BML, f(y) = ∫ f(y | θ) p(θ) dθ, with f(y | θ) = ∫ f(y | z, θ) h(z) dz.
  • 12. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration: the case of GLLVM. A motivating example: a simulated data set with p = 6 items, N = 600 cases and k = 2 factors was considered. Three popular BSE were computed under both approaches (R = 50,000 posterior observations, after a burn-in period of 10,000 iterations and with a thinning interval of 10). • BH: largest error difference, but rather close estimates. • BG: largest difference in the estimates, without a large error difference. The differences are due to Monte Carlo integration under independence assumptions.
  • 13. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration: the case of GLLVM. The joint version of BH comes with a much higher MCE than the RM, but it is the joint version of RM that fails to converge to the true value. Why?
  • 14. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration under independence. • Consider any integral of the form μ = E_h[G(x)] = ∫ G(x) h(x) dx. • The corresponding MC estimator is μ̂ = (1/R) Σ_{r=1}^{R} G(x^(r)), assuming a random sample of points x^(1), ..., x^(R) drawn from h. • The corresponding Monte Carlo error (MCE) is sqrt(Var_h[G(x)] / R). • Assume independence, that is, G(x) = Π_{i=1}^{N} g_i(x_i) and h(x) = Π_{i=1}^{N} h_i(x_i); hence μ = Π_{i=1}^{N} E_{h_i}[g_i(x_i)], which can be estimated either by the average of the products (joint estimator) or by the product of the averages (marginal estimator).
  • 15. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration under independence. The two estimators are associated with different MCEs. Based on the early results of Goodman (1962) for the variance of a product of N independent variables, the variances of the estimators are Var(μ̂_joint) = [Π_{i=1}^{N}(σ_i² + μ_i²) − Π_{i=1}^{N} μ_i²] / R and Var(μ̂_marginal) = Π_{i=1}^{N}(σ_i²/R + μ_i²) − Π_{i=1}^{N} μ_i², where μ_i and σ_i² denote the mean and variance of each term g_i(x_i). In finite settings, the difference can be substantial.
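A toy simulation (illustrative, not from the thesis) showing how much larger the MCE of the average-of-products (joint) estimator can be than that of the product-of-averages (marginal) one:

```r
# Two MC estimators of mu = prod_i E[g_i(X_i)] for N independent variables:
# averaging the products versus multiplying the averages, as in Goodman (1962).
set.seed(1)
N <- 20; R <- 1e4
est <- replicate(100, {
  x <- matrix(rexp(R * N), R, N)           # X_i ~ Exp(1), so each E[X_i] = 1
  c(joint    = mean(apply(x, 1, prod)),    # average of the products
    marginal = prod(colMeans(x)))          # product of the averages
})
apply(est, 1, sd)  # joint MCE is orders of magnitude larger (true mu = 1)
```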
  • 16. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration under independence. In particular, the difference in the variances follows from the expressions above. Naturally, it depends on R; note, however, that it also depends on • the dimensionality (N), since more positive terms are added, and • the means and variances of the N variables involved. At the same time, the difference in the means is given by the total covariation index (TCI), a multivariate extension of the covariance. • Under independence the index should be zero (the reverse statement does not hold). • In the sample, the covariances, no matter how small, are non-zero, leading to a non-zero TCI. • The TCI also depends on the number of variables (N), their means, and their variation through the covariances.
  • 17. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration: the case of GLLVM. The motivating example, revisited: different variables are being averaged under the two formulations, leading to different variance components. The total covariance cancels out for the BH.
  • 18. Chapter 3 The behavior of joint and marginal Monte Carlo estimators in multi-parameter latent variable models. Monte Carlo integration and independence. Refer to Chapter 3 of the current thesis for: • more results on the error difference, • properties of the TCI, • the extension to conditional independence, • and more illustrative examples.
  • 19. Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models. Basic idea. Based on the work of Chib & Jeliazkov (2001), it is shown in Chapter 4 that the Metropolis kernel can be used to marginalise out any subset of the parameter vector that otherwise would not be feasible. • Consider the kernel of the Metropolis-Hastings algorithm, which denotes the transition probability of sampling θ′ given that θ has already been generated: the product of the proposal density q(θ′ | θ) and the acceptance probability α(θ, θ′) = min{1, [π(θ′ | y) q(θ | θ′)] / [π(θ | y) q(θ′ | θ)]}. • Then the latent vector can be marginalised out directly from the Metropolis kernel: the kernel ordinates are averaged over MCMC draws from the full posterior of the parameters and the latent variables jointly, which integrates the latent vector out automatically.
  • 20. Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models. Chib & Jeliazkov estimator. Suppose that the parameter space is divided into p blocks of parameters. Then, using the law of total probability, the posterior ordinate at a specific point θ* can be decomposed into the product of the ordinates of each block conditional on the preceding ones. • If the posterior ordinate is analytically available, use the candidate's formula (Besag, 1989) to compute the BML directly: log f(y) = log f(y | θ*) + log p(θ*) − log π(θ* | y). • If the full conditionals are known, Chib (1995) uses the output from the Gibbs sampler to estimate them. • Otherwise, Chib and Jeliazkov (2001) show that each posterior ordinate can be computed by π̂(θ* | y) = E_{π(θ|y)}[α(θ, θ*) q(θ* | θ)] / E_{q(θ|θ*)}[α(θ*, θ)]. This requires p sequential MCMC runs.
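A minimal sketch of the Chib & Jeliazkov identity on a toy one-block model (an assumed setup with a symmetric random-walk proposal; not the thesis implementation):

```r
# Toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).
set.seed(1)
y <- rnorm(20, 0.5, 1)
log_post <- function(th) sum(dnorm(y, th, 1, log = TRUE)) + dnorm(th, 0, 1, log = TRUE)
s <- 0.3                                    # proposal standard deviation
alpha <- function(from, to) min(1, exp(log_post(to) - log_post(from)))

R <- 2e4; draws <- numeric(R); th <- 0      # random-walk Metropolis run
for (r in 1:R) {
  prop <- rnorm(1, th, s)
  if (runif(1) < alpha(th, prop)) th <- prop
  draws[r] <- th
}
th_star <- mean(draws)                      # a point of high posterior density

# Posterior ordinate: E_post[ alpha(th, th*) q(th*|th) ] / E_q[ alpha(th*, th) ]
num <- mean(vapply(draws, function(t) alpha(t, th_star) * dnorm(th_star, t, s), numeric(1)))
den <- mean(vapply(rnorm(R, th_star, s), function(t) alpha(th_star, t), numeric(1)))

# Candidate's formula: log f(y) = log f(y|th*) + log p(th*) - log pi(th*|y)
sum(dnorm(y, th_star, 1, log = TRUE)) + dnorm(th_star, 0, 1, log = TRUE) - log(num / den)
```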
  • 21. Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models. Chib & Jeliazkov estimator for models with latent vectors. The number of latent variables can be in the hundreds if not thousands; hence the method is time-consuming. Chib & Jeliazkov suggest using the last ordinate to marginalise out the latent vector, provided that it is analytically tractable (often it is not). In Chapter 4 of the thesis, it is shown that the latent vector can instead be marginalised out directly from the MH kernel; hence the dimension of the latent vector is not an issue. This observation leads to a further result: assuming local independence, prior independence and a Metropolis-within-Gibbs algorithm, as in the case of the GLLVM, the Chib & Jeliazkov identity is drastically simplified, so that the number of blocks is not an issue either. • The latent vector is marginalised out as previously. • Moreover, even if there are p blocks of model parameters, only one full MCMC run is required. • The approach can also be used under data augmentation schemes that produce independence.
  • 22. Chapter 4 Bayesian marginal likelihood estimation using the Metropolis kernel in multi-parameter latent variable models. Independence Chib & Jeliazkov estimator. Three simulated data sets, under different scenarios, comparing the CJI with ML estimators. (Figure: estimates against the total number of iterations R_total, from the 1st batch up to 30 batches of 1,000, 2,000 and 3,000 iterations per batch.)
  • 23. Chapter 6 Implementation in simulated and real-life datasets. Some results (figure): p = 6 items, N = 600 individuals, k = 1 factor; kmodel = ktrue.
  • 24. Chapter 6 Implementation in simulated and real-life datasets. Some results (figure): p = 6 items, N = 600 individuals, k = 2 factors; kmodel = ktrue.
  • 25. Chapter 6 Implementation in simulated and real-life datasets. Some results (figure): p = 8 items, N = 700 individuals, k = 3 factors; kmodel = ktrue.
  • 26. Chapter 6 Implementation in simulated and real-life datasets. Some results (figure): p = 6 items, N = 600 individuals, k = 1 factor; kmodel < ktrue.
  • 27. Chapter 6 Implementation in simulated and real-life datasets. Some results (figure): p = 6 items, N = 600 individuals, k = 2 factors; kmodel > ktrue.
  • 28. Chapter 6 Implementation in simulated and real-life datasets. Concluding comments.  Refer to Chapter 4 of the current thesis for more details on the implementation of the CJI (or see Vitoratou et al., 2013). More comparisons are presented in Chapter 6 of the thesis, on simulated and real data sets. Some comments: • The harmonic mean failed in all cases. • The BSE were successful in all examples. o The BG estimator was consistently associated with the smallest error. o The RM was also well behaved in all cases. o The BH was associated with more error than the former two BSE. • The PBE are well behaved: o LM is very quick and efficient, but might fail if the posterior is not symmetric. o Similarly for the GC. o The CJI is well behaved but time-consuming; since it is distribution-free, it can be used as a benchmark method to get an idea of the BML.
  • 29. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamics and Bayes. Ideas initially implemented in thermodynamics are currently explored in Bayesian model evaluation. Assume two unnormalised densities, q1 and q0, and suppose we are interested in the ratio λ of their normalising constants. For that purpose we use a continuous and differentiable function of the form q_t = q1^t q0^(1−t) (the geometric path), which links the endpoint densities through the temperature parameter t ∈ [0, 1]. The corresponding Boltzmann-Gibbs distribution is p_t = q_t / z_t, with partition function z_t = ∫ q_t(θ) dθ. Then the ratio λ = z1/z0 can be computed via the thermodynamic integration identity (TI), log λ = ∫_0^1 E_{p_t}[log(q1/q0)] dt, the analogue of the Bayes free energy difference.
  • 30. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamics and BML: power posteriors. The first application of the TI to the problem of estimating the BML is the power posteriors (PP) method (Friel and Pettitt, 2008; Lartillot and Philippe, 2006). Let q1(θ) = f(y | θ) p(θ) and q0(θ) = p(θ); this gives the prior-posterior path q_t(θ) = f(y | θ)^t p(θ), whose normalised version is the power posterior p_t(θ) ∝ f(y | θ)^t p(θ), leading via thermodynamic integration to the Bayesian marginal likelihood: log f(y) = ∫_0^1 E_{p_t}[log f(y | θ)] dt. For ts close to 0 we sample from densities close to the prior, where the variability is typically high.
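A minimal sketch of the power-posteriors/TI estimate on the same toy conjugate model as above (an assumed setup: the power posterior is available in closed form here, so exact draws stand in for the per-temperature MCMC runs; the schedule t_i = (i/n)^5, dense near the prior, is one common choice):

```r
# Toy conjugate model: y_i ~ N(theta, 1), theta ~ N(0, 1). The power
# posterior p_t(theta), proportional to f(y|theta)^t p(theta), is
# N(t*sum(y)/(t*n + 1), 1/(t*n + 1)) here.
set.seed(1)
y <- rnorm(20, 0.5, 1); n <- length(y)
log_lik <- function(th) sum(dnorm(y, th, 1, log = TRUE))

ts <- (0:50 / 50)^5                        # temperature schedule, dense near t = 0
Eloglik <- vapply(ts, function(t) {
  th <- rnorm(5e3, t * sum(y) / (t * n + 1), sqrt(1 / (t * n + 1)))
  mean(vapply(th, log_lik, numeric(1)))    # estimates E_{p_t}[ log f(y|theta) ]
}, numeric(1))

# Trapezoidal rule over the schedule approximates the TI integral, log f(y)
sum(diff(ts) * (head(Eloglik, -1) + tail(Eloglik, -1)) / 2)
```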
  • 31. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamics and BML: importance posteriors. Lefebvre et al. (2010) considered options other than the prior for the zero endpoint, keeping the unnormalised posterior at the unit endpoint. Any proper density g(θ) will do: q_t(θ) = [f(y | θ) p(θ)]^t g(θ)^(1−t) (the importance-posterior path), with the importance posterior p_t as its normalised version. An appealing option is to use an importance (envelope) function, that is, a density as close as possible to the posterior. For ts close to 0 we then sample from densities close to the importance function, mitigating the problem of high variability.
  • 32. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. An alternative approach: stepping-stone identities. Xie et al. (2011), using the prior and the posterior as endpoint densities, considered a different approach to compute the BML, also related to thermodynamics (Neal, 1993). First, the interval [0, 1] is partitioned into n points, 0 = t_0 < t_1 < ... < t_n = 1, and the free energy can be computed through a telescoping product of stepping-stone ratios: λ = z1/z0 = Π_{i=1}^{n} z_{t_i} / z_{t_{i−1}}, each ratio estimated by importance sampling from p_{t_{i−1}}. • Under the power posteriors path, Xie et al. (2011) showed that the BML occurs as f̂(y) = Π_{i=1}^{n} (1/R) Σ_r f(y | θ^(r,i−1))^{t_i − t_{i−1}}, with θ^(r,i−1) ~ p_{t_{i−1}}. • Under the importance posteriors path, Fan et al. (2011) showed that the BML occurs analogously, with the importance function in place of the prior (the generalised stepping stone). However, the stepping-stone identity (SI) is even more general and can be used under different paths, as an alternative to the TI.
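A sketch of the stepping-stone estimator under the power-posteriors path, on the same toy model as the power-posteriors sketch above (redefined here so the block stands alone; still an assumed setup, not the thesis code):

```r
# Toy conjugate model and schedule, as in the power-posteriors sketch.
set.seed(1)
y <- rnorm(20, 0.5, 1); n <- length(y)
log_lik <- function(th) sum(dnorm(y, th, 1, log = TRUE))
ts <- (0:50 / 50)^5

log_rk <- vapply(2:length(ts), function(i) {
  t0 <- ts[i - 1]                          # sample the power posterior at t0
  th <- rnorm(5e3, t0 * sum(y) / (t0 * n + 1), sqrt(1 / (t0 * n + 1)))
  ll <- (ts[i] - t0) * vapply(th, log_lik, numeric(1))
  max(ll) + log(mean(exp(ll - max(ll))))   # log of one stepping-stone ratio
}, numeric(1))
sum(log_rk)                                # stepping-stone estimate of log f(y)
```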
  • 33. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Path sampling identities for the BML, revisited. Hence, there are two general identities to compute a ratio of normalising constants within the path sampling framework, namely the TI and the SI. Different paths lead to different expressions for the BML:
  Path / TI identity / SI identity
  Prior-posterior / Power posteriors (PPT; Friel and Pettitt, 2008; Lartillot and Philippe, 2006) / Stepping-stone (PPS; Xie et al., 2011)
  Importance-posterior / Importance posteriors (IPT; inspired by Lefebvre et al., 2010) / Generalised stepping-stone (IPS; Fan et al., 2011)
  Other paths can be used, under both approaches, to derive identities for the BML or any other ratio of normalising constants. Hereafter, the identities will be named by the path employed, with a subscript denoting the method implemented, e.g. IPS.
  • 34. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamics and direct BF identities: model switching. Lartillot and Philippe (2006) considered as endpoint densities the unnormalised posteriors of two competing models, q0(θ) = f(y | θ, m0) p(θ | m0) and q1(θ) = f(y | θ, m1) p(θ | m1), leading to the model-switching path q_t = q1^t q0^(1−t) and, via thermodynamic integration, directly to the Bayes factor, log BF10 = ∫_0^1 E_{p_t}[log(q1/q0)] dt, estimated through a bidirectional melting-annealing sampling scheme. It is likewise easy to derive the SI counterpart expression.
  • 35. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamics and direct BF identities: quadrivials. Based on the idea of Lartillot and Philippe (2006), we may proceed with compound paths, which consist of • a hyper, geometric path, which links the two competing models, and • a nested, geometric path for each endpoint function Qi, i = 0, 1. The two intersecting paths form a quadrivial, which can be used either with the TI or with the SI approach. If the ratio of interest is the BF, the two BMLs should be derived at the endpoints of [0, 1]. The PP and the IP paths are natural choices for the nested part of the identity.
  • 36. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Sources of error in path sampling estimators. a) The integral over [0, 1] in the TI is typically approximated via numerical approaches, such as the trapezoidal or Simpson's rule (Neal, 1993; Gelman and Meng, 1998), which require an n-point discretisation (temperature schedule) 0 = t_1 < t_2 < ... < t_n = 1, e.g. log λ ≈ Σ_{i=1}^{n−1} (t_{i+1} − t_i)(E_{p_{t_{i+1}}}[U] + E_{p_{t_i}}[U]) / 2, with U = log(q1/q0). Note that a temperature schedule is also required for the SI method (it defines the stepping-stone ratios). The discretisation introduces error to the TI and SI estimators, referred to as the discretisation error; it can be reduced by (i) increasing the number of points n and/or (ii) assigning more points closer to the endpoint associated with higher variability. b) At each point t_i, a separate MCMC run is performed with target distribution p_{t_i}; hence Monte Carlo error also occurs at each run. c) A third source of error is the path-related error. We may gain insight into a) and c) by considering the measures of entropy related to the TI.
  • 37. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Performance: pine data, a simple regression example. Measurements were taken on 42 specimens. A linear regression model was fitted for each specimen's maximum compressive strength (y), using its density (x) as the independent variable. The objective in this example is to illustrate how each method-and-path combination responds to prior uncertainty; to do so, we use three different prior schemes. The ratios of the corresponding BMLs under the three priors were estimated over n1 = 50 and n2 = 100 evenly spaced temperatures. At each temperature, a Gibbs algorithm was implemented and 30,000 posterior observations were generated, after discarding 5,000 as a burn-in period.
  • 38. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Performance: pine data, a simple regression example. Implementing a uniform temperature schedule (figure): the results reflect the difference in the path-related error and the difference in the discretisation error; all quadrivials come with a smaller batch-mean error. Note: PP works just fine under a geometric temperature schedule that places more points near the prior.
  • 39. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. Based on the prior-posterior path, Friel and Pettitt (2008) and Lefebvre et al. (2010) showed that the PP method is connected with the Kullback-Leibler divergence (KL; Kullback and Leibler, 1951), i.e. the relative entropy, expressible in terms of the differential and cross entropies. Here we present their findings in a general form, that is, for any geometric path: according to the TI it holds that log λ = E_{p1}[log(q1/q0)] − KL(p1 ‖ p0) = E_{p0}[log(q1/q0)] + KL(p0 ‖ p1), so that the difference of the two endpoint expectations equals the symmetrised KL (J) divergence, J(p0, p1) = KL(p0 ‖ p1) + KL(p1 ‖ p0).
  • 40. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. (Figure: graphical representation of the TI.) What about the intermediate points?
  • 41. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. (Figure: the TI minus the free energy, at each point.) Instead of integrating the mean energy over the entire interval [0, 1], there is an optimal temperature t* at which the mean energy equals the free energy.
  • 42. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. (Figure: graphical representation of the NTI; the functional plotted is the difference in the KL distance of the sampling distribution pt from p1 and from p0.) The ratio of interest occurs at the point where the sampling distribution is KL-equidistant from the endpoint densities.
  • 43. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. The normalised thermodynamic integral (NTI). Hence: • according to the PPT method, the BML occurs at the point where the sampling distribution is equidistant from the prior and the posterior; • according to the QMST method, the BF occurs at the point where the sampling distribution is equidistant from the two posteriors. The sampling distribution pt is the Boltzmann-Gibbs distribution pertaining to the Hamiltonian (energy function) −log(q1/q0). Therefore, • according to the NTI, when geometric paths are employed, the free energy occurs at the point where the Boltzmann-Gibbs distribution is equidistant from the distributions at the endpoint states.
  • 44. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. (Figure: graphical representation of the NTI.) What do the areas stand for?
  • 45. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. The normalised thermodynamic integral and probability distribution divergences. A key observation here is that the sampling distribution embodies the Chernoff coefficient (Chernoff, 1952), c_t(p0, p1) = ∫ p0(θ)^(1−t) p1(θ)^t dθ = z_t / (z0^(1−t) z1^t). Based on that, the NTI can be written in terms of log c_t, meaning that the areas correspond to the Chernoff t-divergence, C_t(p0, p1) = −log c_t(p0, p1). At t = t*, we obtain the so-called Chernoff information, C(p0, p1) = max_t C_t(p0, p1) = −log c_{t*}(p0, p1).
  • 46. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. Using the output from path sampling, the Chernoff divergence can be computed easily (see Chapter 5 of the thesis for a step-by-step algorithm). Along with the Chernoff estimation, a number of other f-divergences can be directly estimated, namely • the Bhattacharyya distance (Bhattacharyya, 1943) at t = 0.5, • the Hellinger distance (Bhattacharyya, 1943; Hellinger, 1909), • the Rényi t-divergence (Rényi, 1961) and • the Tsallis t-relative entropy (Tsallis, 2001). These measures of entropy are commonly used in • information theory, pattern recognition, cryptography, machine learning, • hypothesis testing, • and, recently, in non-equilibrium thermodynamics.
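A sketch (not the thesis algorithm; same assumed toy model and exact power-posterior draws as above) of how some of these prior-posterior divergences drop out of the same path-sampling output, using log c_t = log z_t − t log z_1, since z_0 = 1 for the prior-posterior path; the Hellinger convention H² = 1 − BC used below is one of several:

```r
# Toy conjugate model, schedule and per-temperature means, as before.
set.seed(1)
y <- rnorm(20, 0.5, 1); n <- length(y)
log_lik <- function(th) sum(dnorm(y, th, 1, log = TRUE))
ts <- (0:50 / 50)^5
Eloglik <- vapply(ts, function(t) {
  th <- rnorm(5e3, t * sum(y) / (t * n + 1), sqrt(1 / (t * n + 1)))
  mean(vapply(th, log_lik, numeric(1)))
}, numeric(1))

# Cumulative TI gives log z_t; with z_0 = 1, log c_t = log z_t - t * log z_1.
cum_logz <- c(0, cumsum(diff(ts) * (head(Eloglik, -1) + tail(Eloglik, -1)) / 2))
log_c    <- cum_logz - ts * tail(cum_logz, 1)
i_half   <- which.min(abs(ts - 0.5))            # grid point nearest t = 0.5
c(bhattacharyya = -log_c[i_half],               # -log BC at t = 0.5
  hellinger     = sqrt(1 - exp(log_c[i_half])), # one common convention
  chernoff_info = max(-log_c))                  # maximum over the grid of ts
```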
  • 47. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Thermodynamic integration and distribution divergences. (Figure: measures of entropy and the NTI.)
  • 48. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Path selection, temperature schedule and error. These results also provide insight into the error of the path sampling estimators. To begin with, Lefebvre et al. (2010) have shown that the total variance is associated with the J-divergence of the endpoint densities, and therefore with the choice of the path. Graphically: • the J-distance coincides with the slope of the secant defined at the endpoint densities, and the shape of the curve is a graphical representation of the total variance; • the slope of the tangent at a particular point t_i coincides with the local variance, with higher local variances at the points where the curve is steeper; • the graphical representation of two competing paths provides information about the estimators' variances. Paths with smaller cliffs are easier to take!
  • 49. Chapter 5 Thermodynamic assessment of probability distribution divergences and Bayesian model comparison. Path selection, temperature schedule and error. Numerical approximation of the TI: assign more t_i's at the points where the curve is steeper (higher local variances), allowing a different level of accuracy towards the two endpoints. The discretisation error depends primarily on the path.
  • 50. Future work. Currently developing an R library for BML estimation in the GLLTM, with Danny Arends. Expand the results (and the R library) to account for other types of data. Further study of the TCI (Chapter 3). Use the ideas in Chapter 4 to construct a better Metropolis algorithm for GLLVMs. Proceed further with the ideas presented in Chapter 5, with regard to the quadrivials, the temperature schedule and the optimal t*. Explore applications to information criteria.
  • 51. Bibliography
  Bartholomew, D. and Knott, M. (1999). Latent Variable Models and Factor Analysis. Kendall's Library of Statistics, 7. Wiley.
  Bhattacharyya, A. (1943). On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35:99–109.
  Besag, J. (1989). A candidate's formula: A curious result in Bayesian prediction. Biometrika, 76:183.
  Bock, R. and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46:443–459.
  Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, 23(4).
  Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90:1313–1321.
  Chib, S. and Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96:270–281.
  Fan, Y., Wu, R., Chen, M., Kuo, L., and Lewis, P. (2011). Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(2):523–532.
  Fouskakis, D., Ntzoufras, I., and Draper, D. (2009). Bayesian variable selection using cost-adjusted BIC, with application to cost-effective measurement of quality of healthcare. Annals of Applied Statistics, 3:663–690.
  Friel, N. and Pettitt, A. N. (2008). Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 70(3):589–607.
  Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B (Methodological), 56(3):501–514.
  Gelman, A. and Meng, X. (1998). Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185.
  Goodman, L. A. (1962). The variance of the product of K random variables. Journal of the American Statistical Association, 57:54–60.
  Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik, 136:210–271.
  Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, 186(1007):453–461.
  Jeffreys, H. (1961). Theory of Probability, 3rd edition. Oxford University Press, Oxford.
  Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90:773–795.
  Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22:79–86.
  Lartillot, N. and Philippe, H. (2006). Computing Bayes factors using thermodynamic integration. Systematic Biology, 55:195–207.
  Lefebvre, G., Steele, R., and Vandal, A. C. (2010). A path sampling identity for computing the Kullback-Leibler and J divergences. Computational Statistics and Data Analysis, 54(7):1719–1731.
  Lewis, S. and Raftery, A. (1997). Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator. Journal of the American Statistical Association, 92:648–655.
  Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Erlbaum Associates, Hillsdale, NJ.
  Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Addison-Wesley, Reading, MA.
  • 52. Bibliography (continued)
  Meng, X.-L. and Wong, W.-H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6:831–860.
  Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391–411.
  Neal, R. M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1, University of Toronto.
  Newton, M. and Raftery, A. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B (Methodological), 56:3–48.
  Nott, D., Kohn, R., and Fielding, M. (2008). Approximating the marginal likelihood using copula. arXiv:0810.5474v1. Available at http://arxiv.org/abs/0810.5474v1.
  Ntzoufras, I., Dellaportas, P., and Forster, J. (2000). Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference, 111(1-2):165–180.
  Patz, R. J. and Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2):146–178.
  Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128:301–323.
  Raftery, A. and Banfield, J. (1991). Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics, 43(430):32–43.
  Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Paedagogiske Institut, Copenhagen.
  Rényi, A. (1961). On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pages 547–561.
  Tsallis, C. (2001). Nonextensive statistical mechanics and its applications. In Abe, S. and Okamoto, Y., editors, Nonextensive Statistical Mechanics and Its Applications. Springer-Verlag, Heidelberg. See also the list of references at http://tsallis.cat.cbpf.br/biblio.htm.
  Vitoratou, S., Ntzoufras, I., and Moustaki, I. (2013). Marginal likelihood estimation from the Metropolis output: Tips and tricks for efficient implementation in generalized linear latent variable models. Journal of Statistical Computation and Simulation, to appear.
  Xie, W., Lewis, P., Fan, Y., Kuo, L., and Chen, M. (2011). Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Systematic Biology, 60(2):150–160.
  This thesis is dedicated to