Comparison of Asymptotic, Bootstrap
and Posterior Predictive P-values in
Assessing Latent Class Model Fit
Geert van Kollenburg 647091∗
Abstract
Goodness-of-fit testing in latent class analysis can result in unreliable
asymptotic p-values when the reference distributions are unknown or when
the contingency tables become sparse. For instance, it has been shown that
the asymptotic p-value belonging to the likelihood ratio statistic becomes
untrustworthy in sparse data. A number of solutions to this problem have
arisen in the form of resampling techniques. The parametric bootstrap uses
the maximum likelihood estimates as population parameters to sample new
datasets, to see whether the observed statistics are likely to occur under the
proposed model. The posterior predictive check is the Bayesian alternative
to a p-value and is similar to the bootstrap, but it accounts for uncertainty
about the parameter values by drawing samples from the posterior predictive
distribution. The purpose of this thesis is to compare the asymptotic,
bootstrap and posterior predictive p-values in assessing the model fit of
latent class models when sample size is large and when it is small.
Key words: Latent Class Analysis, Goodness-of-Fit, Bayes' Theorem,
Parametric Bootstrap, Posterior Predictive Check.
∗ Department of Methodology and Statistics, Tilburg University, the Netherlands.
1 Introduction
To test the fit of a latent class (LC) model to a dataset, there exist overall
goodness-of-fit tests, which measure the discrepancy between observed
frequencies and those expected under the proposed model for all cells in a
corresponding contingency table (e.g., the likelihood ratio $L^2$; Vermunt,
2010). Also bivariate, or higher-order, measures can be estimated, which
assess the remaining association between two or more items in a dataset
after a LC model has been fitted. For instance, the bivariate residual (BVR;
Vermunt & Magidson, 2005) is an approximation of the score-test for the
association parameter between two items. Its value gives an indication of
the estimated increase in model fit if the association parameter were
included in the model.
Significance testing may become troublesome if the distribution of a
statistic is unknown. For example, the score test asymptotically follows a
chi-squared distribution when the model is true (Bera & Bilias, 2001), but
because the BVR is an approximation, its distribution is at best only
approximated by the chi-squared distribution. This problem grows when even
more complex measures (e.g., the sum of all BVRs) are used: the quality of
the approximation of the reference distribution then depends on the quality
of the approximations to the score-tests. The BVR is an example for which
some approximation is possible, but the asymptotic distributions of other
statistics may not be approximated well by any known distribution, or it may
be very hard to derive the asymptotic distribution of a statistic
analytically.
Even when a statistic follows a known distribution asymptotically (i.e.,
when the sample size goes to infinity), its use in performing significance
tests can become inappropriate when sample sizes are not large and
contingency tables become sparse. As the number of items increases or when
the sample size is small to moderate, the contingency tables may quickly
become sparse (i.e., many cells will have 0 or 1 entries). For instance,
with 10 dichotomous items there are already $2^{10} = 1024$ cells in the
table; in such cases the asymptotic distributions no longer hold and the
associated p-values become untrustworthy (Maydeu-Olivares & Joe, 2006;
Reiser & Lin, 1999; Vermunt, 2010). In the case of unknown, untrustworthy
or incorrect distributions it is necessary to calculate empirical reference
distributions. According to Formann (2003), this holds for overall
goodness-of-fit tests, residuals and other statistics.
In order to determine empirical reference distributions, resampling
techniques, like the parametric bootstrap by Collins et al. (1993; in
Formann, 2003), have been proposed to solve the problem of untrustworthy
asymptotic p-values and unknown distributions. If one assumes that the data
contain information about the true values of the parameters of interest, it
is possible to create a reference distribution to determine how likely an
observation is given the estimated parameters. The parametric bootstrap,
for instance, is implemented in the software package LatentGold (Vermunt &
Magidson, 2005) and uses Monte Carlo simulations to approximate the
empirical distribution of the goodness-of-fit statistics based on the
maximum likelihood (ML) estimates obtained from the data.
Instead of relying on the ML estimates, several authors propose using
Bayesian methods to assess model fit in LC analysis (Berkhof, Van Mechelen
& Gelman, 2003; Garrett & Zeger, 2000; Hoijtink, 1998). The Bayesian method
for obtaining a p-value is the Posterior Predictive Check (PPC), which can
be used in complex models where analytic solutions are tedious to obtain.
Rather than conditioning on a single point estimate, this method uses
random draws for the unknown parameters from the posterior predictive
distribution to determine how likely an observed statistic is (Gelman,
Carlin, Stern & Rubin, 2004).
The purpose of this thesis is to investigate the PPC as an alternative
to asymptotic and bootstrap p-values in assessing the model fit of LC
models. A comparison will also be made between all methods to check whether
they produce comparable results in large samples and whether the resampling
techniques are more adequate than the asymptotic p-value in small samples.
To investigate this, I take a number of commonly used fit statistics and
compare the long-run behavior of the p-values that the different methods
produce for them in a Monte Carlo simulation study. This leads to a direct
comparison of the asymptotic, bootstrap and PPC p-values under different
conditions, such as sample size. Importantly, it is assessed whether the
different p-values are uniformly distributed under the null-hypothesis and
whether nominal Type-I error levels are correct for the given statistics.
I do not intend to discuss the use of cut-off scores in significance
testing, but rather apply the commonly used levels as a reference for the
behavior of the statistics under the different methods.
The outline of this thesis is as follows. Section 2 describes the LC model,
the estimation of a LC model, and the fit statistics used in the study.
Section 3 provides an overview of the methods used for obtaining p-values.
Section 4 describes the simulation studies and gives the results. In
Section 5 an empirical dataset is analyzed to illustrate the techniques
that result in p-values. Finally, in Section 6 I discuss the findings and
issues in need of further research.
2 Latent Class Analysis
2.1 Defining the LC model
In the multivariate setting, let an $N \times J$ matrix $Y$ contain the
responses of $N$ units (i.e., individuals) on $J$ discrete variables with
$R_j$, $j = 1, \ldots, J$, categories. Let $Y_i = (Y_{i1}, \ldots, Y_{iJ})$
be row $i$, $i = 1, \ldots, N$, of $Y$, containing the responses to the $J$
variables. In total there are $S = \prod_{j=1}^{J} R_j$ possible response
patterns for $Y_i$. Therefore, let $Y_s$, $s = 1, \ldots, S$, denote a
specific pattern, and $n_s$ denote the observed pattern count. Finally, let
$y$ (without subscripts) denote an observed dataset.
The LC model assumes that the $N = \sum_s n_s$ units can be partitioned into
$C$ latent classes, which have their own probability density for the
responses. A unit's unobservable class membership is represented by the
latent variable $\theta$, and a particular class is denoted by $c$, with
$c = 1, \ldots, C$. The idea is then to find a LC model with the lowest
number of classes for which the responses conditional on class membership
are independent. This assumption is called local independence and lies at
the basis of LC analysis.

In a LC model, $P(Y_s)$, the probability of observing pattern $Y_s$, is
assumed to be a weighted average of the class-specific probabilities, with
weights $\pi_c$ being the probability that an individual belongs to LC $c$
(Vermunt, 2010).
So for each of the S patterns, the probability density is given by
P(Ys) =
C
c=1
πcP(Ys|θ = c), (1)
Assuming local independence,
P(Ys|θ = c) =
J
j=1
P(Ysj|θ = c). (2)
Using the notation of Vermunt (2010) to indicate the conditional item re-
sponse probability of a person in class c giving response r to item j as πjrc,
the conditional probability P(Ysj|θ = c) is then a multinomial probability
density given by
P(Ysj|θ = c) =
R
r=1
π
y∗
sjr
jrc , (3)
where y∗
sjr is 1 if Ysj = r and 0 otherwise.
Lastly, the probability that a person belongs to LC $c$, conditional on
having response pattern $Y_s$, called the posterior membership probability
(Vermunt, 2010), is obtained using Bayes' rule:

$$\pi_{c|s} = \frac{P(Y_s \mid \theta = c)\,\pi_c}{P(Y_s)}. \qquad (4)$$
2.2 Estimating the LC Model
To obtain ML estimates for the LC model, typically the
Expectation-Maximization (EM) algorithm (Goodman, 1974) is used. The EM
algorithm finds the ML estimates by maximizing the log-likelihood function

$$\log L = \sum_{s=1}^{S} n_s \log P(Y_s). \qquad (5)$$

Because only non-zero response frequencies contribute to the likelihood,
the convention $0 \log(0) = 0$ is used throughout this thesis. The details
of the EM algorithm and the $0 \log(0) = 0$ convention are discussed in
Appendix A.
Using the EM algorithm to obtain the ML estimates requires that starting
values are provided for the parameters in $\psi = (\pi_{jrc}, \pi_c)$,
denoted as $\pi^{(0)}_{jrc}$ and $\pi^{(0)}_c$. Caution is advised: when
the starting values are too similar, the model can become unidentifiable.
To solve this, it should be possible to order the LCs by $\pi^{(0)}_c$ or,
for instance, $\pi^{(0)}_{1rc}$ (Hoijtink, 1998). For further discussion on
the identifiability of LC models, including item/class ratios, see Goodman
(1974). The EM algorithm goes as follows:
Step 0: Choose initial values for $\psi^{(0)}$ and set $t = 1$.

Step 1: Expectation step. Given $\psi^{(t-1)}$, calculate $\pi_{c|s}$ (see
Equation 4). Then multiply this by $n_s$ to obtain $n^{(t)}_{sc}$, the
estimated number of respondents in each class having pattern $s$.

Step 2: Maximization step. Calculate

$$\pi^{(t)}_c = m_c/N = \sum_{s=1}^{S} n^{(t)}_{sc}/N$$

and

$$\pi^{(t)}_{jrc} = \sum_{s=1}^{S} \big(n^{(t)}_{sc}\, y^*_{sjr}\big)/m_c,$$

where $y^*_{sjr}$ is 1 if $Y_{sj} = r$ and 0 otherwise.

Step 3: Set $t = t + 1$ and repeat Steps 1 and 2 until the decrease in the
log-likelihood between two iterations is smaller than a given convergence
criterion (e.g., $10^{-8}$).

A minimal sketch of these steps in R is given below.
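The sketch is illustrative rather than the thesis code: the names (em_lc, patterns, ns) are assumptions, and the model is the unrestricted LC model of Section 2.1.

```r
# Minimal EM sketch for an unrestricted LC model (illustrative, not the
# thesis code). 'patterns' is an S x J matrix of response patterns with
# entries in 1..R; 'ns' holds the observed pattern counts.
em_lc <- function(patterns, ns, C, tol = 1e-8, max_iter = 1000) {
  S <- nrow(patterns); J <- ncol(patterns); R <- max(patterns)
  pi_c   <- rep(1 / C, C) + sort(runif(C, 0, .1))  # deliberately dissimilar starts
  pi_c   <- pi_c / sum(pi_c)
  pi_jrc <- array(runif(J * R * C), dim = c(J, R, C))
  pi_jrc <- sweep(pi_jrc, c(1, 3), apply(pi_jrc, c(1, 3), sum), "/")
  loglik_old <- -Inf
  for (t in seq_len(max_iter)) {
    # E-step: pi_c[c] * P(Y_s | theta = c) for every pattern and class
    dens <- sapply(1:C, function(c)
      pi_c[c] * apply(patterns, 1, function(y) prod(pi_jrc[cbind(1:J, y, c)])))
    p_s  <- rowSums(dens)       # P(Y_s), Equation 1
    n_sc <- (dens / p_s) * ns   # n_sc = pi_{c|s} * n_s
    # M-step: update class sizes and conditional response probabilities
    m_c  <- colSums(n_sc)
    pi_c <- m_c / sum(ns)
    for (c in 1:C) for (j in 1:J) for (r in 1:R)
      pi_jrc[j, r, c] <- sum(n_sc[patterns[, j] == r, c]) / m_c[c]
    loglik <- sum(ns * log(p_s))                   # Equation 5
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(pi_c = pi_c, pi_jrc = pi_jrc, loglik = loglik)
}
```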
Estimation of the model can also be done in a Bayesian context using a
Gibbs sampler (e.g., Hoijtink, 1998). The Gibbs sampler is similar to the
EM procedure, but relies on sampling distributions at each step (Ligtvoet
& Vermunt, 2012) and results in an estimated (posterior) distribution of
the parameters rather than point estimates for $\psi$. The Gibbs sampler
proceeds as follows:
Step 0: Choose initial values for $\psi^{(0)}$ and set $d = 1$.

Step 1: Data augmentation. Given $\psi^{(d-1)}$, calculate $\pi_{c|s}$ (see
Equation 4). Then, every subject with a particular pattern is assigned to a
LC by drawing from a multinomial distribution with probabilities from
$\pi_{c|s}$. This results in both the class sizes $m^{(d)}_c$ and
$n^{(d)}_{jrc}$, the number of respondents from class $c$ with response $r$
to item $j$.

Step 2: Draw a sample from the posteriors

$$\pi^{(d)}_c \sim \text{Dir}\big(m^{(d)}_1 + \alpha_c, \ldots, m^{(d)}_C + \alpha_c\big)$$

and

$$\big(\pi^{(d)}_{j1c}, \ldots, \pi^{(d)}_{jR_jc}\big) \sim \text{Dir}\big(n^{(d)}_{j1c} + \alpha_{jrc}, \ldots, n^{(d)}_{jR_jc} + \alpha_{jrc}\big),$$

where $\alpha_c = 1/C$ and $\alpha_{jrc} = 1/R_j$ (see Appendix B).

Step 3: Set $d = d + 1$ and repeat Steps 1 and 2 until convergence (Section
3.3 describes a method for assessing the convergence of the sampler; for
more Bayesian convergence criteria see, e.g., Brooks and Gelman, 1998).
After convergence, repeat Steps 1 and 2 $L$ times and keep the sampled
values to estimate the posterior distribution of the parameters.

A sketch of one iteration is given below.
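The sketch is again illustrative: rdirichlet1() draws one Dirichlet vector via Gamma variates, and Y is assumed to be the N × J data matrix with entries coded 1..R (unit-level rather than pattern-level, which is equivalent).

```r
# One Gibbs iteration (Steps 1-2), as an illustrative sketch. 'Y' is the
# N x J data matrix with entries in 1..R; alpha_c and alpha_jrc are the
# Dirichlet hyperparameters (1/C and 1/Rj in this thesis).
rdirichlet1 <- function(alpha) { g <- rgamma(length(alpha), shape = alpha); g / sum(g) }

gibbs_step <- function(Y, pi_c, pi_jrc, alpha_c, alpha_jrc) {
  N <- nrow(Y); J <- ncol(Y); C <- length(pi_c); R <- dim(pi_jrc)[2]
  # Step 1: data augmentation -- draw a class membership for every unit
  post <- sapply(1:C, function(c)
    pi_c[c] * apply(Y, 1, function(y) prod(pi_jrc[cbind(1:J, y, c)])))
  post <- post / rowSums(post)                       # pi_{c|s} per unit
  z <- apply(post, 1, function(p) sample.int(C, 1, prob = p))
  # Step 2: draw new parameters from their Dirichlet posteriors
  pi_c <- rdirichlet1(tabulate(z, nbins = C) + alpha_c)
  for (c in 1:C) for (j in 1:J)
    pi_jrc[j, , c] <- rdirichlet1(tabulate(Y[z == c, j], nbins = R) + alpha_jrc)
  list(pi_c = pi_c, pi_jrc = pi_jrc, z = z)
}
```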
In the simulation study (see Section 4), I use the population parameter
values as starting points and use a burn-in of 100 iterations before I
start sampling. This way the method is likely to start close to the true
parameter values, and the posterior is properly estimated. When the
population values were not useful (e.g., when the analysis only had 1 LC),
I used the ML estimates obtained from the EM algorithm as starting values.
2.3 Model-fit test statistics
Three test statistics are used to assess model fit. These fit statistics
are indicators of the local dependencies given class membership. Let
$e_s = P(Y_s)N$ denote the expected pattern frequency under the fitted LC
model, given the (estimated) values of $\psi$ (from which $P(Y_s)$ is
calculated). The likelihood ratio statistic $L^2$ and the overall Pearson
chi-squared test statistic $X^2$ are then

$$L^2 = 2\sum_{s=1}^{S} n_s \ln\frac{n_s}{e_s}, \qquad
X^2 = \sum_{s=1}^{S} \frac{(n_s - e_s)^2}{e_s}.$$

Thirdly, the bivariate residual (BVR) is used, which measures remaining
local dependencies between two items. The BVRs are $X^2$ values computed
for pairs of variables (Vermunt & Magidson, 2005). So for items $j$ and
$j'$,

$$BVR_{jj'} = \sum_{r=1}^{R_j} \sum_{r'=1}^{R_{j'}}
\frac{(n_{rr'} - e_{rr'})^2}{e_{rr'}}.$$
To investigate the BVR statistic across the random samples, I assume that
all BVRs behave the same and therefore only analyze the BVR of items 1 and
2. A minimal sketch of the three statistics is given below.
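As a minimal illustration in R (the software used later in this thesis), the three statistics can be computed from observed and expected frequencies as follows; the names ns, es, obs2 and exp2 are assumptions standing for the observed/expected pattern counts and the observed/expected two-way table of items 1 and 2.

```r
# Fit statistics from observed (ns) and expected (es) pattern counts;
# obs2/exp2 are the observed/expected two-way table for items 1 and 2.
L2  <- 2 * sum(ifelse(ns > 0, ns * log(ns / es), 0))  # 0 log(0) = 0 convention
X2  <- sum((ns - es)^2 / es)
BVR <- sum((obs2 - exp2)^2 / exp2)
```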
The $L^2$, $X^2$ and BVR are all of the less-is-better type and can be seen
as indicators of badness-of-fit. In the next section I describe how these
statistics can be used to perform significance tests for goodness-of-fit.
The significance tests are based on p-values, which indicate how likely the
value of an observed statistic is, given certain assumptions about the
population parameters and/or the data. The methods differ from each other
in their assumptions about the population parameters and in the estimation
process. First I describe how to obtain a p-value using an asymptotic
reference distribution, then by means of the parametric bootstrap, and
finally by means of two PPCs.
3 Estimating p-values
3.1 Asymptotic reference distribution
In the frequentist framework, the p-value is the theoretical probability of
finding a test statistic that is more extreme than the one actually observed,
under the null-hypothesis H0 (Hogg & Tanis, 2010). In testing a LC model
with C classes, we base the p-value on the assumption that this model is true.
The p-value associated with an observed test statistic Tobs is the probability
that a value for T is at least as extreme as Tobs, given the C-class model is
true.
In testing model fit I am only interested in the probability of worse fit,
which is indicated by larger values of $T$, so the asymptotic p-value can
be defined as

$$p_a = \Pr(T \geq T_{obs} \mid H_0), \qquad (6)$$

where conditioning on $H_0$ means that the posited model is assumed to be
true or that $\psi = \psi_0$, the values postulated in $H_0$ (Gelman et
al., 2004; Meng, 1994). To obtain this p-value one calculates the area
beyond the value of $T_{obs}$ in a reference distribution with a specified
number of degrees of freedom (df).
In an unrestricted LC model the $L^2$ and $X^2$ statistics under $H_0$ are
assumed to asymptotically follow a chi-squared distribution
($\chi^2_{df}$) with df given by

$$df = \prod_{j=1}^{J} R_j - C\Big[1 + \sum_{j=1}^{J}(R_j - 1)\Big]. \qquad (7)$$
As noted before, the BVR does not have a direct reference distribution,
since it is an approximation of the score-test, which does follow a
chi-squared distribution. In the coming simulation only binary variables
are used, so the BVR approximates the score-test for a $2 \times 2$
contingency table. Because the score-test is known to asymptotically follow
a chi-squared distribution with $(R_j - 1) \times (R_{j'} - 1) = 1$ df in
this case, I will assume that the BVR can be approximated by the same
asymptotic distribution and check the validity of this assumption. In R
these asymptotic p-values are upper-tail chi-squared probabilities, as
sketched below.
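The sketch uses the statistics computed earlier; the df values correspond to the simulation design of Section 4 and the variable names are illustrative.

```r
# Asymptotic p-values as upper-tail chi-squared probabilities (Equation 6);
# df = 50 follows from Equation 7 for the 6-item, 2-class design of
# Section 4, and df = 1 applies to the BVR with binary items.
pa_L2  <- pchisq(L2,  df = 50, lower.tail = FALSE)
pa_X2  <- pchisq(X2,  df = 50, lower.tail = FALSE)
pa_BVR <- pchisq(BVR, df = 1,  lower.tail = FALSE)
```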
Issues concerning $p_a$-values

Besides misconceptions and malpractices concerning p-values (see Sterne &
Smith, 2001, for a clear evaluation), statistical problems also arise with
the use of (asymptotic) reference distributions. One problem with the
asymptotic p-value is that if it is unknown what distribution a statistic
follows, the use of an incorrect reference distribution can result in
inaccurate p-values. Another problem is that, by definition, an asymptotic
p-value is not exact because sample sizes are always finite. Although
results might be trustworthy in very large samples, even moderate sample
sizes can lead to inaccurate results.
When the number of items in the data becomes large, the observed pattern
frequencies in the contingency tables quickly become very sparse, and one
needs very large sample sizes to control for this. In sparse tables the
distributions of statistics like the $L^2$ cannot be approximated well, and
even though $p_a$ can still be calculated, its values can no longer be
trusted (Magidson & Vermunt, 2004; Maydeu-Olivares & Joe, 2006; Reiser &
Lin, 1999; Vermunt, 2010). Other methods have to be used in order to get
more reliable and accurate p-values in situations where these issues occur.
Because of these and other problems associated with $p_a$-values, other
methods have been proposed for obtaining p-values, which do not rely on
asymptotic theory but are based on resampling techniques. These techniques
generate a large number of random replicate samples from a set of
(estimated) population parameter values. For each of these datasets
$y^{rep}$ it is possible to calculate the statistics of interest and
determine the probability that a statistic $T_{rep}$ is larger than the one
observed. This is done by estimating the proportion of $T_{rep}$ values
that are more extreme than $T_{obs}$, given the estimated parameters. For
the LC model, I will compare resampling techniques from the frequentist
framework (bootstrap) and from the Bayesian framework (PPC).
3.2 Parametric Bootstrap Method
The parametric bootstrap can be used to estimate the distribution of
statistics whose distribution is unknown, whether due to limited sample
size or because no good approximation exists. If we use the ML estimates
from the observed data as population values, it is possible to estimate the
probability that $T_{rep} \geq T_{obs}$, given that the estimates are true
(Langeheine, Pannekoek & Van de Pol, 1996). The bootstrap p-value is then
given by:

$$p_b = \Pr\big[(T_{rep} \geq T_{obs}) \mid \hat{\psi}, H_0\big]. \qquad (8)$$

The bootstrap method proceeds as follows:

Step 1. Assume that the model ($H_0$) is true.

Step 2. Treat the ML estimates from the observed data under $H_0$ as
population parameters.

Step 3. Draw $B$ random replicate samples $y^{rep,b}$, $b = 1, \ldots, B$,
of size $N$ based on these population parameter estimates.

Step 4. Estimate the LC model for each dataset using the EM algorithm and
calculate $T^b_{rep}$ from the ML estimates $\hat{\psi}^b$.
The proportion

$$B^{-1}\sum_{b=1}^{B} I\big(T^b_{rep} \geq T_{obs}\big),$$

where the indicator function $I$ equals 1 if $T^b_{rep} \geq T_{obs}$ and 0
otherwise, is taken as the estimate of $p_b$. In words, $p_b$ is estimated
by the proportion of samples in which the value $T^b_{rep}$ is greater than
or equal to $T_{obs}$. A compact sketch of this procedure is given below.
3.3 Posterior Predictive Check
The PPC is the Bayesian counterpart of the classical statistical tests
(Meng, 1994). Given that $H_0$ is true and that the observed data came from
the population of interest, the posterior predictive (PP) p-value is given
by:

$$p_p = \Pr\big[(T^l_{rep} \geq T_{obs}) \mid y, H_0\big]. \qquad (9)$$

In the Bayesian framework one is not particularly interested in the
probability that the observed data have come from a population with the
parameters posited in the null-hypothesis (as in the frequentist
framework), but rather in the probability that the parameters have certain
values given that the observed data indeed came from that population
(Gelman et al., 2004).

As a result of this philosophy, the major difference from the bootstrap is
that the PPC is based on the posterior distribution $P(\psi \mid y)$ of the
unknown parameters (rather than on a point estimate like $\hat{\psi}$) and
on the predictive distribution $P(y^{rep} \mid \psi)$ for the replicated
data. In its general form, the probability in Equation 9 is taken over the
joint distribution $P(\psi, y^{rep} \mid y)$, so that

$$p_p = \int\!\!\int I\big(T^l_{rep} \geq T_{obs}\big)\,
P(y^{rep} \mid \psi)\, P(\psi \mid y)\, dy^{rep}\, d\psi, \qquad (10)$$

where the indicator function $I$ equals 1 if $T^l_{rep} \geq T_{obs}$ and 0
otherwise (Gelman et al., 2004). Appendix B shows how the posterior and PP
distributions are obtained.
In practice, the PP distribution $P(y^{rep} \mid \psi)$ is usually
estimated through simulations, and the $p_p$-value is then estimated based
on these draws. In principle the PPC proceeds as follows:

Step 1. Assume that the model is true.

Step 2. Draw $L$ samples from the PP distribution to obtain $\psi^l$ and
$y^{rep,l}$, $l = 1, \ldots, L$.

Step 3. Estimate the LC model under $H_0$ on each dataset $y^{rep,l}$ and
calculate the statistic $T^l_{rep}$.

So $T^l_{rep}$ is obtained by estimating the model under $H_0$ using the EM
algorithm. For each replication the ML estimates $\hat{\psi}^l$ are used to
calculate $T^l_{rep}$, and the proportion

$$L^{-1}\sum_{l=1}^{L} I\big(T^l_{rep} \geq T_{obs}\big),$$

where the indicator function $I$ equals 1 if $T^l_{rep} \geq T_{obs}$ and 0
otherwise, is taken as the estimate of $p_p$.
In more complex models (like the LC model), however, it may not be possible
to obtain the PP distribution in Step 2 analytically. The solution involves
splitting up Step 2 and using an iterative sampling procedure:

Step 2a. Draw a sample from the posterior distribution
$\psi^l \sim P(\psi \mid y)$.

Step 2b. Generate a replicate dataset
$y^{rep,l} \sim P(y^{rep} \mid \psi^l)$.

Step 2c. Repeat Steps 2a and 2b to obtain $L$ replicated datasets.

But, as shown in Appendix B, the posterior distribution for the LC model
again does not have a convenient form to sample from directly. Fortunately
the Gibbs sampler, as discussed in Section 2.2, can be used to obtain the
required posterior draws $\psi^l$ (Rubin & Stern, 1994). At convergence,
the draws in a Gibbs sampler iteration are samples from the posterior
$P(\psi \mid y)$, so the $L$ iterations together approximate the posterior
distribution. Performing Step 2b then yields draws from the predictive
distribution. The joint draws from the posterior distribution and the
predictive distribution can together be seen as a single draw from the PP
distribution. A sketch of the whole procedure is given below.
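The sketch uses the same assumed helpers as the bootstrap sketch, plus a hypothetical gibbs_draw() returning one (thinned) posterior draw of ψ; only the resampling logic is shown.

```r
# PPC sketch (Steps 2a-2c and 3). gibbs_draw(), simulate_lc() and
# fit_stat() are assumed helpers, not the thesis code.
ppc_p <- function(T_obs, y, N, L = 100) {
  T_rep <- replicate(L, {
    psi_l <- gibbs_draw(y)             # Step 2a: psi^l ~ P(psi | y)
    y_rep <- simulate_lc(psi_l, N)     # Step 2b: y_rep ~ P(y_rep | psi^l)
    fit_stat(y_rep)                    # Step 3: refit under H0, compute T
  })
  mean(T_rep >= T_obs)                 # proportion estimating p_p
}
```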
Figure 1 in Appendix C is a graphical representation of the PPC. The upper
plot is a trace plot and depicts the values of the $T_{rep} = X^2_{rep}$
statistic during the $L = 500$ replications for the empirical example
described in Section 5, where $N = 94$ and $C = 2$. If the plot shows any
long-term trends, this is an indication that successive draws are highly
correlated and that the method has not converged. The values should move
freely around the value space, without getting stuck in a local region
(King et al., 2011). The bottom plot shows a smoothed density of the
replicated values. The dashed line indicates the observed value
$X^2_{obs} = 4.223$, and the proportion of values beyond that line (.554)
is the estimate for $p_p$.
PPC using discrepancy variables
The formulation of the PP p-value has been extended by Gelman et al. (2004)
by using, instead of a statistic $T$, a discrepancy variable $D(\psi)$,
which depends on the data as well as the parameters. For each draw from the
posterior, $D_{obs}(\psi^l)$ is calculated as the discrepancy between the
observed data and $\psi^l$, and $D_{rep}(\psi^l)$ as the discrepancy
between the replicated data $y^{rep,l}$ and $\psi^l$.

The p-value for the discrepancy measure is given by:

$$p_d = \Pr\big[D_{rep}(\psi) \geq D_{obs}(\psi) \mid y, H_0\big].$$

Goodness-of-fit measures like the $L^2$ can be used as discrepancy
variables because the predicted pattern frequencies are functions of the
parameters in $\psi$. For instance, the expected frequencies for the $L^2$
are calculated as $e^l_s = P(Y_s \mid \psi^l)N$. The discrepancy p-value is
estimated by taking the $L$ sampled draws, computing the predicted pattern
frequencies $e^l_s$ directly from $\psi^l$, and computing
$D_{obs}(\psi^l)$ and $D_{rep}(\psi^l)$ based on these predicted
frequencies. In this way one obtains $L$ 'observed' discrepancies
$D_{obs}(\psi^l)$ and $L$ replicated discrepancies $D_{rep}(\psi^l)$. The
$p_d$ is estimated by

$$L^{-1}\sum_{l=1}^{L} I\big(D_{rep}(\psi^l) \geq D_{obs}(\psi^l)\big),$$

as sketched below.
The PPC using discrepancy variables was used in LC analysis by Berkhof,
van Mechelen and Gelman (2003) and Meulders et al. (2002), who indicate
that this procedure tends to be conservative. Conservativeness, however, is
not the only issue with the pd-value. Hjort, Dahl & Steinbakk (2006) showed
that the distribution of pd under H0 is far from uniform and have indicated
that its values need to be adjusted in order to make results interpretable.
Hjort et al. investigated the behavior of pd in a number of models, but
not the LC model. In order to test the appropriateness of the method it is
important to investigate the behavior of pd in the current setting as well, and
the method is therefore included in this study.
4 Simulation study

To compare the methods described above, the behavior of the p-values in
different situations needs to be assessed. In situations where $H_0$ is
true, the p-values for the fit statistics described in Section 2.3 should
be uniformly distributed (Sackrowitz & Samuel-Cahn, 1999). Deviations from
uniformity could indicate that the reference distribution or method used is
incorrect. The uniformity of the p-values will therefore be used to assess
the applicability of the methods in different situations.
To investigate the behavior of the proposed p-values I generated data for
$J = 6$ dichotomous items ($R_j = 2$ for all $j$). The population class
sizes and conditional response probabilities used throughout the
simulations can be found in Table 1.

Table 1: Population values for the simulation studies

               c = 1   c = 2
  pi_c          0.5     0.5
  pi_{j1c}      0.8     0.2
  pi_{j2c}      0.2     0.8

To test the behavior of the p-values under $H_0$ in large samples I
generated 500 datasets with $N = 1000$. In large samples the p-values ought
to behave approximately equivalently. Since one of the reasons for using
resampling techniques is their use in small samples and sparse tables, I
generated the same number of datasets with $N = 100$. On all datasets a
2-class LC model was fitted using the EM algorithm. At convergence the
asymptotic p-values were calculated for the $L^2$ and $X^2$ based on the
$\chi^2_{50}$ distribution (by Equation 7, $2^6 - 2[1 + 6] = 50$) and for
the $BVR_{12}$ using the $\chi^2_1$ distribution. To obtain the
$p_b$-value, the bootstrap with $B = 100$ was performed, and similarly the
$p_p$ and $p_d$ were calculated based on $L = 100$ PP samples. In total,
the LC model had to be fitted to 200,000 additional datasets.
To test the behavior of the p-values under a misspecified model and to
perform a power test, again 500 datasets with $N = 1000$ and 500 datasets
with $N = 100$ were generated from a 2-class population, but each of these
datasets was analyzed using a 1-class LC model. I then calculated the
$p_a$-values (with $df = 57$ for the $L^2$ and $X^2$) and obtained the
$p_b$, $p_p$ and $p_d$-values based on $B = L = 100$.
To check whether the p-values are uniformly distributed under $H_0$, I
performed two numerical checks and a graphical check to substantiate the
findings. If a p-value is uniformly distributed, its expected value is
$E(p) = .5$ and $P(p < .05) = .05$ (i.e., in 5% of the cases the p-value is
less than .05). I use the conventional significance level of .05 (Fisher,
1925) as the upper limit for rejecting the null-hypothesis. If there are
considerable deviations from these indicators of uniformity, the method
used might be inappropriate or incorrectly specified. The graphical checks
are shown as the distributions of the p-values, smoothed using splines to
approximate the log-densities (see Stone et al., 1997). These graphical
checks can be used directly to spot deviations from uniformity anywhere in
the distribution. Please note that sharp increases in density at the very
boundaries (at approximately < .02 and > .98) are due to the estimation
procedure rather than indicating practically problematic behavior of the
p-value.
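As a minimal illustration, the two numerical checks amount to a single line of R (p being a vector of simulated p-values; the name is illustrative):

```r
# Uniformity checks: under H0 a uniform p-value has mean .5 and is
# below .05 in 5% of the replications.
check_uniformity <- function(p) c(E_p = mean(p), Pr_lt_05 = mean(p < .05))
```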
Results
Figure 2 in Appendix C and Table 2 provide the results for the p-values
under $H_0$ with sample size $N = 1000$. Figure 3 in Appendix C and Table 3
provide the results under $H_0$ for $N = 100$. The densities of the
$p_a$-values are depicted as solid lines, the $p_b$-values as dashed lines,
the $p_p$-values as dash-dotted lines, and the $p_d$-values as dotted
lines. Also included is a line indicating a truly uniform distribution as a
reference. The tables summarize the figures and include the two checks for
uniformity: the expected p-values $E(p)$ and $\Pr(p < .05)$ for the
different goodness-of-fit statistics. Not only can these proportions be
used as an indication of systematic deviations from uniformity, they may
also be helpful if only Type-I error rates (false rejections of the
null-hypothesis) are the issue of concern.
The results show that with a sample size of $N = 1000$, under $H_0$, the
chi-squared reference distribution used for the $p_a$-values is not an
exact reference for the $L^2$ statistic. Using the $\chi^2_{50}$
distribution resulted in too liberal results, since the Type-I error rate
was .094 (almost twice as high as expected under $H_0$). Also, the expected
value is much lower than .5. From Figure 2 it is clear that the density
becomes larger as $p_a$ comes closer to 0, indicating too many small
p-values. Although this may be due to sampling fluctuations given the
limited number of simulations, it is worth mentioning that within the same
analyses the $p_a$-value for the $X^2$ statistic shows this behavior much
less. To illustrate, there were 81 analyses in which the $p_a$-value for
the $L^2$ was less than .10 (where there should have been only 50). In
those analyses the $X^2$ had p-values less than .10 in only 57 cases.
Inspection of the $p_a$-values for $BVR_{12}$ clearly indicates that the
BVR does not follow a $\chi^2_1$ distribution: the density of the p-values
increases in a roughly linear fashion as the values of $p_a$ increase.
Conversely, from Table 2 it can be seen that in the large sample case the
$p_b$ and $p_p$-values only seem to be somewhat too liberal, having
slightly too many values smaller than .05. Other than that, these p-values
approximate the uniform distribution very well. In the current setting,
with large sample size, $p_b$ and $p_p$ clearly outperform the asymptotic
p-value for both the $L^2$ and the BVR, but this is perhaps more likely due
to the specification of the asymptotic reference distribution than to the
quality of the methods in the large sample case, since the methods behave
very similarly for the $X^2$ statistic.

Table 2: Uniformity measures of p-values

                         E(p)                       Pr(p < .05)
             p_a     p_b     p_p     p_d      p_a    p_b    p_p    p_d
  L^2       .4388   .4945   .4946   .8449    .094   .062   .064   .002
  X^2       .4917   .4918   .4923   .8496    .060   .068   .064   .000
  BVR_12    .6706   .5065   .5072   .7667    .000   .046   .046   .000

N = 1000, MC simulations = 500, bootstrap/PPC replications = 100
As expected, the most 'problematic' results came from the PPC using
discrepancy variables, which is clearly not adequate for testing model fit
using any of the goodness-of-fit statistics. In line with the findings of
Hjort et al. (2006), the $p_d$ is distributed far from uniformly in the LC
goodness-of-fit setting. Figure 2 shows that for the $L^2$ and $X^2$ the
density increases as $p_d$ gets larger and peaks at 1. For the BVR
statistic it peaks at around .78, with a range of [0.54, 0.93]. In only 1
dataset (the value .002 in Table 2) was a $p_d$-value found that was less
than .05.
From Table 3 it can be seen that in the sparser datasets the expected
values of $p_b$ and $p_p$ are somewhat higher than that of $p_a$ for the
$L^2$ statistic (perhaps still due to the asymptotic reference
distribution), about equal for the $X^2$, and lower for the BVR (although
this comparison is rather trivial, since the reference distribution for the
BVR was clearly inadequate). Also in sparser tables the $p_d$ has much
higher values than the other measures, except for $p_a$ of the BVR (again
probably due to the incorrect reference). All methods tend to be
conservative in that too few p-values were less than .05, even where the
expected values are lower than .5. From Figure 3 it can be seen that the
distribution of the $p_a$-value under $H_0$ with a small sample size is far
from uniform for the $L^2$ statistic. Interestingly, this behavior is
mimicked by the $p_b$ and $p_p$. Although the behavior is similar, the
$p_b$ and $p_p$ are distributed more flatly for all statistics, with the
bootstrap method resulting in the least peaked distribution.

Table 3: Uniformity measures of p-values

                         E(p)                       Pr(p < .05)
             p_a     p_b     p_p     p_d      p_a    p_b    p_p    p_d
  L^2       .4019   .4354   .4352   .8854    .016   .040   .034   .000
  X^2       .5224   .5200   .5114   .8535    .028   .024   .018   .000
  BVR_12    .6758   .5088   .5136   .7607    .004   .040   .038   .000

N = 100, MC simulations = 500, bootstrap/PPC replications = 100
Finally, in analyzing the 500 datasets of $N = 1000$ from a 2-class
population with a 1-class model, the probability of correctly rejecting the
null-hypothesis (i.e., the power) was 1 for all of the statistics. That is,
all $p_a$-values were less than $10^{-19}$ for the BVR, less than
$10^{-161}$ for the $L^2$, and less than $10^{-291}$ for the $X^2$
statistic. All other p-values were always equal to 0. In the 500 smaller
samples, all p-values resulted in a power of 1 for the $L^2$ and $X^2$.
Although the power for the BVR was 1 in the previous simulation, the BVR is
not a very good measure of model misfit when analyzed on its own, as it is
based only on the relationship between two items. That is to say, if one
BVR does not yield a small p-value, this does not indicate that the whole
model fits well. This aspect is captured by the p-values in the small
sample case. The expected and maximum p-values, as well as the power
(indicated as $P(p < .05)$, the probability of a value less than .05), for
all methods are provided in Table 4. Here too, the $p_d$ provides very
inadequate results if its values are not post-processed (see Hjort et al.,
2006).

Table 4: Power results for the BVR

                 p_a     p_b     p_p     p_d
  E(p)          .001    .010    .009    .146
  P(p < .05)    .964    .944    .952    .284
  max(p)        .565    .60     .55     .84
5 Empirical example
To illustrate the usage of the proposed methods I have analyzed data
obtained by Galen and Gambino (1975, in Rindskopf, 2002) in a study of 94
patients who suffered chest pains and were admitted to an emergency room.
Four indicators of myocardial infarction (MI) were scored either 1
(present) or 0 (not present): the patients' heart-rhythm Q-waves (Q), high
low-density blood cholesterol levels (L), creatine phosphokinase levels
(C), and their clinical history (H). The response patterns and their
observed frequencies can be found in Table 5. Rindskopf indicated that the
data are consistent with a 2-class LC model, with $df = 6$, $L^2 = 4.29$
and $p_a = .64$.
To obtain the 4 p-values for each statistic, I used the $\chi^2_6$
reference distribution for the $L^2$ and $X^2$, and set $B = L = 500$ to
obtain the resampling p-values. Because the data are quite sparse, given
the results from the simulation study with $N = 100$, I expected to find
that the $p_b$ and $p_p$ would be higher than $p_a$ for the $L^2$
statistic, about equal for the $X^2$, and lower for the BVR (due to the
unknown reference distribution for the BVR). I also expected $p_d$ to be
much higher than the other p-values, though less markedly so than $p_a$ for
the BVR.
Table 5: Response pattern frequencies
Q L C H count Q L C H count
0 0 0 0 33 1 0 0 0 0
0 0 0 1 7 1 0 0 1 0
0 0 1 0 7 1 0 1 0 2
0 0 1 1 5 1 0 1 1 3
0 1 0 0 1 1 1 0 0 0
0 1 0 1 0 1 1 0 1 0
0 1 1 0 3 1 1 1 0 4
0 1 1 1 5 1 1 1 1 24
Table 6 provides the conditional response probabilities and class sizes
resulting from fitting the 2-class LC model to the data (they are identical
to those reported by Rindskopf, 2002). The first class (likely to have had
MI) had high conditional probabilities for all indicators; the other class
had low conditional probabilities.

Table 6: ML parameter estimates of psi for the MI data using a 2-class model

             MI       no MI
  pi_c      0.4578    0.5422
  Q 0       0.2332    1.0000
  Q 1       0.7668    0.0000
  L 0       0.1721    0.9731
  L 1       0.8279    0.0269
  C 0       0.0000    0.8045
  C 1       1.0000    0.1955
  H 0       0.2086    0.8049
  H 1       0.7914    0.1951

In Table 7 the estimated p-values from all methods are shown for the
2-class model and the three statistics. As none of the p-values is small,
all methods indicate that the 2-class model fits the data well. Against
expectation, the bootstrap resulted in much smaller p-values than the other
methods for the $L^2$ and $X^2$. Although no p-value indicated lack of fit,
there are large differences in the actual values of the p-values.

Table 7: Results for the empirical example

                             p_a    p_b    p_p    p_d
  L^2    = 4.292611         .637   .358   .606   .874
  X^2    = 4.22263          .647   .306   .554   .892
  BVR_12 = 0.1545949        .694   .230   .182   .652

df = 6, N = 94, B = L = 500
6 Discussion
In this thesis I compared different p-values in goodness-of-fit testing of
LC models. The classical asymptotic p-value was compared to the p-values
obtained by means of the parametric bootstrap and PPCs in large and small
samples. The methods were discussed and the differences illustrated. Two
problems that occur in using asymptotic p-values were discussed: firstly,
that they cannot be trusted in small samples, and secondly, that they are
not useful when it is unknown what distribution a statistic follows.
The results suggested that the $\chi^2_{df}$ may not be a valid reference
for the $L^2$ statistic in LC analysis, since it produced too liberal
results in large samples under $H_0$. The BVR has also been shown to
clearly not follow a $\chi^2_1$ distribution. The $p_b$ and $p_p$ showed
much better behavior than the asymptotic p-value for both the $L^2$ and the
BVR, although this might have been due to the asymptotic reference
distribution used, since the methods were comparable for the $X^2$, for
which $p_a$ also showed good behavior.
Whether the bootstrap or the PPC is the better method for approximating a
p-value in the current setting is not clear-cut. The data for $N = 100$
were not extremely sparse, since the number of patterns with observed
frequencies of 0 or 1 was not very large. But especially the $L^2$
statistic showed very surprising behavior and needs to be investigated
further.
More research should be done to investigate the distributions of the $L^2$
and BVR statistics, which can be done by looking at the actual values of
the statistics rather than at the p-values under a reference distribution.
Additionally, analysis of the empirical example showed that the p-values
can differ from each other quite severely within one dataset, even though
the expected values did not differ much. To find out more about the
differences between the p-values within datasets, a comparison of the
p-values within each simulation could provide better insight into the
characteristics of the data responsible for these differences. This may
result in a clearer understanding of when each of the methods can be used
optimally.
Since the current research has focused on (overall) goodness-of-fit
statistics, an option for future research is a similar study investigating
the applicability of resampling techniques to issues regarding LC model
selection and comparison. For instance, the PPC could provide a p-value for
the increase in fit when adding LCs or when including local dependencies.
This said, I have only considered rather simple LC models, and future
research on this topic should include, for example, models with more LCs,
local dependencies, or models which include covariates.
Note on computational time

Because for each dataset $B = L = 100$ bootstraps and PPCs are performed to
estimate $p_b$, $p_p$ and $p_d$, a total of 400,000 replicated datasets had
to be generated and analyzed using the EM algorithm, which can become
rather time consuming. For instance, the analysis for $N = 100$ with 2 LCs
took over 20 hours to complete on a 32-bit, 2.61 GHz, 3.43 GB RAM computer
using the software package R (CRAN, 2012).

However, the individual analyses themselves do not take very long (a couple
of minutes per run). The assessment of the empirical data using all
techniques took only about 3 minutes with 500 bootstrap/PPC replications,
indicating the practical usefulness of the methods for obtaining p-values.
Of course the empirical dataset was not very large, but researchers should
not be deterred from using these techniques in empirical research. The
software and hardware used (and the efficiency of the programming) can
greatly diminish the time needed to analyze a problem and, moreover, even
waiting a day to get reliable research results should be considered
worthwhile.
References

Bera, A. K. & Bilias, Y. (2001). Rao's score, Neyman's C(α) and Silvey's
LM tests: An essay on historical developments and some new results.
Journal of Statistical Planning and Inference, 97, 9–44.

Berkhof, J., Van Mechelen, I., & Gelman, A. (2003). A Bayesian approach to
the selection and testing of mixture models. Statistica Sinica, 13,
423–442.

Brooks, S. P. & Gelman, A. (1998). General methods for monitoring
convergence of iterative simulations. Journal of Computational and
Graphical Statistics, 7(4), 434–455.

Fisher, R. A. (1925). Statistical methods for research workers (chapter 3).
Retrieved May 2, 2012, from http://psychclassics.yorku.ca/Fisher/Methods/

Formann, A. K. (2003). Latent class model diagnosis: a review and some
proposals. Computational Statistics & Data Analysis, 41, 548–559.

Galindo-Garre, F., & Vermunt, J. K. (2005). Testing log-linear models with
inequality constraints: a comparison of asymptotic, bootstrap, and
posterior predictive p values. Statistica Neerlandica, 59, 82–94.

Garrett, E. S., & Zeger, S. L. (2000). Latent class model diagnosis.
Biometrics, 56, 1055–1067.

Gelman, A., Carlin, J., Stern, H. & Rubin, D. (2004). Bayesian Data
Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall.

Goodman, L. A. (1974). Exploratory latent structure analysis using both
identifiable and unidentifiable models. Biometrika, 61, 215–231.

Hjort, N. L., Dahl, F. A. & Steinbakk, G. H. (2006). Post-processing
posterior predictive p values. Journal of the American Statistical
Association, 101(475), 1157–1174.

Hogg, R. V. & Tanis, E. A. (2010). Probability and Statistical Inference
(8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Hoijtink, H. (1998). Constrained latent class analysis using the Gibbs
sampler and posterior predictive p-values: applications to educational
testing. Statistica Sinica, 8, 691–711.

King, M. D., Calamante, F., Clark, C. A. & Gadian, D. G. (2011). Markov
chain Monte Carlo random effects modeling in magnetic resonance image
processing using the BRugs interface to WinBUGS. Journal of Statistical
Software, 44(2). Available from http://www.jstatsoft.org/v44/i02

Langeheine, R., Pannekoek, J. & Van de Pol, F. (1996). Bootstrapping
goodness-of-fit measures in categorical data analysis. Sociological
Methods & Research, 24, 492–516.

Ligtvoet, R. & Vermunt, J. K. (2012). Latent class models for testing
monotonicity and invariant item ordering for polytomous items. British
Journal of Mathematical and Statistical Psychology, 65(2), 237–250.

Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan
(Ed.), The Sage Handbook of Quantitative Methodology for the Social
Sciences (pp. 175–198). Thousand Oaks, CA: Sage Publications, Inc.

Maydeu-Olivares, A. & Joe, H. (2006). Limited goodness-of-fit testing in
multidimensional contingency tables. Psychometrika, 71, 713–732.

Meng, X.-L. (1994). Posterior predictive p-values. The Annals of
Statistics, 22(3), 1142–1160.

Meulders, M., De Boeck, P., Kuppens, P. & Van Mechelen, I. (2002).
Constrained latent class analysis of three-way three-mode data. Journal of
Classification, 19, 277–302.

Nylund, K. L., Asparouhov, T. & Muthén, B. O. (2007). Deciding on the
number of classes in latent class analysis and growth mixture modeling: A
Monte Carlo simulation study. Structural Equation Modeling: A
Multidisciplinary Journal, 14(4), 535–569.

Reiser, M., & Lin, Y. (1999). A goodness-of-fit test for the latent class
model when expected frequencies are small. In M. Sobel & M. Becker (Eds.),
Sociological Methodology (pp. 81–111). Boston: Blackwell Publishers.

Rindskopf, D. (2002). The use of latent class analysis in medical
diagnosis. Proceedings of the Joint Meetings of the American Statistical
Association, 2912–2916.

Rubin, D. B., & Stern, H. S. (1994). Testing in latent class models using a
posterior predictive check distribution. In A. von Eye & C. C. Clogg
(Eds.), Latent variables analysis: Applications for developmental research
(pp. 420–438). Thousand Oaks, CA: Sage Publications, Inc.

Sackrowitz, H. & Samuel-Cahn, E. (1999). P values as random variables:
expected p values. The American Statistician, 53(4), 326–331.

Sterne, J. A. C. & Smith, G. D. (2001). Sifting the evidence: what's wrong
with significance tests? BMJ, 322, 226–231.

Stone, C. J., Hansen, M., Kooperberg, C. & Truong, Y. K. (1997). The use of
polynomial splines and their tensor products in extended linear modeling
(with discussion). Annals of Statistics, 25, 1371–1470.

Tanner, M. A. & Wong, W. H. (1987). The calculation of posterior
distributions by data augmentation. Journal of the American Statistical
Association, 82(398), 528–540.

Vermunt, J. K. (2010). Latent class models. In P. Peterson, E. Baker, & B.
McGaw (Eds.), International Encyclopedia of Education (pp. 238–244).
Oxford: Elsevier.

Vermunt, J. K., & Magidson, J. (2005). Technical Guide for Latent GOLD 4.0:
Basic and Advanced. Belmont, MA: Statistical Innovations Inc.
A EM algorithm

Because the LC membership is unobservable, the (logarithm of the)
likelihood is hard to maximize directly: the summation within the log makes
separation of the product terms unviable. It is possible, however, to use a
sequential algorithm if we provide starting values for the missing data
(i.e., the unobserved class membership).
Combining Equations 1-3 to obtain the likelihood gives:

$$P(Y_s) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{r=1}^{R_j}
\pi_{jrc}^{\,y^*_{sjr}} \qquad (11)$$

and taking the log gives the log-likelihood:

$$\log P(Y_s) = \log \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{r=1}^{R_j}
\pi_{jrc}^{\,y^*_{sjr}}. \qquad (12)$$

With class membership unobservable, this expression is unsolvable. However,
if we impute values for the missing class membership (also called data
augmentation, e.g., Ligtvoet & Vermunt, 2012), the expression can be
written as:

$$\log P(Y_s) = n_s \sum_{c=1}^{C} \pi_{c|s} \log \pi_c \prod_{j=1}^{J}
\prod_{r=1}^{R_j} \pi_{jrc}^{\,y^*_{sjr}}.$$

Now, the EM algorithm consists of sequentially updating $\pi_{c|s}$
(providing $\pi_c$) and $\pi_{jrc}$ to maximize

$$\log L = \sum_{s=1}^{S} \log P(Y_s).$$
The algorithm continues until the change in the log-likelihood between
iterations $t$ and $t + 1$ is smaller than a given convergence criterion.
The values for which this log-likelihood is maximized are the ML estimates.

When using the EM algorithm it can, however, occur that convergence is
attained at a local maximum. To control for this, multiple sets of starting
values are often used, and the values for $\psi$ resulting in the highest
log-likelihood are taken as the ML estimates.
0 log(0) = 0 convention

In order to let only observed patterns contribute to the likelihood, I used
the convention that $0 \log(0) = 0$. This is needed because $\log(0)$ is
undefined, so multiplying $\log(0)$ by 0 does not technically result in 0.
The justification of the convention is as follows.

If I define the natural logarithm as $\log(x) = \int_1^x \frac{1}{t}\, dt$
and need a reasonable value for $0 \log(0)$, I should take the limit as $x$
approaches 0. Using l'Hôpital's rule one can show that, although $\log(0)$
is undefined, the limit of $x \log(x)$ as $x$ approaches zero is:

$$\lim_{x \to 0} x \log(x) = \lim_{x \to 0} \frac{\log(x)}{x^{-1}}
= \lim_{x \to 0} \frac{x^{-1}}{-x^{-2}} = \lim_{x \to 0} (-x) = 0.$$
B The Gibbs sampler (in LC analysis)

The Gibbs sampler can be used to estimate the LC model, as described in
Section 2.2, but also to perform the PPC (see Section 3.3) as a means of
testing model fit. The Bayesian model-fit approach compares the
goodness-of-fit statistic $T_{obs}$ to a reference distribution which is
obtained by averaging the distribution $P(T \mid \psi)$ over the posterior
$P(\psi \mid y)$. When the posterior distribution cannot (or only
tediously) be calculated analytically, one can use simulations to estimate
it. Here I show in detail how to obtain the posterior and predictive
distributions for $\psi$ and $y^{rep}$ and perform the PPC.
The method goes as follows:

Step 1. Assume that the model is true.

Step 2a. Draw a sample from the posterior distribution
$\psi^l \sim P(\psi \mid y)$.

Step 2b. Generate a replicate dataset
$y^{rep,l} \sim P(y^{rep} \mid \psi^l)$.

Step 2c. Repeat Steps 2a and 2b to obtain $L$ draws from the posterior
predictive distribution.

Step 3. Estimate the LC model under $H_0$ on each dataset and calculate the
statistic $T^l_{rep}$.
Drawing the samples in Step 2 had to be split into three parts because it
involves the posterior distribution of the parameters in $\psi$, from which
it is not straightforward to draw samples of the LC model parameters. The
following text discusses how to specify the posterior distribution and how
to obtain samples from it using the Gibbs sampler; note that this applies
to the Gibbs sampler both in the estimation process and in the PPC. The
posterior distribution of $\psi$ can be obtained using Bayes' rule:

$$P(\psi \mid y) = \frac{P(y \mid \psi)\, P(\psi)}{P(y)} \qquad (13)$$

$$\propto P(y \mid \psi)\, P(\psi). \qquad (14)$$
The term P(y) is called the marginal likelihood or normalizing constant.
To draw samples from the posterior we can simply use Equation 14 because
the shape of the distribution is not influenced by multiplying/dividing by a
constant. However, as can be seen, one does need a prior distribution P(ψ)
for the parameters in ψ, which can be used to include prior knowledge (or
lack thereof) about the parameters of interest.
For each set of multinomial parameters (e.g., πjrc, r = 1, . . . , Rj) I have
used a Dirichlet prior distribution. For dichotomous variables (Rj = 2 for all
j), I could equivalently have used Beta distributions (Gelman et al., 2004),
but for the sake of generality, I show the use of the Dirichlet distribution here.
For example, the prior distribution of the conditional response
probabilities of a person in LC $c = 1, \ldots, C$ on item
$j = 1, \ldots, J$ is given by:

$$P(\pi_{jrc},\ r = 1, \ldots, R_j) =
\frac{\Gamma\big(\sum_{q=1}^{R_j} \alpha_{jqc}\big)}
{\prod_{q=1}^{R_j} \Gamma(\alpha_{jqc})}
\prod_{r=1}^{R_j} \pi_{jrc}^{\,\alpha_{jrc}-1} \qquad (15)$$

$$\propto \prod_{r=1}^{R_j} \pi_{jrc}^{\,\alpha_{jrc}-1}. \qquad (16)$$
It is commonplace to ignore the constant, only indicate the parts of the
distribution which involve the parameters (here, $\pi_{jrc}$), and use the
proportionality property. The prior distribution for the class sizes is
given by:

$$P(\pi_c,\ c = 1, \ldots, C) \propto \prod_{c=1}^{C}
\pi_c^{\,\alpha_c - 1}. \qquad (17)$$
The values of the hyperparameters $\alpha_{jrc}$ indicate, in an absolute
sense, the strength of one's prior belief about the probability of giving
response $r$ to item $j$ in class $c$, and the relative sizes of the
hyperparameters indicate the relative probabilities of the responses
(Rubin & Stern, 1994). $\alpha_c$ is used likewise for the class sizes. To
indicate no prior knowledge about the items or LC sizes, I only use vague
(diffuse) priors in the analysis, where
$\sum_c \alpha_c = \sum_r \alpha_{jrc} = 1$ (see Section 2.2).

The prior distribution of the entire set $\psi$ is the product of the
priors on the elements in it:

$$p(\psi) = \prod_{c=1}^{C} \pi_c^{\,\alpha_c - 1}
\prod_{r=1}^{R_1} \pi_{1rc}^{\,\alpha_{1rc}-1} \times \cdots \times
\prod_{r=1}^{R_J} \pi_{Jrc}^{\,\alpha_{Jrc}-1} \qquad (18)$$
and the posterior is then obtained by combining this prior distribution
with the likelihood (Equation 11) of the LC model (Rubin & Stern, 1994):

$$P(\psi \mid y) \propto \prod_{s=1}^{S}
\Big[\sum_{c=1}^{C} \pi_c\, P(Y_s \mid \theta = c)\Big]^{n_s}
P(\psi). \qquad (19)$$
As indicated earlier, this posterior distribution does not have a
convenient form to sample from. But, as it turns out, augmenting the data
with estimates of the unobserved LC memberships can make the model
estimable. As shown in Section 2.2, the Gibbs sampler can be used to
estimate the LC model in an iterative fashion, but it requires that
unobserved indicators for the LC memberships are used to augment the data.
In this way it is possible to obtain conditional distributions of the
parameters given the LC membership (Tanner & Wong, 1987). To illustrate,
let $Z_{sic} = 1$ if the $i$th observation in the $s$th cell of the
contingency table ($i = 1, \ldots, n_s$, $s = 1, \ldots, S$) belongs to LC
$c$, and 0 otherwise. Then the joint distribution is

$$P(\psi, Z, y) \propto \prod_{s=1}^{S} \prod_{i=1}^{n_s} \prod_{c=1}^{C}
\big[\pi_c\, P(Y_s \mid \theta = c)\big]^{Z_{sic}}\; P(\psi). \qquad (20)$$
The distribution of $\psi$ conditional on $Z$ and $y$ is given by a product
of independent Dirichlet distributions with hyperparameters
$\alpha_{jrc} + n_{jrc}$ and $\alpha_c + m_c$. The conditional probability
$P(Z \mid \psi, y)$ is given by the Bernoulli distribution. Using Bayes'
rule, the probability that $Z_{sic} = 1$ is obtained using Equation 4:

$$P(Z_{sic} = 1 \mid \psi, y) =
\frac{P(Y_s \mid \theta = c)\, \pi_c}{P(Y_s)}. \qquad (21)$$

These conditional distributions are easy to sample from (see Section 2.2).
The Gibbs sampler described in this thesis does this iteratively and, at
convergence, the sampled values for $Z$ and $\psi$ are draws from the joint
posterior distribution $P(Z, \psi \mid y)$ (Rubin & Stern, 1994; Tanner &
Wong, 1987). To avoid correlations between the samples, one is advised not
to use subsequent draws but, for instance, to retain only every 50th draw
or so.
To obtain the replicate data $y^{rep,l}$ in Step 2b as a draw from the
predictive distribution $P(y^{rep} \mid \psi^l)$, we just need to draw $N$
observations from a multinomial distribution with the probabilities
$P(Y_s)$ computed from $\psi^l$, as in the sketch below.
C Figures
[Figure 1: two panels. Top: trace plot of the replicated X^2 values over
the 500 iterations (x-axis: Iteration; y-axis: T_rep(X^2)). Bottom:
smoothed density of the replicated X^2 values (x-axis: replicated X^2
values; y-axis: density).]

Figure 1: Example of trace and density plots for the PPC in the empirical
data. The dashed lines indicate $X^2_{obs} = 4.223$, $p_p = .554$.
[Figure 2: three panels (L^2, X^2, BVR) showing smoothed p-value densities
(x-axis: p-value; y-axis: p-value density) for the asymptotic, bootstrap,
PPC and discrepancy p-values, with a uniform reference line.]

Figure 2: P-value log-densities for the 2-class model with N = 1000.
[Figure 3: same layout as Figure 2.]

Figure 3: P-value log-densities for the 2-class model with N = 100.
Advanced Methods of Statistical Analysis used in Animal Breeding.Advanced Methods of Statistical Analysis used in Animal Breeding.
Advanced Methods of Statistical Analysis used in Animal Breeding.
 

Viewers also liked

Propuesta De Seo - Mktdig
Propuesta De Seo -  MktdigPropuesta De Seo -  Mktdig
Propuesta De Seo - Mktdigdupyval
 
Escuela Profesional De Administracion
Escuela Profesional De AdministracionEscuela Profesional De Administracion
Escuela Profesional De Administraciongueste59d4873
 
fotografos mas sobresalientes
fotografos mas sobresalientesfotografos mas sobresalientes
fotografos mas sobresalientesstefany nataly
 
metodod de estudio
metodod de estudiometodod de estudio
metodod de estudiojose pardo
 
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...Dani Ortega
 
El Mundo
El MundoEl Mundo
El Mundounam
 
Taller casos espirometries
Taller casos espirometriesTaller casos espirometries
Taller casos espirometriesjiasab
 
Mantenimiento De Un Monitor
Mantenimiento De Un MonitorMantenimiento De Un Monitor
Mantenimiento De Un MonitorJPBR
 
Actividad 3
Actividad 3Actividad 3
Actividad 3grisel
 

Viewers also liked (20)

Propuesta De Seo - Mktdig
Propuesta De Seo -  MktdigPropuesta De Seo -  Mktdig
Propuesta De Seo - Mktdig
 
Noticiero
NoticieroNoticiero
Noticiero
 
Quest en Español - Primavera 2010 - Semana1
Quest en Español - Primavera 2010 - Semana1Quest en Español - Primavera 2010 - Semana1
Quest en Español - Primavera 2010 - Semana1
 
Escuela Profesional De Administracion
Escuela Profesional De AdministracionEscuela Profesional De Administracion
Escuela Profesional De Administracion
 
fotografos mas sobresalientes
fotografos mas sobresalientesfotografos mas sobresalientes
fotografos mas sobresalientes
 
metodod de estudio
metodod de estudiometodod de estudio
metodod de estudio
 
Examen Silvia
Examen SilviaExamen Silvia
Examen Silvia
 
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...
¿Cómo crear una pestaña que permita a los fans de la marca acceder a nuestra ...
 
El Mundo
El MundoEl Mundo
El Mundo
 
La vega central
La vega centralLa vega central
La vega central
 
Nueva Universida Color De La Esperanza
Nueva Universida Color De La EsperanzaNueva Universida Color De La Esperanza
Nueva Universida Color De La Esperanza
 
Taller casos espirometries
Taller casos espirometriesTaller casos espirometries
Taller casos espirometries
 
Promedu Manejo De Imágenes 1 Gimp
Promedu Manejo De Imágenes 1 GimpPromedu Manejo De Imágenes 1 Gimp
Promedu Manejo De Imágenes 1 Gimp
 
O bárbaro
O bárbaroO bárbaro
O bárbaro
 
El comienzo
El comienzoEl comienzo
El comienzo
 
Esquiada Febrer 2011
Esquiada Febrer 2011Esquiada Febrer 2011
Esquiada Febrer 2011
 
Mantenimiento De Un Monitor
Mantenimiento De Un MonitorMantenimiento De Un Monitor
Mantenimiento De Un Monitor
 
Paolita 22
Paolita 22Paolita 22
Paolita 22
 
Actividad 3
Actividad 3Actividad 3
Actividad 3
 
Proyecto TIC
Proyecto  TICProyecto  TIC
Proyecto TIC
 

Similar to Geert van Kollenburg-masterthesis

chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptShayanChowdary
 
1-s2.0-S0047259X16300689-main (1).pdf
1-s2.0-S0047259X16300689-main (1).pdf1-s2.0-S0047259X16300689-main (1).pdf
1-s2.0-S0047259X16300689-main (1).pdfshampy kamboj
 
Multinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfMultinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfAlemAyahu
 
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali TirmiziFinancial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali TirmiziDr. Muhammad Ali Tirmizi., Ph.D.
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1Gautam Kumar
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionDario Panada
 
journal in research
journal in research journal in research
journal in research rikaseorika
 
research journal
research journalresearch journal
research journalrikaseorika
 
published in the journal
published in the journalpublished in the journal
published in the journalrikaseorika
 
ProjectWriteupforClass (3)
ProjectWriteupforClass (3)ProjectWriteupforClass (3)
ProjectWriteupforClass (3)Jeff Lail
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateDaniel Koh
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFDaniel Koh
 
Probabilistic Error Bounds for Reduced Order Modeling M&C2015
Probabilistic Error Bounds for Reduced Order Modeling M&C2015Probabilistic Error Bounds for Reduced Order Modeling M&C2015
Probabilistic Error Bounds for Reduced Order Modeling M&C2015Mohammad
 
ProbErrorBoundROM_MC2015
ProbErrorBoundROM_MC2015ProbErrorBoundROM_MC2015
ProbErrorBoundROM_MC2015Mohammad Abdo
 
Prob and statistics models for outlier detection
Prob and statistics models for outlier detectionProb and statistics models for outlier detection
Prob and statistics models for outlier detectionTrilochan Panigrahi
 
Recommender system
Recommender systemRecommender system
Recommender systemBhumi Patel
 

Similar to Geert van Kollenburg-masterthesis (20)

chap4_Parametric_Methods.ppt
chap4_Parametric_Methods.pptchap4_Parametric_Methods.ppt
chap4_Parametric_Methods.ppt
 
1607.01152.pdf
1607.01152.pdf1607.01152.pdf
1607.01152.pdf
 
1-s2.0-S0047259X16300689-main (1).pdf
1-s2.0-S0047259X16300689-main (1).pdf1-s2.0-S0047259X16300689-main (1).pdf
1-s2.0-S0047259X16300689-main (1).pdf
 
Multinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdfMultinomial Logistic Regression.pdf
Multinomial Logistic Regression.pdf
 
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali TirmiziFinancial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali Tirmizi
Financial Risk Mgt - Lec 11 by Dr. Syed Muhammad Ali Tirmizi
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
 
Parameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point DetectionParameter Optimisation for Automated Feature Point Detection
Parameter Optimisation for Automated Feature Point Detection
 
CH3.pdf
CH3.pdfCH3.pdf
CH3.pdf
 
journals public
journals publicjournals public
journals public
 
journal in research
journal in research journal in research
journal in research
 
research journal
research journalresearch journal
research journal
 
published in the journal
published in the journalpublished in the journal
published in the journal
 
ProjectWriteupforClass (3)
ProjectWriteupforClass (3)ProjectWriteupforClass (3)
ProjectWriteupforClass (3)
 
Assessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generateAssessing relative importance using rsp scoring to generate
Assessing relative importance using rsp scoring to generate
 
Assessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIFAssessing Relative Importance using RSP Scoring to Generate VIF
Assessing Relative Importance using RSP Scoring to Generate VIF
 
3 es timation-of_parameters[1]
3 es timation-of_parameters[1]3 es timation-of_parameters[1]
3 es timation-of_parameters[1]
 
Probabilistic Error Bounds for Reduced Order Modeling M&C2015
Probabilistic Error Bounds for Reduced Order Modeling M&C2015Probabilistic Error Bounds for Reduced Order Modeling M&C2015
Probabilistic Error Bounds for Reduced Order Modeling M&C2015
 
ProbErrorBoundROM_MC2015
ProbErrorBoundROM_MC2015ProbErrorBoundROM_MC2015
ProbErrorBoundROM_MC2015
 
Prob and statistics models for outlier detection
Prob and statistics models for outlier detectionProb and statistics models for outlier detection
Prob and statistics models for outlier detection
 
Recommender system
Recommender systemRecommender system
Recommender system
 

Geert van Kollenburg-masterthesis

asymptotic distributions no longer hold and the associated p-values become untrustworthy (Maydeu-Olivares & Joe, 2006; Reiser & Lin, 1999; Vermunt, 2010). In the case of unknown, untrustworthy or incorrect distributions it is necessary to calculate empirical reference distributions. According to Formann (2003) this holds for overall goodness-of-fit tests, residuals and other statistics.

To determine empirical reference distributions, resampling techniques, like the parametric bootstrap by Collins et al. (1993; in Formann, 2003), have been proposed to solve the problem of untrustworthy asymptotic p-values and unknown distributions. If one assumes that the data contain information about the true values of the parameters of interest, it is possible to create a reference distribution to determine how likely an observation is given the estimated parameters. The parametric bootstrap, for instance, is implemented in the software package LatentGold (Vermunt & Magidson, 2005) and uses Monte Carlo simulations to approximate the empirical distribution of the goodness-of-fit statistics based on the maximum likelihood (ML) estimates obtained from the data.
Instead of relying on the ML estimates, several authors have proposed Bayesian methods to assess model fit in LC analysis (Berkhof, Van Mechelen & Gelman, 2003; Garrett & Zeger, 2000; Hoijtink, 1998). The Bayesian method for obtaining a p-value is the Posterior Predictive Check (PPC), which can be used in complex models where analytic solutions are tedious to obtain. This method uses random draws for the unknown parameters from the posterior predictive distribution to determine how likely an observed statistic is (Gelman, Carlin, Stern & Rubin, 2004).

The purpose of this thesis is to investigate the PPC as an alternative to asymptotic and bootstrap p-values in assessing the model fit of LC models. A comparison is also made between all methods to check whether they produce comparable results in large samples and whether the resampling techniques are more adequate than the asymptotic p-value in small samples. To investigate this, I use a number of commonly used fit statistics, and the long-run behavior of the resulting p-values from the different methods is compared in a Monte Carlo simulation study. This leads to a direct comparison of the asymptotic, bootstrap and PPC p-values under different conditions such as sample size. Importantly, it is assessed whether the different p-values are uniformly distributed under the null hypothesis,
and whether nominal Type-I error levels are correct for the given statistics. I do not intend to discuss the use of cut-off scores in significance testing, but rather apply the commonly used levels as a reference for the behavior of the statistics under the different methods.

The outline of this thesis is as follows. Section 2 describes the LC model, its estimation, and the fit statistics used in the study. Section 3 provides an overview of the methods used for obtaining p-values. Section 4 describes the simulation studies and gives the results. In Section 5 an empirical dataset is analyzed to illustrate the techniques that result in p-values. Finally, in Section 6 I discuss the findings and issues in need of further research.

2 Latent Class Analysis

2.1 Defining the LC model

In the multivariate setting, let an N × J matrix Y contain the responses of N units (i.e. individuals) on J discrete variables with R_j, j = 1, . . . , J, categories. Let Y_i = (Y_{i1}, . . . , Y_{iJ}) be row i, i = 1, . . . , N, of Y, containing the responses to the J variables. In total there are S = \prod_{j=1}^{J} R_j possible response patterns for Y_i. Therefore, let Y_s, s = 1, . . . , S, denote a specific pattern, and let n_s denote the observed count of that pattern. Finally, let y (without subscripts) denote an observed dataset.

The LC model assumes that the N = \sum_{s=1}^{S} n_s units can be partitioned into
C latent classes, each of which has its own probability density for the responses. A unit's unobservable class membership is represented by the latent variable θ and a particular class is denoted by c, with c = 1, . . . , C. The idea is then to find a LC model with the lowest number of classes for which the responses conditional on class membership are independent. This assumption is called local independence and lies at the basis of LC analysis.

In a LC model P(Y_s), the probability of observing pattern Y_s, is assumed to be a weighted average of the class-specific probabilities, with weights π_c being the probability that an individual belongs to LC c (Vermunt, 2010). So for each of the S patterns, the probability density is given by

    P(Y_s) = \sum_{c=1}^{C} π_c P(Y_s | θ = c).    (1)

Assuming local independence,

    P(Y_s | θ = c) = \prod_{j=1}^{J} P(Y_{sj} | θ = c).    (2)

Using the notation of Vermunt (2010) to indicate the conditional item response probability of a person in class c giving response r to item j as π_{jrc}, the conditional probability P(Y_{sj} | θ = c) is then a multinomial probability density given by

    P(Y_{sj} | θ = c) = \prod_{r=1}^{R_j} π_{jrc}^{y*_{sjr}},    (3)

where y*_{sjr} is 1 if Y_{sj} = r and 0 otherwise.
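To make Equations 1-3 concrete, the following R sketch computes the probability of one response pattern. It is an illustration added for this text, not code from the original study; the parameter layout (a class-size vector pi_c and an item × response × class array pi_jrc) is an assumption of the sketch.

    # Probability of one response pattern under a C-class LC model
    pattern_prob <- function(y, pi_c, pi_jrc) {
      C <- length(pi_c)
      total <- 0
      for (cl in seq_len(C)) {
        cond <- 1
        for (j in seq_along(y)) {
          cond <- cond * pi_jrc[j, y[j], cl]  # Equations 2-3: local independence
        }
        total <- total + pi_c[cl] * cond      # Equation 1: mixture over classes
      }
      total
    }

    # Hypothetical example: two equally sized classes, three dichotomous items
    pi_c   <- c(0.5, 0.5)
    pi_jrc <- array(NA, dim = c(3, 2, 2))              # item x response x class
    pi_jrc[, , 1] <- cbind(rep(0.8, 3), rep(0.2, 3))   # class 1: P(r = 1) = .8
    pi_jrc[, , 2] <- cbind(rep(0.2, 3), rep(0.8, 3))   # class 2: P(r = 1) = .2
    pattern_prob(c(1, 1, 2), pi_c, pi_jrc)             # 0.5*.128 + 0.5*.032 = 0.08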
Lastly, the probability that a person belongs to LC c, conditional on having response Y_s, called the posterior membership probability (Vermunt, 2010), is obtained using Bayes' rule:

    π_{c|s} = P(Y_s | θ = c) π_c / P(Y_s).    (4)

2.2 Estimating the LC Model

To obtain ML estimates for the LC model, typically the Expectation-Maximization (EM) algorithm (Goodman, 1974) is used. The EM algorithm finds the ML estimates by maximizing the log-likelihood function

    log L = \sum_{s=1}^{S} n_s log P(Y_s).    (5)

Because only non-zero response frequencies contribute to the likelihood, the convention 0 log(0) = 0 is used throughout this thesis. The details of the EM algorithm and of this convention are discussed in Appendix A.

Using the EM algorithm to obtain the ML estimates requires starting values for the parameters in ψ = (π_{jrc}, π_c), denoted as π^{(0)}_{jrc} and π^{(0)}_c. Caution is advised: when the starting values are too similar, the model can become unidentifiable. To solve this, it should be possible to order the LCs by π^{(0)}_c or, for instance, π^{(0)}_{1rc} (Hoijtink, 1998). For further discussion on the identifiability of LC models, including item/class ratios, see Goodman (1974). The EM algorithm goes as follows:
Step 0: Choose initial values for ψ^{(0)} and set t = 1.

Step 1 (Expectation): Given ψ^{(t−1)}, calculate π_{c|s} (see Equation 4). Then multiply this by n_s to obtain n^{(t)}_{sc}, the estimated number of respondents in each class having pattern s.

Step 2 (Maximization): Calculate π^{(t)}_c = m_c / N = \sum_{s=1}^{S} n^{(t)}_{sc} / N and π^{(t)}_{jrc} = \sum_{s=1}^{S} n^{(t)}_{sc} y*_{sjr} / m_c, where y*_{sjr} is 1 if Y_{sj} = r and 0 otherwise.

Step 3: Set t = t + 1 and repeat Steps 1 and 2 until the increase in the log-likelihood between two iterations is smaller than a given convergence criterion (e.g., 10^{-8}).
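The steps above translate directly into code. The sketch below is a minimal EM implementation for dichotomous items written for this text (it is not the thesis code, and names such as em_lc are invented here); 'patterns' is an S × J 0/1 matrix of observed response patterns and 'ns' the vector of pattern counts.

    # Minimal EM for a C-class LC model on dichotomous items (a sketch)
    em_lc <- function(patterns, ns, C, tol = 1e-8, max_iter = 5000) {
      S <- nrow(patterns); J <- ncol(patterns); N <- sum(ns)
      pi_c <- rep(1 / C, C) + runif(C, 0, 0.01)  # jitter so classes can be ordered
      pi_c <- pi_c / sum(pi_c)
      pi_j1c <- matrix(runif(J * C, 0.2, 0.8), J, C)  # P(item j = 1 | class c)
      loglik_old <- -Inf
      for (t in seq_len(max_iter)) {
        # E-step: posterior class membership per pattern (Equation 4)
        dens <- sapply(seq_len(C), function(cl)
          pi_c[cl] * apply(patterns, 1, function(y)
            prod(ifelse(y == 1, pi_j1c[, cl], 1 - pi_j1c[, cl]))))
        Py <- rowSums(dens)                       # P(Y_s), Equation 1
        post <- dens / Py                         # pi_{c|s}
        # M-step: expected counts, then updated parameters
        nsc <- post * ns                          # expected pattern-by-class counts
        mc <- colSums(nsc)                        # expected class sizes
        pi_c <- mc / N
        pi_j1c <- t((t(nsc) %*% patterns) / mc)   # fraction of '1' responses per class
        loglik <- sum(ns * log(Py))               # Equation 5
        if (loglik - loglik_old < tol) break      # Step 3 stopping rule
        loglik_old <- loglik
      }
      list(pi_c = pi_c, pi_j1c = pi_j1c, loglik = loglik, Py = Py)
    }

In practice one would run this from multiple starting sets and keep the solution with the highest log-likelihood, as discussed in Appendix A.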
Estimation of the model can also be done in a Bayesian context using a Gibbs sampler (e.g., Hoijtink, 1998). The Gibbs sampler is similar to the EM procedure, but relies on sampling distributions at each step (Ligtvoet & Vermunt, 2011) and results in an estimated (posterior) distribution of the parameters rather than stationary estimates for ψ. The Gibbs sampler proceeds as follows:

Step 0: Choose initial values for ψ^{(0)} and set d = 1.

Step 1 (Data augmentation): Given ψ^{(d−1)}, calculate π_{c|s} (see Equation 4). Then, every subject with a particular pattern is assigned to a LC by drawing from a multinomial distribution with probabilities π_{c|s}. This results in both the class sizes m^{(d)}_c and n^{(d)}_{jrc}, the number of respondents from class c with response r to item j.

Step 2: Draw a sample from the posteriors

    π^{(d)}_c ~ Dir(m^{(d)}_1 + α_c, . . . , m^{(d)}_C + α_c)

and

    (π^{(d)}_{j1c}, . . . , π^{(d)}_{jR_jc}) ~ Dir(n^{(d)}_{j1c} + α_{jrc}, . . . , n^{(d)}_{jR_jc} + α_{jrc}),

where α_c = 1/C and α_{jrc} = 1/R_j (see Appendix B).

Step 3: Set d = d + 1 and repeat Steps 1 and 2 until convergence (Section 3.3 describes a method for assessing the convergence of the sampler; for more Bayesian convergence criteria see e.g. Brooks and Gelman, 1998).

After convergence, repeat Steps 1 and 2 another L times and keep the sampled values to estimate the posterior distribution of the parameters.
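A single iteration of this sampler can be sketched in R as follows (again an illustration written for this text, not the thesis code). Base R has no Dirichlet sampler, so a small helper based on gamma draws is used; the data layout matches the em_lc sketch above.

    # Dirichlet draw via normalized gamma variates
    rdirichlet1 <- function(alpha) { g <- rgamma(length(alpha), alpha); g / sum(g) }

    # One Gibbs iteration for the LC model with dichotomous items (sketch)
    gibbs_step <- function(patterns, ns, pi_c, pi_j1c, alpha_c, alpha_jrc) {
      S <- nrow(patterns); J <- ncol(patterns); C <- length(pi_c)
      # Step 1: data augmentation - assign units to classes given pi_{c|s}
      dens <- sapply(seq_len(C), function(cl)
        pi_c[cl] * apply(patterns, 1, function(y)
          prod(ifelse(y == 1, pi_j1c[, cl], 1 - pi_j1c[, cl]))))
      post <- dens / rowSums(dens)
      m_c <- numeric(C); n_j1c <- matrix(0, J, C)
      for (s in seq_len(S)) {
        counts <- as.vector(rmultinom(1, ns[s], post[s, ]))
        m_c <- m_c + counts
        n_j1c <- n_j1c + outer(patterns[s, ], counts)  # '1' responses per item/class
      }
      # Step 2: draw new parameters from their Dirichlet posteriors
      pi_c_new <- rdirichlet1(m_c + alpha_c)
      pi_j1c_new <- matrix(0, J, C)
      for (cl in seq_len(C)) for (j in seq_len(J))
        pi_j1c_new[j, cl] <- rdirichlet1(c(n_j1c[j, cl], m_c[cl] - n_j1c[j, cl]) + alpha_jrc)[1]
      list(pi_c = pi_c_new, pi_j1c = pi_j1c_new)
    }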
In the simulation study (see Section 4), I use the population parameter values as starting points and a burn-in of 100 iterations before I start sampling. This way the method is likely to start close to the parameter values and the posterior is properly estimated. When the population values were not useful (e.g., when the analysis only had 1 LC), I used the ML estimates obtained from the EM algorithm as starting values.

2.3 Model-fit test statistics

Three test statistics are used to assess model fit. These fit statistics are indicators of the local dependencies given class membership. Let e_s = P(Y_s)N denote the expected pattern frequencies under the fitted LC model given the (estimated) values of ψ (from which P(Y_s) is calculated). The likelihood ratio statistic L^2 and the overall Pearson chi-squared test statistic X^2 are then:

    L^2 = 2 \sum_{s=1}^{S} n_s ln(n_s / e_s),    X^2 = \sum_{s=1}^{S} (n_s − e_s)^2 / e_s.

Third, the bivariate residual (BVR) is used, which measures remaining local dependencies between two items. The BVRs are X^2 values computed for pairs of variables (Vermunt & Magidson, 2005). So for items j and j′,

    BVR_{jj′} = \sum_{r=1}^{R_j} \sum_{r′=1}^{R_{j′}} (n_{rr′} − e_{rr′})^2 / e_{rr′}.
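In code these statistics are only a few lines. The sketch below was added for this text, with the 0 log(0) = 0 convention from Section 2.2 built in; note that the X^2 sum runs over all S cells of the table, so e_s must also be supplied for unobserved patterns.

    # Overall fit statistics from observed (ns) and expected (es) frequencies
    fit_stats <- function(ns, es) {
      keep <- ns > 0                          # 0 * log(0) = 0 convention
      L2 <- 2 * sum(ns[keep] * log(ns[keep] / es[keep]))
      X2 <- sum((ns - es)^2 / es)
      c(L2 = L2, X2 = X2)
    }

    # BVR for dichotomous items j and jp: Pearson X2 on their observed vs.
    # expected bivariate table, aggregated from the pattern-level frequencies
    bvr <- function(patterns, ns, es, j, jp) {
      obs <- expd <- matrix(0, 2, 2)
      for (s in seq_along(ns)) {
        r <- patterns[s, j] + 1; rp <- patterns[s, jp] + 1
        obs[r, rp] <- obs[r, rp] + ns[s]
        expd[r, rp] <- expd[r, rp] + es[s]
      }
      sum((obs - expd)^2 / expd)
    }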
To investigate the BVR statistics based on a number of random samples, I assume that all BVRs behave the same and will therefore only analyze the BVR of items 1 and 2.

The L^2, X^2 and BVR all have a less-is-better form and can be seen as indicators of badness of fit. In the next section I describe how these statistics can be used to perform significance tests for goodness of fit. The significance tests are based on p-values, which indicate how likely the value of an observed statistic is, given certain assumptions about the population parameters and/or the data. The methods differ from each other in the assumptions about the population parameters and in the estimation process. First I describe how to obtain a p-value using an asymptotic reference distribution, then by means of the parametric bootstrap, and finally by means of two PPCs.

3 Estimating p-values

3.1 Asymptotic reference distribution

In the frequentist framework, the p-value is the theoretical probability of finding a test statistic that is more extreme than the one actually observed, under the null hypothesis H_0 (Hogg & Tanis, 2010). In testing a LC model with C classes, we base the p-value on the assumption that this model is true. The p-value associated with an observed test statistic T_obs is the probability that a value for T is at least as extreme as T_obs, given that the C-class model is true.
In testing model fit I am only interested in the probability of worse fit. This is indicated by larger values of T, so the asymptotic p-value can be defined as

    p_a = Pr(T ≥ T_obs | H_0),    (6)

where the conditioning upon H_0 means that the posited model is assumed to be true or that ψ = ψ_0, the values postulated in H_0 (Gelman et al., 2004; Meng, 1994). To obtain this p-value one calculates the area beyond the value of T_obs in a reference distribution with a specified number of degrees of freedom (df). In an unrestricted LC model the L^2 and X^2 statistics under H_0 are assumed to asymptotically follow a chi-squared distribution (χ^2_df) with df given by

    df = \prod_{j=1}^{J} R_j − C [1 + \sum_{j=1}^{J} (R_j − 1)].    (7)

As noted before, the BVR does not have a direct reference distribution since it is an approximation of the score test, which follows a chi-squared distribution. In the coming simulation only binary variables are used, and the BVR will then approximate the score test for a 2 × 2 contingency table. Because the score test is known to asymptotically follow a chi-squared distribution with (R_j − 1) × (R_{j′} − 1) = 1 df in this case, I will assume that the BVR can be approximated by the same asymptotic distribution and check the validity of this assumption.
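As a small worked illustration of Equations 6 and 7 (with a hypothetical observed value, not a result from the thesis): for the simulation design of Section 4 with J = 6 dichotomous items and C = 2 classes, df = 2^6 − 2(1 + 6) = 50, and the upper-tail area of the chi-squared distribution gives p_a.

    # Asymptotic p-value from the chi-squared reference (Equations 6-7)
    J <- 6; Rj <- rep(2, J); C <- 2
    df <- prod(Rj) - C * (1 + sum(Rj - 1))           # = 50
    L2_obs <- 55.3                                   # hypothetical observed value
    p_a <- pchisq(L2_obs, df = df, lower.tail = FALSE)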
  • 13. & Smith, 2001 for a clear evaluation), also statistical problems arise with the use of (asymptotic) reference distributions. One problem with the asymptotic p-value is that if it is unknown what distribution a statistic follows, the use of an incorrect reference distribution can result in inaccurate p-values. Another problem is that, by definition, an asymptotic p-value is not exact because sample sizes are always finite. And although results might be trustworthy in very large samples, even moderate sample sizes can lead to inaccurate results. When the number of items in the data becomes large, the observed pat- tern frequencies in the contingency tables quickly become very sparse and one needs very large sample sizes to control for this. In sparse tables statis- tics like the L2 cannot be approximated well. And even though pa can still be calculated, its values can no longer be trusted (Magidson & Vermunt, 2004; Maydue-Olivares & Joe, 2006; Reiser & Lin, 1999; Vermunt, 2010). Other methods have to be used in order to get more reliable and accurate p-values in situations where these issues occur. Because of these and other problems associated with pa-values, other methods have been proposed to obtain p-values, which do not rely on asymp- totic theory, but are based on resampling techniques. These techniques gen- erate a large number of random replicate samples from a set of (estimated) population parameter values. For each of these datasets yrep it is possible to calculate the statistics of interest and determine the probability that a statistic Trepis larger than the one observed. This is done by estimating the proportion of Trepthat were more extreme than Tobs, given the estimation of 13
3.2 Parametric Bootstrap Method

The parametric bootstrap can be used to estimate the distribution of statistics for which the distribution is unknown, either due to the limited sample size or to inapproximability. If we use the ML estimates from the observed data as population values, it is possible to estimate the probability that T_rep ≥ T_obs, given that the estimates are true (Langeheine, Pannekoek & Van de Pol, 1996). The bootstrap p-value is then given by:

    p_b = Pr(T_rep ≥ T_obs | ψ̂, H_0).    (8)

The bootstrap method proceeds as follows:

Step 1. Assume that the model (H_0) is true.

Step 2. Treat the ML estimates from the observed data under H_0 as population parameters.

Step 3. Draw B random replicate samples y^{rep,b}, b = 1, . . . , B, of size N based on these population parameter estimates.

Step 4. Estimate the LC model for each dataset using the EM algorithm and calculate T^b_rep from the ML estimates ψ̂^b.
The proportion

    B^{-1} \sum_{b=1}^{B} I(T^b_rep ≥ T_obs),

where the indicator function I equals 1 if T^b_rep ≥ T_obs and 0 otherwise, is taken to be the estimate of p_b. In words, p_b is (estimated by) the proportion of samples in which the value T^b_rep is greater than or equal to T_obs. A code sketch of this procedure is given below.
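The following R sketch combines Steps 1-4 with the em_lc and fit_stats sketches given earlier (all names are inventions of this text, not the thesis code); sample_lc generates a dataset of size N from given parameter values and aggregates it into pattern counts.

    # Generate N responses from a dichotomous LC model and tabulate patterns
    sample_lc <- function(N, pi_c, pi_j1c) {
      J <- nrow(pi_j1c)
      cls <- sample(length(pi_c), N, replace = TRUE, prob = pi_c)
      y <- matrix(0L, N, J)
      for (i in seq_len(N)) y[i, ] <- rbinom(J, 1, pi_j1c[, cls[i]])
      agg <- aggregate(rep(1, N), by = as.data.frame(y), FUN = sum)
      list(patterns = as.matrix(agg[, 1:J]), ns = agg$x)
    }

    # Parametric bootstrap p-value for the L2 statistic (Equation 8)
    bootstrap_p <- function(patterns, ns, C, B = 100) {
      N <- sum(ns)
      fit <- em_lc(patterns, ns, C)               # Step 2: ML estimates
      T_obs <- fit_stats(ns, fit$Py * N)["L2"]
      T_rep <- numeric(B)
      for (b in seq_len(B)) {                     # Steps 3-4
        rep_data <- sample_lc(N, fit$pi_c, fit$pi_j1c)
        rep_fit <- em_lc(rep_data$patterns, rep_data$ns, C)
        T_rep[b] <- fit_stats(rep_data$ns, rep_fit$Py * N)["L2"]
      }
      mean(T_rep >= T_obs)                        # proportion estimating p_b
    }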
3.3 Posterior Predictive Check

The PPC is the Bayesian counterpart of the classical statistical tests (Meng, 1994). Given that H_0 is true and that the observed data came from the population of interest, the posterior predictive (PP) p-value is given by:

    p_p = Pr(T^l_rep ≥ T_obs | y, H_0).    (9)

In the Bayesian framework one is not particularly interested in the probability that the observed data have come from a population with the parameters posited in the null hypothesis (as in the frequentist framework), but rather in the probability that the parameters have certain values given that the observed data indeed came from that population (Gelman et al., 2004). As a result of this philosophy, the major difference with the bootstrap is that the PPC is based on the posterior distribution P(ψ|y) of the unknown parameters (rather than on a point estimate like ψ̂) and on the predictive distribution P(y^rep|ψ) for the replicated data. In its general form, the probability in Equation 9 is taken over the joint distribution P(ψ, y^rep|y), so that

    p_p = \int \int I(T^l_rep ≥ T_obs) P(y^rep|ψ) P(ψ|y) dy^rep dψ,    (10)

where I equals 1 if T^l_rep ≥ T_obs, for all possible values of T_obs (Gelman et al., 2004). Appendix B shows how the posterior and PP distributions are obtained.

In practice, the PP distribution P(y^rep|ψ) is usually estimated through simulations and the p_p-value is then estimated based on these draws. In principle the PPC is done like this:

Step 1. Assume that the model is true.

Step 2. Draw L samples from the PP distribution to obtain ψ^l and y^{rep,l}, l = 1, . . . , L.

Step 3. Estimate the LC model under H_0 on each dataset y^{rep,l} and calculate the statistic T^l_rep.

So T^l_rep is obtained by estimating the model under H_0 using the EM algorithm. For each replication the ML estimates ψ̂^l are used to calculate T^l_rep, and the proportion

    L^{-1} \sum_{l=1}^{L} I(T^l_rep ≥ T_obs),

where the indicator function I equals 1 if T^l_rep ≥ T_obs and 0 otherwise, is taken to be the estimate of p_p.
In more complex models (like the LC model), however, it may not be possible to obtain the PP distribution in Step 2 analytically. The solution involves splitting up Step 2 and using an iterative sampling procedure:

Step 2a. Draw a sample from the posterior distribution, ψ^l ~ P(ψ|y).

Step 2b. Generate a replicate dataset, y^{rep,l} ~ P(y^rep|ψ^l).

Step 2c. Repeat Steps 2a and 2b to obtain L replicated datasets.

But, as shown in Appendix B, the posterior distribution for the LC model again does not have a convenient form to sample from directly. Fortunately the Gibbs sampler, as discussed in Section 2.2, can be used to obtain the required posterior draws ψ^l (Rubin & Stern, 1994). At convergence, the draws in a Gibbs sampler iteration are actually samples from the posterior P(ψ|y), and as a result the L iterations result in an approximation of the posterior distribution. Performing Step 2b results in draws from the predictive distribution. The joint draws from the posterior distribution and the predictive distribution can together be seen as a single draw from the PP distribution. A code sketch of the whole procedure is given below.
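Combining the earlier sketches (gibbs_step, sample_lc, em_lc and fit_stats, all names invented for this text), the PPC for the L^2 statistic could look as follows; the burn-in of 100 iterations mirrors the choice described in Section 2.2.

    # Posterior predictive p-value for the L2 statistic (Equation 9)
    ppc_p <- function(patterns, ns, C, L = 100, burn = 100) {
      N <- sum(ns)
      fit <- em_lc(patterns, ns, C)
      T_obs <- fit_stats(ns, fit$Py * N)["L2"]
      state <- list(pi_c = fit$pi_c, pi_j1c = fit$pi_j1c)
      for (d in seq_len(burn))                    # burn-in of the Gibbs sampler
        state <- gibbs_step(patterns, ns, state$pi_c, state$pi_j1c, 1 / C, 1 / 2)
      T_rep <- numeric(L)
      for (l in seq_len(L)) {
        state <- gibbs_step(patterns, ns, state$pi_c, state$pi_j1c, 1 / C, 1 / 2)  # Step 2a
        rep_data <- sample_lc(N, state$pi_c, state$pi_j1c)                         # Step 2b
        rep_fit <- em_lc(rep_data$patterns, rep_data$ns, C)                        # Step 3
        T_rep[l] <- fit_stats(rep_data$ns, rep_fit$Py * N)["L2"]
      }
      mean(T_rep >= T_obs)                        # proportion estimating p_p
    }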
Figure 1 in Appendix C is a graphical representation of the PPC. The upper plot is a trace plot and depicts the values of the T_rep = X^2_rep statistic during the L = 500 replications for the empirical example described in Section 5, where N = 94 and C = 2. If the plot shows any long-term trends, this is an indication that successive draws are highly correlated and that the method has not converged. The values should move freely around in the value space, without getting stuck in a local region (King et al., 2011). The bottom plot shows a smoothed density of the replicated values. The horizontal and vertical dashed lines indicate the observed value X^2_obs = 67.993, and the proportion of values beyond that line (.554) is the estimate of p_p.

PPC using discrepancy variables

The formulation of the PP p-value has been extended by Gelman et al. (2004) by using, instead of a statistic T, a discrepancy variable D(ψ) which depends on the data as well as the parameters. For each draw from the posterior, D_obs(ψ^l) is calculated as the discrepancy between ψ̂ and ψ^l, and D_rep(ψ^l) is calculated as the discrepancy between ψ̂^l and ψ^l. The p-value for the discrepancy measure is given by:

    p_d = Pr(D_rep(ψ) ≥ D_obs(ψ) | y, H_0).

Goodness-of-fit measures like the L^2 can be used as discrepancy variables because the predicted pattern frequencies are functions of the parameters in ψ. For instance, the expected frequencies for the L^2 are calculated as e^l_s = P(Y_s|ψ^l)N. The discrepancy p-value is estimated by taking the L sampled draws, computing the predicted pattern frequencies e^l_s directly from ψ^l, and computing D_obs(ψ^l) and D_rep(ψ^l) based on these predicted frequencies. In this method one obtains L 'observed' discrepancies D_obs(ψ^l) and L replicated discrepancies D_rep(ψ^l). The p_d is estimated by

    L^{-1} \sum_{l=1}^{L} I(D_rep(ψ^l) ≥ D_obs(ψ^l)).
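A sketch of this variant, reusing the helpers introduced above (again with names invented for this text): the only change relative to ppc_p is that both discrepancies are evaluated at the sampled parameters ψ^l rather than at refitted ML estimates.

    # Expected pattern frequencies e_s = P(Y_s | psi) * N from given parameters
    expected_freqs <- function(patterns, N, pi_c, pi_j1c) {
      N * apply(patterns, 1, function(y)
        sum(pi_c * sapply(seq_along(pi_c), function(cl)
          prod(ifelse(y == 1, pi_j1c[, cl], 1 - pi_j1c[, cl])))))
    }

    # Discrepancy p-value p_d for the L2 statistic
    discrepancy_p <- function(patterns, ns, C, L = 100, burn = 100) {
      N <- sum(ns)
      fit <- em_lc(patterns, ns, C)
      state <- list(pi_c = fit$pi_c, pi_j1c = fit$pi_j1c)
      for (d in seq_len(burn))
        state <- gibbs_step(patterns, ns, state$pi_c, state$pi_j1c, 1 / C, 1 / 2)
      hits <- 0
      for (l in seq_len(L)) {
        state <- gibbs_step(patterns, ns, state$pi_c, state$pi_j1c, 1 / C, 1 / 2)
        rep_data <- sample_lc(N, state$pi_c, state$pi_j1c)
        D_obs <- fit_stats(ns, expected_freqs(patterns, N, state$pi_c, state$pi_j1c))["L2"]
        D_rep <- fit_stats(rep_data$ns,
                           expected_freqs(rep_data$patterns, N, state$pi_c, state$pi_j1c))["L2"]
        hits <- hits + as.numeric(D_rep >= D_obs)
      }
      hits / L
    }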
The PPC using discrepancy variables was used in LC analysis by Berkhof, Van Mechelen and Gelman (2003) and Meulders et al. (2002), who indicate that this procedure tends to be conservative. Conservativeness, however, is not the only issue with the p_d-value. Hjort, Dahl & Steinbakk (2006) showed that the distribution of p_d under H_0 is far from uniform and indicated that its values need to be adjusted in order to make results interpretable. Hjort et al. investigated the behavior of p_d in a number of models, but not in the LC model. In order to test the appropriateness of the method it is important to investigate the behavior of p_d in the current setting as well, and the method is therefore included in this study.

4 Simulation study

To compare the methods described above, the behavior of the p-values in different situations needs to be assessed. In situations where H_0 is true, the p-values for the fit statistics described in Section 2.3 should be uniformly distributed (Sackrowitz & Samuel-Cahn, 1999). Deviations from uniformity could indicate that the used reference distribution or method is incorrect. The uniformity of the p-values will therefore be used to assess the applicability
of the methods in different situations.

To investigate the behavior of the proposed p-values I generated data for J = 6 dichotomous items (R_j = 2 for all j). The population class sizes and conditional response probabilities used throughout the simulations can be found in Table 1.

Table 1: Population values for the simulation studies

                c = 1   c = 2
    π_c          0.5     0.5
    π_{j1c}      0.8     0.2
    π_{j2c}      0.2     0.8

To test the behavior of the p-values under H_0 in large samples I generated 500 datasets with N = 1000. In large samples the p-values ought to behave approximately equivalently. Since one of the reasons for using resampling techniques is their usage in small samples and sparse tables, I generated the same number of datasets with N = 100. On all datasets a 2-class LC model was fitted using the EM algorithm. At convergence the asymptotic p-values were calculated for the L^2 and X^2 based on the χ^2_50 distribution, and for the BVR_12 using the χ^2_1 distribution. To obtain the p_b-value the bootstrap with B = 100 was performed, and similarly the p_p and p_d were calculated based on L = 100 PP samples. In total, the LC model had to be fitted to 200,000 additional datasets.

To test the behavior of the p-values under a misspecified model and to perform a power test, again 500 datasets with N = 1000 and 500 datasets with N = 100 were generated from a 2-class population, but each of these datasets was analyzed using a 1-class LC model. I then calculated the p_a-values (with df = 57 for the L^2 and X^2) and obtained the p_b, p_p and p_d-values based on B = L = 100.
To check whether the p-values are uniformly distributed under H_0, I performed two numerical checks and a graphical check to substantiate the findings. If a p-value is uniformly distributed, its expected value E(p) = .5 and P(p < .05) = .05 (i.e., in 5% of the cases the p-value is less than .05). I use the conventional significance level of .05 (Fisher, 1954) as the upper limit for rejecting the null hypothesis. If there are considerable deviations from these indicators of uniformity, the used method might be inappropriate or incorrectly specified. The graphical checks are shown as the distributions of the p-values, smoothed using splines to approximate the log-densities (see Stone et al., 1997). These graphical checks can be used directly to see deviations from uniformity anywhere in the distribution. Please note that sharp increases in density at the very boundaries (at approximately < .02 and > .98) are due to the estimation procedure rather than implying practical misbehavior of the p-value.
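The two numerical checks are straightforward to compute; a minimal sketch (with simulated stand-in p-values, not the thesis results):

    # Uniformity checks: under H0, E(p) should be near .5 and Pr(p < .05) near .05
    uniformity_check <- function(p) c(E_p = mean(p), prop_below_05 = mean(p < .05))

    p_sim <- runif(500)          # stand-in for 500 simulated p-values
    uniformity_check(p_sim)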
Results

Figure 2 in Appendix C and Table 2 provide the results for the p-values under H_0 for a sample size of N = 1000. Figure 3 in Appendix C and Table 3 provide the results under H_0 for N = 100. The densities of the p_a-values are depicted as solid lines, the p_b-values as dashed lines, the p_p-values as dash-dotted lines and the p_d-values as dotted lines. A reference line indicating a truly uniform distribution is also included. The tables can be used as a summary of the figures and include the two checks for uniformity: the expected p-value E(p) and P(p < .05) for the different goodness-of-fit statistics. Not only can these proportions be used as an indication of systematic deviations from uniformity, but they may also be helpful if only Type-I error rates (false rejections of the null hypothesis) are the issue of concern.

The results show that with a sample size of N = 1000, under H_0, the chi-squared reference distribution used for the p_a-values is not an exact reference for the L^2 statistic. Using the χ^2_50 distribution resulted in too liberal results, since the Type-I error rate was .094 (almost twice as high as expected under H_0). Also, the expected value is much lower than .5. From Figure 2 it is clear that the density becomes larger as p_a comes closer to 0, indicating too many small p-values. Although this may be due to sampling fluctuations given the limited number of simulations, it is worth mentioning that within the same analyses the p_a-value for the X^2 statistic shows this behavior much less. To illustrate, there were 81 analyses where the p_a-value for the L^2 was less than .10 (where there should be only 50). In those analyses the X^2 had p-values less than .10 in only 57 cases. Inspection of the p_a-values for the BVR_12 clearly indicates that the BVR does not follow a χ^2_1 distribution: the density of the p-values increases in a linear fashion as the values of p_a increase.

Conversely, from Table 2 it can be seen that in the large-sample case the p_b and p_p-values only seem to provide somewhat too liberal results, having slightly too many values smaller than .05. Other than that, these p-values show very good approximations to the uniform distribution.
Table 2: Uniformity measures of p-values (N = 1000, 500 MC simulations, 100 bootstrap/PPC replications)

                         E(p)                       Pr(p < .05)
               p_a    p_b    p_p    p_d      p_a    p_b    p_p    p_d
    L^2       .4388  .4945  .4946  .8449    .094   .062   .064   .002
    X^2       .4917  .4918  .4923  .8496    .060   .068   .064   .000
    BVR_12    .6706  .5065  .5072  .7667    .000   .046   .046   .000

In the current setting, with a large sample size, p_b and p_p clearly outperform the asymptotic p-value for both the L^2 and the BVR, but this is perhaps more likely due to the specification of the asymptotic reference distribution than to the quality of the methods in the large-sample case, since the methods behave very similarly for the X^2 statistic.

As expected, the most 'problematic' results came from the PPC using discrepancy variables, which is very clearly not adequate for testing model fit using any of the goodness-of-fit statistics. In line with the findings of Hjort et al. (2006), the p_d is distributed far from uniformly in the LC goodness-of-fit setting. Figure 2 shows that for the L^2 and X^2 the density increases as p_d gets larger and peaks at 1. For the BVR statistic it peaks at around .78, with a range of [0.54, 0.93]. In only 1 dataset (the value .002 in Table 2) was a p_d-value found that was less than .05.

From Table 3 it can be seen that in sparser datasets the expected values of p_b and p_p are somewhat higher than that of p_a for the L^2 statistic (perhaps still due to the asymptotic reference distribution), about equal for the X^2, and lower for the BVR (although the latter is rather trivial since the reference distribution was clearly inadequate).
Table 3: Uniformity measures of p-values (N = 100, 500 MC simulations, 100 bootstrap/PPC replications)

                         E(p)                       Pr(p < .05)
               p_a    p_b    p_p    p_d      p_a    p_b    p_p    p_d
    L^2       .4019  .4354  .4352  .8854    .016   .040   .034   .000
    X^2       .5224  .5200  .5114  .8535    .028   .024   .018   .000
    BVR_12    .6758  .5088  .5136  .7607    .004   .040   .038   .000

Also, in sparser tables the p_d has much higher values than the other measures, except for the p_a of the BVR (again probably due to the incorrect reference). All methods tend to be conservative in that too few p-values were less than .05, even when the expected values lie below .5. From Figure 3 it can be seen that the distribution of the p_a-value under H_0 with a small sample size is far from uniform for the L^2 statistic. Interestingly, this behavior is mimicked by the p_b and p_p. Although the behavior is similar, the p_b and p_p are distributed more flatly for all statistics, with the bootstrap method resulting in the least peaked distribution.

Finally, in analyzing the 500 datasets of N = 1000 from a 2-class population with a 1-class model, the probability of correctly rejecting the null hypothesis (i.e., the power) was 1 using any of the statistics. That is, all p_a-values were less than 10^{-19} for the BVR, less than 10^{-161} for the L^2 and less than 10^{-291} for the X^2 statistic. All other p-values were always equal to 0. In the 500 smaller samples, all p-values resulted in a power of 1 for the L^2 and X^2.
Although the power for the BVR was 1 in the previous simulation, it is not a very good measure to determine model misfit when analyzed on its own, as it is based only on the two-item relationships. That is to say, if one BVR does not provide a small p-value, this does not indicate that the whole model fits well. This aspect is captured by the p-values in the small-sample case. The expected and maximum p-values, as well as the power (indicated as P(p < .05), the probability of a value less than .05), for all methods are provided in Table 4. Also here, the p_d provides very inadequate results if the values are not post-processed (see Hjort et al., 2006).

Table 4: Power results for the BVR

                   p_a    p_b    p_p    p_d
    E(p)          .001   .010   .009   .146
    P(p < .05)    .964   .944   .952   .284
    max(p)        .565   .60    .55    .84

5 Empirical example

To illustrate the usage of the proposed methods I have analyzed data which were obtained by Galen and Gambino (1975, in Rindskopf, 2002) in a study of 94 patients who suffered chest pains and were admitted to an emergency room. Four indicators of myocardial infarction (MI) were scored either 1 (present) or 0 (not present): the patients' heart-rhythm Q-waves (Q), high low-density blood cholesterol levels (L), creatine phosphokinase levels (C) and their clinical history (H). The response patterns and their observed frequencies can be found in Table 5. Rindskopf indicated that the data are consistent
with a 2-class LC model, with df = 6 and L^2 = 4.29 with p_a = .64.

To obtain the four p-values for the different statistics, I used the χ^2_6 reference distribution for the L^2 and X^2, and set B = L = 500 to obtain the resampling p-values. Because the data are quite sparse, and given the results from the simulation study with N = 100, I expected to find that the p_b and p_p would be higher than p_a for the L^2 statistic, about equal for the X^2, and lower for the BVR (due to the unknown reference distribution for the BVR). I also expected p_d to be much higher than the other p-values, but less so than p_a for the BVR.

Table 5: Response pattern frequencies

    Q L C H   count     Q L C H   count
    0 0 0 0     33      1 0 0 0      0
    0 0 0 1      7      1 0 0 1      0
    0 0 1 0      7      1 0 1 0      2
    0 0 1 1      5      1 0 1 1      3
    0 1 0 0      1      1 1 0 0      0
    0 1 0 1      0      1 1 0 1      0
    0 1 1 0      3      1 1 1 0      4
    0 1 1 1      5      1 1 1 1     24

Table 6 provides the conditional response probabilities and class sizes resulting from fitting the 2-class LC model to the data (which are identical to those reported by Rindskopf, 2002). The first class (likely to have had an MI) had high conditional probabilities for all indicators; the other class had low conditional probabilities.
Table 6: ML parameter estimates of ψ for the MI data using a 2-class model

                MI      no MI
    π_c       0.4578   0.5422
    Q = 0     0.2332   1.0000
    Q = 1     0.7668   0.0000
    L = 0     0.1721   0.9731
    L = 1     0.8279   0.0269
    C = 0     0.0000   0.8045
    C = 1     1.0000   0.1955
    H = 0     0.2086   0.8049
    H = 1     0.7914   0.1951

In Table 7 the estimated p-values from all methods are shown for the 2-class model for the three statistics used. As none of the p-values is small, all p-values indicate that the 2-class model fits the data well. Against expectation, the bootstrap resulted in much smaller p-values than the other methods for the L^2 and X^2. Although no p-value indicated lack of fit, there are large differences in the actual values of the p-values.

Table 7: Results for the empirical example (df = 6, N = 94, B = L = 500)

                           p_a     p_b     p_p     p_d
    L^2 = 4.292611        .637    .358    .606    .874
    X^2 = 4.22263         .647    .306    .554    .892
    BVR_12 = 0.1545949    .694    .230    .182    .652

6 Discussion

In this thesis I compared different p-values in goodness-of-fit testing of LC models. The classical asymptotic p-value was compared to the p-values obtained by means of the parametric bootstrap and PPCs in large and small samples. The methods were discussed and the differences illustrated.
Two problems that occur in using asymptotic p-values were discussed: first, that they cannot be trusted in small samples; and second, that they are not useful when it is unknown what distribution a statistic follows. The results suggested that the χ^2_df may not be a valid reference for the L^2 statistic in LC analysis, since it produced too liberal results in large samples under H_0. The BVR has also been shown to clearly not follow a χ^2_1 distribution. The p_b and p_p showed much better behavior than the asymptotic p-value for both the L^2 and the BVR, although this might have been due to the asymptotic reference distribution used, since the methods were comparable for the X^2, for which p_a also showed good behavior.

Whether the bootstrap or the PPC is the better method for approximating a p-value in the current setting is not clear-cut. The data for N = 100 were not extremely sparse, since the number of patterns with observed frequencies of 0 or 1 was not so large. But especially the L^2 statistic showed very surprising behavior and needs to be investigated further. More research should be done to investigate the distribution of the L^2 and BVR statistics, which can be done by looking at the actual values of the statistics rather than the p-values under the reference distribution.

Additionally, analysis of the empirical example showed that the p-values can differ from each other quite severely within one dataset, even though the expected values did not differ much. To find out more about the difference between the p-values within datasets, a comparison of the p-values within each simulation could provide better insight into the characteristics
of the data responsible for these differences. This may result in a clearer understanding of when each of the methods can be used optimally.

Since the current research has focused on (overall) goodness-of-fit statistics, an option for future research is to do a similar study to investigate the applicability of resampling techniques to issues regarding LC model selection and comparison. For instance, the PPC could provide a p-value for the increase in fit when adding LCs or when including local dependencies. This said, I have only considered rather simple LC models, and future research on this topic should include, for example, models with more LCs, local dependencies, or models which include covariates.

Note on computational time

Because for each dataset B = L = 100 bootstraps and PPCs are performed to estimate p_b, p_p and p_d, a total of 400,000 replicated datasets had to be computed and analyzed using the EM algorithm, which can become rather time consuming. For instance, the analysis for N = 100 with 2 LCs took over 20 hours to complete on a 32-bit, 2.61 GHz, 3.43 GB RAM computer using the software package R (CRAN, 2012). However, the individual analyses themselves do not take very long (a couple of minutes per run).

The assessment of the empirical data using all techniques took only about 3 minutes with 500 bootstrap/PPC replications, indicating the practical usefulness of the methods in obtaining p-values. Of course the empirical dataset was not very large, but researchers should not be inhibited from using these techniques in empirical research. The software
and hardware used (and the efficiency of the programming) can greatly diminish the time needed to analyze a problem and, moreover, even waiting a day to get reliable research results should be considered worthwhile.
References

Bera, A. K. & Bilias, Y. (2001). Rao's score, Neyman's C(α) and Silvey's LM tests: An essay on historical developments and some new results. Journal of Statistical Planning and Inference, 97, 9–44.

Berkhof, J., Van Mechelen, I., & Gelman, A. (2003). A Bayesian approach to the selection and testing of Mixture Models. Statistica Sinica, 13, 423–442.

Brooks, S. P. & Gelman, A. (1998). General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 434–455.

Fisher, R. A. (1925). Statistical methods for research workers (chapter 3). Retrieved May 2, 2012, from http://psychclassics.yorku.ca/Fisher/Methods/

Formann, A. K. (2003). Latent Class Model Diagnosis – a review and some proposals. Computational Statistics & Data Analysis, 41, 548–559.

Galindo-Garre, F., & Vermunt, J. K. (2005). Testing log-linear models with inequality constraints: a comparison of asymptotic, bootstrap, and posterior predictive p values. Statistica Neerlandica, 59, 82–94.

Garrett, S. G., & Zeger, S. L. (2000). Latent Class Model Diagnosis. Biometrics, 56, 1055–1067.

Gelman, A., Carlin, J., Stern, H. & Rubin, D. (2004). Bayesian Data Analysis (2nd ed.). Boca Raton, FL: Chapman & Hall.

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231.

Hjort, N. L., Dahl, F. A. & Steinbakk, G. H. (2006). Post-Processing Posterior Predictive p Values. Journal of the American Statistical Association, 101(475), 1157–1174.

Hogg, R. V. & Tanis, E. A. (2010). Probability and Statistical Inference (8th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Hoijtink, H. (1998). Constrained Latent Class Analysis using the Gibbs Sampler and Posterior Predictive P-values: applications to educational testing. Statistica Sinica, 8, 691–711.

King, M. D., Calamante, F., Clark, C. A. & Gadian, D. G. (2011). Markov Chain Monte Carlo Random Effects Modeling in Magnetic Resonance Image Processing Using the RBugs Interface to WinBUGS. Journal of Statistical Software, 44(2), available online from http://www.jstatsoft.org/v44/i02

Langeheine, R., Pannekoek, J. & Van de Pol, F. (1996). Bootstrapping Goodness-of-Fit Measures in Categorical Data Analysis. Sociological Methods & Research, 24, 492–516.

Ligtvoet, R. & Vermunt, J. K. (2012). Latent class models for testing monotonicity and invariant item ordering for polytomous items. British Journal of Mathematical and Statistical Psychology, 65(2), 237–250.

Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (Ed.), The Sage Handbook of Quantitative Methodology for the Social Sciences (pp. 175–198). Thousand Oaks, CA: Sage Publications, Inc.

Maydeu-Olivares, A. & Joe, H. (2006). Limited Goodness-of-Fit testing in Multidimensional Contingency tables. Psychometrika, 71, 713–732.

Meulders, M., De Boeck, P., Kuppens, P. & Van Mechelen, I. (2002). Constrained Latent Class Analysis of Three-Way Three-Mode Data. Journal of Classification, 19, 277–302.

Nylund, K. L., Asparouhov, T. & Muthén, B. O. (2007). Deciding on the Number of Classes in Latent Class Analysis and Growth Mixture Modeling: A Monte Carlo Simulation Study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4), 535–569.

Reiser, M., & Lin, Y. (1999). Goodness-of-fit test for the latent class model when expected frequencies are small. In M. Sobel and M. Becker (Eds.), Sociological Methodology (pp. 81–111). Boston: Blackwell Publishers.

Rindskopf, D. (2002). The use of latent class analysis in medical diagnosis. Proceedings of the Joint Meetings of the American Statistical Association, 2912–2916.

Rubin, D. B., & Stern, H. S. (1994). Testing in latent class models using a posterior predictive check distribution. In A. von Eye & C. C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 420–438). Thousand Oaks, CA: Sage Publications, Inc.

Sackrowitz, H. & Samuel-Cahn, E. (1999). P Values as Random Variables – Expected P Values. The American Statistician, 53(4), 326–331.

Sterne, J. A. C. & Smith, G. D. (2001). Sifting the evidence – what's wrong with significance tests? BMJ, 322, 226–231.

Stone, C. J., Hansen, M., Kooperberg, C. & Truong, Y. K. (1997). The use of polynomial splines and their tensor products in extended linear modeling (with discussion). Annals of Statistics, 25, 1371–1470.

Tanner, M. A. & Wong, W. H. (1987). The Calculation of Posterior Distributions by Data Augmentation. Journal of the American Statistical Association, 82(398), 528–540.

Vermunt, J. K. (2010). Latent Class Models. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (pp. 238–244). Oxford: Elsevier.

Vermunt, J. K., & Magidson, J. (2005). Technical Guide for Latent GOLD 4.0: Basic and Advanced. Belmont, Massachusetts: Statistical Innovations Inc.
A  The EM algorithm

Because the LC membership is unobservable, the (logarithm of the) likelihood is hard to maximize: the summation inside the logarithm makes it impossible to separate the product terms. It is possible, however, to use a sequential algorithm if we provide starting values for the missing data (i.e., the unobserved class memberships). Combining Equations 1-3 to obtain the likelihood gives

P(Y_s) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{r=1}^{R} \pi_{jrc}^{y^*_{sjr}}    (11)

and taking the logarithm gives the log-likelihood

\log P(Y_s) = \log \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \prod_{r=1}^{R} \pi_{jrc}^{y^*_{sjr}}.    (12)

With class membership unobservable, this expression cannot be maximized analytically. However, if we impute values for the missing class memberships (also called data augmentation, e.g., Ligtvoet & Vermunt, 2012), the expression can be written as

\log P(Y_s) = n_s \sum_{c=1}^{C} \pi_{c|s} \log \left( \pi_c \prod_{j=1}^{J} \prod_{r=1}^{R} \pi_{jrc}^{y^*_{sjr}} \right).

Now, the EM algorithm consists of sequentially updating π_{c|s} (providing π_c)
and π_{jrc} to maximize

\log L = \sum_{s=1}^{S} \log P(Y_s).

The algorithm continues until the change in the log-likelihood between iterations t and t + 1 falls below a given convergence criterion. The values for which this log-likelihood is maximized are the ML estimates.

When using the EM algorithm it can, however, happen that convergence is attained at a local maximum. To guard against this, one commonly runs the algorithm from multiple sets of starting values and takes the values of ψ yielding the highest log-likelihood as the ML estimates.

The 0 log(0) = 0 convention

In order to let only observed patterns contribute to the likelihood, I used the convention that 0 log(0) = 0. This is needed because log(0) is undefined, so multiplying log(0) by 0 does not automatically yield 0. The justification for the convention is as follows. If I define the natural logarithm as

\log(x) = \int_{1}^{x} \frac{1}{t} \, dt

and need a reasonable value for 0 log(0), I should take the limit as x approaches 0 from above. Using l'Hôpital's rule one can show that although log(0) is undefined, the limit of
x log(x) as x approaches zero is

\lim_{x \downarrow 0} x \log(x) = \lim_{x \downarrow 0} \frac{\log(x)}{x^{-1}} = \lim_{x \downarrow 0} \frac{x^{-1}}{-x^{-2}} = \lim_{x \downarrow 0} (-x) = 0.
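To make the updating scheme concrete, the following is a minimal sketch, in Python/NumPy, of the EM iterations for an LC model with dichotomous items using the notation above. The function name lc_em, the random starting values, and the convergence settings are illustrative assumptions rather than the implementation used for the analyses in this thesis; working with per-respondent log-probabilities means only observed patterns enter the likelihood, in line with the 0 log(0) = 0 convention.

```python
import numpy as np

def lc_em(Y, C, max_iter=1000, tol=1e-8, seed=0):
    """EM for a latent class model with dichotomous items (sketch).

    Y: (N, J) array of 0/1 responses; C: number of latent classes.
    Returns the class sizes pi_c, the conditional response
    probabilities P(y_j = 1 | class c), and the final log-likelihood.
    """
    rng = np.random.default_rng(seed)
    N, J = Y.shape
    pi_c = np.full(C, 1.0 / C)              # class sizes pi_c
    p = rng.uniform(0.2, 0.8, size=(C, J))  # P(y_j = 1 | class c)
    loglik_old = -np.inf
    for _ in range(max_iter):
        # E-step: posterior memberships pi_{c|i} for each respondent i:
        # log P(y_i, c) = log pi_c + sum_j [y_ij log p_cj + (1 - y_ij) log(1 - p_cj)]
        log_joint = (np.log(pi_c)
                     + Y @ np.log(p).T
                     + (1 - Y) @ np.log(1 - p).T)   # shape (N, C)
        log_py = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_py[:, None])  # pi_{c|i}
        # M-step: update pi_c and p given the expected memberships
        pi_c = post.mean(axis=0)
        p = (post.T @ Y) / post.sum(axis=0)[:, None]
        p = np.clip(p, 1e-10, 1 - 1e-10)            # keep the logs finite
        loglik = log_py.sum()
        if loglik - loglik_old < tol:               # convergence criterion
            break
        loglik_old = loglik
    return pi_c, p, loglik
```

In line with the remark about local maxima above, one would call lc_em from several random seeds and keep the solution with the highest log-likelihood.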
B  The Gibbs sampler (in LC analysis)

The Gibbs sampler can be used to estimate the LC model, as described in Section 2.2, but also to perform the PPC (see Section 3.3) as a means of testing model fit. The Bayesian model-fit approach compares the goodness-of-fit statistic T_obs to a reference distribution that is obtained by averaging the distribution P(T|ψ) over the posterior P(ψ|y). When the posterior distribution cannot be calculated analytically (or only tediously so), one can use simulation to estimate it. Here I show in detail how to obtain the posterior (predictive) distribution for ψ and y^{rep} and how to perform the PPC. The method goes as follows:

Step 1. Assume that the model is true.
Step 2a. Draw a sample from the posterior distribution, ψ^l ~ P(ψ|y).
Step 2b. Generate a replicate dataset, y^{rep,l} ~ P(y^{rep}|ψ^l).
Step 2c. Repeat Steps 2a and 2b to obtain L draws from the posterior predictive distribution.
Step 3. Estimate the LC model under H_0 on each replicate dataset and calculate the statistic T^l_{rep}.

Drawing the samples in Step 2 had to be split into several parts because it is not straightforward to draw samples of the LC model parameters directly from their posterior distribution. The following text
discusses how to specify the posterior distribution and how to obtain samples from it using the Gibbs sampler. This discussion concerns obtaining (draws from) the posterior distribution P(ψ|y); note that it applies to the Gibbs sampler both in the estimation process and in the PPC.

The posterior distribution of ψ can be obtained using Bayes' rule:

P(\psi|y) = \frac{P(y|\psi) P(\psi)}{P(y)}    (13)
\propto P(y|\psi) P(\psi).    (14)

The term P(y) is called the marginal likelihood or normalizing constant. To draw samples from the posterior we can simply use Equation 14, because the shape of a distribution is not affected by multiplying or dividing by a constant. As can be seen, however, one does need a prior distribution P(ψ) for the parameters in ψ, which can be used to include prior knowledge (or the lack thereof) about the parameters of interest.

For each set of multinomial parameters (e.g., π_{jrc}, r = 1, ..., R_j) I have used a Dirichlet prior distribution. For dichotomous variables (R_j = 2 for all j), I could equivalently have used Beta distributions (Gelman et al., 2004), but for the sake of generality I show the use of the Dirichlet distribution here. For example, the prior distribution of the conditional response probabilities
of a person in LC c = 1, ..., C on item j = 1, ..., J is given by

P(\pi_{j1c}, \dots, \pi_{jR_j c}) = \frac{\Gamma\left(\sum_{q=1}^{R_j} \alpha_{jqc}\right)}{\prod_{q=1}^{R_j} \Gamma(\alpha_{jqc})} \prod_{r=1}^{R_j} \pi_{jrc}^{\alpha_{jrc}-1}    (15)
\propto \prod_{r=1}^{R_j} \pi_{jrc}^{\alpha_{jrc}-1}.    (16)

It is commonplace to ignore the constant, to indicate only those parts of the distribution that involve the parameters (here, π_{jrc}), and to use the proportionality property. The prior distribution for the class sizes is given by

P(\pi_1, \dots, \pi_C) \propto \prod_{c=1}^{C} \pi_c^{\alpha_c - 1}.    (17)

In an absolute sense, the values of the hyperparameters α_{jrc} indicate the strength of one's prior belief about the probability of giving response r to item j in class c, while their relative sizes indicate the relative probabilities of the responses (Rubin & Stern, 1994); the α_c are used likewise for the class sizes. To indicate no prior knowledge about the items or the LC sizes, I use only vague (diffuse) priors in the analysis, with \sum_c \alpha_c = \sum_r \alpha_{jrc} = 1 (see Section 2.2).

The prior distribution of the entire set ψ is the product of the priors on
the elements in it:

p(\psi) = \prod_{c=1}^{C} \pi_c^{\alpha_c - 1} \left( \prod_{r=1}^{R_1} \pi_{1rc}^{\alpha_{1rc}-1} \times \cdots \times \prod_{r=1}^{R_J} \pi_{Jrc}^{\alpha_{Jrc}-1} \right)    (18)

and the posterior is then obtained by combining this prior distribution with the likelihood (Equation 11) of the LC model (Rubin & Stern, 1994):

P(\psi|y) \propto \prod_{s=1}^{S} P(Y_s)^{n_s} \, P(\psi).    (19)

As indicated earlier, this posterior distribution does not have a convenient form to sample from. But, as it turns out, augmenting the data with estimates of the unobserved LC memberships makes the model estimable. As shown in Section 2.2, the Gibbs sampler can be used to estimate the LC model in an iterative fashion, but it requires that unobserved indicators of the LC memberships be used to augment the data. In this way it is possible to obtain conditional distributions of the parameters given the LC membership (Tanner & Wong, 1987). To illustrate, let Z_{sic} = 1 if the ith observation in the sth cell of the contingency table (i = 1, ..., n_s; s = 1, ..., S) belongs to LC c, and 0 otherwise. Then the joint distribution is

P(\psi, Z, y) \propto \prod_{s=1}^{S} \prod_{i=1}^{n_s} \prod_{c=1}^{C} \left( \pi_c \, P(Y_s|\theta = c) \right)^{Z_{sic}} P(\psi).    (20)

The distribution of ψ conditional on Z and y is given by a product of independent Dirichlet distributions with hyperparameters α_{jrc} + n_{jrc} and α_c + m_c.
The conditional probability P(Z|ψ, y) is given by a Bernoulli distribution for each Z_{sic}. Using Bayes' rule, the probability that Z_{sic} = 1 is obtained from Equation 4:

P(Z_{sic} = 1 | \psi, y) = \frac{P(Y_s|\theta = c) \, \pi_c}{P(Y_s)}.    (21)

These conditional distributions are easy to sample from (see Section 2.2). The Gibbs sampler described in this thesis samples from them iteratively, and at convergence the sampled values of Z and ψ are draws from the joint posterior distribution P(Z, ψ|y) (Rubin & Stern, 1994; Tanner & Wong, 1987). To avoid correlations between the samples, one is advised not to use successive draws but, for instance, to retain only every 50th draw or so.

To obtain the replicate data y^{rep,l} in Step 2b as a draw from the predictive distribution P(y^{rep}|ψ^l), we just need to draw N observations from a multinomial distribution with cell probabilities P(Y_s) computed from ψ^l.
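As a concrete illustration of Steps 2a and 2b, the sketch below implements the data-augmentation Gibbs sampler for dichotomous items and draws replicate datasets from the posterior predictive distribution. It is a minimal sketch under the setup above: the function name lc_gibbs_ppc, the burn-in length, and the symmetric priors with default hyperparameter alpha are illustrative assumptions, while the thinning interval of 50 mirrors the suggestion to retain only every 50th draw. Computing T^l_{rep} (Step 3) would additionally require refitting the H_0 model to each replicate, which is omitted here.

```python
import numpy as np

def lc_gibbs_ppc(Y, C, n_reps=500, burn_in=1000, thin=50, alpha=1.0, seed=0):
    """Data-augmentation Gibbs sampler for an LC model with dichotomous
    items, returning replicate datasets for the PPC (sketch).

    With R_j = 2, the Dirichlet posteriors for the response probabilities
    reduce to Beta posteriors; the class sizes keep a Dirichlet posterior.
    """
    rng = np.random.default_rng(seed)
    N, J = Y.shape
    pi_c = np.full(C, 1.0 / C)              # class sizes
    p = rng.uniform(0.2, 0.8, size=(C, J))  # P(y_j = 1 | class c)
    reps = []
    for t in range(burn_in + n_reps * thin):
        # Draw Z | psi, y: class membership per respondent (Equation 21)
        log_joint = (np.log(pi_c)
                     + Y @ np.log(p).T
                     + (1 - Y) @ np.log(1 - p).T)
        prob = np.exp(log_joint - log_joint.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        z = (rng.random((N, 1)) > prob.cumsum(axis=1)).sum(axis=1)
        # Draw psi | Z, y: independent Dirichlet/Beta posteriors with
        # hyperparameters alpha_c + m_c and alpha_jrc + n_jrc
        m_c = np.bincount(z, minlength=C)
        pi_c = rng.dirichlet(alpha + m_c)
        n1 = np.array([Y[z == c].sum(axis=0) for c in range(C)])  # (C, J)
        p = rng.beta(alpha + n1, alpha + (m_c[:, None] - n1))
        # After burn-in, keep every thin-th draw and generate y^rep
        if t >= burn_in and (t - burn_in) % thin == 0:
            z_rep = rng.choice(C, size=N, p=pi_c)
            y_rep = (rng.random((N, J)) < p[z_rep]).astype(int)
            reps.append(y_rep)
    return reps
```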
C  Figures

[Figure 1: Example of trace plot (x-axis: iteration; y-axis: T_rep(X^2)) and density plot (x-axis: replicated X^2 values; y-axis: density) for the PPC in the empirical data. The dashed lines indicate X^2_obs = 4.223, pp = .554.]
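The pp value reported in Figure 1 is, by definition, the proportion of replicated statistics at least as large as the observed one. A one-line sketch (the helper name pp_value is hypothetical):

```python
import numpy as np

def pp_value(T_rep, T_obs):
    """Posterior predictive p-value: the share of replicated statistics
    that are at least as extreme as the observed statistic."""
    return (np.asarray(T_rep) >= T_obs).mean()

# For the draws underlying Figure 1, pp_value(T_rep, 4.223) gives about .554.
```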
[Figure 2: P-value log-densities for the 2-class model with N = 1000. Panels show the L^2, X^2, and BVR p-values; curves compare asymptotic p-values, bootstrap, PPC, and discrepancy against the uniform density.]
[Figure 3: P-value log-densities for the 2-class model with N = 100. Panels show the L^2, X^2, and BVR p-values; curves compare asymptotic p-values, bootstrap, PPC, and discrepancy against the uniform density.]