1. 1/44
Bayesian Structural Equations Modeling
M’hamed (Hamy) Temkit1
1Division of Biostatistics
Mayo Clinic, Arizona
Applied Statistics Seminar, November 17, 2016
M’hamed (Hamy) Temkit Division of Biostatistics
Bayesian Structural Equations Modeling
2. 2/44
Outline
Introduction to SEM
Covariance Analysis
SEM Estimation (GLS vs MLE)
CFA
The General Model of SEM
lavaan
Bayesian Paradigm
Bayesian SEM
Bayesian CFA
blavaan
CONCLUSION
7. 7/44
Background
Factor Analysis (Spearman, 1904)
Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960)
Confirmatory Factor Analysis (CFA) (Jöreskog, 1969)
General SEM (Jöreskog, 1973; Wiley, 1973)
LISREL model (Wiley, 1973; Jöreskog, 1977)
Generalized least squares (Browne, 1974, 1982, 1984)
8. 8/44
Relevant Reading References
Structural Equations With Latent Variables (Bollen, 1989)
Structural Equation Modeling With AMOS (Byrne)
Latent Curve Models (Bollen & Curran, 2006)
Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007)
Structural Equation Modeling: A Multidisciplinary Journal
9. 9/44
First Principle: Linear Regression
10. 10/44
Linear Regression: The Machinery
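$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \dots, n$ (regression line)
$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ (OLS)
and if the $\epsilon_i \sim N(0, \sigma^2)$ are iid,
$\max_{\beta_0, \beta_1, \sigma^2} \left[ \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \right] \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right)$ (ML)
$\hat{\beta} \sim N\left(\beta, \sigma^2 (X^\top X)^{-1}\right)$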
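As a rough illustration of the machinery above, the following sketch fits a simple regression by OLS on simulated data and checks that the same coefficients maximize the Gaussian likelihood. The data, the true parameter values, and the helper names `rss` and `log_lik` are assumptions made for this example only.

```python
import math
import random

random.seed(0)

# Simulated data; the true values below are illustrative assumptions.
n, beta0, beta1, sigma = 200, 1.0, 2.0, 0.5
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]

def rss(b0, b1):
    """Residual sum of squares, the OLS objective."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

def log_lik(b0, b1, s2):
    """Gaussian log-likelihood, the ML objective."""
    return -0.5 * n * math.log(2.0 * math.pi * s2) - rss(b0, b1) / (2.0 * s2)

# Closed-form OLS solution for simple regression.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1_hat = sxy / sxx
b0_hat = y_bar - b1_hat * x_bar

# For any fixed sigma^2, maximizing log_lik over (b0, b1) is the same as
# minimizing rss, so the ML and OLS estimates of beta coincide.
s2_ml = rss(b0_hat, b1_hat) / n  # ML estimate of sigma^2 (divides by n)

# Estimated sampling covariance sigma^2 (X'X)^{-1}, written out for the 2x2 case.
sum_x, sum_x2 = sum(x), sum(xi * xi for xi in x)
det = n * sum_x2 - sum_x ** 2
cov_beta = [[s2_ml * sum_x2 / det, -s2_ml * sum_x / det],
            [-s2_ml * sum_x / det, s2_ml * n / det]]

print(b0_hat, b1_hat, s2_ml)
```

Perturbing `b0_hat` or `b1_hat` in either direction lowers `log_lik`, which is the sense in which OLS and ML agree under normal errors.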
11. 11/44
Pros and Cons of Regression (Linear Models)
Oversimplified view of the phenomenon
Understates measurement error (covariates are treated as fixed)
Does not generally handle simultaneous equations (e.g., mediation)
Lacks the flexibility needed to fit SEM models
12. 12/44
What is SEM
A melding of factor analysis and path (regression) analysis into one comprehensive statistical methodology
Simultaneous equation modeling
Does the model-implied covariance matrix match the observed covariance matrix?
The degree to which they match represents the goodness of fit
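To make the covariance-matching idea concrete, here is a minimal sketch for a one-factor model. It builds the implied covariance matrix from assumed loadings, factor variance, and error variances, and scores it against a sample covariance matrix. The unweighted least squares (ULS) discrepancy is used only because it is short to write; ML and GLS fit functions would take its place in practice. All parameter values and function names are hypothetical.

```python
import random

random.seed(1)

def implied_cov(lam, phi, psi):
    """One-factor implied covariance: lam_j * phi * lam_k, plus psi_j on the diagonal."""
    p = len(lam)
    return [[lam[j] * phi * lam[k] + (psi[j] if j == k else 0.0)
             for k in range(p)] for j in range(p)]

def sample_cov(rows):
    """Unbiased sample covariance matrix of a list of observation vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]

def uls_discrepancy(S, Sigma):
    """ULS fit function F = 0.5 * tr[(S - Sigma)^2]; zero means a perfect match."""
    p = len(S)
    return 0.5 * sum((S[j][k] - Sigma[j][k]) ** 2
                     for j in range(p) for k in range(p))

# Simulate data from an assumed one-factor model.
lam, phi, psi = [1.0, 0.8, 0.6], 1.0, [0.5, 0.5, 0.5]
data = []
for _ in range(1000):
    w = random.gauss(0.0, phi ** 0.5)  # latent factor score
    data.append([l * w + random.gauss(0.0, ps ** 0.5) for l, ps in zip(lam, psi)])

S = sample_cov(data)

# Discrepancy under the generating model versus an independence model
# (zero loadings, error variances set to the sample variances).
F_model = uls_discrepancy(S, implied_cov(lam, phi, psi))
F_indep = uls_discrepancy(S, implied_cov([0.0] * 3, phi, [S[j][j] for j in range(3)]))
print(F_model, F_indep)
```

The generating model should give a much smaller discrepancy than the independence model, which ignores the covariances induced by the common factor.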
30. 30/44
Why Bayesian
Flexibility to use prior knowledge (priors)
Robust to small sample sizes
Bayes factors and flexibility in comparing models
Easy production of latent scores (factors)
blavaan (open-source R package)
WinBUGS (freely available software)
31. 31/44
Bayesian References
A Bayesian approach to confirmatory factor analysis (Lee, 1980)
Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes (Lee & Song, 2004)
Structural Equation Modeling: A Bayesian Approach (Lee, 2007)
Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences (Song & Lee, 2012)
32. 32/44
Bayesian estimation
$\log p(\theta \mid Y, M) = \log p(Y \mid \theta, M) + \log p(\theta) + \text{const}$
M: an arbitrary SEM model
Y: observed dataset of raw observations, sample size n
θ: random vector of parameters in M
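The additive structure of the log posterior can be seen in a toy model far simpler than an SEM: a normal mean with known error variance and a normal prior. The sketch below is only an illustration under those assumptions. It adds the log-likelihood and log-prior (each up to a constant), locates the maximum on a grid, and compares it with the closed-form posterior mean for this model.

```python
import random

random.seed(2)

# Toy data; the error sd is assumed known and equal to 1.
y = [random.gauss(1.5, 1.0) for _ in range(50)]
M0, S0 = 0.0, 2.0  # assumed prior: theta ~ N(M0, S0^2)

def log_lik(theta):
    """Gaussian log-likelihood of theta, up to an additive constant."""
    return -0.5 * sum((yi - theta) ** 2 for yi in y)

def log_prior(theta):
    """Log-density of the N(M0, S0^2) prior, up to an additive constant."""
    return -0.5 * ((theta - M0) / S0) ** 2

def log_post(theta):
    """Unnormalized log posterior: log-likelihood plus log-prior."""
    return log_lik(theta) + log_prior(theta)

# Grid search for the posterior mode (step 0.001 on [-2, 5]).
grid = [-2.0 + 0.001 * i for i in range(7001)]
theta_map = max(grid, key=log_post)

# Closed-form posterior mean for this conjugate normal model, used as a check.
n = len(y)
post_mean = (sum(y) + M0 / S0 ** 2) / (n + 1.0 / S0 ** 2)
print(theta_map, post_mean)
```

The dropped constants do not depend on θ, so they shift the curve without moving its maximum.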
33. 33/44
Conjugate priors
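$p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}, \quad \theta \in (0, 1)$
$p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad \theta \sim \mathrm{Beta}(\alpha, \beta)$
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \theta^{y} (1 - \theta)^{n - y}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$
$\propto \theta^{y + \alpha - 1} (1 - \theta)^{n - y + \beta - 1}, \quad \theta \mid y \sim \mathrm{Beta}(y + \alpha, n - y + \beta)$
The prior p(θ) and the posterior p(θ|y) belong to the same distributional family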
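The conjugate update can be checked numerically. In this small sketch the prior Beta(2, 2) and the data (7 successes in 10 trials) are made-up values. The closed-form posterior is compared with a brute-force normalization of likelihood times prior on a grid.

```python
def beta_binomial_update(alpha, beta, y, n):
    """Posterior Beta parameters after y successes in n trials under a Beta(alpha, beta) prior."""
    return alpha + y, beta + n - y

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior and data.
alpha, beta, y, n = 2.0, 2.0, 7, 10
a_post, b_post = beta_binomial_update(alpha, beta, y, n)  # Beta(9, 5)

# Numerical check: normalize likelihood x prior on a midpoint grid.
# The binomial coefficient is constant in theta and cancels.
m = 20000
grid = [(i + 0.5) / m for i in range(m)]
unnorm = [t ** (y + alpha - 1) * (1 - t) ** (n - y + beta - 1) for t in grid]
grid_mean = sum(t * u for t, u in zip(grid, unnorm)) / sum(unnorm)

print(a_post, b_post, beta_mean(a_post, b_post), grid_mean)
```

Both routes give a posterior mean of 9/14, with no sampling or numerical integration needed on the conjugate side.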
34. 34/44
Measurement model (CFA) Bayesian approach
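$y_i = \Lambda w_i + \epsilon_i, \quad i = 1, \dots, n, \quad y_i \in \mathbb{R}^k$
$w_i \sim N(0, \Phi), \quad w_i \in \mathbb{R}^q$
$\epsilon_i \sim N(0, \Psi_\epsilon)$, $\Psi_\epsilon$ diagonal with elements $\psi_{\epsilon k}$
$w_i$ and $\epsilon_i$ are independent
$\Lambda, \Phi, \Psi_\epsilon$ are the parameters
Let $\Lambda_k^\top$ be the kth row of $\Lambda$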
35. 35/44
Measurement model (CFA) priors
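The conjugate priors on the parameters are:
$\psi_{\epsilon k} \sim \text{Inverse-Gamma}(\alpha^*_{0\epsilon k}, \beta^*_{0\epsilon k})$
$[\Lambda_k \mid \psi_{\epsilon k}] \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0yk})$
$\Phi \sim IW_q(R^*_0, \rho_0)$, where $R^*_0$ is positive definite
The difficulty lies in choosing the hyperparameters, which determine whether the priors are informative or noninformative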
36. 36/44
Measurement model (CFA) Gibbs Sampling (MCMC)
Let $Y = (y_1, \dots, y_n)$ be the observed data matrix
$\Omega = (w_1, \dots, w_n)$ is the matrix of latent variables
$(Y, \Omega)$ is the complete dataset (augmented data)
The posterior $P(\Lambda, \Phi, \Psi_\epsilon \mid Y)$ is intractable
$P(\Lambda, \Phi, \Psi_\epsilon \mid \Omega, Y)$ is usually a standard distribution
$P(\Omega \mid \Lambda, \Phi, \Psi_\epsilon, Y)$ can also be derived from the model M
37. 37/44
Measurement model (CFA) Gibbs Sampling
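The Gibbs sampling algorithm allows sampling from $P(\Lambda, \Phi, \Psi_\epsilon, \Omega \mid Y)$
At the $(j+1)$th iteration, given $\Omega^{j}, \Lambda^{j}, \Phi^{j}, \Psi_\epsilon^{j}$:
Generate $\Omega^{j+1} \sim P(\Omega \mid \Lambda^{j}, \Phi^{j}, \Psi_\epsilon^{j}, Y)$
Generate $\Psi_\epsilon^{j+1} \sim P(\Psi_\epsilon \mid \Omega^{j+1}, \Lambda^{j}, \Phi^{j}, Y)$
Generate $\Phi^{j+1} \sim P(\Phi \mid \Omega^{j+1}, \Lambda^{j}, \Psi_\epsilon^{j+1}, Y)$
Generate $\Lambda^{j+1} \sim P(\Lambda \mid \Omega^{j+1}, \Phi^{j+1}, \Psi_\epsilon^{j+1}, Y)$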
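The four steps above can be sketched for the simplest case, a one-factor CFA (q = 1) on simulated data. This is a minimal illustration, not the implementation used by blavaan or WinBUGS. The first loading is fixed at 1 for identification, the inverse-Wishart prior on the factor variance reduces to an inverse-gamma, and every hyperparameter and true value below is an assumption chosen for the example.

```python
import random

random.seed(3)

def inv_gamma(shape, rate):
    """Draw from Inverse-Gamma(shape, rate) as the reciprocal of a Gamma draw."""
    return 1.0 / random.gammavariate(shape, 1.0 / rate)

# Simulated one-factor data (true values are illustrative assumptions).
TRUE_LAM, TRUE_PHI, TRUE_PSI = [1.0, 0.8, 0.6, 0.7], 1.0, 0.5
n, k = 300, 4
Y = []
for _ in range(n):
    w = random.gauss(0.0, TRUE_PHI ** 0.5)
    Y.append([l * w + random.gauss(0.0, TRUE_PSI ** 0.5) for l in TRUE_LAM])

# Assumed hyperparameters, chosen only for illustration.
A0, B0 = 3.0, 1.0      # psi_j ~ IG(A0, B0)
L0, H0 = 0.5, 4.0      # lam_j | psi_j ~ N(L0, psi_j * H0)
RHO0, R0 = 4.0, 2.0    # phi ~ IG(RHO0 / 2, R0 / 2), the q = 1 inverse-Wishart

lam, psi, phi = [1.0] * k, [1.0] * k, 1.0  # starting values
draws = []
for it in range(1000):
    # Step 1: latent scores Omega | Lambda, Phi, Psi, Y (one normal draw per subject).
    prec = 1.0 / phi + sum(lam[j] ** 2 / psi[j] for j in range(k))
    coef = [lam[j] / psi[j] for j in range(k)]
    sd_w = prec ** -0.5
    W = [random.gauss(sum(c * yj for c, yj in zip(coef, y)) / prec, sd_w) for y in Y]
    sww = sum(w * w for w in W)

    # Step 2: error variances Psi | Omega, Lambda, Y.
    for j in range(k):
        ss = sum((y[j] - lam[j] * w) ** 2 for y, w in zip(Y, W))
        if j == 0:  # loading fixed at 1, so its prior contributes nothing
            psi[j] = inv_gamma(A0 + n / 2.0, B0 + ss / 2.0)
        else:       # the prior of lam_j depends on psi_j and adds one term
            psi[j] = inv_gamma(A0 + (n + 1) / 2.0,
                               B0 + ss / 2.0 + (lam[j] - L0) ** 2 / (2.0 * H0))

    # Step 3: factor variance Phi | Omega.
    phi = inv_gamma((RHO0 + n) / 2.0, (R0 + sww) / 2.0)

    # Step 4: free loadings Lambda | Omega, Psi, Y.
    a = 1.0 / (1.0 / H0 + sww)
    for j in range(1, k):
        m = a * (L0 / H0 + sum(w * y[j] for y, w in zip(Y, W)))
        lam[j] = random.gauss(m, (psi[j] * a) ** 0.5)

    if it >= 300:  # discard the first 300 iterations as burn-in
        draws.append(lam[:])

lam_hat = [sum(d[j] for d in draws) / len(draws) for j in range(k)]
print(lam_hat)
```

Each step draws from a standard distribution (normal or inverse-gamma) because it conditions on the augmented data (Y, Ω), which is the point of the data augmentation described above.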
38. 38/44
Measurement model (CFA) Posterior Parameters Estimates
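$\theta^{(t)} = (\Lambda^{(t)}, \Phi^{(t)}, \Psi_\epsilon^{(t)}), \quad t = 1, \dots, T^*$
$\hat{\theta} = \frac{1}{T^*} \sum_{t=1}^{T^*} \theta^{(t)}$
$\widehat{\mathrm{Var}}(\theta \mid Y) = \frac{1}{T^* - 1} \sum_{t=1}^{T^*} (\theta^{(t)} - \hat{\theta})(\theta^{(t)} - \hat{\theta})^\top$
along with 95% credible intervals from the posterior quantiles $Q_{0.025}$ and $Q_{0.975}$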
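These summaries are simple to compute once the retained draws are available. The sketch below uses synthetic draws of a two-parameter vector as a stand-in for real MCMC output. The function names and the basic empirical-quantile rule are assumptions for illustration.

```python
import random

random.seed(4)

# Stand-in for retained MCMC output: T draws of a 2-parameter vector (synthetic).
T = 4000
draws = [[random.gauss(0.8, 0.1), random.gauss(0.5, 0.05)] for _ in range(T)]

def posterior_mean(draws):
    """Componentwise average of the draws."""
    p = len(draws[0])
    return [sum(d[j] for d in draws) / len(draws) for j in range(p)]

def posterior_cov(draws):
    """Sample covariance matrix of the draws, with divisor T - 1."""
    mean, p, t = posterior_mean(draws), len(draws[0]), len(draws)
    return [[sum((d[j] - mean[j]) * (d[k] - mean[k]) for d in draws) / (t - 1)
             for k in range(p)] for j in range(p)]

def credible_interval(draws, j, level=0.95):
    """Equal-tailed interval from the empirical quantiles of parameter j."""
    xs = sorted(d[j] for d in draws)
    lo = xs[int((1 - level) / 2 * (len(xs) - 1))]
    hi = xs[int((1 + level) / 2 * (len(xs) - 1))]
    return lo, hi

theta_hat = posterior_mean(draws)
theta_cov = posterior_cov(draws)
lo, hi = credible_interval(draws, 0)
print(theta_hat, theta_cov[0][0], (lo, hi))
```

With real MCMC output the draws are autocorrelated, so the Monte Carlo error of these summaries is larger than the independent draws used here would suggest.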
43. 43/44
Conclusions
The frequentist SEM approach is based on ML estimation
The Bayesian approach, with data augmentation and MCMC methods, offers a flexible way to analyze SEMs
The Bayesian approach may be used when prior knowledge is available or when the sample size is small
Some open problems remain (power, optimal designs, GSEM, etc.)