Bayesian_SEM_HT

1/44
Bayesian Structural Equations Modeling
M’hamed (Hamy) Temkit
Division of Biostatistics
Mayo Clinic, Arizona
Applied Statistics Seminar, November 17, 2016

2/44
Outline
Introduction to SEM
Covariance Analysis
SEM Estimation (GLS vs MLE)
CFA
The General Model of SEM
lavaan
Bayesian Paradigm
Bayesian SEM
Bayesian CFA
blavaan
CONCLUSION

3/44
Motivation

4/44
Motivation

5/44
Two Paradigms
Covariance Analysis
Σ = Σ(θ)
Bayesian Inference
p(θ | y) ∝ p(y | θ) p(θ)

6/44
Brief SEM Terminology
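[Figure: path diagram of a general SEM. Exogenous latent variables ξ1, ξ2, ξ3 are each measured by two indicators (X1 to X6, loadings λx11, λx21, λx32, λx42, λx53, λx63, errors δ). Endogenous latent variables η1, η2 are each measured by two indicators (y1 to y4, loadings λy11, λy21, λy32, λy42, errors ε1 to ε4). The measurement model links indicators to latent variables; the structural model links the latent variables through β21, γ11, γ12, γ22, γ23, with exogenous covariances ϕ21, ϕ31, ϕ32.]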

7/44
Background
Factor Analysis (Spearman, 1904)
Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960)
Confirmatory Factor Analysis (CFA) (Jöreskog, 1969)
General SEM (Jöreskog, 1973; Wiley, 1973)
LISREL model (Wiley, 1973; Jöreskog, 1977)
Generalized least squares (Browne, 1974, 1982, 1984)

8/44
Relevant Reading References
Structural Equations With Latent Variables (Bollen, 1989)
Structural Equation Modeling With AMOS (Byrne)
Latent Curve Models (Bollen, Curran, 2006)
Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007)
Structural Equation Modeling: A Multidisciplinary Journal

9/44
First Principle: Linear Regression

10/44
Linear Regression: The Machinery
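$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \dots, n \qquad \text{(regression line)}$$
$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \qquad \text{(OLS)}$$
and if the $\epsilon_i \sim N(0, \sigma^2)$ are iid,
$$\max_{\beta_0, \beta_1, \sigma^2} \; (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\right) \qquad \text{(ML)}$$
$$\hat{\beta} \sim N\left(\beta, \sigma^2 (X^{t} X)^{-1}\right)$$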
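A minimal R sketch of the two criteria above, on simulated data. The sample size, true coefficients, and seed are illustrative assumptions, not values from the talk. It computes the OLS solution in closed form and then maximizes the Gaussian log-likelihood numerically; under iid normal errors the two estimates of (β0, β1) should agree up to optimizer tolerance.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # assumed truth: beta0 = 1, beta1 = 2

# OLS in closed form: beta_hat = (X'X)^{-1} X'y
X <- cbind(1, x)
beta_ols <- as.vector(solve(crossprod(X), crossprod(X, y)))

# ML: minimize the negative Gaussian log-likelihood over (beta0, beta1, log sigma)
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
fit_ml <- optim(c(0, 0, 0), negloglik, method = "BFGS")
beta_ml <- fit_ml$par[1:2]

# Estimated sampling covariance sigma^2 (X'X)^{-1},
# with sigma^2 replaced by its unbiased estimate
sigma2_hat <- sum((y - X %*% beta_ols)^2) / (n - 2)
vcov_ols <- sigma2_hat * solve(crossprod(X))

print(rbind(OLS = beta_ols, ML = beta_ml))
```

The ML estimate of σ² divides by n rather than n − 2, so only the regression coefficients coincide exactly between the two approaches.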

11/44
Pros and Cons of Regression (Linear Models)
Oversimplified view of the phenomena
Ignores measurement error (covariates are treated as fixed)
Does not handle simultaneous equations in general (e.g., mediation)
Lacks the flexibility to fit SEM models

12/44
What is SEM
A melding of factor analysis and path (regression) analysis
into one comprehensive statistical methodology
Simultaneous equation modeling
Does the model-implied covariance matrix match the
observed covariance matrix?
The degree to which they match represents the goodness of fit

13/44
Estimation (graph)
[Figure: path diagram with parameter estimates for the example model. Indicators x1 to x8 load in pairs on four latent variables: Eps (epistemology), Tlr (tolerance), Eng (engagement), Rng (range).]

14/44
Estimation (equations)
Measurement Model:
x1 = a1 + epistemology + e1
x2 = a2 + b2 epistemology + e2
x3 = a3 + tolerance + e3
x4 = a4 + b4 tolerance + e4
x5 = a5 + engagement + e5
x6 = a6 + b6 engagement + e6
x7 = a7 + range + e7
x8 = a8 + b8 range + e8
Structural Model:
tolerance = a9 + b9 epistemology + e9
range = a10 + b10 tolerance + b11 engagement + e10
cov(epistemology, engagement) = 0

15/44
Estimation: objective function
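Sample covariance matrix:
$$S = \begin{pmatrix}
\frac{1}{n}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)^2 & \frac{1}{n}\sum_{i=1}^{n}(x_{1i}-\bar{x}_1)(x_{2i}-\bar{x}_2) & \cdots & \mathrm{cov}(x_1, x_8) \\
\mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_8) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(x_1, x_8) & \mathrm{cov}(x_2, x_8) & \cdots & \mathrm{var}(x_8)
\end{pmatrix}$$
Model-implied covariance matrix:
$$\Sigma(\theta) = \mathrm{cov}(x_1, x_2, \dots, x_8) = \begin{pmatrix}
\mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_8) \\
\mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_8) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(x_1, x_8) & \mathrm{cov}(x_2, x_8) & \cdots & \mathrm{var}(x_8)
\end{pmatrix}$$
$$S \approx \Sigma(\theta)$$
Estimation: choose θ to minimize a discrepancy function f(Σ(θ), S)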

16/44
Generalized Least Squares (GLS)
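$$\{x_1, \dots, x_n\} \sim N(0, \Sigma(\theta_0)) \text{ iid}, \quad x_i \in \mathbb{R}^p$$
$$\operatorname{vec} S \xrightarrow{L} N(\operatorname{vec} \Sigma(\theta_0), C)$$
$$G(\theta) = \tfrac{1}{2} \operatorname{tr}\left\{ \left[ (S - \Sigma(\theta)) V \right]^2 \right\}, \quad V > 0 \text{ (positive definite weight matrix)}$$
$$\hat{\theta} \xrightarrow{L} N(\theta_0, D(\theta_0))$$
$$n\,G(\hat{\theta}) \xrightarrow{L} \chi^2_{p^* - q}$$
$$p^* = \frac{p(p+1)}{2}, \quad q \text{ free parameters}$$
$$H_0: \Sigma = \Sigma(\theta) \quad \text{vs} \quad H_a: \Sigma \neq \Sigma(\theta)$$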
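As a worked check of the degrees of freedom p* − q, the short R sketch below uses the counts from the Holzinger and Swineford CFA fitted later in these slides: p = 9 observed variables and q = 21 free parameters (6 free loadings, 9 residual variances, 3 factor variances, 3 factor covariances), with the test statistic 85.306 reported by lavaan. The variable names are illustrative.

```r
p <- 9                       # observed variables x1..x9
q <- 6 + 9 + 3 + 3           # free loadings + residual variances +
                             # factor variances + factor covariances
p_star <- p * (p + 1) / 2    # distinct elements of the covariance matrix
df <- p_star - q             # degrees of freedom of the chi-square test

chisq <- 85.306              # test statistic from the lavaan output
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

c(p_star = p_star, q = q, df = df, p_value = p_value)
```

This reproduces the 24 degrees of freedom shown in the lavaan output, and the p-value is small enough to be printed there as 0.000.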

17/44
Maximum Likelihood (ML)
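$$\{x_1, \dots, x_n\} \sim N(\mu_0, \Sigma(\theta_0)) \text{ iid}, \quad x_i \in \mathbb{R}^p$$
$$(n - 1) S \sim W_p(R_0, \rho_0)$$
$$F(\theta) = \log\det \Sigma(\theta) + \operatorname{tr}\left( S\, \Sigma(\theta)^{-1} \right) - \log\det S - p$$
$$\tilde{\theta}_M \xrightarrow{L} N(\theta_0, C_2(\theta_0))$$
$$n\,F(\tilde{\theta}_M) \xrightarrow{L} \chi^2_{p^* - q}$$
$$H_0: \Sigma = \Sigma(\theta) \quad \text{vs} \quad H_a: \Sigma \neq \Sigma(\theta)$$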
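The ML discrepancy function can be written in a few lines of base R. This is a sketch on simulated data (the data and the helper name `f_ml` are assumptions for illustration); it only evaluates F for a given Σ and does not perform the minimization over θ that lavaan carries out.

```r
# ML discrepancy between a sample covariance S and a model-implied Sigma:
# F = log det(Sigma) + tr(S Sigma^{-1}) - log det(S) - p
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
    sum(diag(S %*% solve(Sigma))) -
    as.numeric(determinant(S, logarithm = TRUE)$modulus) - p
}

set.seed(1)
X <- matrix(rnorm(500 * 3), ncol = 3)   # simulated data, illustrative only
S <- cov(X)                             # note: cov() divides by n - 1, not n

f_exact <- f_ml(S, S)         # zero (up to rounding) when Sigma reproduces S
f_wrong <- f_ml(S, diag(3))   # positive when Sigma differs from S
c(exact = f_exact, misspecified = f_wrong)
```

F is non-negative and equals zero only when Σ(θ) = S, which is why n F evaluated at the minimizer serves as a test statistic for model fit.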

18/44
SEM Modeling
Model (diagram)
Identifiability (q ≤ p(p + 1)/2);
check the identifiability rules in Bollen (page 238)
Constraints (loadings fixed to 1)
EDA (distributions, correlations, outliers, etc.)
Estimation
Fit indices (SRMR (residuals))
Diagnostics (residuals, outliers, etc.)

19/44
Measurement model (CFA)
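$$x_i = \Lambda \xi_i + \epsilon_i, \quad i = 1, \dots, n$$
$$\xi \sim N(0, \Phi), \quad \text{latent variables}$$
$$\epsilon \sim N(0, \Psi), \quad \Psi \text{ diagonal}$$
ξ and ε are uncorrelated
$$\Sigma = \Lambda \Phi \Lambda^{t} + \Psi$$
Λ, Φ, Ψ are the parameters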
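To make the covariance structure concrete, the following R sketch plugs the ML estimates reported in the lavaan output later in these slides (loadings, factor variances and covariances, residual variances for the Holzinger and Swineford data) into the formula for Σ. Only base R is used; the object names are illustrative.

```r
# Loadings (first indicator of each factor fixed to 1)
Lambda <- matrix(0, nrow = 9, ncol = 3,
                 dimnames = list(paste0("x", 1:9),
                                 c("visual", "textual", "speed")))
Lambda[1:3, "visual"]  <- c(1, 0.554, 0.729)
Lambda[4:6, "textual"] <- c(1, 1.113, 0.926)
Lambda[7:9, "speed"]   <- c(1, 1.180, 1.082)

# Factor covariance matrix Phi (variances on the diagonal)
Phi <- matrix(c(0.809, 0.408, 0.262,
                0.408, 0.979, 0.173,
                0.262, 0.173, 0.384), nrow = 3, byrow = TRUE)

# Residual variances (diagonal Psi)
Psi <- diag(c(0.549, 1.134, 0.844, 0.371, 0.446, 0.356, 0.799, 0.488, 0.566))

# Model-implied covariance matrix
Sigma <- Lambda %*% Phi %*% t(Lambda) + Psi
round(Sigma, 3)
```

For example, the implied variance of x1 is the visual factor variance plus the x1 residual variance, and the implied covariance of x1 and x4 is the visual-textual factor covariance, because both loadings are fixed to 1.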

20/44
CFA Example (graph)
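[Figure: path diagram of the three-factor CFA with estimates. Indicators x1 to x3 load on vsl (visual), x4 to x6 on txt (textual), x7 to x9 on spd (speed). Loadings: 1.00, 0.55, 0.73; 1.00, 1.11, 0.93; 1.00, 1.18, 1.08. Residual variances: 0.55, 1.13, 0.84, 0.37, 0.45, 0.36, 0.80, 0.49, 0.57. Factor variances: 0.81, 0.98, 0.38. Factor covariances: 0.41, 0.26, 0.17.]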

21/44
CFA (loadings and latents)
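$$\xi = \begin{pmatrix} \text{vsl} \\ \text{txt} \\ \text{spd} \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix}
1 & 0 & 0 \\
\lambda_{21} & 0 & 0 \\
\lambda_{31} & 0 & 0 \\
0 & 1 & 0 \\
0 & \lambda_{52} & 0 \\
0 & \lambda_{62} & 0 \\
0 & 0 & 1 \\
0 & 0 & \lambda_{83} \\
0 & 0 & \lambda_{93}
\end{pmatrix}$$
The variances and covariances in Φ and Ψ are also parameters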

22/44
CFA using lavaan (R)
# lavaan fits the model; semPlot draws the path diagram
library(lavaan)
library(semPlot)
# specify the model: three latent factors, three indicators each
HS.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
"
# fit by maximum likelihood (the lavaan default)
fit.HS <- sem(HS.model, data = HolzingerSwineford1939)
summary(fit.HS)
# path diagram labelled with the parameter estimates
semPaths(fit.HS, intercepts = FALSE,
         whatLabels = "est",
         residuals = TRUE, exoCov = TRUE)

23/44
CFA Example (output)
> summary(fit.HS)
lavaan (0.5-22) converged normally after 35 iterations
Number of observations 301
Estimator ML
Minimum Function Test Statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.554 0.100 5.554 0.000
x3 0.729 0.109 6.685 0.000
textual =~
x4 1.000
x5 1.113 0.065 17.014 0.000
x6 0.926 0.055 16.703 0.000
speed =~
x7 1.000
x8 1.180 0.165 7.152 0.000
x9 1.082 0.151 7.155 0.000
Covariances:
visual ~~
textual 0.408 0.074 5.552 0.000
speed 0.262 0.056 4.660 0.000
textual ~~
speed 0.173 0.049 3.518 0.000
Variances:
.x1 0.549 0.114 4.833 0.000
.x2 1.134 0.102 11.146 0.000
.x3 0.844 0.091 9.317 0.000
.x4 0.371 0.048 7.779 0.000
.x5 0.446 0.058 7.642 0.000
.x6 0.356 0.043 8.277 0.000
.x7 0.799 0.081 9.823 0.000
.x8 0.488 0.074 6.573 0.000
.x9 0.566 0.071 8.003 0.000
visual 0.809 0.145 5.564 0.000
textual 0.979 0.112 8.737 0.000
speed 0.384 0.086 4.451 0.000

24/44
Structural model (SEM)
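$$\eta = B\eta + \Gamma\xi + \zeta$$
$$y = \Lambda_y \eta + \epsilon$$
$$x = \Lambda_x \xi + \delta$$
$B, \Gamma, \Lambda_y, \Lambda_x, \Phi, \Psi, \Theta_\epsilon, \Theta_\delta$ are the parameters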

25/44
SEM Example (graph)
[Figure: path diagram of the fitted SEM with estimates. Indicators x1 to x3 load on i60 (ind60), y1 to y4 on d60 (dem60), and y5 to y8 on d65 (dem65); i60 predicts d60 and d65, and d60 predicts d65.]

26/44
SEM Example (some equations)
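$$\begin{pmatrix} \text{d60} \\ \text{d65} \end{pmatrix} =
\begin{pmatrix} 0 & 0 \\ B_{21} & 0 \end{pmatrix}
\begin{pmatrix} \text{d60} \\ \text{d65} \end{pmatrix} +
\begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} \text{i60} +
\begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$$
$$\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix}$$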
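For reference, the blocks of Σ(θ) can be written out in terms of the parameter matrices of the general model. The expressions below are the standard LISREL results (as in Bollen, 1989), stated here under the usual assumptions that the errors ζ, ε, δ are mutually uncorrelated and uncorrelated with ξ, and that I − B is invertible; Ψ denotes the covariance matrix of ζ.

```latex
\begin{aligned}
\Sigma_{yy}(\theta) &= \Lambda_y (I - B)^{-1} \left( \Gamma \Phi \Gamma^{t} + \Psi \right)
                       \left[ (I - B)^{-1} \right]^{t} \Lambda_y^{t} + \Theta_\epsilon \\
\Sigma_{yx}(\theta) &= \Lambda_y (I - B)^{-1} \Gamma \Phi \Lambda_x^{t} \\
\Sigma_{xy}(\theta) &= \Sigma_{yx}(\theta)^{t} \\
\Sigma_{xx}(\theta) &= \Lambda_x \Phi \Lambda_x^{t} + \Theta_\delta
\end{aligned}
```

The last line is the CFA covariance structure applied to the x indicators alone.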

27/44
SEM Example (R code)
# lavaan fits the model; semPlot draws the path diagram
library(lavaan)
library(semPlot)
# specify the model
model <- '
  # latent variables
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  # regressions
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  # residual covariances
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)
summary(fit)
semPaths(fit, intercepts = FALSE, whatLabels = "est",
         residuals = FALSE, exoCov = FALSE)

28/44
SEM Example (output)
summary(fit)
lavaan (0.5-22) converged normally after 68 iterations
Estimator ML
Minimum Function Test Statistic 38.125
Degrees of freedom 35
P-value (Chi-square) 0.329
Information Expected
Standard Errors Standard
Latent Variables:
ind60 =~
x1 1.000
x2 2.180 0.139 15.742 0.000
x3 1.819 0.152 11.967 0.000
dem60 =~
y1 1.000
y2 1.257 0.182 6.889 0.000
y3 1.058 0.151 6.987 0.000
y4 1.265 0.145 8.722 0.000
dem65 =~
y5 1.000
y6 1.186 0.169 7.024 0.00
y7 1.280 0.160 8.002 0.00
y8 1.266 0.158 8.007 0.00
Regressions:
Estimate Std.Err z-value P(>|z|
dem60 ~
ind60 1.483 0.399 3.715 0.00
dem65 ~
ind60 0.572 0.221 2.586 0.01
dem60 0.837 0.098 8.514 0.00

29/44
SEM Example (output)
Covariances:
.y1 ~~
.y5 0.624 0.358 1.741 0.082
.y2 ~~
.y4 1.313 0.702 1.871 0.061
.y6 2.153 0.734 2.934 0.003
.y3 ~~
.y7 0.795 0.608 1.308 0.191
.y4 ~~
.y8 0.348 0.442 0.787 0.431
.y6 ~~
.y8 1.356 0.568 2.386 0.017
Variances:
.x1 0.082 0.019 4.184 0.000
.x2 0.120 0.070 1.718 0.086
.x3 0.467 0.090 5.177 0.000
.y1 1.891 0.444 4.256 0.000
.y2 7.373 1.374 5.366 0.000
.y3 5.067 0.952 5.324 0.000
.y4 3.148 0.739 4.261 0.000
.y5 2.351 0.480 4.895 0.000
.y6 4.954 0.914 5.419 0.000
.y7 3.431 0.713 4.814 0.00
.y8 3.254 0.695 4.685 0.00
ind60 0.448 0.087 5.173 0.00
.dem60 3.956 0.921 4.295 0.00
.dem65 0.172 0.215 0.803 0.42

30/44
Why Bayesian
Flexibility to incorporate prior knowledge (priors)
More robust with small sample sizes
Bayes factors and flexibility in comparing models
Easy computation of the latent scores (factors)
blavaan (open-source R package)
WinBUGS (freely available software)

31/44
Bayesian References
A Bayesian approach to confirmatory factor analysis (Lee,
1980)
Evaluation of the Bayesian and maximum likelihood
approaches in analyzing structural equation models with
small sample sizes (Lee, Song, 2004)
Structural Equation Modeling: A Bayesian Approach (Lee,
2007)
Basic and Advanced Bayesian Structural Equation Modeling:
With Applications in the Medical and Behavioral Sciences
(Song, Lee, 2012)

32/44
Bayesian estimation
$$\log p(\theta \mid Y, M) = \log p(Y \mid \theta, M) + \log p(\theta) + \text{const}$$
M: arbitrary SEM model
Y: observed dataset of raw observations, sample size n
θ: random vector of parameters in M

33/44
Conjugate priors
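$$p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}, \quad \theta \in (0, 1)$$
$$p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad \theta \sim \text{Beta}(\alpha, \beta)$$
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \propto \theta^{y} (1 - \theta)^{n - y}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$$
$$\propto \theta^{y + \alpha - 1} (1 - \theta)^{n - y + \beta - 1}, \quad \text{so } \theta \mid y \sim \text{Beta}(y + \alpha, n - y + \beta)$$
The prior p(θ) and the posterior p(θ|y) belong to the same
distributional family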
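A small R sketch of this conjugate update. The prior parameters and the data (14 successes in 20 trials) are assumed values chosen only for illustration. The code also checks the closed-form posterior against a brute-force grid normalization of likelihood times prior.

```r
alpha <- 2; beta <- 2      # prior Beta(alpha, beta): assumed values
n <- 20; y <- 14           # assumed data: 14 successes in 20 trials

# Conjugate update: the posterior is Beta(y + alpha, n - y + beta)
post_alpha <- alpha + y
post_beta  <- beta + n - y

# Numerical check: normalize likelihood x prior on a grid and compare
theta <- seq(0.001, 0.999, length.out = 999)
unnorm <- dbinom(y, n, theta) * dbeta(theta, alpha, beta)
grid_post <- unnorm / (sum(unnorm) * (theta[2] - theta[1]))
max_diff <- max(abs(grid_post - dbeta(theta, post_alpha, post_beta)))

# Posterior mean and equal-tailed 95% credible interval
post_mean <- post_alpha / (post_alpha + post_beta)
cred_int  <- qbeta(c(0.025, 0.975), post_alpha, post_beta)

list(post_mean = post_mean, cred_int = cred_int, max_diff = max_diff)
```

Because the posterior stays in the Beta family, no sampling is needed here; in the SEM setting the same idea is applied block by block inside a Gibbs sampler.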

34/44
Measurement model (CFA) Bayesian approach
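$$y_i = \Lambda w_i + \epsilon_i, \quad i = 1, \dots, n, \quad y_i \in \mathbb{R}^p$$
$$w_i \sim N(0, \Phi), \quad w_i \in \mathbb{R}^q$$
$$\epsilon_i \sim N(0, \Psi), \quad \Psi \text{ diagonal with elements } \psi_k, \; k = 1, \dots, p$$
$w_i$ and $\epsilon_i$ are independent
Λ, Φ, Ψ are the parameters
Let $\Lambda_k^{t}$ be the kth row of Λ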

35/44
Measurement model (CFA) priors
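The conjugate priors on the parameters are:
$$\psi_k \sim \text{IGamma}(\alpha^*_{0k}, \beta^*_{0k})$$
$$[\Lambda_k \mid \psi_k] \sim N(\Lambda_{0k}, \psi_k H_{0yk})$$
$$\Phi \sim IW_q(R^*_0, \rho_0), \quad R^*_0 \text{ positive definite}$$
The difficulty is choosing the hyperparameters, which determine
whether the priors are informative or noninformative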

36/44
Measurement model (CFA) Gibbs Sampling (MCMC)
Let Y = (y1, ..., yn) be the observed data matrix
Ω = (w1, ..., wn) is the matrix of the latent variables
(Y, Ω) is the complete dataset (augmented data)
The posterior P(Λ, Φ, Ψ | Y) is intractable
P(Λ, Φ, Ψ | Ω, Y) is usually a standard distribution
P(Ω | Λ, Φ, Ψ, Y) can also be derived from the model M

37/44
Measurement model (CFA) Gibbs Sampling
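The Gibbs sampling algorithm makes it possible to sample from
$$P(\Lambda, \Phi, \Psi, \Omega \mid Y)$$
At the (j + 1)th iteration, given $\Omega^{(j)}, \Lambda^{(j)}, \Phi^{(j)}, \Psi^{(j)}$:
Generate $\Omega^{(j+1)} \sim P(\Omega \mid \Lambda^{(j)}, \Phi^{(j)}, \Psi^{(j)}, Y)$
Generate $\Psi^{(j+1)} \sim P(\Psi \mid \Omega^{(j+1)}, \Lambda^{(j)}, \Phi^{(j)}, Y)$
Generate $\Phi^{(j+1)} \sim P(\Phi \mid \Omega^{(j+1)}, \Lambda^{(j)}, \Psi^{(j+1)}, Y)$
Generate $\Lambda^{(j+1)} \sim P(\Lambda \mid \Omega^{(j+1)}, \Phi^{(j+1)}, \Psi^{(j+1)}, Y)$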
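The cycle above can be sketched in base R for a deliberately simplified case. The assumptions, none of which come from the talk, are: a single latent factor with its variance fixed to 1 (so the Φ step drops out and all loadings are free), simulated data, and hand-picked hyperparameters `a0`, `b0`, `lambda0`, `h0` for the inverse-gamma and normal priors. It is an illustration of the conditional draws, not the sampler used by blavaan or WinBUGS.

```r
set.seed(1)
n <- 300; p <- 4
lambda_true <- c(1.0, 0.8, 0.6, 0.9)          # assumed true loadings
w_true <- rnorm(n)                            # latent factor, variance 1
Y <- outer(w_true, lambda_true) +
  matrix(rnorm(n * p, sd = sqrt(0.5)), n, p)  # residual variance 0.5

# Hypothetical hyperparameters:
# psi_k ~ IGamma(a0, b0), lambda_k | psi_k ~ N(lambda0, psi_k * h0)
a0 <- 2; b0 <- 1; lambda0 <- 0; h0 <- 4

n_iter <- 2000; burn <- 500
lambda <- rep(1, p); psi <- rep(1, p)         # starting values
draws <- matrix(NA_real_, n_iter, 2 * p,
                dimnames = list(NULL, c(paste0("lambda", 1:p),
                                        paste0("psi", 1:p))))

for (j in seq_len(n_iter)) {
  # Step 1: Omega | Lambda, Psi, Y (normal, one draw per observation)
  v <- 1 / (1 + sum(lambda^2 / psi))
  m <- v * as.vector(Y %*% (lambda / psi))
  w <- rnorm(n, mean = m, sd = sqrt(v))

  # Step 2: Psi | Omega, Lambda, Y (inverse gamma, one draw per indicator)
  resid <- Y - outer(w, lambda)
  rate <- b0 + 0.5 * (colSums(resid^2) + (lambda - lambda0)^2 / h0)
  psi <- 1 / rgamma(p, shape = a0 + (n + 1) / 2, rate = rate)

  # Step 3: Lambda | Omega, Psi, Y (normal, one draw per indicator)
  A <- 1 / (1 / h0 + sum(w^2))
  a <- A * (lambda0 / h0 + as.vector(crossprod(Y, w)))
  lambda <- rnorm(p, mean = a, sd = sqrt(psi * A))

  draws[j, ] <- c(lambda, psi)
}

post_mean <- colMeans(draws[-seq_len(burn), ])
round(post_mean, 2)
```

With the factor variance fixed to 1 the sign of the loadings is not identified, so the chain is started at positive loadings; in the multi-factor model of the slides, identification comes from fixing one loading per factor to 1 and Φ is sampled from its inverse-Wishart conditional.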

38/44
Measurement model (CFA) Posterior Parameters Estimates
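$$\theta^{(t)} = (\Lambda^{(t)}, \Phi^{(t)}, \Psi^{(t)}), \quad t = 1, \dots, T^*$$
$$\hat{\theta} = \frac{1}{T^*} \sum_{t=1}^{T^*} \theta^{(t)}$$
$$\widehat{\mathrm{var}}(\theta \mid Y) = \frac{1}{T^* - 1} \sum_{t=1}^{T^*} (\theta^{(t)} - \hat{\theta})(\theta^{(t)} - \hat{\theta})^{t}$$
along with 95% credible intervals using the posterior quantiles $Q_{0.025}$ and $Q_{0.975}$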
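Given a matrix of retained MCMC draws (one row per iteration, one column per parameter), these summaries are one-liners in base R. In the sketch below the `draws` matrix is a simulated stand-in, not output from a real chain, and the parameter names are illustrative.

```r
set.seed(7)
T_star <- 4000
# Stand-in for MCMC output: T* draws of a 2-dimensional parameter
draws <- cbind(lambda = rnorm(T_star, mean = 1.1, sd = 0.07),
               psi    = 1 / rgamma(T_star, shape = 50, rate = 25))

theta_hat <- colMeans(draws)   # posterior mean
post_cov  <- cov(draws)        # posterior covariance, divides by T* - 1

# Equal-tailed 95% interval from the 2.5% and 97.5% quantiles
cred_int <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

list(mean = theta_hat, sd = sqrt(diag(post_cov)), interval = cred_int)
```

The blavaan output later in the slides reports HPD intervals instead, which can differ from equal-tailed quantile intervals when the posterior is skewed.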

39/44
Bayesian CFA Example using blavaan
library(blavaan)
# specify the model
bHS.model <- " visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
# intercepts
x1 ~ 0
x2 ~ 0
x3 ~ 0
x4 ~ 0
x5 ~ 0
x6 ~ 0
x7 ~ 0
x8 ~ 0
x9 ~ 0
"
bfit.HS <- bsem(bHS.model,
data=HolzingerSwineford1939 )
summary(bfit.HS)
fitMeasures(bfit.HS,fit.measures="all", baseline.model= NULL)

40/44
Bayesian CFA Example (output)
blavaan (0.2-2) results of 10000 samples after 5000 adapt+burnin iterations
Number of missing patterns 1
Statistic MargLogLik PPP
Value -4481.087 0.000
Latent Variables:
Estimate Post.SD HPD.025 HPD.975 PSRF Prior
visual =~
x1 1.000
x2 1.221 0.018 1.186 1.255 1.000 dnorm(0,1e-2)
x3 0.463 0.012 0.438 0.487 1.000 dnorm(0,1e-2)
textual =~
x4 1.000
x5 1.404 0.020 1.365 1.445 1.004 dnorm(0,1e-2)
x6 0.731 0.016 0.7 0.761 1.001 dnorm(0,1e-2)
speed =~
x7 1.000
x8 1.320 0.020 1.28 1.357 1.002 dnorm(0,1e-2)
x9 1.286 0.019 1.25 1.325 1.002 dnorm(0,1e-2)

41/44
Covariances:
visual ~~
textual 15.500 1.321 12.998 18.14 1.000 dwish(iden,4)
speed 20.910 1.764 17.576 24.439 1.000 dwish(iden,4)
textual ~~
speed 13.003 1.118 10.9 15.259 1.000 dwish(iden,4)
Intercepts:
.x1 0.000
.x2 0.000
.x3 0.000
.x4 0.000
.x5 0.000
.x6 0.000
.x7 0.000
.x8 0.000
.x9 0.000
visual 0.000
textual 0.000
speed 0.000

42/44
Variances:
.x1 0.716 0.088 0.547 0.891 1.001 dgamma(1,.5)
.x2 1.219 0.138 0.96 1.5 1.000 dgamma(1,.5)
.x3 0.993 0.086 0.832 1.164 1.000 dgamma(1,.5)
.x4 0.449 0.053 0.346 0.552 1.001 dgamma(1,.5)
.x5 0.314 0.069 0.184 0.452 1.002 dgamma(1,.5)
.x6 0.509 0.048 0.417 0.604 1.000 dgamma(1,.5)
.x7 0.877 0.084 0.717 1.045 1.000 dgamma(1,.5)
.x8 0.567 0.077 0.417 0.72 1.000 dgamma(1,.5)
.x9 0.478 0.068 0.347 0.61 1.000 dgamma(1,.5)
visual 24.998 2.118 20.929 29.176 1.000 dwish(iden,4)
textual 10.256 0.882 8.518 11.953 1.001 dwish(iden,4)
speed 17.812 1.539 14.813 20.859 1.001 dwish(iden,4)
> fitMeasures(bfit.HS,fit.measures="all", baseline.model= NULL)
npar logl ppp bic dic p_dic waic
21.000 -4398.287 0.000 8916.354 8837.747 20.586 8838.364
p_waic looic p_loo margloglik
20.848 8838.391 20.861 -4481.087

43/44
Conclusions
The frequentist SEM approach is based on MLE
The Bayesian approach, with data augmentation and MCMC
methods, is a flexible way to fit SEMs
The Bayesian approach may be used when prior knowledge is
available or when the sample size is small
Some open problems (power, optimal designs, GSEM, etc.)

44/44
THANK YOU!
