Bayesian Structural Equations Modeling
M’hamed (Hamy) Temkit
Division of Biostatistics
Mayo Clinic, Arizona
Applied Statistics Seminar, November 17, 2016
Outline
Introduction to SEM
Covariance Analysis
SEM Estimation (GLS vs MLE)
CFA
The General Model of SEM
lavaan
Bayesian Paradigm
Bayesian SEM
Bayesian CFA
blavaan
Conclusion
Motivation
Two Paradigms
Covariance Analysis
Σ = Σ(θ)
Bayesian Inference
p(θ | y) ∝ p(y | θ) p(θ)
Brief SEM Terminology
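[Path diagram. Exogenous latent variables ξ1, ξ2, ξ3 are measured by X1 and X2, X3 and X4, X5 and X6 respectively, with loadings λx11, λx21, λx32, λx42, λx53, λx63 and measurement errors δ; their covariances are ϕ21, ϕ31, ϕ32. Endogenous latent variables η1, η2 are measured by y1 and y2, y3 and y4, with loadings λy11, λy21, λy32, λy42 and measurement errors ε1 to ε4. Structural paths: γ11, γ12, γ22, γ23 from the ξ's to the η's, and β21 from η1 to η2.]
Measurement model: the equations linking indicators to latent variables
Structural model: the equations linking latent variables to each other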
Background
Factor Analysis (Spearman, 1904)
Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960)
Confirmatory Factor Analysis (CFA) (Jöreskog, 1969)
General SEM (Jöreskog, 1973; Wiley, 1973)
LISREL model (Wiley, 1973; Jöreskog, 1977)
Generalized least squares (Browne, 1974, 1982, 1984)
Relevant Reading References
Structural Equations With Latent Variables (Bollen, 1989)
Structural Equation Modeling With AMOS (Byrne)
Latent Curve Models (Bollen and Curran, 2006)
Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007)
Structural Equation Modeling: A Multidisciplinary Journal
First Principle: Linear Regression
Linear Regression: The Machinery
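$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \dots, n$ (regression line)
$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$ (OLS)
and if the $\epsilon_i \sim N(0, \sigma^2)$ are iid,
$\max_{\beta_0, \beta_1, \sigma^2} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right)$ (ML)
$\hat{\beta} \sim N(\beta, \sigma^2 (X^t X)^{-1})$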
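The equivalence of the OLS and ML estimates of the regression coefficients under normal errors can be checked numerically. The following is a minimal sketch in base R on simulated data; the true values, the seed, and all object names are assumptions made for illustration and do not come from the slides.

```r
# Simulated data; the true values below are illustrative assumptions.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# OLS: lm() minimizes the residual sum of squares.
fit_lm <- lm(y ~ x)
ols <- coef(fit_lm)

# ML: minimize the negative Gaussian log-likelihood over
# (beta0, beta1, log sigma); the log keeps sigma positive.
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}
ml <- optim(c(0, 0, 0), negloglik, method = "BFGS")

# The two sets of coefficient estimates agree up to optimizer tolerance.
print(rbind(OLS = ols, ML = ml$par[1:2]))

# Estimated sampling covariance of beta-hat: sigma^2 (X'X)^{-1},
# with sigma^2 replaced by the unbiased residual variance.
X <- cbind(1, x)
sigma2_hat <- sum(resid(fit_lm)^2) / (n - 2)
vcov_manual <- sigma2_hat * solve(crossprod(X))
```

The hand-computed `vcov_manual` should match `vcov(fit_lm)`, which is the same formula applied by `lm()`.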
Pros and Cons of Regression (Linear Models)
Oversimplified view of the phenomena
Ignores measurement error (covariates are treated as fixed)
Cannot handle simultaneous equations in general (e.g., mediation)
Lacks the flexibility to fit SEM models
What is SEM?
A melding of factor analysis and path (regression) analysis into one comprehensive statistical methodology
Simultaneous equation modeling
Does the model-implied covariance matrix match the observed covariance matrix?
The degree to which they match represents the goodness of fit
Estimation (graph)
[Path diagram with parameter estimates: indicators x1 to x8, two per latent variable; latent variables Eps (epistemology), Tlr (tolerance), Eng (engagement), Rng (range). The placement of the numeric labels is not recoverable from the transcript.]
Estimation (equations)
Measurement Model:
x1 = a1 + epistemology + e1
x2 = a2 + b2 epistemology + e2
x3 = a3 + tolerance + e3
x4 = a4 + b4 tolerance + e4
x5 = a5 + engagement + e5
x6 = a6 + b6 engagement + e6
x7 = a7 + range + e7
x8 = a8 + b8 range + e8
Structural Model:
tolerance = a9 + b9 epistemology + e9
range = a10 + b10 tolerance + b11 engagement + e10
cov(epistemology, engagement) = 0
Estimation: objective function
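$$S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{18} \\ s_{12} & s_{22} & \cdots & s_{28} \\ \vdots & \vdots & \ddots & \vdots \\ s_{18} & s_{28} & \cdots & s_{88} \end{pmatrix}, \qquad s_{jk} = \frac{1}{n}\sum_{i=1}^{n}(x_{ji} - \bar{x}_j)(x_{ki} - \bar{x}_k)$$
$$\Sigma(\theta) = \operatorname{cov}(x_1, x_2, \dots, x_8) = \begin{pmatrix} \operatorname{var}(x_1) & \operatorname{cov}(x_1, x_2) & \cdots & \operatorname{cov}(x_1, x_8) \\ \operatorname{cov}(x_1, x_2) & \operatorname{var}(x_2) & \cdots & \operatorname{cov}(x_2, x_8) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{cov}(x_1, x_8) & \operatorname{cov}(x_2, x_8) & \cdots & \operatorname{var}(x_8) \end{pmatrix}$$
$S \approx \Sigma(\theta)$
Basically, minimize $f(\Sigma(\theta), S)$ over $\theta$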
Generalized Least Squares (GLS)
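$\{x_1, \dots, x_n\} \sim N(0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$
$\operatorname{vec} S \xrightarrow{L} N(\operatorname{vec} \Sigma(\theta_0), C)$
$G(\theta) = 2^{-1} \operatorname{tr}\{[(S - \Sigma(\theta))V]^2\}$, with $V > 0$ (a positive definite weight matrix)
$\hat{\theta} \xrightarrow{L} N(\theta_0, D(\theta_0))$
$n G(\hat{\theta}) \xrightarrow{L} \chi^2_{p^* - q}$
$p^* = p(p+1)/2$, $q$ free parameters
$H_0: \Sigma = \Sigma(\theta)$ vs $H_a: \Sigma \neq \Sigma(\theta)$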
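The GLS discrepancy can be written as a short function. This is a sketch in base R; the function name `gls_discrepancy`, the default weight V = S^{-1} (a common choice, assumed here rather than stated on the slide), and the 2 x 2 matrices are all illustrative.

```r
# G(theta) = (1/2) tr{ [ (S - Sigma(theta)) V ]^2 }, with V positive definite.
# The default V = S^{-1} is an assumption for this sketch.
gls_discrepancy <- function(S, Sigma, V = solve(S)) {
  A <- (S - Sigma) %*% V
  0.5 * sum(diag(A %*% A))
}

# Toy 2 x 2 matrices (made-up numbers, for illustration only).
S     <- matrix(c(1.0, 0.5, 0.5, 2.0), 2, 2)
Sigma <- matrix(c(1.0, 0.4, 0.4, 2.0), 2, 2)

g_exact  <- gls_discrepancy(S, S)      # zero when Sigma(theta) reproduces S
g_misfit <- gls_discrepancy(S, Sigma)  # positive otherwise
print(c(exact = g_exact, misfit = g_misfit))
```

An optimizer would minimize this quantity over θ, with `Sigma` rebuilt from θ at each step.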
Maximum Likelihood (ML)
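$\{x_1, \dots, x_n\} \sim N(\mu_0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$
$(n - 1)S \sim W_p(R_0, \rho_0)$ (Wishart distribution)
$F(\theta) = \log\det\Sigma(\theta) + \operatorname{tr}(S\,\Sigma(\theta)^{-1}) - \log\det S - p$
$\tilde{\theta}_M \xrightarrow{L} N(\theta_0, C_2(\theta_0))$
$n F(\tilde{\theta}_M) \xrightarrow{L} \chi^2_{p^* - q}$
$H_0: \Sigma = \Sigma(\theta)$ vs $H_a: \Sigma \neq \Sigma(\theta)$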
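The ML fit function can be sketched the same way. The function name `f_ml` and the toy matrices below are assumptions for illustration; the point is that F is zero when the model-implied matrix equals S and positive otherwise.

```r
# F(theta) = log det Sigma + tr(S Sigma^{-1}) - log det S - p
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  logdet(Sigma) + sum(diag(S %*% solve(Sigma))) - logdet(S) - p
}

# Toy 2 x 2 matrices (made-up numbers, for illustration only).
S     <- matrix(c(1.0, 0.5, 0.5, 2.0), 2, 2)
Sigma <- matrix(c(1.0, 0.4, 0.4, 2.0), 2, 2)

f_exact  <- f_ml(S, S)      # numerically zero: perfect fit
f_misfit <- f_ml(S, Sigma)  # positive: the model does not reproduce S
print(c(exact = f_exact, misfit = f_misfit))
```

Multiplying the minimized value by the sample size gives the chi-square test statistic described above.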
SEM Modeling
Model (diagram)
Identifiability (necessary condition: q ≤ p(p + 1)/2); check the identification rules in Bollen (page 238)
Constraints (e.g., one loading per latent variable fixed to 1)
EDA (distributions, correlations, outliers, etc.)
Estimation
Fit indices (SMR (residuals))
Diagnostics (residuals, outliers, etc.)
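The counting rule q ≤ p(p + 1)/2 and the resulting degrees of freedom can be checked against the two lavaan examples in this deck. In the sketch below, the helper `df_sem` is a hypothetical name; the parameter counts are read off the model specifications and outputs shown in the deck, and the test statistics are the ones lavaan reports there.

```r
# Degrees of freedom: p* - q, where p* = p(p + 1)/2 distinct covariance elements.
df_sem <- function(p, q) {
  p_star <- p * (p + 1) / 2
  stopifnot(q <= p_star)  # necessary (not sufficient) condition
  p_star - q
}

# CFA example: 9 indicators; 6 free loadings + 9 residual variances
# + 3 factor variances + 3 factor covariances = 21 parameters.
df_cfa <- df_sem(p = 9, q = 6 + 9 + 3 + 3)

# SEM example: 11 indicators; 8 free loadings + 3 regressions
# + 6 residual covariances + 11 residual variances
# + 3 latent (residual) variances = 31 parameters.
df_full <- df_sem(p = 11, q = 8 + 3 + 6 + 11 + 3)

# Chi-square p-values for the test statistics reported by lavaan.
p_cfa  <- pchisq(85.306, df = df_cfa,  lower.tail = FALSE)
p_full <- pchisq(38.125, df = df_full, lower.tail = FALSE)
print(c(df_cfa = df_cfa, df_full = df_full))
print(round(c(p_cfa = p_cfa, p_full = p_full), 3))
```

The counting rule is only a necessary condition, which is why the slide also points to the identification rules in Bollen.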
Measurement model (CFA)
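$x_i = \Lambda \xi_i + \epsilon_i, \quad i = 1, \dots, n$
$\xi \sim N(0, \Phi)$, latent variables
$\epsilon \sim N(0, \Psi)$, $\Psi$ diagonal
$\xi$ and $\epsilon$ are uncorrelated
$\Sigma = \Lambda \Phi \Lambda^t + \Psi$
$\Lambda, \Phi, \Psi$ are the parameters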
CFA Example (graph)
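[Path diagram of the three-factor CFA with parameter estimates: indicators x1 to x9; latent variables vsl (visual), txt (textual), spd (speed). The numeric labels are the loadings, residual variances, factor variances and factor covariances listed in the lavaan output.]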
CFA (loadings and latents)
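$$\xi = \begin{pmatrix} \mathrm{vsl} \\ \mathrm{txt} \\ \mathrm{spd} \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 \\ \lambda_{21} & 0 & 0 \\ \lambda_{31} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \lambda_{52} & 0 \\ 0 & \lambda_{62} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & \lambda_{83} \\ 0 & 0 & \lambda_{93} \end{pmatrix}$$
But also remember the variances and covariances: $\Phi$ for the factors and $\Psi$ for the residuals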
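CFA using lavaan (R)
library(lavaan)   # model specification and fitting
library(semPlot)  # path diagrams
# specify the model: three correlated factors, three indicators each
HS.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
"
# fit by maximum likelihood; the first loading of each factor is fixed to 1
fit.HS <- sem(HS.model, data = HolzingerSwineford1939)
summary(fit.HS)
# draw the path diagram, labeled with the parameter estimates
semPaths(fit.HS, intercepts = FALSE, whatLabels = "est",
         residuals = TRUE, exoCov = TRUE)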
CFA Example (output)
> summary(fit.HS)
lavaan (0.5-22) converged normally after 35 iterations
Number of observations 301
Estimator ML
Minimum Function Test Statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
visual =~
x1 1.000
x2 0.554 0.100 5.554 0.000
x3 0.729 0.109 6.685 0.000
textual =~
x4 1.000
x5 1.113 0.065 17.014 0.000
x6 0.926 0.055 16.703 0.000
speed =~
x7 1.000
x8 1.180 0.165 7.152 0.000
x9 1.082 0.151 7.155 0.000
Covariances:
Estimate Std.Err z-value P(>|z|)
visual ~~
textual 0.408 0.074 5.552 0.000
speed 0.262 0.056 4.660 0.000
textual ~~
speed 0.173 0.049 3.518 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.549 0.114 4.833 0.000
.x2 1.134 0.102 11.146 0.000
.x3 0.844 0.091 9.317 0.000
.x4 0.371 0.048 7.779 0.000
.x5 0.446 0.058 7.642 0.000
.x6 0.356 0.043 8.277 0.000
.x7 0.799 0.081 9.823 0.000
.x8 0.488 0.074 6.573 0.000
.x9 0.566 0.071 8.003 0.000
visual 0.809 0.145 5.564 0.000
textual 0.979 0.112 8.737 0.000
speed 0.384 0.086 4.451 0.000
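To connect this output back to the covariance structure Σ = ΛΦΛᵗ + Ψ, the model-implied covariance matrix can be rebuilt by hand from the printed estimates. This is a sketch in base R; the numbers are copied from the output above (so they carry its three-decimal rounding), and the object names are illustrative.

```r
# Loadings (first loading of each factor fixed to 1).
Lambda <- matrix(0, nrow = 9, ncol = 3)
Lambda[1:3, 1] <- c(1, 0.554, 0.729)   # visual
Lambda[4:6, 2] <- c(1, 1.113, 0.926)   # textual
Lambda[7:9, 3] <- c(1, 1.180, 1.082)   # speed

# Factor variances (diagonal) and covariances (off-diagonal).
Phi <- matrix(c(0.809, 0.408, 0.262,
                0.408, 0.979, 0.173,
                0.262, 0.173, 0.384), nrow = 3, byrow = TRUE)

# Residual variances of x1..x9.
Psi <- diag(c(0.549, 1.134, 0.844, 0.371, 0.446, 0.356, 0.799, 0.488, 0.566))

# Model-implied covariance matrix.
Sigma_hat <- Lambda %*% Phi %*% t(Lambda) + Psi
print(round(Sigma_hat[1:4, 1:4], 3))
```

For instance, the implied variance of x1 is the visual factor variance plus the x1 residual variance, and the implied covariance of x1 and x4 is the visual-textual factor covariance because both loadings are fixed to 1.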
Structural model (SEM)
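$\eta = B\eta + \Gamma\xi + \zeta$
$y = \Lambda_y \eta + \epsilon$
$x = \Lambda_x \xi + \delta$
$B, \Gamma, \Lambda_y, \Lambda_x, \Phi, \Psi, \Theta_\epsilon, \Theta_\delta$ are the parameters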
SEM Example (graph)
[Path diagram with parameter estimates: indicators x1 to x3 and y1 to y8; latent variables i60 (ind60), d60 (dem60), d65 (dem65). The numeric labels correspond to the estimates listed in the lavaan output.]
SEM Example (some equations)
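$$\begin{pmatrix} \mathrm{d60} \\ \mathrm{d65} \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ \beta_{21} & 0 \end{pmatrix} \begin{pmatrix} \mathrm{d60} \\ \mathrm{d65} \end{pmatrix} + \begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} \mathrm{i60} + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$$
$$\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix}$$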
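For reference, the blocks of Σ(θ) follow from solving the structural equation for η. The derivation below is a sketch under the usual LISREL assumptions, which the slides do not state explicitly: I − B is invertible, and ζ, ε, δ are mutually uncorrelated and uncorrelated with ξ. Here Φ = cov(ξ), Ψ = cov(ζ), Θ_ε = cov(ε) and Θ_δ = cov(δ).

```latex
\begin{aligned}
\eta &= (I - B)^{-1}(\Gamma\xi + \zeta) \\
\Sigma_{yy}(\theta) &= \Lambda_y (I - B)^{-1}\left(\Gamma\Phi\Gamma^t + \Psi\right)\left[(I - B)^{-1}\right]^t \Lambda_y^t + \Theta_\epsilon \\
\Sigma_{yx}(\theta) &= \Lambda_y (I - B)^{-1}\Gamma\Phi\Lambda_x^t, \qquad \Sigma_{xy}(\theta) = \Sigma_{yx}(\theta)^t \\
\Sigma_{xx}(\theta) &= \Lambda_x\Phi\Lambda_x^t + \Theta_\delta
\end{aligned}
```

Setting B = 0 and Γ = 0 removes the structural part, and the Σ_xx block reduces to the CFA covariance structure ΛΦΛᵗ + Ψ from the measurement model slide, with Θ_δ playing the role of the residual covariance matrix.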
SEM Example (R code)
# specify the model
model <- '
  # latent variables
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8
  # regressions
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
  # residual covariances
  y1 ~~ y5
  y2 ~~ y4 + y6
  y3 ~~ y7
  y4 ~~ y8
  y6 ~~ y8
'
# fit by maximum likelihood and summarize
fit <- sem(model, data = PoliticalDemocracy)
summary(fit)
# draw the path diagram, labeled with the parameter estimates
semPaths(fit, intercepts = FALSE, whatLabels = "est",
         residuals = FALSE, exoCov = FALSE)
SEM Example (output)
summary(fit)
lavaan (0.5-22) converged normally after 68 iterations
Number of observations 75
Estimator ML
Minimum Function Test Statistic 38.125
Degrees of freedom 35
P-value (Chi-square) 0.329
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
ind60 =~
x1 1.000
x2 2.180 0.139 15.742 0.000
x3 1.819 0.152 11.967 0.000
dem60 =~
y1 1.000
y2 1.257 0.182 6.889 0.000
y3 1.058 0.151 6.987 0.000
y4 1.265 0.145 8.722 0.000
dem65 =~
y5 1.000
y6 1.186 0.169 7.024 0.00
y7 1.280 0.160 8.002 0.00
y8 1.266 0.158 8.007 0.00
Regressions:
Estimate Std.Err z-value P(>|z|
dem60 ~
ind60 1.483 0.399 3.715 0.00
dem65 ~
ind60 0.572 0.221 2.586 0.01
dem60 0.837 0.098 8.514 0.00
Covariances:
Estimate Std.Err z-value P(>|z|)
.y1 ~~
.y5 0.624 0.358 1.741 0.082
.y2 ~~
.y4 1.313 0.702 1.871 0.061
.y6 2.153 0.734 2.934 0.003
.y3 ~~
.y7 0.795 0.608 1.308 0.191
.y4 ~~
.y8 0.348 0.442 0.787 0.431
.y6 ~~
.y8 1.356 0.568 2.386 0.017
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.082 0.019 4.184 0.000
.x2 0.120 0.070 1.718 0.086
.x3 0.467 0.090 5.177 0.000
.y1 1.891 0.444 4.256 0.000
.y2 7.373 1.374 5.366 0.000
.y3 5.067 0.952 5.324 0.000
.y4 3.148 0.739 4.261 0.000
.y5 2.351 0.480 4.895 0.000
.y6 4.954 0.914 5.419 0.000
.y7 3.431 0.713 4.814 0.00
.y8 3.254 0.695 4.685 0.00
ind60 0.448 0.087 5.173 0.00
.dem60 3.956 0.921 4.295 0.00
.dem65 0.172 0.215 0.803 0.42
Why Bayesian
Flexibility to use prior knowledge (priors)
More robust with small sample sizes
Bayes factors and flexibility in comparing models
Easy production of latent scores (factor scores)
blavaan (open-source R package)
WinBUGS (freely available software)
Bayesian References
A Bayesian approach to confirmatory factor analysis (Lee, 1980)
Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes (Lee and Song, 2004)
Structural Equation Modeling: A Bayesian Approach (Lee, 2007)
Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences (Song and Lee, 2012)
Bayesian estimation
log p(θ | Y, M) = log p(Y | θ, M) + log p(θ) + constant
M: an arbitrary SEM
Y: the observed data set of raw observations, with sample size n
θ: the random vector of parameters in M
Conjugate priors
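$p(y \mid \theta) = \binom{n}{y}\theta^{y}(1 - \theta)^{n - y}$, $\theta \in (0, 1)$
$p(\theta) \propto \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$, i.e. $\theta \sim \mathrm{Beta}(\alpha, \beta)$
$p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta) \propto \theta^{y}(1 - \theta)^{n - y}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$
$\propto \theta^{y + \alpha - 1}(1 - \theta)^{n - y + \beta - 1}$, i.e. $\theta \mid y \sim \mathrm{Beta}(y + \alpha, n - y + \beta)$
The prior p(θ) and the posterior p(θ | y) have the same distributional form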
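The conjugate update can be verified numerically by comparing the closed-form Beta posterior with a brute-force grid normalization of likelihood times prior. This is a sketch in base R; the prior parameters and the data (14 successes in 20 trials) are made-up values for illustration.

```r
# Prior Beta(alpha, beta) and binomial data (illustrative values).
alpha <- 2; beta <- 2
n <- 20; y <- 14

# Conjugate update: the posterior is Beta(y + alpha, n - y + beta).
alpha_post <- y + alpha
beta_post  <- n - y + beta

# Brute-force check: normalize likelihood x prior on a grid.
theta  <- seq(0.001, 0.999, by = 0.001)
unnorm <- dbinom(y, size = n, prob = theta) * dbeta(theta, alpha, beta)
grid_post <- unnorm / sum(unnorm * 0.001)
max_diff  <- max(abs(grid_post - dbeta(theta, alpha_post, beta_post)))

# Posterior mean and a 95% equal-tailed credible interval.
post_mean <- alpha_post / (alpha_post + beta_post)
cred_int  <- qbeta(c(0.025, 0.975), alpha_post, beta_post)
print(c(mean = post_mean, lower = cred_int[1], upper = cred_int[2]))
```

Because the update only adds counts to the prior parameters, no sampling is needed in this case; MCMC becomes necessary when, as in SEM, the posterior has no closed form.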
Measurement model (CFA) Bayesian approach
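$y_i = \Lambda w_i + \epsilon_i, \quad i = 1, \dots, n$, $y_i \in \mathbb{R}^p$
$w_i \sim N(0, \Phi)$, $w_i \in \mathbb{R}^q$
$\epsilon_i \sim N(0, \Psi)$, $\Psi$ diagonal with elements $\psi_k$, $k = 1, \dots, p$
$w_i$ and $\epsilon_i$ are independent
$\Lambda, \Phi, \Psi$ are the parameters
Let $\Lambda_k^t$ be the $k$th row of $\Lambda$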
Measurement model (CFA) priors
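The conjugate priors on the parameters are:
$\psi_k \sim \text{Inverse-Gamma}(\alpha^*_{0k}, \beta^*_{0k})$
$[\Lambda_k \mid \psi_k] \sim N(\Lambda_{0k}, \psi_k H_{0yk})$
$\Phi \sim IW_q(R^*_0, \rho_0)$, with $R^*_0$ positive definite
The difficulty lies in choosing the hyperparameters, which determine whether the priors are informative or non-informative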
Measurement model (CFA) Gibbs Sampling (MCMC)
Let $Y = \{y_1, \dots, y_n\}$ be the observed data matrix
$\Omega = (w_1, \dots, w_n)$ is the matrix of latent variables
$(Y, \Omega)$ is the complete data set (augmented data)
$P(\Lambda, \Phi, \Psi \mid Y)$, the posterior, is intractable
$P(\Lambda, \Phi, \Psi \mid \Omega, Y)$ is usually a standard distribution
$P(\Omega \mid \Lambda, \Phi, \Psi, Y)$ can also be derived from the model M
Measurement model (CFA) Gibbs Sampling
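The Gibbs sampling algorithm allows us to sample from $P(\Lambda, \Phi, \Psi, \Omega \mid Y)$
At the $(j + 1)$th iteration, given $\Omega_j, \Lambda_j, \Phi_j, \Psi_j$:
Generate $\Omega_{j+1} \sim P(\Omega \mid \Lambda_j, \Phi_j, \Psi_j, Y)$
Generate $\Psi_{j+1} \sim P(\Psi \mid \Omega_{j+1}, \Lambda_j, \Phi_j, Y)$
Generate $\Phi_{j+1} \sim P(\Phi \mid \Omega_{j+1}, \Lambda_j, \Psi_{j+1}, Y)$
Generate $\Lambda_{j+1} \sim P(\Lambda \mid \Omega_{j+1}, \Phi_{j+1}, \Psi_{j+1}, Y)$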
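The four steps can be sketched for the simplest case, a one-factor model with the first loading fixed to 1, where every full conditional is a normal or inverse-gamma distribution. Everything specific in this sketch is an assumption for illustration: the simulated data, the true values, the hyperparameters (a0, b0, l0, h0, r0, s0), and the use of a scalar inverse-gamma prior for the factor variance in place of the inverse-Wishart. It is not the sampler used by any particular package.

```r
set.seed(123)

# Simulate data from a one-factor model (illustrative true values).
n <- 500; p <- 4
lambda_true <- c(1, 0.8, 1.2, 0.6)   # first loading fixed to 1
psi_true    <- c(0.5, 0.4, 0.6, 0.3)
w_true <- rnorm(n, 0, 1)             # true factor variance phi = 1
Y <- outer(w_true, lambda_true) +
  sapply(psi_true, function(v) rnorm(n, 0, sqrt(v)))

# Assumed hyperparameters:
a0 <- 2; b0 <- 1    # psi_k ~ Inverse-Gamma(a0, b0)
l0 <- 0; h0 <- 10   # lambda_k | psi_k ~ N(l0, psi_k * h0)
r0 <- 2; s0 <- 1    # phi ~ Inverse-Gamma(r0, s0)
rinvgamma <- function(shape, rate) 1 / rgamma(1, shape = shape, rate = rate)

lambda <- rep(1, p); psi <- rep(1, p); phi <- 1   # initial values
n_iter <- 3000; burn <- 1000
draws <- matrix(NA_real_, n_iter, 2 * p + 1,
                dimnames = list(NULL, c(paste0("lambda", 1:p),
                                        paste0("psi", 1:p), "phi")))

for (it in 1:n_iter) {
  # 1. Omega | Lambda, Phi, Psi, Y: independent normal draw for each w_i.
  w_var  <- 1 / (1 / phi + sum(lambda^2 / psi))
  w_mean <- w_var * as.vector(Y %*% (lambda / psi))
  w <- rnorm(n, w_mean, sqrt(w_var))

  # 2. Psi | Omega, Lambda, Y: inverse-gamma for each residual variance.
  for (k in 1:p) {
    rss <- sum((Y[, k] - lambda[k] * w)^2)
    if (k == 1) {  # fixed loading, so no prior term for lambda_1
      psi[k] <- rinvgamma(a0 + n / 2, b0 + rss / 2)
    } else {
      psi[k] <- rinvgamma(a0 + (n + 1) / 2,
                          b0 + rss / 2 + (lambda[k] - l0)^2 / (2 * h0))
    }
  }

  # 3. Phi | Omega: inverse-gamma for the factor variance.
  phi <- rinvgamma(r0 + n / 2, s0 + sum(w^2) / 2)

  # 4. Lambda | Omega, Psi, Y: normal for each free loading.
  A <- 1 / (1 / h0 + sum(w^2))
  for (k in 2:p) {
    m <- A * (l0 / h0 + sum(w * Y[, k]))
    lambda[k] <- rnorm(1, m, sqrt(psi[k] * A))
  }

  draws[it, ] <- c(lambda, psi, phi)
}

post <- draws[(burn + 1):n_iter, ]   # discard burn-in draws
print(round(colMeans(post), 2))
```

The latent scores w are redrawn at every iteration, which is the data augmentation step: conditional on them, the loadings and variances reduce to ordinary conjugate regression updates.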
Measurement model (CFA) Posterior Parameters Estimates
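$\theta^{(t)} = (\Lambda^{(t)}, \Phi^{(t)}, \Psi^{(t)}), \quad t = 1, \dots, T^*$
$\hat{\theta} = \frac{1}{T^*}\sum_{t=1}^{T^*}\theta^{(t)}$
$\widehat{\operatorname{var}}(\theta \mid Y) = \frac{1}{T^* - 1}\sum_{t=1}^{T^*}(\theta^{(t)} - \hat{\theta})(\theta^{(t)} - \hat{\theta})^t$
along with 95% credible intervals from the posterior quantiles $Q_{0.025}$ and $Q_{0.975}$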
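Given a matrix of retained draws, these summaries take one line each in base R. The draws below are simulated stand-ins for MCMC output (hypothetical values, not results from any model in this deck), with one row per iteration and one column per parameter.

```r
set.seed(42)

# Stand-in for T* retained MCMC draws of two parameters (simulated).
T_star <- 4000
draws <- cbind(lambda = rnorm(T_star, mean = 0.8, sd = 0.05),
               psi    = 1 / rgamma(T_star, shape = 50, rate = 25))

# Posterior mean: average of the draws.
theta_hat <- colMeans(draws)

# Posterior covariance: cov() uses the divisor T* - 1, as in the formula.
post_cov <- cov(draws)

# The same covariance written out explicitly, to mirror the formula.
centered <- sweep(draws, 2, theta_hat)
post_cov_manual <- crossprod(centered) / (T_star - 1)

# 95% equal-tailed intervals from the 0.025 and 0.975 quantiles.
cred_int <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
print(round(rbind(mean = theta_hat, cred_int), 3))
```

Quantile-based intervals are equal-tailed; the HPD intervals that blavaan reports can differ from them when a posterior is skewed.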
Bayesian CFA Example using blavaan
library(blavaan)
# specify the model
bHS.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  # intercepts fixed to zero
  x1 ~ 0
  x2 ~ 0
  x3 ~ 0
  x4 ~ 0
  x5 ~ 0
  x6 ~ 0
  x7 ~ 0
  x8 ~ 0
  x9 ~ 0
"
# fit by MCMC using the package's default priors
bfit.HS <- bsem(bHS.model, data = HolzingerSwineford1939)
summary(bfit.HS)
# Bayesian fit measures (ppp, dic, waic, looic, margloglik, ...)
fitMeasures(bfit.HS, fit.measures = "all", baseline.model = NULL)
Bayesian CFA Example (output)
blavaan (0.2-2) results of 10000 samples after 5000 adapt+burnin iterations
Number of observations 301
Number of missing patterns 1
Statistic MargLogLik PPP
Value -4481.087 0.000
Parameter Estimates:
Latent Variables:
Estimate Post.SD HPD.025 HPD.975 PSRF Prior
visual =~
x1 1.000
x2 1.221 0.018 1.186 1.255 1.000 dnorm(0,1e-2)
x3 0.463 0.012 0.438 0.487 1.000 dnorm(0,1e-2)
textual =~
x4 1.000
x5 1.404 0.020 1.365 1.445 1.004 dnorm(0,1e-2)
x6 0.731 0.016 0.7 0.761 1.001 dnorm(0,1e-2)
speed =~
x7 1.000
x8 1.320 0.020 1.28 1.357 1.002 dnorm(0,1e-2)
x9 1.286 0.019 1.25 1.325 1.002 dnorm(0,1e-2)
Covariances:
Estimate Post.SD HPD.025 HPD.975 PSRF Prior
visual ~~
textual 15.500 1.321 12.998 18.14 1.000 dwish(iden,4)
speed 20.910 1.764 17.576 24.439 1.000 dwish(iden,4)
textual ~~
speed 13.003 1.118 10.9 15.259 1.000 dwish(iden,4)
Intercepts:
Estimate Post.SD HPD.025 HPD.975 PSRF Prior
.x1 0.000
.x2 0.000
.x3 0.000
.x4 0.000
.x5 0.000
.x6 0.000
.x7 0.000
.x8 0.000
.x9 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Post.SD HPD.025 HPD.975 PSRF Prior
.x1 0.716 0.088 0.547 0.891 1.001 dgamma(1,.5)
.x2 1.219 0.138 0.96 1.5 1.000 dgamma(1,.5)
.x3 0.993 0.086 0.832 1.164 1.000 dgamma(1,.5)
.x4 0.449 0.053 0.346 0.552 1.001 dgamma(1,.5)
.x5 0.314 0.069 0.184 0.452 1.002 dgamma(1,.5)
.x6 0.509 0.048 0.417 0.604 1.000 dgamma(1,.5)
.x7 0.877 0.084 0.717 1.045 1.000 dgamma(1,.5)
.x8 0.567 0.077 0.417 0.72 1.000 dgamma(1,.5)
.x9 0.478 0.068 0.347 0.61 1.000 dgamma(1,.5)
visual 24.998 2.118 20.929 29.176 1.000 dwish(iden,4)
textual 10.256 0.882 8.518 11.953 1.001 dwish(iden,4)
speed 17.812 1.539 14.813 20.859 1.001 dwish(iden,4)
> fitMeasures(bfit.HS,fit.measures="all", baseline.model= NULL)
npar logl ppp bic dic p_dic waic
21.000 -4398.287 0.000 8916.354 8837.747 20.586 8838.364
p_waic looic p_loo margloglik
20.848 8838.391 20.861 -4481.087
Conclusions
The frequentist SEM approach is based on MLE
The Bayesian approach, with data augmentation and MCMC methods, is a flexible way to analyze SEMs
The Bayesian approach is useful when prior knowledge is available or when the sample size is small
Some open problems remain (power, optimal designs, GSEM, etc.)
THANK YOU!
M’hamed (Hamy) Temkit Division of Biostatistics
Bayesian Structural Equations Modeling

More Related Content

What's hot (12)

Slides ACTINFO 2016
Slides ACTINFO 2016Slides ACTINFO 2016
Slides ACTINFO 2016
 
Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
Slides Bank England
Slides Bank EnglandSlides Bank England
Slides Bank England
 
Proba stats-r1-2017
Proba stats-r1-2017Proba stats-r1-2017
Proba stats-r1-2017
 
Machine learning (2)
Machine learning (2)Machine learning (2)
Machine learning (2)
 
Slides ineq-4
Slides ineq-4Slides ineq-4
Slides ineq-4
 
Side 2019 #5
Side 2019 #5Side 2019 #5
Side 2019 #5
 
Slides ensae-2016-11
Slides ensae-2016-11Slides ensae-2016-11
Slides ensae-2016-11
 
Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2Slides econometrics-2017-graduate-2
Slides econometrics-2017-graduate-2
 
Econometrics, PhD Course, #1 Nonlinearities
Econometrics, PhD Course, #1 NonlinearitiesEconometrics, PhD Course, #1 Nonlinearities
Econometrics, PhD Course, #1 Nonlinearities
 
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
Tutorial on testing at O'Bayes 2015, Valencià, June 1, 2015
 
Side 2019 #7
Side 2019 #7Side 2019 #7
Side 2019 #7
 

Viewers also liked

PSPP overview and Introduction to R & R Commander
PSPP overview and Introduction to R & R CommanderPSPP overview and Introduction to R & R Commander
PSPP overview and Introduction to R & R CommanderBernard Deepal W. Jayamanne
 
作らずにポテンシャルを検証する方法
作らずにポテンシャルを検証する方法作らずにポテンシャルを検証する方法
作らずにポテンシャルを検証する方法Shigeyuki Kameda
 
Asistencia de padres de familia. 1 mer reunion
Asistencia de padres de familia. 1 mer reunionAsistencia de padres de familia. 1 mer reunion
Asistencia de padres de familia. 1 mer reunionCarlos Perez
 
Configuración de la red
Configuración de la redConfiguración de la red
Configuración de la redIvanerito
 
Центр развития социально значимых предпринимательских проектов
Центр развития социально значимых предпринимательских проектовЦентр развития социально значимых предпринимательских проектов
Центр развития социально значимых предпринимательских проектовLAZOVOY
 
Computer Hardware Introduction
Computer Hardware IntroductionComputer Hardware Introduction
Computer Hardware IntroductionSeenivasan SR
 
Международный кинофестиваль имени Андрея Тарковского «Зеркало»
Международный кинофестиваль имени Андрея Тарковского «Зеркало»Международный кинофестиваль имени Андрея Тарковского «Зеркало»
Международный кинофестиваль имени Андрея Тарковского «Зеркало»culture-brand
 
Flipped classroom nuestro sistema solar-
Flipped classroom  nuestro sistema solar-Flipped classroom  nuestro sistema solar-
Flipped classroom nuestro sistema solar-fmoraga
 
Measurement Statistics
Measurement StatisticsMeasurement Statistics
Measurement StatisticsAreej Fatima
 
Cубботы активиста
Cубботы активистаCубботы активиста
Cубботы активистаTCenter500
 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014RSS6
 
Randomized controlled trial: Going for the Gold
Randomized controlled trial: Going for the GoldRandomized controlled trial: Going for the Gold
Randomized controlled trial: Going for the GoldGaurav Kamboj
 
Basics in Epidemiology & Biostatistics 1 RSS6 2014
Basics in Epidemiology & Biostatistics 1 RSS6 2014Basics in Epidemiology & Biostatistics 1 RSS6 2014
Basics in Epidemiology & Biostatistics 1 RSS6 2014RSS6
 
Meta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManMeta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManGaurav Kamboj
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examplesGaurav Kamboj
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatisticsAli Al Mousawi
 

Viewers also liked (19)

PSPP overview and Introduction to R & R Commander
PSPP overview and Introduction to R & R CommanderPSPP overview and Introduction to R & R Commander
PSPP overview and Introduction to R & R Commander
 
Proteccion juridica DIANA CABALLERO
Proteccion juridica DIANA CABALLEROProteccion juridica DIANA CABALLERO
Proteccion juridica DIANA CABALLERO
 
作らずにポテンシャルを検証する方法
作らずにポテンシャルを検証する方法作らずにポテンシャルを検証する方法
作らずにポテンシャルを検証する方法
 
Asistencia de padres de familia. 1 mer reunion
Asistencia de padres de familia. 1 mer reunionAsistencia de padres de familia. 1 mer reunion
Asistencia de padres de familia. 1 mer reunion
 
Propuesta
PropuestaPropuesta
Propuesta
 
Configuración de la red
Configuración de la redConfiguración de la red
Configuración de la red
 
Центр развития социально значимых предпринимательских проектов
Центр развития социально значимых предпринимательских проектовЦентр развития социально значимых предпринимательских проектов
Центр развития социально значимых предпринимательских проектов
 
Computer Hardware Introduction
Computer Hardware IntroductionComputer Hardware Introduction
Computer Hardware Introduction
 
Международный кинофестиваль имени Андрея Тарковского «Зеркало»
Международный кинофестиваль имени Андрея Тарковского «Зеркало»Международный кинофестиваль имени Андрея Тарковского «Зеркало»
Международный кинофестиваль имени Андрея Тарковского «Зеркало»
 
Measurement and descriptive statistics
Measurement and descriptive statisticsMeasurement and descriptive statistics
Measurement and descriptive statistics
 
Flipped classroom nuestro sistema solar-
Flipped classroom  nuestro sistema solar-Flipped classroom  nuestro sistema solar-
Flipped classroom nuestro sistema solar-
 
Measurement Statistics
Measurement StatisticsMeasurement Statistics
Measurement Statistics
 
Cубботы активиста
Cубботы активистаCубботы активиста
Cубботы активиста
 
Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014Basics in Epidemiology & Biostatistics 2 RSS6 2014
Basics in Epidemiology & Biostatistics 2 RSS6 2014
 
Randomized controlled trial: Going for the Gold
Randomized controlled trial: Going for the GoldRandomized controlled trial: Going for the Gold
Randomized controlled trial: Going for the Gold
 
Basics in Epidemiology & Biostatistics 1 RSS6 2014
Basics in Epidemiology & Biostatistics 1 RSS6 2014Basics in Epidemiology & Biostatistics 1 RSS6 2014
Basics in Epidemiology & Biostatistics 1 RSS6 2014
 
Meta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManMeta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevMan
 
Logistic regression with SPSS examples
Logistic regression with SPSS examplesLogistic regression with SPSS examples
Logistic regression with SPSS examples
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatistics
 

Similar to Bayesian_SEM_HT

Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapterChristian Robert
 
Decision Trees and Bayes Classifiers
Decision Trees and Bayes ClassifiersDecision Trees and Bayes Classifiers
Decision Trees and Bayes ClassifiersAlexander Jung
 
Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Christian Robert
 
Hessian Matrices in Statistics
Hessian Matrices in StatisticsHessian Matrices in Statistics
Hessian Matrices in StatisticsFerris Jumah
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionKonkuk University, Korea
 
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...IJERA Editor
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision TheorySangwoo Mo
 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONijaia
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelMatt Moores
 
Eigenvalues of Symmetrix Hierarchical Matrices
Eigenvalues of Symmetrix Hierarchical MatricesEigenvalues of Symmetrix Hierarchical Matrices
Eigenvalues of Symmetrix Hierarchical MatricesThomas Mach
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 

Similar to Bayesian_SEM_HT (20)

MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
 
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
MUMS Opening Workshop - Quantifying Nonparametric Modeling Uncertainty with B...
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
ABC short course: model choice chapter
ABC short course: model choice chapterABC short course: model choice chapter
ABC short course: model choice chapter
 
Decision Trees and Bayes Classifiers
Decision Trees and Bayes ClassifiersDecision Trees and Bayes Classifiers
Decision Trees and Bayes Classifiers
 
Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]
 
Edinburgh, Bayes-250
Edinburgh, Bayes-250Edinburgh, Bayes-250
Edinburgh, Bayes-250
 
Hessian Matrices in Statistics
Hessian Matrices in StatisticsHessian Matrices in Statistics
Hessian Matrices in Statistics
 
2018 MUMS Fall Course - Introduction to statistical and mathematical model un...
2018 MUMS Fall Course - Introduction to statistical and mathematical model un...2018 MUMS Fall Course - Introduction to statistical and mathematical model un...
2018 MUMS Fall Course - Introduction to statistical and mathematical model un...
 
QMC: Transition Workshop - Small Sample Statistical Analysis and Algorithms f...
QMC: Transition Workshop - Small Sample Statistical Analysis and Algorithms f...QMC: Transition Workshop - Small Sample Statistical Analysis and Algorithms f...
QMC: Transition Workshop - Small Sample Statistical Analysis and Algorithms f...
 
1641
16411641
1641
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
 
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...
An Interactive Decomposition Algorithm for Two-Level Large Scale Linear Multi...
 
Statistical Decision Theory
Statistical Decision TheoryStatistical Decision Theory
Statistical Decision Theory
 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
 
Es272 ch5a
Es272 ch5aEs272 ch5a
Es272 ch5a
 
Approximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts modelApproximate Bayesian computation for the Ising/Potts model
Approximate Bayesian computation for the Ising/Potts model
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 
Eigenvalues of Symmetrix Hierarchical Matrices
Eigenvalues of Symmetrix Hierarchical MatricesEigenvalues of Symmetrix Hierarchical Matrices
Eigenvalues of Symmetrix Hierarchical Matrices
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 

Bayesian_SEM_HT

  • 1. 1/44 Bayesian Structural Equations Modeling M’hamed (Hamy) Temkit1 1Division of Biostatistics Mayo Clinic, Arizona Applied Statistics Seminar, November 17, 2016 M’hamed (Hamy) Temkit Division of Biostatistics Bayesian Structural Equations Modeling
  • 2. 2/44 Outline Introduction to SEM Covariance Analysis SEM Estimation (GLS vs MLE) CFA The General Model of SEM LAAVAN Bayesian Paradigm Bayesian SEM Bayesian CFA BLAAVAN CONCLUSION M’hamed (Hamy) Temkit Division of Biostatistics Bayesian Structural Equations Modeling
  • 3. Motivation
  • 4. Motivation
  • 5. Two Paradigms. Covariance analysis: $\Sigma = \Sigma(\theta)$. Bayesian inference: $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$.
  • 6. Brief SEM Terminology [Path diagram. Measurement model: exogenous latent variables ξ1, ξ2, ξ3 with indicators X1 to X6 (loadings λx11, λx21, λx32, λx42, λx53, λx63; errors δ) and endogenous latent variables η1, η2 with indicators y1 to y4 (loadings λy11, λy21, λy32, λy42; errors ε1 to ε4). Structural model: paths β21, γ11, γ12, γ22, γ23 and latent covariances ϕ21, ϕ31, ϕ32.]
  • 7. Background: Factor Analysis (Spearman, 1904); Path Analysis (Sewall Wright, 1918, 1921, 1934, 1960); Confirmatory Factor Analysis (CFA) (Jöreskog, 1969); General SEM (Jöreskog, 1973; Wiley, 1973); LISREL model (Wiley, 1973; Jöreskog, 1977); Generalized least squares (Browne, 1974, 1982, 1984)
  • 8. Relevant Reading References: Structural Equations With Latent Variables (Bollen, 1989); Structural Equation Modeling With AMOS (Byrne); Latent Curve Models (Bollen, Curran, 2006); Structural Equation Modeling: A Bayesian Approach (Sik-Yum Lee, 2007); Structural Equation Modeling: A Multidisciplinary Journal
  • 9. First Principle: Linear Regression
  • 10. Linear Regression: The Machinery
      Regression line: $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \dots, n$
      OLS: $\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$
      ML, if the $\epsilon_i \sim N(0, \sigma^2)$ are iid: $\max_{\beta_0, \beta_1, \sigma^2} (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2\right)$
      Sampling distribution: $\hat\beta \sim N(\beta, \sigma^2 (X^t X)^{-1})$
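The regression machinery on this slide can be reproduced in a few lines of base R. The following is a minimal sketch on simulated data (the true coefficients 1 and 2 and the error SD 0.5 are arbitrary assumptions, not values from the talk); it solves the normal equations directly and compares the result with `lm()`.

```r
set.seed(42)

# Simulated data: assumed true line y = 1 + 2x with N(0, 0.5^2) errors
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

# OLS (and Gaussian ML) estimate of beta via the normal equations
X <- cbind(1, x)
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Estimated covariance of beta_hat: sigma^2 (X'X)^{-1}, with sigma^2 from the residuals
resid <- y - X %*% beta_hat
sigma2_hat <- sum(resid^2) / (n - 2)
vcov_hat <- sigma2_hat * solve(crossprod(X))

# Reference fit with R's built-in linear model
fit <- lm(y ~ x)
cbind(normal_eq = as.vector(beta_hat), lm = unname(coef(fit)))
```

Under normal errors the OLS and ML estimates of the coefficients coincide, which is why a single solve covers both lines of the slide.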
  • 11. Pros and Cons of Regression (Linear Models): oversimplistic view of the phenomena; underestimates measurement error (covariates are treated as fixed); lacks simultaneous equations in general (mediation); lacks the flexibility to fit SEM models
  • 12. What is SEM: a melding of factor analysis and path (regression) analysis into one comprehensive statistical methodology; simultaneous equation modeling; asks whether the model-implied covariance matrix matches the observed covariance matrix; the degree to which they match represents the goodness of fit
  • 13. Estimation (graph) [Path diagram with parameter estimates: indicators x1 to x8 loading on the latent variables Eps, Tlr, Eng and Rng.]
  • 14. Estimation (equations)
      Measurement model:
        x1 = a1 + epistemology + e1
        x2 = a2 + b2 epistemology + e2
        x3 = a3 + tolerance + e3
        x4 = a4 + b4 tolerance + e4
        x5 = a5 + engagement + e5
        x6 = a6 + b6 engagement + e6
        x7 = a7 + range + e7
        x8 = a8 + b8 range + e8
      Structural model:
        tolerance = a9 + b9 epistemology + e9
        range = a10 + b10 tolerance + b11 engagement + e10
        cov(epistemology, engagement) = 0
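  • 15. Estimation: objective function
      Sample covariance matrix: $S = \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)^2 & \frac{1}{n}\sum_{i=1}^{n}(x_{1i} - \bar{x}_1)(x_{2i} - \bar{x}_2) & \cdots & \mathrm{cov}(x_1, x_8) \\ \mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_8) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_1, x_8) & \mathrm{cov}(x_2, x_8) & \cdots & \mathrm{var}(x_8) \end{pmatrix}$
      Model-implied covariance matrix: $\Sigma(\theta) = \mathrm{cov}(x_1, x_2, \dots, x_8) = \begin{pmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_8) \\ \mathrm{cov}(x_1, x_2) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_8) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_1, x_8) & \mathrm{cov}(x_2, x_8) & \cdots & \mathrm{var}(x_8) \end{pmatrix}$
      Goal: $S \approx \Sigma(\theta)$. Basically, minimize $f(\Sigma(\theta), S)$.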
  • 16. Generalized Least Squares (GLS)
      $\{x_1, \dots, x_n\} \sim N(0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$
      $\mathrm{vec}\, S \xrightarrow{L} N(\mathrm{vec}\, \Sigma(\theta_0), C)$
      $G(\theta) = 2^{-1}\, \mathrm{tr}\{[(S - \Sigma(\theta))V]^2\}$, $V > 0$
      $\hat\theta \xrightarrow{L} N(\theta_0, D(\theta_0))$
      $nG(\hat\theta) \xrightarrow{L} \chi^2_{p^* - q}$, with $p^* = \frac{p(p+1)}{2}$ and $q$ parameters
      $H_0: \Sigma = \Sigma(\theta)$ vs. $H_a: \Sigma \neq \Sigma(\theta)$
  • 17. Maximum Likelihood (ML)
      $\{x_1, \dots, x_n\} \sim N(\mu_0, \Sigma(\theta_0))$ iid, $x_i \in \mathbb{R}^p$
      $(n - 1)S \sim W_p(R_0, \rho_0)$
      $F(\theta) = \log\det \Sigma(\theta) + \mathrm{tr}(S\,\Sigma(\theta)^{-1}) - \log\det S - p$
      $\tilde\theta_M \xrightarrow{L} N(\theta_0, C_2(\theta_0))$
      $nF(\tilde\theta_M) \xrightarrow{L} \chi^2_{p^* - q}$
      $H_0: \Sigma = \Sigma(\theta)$ vs. $H_a: \Sigma \neq \Sigma(\theta)$
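Both discrepancy functions from the GLS and ML slides are easy to evaluate once S and a candidate Σ(θ) are in hand. The sketch below is illustrative only: `f_ml` and `f_gls` are hypothetical helper names, the ML trace term is written in its standard form tr(SΣ⁻¹), the GLS weight defaults to the common choice V = S⁻¹ (an assumption), and the 2×2 matrices are made up.

```r
# Discrepancy between a sample covariance S and a model-implied Sigma.
# f_ml and f_gls are hypothetical helper names, not lavaan functions.
f_ml <- function(S, Sigma) {
  p <- nrow(S)
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  logdet(Sigma) + sum(diag(S %*% solve(Sigma))) - logdet(S) - p
}

f_gls <- function(S, Sigma, V = solve(S)) {
  R <- (S - Sigma) %*% V
  0.5 * sum(diag(R %*% R))   # 2^{-1} tr{[(S - Sigma) V]^2}
}

# Made-up example: the "bad" model forces the covariance to zero
S <- matrix(c(2.0, 0.8,
              0.8, 1.5), 2, 2)
Sigma_bad <- diag(c(2.0, 1.5))

c(ml_perfect  = f_ml(S, S),  ml_bad  = f_ml(S, Sigma_bad),
  gls_perfect = f_gls(S, S), gls_bad = f_gls(S, Sigma_bad))
```

Both functions are zero when Σ(θ) reproduces S exactly and grow as the fit degrades; multiplying the minimized value by n gives the chi-square test statistic described on the slides.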
  • 18. SEM Modeling: model (diagram); identifiability ($q \le 2^{-1}p(p + 1)$; check the identifiability rules in Bollen, page 238); constraints (loadings equal to 1); EDA (distribution, correlation, outliers, etc.); EDA (estimation); fit indices (SMR (residuals)); diagnostics (residuals, outliers, etc.)
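The counting rule on this slide can be checked by hand for the two lavaan examples used later in the deck. The sketch below is plain bookkeeping in base R; the parameter tallies are my own reading of the two model specifications, and `sem_df` is a hypothetical helper name.

```r
# Degrees of freedom p* - q, where p* = p(p + 1)/2 is the number of
# distinct variances and covariances. sem_df is a hypothetical helper.
sem_df <- function(p, q) {
  p_star <- p * (p + 1) / 2
  stopifnot(q <= p_star)   # necessary, not sufficient, for identification
  p_star - q
}

# Three-factor CFA on 9 indicators: 6 free loadings, 9 residual variances,
# 3 factor variances, 3 factor covariances
q_cfa <- 6 + 9 + 3 + 3
df_cfa <- sem_df(p = 9, q = q_cfa)

# Political democracy SEM on 11 indicators: 8 free loadings, 11 residual
# variances, 6 residual covariances, 3 regressions, 3 latent variances
q_sem <- 8 + 11 + 6 + 3 + 3
df_sem <- sem_df(p = 11, q = q_sem)

c(cfa = df_cfa, sem = df_sem)
```

The results, 24 and 35, agree with the degrees of freedom printed in the lavaan output for the two examples.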
  • 19. Measurement model (CFA)
      $x_i = \Lambda \xi_i + \epsilon_i$, $i = 1, \dots, n$
      $\xi \sim N(0, \Phi)$, latent variables
      $\epsilon \sim N(0, \Psi_\epsilon)$, $\Psi_\epsilon$ diagonal
      $\xi$ and $\epsilon$ are uncorrelated
      $\Sigma = \Lambda \Phi \Lambda^t + \Psi_\epsilon$
      $\Lambda$, $\Phi$, $\Psi_\epsilon$ are the parameters
  • 20. CFA Example (graph) [Path diagram with parameter estimates: indicators x1 to x9 loading on the latent variables vsl, txt and spd.]
  • 21. CFA (loadings and latents)
      $\xi = \begin{pmatrix} \text{vsl} \\ \text{txt} \\ \text{spd} \end{pmatrix}$, $\Lambda = \begin{pmatrix} 1 & 0 & 0 \\ \lambda_{21} & 0 & 0 \\ \lambda_{31} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \lambda_{52} & 0 \\ 0 & \lambda_{62} & 0 \\ 0 & 0 & 1 \\ 0 & 0 & \lambda_{83} \\ 0 & 0 & \lambda_{93} \end{pmatrix}$
      But also remember the variances and covariances.
  • 22. CFA using lavaan (R)
      library(stringr)
      library(lavaan)      # model fitting
      library(DiagrammeR)
      library(dplyr)
      library(semPlot)     # path diagrams

      # specify the model: three latent factors, three indicators each
      HS.model <- "
        visual  =~ x1 + x2 + x3
        textual =~ x4 + x5 + x6
        speed   =~ x7 + x8 + x9
      "

      # fit by maximum likelihood (lavaan's default estimator) and summarize
      fit.HS <- sem(HS.model, data = HolzingerSwineford1939)
      summary(fit.HS)

      # path diagram labeled with the parameter estimates
      semPaths(fit.HS, intercepts = FALSE, whatLabels = "est",
               residuals = TRUE, exoCov = TRUE)
  • 23. CFA Example (output)
      > summary(fit.HS)
      lavaan (0.5-22) converged normally after 35 iterations

        Number of observations                           301
        Estimator                                         ML
        Minimum Function Test Statistic               85.306
        Degrees of freedom                                24
        P-value (Chi-square)                           0.000

      Parameter Estimates:
        Information                                 Expected
        Standard Errors                             Standard

      Latent Variables:
                         Estimate  Std.Err  z-value  P(>|z|)
        visual =~
          x1                1.000
          x2                0.554    0.100    5.554    0.000
          x3                0.729    0.109    6.685    0.000
        textual =~
          x4                1.000
          x5                1.113    0.065   17.014    0.000
          x6                0.926    0.055   16.703    0.000
        speed =~
          x7                1.000
          x8                1.180    0.165    7.152    0.000
          x9                1.082    0.151    7.155    0.000

      Covariances:
                         Estimate  Std.Err  z-value  P(>|z|)
        visual ~~
          textual           0.408    0.074    5.552    0.000
          speed             0.262    0.056    4.660    0.000
        textual ~~
          speed             0.173    0.049    3.518    0.000

      Variances:
                         Estimate  Std.Err  z-value  P(>|z|)
         .x1                0.549    0.114    4.833    0.000
         .x2                1.134    0.102   11.146    0.000
         .x3                0.844    0.091    9.317    0.000
         .x4                0.371    0.048    7.779    0.000
         .x5                0.446    0.058    7.642    0.000
         .x6                0.356    0.043    8.277    0.000
         .x7                0.799    0.081    9.823    0.000
         .x8                0.488    0.074    6.573    0.000
         .x9                0.566    0.071    8.003    0.000
          visual            0.809    0.145    5.564    0.000
          textual           0.979    0.112    8.737    0.000
          speed             0.384    0.086    4.451    0.000
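The estimates in this output can be plugged back into the CFA covariance structure Σ = ΛΦΛᵗ + Ψ to see what "model-implied covariance" means in practice. A minimal base R sketch, using only the numbers printed above (rounded to three decimals, so the result is approximate):

```r
# Loadings: first indicator of each factor is fixed at 1
Lambda <- matrix(0, nrow = 9, ncol = 3)
Lambda[1:3, 1] <- c(1, 0.554, 0.729)   # visual
Lambda[4:6, 2] <- c(1, 1.113, 0.926)   # textual
Lambda[7:9, 3] <- c(1, 1.180, 1.082)   # speed

# Factor variances (diagonal) and covariances (off-diagonal)
Phi <- matrix(c(0.809, 0.408, 0.262,
                0.408, 0.979, 0.173,
                0.262, 0.173, 0.384), 3, 3)

# Residual variances of x1..x9
Psi <- diag(c(0.549, 1.134, 0.844, 0.371, 0.446, 0.356, 0.799, 0.488, 0.566))

# Model-implied covariance matrix of the nine indicators
Sigma <- Lambda %*% Phi %*% t(Lambda) + Psi
round(Sigma[1:4, 1:4], 3)
```

For instance, the implied variance of x1 is 0.809 + 0.549 = 1.358, and the implied covariance between x1 and x4 is just the visual/textual factor covariance, 0.408, because both loadings are fixed at 1.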
  • 24. Structural model (SEM)
      $\eta = B\eta + \Gamma\xi + \zeta$
      $y = \Lambda_y \eta + \epsilon$
      $x = \Lambda_x \xi + \delta$
      $B$, $\Gamma$, $\Lambda_y$, $\Lambda_x$, $\Phi$, $\Psi$, $\Theta_\epsilon$, $\Theta_\delta$ are the parameters
  • 25. SEM Example (graph) [Path diagram with parameter estimates: indicators x1 to x3 and y1 to y8 loading on the latent variables i60, d60 and d65.]
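  • 26. SEM Example (some equations)
      $\begin{pmatrix} d60 \\ d65 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B_{21} & 0 \end{pmatrix} \begin{pmatrix} d60 \\ d65 \end{pmatrix} + \begin{pmatrix} \gamma_{11} \\ \gamma_{21} \end{pmatrix} i60 + \begin{pmatrix} \zeta_1 \\ \zeta_2 \end{pmatrix}$
      $\Sigma(\theta) = \begin{pmatrix} \Sigma_{yy}(\theta) & \Sigma_{yx}(\theta) \\ \Sigma_{xy}(\theta) & \Sigma_{xx}(\theta) \end{pmatrix}$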
  • 27. SEM Example (R code)
      library(lavaan)      # model fitting
      library(semPlot)     # path diagrams

      # specify the model
      model <- '
        # latent variables
        ind60 =~ x1 + x2 + x3
        dem60 =~ y1 + y2 + y3 + y4
        dem65 =~ y5 + y6 + y7 + y8
        # regressions
        dem60 ~ ind60
        dem65 ~ ind60 + dem60
        # residual covariances
        y1 ~~ y5
        y2 ~~ y4 + y6
        y3 ~~ y7
        y4 ~~ y8
        y6 ~~ y8
      '

      # fit by maximum likelihood and summarize
      fit <- sem(model, data = PoliticalDemocracy)
      summary(fit)

      # path diagram labeled with the parameter estimates
      semPaths(fit, intercepts = FALSE, whatLabels = "est",
               residuals = FALSE, exoCov = FALSE)
  • 28. SEM Example (output)
      > summary(fit)
      lavaan (0.5-22) converged normally after 68 iterations

        Number of observations                            75
        Estimator                                         ML
        Minimum Function Test Statistic               38.125
        Degrees of freedom                                35
        P-value (Chi-square)                           0.329

      Parameter Estimates:
        Information                                 Expected
        Standard Errors                             Standard

      Latent Variables:
                         Estimate  Std.Err  z-value  P(>|z|)
        ind60 =~
          x1                1.000
          x2                2.180    0.139   15.742    0.000
          x3                1.819    0.152   11.967    0.000
        dem60 =~
          y1                1.000
          y2                1.257    0.182    6.889    0.000
          y3                1.058    0.151    6.987    0.000
          y4                1.265    0.145    8.722    0.000
        dem65 =~
          y5                1.000
          y6                1.186    0.169    7.024    0.00
          y7                1.280    0.160    8.002    0.00
          y8                1.266    0.158    8.007    0.00

      Regressions:
                         Estimate  Std.Err  z-value  P(>|z|)
        dem60 ~
          ind60             1.483    0.399    3.715    0.00
        dem65 ~
          ind60             0.572    0.221    2.586    0.01
          dem60             0.837    0.098    8.514    0.00
  • 29. SEM Example (output)
      Covariances:
                         Estimate  Std.Err  z-value  P(>|z|)
       .y1 ~~
         .y5                0.624    0.358    1.741    0.082
       .y2 ~~
         .y4                1.313    0.702    1.871    0.061
         .y6                2.153    0.734    2.934    0.003
       .y3 ~~
         .y7                0.795    0.608    1.308    0.191
       .y4 ~~
         .y8                0.348    0.442    0.787    0.431
       .y6 ~~
         .y8                1.356    0.568    2.386    0.017

      Variances:
                         Estimate  Std.Err  z-value  P(>|z|)
         .x1                0.082    0.019    4.184    0.000
         .x2                0.120    0.070    1.718    0.086
         .x3                0.467    0.090    5.177    0.000
         .y1                1.891    0.444    4.256    0.000
         .y2                7.373    1.374    5.366    0.000
         .y3                5.067    0.952    5.324    0.000
         .y4                3.148    0.739    4.261    0.000
         .y5                2.351    0.480    4.895    0.000
         .y6                4.954    0.914    5.419    0.000
         .y7                3.431    0.713    4.814    0.00
         .y8                3.254    0.695    4.685    0.00
          ind60             0.448    0.087    5.173    0.00
         .dem60             3.956    0.921    4.295    0.00
         .dem65             0.172    0.215    0.803    0.42
  • 30. Why Bayesian: flexibility to use prior knowledge (priors); robust to small sample sizes; Bayes factors and flexibility in comparing models; easy production of the latent scores (factors); blavaan (open software in R); WinBUGS (open software)
  • 31. Bayesian References: A Bayesian approach to confirmatory factor analysis (Lee, 1980); Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes (Lee, Song, 2004); Structural Equation Modeling: A Bayesian Approach (Lee, 2007); Basic and Advanced Bayesian Structural Equation Modeling, With Applications in the Medical and Behavioral Sciences (Song, Lee, 2012)
  • 32. Bayesian estimation
      $\log p(\theta \mid Y, M) = \log p(Y \mid \theta, M) + \log p(\theta) + \text{constant}$
      M: an arbitrary SEM model
      Y: the observed dataset of raw observations, sample size n
      θ: the random vector of parameters in M
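  • 33. Conjugate priors
      Likelihood: $p(y \mid \theta) = \binom{n}{y} \theta^{y} (1 - \theta)^{n - y}$, $\theta \in (0, 1)$
      Prior: $p(\theta) \propto \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$, i.e. $\theta \sim \mathrm{Beta}(\alpha, \beta)$
      Posterior: $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta) \propto \theta^{y}(1 - \theta)^{n - y}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1} \propto \theta^{y + \alpha - 1}(1 - \theta)^{n - y + \beta - 1}$, i.e. $\theta \mid y \sim \mathrm{Beta}(y + \alpha, n - y + \beta)$
      The prior $p(\theta)$ and the posterior $p(\theta \mid y)$ have the same distributional form.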
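The beta-binomial update on this slide needs no sampling at all: the posterior parameters follow from simple addition. The sketch below uses made-up numbers (a Beta(2, 2) prior and 7 successes in 10 trials are assumptions for illustration) and also checks on a grid that likelihood times prior is proportional to the closed-form posterior density.

```r
# Assumed prior and data, chosen only for illustration
alpha <- 2; beta <- 2     # Beta(alpha, beta) prior
n <- 10;    y <- 7        # y successes in n trials

# Conjugate update: Beta(y + alpha, n - y + beta)
post_alpha <- y + alpha
post_beta  <- n - y + beta
post_mean  <- post_alpha / (post_alpha + post_beta)
cred_int   <- qbeta(c(0.025, 0.975), post_alpha, post_beta)

# Grid check: likelihood x prior divided by the closed-form posterior
# density should be the same constant at every theta
theta <- seq(0.01, 0.99, by = 0.01)
ratio <- dbinom(y, n, theta) * dbeta(theta, alpha, beta) /
  dbeta(theta, post_alpha, post_beta)

list(posterior = c(post_alpha, post_beta), mean = post_mean, interval = cred_int)
```

The posterior mean 9/14 sits between the prior mean 0.5 and the sample proportion 0.7, which is the usual shrinkage behaviour of a conjugate prior.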
  • 34. Measurement model (CFA), Bayesian approach
      $y_i = \Lambda w_i + \epsilon_i$, $i = 1, \dots, n$, $y_i \in \mathbb{R}^k$
      $w_i \sim N(0, \Phi)$, $w_i \in \mathbb{R}^q$
      $\epsilon_i \sim N(0, \Psi_\epsilon)$, $\Psi_\epsilon$ diagonal with $k$ elements $\psi_{\epsilon k}$
      $w_i$ and $\epsilon_i$ are independent
      $\Lambda$, $\Phi$, $\Psi_\epsilon$ are the parameters
      Let $\Lambda_k^t$ be the $k$th row of $\Lambda$
  • 35. Measurement model (CFA) priors
      The conjugate priors on the parameters are:
      $\psi_{\epsilon k} \sim \mathrm{IGamma}(\alpha^*_{0\epsilon k}, \beta^*_{0\epsilon k})$
      $[\Lambda_k \mid \psi_{\epsilon k}] \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0yk})$
      $\Phi \sim IW_q(R_0^*, \rho_0)$, with $R_0^*$ positive definite
      The problem is choosing the hyperparameters, which determines whether the priors are informative or non-informative.
  • 36. Measurement model (CFA) Gibbs Sampling (MCMC)
      Let $Y = \{y_1, \dots, y_n\}$ be the observed data matrix
      $\Omega = (w_1, \dots, w_n)$ is the matrix of the latent variables
      $(Y, \Omega)$ is the complete dataset (augmented data)
      The posterior $P(\Lambda, \Phi, \Psi_\epsilon \mid Y)$ is intractable
      $P(\Lambda, \Phi, \Psi_\epsilon \mid \Omega, Y)$ is usually standard
      $P(\Omega \mid \Lambda, \Phi, \Psi_\epsilon, Y)$ can also be derived from the model M
  • 37. Measurement model (CFA) Gibbs Sampling
      The Gibbs sampling algorithm allows sampling from $P(\Lambda, \Phi, \Psi_\epsilon, \Omega \mid Y)$. At the $(j + 1)$th iteration, given $\Omega_j, \Lambda_j, \Phi_j, \Psi_{\epsilon j}$:
      Generate $\Omega_{j+1} \sim P(\Omega \mid \Lambda_j, \Phi_j, \Psi_{\epsilon j}, Y)$
      Generate $\Psi_{\epsilon, j+1} \sim P(\Psi_\epsilon \mid \Omega_{j+1}, \Lambda_j, \Phi_j, Y)$
      Generate $\Phi_{j+1} \sim P(\Phi \mid \Omega_{j+1}, \Lambda_j, \Psi_{\epsilon, j+1}, Y)$
      Generate $\Lambda_{j+1} \sim P(\Lambda \mid \Omega_{j+1}, \Phi_{j+1}, \Psi_{\epsilon, j+1}, Y)$
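The four Gibbs steps can be written out for the simplest possible case. What follows is a hedged sketch, not the sampler used by blavaan or in the cited books: it assumes a single factor (q = 1), fixes the first loading at 1 for identification, replaces the inverse Wishart prior on Φ by its scalar analogue (an inverse gamma), and uses arbitrary hyperparameter values. The function name `gibbs_one_factor` and all argument names are hypothetical.

```r
# Gibbs sampler for y_ik = lambda_k * w_i + e_ik with k >= 2 indicators.
# Priors (assumed): psi_k ~ IG(a0, b0), lambda_k | psi_k ~ N(lambda0, psi_k * h0),
# phi ~ IG(a_phi, b_phi).
gibbs_one_factor <- function(Y, n_iter = 2000, burn = 500,
                             a0 = 2, b0 = 1, lambda0 = 1, h0 = 4,
                             a_phi = 2, b_phi = 1) {
  n <- nrow(Y); k <- ncol(Y)
  lambda <- rep(1, k)   # lambda[1] stays fixed at 1
  psi <- rep(1, k)
  phi <- 1
  draws <- matrix(NA_real_, n_iter, 2 * k,
                  dimnames = list(NULL, c(paste0("lambda", 2:k),
                                          paste0("psi", 1:k), "phi")))
  for (j in seq_len(n_iter)) {
    # Step 1: Omega | Lambda, Phi, Psi, Y (one normal draw per observation)
    v <- 1 / (1 / phi + sum(lambda^2 / psi))
    m <- v * as.vector(Y %*% (lambda / psi))
    w <- rnorm(n, mean = m, sd = sqrt(v))

    # Step 2: Psi | Omega, Lambda, Y (inverse gamma, element by element)
    for (i in seq_len(k)) {
      ssr <- sum((Y[, i] - lambda[i] * w)^2)
      free <- as.numeric(i > 1)   # free loadings contribute a prior term
      shape <- a0 + n / 2 + free / 2
      rate <- b0 + ssr / 2 + free * (lambda[i] - lambda0)^2 / (2 * h0)
      psi[i] <- 1 / rgamma(1, shape = shape, rate = rate)
    }

    # Step 3: Phi | Omega (inverse gamma, the q = 1 case)
    phi <- 1 / rgamma(1, shape = a_phi + n / 2, rate = b_phi + sum(w^2) / 2)

    # Step 4: Lambda | Omega, Psi, Y (normal, free loadings only)
    A <- 1 / (1 / h0 + sum(w^2))
    for (i in 2:k) {
      a <- A * (lambda0 / h0 + sum(w * Y[, i]))
      lambda[i] <- rnorm(1, mean = a, sd = sqrt(psi[i] * A))
    }
    draws[j, ] <- c(lambda[-1], psi, phi)
  }
  draws[-seq_len(burn), , drop = FALSE]   # discard burn-in draws
}

# Simulated check: true loadings (1, 0.8, 1.2), residual variance 0.5, phi = 1
set.seed(1)
n <- 300
w_true <- rnorm(n)
Y <- outer(w_true, c(1, 0.8, 1.2)) + matrix(rnorm(3 * n, sd = sqrt(0.5)), n, 3)
post <- gibbs_one_factor(Y)
round(colMeans(post), 2)
```

Each step draws from a standard distribution precisely because the latent scores are treated as known within the iteration, which is the point of the data augmentation described on the previous slide. With several factors, Step 1 becomes a multivariate normal draw and Step 3 an inverse Wishart draw.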
  • 38. Measurement model (CFA) Posterior Parameter Estimates
      Retained draws: $\theta_t = (\Lambda_t, \Phi_t, \Psi_{\epsilon t})$, $t = 1, \dots, T^*$
      $\hat\theta = \frac{1}{T^*} \sum_{t=1}^{T^*} \theta_t$
      $\widehat{\mathrm{var}}(\theta \mid Y) = \frac{1}{T^* - 1} \sum_{t=1}^{T^*} (\theta_t - \hat\theta)(\theta_t - \hat\theta)^t$
      along with 95% credible intervals from the sample quantiles $Q_{0.025}$ and $Q_{0.975}$
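Given a matrix of retained draws, the summaries on this slide are one-liners in base R. The sketch below uses simulated stand-in draws rather than real MCMC output (the two columns and their distributions are assumptions for illustration).

```r
set.seed(7)

# Stand-in for MCMC output: T* = 4000 retained draws of a 2-parameter theta
draws <- cbind(lambda = rnorm(4000, mean = 0.8, sd = 0.05),
               psi    = 1 / rgamma(4000, shape = 50, rate = 25))

theta_hat <- colMeans(draws)   # posterior mean estimate
post_cov  <- cov(draws)        # uses the 1 / (T* - 1) divisor
cred_int  <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Same covariance written out as on the slide
centered <- sweep(draws, 2, theta_hat)
post_cov_manual <- crossprod(centered) / (nrow(draws) - 1)

list(mean = theta_hat, sd = sqrt(diag(post_cov)), interval = cred_int)
```

The quantile-based interval is an equal-tailed credible interval; the blavaan output later in the deck reports highest posterior density (HPD) limits instead, which can differ for skewed posteriors.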
  • 39. Bayesian CFA Example using blavaan
      library(blavaan)

      # specify the model
      bHS.model <- "
        visual  =~ x1 + x2 + x3
        textual =~ x4 + x5 + x6
        speed   =~ x7 + x8 + x9
        # intercepts
        x1 ~ 0
        x2 ~ 0
        x3 ~ 0
        x4 ~ 0
        x5 ~ 0
        x6 ~ 0
        x7 ~ 0
        x8 ~ 0
        x9 ~ 0
      "

      # fit by MCMC, then summarize and report fit measures
      bfit.HS <- bsem(bHS.model, data = HolzingerSwineford1939)
      summary(bfit.HS)
      fitMeasures(bfit.HS, fit.measures = "all", baseline.model = NULL)
  • 40. Bayesian CFA Example (output)
      blavaan (0.2-2) results of 10000 samples after 5000 adapt+burnin iterations

        Number of observations                           301
        Number of missing patterns                         1

        Statistic                                 MargLogLik         PPP
        Value                                      -4481.087       0.000

      Parameter Estimates:

      Latent Variables:
                     Estimate  Post.SD  HPD.025  HPD.975   PSRF  Prior
        visual =~
          x1            1.000
          x2            1.221    0.018    1.186    1.255  1.000  dnorm(0,1e-2)
          x3            0.463    0.012    0.438    0.487  1.000  dnorm(0,1e-2)
        textual =~
          x4            1.000
          x5            1.404    0.020    1.365    1.445  1.004  dnorm(0,1e-2)
          x6            0.731    0.016    0.7      0.761  1.001  dnorm(0,1e-2)
        speed =~
          x7            1.000
          x8            1.320    0.020    1.28     1.357  1.002  dnorm(0,1e-2)
          x9            1.286    0.019    1.25     1.325  1.002  dnorm(0,1e-2)
  • 41. Bayesian CFA Example (output)
      Covariances:
                     Estimate  Post.SD  HPD.025  HPD.975   PSRF  Prior
        visual ~~
          textual      15.500    1.321   12.998   18.14   1.000  dwish(iden,4)
          speed        20.910    1.764   17.576   24.439  1.000  dwish(iden,4)
        textual ~~
          speed        13.003    1.118   10.9     15.259  1.000  dwish(iden,4)

      Intercepts:
                     Estimate  Post.SD  HPD.025  HPD.975   PSRF  Prior
         .x1            0.000
         .x2            0.000
         .x3            0.000
         .x4            0.000
         .x5            0.000
         .x6            0.000
         .x7            0.000
         .x8            0.000
         .x9            0.000
          visual        0.000
          textual       0.000
          speed         0.000
  • 42. Bayesian CFA Example (output)
      Variances:
                     Estimate  Post.SD  HPD.025  HPD.975   PSRF  Prior
         .x1            0.716    0.088    0.547    0.891  1.001  dgamma(1,.5)
         .x2            1.219    0.138    0.96     1.5    1.000  dgamma(1,.5)
         .x3            0.993    0.086    0.832    1.164  1.000  dgamma(1,.5)
         .x4            0.449    0.053    0.346    0.552  1.001  dgamma(1,.5)
         .x5            0.314    0.069    0.184    0.452  1.002  dgamma(1,.5)
         .x6            0.509    0.048    0.417    0.604  1.000  dgamma(1,.5)
         .x7            0.877    0.084    0.717    1.045  1.000  dgamma(1,.5)
         .x8            0.567    0.077    0.417    0.72   1.000  dgamma(1,.5)
         .x9            0.478    0.068    0.347    0.61   1.000  dgamma(1,.5)
          visual       24.998    2.118   20.929   29.176  1.000  dwish(iden,4)
          textual      10.256    0.882    8.518   11.953  1.001  dwish(iden,4)
          speed        17.812    1.539   14.813   20.859  1.001  dwish(iden,4)

      > fitMeasures(bfit.HS, fit.measures = "all", baseline.model = NULL)
            npar       logl        ppp        bic        dic      p_dic       waic
          21.000  -4398.287      0.000   8916.354   8837.747     20.586   8838.364
          p_waic      looic      p_loo margloglik
          20.848   8838.391     20.861  -4481.087
  • 43. Conclusions: the frequentist SEM approach is based on MLE; the Bayesian approach, with data augmentation and MCMC methods, is a flexible way to analyze SEMs; the Bayesian approach may be used when prior knowledge is available or when the sample size is small; some open problems remain (power, optimal designs, GSEM, etc.)
  • 44. THANK YOU!