
# A measure to evaluate latent variable model fit by sensitivity analysis

Latent variable models involve restrictions on the data that can be formulated in terms of "misspecifications": restrictions with a model-based meaning. Examples include zero cross-loadings and local dependencies, as well as “measurement invariance” or “differential item functioning”. If incorrect, misspecifications can potentially disturb the main purpose of the latent variable analysis—seriously so in some cases.
Recently, I proposed to evaluate whether a particular analysis at hand is such a case or not.
To do this, I define a measure based on the likelihood of the restricted model that approximates the change in the parameters of interest if the misspecification were freed, the EPC-interest. The main idea is to examine the EPC-interest and free those misspecifications that are “important” while ignoring those that are not. I have implemented the EPC-interest in the lavaan software for structural equation modeling and the Latent Gold software for latent class analysis.
This approach can resolve several problems and inconsistencies in the current practice of model fit evaluation used in latent variable analysis, something I illustrate using analyses from the “measurement invariance” literature and from item response theory.

Published in: Science

### A measure to evaluate latent variable model fit by sensitivity analysis

1. A measure to evaluate latent variable model fit by sensitivity analysis. Daniel Oberski, Department of methodology and statistics. Dept of Statistics, Leiden University. Latent variable model fit by sensitivity analysis, Daniel Oberski
2. Latent variable models: what do they assume and what are they good for?
3. [Path diagram: latent variable $\xi$ with indicators $y_1, y_2, \ldots, y_J$] $p(y) = \sum_{\xi} p(\xi) \prod_{j=1}^{J} p(y_j \mid \xi)$
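
The marginal probability on slide 3 can be sketched in a few lines of Python. This is only a minimal illustration, assuming binary items and two latent classes; the function name `pattern_probability` and all numeric values (class sizes, item response probabilities) are hypothetical and are not taken from the talk or from the Dendukuri et al. data.

```python
from itertools import product


def pattern_probability(y, class_probs, item_probs):
    """Marginal probability p(y) of a binary response pattern y.

    class_probs[c]   : p(xi = c)
    item_probs[c][j] : p(y_j = 1 | xi = c)
    Items are assumed conditionally independent given the class.
    """
    total = 0.0
    for p_class, probs in zip(class_probs, item_probs):
        conditional = 1.0
        for y_j, p_j in zip(y, probs):
            conditional *= p_j if y_j == 1 else 1.0 - p_j
        total += p_class * conditional
    return total


# Hypothetical two-class example with J = 4 binary tests.
class_probs = [0.1, 0.9]            # diseased, non-diseased
item_probs = [
    [0.95, 0.90, 0.80, 0.70],       # p(test positive | diseased)
    [0.01, 0.02, 0.01, 0.005],      # p(test positive | non-diseased)
]

p_all_positive = pattern_probability((1, 1, 1, 1), class_probs, item_probs)

# Sanity check: the probabilities of all 2^4 patterns should sum to one.
total_mass = sum(
    pattern_probability(y, class_probs, item_probs)
    for y in product((0, 1), repeat=4)
)
print(p_all_positive, total_mass)
```

Summing over every response pattern is a cheap way to confirm that the class sizes and conditional probabilities define a proper distribution.
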
4. [Path diagram: latent variable $\xi$ with indicators $y_1, y_2, \ldots, y_J$, where $y_1$ and $y_2$ are locally dependent] $p(y) = \sum_{\xi} p(\xi)\, p(y_1, y_2 \mid \xi) \prod_{j=3}^{J} p(y_j \mid \xi)$
5. Example. Goal: estimate false positives and false negatives in four diagnostic tests for C. trachomatis infection: $y_1$ Ligase chain reaction (LCR) test (Yes/No); $y_2$ Polymerase chain reaction (PCR) test (Yes/No); $y_3$ DNA probe test (DNAP) (Yes/No); $y_4$ Culture (CULT) (Yes/No). Tool: 2-latent class model (diseased or non-diseased). (Original data from Dendukuri et al. 2009)
6. Assume: [path diagram: $\xi$ with indicators $y_1, y_2, \ldots, y_J$]. But really: [path diagram: $\xi$ with indicators $y_1, y_2, \ldots, y_J$]. What difference does it make for the goal: false positives and false negatives? (simulation by Van Smeden et al., submitted)
7. [Path diagram: latent variable $\xi$ with indicators $y_1, y_2, \ldots, y_J$ and covariate $x$] $p(y \mid x) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi)$
8. [Path diagram: latent variable $\xi$ with indicators $y_1, y_2, \ldots, y_J$ and covariate $x$, with direct effects of $x$ on the indicators] $p(y \mid x) = \sum_{\xi} p(\xi \mid x) \prod_{j=1}^{J} p(y_j \mid \xi, x)$
9. Example. Goal: estimate gender differences in "valuing Stimulation". Response scale: (1) Very much like me; (2) Like me; (3) Somewhat like me; (4) A little like me; (5) Not like me; (6) Not like me at all. Items: impdiff: (S)he looks for adventures and likes to take risks. (S)he wants to have an exciting life. impadv: (S)he likes surprises and is always looking for new things to do. (S)he thinks it is important to do lots of different things in life. Tool: Structural Equation Model for European Social Survey data (n = 18519 men and 16740 women). (Original study by Schwarz et al. 2005)
10. Assume: [path diagram: $\xi$ with indicators $y_1, y_2, \ldots, y_J$ and covariate $x$]. But really (?): [path diagram: $\xi$ with indicators $y_1, y_2, \ldots, y_J$ and covariate $x$]. What difference does it make for the goal: true gender differences in values? (re-analysis of data by Oberski 2014) [Figure: latent mean difference estimate ± 2 s.e. (axis from −0.2 to 0.2, "Men value more" to "Women value more") for each "human value" factor (ACPO, ST, SD, HE, COTR, SE, UN, BE), under two models: scalar invariance and free intercept 'Adventure'.]
11. PROBLEM. The original authors found that the conditional independence model fit the data "approximately" (p. 1013)... "Chi-square deteriorated significantly, $\Delta\chi^2(19) = 3313$, $p < .001$, but CFI did not change. Change in chi-square is highly sensitive with large sample sizes and complex models. The other indices suggested that scalar invariance might be accepted (CFI = .88, RMSEA = .04, CI = .039–.040, PCLOSE = 1.0)." ... but unfortunately this "acceptable" misspecification could reverse their conclusions!
12. Numbers that indicate how well the model fits the data • Likelihood Ratio vs. saturated • Information-based criteria: AIC, BIC, CAIC, ... • Bivariate residuals (Maydeu & Joe 2005; Oberski, Van Kollenburg & Vermunt 2013) • Score/Lagrange multiplier tests, "modification index", "expected parameter change" (EPC) (Saris, Satorra & Sörbom 1989; Oberski & Vermunt 2013; Oberski & Vermunt accepted) "Fit indices": • RMSEA: $\sqrt{\frac{(\chi^2/df) - 1}{N - 1}}$ • CFI: $\frac{(\chi^2_{\text{null}} - df_{\text{null}}) - (\chi^2 - df)}{\chi^2_{\text{null}} - df_{\text{null}}}$ • Lots of others: TLI, NFI, NNFI, RFI, IFI, RNI, RMR, SRMR1-3, GFI, AGFI, MFI, ECVI, ...
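
The two fit-index formulas on slide 12 translate directly into code. The sketch below is illustrative only: the function names and input values are hypothetical, and truncating a negative RMSEA argument at zero is a common convention assumed here rather than something stated on the slide.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA as written on the slide: sqrt(((chi2 / df) - 1) / (N - 1)).

    A negative value under the root is truncated to zero; that
    truncation is an assumption, not part of the slide's formula.
    """
    return math.sqrt(max((chi2 / df - 1.0) / (n - 1.0), 0.0))


def cfi(chi2, df, chi2_null, df_null):
    """CFI as written on the slide (no truncation to [0, 1] applied)."""
    null_excess = chi2_null - df_null
    return (null_excess - (chi2 - df)) / null_excess


# Hypothetical test statistics, chosen only to exercise the formulas.
example_rmsea = rmsea(chi2=100.0, df=50, n=101)
example_cfi = cfi(chi2=100.0, df=50, chi2_null=1050.0, df_null=50)
print(example_rmsea, example_cfi)
```

Both indices summarize distance between model and data relative to degrees of freedom, which is why they cannot say whether a given misspecification affects the parameter of interest.
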
13. What is the problem? • We do latent variable modeling with a goal in mind. • But the latent variable model might be misspecified. • The appropriate question: "will that affect my goal?" • The actual question: "do the data fit the model in the population" (LR) or "are the model and the data far apart relative to model complexity" (RMSEA etc.) What is the solution? Evaluate directly what effect possible misspecifications have on the goal of the analysis.
14. How to evaluate directly what effect possible misspecifications have on the goal of the analysis.
15. Two ideas to evaluate the effect of misspecifications. (1) Try out all possible models with misspecifications, calculate the estimates of interest under these models and evaluate whether these are substantively different. Advantage: Does the job. Disadvantage: There may be too many alternative models. Also: are applied researchers really going to do this? (2) Use EPC-interest: expected change in free parameters. Advantage: Does the job without the need to estimate any alternative models. Disadvantage: Is an approximation (though a reasonable one).
16. EPC-interest applied to Stimulation example • After fitting the full scalar invariance model, the effect size estimate of the sex difference in Stimulation is +0.214 (s.e. 0.0139). • But the EPC-interest of the equal "Adventure" item intercept is -0.243. • So the EPC-interest suggests the conclusion can be reversed by freeing a misspecified scalar invariance restriction. • The actual change when freeing this intercept is very close to the EPC-interest: -0.235.
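
The sign reversal on slide 16 follows from simple arithmetic. The snippet below only re-adds the three numbers quoted on the slide, assuming the estimate and the EPC-interest are on the same scale; the variable names are made up for illustration.

```python
estimate = 0.214        # effect size under full scalar invariance
epc_interest = -0.243   # EPC-interest of the equal "Adventure" intercept
actual_change = -0.235  # change observed after actually freeing it

# Estimate predicted by the EPC-interest, and estimate after refitting.
predicted = estimate + epc_interest
refitted = estimate + actual_change

# The conclusion reverses if a positive estimate becomes negative.
sign_flips = estimate > 0 and predicted < 0 and refitted < 0
print(round(predicted, 3), round(refitted, 3), sign_flips)
```

Both the predicted value (about -0.029) and the refitted value (about -0.021) are negative, so the approximation and the refit agree that the direction of the gender difference changes.
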
17. EPC-interest: how does it work?
18. • Let's say there is a restricted model whose purpose it is to estimate its parameters, $\theta$, or some linear function of them such as a subselection, $P\theta$. • We could parameterize these restrictions as $\psi = 0$. For example: $\psi$ could be the direct effect of gender on "Adventure", or a loglinear dependence between DNA tests. • The maximum likelihood estimates are then $\hat\theta = \arg\max L(\theta, \psi = 0)$. Question: How much would $\hat\theta$ change if we freed $\psi$?
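19. How much would $\hat\theta$ change if we freed $\psi$? The trick is to consider the estimate of $\theta$ we would get under $\psi \neq 0$; that is, $\tilde\theta = \arg\max L(\theta, \psi)$. As it turns out, we don't actually need $\tilde\theta$, since $\tilde\theta - \hat\theta = \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi} D^{-1} \left[ \left. \frac{\partial L(\theta, \psi)}{\partial \psi} \right|_{\theta = \hat\theta} \right] + O(\delta'\delta)$, where $H$ is a Hessian, $D = \hat H_{\psi\psi} - \hat H_{\theta\psi}' \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi}$ and $\delta$ is the "overall wrongness" of the model, $(\psi', \theta' - \hat\theta')'$.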
20. How much would $\hat\theta$ change if we freed $\psi$? Dropping the approximation term (assuming the model parameters are not "too far" from the truth) we get the approximation $\text{EPC-interest} = -P \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi}\, \text{EPC-self} \approx -P \hat H_{\theta\theta}^{-1} \hat H_{\theta\psi} (\psi - \hat\psi)$. For those of you familiar with Structural Equation Modeling (or attending my 2013 MBC2 talk), "EPC-self" is the usual "expected parameter change" in the fixed parameter vector, i.e. the size of the misspecification.
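
The formulas on slides 19 and 20 can be checked on a toy problem. The sketch below is not the lavaan or Latent Gold implementation: it assumes a linear regression with known unit error variance, so the log-likelihood is exactly quadratic and the EPC-interest should reproduce the actual change without approximation error. The data and all variable names are hypothetical, and $P = 1$ because there is a single parameter of interest.

```python
# Toy data (hypothetical): y regressed on x1 (parameter of interest, theta)
# and x2 (possible misspecification, psi), with error variance fixed at 1.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y = [2.0, 3.0, 5.0, 6.0]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


# Restricted ML estimate: maximize L(theta, psi = 0).
theta_hat = dot(x1, y) / dot(x1, x1)

# Hessian blocks of the log-likelihood (constant for this model).
h_tt = -dot(x1, x1)
h_tp = -dot(x1, x2)
h_pp = -dot(x2, x2)

# Score with respect to psi, evaluated at (theta_hat, psi = 0).
residuals = [yi - theta_hat * xi for yi, xi in zip(y, x1)]
score_psi = dot(x2, residuals)

d = h_pp - h_tp * h_tp / h_tt              # D from slide 19
epc_self = -score_psi / d                  # expected change in psi
epc_interest = -(h_tp / h_tt) * epc_self   # expected change in theta

# Compare with actually freeing psi (two-parameter least squares).
det = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
theta_free = (dot(x1, y) * dot(x2, x2) - dot(x1, x2) * dot(x2, y)) / det
actual_change = theta_free - theta_hat
print(epc_self, epc_interest, actual_change)
```

In this quadratic case the two quantities coincide; for non-quadratic likelihoods, such as latent class models, the $O(\delta'\delta)$ term means the match is only approximate.
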
21. Monte Carlo simulation: EPC-interest is a good approximation to the actual change in parameters of interest when freeing an equality restriction. Averages over 200 replications:

    | $\Delta\nu_1$ | $n_g$ | EPC-self | $\Delta\hat\alpha$ | $\Delta\hat\alpha$ bias | EPC-interest | EPC-interest bias |
    |---|---|---|---|---|---|---|
    | 0.1 | 50 | 0.064 | 0.240 | -0.040 | -0.034 | 0.005 |
    | 0.3 | 50 | 0.213 | 0.313 | -0.113 | -0.113 | -0.001 |
    | 0.8 | 50 | 0.657 | 0.505 | -0.305 | -0.401 | -0.096 |
    | 0.1 | 100 | 0.058 | 0.231 | -0.031 | -0.031 | 0.000 |
    | 0.3 | 100 | 0.203 | 0.323 | -0.123 | -0.109 | 0.014 |
    | 0.8 | 100 | 0.619 | 0.492 | -0.292 | -0.370 | -0.077 |
    | 0.1 | 500 | 0.063 | 0.233 | -0.033 | -0.033 | 0.000 |
    | 0.3 | 500 | 0.208 | 0.307 | -0.107 | -0.112 | -0.005 |
    | 0.8 | 500 | 0.598 | 0.501 | -0.301 | -0.349 | -0.048 |
22. Another example showcasing EPC-interest
23. Ranking data in 48 WVS countries

    | Set | Option # | M/P | Value wording |
    |---|---|---|---|
    | A | 1 | M | A high level of economic growth |
    | A | 2 | M | Making sure this country has strong defense forces |
    | A | 3 | P | Seeing that people have more say about how things are done at their jobs and in their communities |
    | A | 4 | P | Trying to make our cities and countryside more beautiful |
    | B | 1 | M | Maintaining order in the nation |
    | B | 2 | P | Giving people more say in important government decisions |
    | B | 3 | M | Fighting rising prices |
    | B | 4 | P | Protecting freedom of speech |
    | C | 1 | M | A stable economy |
    | C | 2 | P | Progress toward a less impersonal and more humane society |
    | C | 3 | P | Progress toward a society in which ideas count more than money |
    | C | 4 | M | The fight against crime |
24. Figure: Graphical representation of the multilevel latent class regression model for (post)materialism measured by three partial ranking tasks. Observed variables are shown in rectangles while unobserved ("latent") variables are shown in ellipses.
25. Latent class ranking model with 4 choices. Each ranking set, for example, set A: $P(A_{1ic} = a_1, A_{2ic} = a_2 \mid X_{ic} = x) = \frac{\omega_{a_1 x}}{\sum_{k} \omega_{kx}} \cdot \frac{\omega_{a_2 x}}{\sum_{k \neq a_1} \omega_{kx}}$, where $\omega_{kx}$ is the "utility" of object $k$ for respondents in class $x$. Multilevel structure to account for the countries using group class variable $G$: $P(X_{ic} = x \mid Z_{1ic} = z_1, Z_{2ic} = z_2, G_c = g) = \frac{\exp(\alpha_x + \gamma_{1x} z_1 + \gamma_{2x} z_2 + \beta_{gx})}{\sum_{t} \exp(\alpha_t + \gamma_{1t} z_1 + \gamma_{2t} z_2 + \beta_{gt})}$
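
The ranking probability on slide 25 is a product of two choice probabilities: the first choice among all options, then the second choice among the options that remain. A minimal sketch for one latent class follows; the function name and the utility values are hypothetical.

```python
from itertools import permutations


def top_two_probability(a1, a2, utilities):
    """P(first choice = a1, second choice = a2) within one latent class.

    utilities[k] is the positive "utility" omega_kx of option k for the
    class; the second choice is made among the options left after a1.
    """
    if a1 == a2:
        raise ValueError("first and second choice must differ")
    total = sum(utilities)
    first = utilities[a1] / total
    second = utilities[a2] / (total - utilities[a1])
    return first * second


# Hypothetical utilities for the four options of one ranking set.
omega = [2.0, 1.0, 0.5, 0.5]
p_first_two = top_two_probability(0, 1, omega)

# Sanity check: the 12 ordered (first, second) pairs exhaust all outcomes.
total_mass = sum(
    top_two_probability(a, b, omega)
    for a, b in permutations(range(4), 2)
)
print(p_first_two, total_mass)
```

Only ratios of utilities matter here, so multiplying every entry of `omega` by the same positive constant leaves all probabilities unchanged.
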
26. Multilevel latent class model w/ covariates for rankings: $L(\theta) = P(A_1, A_2, B_1, B_2, C_1, C_2 \mid Z_1, Z_2) = \prod_{c=1}^{C} \sum_{G} P(G_c) \prod_{i=1}^{n_c} \sum_{X} P(X_{ic} \mid Z_{1ic}, Z_{2ic}, G_c)\, P(A_{1ic}, A_{2ic} \mid X_{ic})\, P(B_{1ic}, B_{2ic} \mid X_{ic})\, P(C_{1ic}, C_{2ic} \mid X_{ic})$. Goal: estimate $\gamma$ (especially its sign). Possible problem: violations of scalar and metric measurement invariance (DIF), parameterized respectively as $\tau^*$ and $\lambda^*$. Solution: see if these matter for the sign of $\gamma$.
27. Table: Full invariance multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of six sets of possible misspecifications (columns 5–10).

    | Class | Covariate | Est. | s.e. | EPC-interest for $\tau^*_{jkg}$, ranking task 1 | $\tau^*_{jkg}$, task 2 | $\tau^*_{jkg}$, task 3 | EPC-interest for $\lambda^*_{jkxg}$, ranking task 1 | $\lambda^*_{jkxg}$, task 2 |
    |---|---|---|---|---|---|---|---|---|
    | Class 1 | GDP | -0.035 | (0.007) | -0.013 | 0.021 | -0.002 | 0.073 | 0.252 |
    | Class 2 | GDP | -0.198 | (0.012) | -0.018 | -0.035 | 0.015 | -0.163 | -0.058 |
    | Class 1 | Women | 0.013 | (0.001) | -0.006 | 0.002 | 0.000 | -0.003 | 0.029 |
    | Class 2 | Women | -0.037 | (0.001) | 0.007 | -0.003 | 0.002 | -0.006 | -0.013 |
28. Table: Partially invariant multilevel latent class model: parameter estimates of interest with standard errors (columns 3 and 4), as well as expected change in these parameters measured by the EPC-interest when freeing each of four sets of remaining possible misspecifications (columns 5–7 and 10).

    | Class | Covariate | Est. | s.e. | EPC-interest for non-invariance of $\tau^*_{kg}$, ranking task 1 | $\tau^*_{kg}$, task 2 | $\tau^*_{kg}$, task 3 | EPC-interest for non-invariance of $\lambda^*_{kxg}$, ranking task 1 | $\lambda^*_{kxg}$, task 2 | $\lambda^*_{kxg}$, task 3 |
    |---|---|---|---|---|---|---|---|---|---|
    | Class 1 | GDP | -0.127 | (0.008) | -0.015 | -0.003 | 0.002 | | | 0.097 |
    | Class 2 | GDP | 0.057 | (0.011) | -0.043 | -0.013 | 0.002 | | | 0.161 |
    | Class 1 | Women | 0.008 | (0.001) | -0.002 | 0.000 | 0.002 | | | 0.001 |
    | Class 2 | Women | 0.020 | (0.001) | -0.007 | -0.001 | 0.002 | | | 0.007 |
29. Figure: Estimated probability of choosing each class as a function of the covariates of interest under the final model. [Panels: % Women in parliament; GDP per capita. Horizontal axis: covariate level, from minimum to maximum. Vertical axis: probability of class (0.2 to 0.6). Lines: Mixed, Postmaterialist, Materialist.]
30. [Figures: class posterior (0.0 to 0.8) for Class 1 ("Materialist"), Class 2 ("Postmaterialist") and Class 3 ("Mixed"), plotted against % Women in Parliament (0 to 40) and against Ln(GDP per capita) (7 to 11), with each point labeled by country code.]
31. What has been gained by using EPC-interest: I am fairly confident here that there truly is "approximate measurement invariance", in the sense that any violations of measurement invariance do not bias the primary conclusions. I think attaining this goal is the main purpose of model fit evaluation.
32. Conclusion
33. Conclusion • Latent variable modeling is often performed for a purpose; • Model fit evaluation should then be done because violations of assumptions can disturb this purpose; • The EPC-interest was introduced to look into this; • It evaluates the change in the parameter(s) of interest that would result if a restriction that parameterizes a potential violation of assumptions were freed.
34. Implemented in SEM software lavaan for R: Oberski (2014). Evaluating Sensitivity of Parameters of Interest to Measurement Invariance in Latent Variable Models. Political Analysis, 22 (1). Implemented in LCA software Latent Gold: Oberski, Vermunt & Moors (submitted). Evaluating measurement invariance in categorical data latent variable models with the EPC-interest. Under review. Oberski & Vermunt (2014). A model-based approach to goodness-of-fit evaluation in item response theory. Measurement, 11, 117–122. Nagelkerke, Oberski, & Vermunt (accepted). Goodness-of-fit of Multilevel Latent Class Models for Categorical Data. Sociological Methodology. Oberski & Vermunt (conditionally accepted). The Expected Parameter Change (EPC) for Local Dependence Assessment in Binary Data Latent Class Models. Psychometrika.
35. Thank you for your attention! Daniel Oberski, doberski@uvt.nl. See http://daob.nl/publications for full texts & code
36. SEM regression coefficient example. European Sociological Review 2008, 24(5), 583–599
37. SEM regression coefficient example. [Figure: regression coefficients (axis from −1.0 to 1.0) for Conservation and Self-transcendence, in panels ALLOW and NOCOND, by country: Sweden, Denmark, Austria, Switzerland, Netherlands, Germany, Ireland, Spain, Norway, Hungary, Finland, Portugal, France, Belgium, Slovenia, United Kingdom, Greece, Czech Republic, Poland.]
38. SEM regression coefficient example. EPC-interest statistics of at least 0.1 in absolute value with respect to the latent variable regression coefficients. Columns give the country in which the metric invariance (loading) restriction "Conditions → Work skills" is imposed:

    | | Slovenia | France | Hungary | Ireland |
    |---|---|---|---|---|
    | EPC-interest w.r.t. Conditions → Self-transcendence | -0.073 | -0.092 | -0.067 | 0.073 |
    | EPC-interest w.r.t. Conditions → Conservation | 0.144 | 0.139 | 0.123 | -0.113 |
    | SEPC-self | 0.610 | 0.692 | 0.759 | -0.514 |
39. SEM regression coefficient example. What has been gained by using EPC-interest • Full metric invariance model: "close fit"; • EPC-interest still detects threats to cross-country comparisons of regression coefficients; • MI and EPC-self do not detect these particular misspecifications; • MI and EPC-self detect other misspecifications; • Looking at EPC-interest reveals that these do not affect the cross-country comparisons of regression coefficients.