SOC 8811 ADVANCED STATISTICS
LECTURE NOTES
STRUCTURAL EQUATION MODELS
SPRING 2004
Prof. David Knoke
Sociology Department
909 Social Sciences
(612) 624-6816/4300
knoke@atlas.socsci.umn.edu
TABLE OF CONTENTS
Review of Correlation & Covariance
Structural Equation Models
Validity and Reliability
Classical Test Theory
Parallel Measures
Factor Analysis
Factor Analysis of GSS Job Values
PRELIS to Create a Matrix
SIMPLIS Commands
Confirmatory Factor Analysis
Model Fit Statistics
A Two-Factor Model
Modification Indexes
A MIMIC Model
A Chain Model
Equality Constraints on Parameters
Comparing Models for Groups
A Path Model
Model Identification
Factor Analysis of Dichotomies
SEM with Ordinal Variables
REVIEW OF CORRELATION & COVARIANCE
Because structural equation models (SEMs) are based on analyses of
covariance or correlation matrices, a brief review of these descriptive
statistics may be helpful.
The Pearson product-moment correlation coefficient for two continuous
variables, Y and X, measures the amount of dispersion (spread) around a
linear least-squares regression line. For a population, using a Greek letter
to indicate a parameter, the OLS estimator for the bivariate regression
slope of Y on X is:
\beta_{YX} = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{N}(X_i - \bar{X})^2}
The numerator of this parameter is the sum across the N observations of
the cross-product of deviations of both variables around their means. The
denominator is the sum of squared deviations of X around its mean. If we
divide both the numerator and denominator by N, the regression slope
formula becomes:
\beta_{YX} = \frac{\sum_{i=1}^{N}(Y_i - \bar{Y})(X_i - \bar{X})/N}{\sum_{i=1}^{N}(X_i - \bar{X})^2/N}
The numerator is called the covariance of Y and X and the denominator is
the variance of X. Thus, we can simplify the OLS estimator of the bivariate
regression coefficient as the ratio of those two components:
\beta_{YX} = \frac{\sigma_{YX}}{\sigma_X^2}
Depending on the direction of the covariance of Y and X, a bivariate
regression slope may have a positive or negative sign, indicating the
direction of the relationship between Y and X in the population.
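A quick numerical check (a sketch with made-up data, not from the notes) confirms that the deviation cross-product formula and the covariance/variance ratio give the same slope:

```python
# Sketch: verify that the bivariate OLS slope from the deviation
# cross-product formula equals cov(Y, X) / var(X). Data are made up.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance (divide by N, as in the notes)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

num = sum((yi - mean(y)) * (xi - mean(x)) for yi, xi in zip(y, x))
den = sum((xi - mean(x)) ** 2 for xi in x)
slope_direct = num / den             # cross-product formula
slope_ratio = cov(y, x) / cov(x, x)  # covariance / variance form

print(round(slope_direct, 6), round(slope_ratio, 6))
```

Dividing numerator and denominator by N leaves the ratio unchanged, which is why the two forms agree.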
In a bivariate regression, the population coefficient of determination, ρ²,
indicates the proportion of total variation in Y that is determined by its
linear relationship with X. One of its formulas (see SSDA4, p. 184, for
details) involves the ratio of the squared covariance to the product of both
variances:
\rho_{YX}^2 = \frac{\sigma_{YX}^2}{\sigma_Y^2 \sigma_X^2}
Because of the squaring, the coefficient of determination cannot have a
negative sign.
The Pearson product-moment correlation coefficient is defined as the
square root of the coefficient of determination. It summarizes the linear
relationship and takes the same sign (plus or minus) as the regression
slope:
\rho_{YX} = \sqrt{\rho_{YX}^2} = \frac{\sigma_{YX}}{\sigma_Y \sigma_X}
Thus, the correlation is also defined as the covariance of Y and X divided
by the product of the standard deviations of both variables. It ranges
between +1.00 and –1.00 and has a value of 0 when the two variables do
not covary (i.e., are unrelated). The sign attached to the correlation must
be the same as the signs of the covariance and the regression slope.
Both correlations and covariances are symmetric; that is, \rho_{YX} = \rho_{XY}
and \sigma_{YX} = \sigma_{XY}, which can be ascertained by noting that the order
of cross-product multiplication is irrelevant in the regression slope formula above.
One important relation between covariance and correlation is to observe
what happens when both X and Y are standardized variables; that is,
turned into Z-scores by subtracting the mean and dividing by the standard
deviation. Into the formula above for ρYX, substitute Z-scores for both
variables:
\rho_{Z_Y Z_X} = \frac{\sigma_{Z_Y Z_X}}{\sigma_{Z_Y}\sigma_{Z_X}} = \frac{\sigma_{Z_Y Z_X}}{(1)(1)} = \sigma_{Z_Y Z_X}
Because the standard deviation of a Z-score is 1.00, the correlation
coefficient for two standardized measures equals their covariance.
Correlation coefficients are “scale-free”; that is, they are unaffected by whether
the units of measurement are the original scales or their transformed Z-
scores. We will see that structural equation models can be estimated
using either covariances or correlations (or both).
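The algebra above is easy to check numerically. This sketch (with illustrative made-up data) shows the covariance of two z-scored variables matching their Pearson correlation:

```python
import math

# Sketch: with made-up data, the covariance of two z-scored variables
# equals their Pearson correlation.

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance (divide by N)
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def zscores(v):
    m, sd = mean(v), math.sqrt(cov(v, v))
    return [(x - m) / sd for x in v]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
y = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0]

r = cov(x, y) / (math.sqrt(cov(x, x)) * math.sqrt(cov(y, y)))
cov_z = cov(zscores(x), zscores(y))
print(abs(r - cov_z) < 1e-12)  # True
```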
STRUCTURAL EQUATION MODELS
This part of this course examines the basics of structural equation models
(SEMs), specifically the LISREL (LInear Structural RELations) approach
developed three decades ago by Swedish psychometricians Karl Jöreskog
and Dag Sörbom. We’ll be using LISREL 8.54 to analyze General Social
Survey data. These notes use the simplified commands (SIMPLIS), as does
Chapter 12 in SSDA4. To understand causal diagrams, a good preparation
would be to skim Chapter 11 on causal models and path analysis.
However, I will try to develop everything we need on that topic in these
lecture notes, primarily by working through increasingly complex data
analysis examples.
As with every statistical method, the structural equation approach is more
suitable to some types of data and measures than to others. Two major
uses of LISREL are: (1) to model social psychological attitudes (factor
structures), in which one or more unobserved constructs generate the
variation in several observed indicators; and (2) to estimate parameters for
a causal model, in which some variables are treated as causes of other
variables (the effects). The chief advantage of LISREL over alternative
methods (such as path analysis and index construction), lies in its power
to combine observed measures with relations among unobserved
constructs into a single integrated system.
I like to imagine that the relationship between structural and measurement
levels of analysis can be traced back to a famous philosophical metaphor
in Plato’s Republic: the shadows that the unenlightened prisoners see
on the cave wall are obscure reflections of an underlying reality which
analysts cannot view directly but can only seek to comprehend through
intellectual reasoning. Concepts and the objects they indicate are not
identical phenomena (the point of René Magritte’s droll painting, “Ceci
n’est pas une pipe”). Similarly, as Plato elsewhere reasoned, a triangle
drawn with pencil and paper is a flawed representation of the abstract,
eternal concept of “triangle” that exists beyond the realm of sensual
perception. By analogy, social scientists can never accurately observe
people’s attitudes (not even their visible behaviors), but can only infer their
existence by making noisy, error-prone measurements – such as
respondents’ answers to survey questions – that are only partially
influenced by their unobservable true beliefs (or actions).
“The famous pipe. How people reproached me for it! And yet, could
you stuff my pipe? No, it’s just a representation, is it not? So if I had
written on my picture ‘This is a pipe,’ I'd have been lying!”
- René Magritte
Modern measurement theory concerns the relationships between a latent
construct at the theoretical or conceptual level and observed indicators at
the level of empirical observations:
Complete these examples:
CONSTRUCTS INDICATOR(S)
Religiosity ______________________________
Industrialization ______________________________
Delinquency ______________________________
Centralization ______________________________
Intelligence ______________________________
_________________ Sudden numbness, confusion,
difficulty seeing, severe headache,
loss of coordination & balance
_________________ Fewer social services; low tax
rates; stronger national defense
EMPIRICAL LEVEL
Observed indicators
Income
Education
Occupation
CONCEPTUAL LEVEL
Latent construct
Socioeconomic
Status (SES)
VALIDITY AND RELIABILITY
Measurement theory seeks to represent a latent construct with one or more
observable indicators (operational measure or variable) that accurately
capture that theoretical construct. Two desirable properties of empirical
quantitative measures are high levels of validity and reliability:
• Validity: The degree to which a variable’s operationalization
accurately reflects the concept it is intended to measure.
• Reliability: The extent to which different operationalizations of the
same concept produce consistent results. The proportion of an
item’s variance that is attributable to the unobserved cause or
source.
Many validity issues concern how well or poorly an observable variable
reflects its latent counterpart. Another central concern is with accurately
depicting the (causal or covariational) relationships among several
theoretical constructs, using information about the covariation among
observed indicators. This latter interest lies at the heart of the factor
analysis and structural equation models examined in later sections.
Reliability refers to the replicability of a measure under the same
conditions. A perfectly reliable measure must generate the same scores
when conditions are identical. A measure may be very reliable but not
valid; that is, an instrument can precisely measure some phenomenon yet
represent complete nonsense. For example, your bathroom scale
consistently gives identical readings when you step off and on, but it
invalidly operationalizes your true weight (you dialed it back 5 pounds).
To be valid, a measure or indicator must be reliable. In the extreme,
if a measure’s reliability is zero, its validity is also zero. However, a given
indicator may vary in the extent of its validity as a measure of different
concepts. For example, education, measured as years of formal schooling,
might be used both as an indicator of educational persistence and as an
indicator of socioeconomic status (SES). Validity is clearly affected by the
choice of one’s indicator(s). For example, we can treat church attendance
as a measure of Americans’ religiosity, but this indicator might have only
moderate validity because some highly religious persons don’t attend
services, and some go to church mainly for social purposes. A more valid
measure of religiosity would include not only attendance at religious
services, but also would query people about their religious beliefs (e.g., in
the efficacy of prayer, the existence of an afterlife, and infallibility of
scriptures).
Unfortunately, researchers never obtain perfect measurements in the real
world; that is, all measures are subject to measurement error, hence they
are all unreliable and invalid to some greater or lesser degree.
Measurement theory is therefore also a theory about the magnitudes and
sources of errors in empirical observations.
Reliability assumes random errors. When a measurement is repeated over
numerous occasions under the same conditions, if random error occurs,
then the resulting variations in scores form a normal distribution about the
measure’s true value. The standard error of that distribution represents
the magnitude of the measurement error: the larger the standard error, the
lower the measure’s reliability. By definition, random errors are
uncorrelated with any variable, including other random error variables.
Natural scientists also face measurement reliability problems. For
example, astronomers made important contributions to measurement
theory by developing techniques for estimating the true transit times
of Jupiter’s moons from erroneous telescopic observations. (See
Stephen M. Stigler. 1986. The History of Statistics: The Measurement
of Uncertainty Before 1900. Cambridge, MA: Harvard University
Press.)
Systematic error (nonrandom error) implies a miscalibration of the
measuring instrument that biases the scores by consistently over- or
underestimating a latent construct (e.g., your miscalibrated bathroom
scale). Such consistent biases don’t alter the measure’s reliability, but
they clearly alter its validity because they prevent the indicator from
accurately representing the theoretical concept.
The research methodology literature discusses several types of validity,
but we lack space to examine all these conceptual distinctions (Box 12.1
defines a variety of validity concepts). For purposes of explicating
structural equations models, we’ll assume that the empirical observations
we use have adequate content validity as indicators of the designated
latent constructs. Therefore, we turn next to the quantification of reliability
in classical test theory.
BOX 12.1 Varieties of Validity
Validity indicates the appropriateness of a measurement instrument, such
as a battery of test items, for the concept it intends to measure. In other
words, an instrument’s validity denotes the extent to which it measures
what it is supposed to measure. Validity can be established by experts
knowledgeable about a substantive domain, or by demonstrating a
measure’s consistency with the theoretical concepts it is designed to
represent. Three traditional types of measurement validity are construct,
criterion-related, and content validity. Brief definitions and examples of
these various validity types are:
Construct validity: the extent to which a measure agrees with theoretical
expectations; for example, IQ test items try to measure theoretically
hypothesized dimensions of intelligence. Measures with high convergent
validity and discriminant validity exhibit high agreement with theoretically
similar measures but low correlations with dissimilar measures,
respectively.
Criterion-related validity: the extent to which a measure accurately
predicts performances on some subsequently observable activity (the
criterion); for example, how highly a written driving-test score correlates
with people’s actual skills in operating an automobile. A measure’s
concurrent validity is assessed by its ability to discriminate between
persons with and without the criterion. A measure’s predictive validity is
demonstrated by its accuracy in forecasting future behavior.
Content validity: the extent to which a measure adequately represents
the defined domain of interest that it was designed to measure; for
example, a mathematical ability test should cover the full range of
students’ mathematical knowledge.
Classical Test Theory
Classical test theory depicts the observed score (X) of respondent i on a
measuring instrument, such as a test battery or survey item, as arising
from two hypothetical unobservable sources: the respondent’s “true
score” and an error component:
A person’s true score is the average that would be obtained across
infinitely repeated measures of X. In the theoretical definition of random
error, the distribution of error forms a normal distribution around a mean
value of zero. Because the ± error deviations around the true score cancel
one another, the expected value (mean) of the errors is zero and the
expected value of the observed scores equals respondent i’s true
score:

E(X_i) = T_i
Further, the error term is assumed to be uncorrelated with its true score
(which makes sense if the errors are really random). Hence, both
components make unique contributions to the variances of the observed
scores in a population:
\sigma_X^2 = \sigma_T^2 + \sigma_\varepsilon^2
That is, the observed score variance is the sum of the true score variance
plus error variance. For good measures, the error variance is small relative
to the observed variance; poor measures have the opposite pattern.
[Diagram: true score T_i and error e_i jointly produce observed score X_i]

X_i = T_i + e_i
The reliability of X is defined as the ratio between true score and observed
score variances (“rho” here is not the same as the Pearsonian correlation):
\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_e^2}
Note that reliability ranges between 0 (when the true score variance is zero)
and 1 (when the error variance is zero). Values between these extremes
reflect the relative proportions of error and true score variation in the
measure of X.
Rearranging the definition of reliability reveals that the true score variance
equals the observed score variance times the reliability:
\sigma_T^2 = \rho_X \sigma_X^2
Hence, we can estimate the unobserved true score variance from a
measure’s reliability and its observed variance.
Finally, reliability can also be expressed using the error term*:
\rho_X = 1 - \frac{\sigma_e^2}{\sigma_X^2}
This formula again demonstrates that reliability ranges between 0 and 1: if
the entire observed variance is error, ρX = 0; but if no random error exists,
then ρX = 1.
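These variance relationships can be simulated. The sketch below (with assumed parameters, not course data) generates true scores and random errors, then recovers the reliability both ways:

```python
import random

# Sketch: simulate classical test theory X = T + e with assumed
# parameters (true-score SD 2.0, error SD 1.0, so reliability = 4/5).
random.seed(1)
N = 200_000
T = [random.gauss(0, 2.0) for _ in range(N)]
e = [random.gauss(0, 1.0) for _ in range(N)]
X = [t + err for t, err in zip(T, e)]

def var(v):
    # Population variance
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

rho_true = var(T) / var(X)     # true-score variance / observed variance
rho_err = 1 - var(e) / var(X)  # 1 - error variance / observed variance
print(round(rho_true, 2), round(rho_err, 2))  # both ≈ 0.80
```

Both expressions approximate the same quantity because random errors are (asymptotically) uncorrelated with the true scores.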
_______________________________________________________________
* The derivation of the error term above is:
\sigma_T^2 = \rho_X \sigma_X^2

\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_X^2 - \sigma_e^2}{\sigma_X^2} = 1 - \frac{\sigma_e^2}{\sigma_X^2}
_______________________________________________________________
Parallel Measures
If we had a second measure of the same unobservable construct that
differed from the first indicator only in their errors (the true scores are
equal), we would have two parallel measures where:
Assuming that the population variances of their error terms are equal,
a measure’s reliability is the correlation of the parallel forms. The proof:
_______________________________________________________________
1. The correlation coefficient for two variables is defined as the ratio of the
covariance to the product of standard deviations:
r_{X_1 X_2} = \frac{\sigma_{X_1 X_2}}{\sigma_{X_1} \sigma_{X_2}}
X_{1i} = T_i + e_{1i}
X_{2i} = T_i + e_{2i}
2. In the numerator, substitute the two variables’ true and error scores and
multiply the subscripts:
\sigma_{X_1 X_2} = \sigma_{(T + e_1)(T + e_2)} = \sigma_T^2 + \sigma_{T e_2} + \sigma_{T e_1} + \sigma_{e_1 e_2} = \sigma_T^2
The three terms on the right are zero because the error terms and true
scores are uncorrelated.
3. Because the standard deviations of parallel measures are equal, the
denominator simplifies to:
\sigma_{X_1} \sigma_{X_2} = \sigma_X^2
4. Hence, by substituting this term into the denominator in step 1, the
correlation coefficient for parallel measures becomes:
r_{X_1 X_2} = \frac{\sigma_T^2}{\sigma_X^2}
5. We previously defined the right-side expression in step 4 as the
reliability; therefore:
r_{X_1 X_2} = \rho_X
_______________________________________________________________
An important consequence of this identity is that the true score’s variance
can be estimated as the product of just two empirical measures, the
correlation coefficient and the variance. Rearranging step 4 above:
\sigma_T^2 = r_{X_1 X_2} \sigma_X^2
The correlation between the true score and an observed variable equals
the square root of the reliability, which is also the square root of the
correlation between two parallel measures:
\rho_{T X_1} = \sqrt{\rho_X} = \sqrt{r_{X_1 X_2}}
This equation shows that the correlation between an observable indicator
and the unobservable true score it measures can be estimated as the
square root of the reliability of indicator X. For example, if X has reliability
= 0.64, then the correlation with its true score = 0.80. What is the reliability
of X if its true-score correlation = 0.81?
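The parallel-measures identity can also be simulated. This sketch (assumed parameters: true-score SD 2.0, error SD 1.0, so reliability 0.80) shows the correlation of two parallel forms approximating the reliability, and the true-score correlation approximating its square root:

```python
import math
import random

# Sketch: two parallel measures X1 = T + e1, X2 = T + e2 with equal
# (assumed) error variances; reliability = 4 / (4 + 1) = 0.80.
random.seed(7)
N = 200_000
T = [random.gauss(0, 2.0) for _ in range(N)]
X1 = [t + random.gauss(0, 1.0) for t in T]
X2 = [t + random.gauss(0, 1.0) for t in T]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

r12 = corr(X1, X2)   # ≈ 0.80: the reliability
r_t1 = corr(T, X1)   # ≈ 0.89: square root of the reliability
print(round(r12, 2), round(r_t1, 2), round(math.sqrt(r12), 2))
```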
The measurement theory principles discussed in this section are
incorporated into structural equation models, which I introduce next
through the confirmatory factor analytic approach to modeling the
relationships between observed indicators and latent constructs.
FACTOR ANALYSIS
Factor analysis refers to a family of statistical methods that represent the
relationships among a set of observed variables in terms of an
hypothesized smaller number of latent constructs, or common factors. The
common factors are assumed to generate the observed variables’
covariations (or correlations, if all measures are standardized with zero
means and unit variances). For example, respondents’ observed scores on
several mental ability tests (e.g., IQ, SAT, GRE exams) allegedly result from
unobserved common verbal and quantitative factors. Or covariations
among numerous socioeconomic indicators of urban communities depend
on latent industrialization, health, and welfare factors.
Of the two major classes of factor analysis, exploratory and confirmatory,
we limit our discussion to the latter. In confirmatory factor analysis (CFA)
a researcher posits an a priori theoretical measurement model to describe
or explain the relationship between the underlying unobserved constructs
(“factors”) and the empirical measures. Then, the analyst uses statistical
fit criteria to assess the degree to which the sample data are consistent
with the posited model; that is, to ask whether the results confirm the
hypothesized model. In practice, however, researchers seldom conduct
only one test of a confirmatory factor model. Rather, based on initial
estimates, they typically alter some model specifications and re-analyze
the new model, trying to improve its fit to the data. Hence, most
applications of CFA to investigate latent factors involve successive
modeling attempts. We apply this successive model-fitting strategy in
estimating alternative models to explain the empirical relationships among
a set of observed variables.
Researchers use confirmatory factor analysis to estimate the parameters of
a measurement model. Consider this diagram showing a single latent
factor measured by four empirical variables.
where:
F = latent common factor
Xi = observed variable i (indicator)
ei = unobserved “error” source (unique factor) for variable Xi
bi = “factor loading” effect of common factor F on observed variable Xi
di = effect of unique factor ei on observed variable Xi
This diagram implies that, if the latent variable were observed, it would
produce values of the indicators. Each observed score is a linear
combination of this common factor plus a unique error term. We can see
these relationships clearly by writing the four implied measurement
equations, which closely resemble the classical test theory equation:
X1 = b1 F + d1 e1
X2 = b2 F + d2 e2
X3 = b3 F + d3 e3
X4 = b4 F + d4 e4
[Path diagram: common factor F with loadings b1–b4 on indicators X1–X4;
unique factors e1–e4 affect the indicators through effects d1–d4]
Note the non-coincidental similarity of these factor analytic equations to
classical test theory’s representation of an observed score as a sum of a
true score and an error term.
The diagram above shows that the error terms are uncorrelated with the
factors and among themselves. Hence, the only sources of indicator i’s
variance are the common factor F and the indicator’s unique error term:
\sigma_{X_i}^2 = \beta_i^2 \sigma_F^2 + \Theta_{\varepsilon_i}

where \Theta_{\varepsilon_i} signifies the variance of the error in X_i. Because F is
unobserved, its variance is unknown. And because it is unknown, we can
assume it is a standardized variable, which means that its variance = 1.0.
Therefore,

\sigma_{X_i}^2 = \beta_i^2 + \Theta_{\varepsilon_i}
Note that this formulation closely resembles the classical test theory
equation in which the variance of a measure equals the sum of two
components -- the true score variance plus the error variance. Next note
that if we standardize Xi, then the sum of these two components must
equal 1.0. A CFA model has another similarity to the classical test theory.
The reliability of indicator Xi is defined as the squared correlation between
a factor and an indicator. This value is the proportion of variation in Xi that
is statistically “explained” by the common factor (the “true score” in
classical test theory) that it purports to measure.
\rho_{X_i} = \rho_{X_i F}^2 = \beta_i^2
Hence, item reliability equals the square of its factor loading.
Finally, the covariance between two indicators in a single-factor model is
the expected value of the product of their two factor loadings:
\sigma_{X_1 X_2} = E[(\beta_1 F + \varepsilon_1)(\beta_2 F + \varepsilon_2)]
which, because the error terms are uncorrelated with the factor and with
each other, simplifies to:
\sigma_{X_1 X_2} = \beta_1 \beta_2 \sigma_F^2 = \beta_1 \beta_2
When all variances are standardized, this relationship further simplifies to:

\sigma_{Z_1 Z_2} = \rho_{Z_1 Z_2} = \beta_1 \beta_2
That is, the correlation of a pair of observed variables loading on a
common factor is the product of their standardized factor loadings.
What are the reliabilities of each indicator X, the error variances, and the
expected correlations between every pair of X’s for this single-factor
model:
[Path diagram: factor F with standardized loadings on X1 = .8, X2 = .7,
X3 = .9, X4 = .6]
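One way to check your answers (a sketch, assuming the diagram's loadings read left to right as .8, .7, .9, and .6 for X1 through X4):

```python
from itertools import combinations

# Assumed reading of the diagram: standardized loadings for X1..X4
loadings = {"X1": 0.8, "X2": 0.7, "X3": 0.9, "X4": 0.6}

# Item reliability = squared loading; with standardized indicators,
# error variance = 1 - squared loading
reliability = {n: round(b * b, 2) for n, b in loadings.items()}
error_var = {n: round(1 - b * b, 2) for n, b in loadings.items()}

# Expected correlation for each pair of indicators = product of loadings
expected_r = {
    (n1, n2): round(b1 * b2, 2)
    for (n1, b1), (n2, b2) in combinations(loadings.items(), 2)
}

print(reliability)
print(error_var)
print(expected_r)
```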
Factor Analysis of GSS Job Values
To illustrate LISREL procedures for estimating a single-factor model, I use
responses in the 1998 General Social Survey to seven questions about the
importance of particular job values:
On the following list there are various aspects of jobs. Please circle one number to
show how important you personally consider it is in a job:
• SECJOB: Job security?
• HIINC: High income?
• PROMOTN: Good opportunities for advancement?
• INTJOB: An interesting job?
• WRKINDP: A job that allows someone to work independently?
• HLPOTHS: A job that allows someone to help other people?
• HLPSOC: A job that is useful to society?
The response categories ranged from “Very important” = 1 to “Not
important at all” = 5. Here are SPSS recode commands that reverse-code
those values, then write out a data file to be subsequently read by the
PRELIS program and create a covariance matrix for input into LISREL:
RECODE secjob hiinc promotn intjob wrkindp hlpoths hlpsoc
(1=5)(2=4)(3=3)(4=2)(5=1)(ELSE = -999).
WRITE OUTFILE = JOBVALS.TXT /
secjob hiinc promotn intjob wrkindp hlpoths hlpsoc (7F5.0).
FREQUENCIES VAR=secjob hiinc promotn intjob wrkindp hlpoths hlpsoc.
By including (ELSE = -999), SPSS changes the three missing value codes
on each variable to -999. The WRITE OUTFILE command stores the seven
recoded variables as a fixed-format ASCII file (JOBVALS.TXT). The format
(7F5.0) creates five-column fields to contain the new variable values,
allowing at least one space separation between each score. Here are a few
lines from the JOBVALS.TXT file:
4 3 4 4 4 4 4
5 5 4 4 4 4 4
4 4 4 5 4 4 4
-999 -999 -999 -999 -999 -999 -999
-999 -999 -999 -999 -999 -999 -999
4 4 3 4 3 4 4
5 4 4 4 4 3 3
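For readers without SPSS, a rough Python equivalent of the recode-and-write step might look like this (a sketch, not the course's actual workflow; the sample response rows are invented):

```python
# A rough Python stand-in for the SPSS RECODE/WRITE step: reverse-code
# 1-5 responses, map anything else to -999, and write fixed 5-column
# fields, as in JOBVALS.TXT.
MISSING = -999

def recode(value):
    # 1<->5, 2<->4, 3 unchanged; non-substantive codes become missing
    return 6 - value if value in (1, 2, 3, 4, 5) else MISSING

rows = [
    [2, 3, 2, 2, 2, 2, 2],
    [1, 1, 2, 2, 2, 2, 2],
    [8, 8, 8, 8, 8, 8, 8],  # e.g. a "don't know" code -> missing
]

lines = ["".join("%5d" % recode(v) for v in row) for row in rows]
print("\n".join(lines))
```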
PRELIS to Create a Matrix
The ultimate data analyzed by LISREL 8.54 comprise a matrix of
covariations or product-moment (Pearson) correlations among the
indicator variables. PRELIS 2.30 in the LISREL program can set up a
matrix from data that are either entered interactively or imported from an
SPSS.SAV file. This section shows another option, where these PRELIS
commands are saved in an ASCII text file called JOBVALS.PR2:
PRELIS FOR JOB VALUES (SAVED IN JOBVALS.PR2)
DATA NI=7 NO=2832 MI=-999 TR=LI
RAW FI=JOBVALS.TXT FO
(7F5.0)
LABELS
SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
CONTINUOUS SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
OUTPUT MATRIX=CM SM=JOBVALS.MAT
To run this job, launch LISREL 8.54, click “File” on the upper left
toolbar, then “Open”. In the window select JOBVALS.PR2, then click
“Open.” Again click “File” on the upper left toolbar, then “Run
PRELIS” to execute. The printout will be stored in a file named
JOBVALS.OUT.
NOTES: The first line is an optional title; I included the file’s own name
DATA is the input data description, where:
NI = number of observed indicators (variables in the datafile)
NO = number of observations (total number of cases)
MI = missing value codes (if more than one, separated with commas)
TR = LI indicates listwise deletion: calculations are based only on
cases with no missing values on any variable. TR=PA is
pairwise deletion: for each pair of variables, computations are
based on all cases with nonmissing values on both variables.
RAW specifies the external file where the “raw” data are stored:
FI=JOBVALS.TXT. The FO option indicates that the format will appear on
the next line. If FO is omitted, the format is either the first line of the
external raw data file, or the data are stored in free format (separated by
spaces, commas, or return characters).
(7F5.0) indicates the case records in the external file format consist of
seven 5-column fields with no decimals.
LABELS command assigns the sequence of names on the next line to the
NI variables; maximum label length is eight characters
CONTINUOUS defines the listed variables as interval-level measures. By
default, PRELIS 2 treats variables with fewer than 16 values as ordinal.
OUTPUT command where:
MATRIX = CM computes a covariance matrix
MATRIX = KM computes a correlation matrix
SM = FILENAME for storing the matrix to be read into LISREL
The covariance matrix below (edited from the output) was computed using
listwise data from 1,129 respondents:
SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
-------- -------- -------- -------- -------- -------- --------
SECJOB 0.442
HIINC 0.167 0.594
PROMOTN 0.194 0.270 0.514
INTJOB 0.119 0.132 0.207 0.399
WRKINDP 0.068 0.139 0.178 0.213 0.616
HLPOTHS 0.068 0.062 0.159 0.170 0.244 0.596
HLPSOC 0.098 0.065 0.146 0.178 0.214 0.430 0.649
-------- -------- -------- -------- -------- -------- --------
Means 4.508 3.982 4.233 4.461 4.051 4.069 4.060
Std Devs 0.665 0.770 0.717 0.632 0.785 0.772 0.806
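Any entry of this matrix can be converted to a correlation by hand, using the relation derived earlier (r = covariance divided by the product of standard deviations). For example, for the two helping items:

```python
# Values copied from the PRELIS output above (HLPOTHS and HLPSOC)
cov_xy = 0.430
sd_x, sd_y = 0.772, 0.806

r = cov_xy / (sd_x * sd_y)
print(round(r, 3))  # 0.691
```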
SIMPLIS Commands
In the earliest versions of LISREL, analysts had to specify the full set of
eight parameter matrices, indicating which coefficients were constrained to
zero and which were free to vary. A major benefit of this approach was to
force researchers to think very carefully and completely about their
models. However, the opportunities for errors were numerous and
frustrating to the learning process. LISREL 8 introduced the SIMPLIS
(SIMPlified LISREL) command language that avoids the necessity to
completely specify the parameter matrices. It undoubtedly speeds the
model-testing process and reduces trial-and-error learning. These Notes
present a variety of examples using SIMPLIS commands.
CONFIRMATORY FACTOR ANALYSIS
Methodologists usually describe factor analysis with LISREL as
confirmatory factor analysis (CFA) because the researcher formulates an a
priori theoretical model to describe or explain the empirical data. Then,
statistical analyses determine whether the sample data are consistent with
the imposed model; that is, do the results confirm the substantively
generated model? In practice, however, researchers seldom conduct only
one test of a factor model. Rather, based on the initial parameter
estimates, they typically alter some specifications and re-analyze the new
model, trying to improve its fit to the data. Hence, most applications of
LISREL to investigate latent factors are mixtures of exploratory and
confirmatory procedures.
During my initial analyses of GSS job values, I discovered that a single
factor could not account for the observed covariances among the seven
items. So, to demonstrate how LISREL estimates a single-factor model, I
concentrate on the relations among the first four GSS indicators (SECJOB,
HIINC, PROMOTN, INTJOB), most of which appear to emphasize “extrinsic”
job rewards based on such external benefits as money, promotions, and
job security.
Here’s the SIMPLIS command file, saved in file LISJV1.LS8:
Single Factor LISREL Model with 4 Job Indicators (LISJV1.LS8)
Observed Variables: SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
Covariance Matrix From File JOBVALS.MAT
Sample Size = 1129
Latent Variables: Jobvalue
Relationships:
PROMOTN = 1*Jobvalue
SECJOB HIINC INTJOB = Jobvalue
Path Diagram
LISREL Output: SC MI
End of Problem
To run this job, launch LISREL 8.54, click “File” on the upper left toolbar,
then “Open”. In the window select LISJV1.LS8, then click “Open.”
Again click “File” on the upper left toolbar, then “Run LISREL” to
execute. The printout will be stored in a file named LISJV1.OUT, and a
path diagram in LISJV1.PTH.
NOTES: The first line is the job title
The Observed variables line lists all seven GSS variable names in their
exact order of occurrence in the covariance matrix previously created by
PRELIS.
Covariance Matrix identifies the JOBVALS.MAT file where PRELIS stored
that covariance matrix.
Sample Size reports the number of observations used to compute the
covariances. (The listwise deletion in PRELIS found 1,129 cases with no
missing data on all seven variables.)
Latent Variables provides a name for the single unobserved factor.
Relationships is followed by a specification for the factor loadings to be
estimated. The observed variables’ names appear on the left-hand side of
the equal sign and the factor name on the right-hand side.
SCALING LATENT CONSTRUCTS: A latent construct is unobserved
and hence has no definite scale; that is, its origin and unit of
measurement are arbitrary. A researcher usually fixes the origin by
assuming a latent construct to have zero mean; LISREL automatically
does this unless otherwise instructed. The unit of measurement can be
scaled one of two ways: (1) Assume that a latent construct is
standardized to have a variance = 1.00; this is the LISREL default
option. (2) Assign a unit measure to the construct by fixing the factor
loading for one indicator to a nonzero value (typically = 1.00). This
method defines the latent construct scale in terms of an observed
reference variable, usually an indicator that the researcher believes
best represents the factor. I used the second procedure for scaling
constructs in this course. I chose PROMOTN as the reference indicator
for the unobserved “Jobvalue” factor (based on preliminary analyses
showing it to have the highest factor loading).
LISREL Output: SC MI requests a completely standardized solution and
modification indices.
End of Problem signals the termination of the model specification.
After five iterations, LISREL produced these maximum likelihood estimates
(MLE) of the parameters, with standard errors in parentheses, and t-ratios
in the third rows:
LISREL Estimates (Maximum Likelihood)
Measurement Equations
SECJOB = 0.56*Jobvalue, Errorvar.= 0.33 , R² = 0.25
(0.042) (0.016)
13.41 21.03
HIINC = 0.75*Jobvalue, Errorvar.= 0.39 , R² = 0.34
(0.051) (0.020)
14.69 19.33
PROMOTN = 1.00*Jobvalue, Errorvar.= 0.15 , R² = 0.70
(0.021)
7.52
INTJOB = 0.56*Jobvalue, Errorvar.= 0.28 , R² = 0.29
(0.040) (0.014)
13.99 20.44
The loading for the reference indicator, PROMOTN, was fixed to 1.00, so its
coefficient doesn’t have a standard error or t-test. This observed variable
has the largest proportion of variance explained by “Jobvalue” (70 percent),
suggesting that it was a good choice for fixing the latent construct’s scale.
The other three observed variables all have highly significant loadings on
the latent factor. But, each parameter estimate is smaller than the fixed
value for the reference indicator and their R-squares are also much
smaller.
LISREL also draws a diagram corresponding to the model. To save it, click
“File” on the top toolbar, then “Export As Gif file (.gif)”. I inserted it on the
next page, and cropped the excess borders with MS Word’s
“Format/Picture” options.
The factor loadings are based on the covariances among the four
indicators, which are measured in the original 5-point scales. A completely
standardized factor solution rescales both the latent factor(s) and all
observed variables to have standard deviations equal to one (i.e.,
transformed to Z-scores). Therefore, all variances also equal one. This
rescaling produces parameter estimates that are proportional to the MLE
parameters. If no indicator’s scale is fixed to 1.00, then LISREL will
automatically set the latent construct’s variance = 1.00. (See the Jobvalue
for PHI in the output below.) In that case, all corresponding MLE and SC
parameters will be identical; why? [HINT: What is the relationship between
correlation and covariance?]
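As a nudge toward the hint, here is a small numpy sketch of how a covariance matrix is rescaled into a correlation matrix (the 2×2 covariance values are made up for illustration, not taken from the GSS data):

```python
import numpy as np

# Hypothetical covariance matrix for two variables (illustrative values only)
S = np.array([[0.33, 0.15],
              [0.15, 0.39]])

# r_ij = s_ij / (sd_i * sd_j): divide each covariance by the product of SDs
sd = np.sqrt(np.diag(S))
R = S / np.outer(sd, sd)
print(R)

# When every variable already has variance 1.00, the rescaling changes
# nothing, so the covariance and correlation matrices are identical
S_std = np.array([[1.00, 0.42],
                  [0.42, 1.00]])
sd_std = np.sqrt(np.diag(S_std))
assert np.allclose(S_std / np.outer(sd_std, sd_std), S_std)
```

Because a completely standardized solution fixes all variances to 1.00, analyzing the covariances of standardized variables is the same as analyzing their correlations, which is the relationship the hint points toward.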
Here’s the completely standardized solution, which appears at the end of
the output:
Completely Standardized Solution
LAMBDA-X
Jobvalue
--------
SECJOB 0.50
HIINC 0.58
PROMOTN 0.84
INTJOB 0.54
PHI
Jobvalue
--------
1.00
THETA-DELTA
SECJOB HIINC PROMOTN INTJOB
-------- -------- -------- --------
0.75 0.66 0.30 0.71
These standardized factor loadings clearly reveal that PROMOTN has the
strongest relationship with the “Jobvalue” construct. Hence, PROMOTN is
that factor’s most reliable indicator: (.84)² = 0.71. SECJOB and INTJOB
have the lowest factor loadings; what are their reliabilities? The variance
of Jobvalue equals 1.00 (reported in PHI), consistent with a standardized
solution.
The completely standardized solution reveals that the squared factor
loadings and the error variances (reported in THETA-DELTA) jointly
account for all the variation in each indicator, as required in classical test
theory. That is, the sum of a squared factor loading plus its error
variance = 1.00. For example, SECJOB = (0.50)² + 0.75 = 0.25 + 0.75 = 1.00.
What are these sums for the other three indicators?
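The decomposition can be checked directly from the numbers above. A minimal Python sketch, using the standardized loadings and THETA-DELTA error variances reported in the output (the sums equal 1.00 only up to two-decimal rounding):

```python
# Standardized factor loadings and error variances from the output above
loadings = {"SECJOB": 0.50, "HIINC": 0.58, "PROMOTN": 0.84, "INTJOB": 0.54}
errors = {"SECJOB": 0.75, "HIINC": 0.66, "PROMOTN": 0.30, "INTJOB": 0.71}

# Each standardized indicator's unit variance = explained part
# (lambda squared) plus error variance
for name in loadings:
    explained = loadings[name] ** 2
    print(f"{name}: {explained:.2f} + {errors[name]:.2f} = "
          f"{explained + errors[name]:.2f}")
```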
To view a model diagram with the standardized coefficients, click “View”
on the top toolbar, then “Estimations” and “Standardized Solutions.”
In the figure below, I converted the LISREL error term values into path
coefficients. To measure all effects in a standard-deviation metric, take the
positive square root of each LISREL error:
Now the sum of the squared factor loading plus the squared error equals
1.0 for each indicator: PROMOTN = (0.55)² + (0.84)² = 0.30 + 0.70 = 1.00
Calculate the expected correlations by multiplying pairs of factor loadings;
calculate the differences by subtracting the observed correlations:
Observed Variables Expected r Observed r Difference
SECJOB-HIINC (.50)(.58) = 0.290 0.326
SECJOB-PROMOTN (.50)(.84) = 0.408
SECJOB-INTJOB (.50)(.54) = 0.284
HIINC-PROMOTN (.58)(.84) = 0.489
HIINC-INTJOB (.58)(.54) = 0.272
PROMOTN-INTJOB (.84)(.54) = 0.458
The discrepancies are fairly small, implying that the unobserved Jobvalue
factor accounts reasonably well for most correlations among its four
indicators. We can more precisely assess how well a model fits the data
with several goodness of fit statistics generated by LISREL.
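The pairwise products in the table can be generated mechanically. A short Python sketch of the calculation (products of the rounded two-decimal loadings differ slightly from the table, which appears to use the unrounded estimates):

```python
from itertools import combinations

# Standardized loadings from the completely standardized solution
loadings = {"SECJOB": 0.50, "HIINC": 0.58, "PROMOTN": 0.84, "INTJOB": 0.54}

# Under a one-factor model, the expected correlation between two
# indicators is the product of their standardized loadings
expected = {(a, b): loadings[a] * loadings[b]
            for a, b in combinations(loadings, 2)}

for pair, r in expected.items():
    print(f"{pair[0]}-{pair[1]}: expected r = {r:.3f}")

# Difference for the one observed correlation reported above
diff = 0.326 - expected[("SECJOB", "HIINC")]
print(f"SECJOB-HIINC difference: {diff:.3f}")
```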
[Path diagram: Jobvalue factor with standardized loadings .50 (SECJOB), .58 (HIINC), .84 (PROMOTN), .54 (INTJOB), and error paths .87, .81, .55, .84]
MODEL FIT STATISTICS
As with logistic regression, not only the individual parameters but an entire
LISREL model’s fit to the data can be assessed statistically. A specific
structural equation model implies an expected covariance matrix (or a
correlation matrix) for the k observed variables, Σ(θ), where θ is a vector of
parameters to be estimated. PRELIS uses the N sample cases to create the
observed covariance matrix, S, which LISREL then uses to estimate the
expected model parameters. LISREL fits the analyst’s hypothesized model
to the empirical data by minimizing a fit function F involving both matrices.
In matrix algebra notation, this function is:
F[S, Σ(θ)] = ln |Σ| + tr(SΣ⁻¹) - ln |S| - k

where k is the number of observed variables, and tr means
“trace” – the sum of the elements in a matrix diagonal. The F function is
non-negative and is zero only if a perfect fit occurs, that is, if S = Σ.
For a large sample N, multiplying F[S, Σ(θ)] by (N-1) yields a test statistic
that is approximately distributed as a χ² with degrees of freedom equal to:

d = [k(k+1)/2] - t

where k is the number of observed indicators and t is the number of
independent parameters estimated. Because d must be nonnegative, the
number of independent parameters (t) cannot be more than k(k+1)/2. For
example, if k = 5 indicators, what is the
maximum number of parameters that LISREL could estimate?
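The degrees-of-freedom bookkeeping can be sketched in a few lines of Python. A k × k covariance matrix supplies k(k + 1)/2 distinct elements (variances plus covariances), which caps the number of free parameters; the sketch reproduces the df = 2 reported for the one-factor Jobvalue model:

```python
def sem_df(k, t):
    """Model df: distinct covariance elements minus free parameters."""
    return k * (k + 1) // 2 - t

# One-factor Jobvalue model: k = 4 indicators; t = 8 free parameters
# (3 free loadings, 4 error variances, 1 factor variance)
print(sem_df(4, 8))

# With k = 5 indicators, requiring df >= 0 caps t at k(k + 1)/2
max_t = 5 * (5 + 1) // 2
print(max_t)
```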
A researcher’s strategy for finding a best-fitting LISREL model involves
using the fit function to conduct chi-square tests on a series of nested
models with successive parameter constraints. Ideally, poorer-fitting
models will be rejected in favor of alternative models yielding
improved fits to the data. The ultimate goal is to specify a best-fitting
LISREL model that cannot be rejected, indicating that the hypothesized
model’s covariance matrix closely approximates the observed covariance matrix.
(This strategy is opposite to the conventional chi-square testing approach
for a crosstab, where the goal is to reject the null hypothesis of
independence between variables.)
To use the minimum fit function in chi-square tests, a researcher chooses
an α level of significance at which to reject a hypothesized model; for
example, by setting α = .05. If the model χ² exceeds the (1 - α) percentile of
the chi-square distribution with d degrees of freedom, then that model
must be rejected as producing a poor fit to the observed variance-
covariance matrix. For example, a model is a poor fit if p < .05; but if p >
.05, then the model would have a fit acceptable to a researcher setting the
region of rejection at α = .05.
In practice, a researcher who wants to find an acceptable latent structure
model (i.e., not seeking to reject the model) hopes to obtain a low chi-
square value relative to the degrees of freedom. Because the minimum fit
function χ² test statistic increases in proportion to sample size, (N-1),
obtaining low chi-square values with large samples often proves difficult.
Many analysts come to regard chi-square more usefully as an overall
“goodness-of-fit” measure rather than as a test statistic. That is, χ²
measures the distance (difference) between the sample covariance matrix
and the expected covariance matrix, (S - Σ). Jöreskog and Sörbom half-
jokingly refer to χ² as a “badness of fit” measure in the sense that a large
chi-square corresponds to a bad fit and a low chi-square to a good fit. A
zero χ² is a “perfect” fit.
LISREL prints several goodness-of-fit measures that are functions of chi-
square. Two measures that do not depend explicitly on sample size
measure how much better the specified model fits the data, compared to
no model at all. Both indices range between 0 and 1, with values closer to
1 indicating a better fit of model to data. Most researchers seek values of
0.95 or higher.
The goodness-of-fit index (GFI) is:
GFI = 1 - F[S, Σ(θ̂)] / F[S, Σ(0)]
where the numerator is the minimum of the hypothesized model’s fit
function and the denominator is the fit function for a model whose
parameters all equal zero ( the null hypothesis model). (This latter model is
conceptually equivalent to the “constant only” equation in logistic
regression or EHA, used to calculate the model chi-square value for a
hypothesized equation.)
The adjusted goodness-of-fit index (AGFI) deflates the GFI by taking into
account the degrees of freedom consumed in estimating the parameters:
AGFI = 1 - [k(k+1)/2d](1 - GFI)
where k = the number of observed indicators and d is the model df.
Using chi-square as a test statistic assumes that the model holds exactly in
the population, an implausible assumption. Models that hold
approximately in the population will be rejected for large samples.
An alternative approach takes into account the errors of approximation in
the population and the precision of the fit measure. The estimated
population discrepancy function (PDF) is defined as:
F̂₀ = Max{F̂ - d/(N-1), 0}
where Fˆ = the minimum value of the fit function and d = degrees of
freedom.
Because the PDF usually decreases as additional parameters are added to
the model, the Root Mean Square Error of Approximation (RMSEA)
measures the discrepancy per degree of freedom:
ε = √(F̂₀/d)
A RMSEA value of ε ≤ 0.05 indicates a “close fit”, while values up to 0.08
indicate “reasonable” errors of approximation in the population. A 90-
percent confidence interval for RMSEA indicates whether the sample point
estimate falls into a range that also includes the 0.05 criterion.
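These two formulas can be sketched in a few lines of Python, plugged with the one-factor Jobvalue values reported in the output below (minimum fit function value 0.0074, d = 2, N = 1129); the result matches the printed RMSEA of 0.052 up to rounding of the inputs:

```python
from math import sqrt

def rmsea(fit_value, d, n):
    """RMSEA point estimate from the minimum fit function value."""
    f0 = max(fit_value - d / (n - 1), 0.0)  # population discrepancy F0
    return sqrt(f0 / d)

print(round(rmsea(0.0074, 2, 1129), 3))
```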
Here are all the Goodness of Fit Statistics for the one-factor Jobvalue
model. Can you conclude that the one-factor model is a good fit?
Goodness of Fit Statistics
Degrees of Freedom = 2
Minimum Fit Function Chi-Square = 8.31 (P = 0.016)
Normal Theory Weighted Least Squares Chi-Square = 8.11 (P = 0.017)
Estimated Non-centrality Parameter (NCP) = 6.11
90 Percent Confidence Interval for NCP = (0.78 ; 18.91)
Minimum Fit Function Value = 0.0074
Population Discrepancy Function Value (F0) = 0.0054
90 Percent Confidence Interval for F0 = (0.00069 ; 0.017)
Root Mean Square Error of Approximation (RMSEA) = 0.052
90 Percent Confidence Interval for RMSEA = (0.019 ; 0.092)
P-Value for Test of Close Fit (RMSEA < 0.05) = 0.39
Expected Cross-Validation Index (ECVI) = 0.021
90 Percent Confidence Interval for ECVI = (0.017 ; 0.033)
ECVI for Saturated Model = 0.018
ECVI for Independence Model = 0.88
Chi-Square for Independence Model with 6 Degrees of Freedom = 987.81
Independence AIC = 995.81
Model AIC = 24.11
Saturated AIC = 20.00
Independence CAIC = 1019.92
Model CAIC = 72.34
Saturated CAIC = 80.29
Normed Fit Index (NFI) = 0.99
Non-Normed Fit Index (NNFI) = 0.98
Parsimony Normed Fit Index (PNFI) = 0.33
Comparative Fit Index (CFI) = 0.99
Incremental Fit Index (IFI) = 0.99
Relative Fit Index (RFI) = 0.97
Critical N (CN) = 1251.05
Root Mean Square Residual (RMR) = 0.0087
Standardized RMR = 0.018
Goodness of Fit Index (GFI) = 1.00
Adjusted Goodness of Fit Index (AGFI) = 0.98
Parsimony Goodness of Fit Index (PGFI) = 0.20
Although the chi-square test is significant (p = .016), narrowly failing the
conventional p > .05 criterion, the other fit statistics suggest a quite
acceptable fit of the single-factor model to the data.
A TWO-FACTOR MODEL
My initial effort to fit a single-factor model using all seven GSS job items
produced a terrible fit: Chi-Square = 828.1, df = 14, p <.000; GFI = 0.83;
AGFI = 0.65; and RMSEA = 0.23. However, the underlying attitude structure
may consist of two intercorrelated latent factors, each of which influences
the variation in different subsets of observed variables. Such a
specification could resemble this diagram:
After trying several alternative model specifications, I discovered that two
latent factors could plausibly account for the covariations among five of
the seven indicators: (1) a subset consisting of SECJOB HIINC PROMOTN;
and (2) another subset of HLPOTHS HLPSOC, which may reflect “intrinsic”
job rewards from helping others or accomplishing something worthwhile.
The new LISREL commands are:
Latent Variables: Jobval1 Jobval2
Relationships:
PROMOTN = 1*Jobval1
SECJOB HIINC = Jobval1
HLPOTHS = 1*Jobval2
HLPSOC = Jobval2
[Diagram: two correlated latent factors, Jobvalue1 and Jobvalue2, each with arrows to a subset of the generic indicators X1–X8]
Latent Variables assigns two distinct construct names, while Relationships
specifies the pair of reference indicators and identifies the other variables’
factor loadings. Here’s a diagram of that model specification:
The overall fit statistics indicate much better fit: Chi-Square = 27.7, df = 4, p
= .00; GFI = 0.99; AGFI = 0.96; and RMSEA = 0.073 (a “reasonable fit,” with
the 90% confidence interval from 0.049 to 0.099). Given a sample size of
more than a thousand, I would be tempted to stop trying to improve the fit.
However, I want to demonstrate how to use LISREL’s modification indexes
for clues about altering a model’s specification to fit the data better.
Modification Indexes
LISREL’s modification indexes are powerful diagnostic tools for identifying
which parameters might be added to a model (that is, set free rather than
constrained to equal 0). By adding “MI” to the “LISREL Output:” line,
modification values will be generated for every missing parameter. These
values are predictions about the decrease in model Chi-square that will
occur if a particular parameter were added to the model.
Here are two sets of MIs for the two-factor model above:
[Diagram: Jobval1 with indicators SECJOB, HIINC, and PROMOTN; Jobval2 with indicators HLPOTHS and HLPSOC]
Modification Indices and Expected Change
Modification Indices for LAMBDA-X
Jobval1 Jobval2
-------- --------
SECJOB - - 0.03
HIINC - - 15.63
PROMOTN - - 11.71
HLPOTHS - - - -
HLPSOC - - - -
Modification Indices for THETA-DELTA
SECJOB HIINC PROMOTN HLPOTHS HLPSOC
-------- -------- -------- -------- --------
SECJOB - -
HIINC 11.71 - -
PROMOTN 15.63 0.03 - -
HLPOTHS 7.63 4.45 13.81 - -
HLPSOC 9.55 0.65 2.19 - - - -
The MI for LAMBDA-X indicates that adding an arrow from Jobval2 to HIINC
should reduce Chi-square by 15.63. Similarly, three MIs in THETA-DELTA
indicate that correlating pairs of errors would improve Chi-square by more
than 10.00. Because I wanted to allow each indicator to load on just one
factor, I chose the latter respecification. Correlating two error terms will
use one of the four degrees of freedom, but should produce a much better
fit. Although the PROMOTN-SECJOB pair has the largest value, they are
indicators of the same unmeasured construct. My attempt to correlate
them produced some unusual estimates, so instead I correlated the errors
of the PROMOTN and HLPOTHS indicators.
Because LISREL computes the MIs independently of one another, you
generally should make only one parameter change at a time. Then, use
the new MI results to decide which further changes to try.
Inserted before the last line, the SIMPLIS command to correlate the errors
of two indicators closely resembles natural language:
Let the Errors of HLPOTHS and PROMOTN Correlate
End of Problem
This new model’s fit statistics are better: Chi-Square = 13.9, df = 3, p < .003;
GFI = 1.00; AGFI = 0.98; RMSEA = 0.057 (“reasonable fit”), with 90% CI
from 0.029 to 0.089. So let’s examine the diagram with the completely
standardized values attached to the unconstrained parameters:
The small but significant correlated errors of HLPOTHS and PROMOTN
(0.08) suggest that their covariation arises from an additional unspecified
common source. PROMOTN now clearly has the highest factor loading
on Jobval1 (and thus the highest reliability), while HLPSOC is the most
reliable Jobval2 indicator. My substantive interpretation is that the second
factor represents an “intrinsic rewards” dimension, in contrast to the
“extrinsic rewards” dimension of the first factor. The two latent factors
correlate moderately (0.32), indicating that respondents who report
extrinsic job rewards as important to them also tend to view intrinsic
values as important. What substantive interpretation might you venture
about the correlated errors?
A MIMIC MODEL
LISREL also can be used to estimate several regression-like structural
equation models, in which one or more dependent variables are predicted
by several independent variables. Some or all of these variables may be
latent constructs with two or more observed indicators. Structural
equation models combine two conceptually distinct levels of analysis -- a
measurement level and a structural level.
In parallel to confirmatory factor analysis, the parameter estimates at the
measurement level show how well (or poorly) the observed variables serve
as indicators of the unobserved theoretical concepts. Parameters at the
structural level show the magnitudes and significance of the hypothesized
relations among the latent concepts. And, again in common with factor
analysis, the various goodness-of-fit statistics reveal how well the
combined measurement and structural equation models reproduce the
matrix of covariances among the indicators.
Our first example of a structural equation model is a Multiple Indicator-
Multiple Cause (MIMIC) model. This model’s relationships involve a latent
dependent variable, indicated by several observed measures, that is
predicted by a set of exogenous or predetermined variables, each of which
has just one indicator (see SSDA pp. 475-8). These predictors can be
termed “directly observed variables.” More complex models discussed
below have multiple indicators for both independent and dependent
variables.
The MIMIC example involves four indicators of attitudes towards the
federal government’s role in solving social problems, using 1998 GSS.
Each observed variable is measured on a five-point scale where: “I
strongly agree with [the governmental involvement position]” is 1, “I
strongly agree with [the individualist position]” is 5 and “I agree with both
answers” is 3. The four item wordings:
• HELPPOOR: I'd like to talk with you about issues some people tell us are important.
Please look at CARD AT. Some people think that the government in Washington should
do everything possible to improve the standard of living of all poor Americans; they are
at Point 1 on this card. Other people think it is not the government's responsibility, and
that each person should take care of himself; they are at Point 5.
• HELPNOT: Now look at CARD AU. Some people think that the government in
Washington is trying to do too many things that should be left to individuals and private
businesses. Others disagree and think that the government should do even more to solve
our country's problems. Still others have opinions somewhere in between.
• HELPSICK: Look at CARD AV. In general, some people think that it is the
responsibility of the government in Washington to see to it that people have help in
paying for doctors and hospital bills. Others think that these matters are not the
responsibility of the federal government and that people should take care of these things
themselves.
• HELPBLK: Now look at CARD AW. Some people think that (Blacks/Negroes/African-
Americans) have been discriminated against for so long that the government has a special
obligation to help improve their living standards. Others believe that the government
should not be giving special treatment to (Blacks/Negroes/African-Americans).
The four single-indicator independent variables are AGE, POLVIEWS,
EDUC and WHITE (a 1-0 dichotomy from recoding RACE(1=1)(2,3=0)).
Here’s a diagram of the model specification to be estimated:
Arrows from the latent construct (Help) to the four indicators are the
measurement level of analysis, while the arrows to Help coming directly
from the four independent variables occur at the structural level.
[Diagram: AGE, POLVIEWS, EDUC, and WHITE each with a single arrow to the latent Help construct, which in turn points to HELPPOOR, HELPNOT, HELPSICK, and HELPBLK]
After recoding missing values to –999 and writing a raw data file
(HELP.TXT), I used the following PRELIS commands to create the
covariance matrix for input to LISREL:
PRELIS FOR GOVERNMENT HELP (SAVED IN HELP.PR2)
DATA NI=8 NO=2832 MI=-999 TR=LI
RAW FI=HELP.TXT FO
(8F5.0)
LABELS
HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS WHITE
CONTINUOUS HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS
ORDINAL WHITE
OUTPUT MATRIX=CM SM=HELP.MAT
Note designation of WHITE as an ordinal variable. PRELIS distinguishes
only three levels of measurement – continuous, ordinal, and “censored” –
so any measures that you are unwilling to consider as continuous
(including dummy variables) should be labeled as ordinal. PRELIS will
compute: (1) Pearson product-moment correlations for pairs of continuous
variables; (2) polyserial correlations for an ordinal-continuous pair; and (3)
polychoric correlations for pairs of ordinal variables.* More below.
The covariance matrix (HELP.MAT), based on a pairwise average of about
1,654 cases:
HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS WHITE
-------- -------- -------- -------- ------- ------- -------- -------
HELPBLK 1.439
HELPNOT 0.519 1.378
HELPPOOR 0.521 0.609 1.300
HELPSICK 0.448 0.519 0.577 1.451
EDUC -0.168 0.319 0.403 0.115 8.133
AGE 1.080 1.956 1.214 2.176 -7.113 289.143
POLVIEWS 0.351 0.382 0.371 0.394 -0.159 1.904 1.917
WHITE 0.379 0.369 0.307 0.231 0.529 3.246 0.170 1.000
-------- -------- -------- -------- ------- ------- -------- -------
Means 3.566 3.164 3.051 2.548 13.262 45.505 4.139 0.826
Std Dev 1.199 1.174 1.140 1.204 2.852 17.004 1.384 1.000
-------- -------- -------- -------- ------- ------- -------- -------
*For details, see Jöreskog and Sörbom. 1996. PRELIS2: User’s Reference Guide.
Chicago: SSI Scientific Software International. Pp. 18-25.
The LISREL commands are similar to a factor analysis, except the latent
dependent variable (Help) is regressed directly onto the four observed
independent variables (EDUC AGE POLVIEWS WHITE):
MIMIC MODEL FOR 4 HELP VARS (SAVED IN MIMIC.LS8)
Observed variables:
HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS WHITE
Covariance Matrix From File: HELP.MAT
Sample Size: 1654
Latent Variables: Help
Relationships:
HELPPOOR = 1*Help
HELPNOT = Help
HELPSICK = Help
HELPBLK = Help
Help = EDUC AGE POLVIEWS WHITE
LISREL Output: SC MI
End of Problem
The minimum fit function chi-square test = 99.1 for 14 degrees of freedom,
which is not a good fit; however, the large sample size makes model
rejection relatively easy. Other statistics suggest a somewhat better fit:
GFI = 0.99 and AGFI = 0.96. Moreover, the root mean square error of
approximation (0.061) falls into the intermediate range of a “reasonable fit”
(the 90% confidence interval for RMSEA is 0.050 to 0.072).
A portion of the modification indices (MI):
Modification Indices for THETA-DELTA-EPS
HELPBLK HELPNOT HELPPOOR HELPSICK
-------- -------- -------- --------
EDUC 48.29 3.31 16.33 0.02
AGE 8.78 1.06 0.22 5.37
POLVIEWS 0.00 0.11 0.86 2.22
WHITE 34.56 1.89 9.41 14.49
which implies that the error term of HELPBLK correlates significantly with
both WHITE and EDUC and together may account for most of the model
chi-square.
To improve the model fit, I added these commands and re-ran the analysis:
Let the Errors between WHITE and HELPBLK Correlate
Let the Errors between EDUC and HELPBLK Correlate
End of Problem
The minimum fit function chi-square test now falls to 30.55 for 12 df, which
has a probability = .002. But the other fit statistics attained very desirable
values: GFI = 1.00, AGFI = 0.99, and RMSEA = 0.030, a “close” fit (90% CI
from 0.017 to 0.044).
Here is the completely standardized solution:
At the measurement level, the indicators of the latent Help construct all
have highly significant factor loadings (p<.001), and roughly equal
magnitudes.
At the structural level, three of the four path coefficients are highly
significant (p<.001), according to t-tests in the LISREL output. The AGE
effect (0.05) does not differ from zero at α= .05 for a two-tailed research
hypothesis. POLVIEWS and WHITE have much stronger standardized
effects (0.32 and 0.34 standard deviations) than EDUC (0.10) on the latent
Help construct. The predictors jointly explain about 27 percent of the
variation in the Help construct (R² = .27).
High scores on the four social problems variables mean that respondents
prefer the individual or nongovernmental solutions to social problems.
Therefore, the positive path coefficients reveal that conservatives, whites,
and highly educated respondents were more likely to endorse such policy
positions.
The diagram does not show the two correlated error terms:
THETA-DELTA-EPS
HELPBLK HELPNOT HELPPOOR HELPSICK
-------- -------- -------- --------
EDUC -0.44 - - - - - -
(0.08)
-5.77
AGE - - - - - - - -
POLVIEWS - - - - - - - -
WHITE 0.11 - - - - - -
(0.03)
4.83
The error term for HELPBLK has significant correlations with the error
terms for both EDUC (-0.44) and WHITE (0.11). This re-specification was
estimated mainly to demonstrate a technique for improving a model’s
statistical fit. However, I didn’t have a priori reasons for correlating these
error terms, while post hoc interpretations – e.g., classist or racist
stereotypical responses – are vulnerable to taking advantage of chance
occurrences in the sample.
A CHAIN MODEL
A widespread application of LISREL involves models in which several or all
the latent constructs have multiple indicators. The simplest version is a
chain model, involving one independent and one dependent variable. The
example below modifies the preceding MIMIC model, by including a second
political measure (PARTYID) with POLVIEWS as indicators of unobserved
political ideology.
The LISREL commands:
CHAIN MODEL FOR POLITICS-HELP (SAVED IN CHAIN.LS8)
Observed variables:
HELPBLK HELPNOT HELPPOOR HELPSICK POLVIEWS PARTYID
Covariance Matrix From File: CHAIN.MAT
Sample Size: 1615
Latent Variables: Help Politic
Relationships:
HELPPOOR = 1*Help
HELPNOT = Help
HELPSICK = Help
HELPBLK = Help
POLVIEWS = Politic
PARTYID = Politic
Help = Politic
LISREL Output: SC MI
End of Problem
In the Relationships commands, I did not specify that one indicator of the
latent Politic construct should serve as the reference variable. Instead,
LISREL automatically set the scale by standardizing the variance of Politic
to 1.00. To estimate the structural-level regression coefficient, use a
command involving the latent constructs: “Help = Politic”.
The minimum fit function χ² = 15.9 for 8 df (p = .044). Other fit statistics --
GFI = 1.00; AGFI = 0.99; RMSEA = 0.025 -- indicate a very “close” fit. The
standard coefficients are all highly significant:
The structural parameter (0.63) indicates that a difference of one standard
deviation in political ideology accounts for a three-fifths standard deviation
difference in attitude towards the federal government’s role in solving
social problems. The positive sign means that more conservative
respondents favor more individualistic solutions.
The value of the error term for the Help construct (from the output, but not
shown in the diagram) is the square root of PSI: √0.60 = 0.77. Show that the
sum of the squared structural parameter plus its squared error term
accounts for all the variance of Help.
For each indicator in the diagram, show that the squared factor loading
plus the (squared) error term accounts for all of that indicator’s variance:
Indicator λ² + θδ = 1.00
HELPBLK 1.00
HELPNOT 1.00
HELPPOOR 1.00
HELPSICK 1.00
POLVIEWS 1.00
PARTYID 1.00
Equality Constraints on Parameters
The chain model above shows that the estimated parameters from the
unobserved constructs to the indicators have roughly similar magnitudes
(0.55 to 0.69). LISREL allows an explicit statistical test for the hypothesis
that one parameter equals another in the population. That is:
H₀: βᵢ = βⱼ
Hypothesizing that a pair of “paths” are equal (rather than free to take on
independent values) requires that LISREL estimate just one parameter
instead of two. As a result, one degree of freedom now becomes available
to test whether the two models’ chi-square goodness-of-fit statistics differ
at a chosen alpha-level. If no significant difference occurs in the free and
constrained pair of parameters, then the more parsimonious version with
equal parameters is accepted (i.e., the model with fewer free parameters).
The LISREL commands for constraining paths to equality have this syntax,
inserted before the “End of Problem” line:
Set Path Help -> HELPSICK = Path Help -> HELPNOT
To constrain multiple parameters, continue the command:
Set Path Help -> HELPSICK = Path Help -> HELPNOT = Path Help -> HELPBLK
Because the HELPPOOR indicator is already used as the reference variable
for Help, I did not include it in the constraints.
This table shows the chi-squares for alternative models constraining
several combinations of parameters among three Help indicators:
Equality constraints                          Model χ²   Model df
1. No equality constraints (baseline model)     15.90        8
2. HELPSICK = HELPBLK                           17.84        9
3. HELPNOT = HELPBLK                            23.64        9
4. HELPSICK = HELPNOT                           18.23        9
5. HELPSICK = HELPNOT = HELPBLK                 23.66       10
In comparison to the initial model with no equality constraints (#1),
specifying equal factor loadings doesn’t result in significantly worse fits to
the data. For example, the difference in χ² for model #2 versus model #1 is
(17.84 – 15.90) = 1.94 for 1 df. The critical value at α = .05 is 3.84, so we
cannot reject the null hypothesis that these two parameters are equal in the
population. Perform a chi-square difference test for model #1 versus
model #3 and decide whether the latter is the most parsimonious model.
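The mechanics of the difference test can be sketched in Python. For 1 df, the chi-square survival function has the closed form erfc(√(x/2)), so no statistics library is needed; the example reruns the model #2 comparison worked out above:

```python
from math import erfc, sqrt

def chi2_diff_test(chi2_constrained, chi2_free):
    """Chi-square difference between nested models and its p-value (1 df)."""
    diff = chi2_constrained - chi2_free
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2))
    return diff, erfc(sqrt(diff / 2.0))

# Model #2 (HELPSICK = HELPBLK) versus the baseline model #1
diff, p = chi2_diff_test(17.84, 15.90)
print(f"difference = {diff:.2f}, p = {p:.3f}")
# p > .05: the equality constraint does not significantly worsen the fit
```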
COMPARING MODELS FOR GROUPS
Researchers frequently want to compare structural equation models for
multiple (sub)groups, such as women vs. men, whites vs. blacks,
Republicans vs. Democrats. LISREL is a powerful tool for analyzing
multiple samples simultaneously, with some or all parameters constrained
to be equal across the groups. To illustrate this application, I compare the
preceding chain model for a subset of white and black respondents
(omitting the “other” race). In a variation on PRELIS-LISREL procedures,
these commands show how to enter separate matrices of correlations and
descriptive statistics into the command file, which LISREL will convert to
the respective covariances:
Group 1: CHAIN MODEL FOR POLITIC-HELP: WHITES
Observed variables:
PARTYID POLVIEWS HELPPOOR HELPNOT HELPSICK HELPBLK
Correlation Matrix:
1.000
.419 1.000
.190 .231 1.000
.226 .258 .450 1.000
.246 .242 .443 .350 1.000
.187 .232 .367 .337 .278 1.000
Means: 3.00 4.15 3.15 3.28 2.61 3.71
Standard Deviations: 1.92 1.40 1.11 1.13 1.21 1.14
Sample Size: 1282
Latent Variables: Help Politic
Relationships:
HELPPOOR = 1*Help
HELPNOT = Help
HELPSICK = Help
HELPBLK = Help
POLVIEWS = Politic
PARTYID = Politic
Help = Politic
Group 2: CHAIN MODEL FOR POLITIC-HELP: BLACKS
Correlation Matrix:
1.000
.111 1.000
.008 .138 1.000
.137 -.009 .365 1.000
.082 .091 .328 .389 1.000
.153 .065 .299 .369 .335 1.000
Means: 1.43 3.93 2.46 2.62 2.16 2.73
Standard Deviations: 1.51 1.27 1.14 1.19 1.15 1.24
Sample Size: 260
LISREL Output: SC MI
End of Problem
My initial comparison involves setting exactly equal corresponding
parameters across the two groups. Because the Relationships commands
appearing in the white specification do not also appear for the black group,
LISREL assumes by default that all pairs of parameters must be
constrained to equality.
The minimum fit function χ2 = 94.88 for 29 df (p < .0001). Other fit statistics
are GFI = 0.93 (AGFI is not calculated) and RMSEA = 0.052 (“reasonable”),
90% CI from 0.040 to 0.064. The structural coefficient for the regression of
Help on Politic = 0.53 for both groups.
My next model specification freed all the factor loadings and structural
regression parameters, by repeating the same Relationships for the black
group that appeared in the white group. In addition, I also allowed these
indicators’ error terms to vary freely across groups, and for the error term
of the Help construct to take on different values, by inserting these lines at
the end of both subgroups’ commands:
Set the Error Variances of HELPPOOR-HELPBLK Free
Set the Error Variances of PARTYID-POLVIEWS Free
Set the Error Variance of Help Free
This completely free model has χ2 = 35.27 for 16 df (p = .0037); GFI = 0.99
and RMSEA = 0.038 (“close fit”), 90% CI from 0.020 to 0.056. Here are some
completely standardized parameter estimates for the two racial groups:
Parameters Blacks Whites
HELPPOOR .68 .68
HELPNOT .84 .60
HELPSICK .70 .58
HELPBLK .74 .49
PARTYID .34 .63
POLVIEWS .25 .70
Politic -> Help .33 .57
Help Error Variance .56 .74
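In a completely standardized solution with uncorrelated indicator errors, an indicator’s error variance equals 1 minus its squared loading. A quick Python check using the HELPPOOR loading (.68) from the table above:

```python
# Completely standardized CFA: indicator error variance = 1 - loading^2
# (valid when the indicator's error is uncorrelated with the others).
loading = 0.68                    # HELPPOOR loading, both groups
error_variance = 1 - loading ** 2
print(round(error_variance, 3))   # 0.538
```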
The factor loadings for the two political indicators are substantially smaller
for blacks than for whites, perhaps reflecting the narrower span of black
political ideology. The estimated magnitude of the structural effect of
Politic on Help is much larger for whites (0.57) than for blacks (0.33).
Finally, to determine whether any parameters may be constrained to
equality without worsening the model fit, additional LISREL analyses can
be conducted that delete a single Relationship line from just one
subgroup. For example, to test for racial equality of the structural effect, I
deleted the “Help = Politic” line from the black subgroup, which forces
LISREL to replicate all the relationships in the second group except the
deleted one. The result was χ2 = 36.50 for 17 df, which does not differ
significantly from the model above (i.e., the difference in chi-squares is just
1.23 for one df). In other words, the population regression coefficients are
probably equal (both estimates = 0.56). Alternatively, the large standard
errors due to the small black sample size may have prevented rejection of
this null hypothesis.
A PATH MODEL
This section extends the preceding MIMIC and chain examples to a causal
model of the relationships among three unobservable constructs with
multiple indicators. I introduce some basic principles of path analysis (see
Chapter 11 in SSDA4 for more details). At the structural equation level, a
causal diagram is indispensable to displaying the hypothesized causal
effects among the latent constructs:
[Causal diagram: SES with arrows to both Politic and Help, and Politic with an arrow to Help.]
The diagram displays several path analytic principles:
BOX 11.1 Rules for Constructing Causal Diagrams
1. Variable names are represented either by short keywords or letters.
2. Variables placed to the left in a diagram are assumed to be causally
prior to those on the right.
3. Causal relationships between variables are represented by single-
headed arrows.
4. Variables assumed to be correlated but not causally related are
linked by a curved double-headed arrow.
5. Variables assumed to be correlated but not causally related should
be at the same point on the horizontal axis of the causal diagram.
6. The causal effect presumed between two variables is indicated by
placing + or - signs along the causal arrows to show how increases or
decreases in one variable affect the other.
• The model asserts that a respondent’s socioeconomic status (SES)
directly causes both political ideology (Politic) and attitude about the
governmental role (Help).
• SES also indirectly affects Help, via its impact on Politic (i.e.,
transmitted via the compound product of the path from SES to
Politic times the path from Politic to Help). [See SSDA Chapter 11
for a detailed discussion of disaggregating the covariation between
any pair of variables into direct, indirect, and so-called correlated
effects.]
• The unsourced arrows to the two endogenous variables (Politic and
Help) mean that other sources of their variation aren’t included in this
model. These residual effects presumably operate independently of
(i.e., are uncorrelated with) the explicitly included causes.
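The direct-indirect decomposition can be verified with a few lines of Python, using the standardized estimates reported later for this model (direct SES → Help = 0.11; SES → Politic = 0.14; Politic → Help = 0.60):

```python
# Decomposing the SES -> Help effect into direct and indirect components.
direct = 0.11
indirect = 0.14 * 0.60        # compound product along SES -> Politic -> Help
total = direct + indirect

print(round(indirect, 3))     # 0.084
print(round(total, 3))        # 0.194
```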
At the measurement equation level, each construct has two or more
observable indicators. The variables for Politic and Help are the same as
above; the three SES indicators are EDUC, INCOME98 (measured as
midpoints of the $000 ranges), and PRESTG80. Here are the LISREL commands:
CAUSAL MODEL FOR SES-POLITIC-HELP (SAVED IN PATH.LS8)
Observed variables:
HELPBLK HELPNOT HELPPOOR HELPSICK POLVIEWS PARTYID EDUC INCOME98 PRESTG80
Covariance Matrix From File: PATH.MAT
Sample Size: 1396
Latent Variables: Help Politic SES
Relationships:
HELPPOOR = Help
HELPNOT = Help
HELPSICK = Help
HELPBLK = Help
POLVIEWS = Politic
PARTYID = Politic
INCOME98 = SES
EDUC = SES
PRESTG80 = SES
Help = Politic
Help = SES
Politic = SES
Path Diagram
LISREL Output: SC MI
End of Problem
Although the minimum fit function is much higher than desirable (χ2 =
143.58 for 24 df; p < .0001), the other statistics indicate a good fit -- GFI =
0.98; AGFI = 0.96; RMSEA = 0.060 (“reasonable fit”), 90% CI from 0.051 to
0.069. All coefficients for the factor loadings and structural effects were
highly significant.
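The reported RMSEA can be checked against the chi-square with a short Python sketch, using the common single-group formula sqrt(max((χ² − df)/(df(N − 1)), 0)):

```python
import math

# RMSEA point estimate from the model chi-square (single-group formula).
def rmsea(chi2, df, n):
    return math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))

# Values reported for this path model: chi2 = 143.58, 24 df, N = 1396.
print(round(rmsea(143.58, 24, 1396), 3))   # matches the reported 0.060
```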
The completely standardized path coefficients appear in the output path diagram (not reproduced here).
The factor loadings for Politic and Help are similar to those in the chain
model. The three SES indicators also have comparably high loadings. At
the structural level, the strongest path coefficient is from Politic to Help, at
0.60 standard deviations. The direct path from SES to Help is just 0.11, and
its indirect path via Politic is almost as strong: (0.14)(0.60) = 0.08. Thus,
higher status persons are more politically conservative and support more
individualistic solutions to social policies.
MODEL IDENTIFICATION
To be estimable, a latent variable structural equation model must be
identified. Models are identified if one optimal (best) value exists for each
parameter whose value is not known. Models that are identified usually
converge to best estimates for these parameters. Models in which at least
one parameter does not have a unique solution are called “not identified”
or “underidentified.” Underidentification occurs when the specified
equations contain more unknowns to be estimated than known quantities.
To illustrate, does one unique pair of values for X and Y solve this
algebraic equation?
Y = 4 + 3X
(No -- infinitely many (X, Y) pairs satisfy it.)
However, if we add a second equation, the system of two equations
becomes “just identified”: given two equations with two unknowns,
unique X and Y values are easily calculated:
Y = 14 – 2X
(HINT: set the two righthand sides equal and solve for X.) This example
illustrates a “just identified” model because it has exactly as many knowns
as unknowns and yields precise estimates.
If we include a third equation in the system, such as Y = 2 + 4X, this
“overidentified” system would still allow us to solve for precise X and Y
values using three different pairs of equations. However, if that third
equation were Y = 2 + 2X, would an exact solution be possible? SEM
computer programs iteratively calculate overidentified parameter estimates
that minimize discrepancies between the observed and expected
covariance matrices, (S - Σ(θ)).
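The algebra of this identification example, worked in Python:

```python
# Solve the just-identified pair Y = 4 + 3X and Y = 14 - 2X by equating
# the right-hand sides: 4 + 3X = 14 - 2X, so 5X = 10.
x = (14 - 4) / (3 + 2)   # X = 2
y = 4 + 3 * x            # Y = 10
print(x, y)              # 2.0 10.0

# The overidentifying equation Y = 2 + 4X is consistent with this solution
# (2 + 4*2 = 10), but Y = 2 + 2X would not be (2 + 2*2 = 6), so no exact
# solution exists -- the analogue of minimizing (S - Sigma(theta)).
print(2 + 4 * x == y, 2 + 2 * x == y)   # True False
```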
The known values in structural equation models are the observed
variances and covariances, while the unknowns are those model
parameters you allow to vary freely. For example, if we have 5 indicators,
the covariance matrix contains ((5)(5+1))/2 = 15 nonduplicate values.
Hence, a CFA or SEM would not be identified if you specify more than 15
free parameters to be estimated.
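This counting check (sometimes called the t-rule) is a necessary but not sufficient condition for identification; a minimal Python sketch:

```python
# Counting rule for identification: the number of free parameters must not
# exceed the number of nonduplicate variances and covariances. This is
# necessary but not sufficient for identification.
def known_moments(p):
    """Nonduplicate elements of a p x p covariance matrix: p(p + 1)/2."""
    return p * (p + 1) // 2

def t_rule_ok(n_indicators, n_free_parameters):
    return n_free_parameters <= known_moments(n_indicators)

print(known_moments(5))      # 15
print(t_rule_ok(5, 16))      # False: more unknowns than knowns
print(t_rule_ok(5, 15))      # True: at best just identified
```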
SEM programs usually provide information about a model’s identifiability.
If one or more parameters are not identified, the program will be unable to
produce standard errors for the parameter estimates. In such cases, you
should try to identify the model by placing additional constraints on
appropriate parameters (consistent with theory). For example, set some
parameter value(s) equal to 0, 1, or equal to another free parameter. Run
the model again to see whether it yields a complete set of estimates.
SEM experts disagree on whether computer programs can be trusted to
find all instances of nonidentification, particularly for very complex models.
That is, programs sometimes produce standard errors for models that are
not identified. Purists argue that researchers should check whether both
the measurement and structural models are separately identified prior to
submitting a SEM model for computer estimation. Researchers can study
the necessary-and-sufficient rules and requirements for model
identification. Two good sources are: (1) Kenneth Bollen. 1989. Structural
Equations with Latent Variables. New York: Wiley. (2) David A. Kenny’s
Rules for Identification webpage <http://w3.nai.net/~dakenny/identify.htm>.
For this course, identification is unlikely to be problematic for the types of
CFAs and SEMs that most students will estimate -- multiple-indicator
recursive models, in which every observed variable loads on only one
latent construct, and one indicator per construct is fixed to 1.0 to set the
latent factor’s scale.
FACTOR ANALYSIS OF DICHOTOMIES
If a LISREL analysis includes some or all variables measured at the ordinal
or discrete (dichotomous) level, computing a covariance or Pearson
correlation matrix from such scores and applying maximum likelihood
estimation (MLE) may lead to distorted parameter estimates and inaccurate
test statistics. Jöreskog and Sörbom recommend alternative correlation
coefficients and estimation methods for such situations.
An observed variable whose categories represent a set of ordered
categories might be viewed as a crude classification of an unobserved
(latent) continuous variable z* with a standard normal distribution. For
example, a low-medium-high measure X could be trichotomized at two
threshold values for z*:
X is scored 1 if z* ≤ α1
X is scored 2 if α1 < z* ≤ α2
X is scored 3 if α2 < z*
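This threshold classification can be sketched in Python; the threshold values below are illustrative, not estimates from the text:

```python
# The observed ordinal score is a crude trichotomy of a latent
# standard-normal z*. ALPHA1 and ALPHA2 are hypothetical thresholds.
ALPHA1, ALPHA2 = -0.5, 0.8

def observed_score(z_star):
    if z_star <= ALPHA1:
        return 1                  # scored 1 if z* <= alpha1
    elif z_star <= ALPHA2:
        return 2                  # scored 2 if alpha1 < z* <= alpha2
    return 3                      # scored 3 if alpha2 < z*

print([observed_score(z) for z in (-1.2, 0.0, 1.5)])   # [1, 2, 3]
```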
A variety of correlation coefficients can be calculated when one or both
observed variables are ordinal:
• Polychoric correlation coefficient for two ordinal variables assumes
their underlying continuous measures have a bivariate normal
distribution
• Tetrachoric correlation, a subtype of polychoric, is used for two
dichotomies
• Polyserial correlation involves an ordinal and a continuous variable,
and also assumes an underlying bivariate normal distribution
• Biserial correlation, a subtype of polyserial, is used for a
dichotomous and a continuous variable
To include an ordinal variable in a linear relationship, LISREL iteratively
computes polychoric and polyserial correlations not from the observed
scores but from the theoretical correlations of the underlying z* variables.
A matrix of estimated correlation coefficients is created from the separate
crosstabulations for every pair of observed continuous, ordinal, or
dichotomous variables.
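For intuition about the tetrachoric case, Pearson’s rough “cosine-pi” approximation can be computed directly from the four cell counts of a 2×2 table. Note that PRELIS/LISREL actually estimate these correlations by maximum likelihood from the crosstabulations, so this is only a sketch, and the counts below are hypothetical:

```python
import math

# "Cosine-pi" approximation to the tetrachoric correlation for a 2x2 table
# with concordant counts a, d and discordant counts b, c (hypothetical data;
# PRELIS uses maximum likelihood, not this shortcut).
def tetrachoric_approx(a, b, c, d):
    if b == 0 or c == 0:
        return 1.0
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))

print(round(tetrachoric_approx(40, 10, 10, 40), 2))   # 0.81
```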
As an alternative to MLE, LISREL obtains correct large-sample standard
errors and chi-square values using a weighted least squares (WLS)
estimation method. A weight matrix required for WLS is the inverse of an
estimated asymptotic covariance matrix (W) of polychoric and polyserial
correlations. This inversion will be performed by LISREL, based on input
of a W matrix generated by PRELIS and stored on the computer as a binary
file. PRELIS computes estimates of the asymptotic covariances of the
correlations when instructed:
OUTPUT MATRIX=PM SM=FILE.MAT SA=FILE.ACM PA
The PM option instructs PRELIS to compute a matrix of polychoric
correlations when some or all variables have been declared as ordinal; SM
saves this correlation matrix in a first named file with extension “mat”; SA
saves the asymptotic covariance matrix in another file with extension
“acm”; and PA tells it to write the W matrix in the PRELIS output file.
To illustrate, I analyze the 13 GSS2000 items on confidence in
institutions, whose responses were recoded into dichotomies (1 = a great
deal of confidence; 0 = only some or hardly any confidence):
165. I am going to name some institutions in this country. As far as the people running these
institutions are concerned, would you say you have a great deal of confidence, only some
confidence, or hardly any confidence at all in them?
CONFINAN: Banks and financial institutions
CONBUS: Major companies
CONCLERG: Organized religion
CONEDUC: Education
CONFED: Executive branch of the federal government
CONLABOR: Organized labor
CONPRESS: Press
CONMEDIC: Medicine
CONTV: TV
CONJUDGE: U.S. Supreme Court
CONSCI: Scientific Community
CONLEGIS: Congress
CONARMY: Military
The research question is whether a single factor or multiple factors are
required to represent the tetrachoric correlations among these
dichotomies.
These SPSS commands recode all 13 indicators into dichotomies, replace
all missing values with –999, and write the raw datafile to
“CONFIDE.TXT”:
MISSING VALUES confinan to conarmy ().
RECODE confinan to conarmy (2,3=0)(1=1)(ELSE=-999).
FREQ VAR = confinan to conarmy.
WRITE OUTFILE = CONFIDE.TXT/confinan to conarmy (13F5.0).
EXECUTE.
These PRELIS commands estimate the asymptotic covariance matrix for
listwise deletion of cases:
PRELIS FOR 13 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.PR2)
DATA NI=13 NO=2817 MI=-999 TR=LI
RAW-DATA-FROM FILE=CONFIDE.TXT
LABELS
FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY
ORDINAL FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY
OUTPUT MATRIX=PM SM=CONFIDE.MAT SA=CONFIDE.ACM PA
Here’s a matrix of tetrachoric correlations among 8 indicators analyzed
below, based on N = 1,496 cases:
FINAN FED PRESS MEDIC TV JUDGE SCI LEGIS
------- ------ ------- ------- ------- ------- ------ ------
FINAN 1.00
FED 0.33 1.00
PRESS 0.32 0.40 1.00
MEDIC 0.41 0.38 0.35 1.00
TV 0.32 0.32 0.61 0.41 1.00
JUDGE 0.39 0.52 0.37 0.35 0.23 1.00
SCI 0.43 0.33 0.22 0.45 0.15 0.55 1.00
LEGIS 0.49 0.72 0.47 0.41 0.41 0.68 0.40 1.00
In contrast, the Pearsonian correlation coefficients, which treat the
confidence dichotomies as continuous variables, are substantially smaller:
FINAN FED PRESS MEDIC TV JUDGE SCI LEGIS
------- ------ ------- ------- ------- ------- ------ ------
FINAN 1.00
FED 0.18 1.00
PRESS 0.16 0.20 1.00
MEDIC 0.25 0.19 0.17 1.00
TV 0.16 0.14 0.33 0.19 1.00
JUDGE 0.25 0.28 0.19 0.22 0.11 1.00
SCI 0.27 0.17 0.10 0.30 0.07 0.36 1.00
LEGIS 0.26 0.45 0.24 0.20 0.19 0.38 0.20 1.00
For a single-factor model, LISREL automatically applies WLS when
instructed to read the asymptotic covariance matrix:
LISREL FOR 13 GSS00 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.LS8)
Observed variables:
FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY
Correlation Matrix from File: CONFIDE.MAT
Asymptotic Covariance Matrix from File: CONFIDE.ACM
Sample size: 1496
Latent Variables: Confide1
Relationships:
FINAN - ARMY = Confide1
Path Diagram
LISREL Output: SC MI
End of Problem
This single-factor model yielded an absurdly high χ2 = 2,063.1 (14 df; p <
.0001), and two other statistics indicated poor fits -- GFI = 0.83; AGFI
= 0.76. However, RMSEA = 0.053 (“reasonable fit”), 90% CI from 0.048 to
0.059.
I decided to estimate a multi-factor model, incrementally adding an indicator
and observing the change in fit statistics. After several trials, I concluded
that a model having three factors with 8 indicators and four correlated error
terms produced the best fit. Here are the commands:
LISREL FOR 13 GSS00 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.LS8)
Observed variables:
FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY
Correlation Matrix from File: CONFIDE.MAT
Asymptotic Covariance Matrix from File: CONFIDE.ACM
Sample size: 1496
Latent Variables: Confide1 Confide2 Confide3
Relationships:
LEGIS FED JUDGE = Confide1
TV PRESS = Confide2
FINAN MEDIC SCI = Confide3
Let the Errors between JUDGE and SCI Correlate
Let the Errors between MEDIC and PRESS Correlate
Let the Errors between FINAN and FED Correlate
Let the Errors between JUDGE and PRESS Correlate
Path Diagram
LISREL Output: SC MI
End of Problem
The factor loadings are high, reflecting the tetrachoric correlations on
which they are based. The three factors are strongly correlated: r12 = 0.59,
r23 = 0.59, and r13 = 0.71. As usual, the substantive meanings of latent
constructs can be inferred by the contents of the specific measures which
load on them. What dimensions, if any, do you conjecture?
The importance of using WLS in conjunction with the asymptotic
covariance matrix becomes evident when the preceding analysis uses only
the tetrachoric correlation matrix. The fit statistics are much worse: χ2 =
159.3 (13 df; p < .0001) and RMSEA = 0.087.
SEM WITH ORDINAL VARIABLES
Continuing to examine how LISREL handles ordinal indicators, I next
estimate a structural equation model that treats all indicators as ordinal
variables. The data are from GSS98. The dependent variable is a latent
abortion attitude construct with three dichotomous indicators of legal
abortions for specific circumstances (1 = yes, 0 = no):
Please tell me whether or not you think it should be possible for a pregnant woman to obtain
a legal abortion if. . .
ABDEFECT: If there is a strong chance of serious defect in the baby?
ABHLTH: If the woman's own health is seriously endangered by the pregnancy?
ABRAPE: If she became pregnant as a result of rape?
The two independent constructs affecting variation in abortion attitude are:
(1) a political orientation construct with indicators PARTYID and
POLVIEWS, each having seven ordered categories; and (2) a religiosity
construct consisting of two items having six and eight ordered categories,
respectively:
PRAY: About how often do you pray?
PRIVPRAY: How often do you pray privately in places other than at church or synagogue?
The structural model has a curved two-headed arrow to indicate the
antecedent constructs are correlated but not causally related:
[Diagram: Prays and Politics linked by a curved two-headed arrow, each with a causal arrow to Abort.]
After recoding the two religiosity indicators to make frequent praying the
high scores, I used SPSS to write the datafile. Next, these PRELIS
commands created the correlation matrix and ACM files:
PRELIS FOR PRAY-ABORTION VARIABLES (PRAY.PR2)
DATA NI=7 NO=2832 MISSING=-999 TR=LI
RAW-DATA-FROM FILE=PRAY.TXT
LABELS
ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS
ORDINAL ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS
OUTPUT MATRIX=PM SM=PRAY.MAT SA=PRAY.ACM PA
The polychoric correlation matrix (tetrachoric for the dichotomous pairs):
ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS
-------- -------- -------- ------- -------- -------- --------
ABDEFECT 1.000
ABHLTH 0.866 1.000
ABRAPE 0.802 0.859 1.000
PRAY -0.233 -0.289 -0.263 1.000
PRIVPRAY -0.295 -0.351 -0.303 0.824 1.000
PARTYID -0.243 -0.105 -0.166 0.075 0.124 1.000
POLVIEWS -0.068 -0.085 -0.120 -0.001 0.061 0.434 1.000
Here are the commands:
LISREL FOR GSS98 PRAY-POLITICS-ABORTION MODEL
Observed variables:
ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS
Correlation Matrix from File: PRAY.MAT
Asymptotic Covariance Matrix from File: PRAY.ACM
Sample size: 598
Latent Variables: Abortion Prays Politics
Relationships:
ABDEFECT ABHLTH ABRAPE = Abortion
PRAY PRIVPRAY = Prays
PARTYID POLVIEWS = Politics
Abortion = Prays
Abortion = Politics
Path Diagram
LISREL Output: SC MI
End of Problem
The path diagram displays the completely standardized estimates (not reproduced here).
Notice that the error term for PRIVPRAY is negative (-0.02), a nonsensical
value. To constrain that parameter, at the cost of 1 df, include this LISREL
command:
Let the Error Variance of PRIVPRAY Equal 0
What are your substantive interpretations about the relative impacts of
political orientation and religiosity on abortion attitudes?
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...shivangimorya083
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfGale Pooley
 

Recently uploaded (20)

Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
Andheri Call Girls In 9825968104 Mumbai Hot Models
Andheri Call Girls In 9825968104 Mumbai Hot ModelsAndheri Call Girls In 9825968104 Mumbai Hot Models
Andheri Call Girls In 9825968104 Mumbai Hot Models
 
Log your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaignLog your LOA pain with Pension Lab's brilliant campaign
Log your LOA pain with Pension Lab's brilliant campaign
 
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Saharanpur Anushka 8250192130 Independent Escort Se...
 
VIP Call Girls in Saharanpur Aarohi 8250192130 Independent Escort Service Sah...
VIP Call Girls in Saharanpur Aarohi 8250192130 Independent Escort Service Sah...VIP Call Girls in Saharanpur Aarohi 8250192130 Independent Escort Service Sah...
VIP Call Girls in Saharanpur Aarohi 8250192130 Independent Escort Service Sah...
 
Chapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th editionChapter 2.ppt of macroeconomics by mankiw 9th edition
Chapter 2.ppt of macroeconomics by mankiw 9th edition
 
(TANVI) Call Girls Nanded City ( 7001035870 ) HI-Fi Pune Escorts Service
(TANVI) Call Girls Nanded City ( 7001035870 ) HI-Fi Pune Escorts Service(TANVI) Call Girls Nanded City ( 7001035870 ) HI-Fi Pune Escorts Service
(TANVI) Call Girls Nanded City ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
VIP Call Girls LB Nagar ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With Room...
 
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111Call Girls In Yusuf Sarai Women Seeking Men 9654467111
Call Girls In Yusuf Sarai Women Seeking Men 9654467111
 
Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024Commercial Bank Economic Capsule - April 2024
Commercial Bank Economic Capsule - April 2024
 
20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf20240429 Calibre April 2024 Investor Presentation.pdf
20240429 Calibre April 2024 Investor Presentation.pdf
 
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in  Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Nand Nagri (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service AizawlVip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
Vip B Aizawl Call Girls #9907093804 Contact Number Escorts Service Aizawl
 
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
Independent Call Girl Number in Kurla Mumbai📲 Pooja Nehwal 9892124323 💞 Full ...
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
 
New dynamic economic model with a digital footprint | European Business Review
New dynamic economic model with a digital footprint | European Business ReviewNew dynamic economic model with a digital footprint | European Business Review
New dynamic economic model with a digital footprint | European Business Review
 
Stock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdfStock Market Brief Deck for 4/24/24 .pdf
Stock Market Brief Deck for 4/24/24 .pdf
 
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
Russian Call Girls In Gtb Nagar (Delhi) 9711199012 💋✔💕😘 Naughty Call Girls Se...
 
The Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdfThe Economic History of the U.S. Lecture 17.pdf
The Economic History of the U.S. Lecture 17.pdf
 
Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024Veritas Interim Report 1 January–31 March 2024
Veritas Interim Report 1 January–31 March 2024
 

Depending on the direction of the covariance of Y and X, a bivariate regression slope may have a positive or negative sign, indicating the direction of the relationship between Y and X in the population.

In a bivariate regression the population coefficient of determination, ρ², indicates the proportion of total variation in Y that is determined by its linear relationship with X. One of its formulas (see SSDA4 p. 184 for details) involves the ratio of the squared covariance to the product of both variances:

\rho_{YX}^2 = \frac{\sigma_{YX}^2}{\sigma_Y^2 \sigma_X^2}

Because of the squaring, the coefficient of determination cannot have a negative sign. The Pearson product-moment correlation coefficient is defined as the square root of the coefficient of determination. It summarizes the linear relationship and takes the same sign (plus or minus) as the regression slope:

\rho_{YX} = \sqrt{\rho_{YX}^2} = \frac{\sigma_{YX}}{\sigma_Y \sigma_X}

Thus, the correlation is also defined as the covariance of Y and X divided by the product of the standard deviations of both variables. It ranges between +1.00 and –1.00 and has a value of 0 when the two variables do not covary (i.e., are unrelated). The sign attached to the correlation must be the same as the signs of the covariance and the regression slope.

Both correlations and covariances are symmetric; that is, \rho_{YX} = \rho_{XY} and \sigma_{YX} = \sigma_{XY}, which can be ascertained by noting that the order of cross-product multiplication is irrelevant in the regression slope formula above.

One important relation between covariance and correlation emerges when both X and Y are standardized variables; that is, turned into Z-scores by subtracting the mean and dividing by the standard deviation. Into the formula above for \rho_{YX}, substitute Z-scores for both variables:
\rho_{Z_Y Z_X} = \frac{\sigma_{Z_Y Z_X}}{\sigma_{Z_Y} \sigma_{Z_X}} = \frac{\sigma_{Z_Y Z_X}}{(1)(1)} = \sigma_{Z_Y Z_X}

Because the standard deviation of a Z-score is 1.00, the correlation coefficient for two standardized measures equals their covariance. Correlation coefficients are "scale-free"; that is, they are unaffected by whether the units of measurement are the original scales or their transformed Z-scores. We will see that structural equation models can be estimated using either covariances or correlations (or both).

STRUCTURAL EQUATION MODELS

This part of the course examines the basics of structural equation models (SEMs), specifically the LISREL (LInear Structural RELations) approach developed three decades ago by Swedish psychometricians Karl Jöreskog and Dag Sörbom. We'll be using LISREL 8.54 to analyze General Social Survey data. These notes use the simplified commands (SIMPLIS), as does Chapter 12 in SSDA4. To understand causal diagrams, a good preparation would be to skim Chapter 11 on causal models and path analysis. However, I will try to develop everything we need on that topic in these lecture notes, primarily by working through increasingly complex data analysis examples.

As with every statistical method, the structural equation approach is more suitable to some types of data and measures than to others. Two major uses of LISREL are: (1) to model social psychological attitudes (factor structures), in which one or more unobserved constructs generate the variation in several observed indicators; and (2) to estimate parameters for a causal model, in which some variables are treated as causes of other variables (the effects). The chief advantage of LISREL over alternative methods (such as path analysis and index construction) lies in its power to combine observed measures with relations among unobserved constructs into a single integrated system.
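Before turning to models, the identities from the correlation review are easy to check numerically. The sketch below (plain Python with made-up data, using population divide-by-N moments as in the formulas above) computes the slope as the covariance over the variance of X, the correlation as the covariance over the product of standard deviations, and verifies that the covariance of the Z-scored variables equals the correlation:

```python
# Check: beta_YX = cov(Y,X)/var(X), rho_YX = cov(Y,X)/(sd_Y * sd_X),
# and cov of the Z-scored variables = rho_YX. Made-up data; population
# (divide-by-N) moments, matching the formulas in the review.
import math

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 8.1, 9.8]        # roughly linear in X (illustrative)

def cov(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n

def zscores(v):
    m = sum(v) / len(v)
    sd = math.sqrt(cov(v, v))        # cov(v, v) is the variance of v
    return [(x - m) / sd for x in v]

beta_yx = cov(Y, X) / cov(X, X)                          # regression slope
rho_yx = cov(Y, X) / math.sqrt(cov(Y, Y) * cov(X, X))    # Pearson correlation
cov_z = cov(zscores(Y), zscores(X))                      # equals rho_yx
print(beta_yx, rho_yx, cov_z)
```

With these toy numbers the slope works out to 1.96 and the correlation and Z-score covariance agree to floating-point precision, which is the "scale-free" property in action.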
I like to imagine that the relationship between structural and measurement levels of analysis can be traced back to a famous philosophical metaphor in Plato's Republic: the shadows that the unenlightened prisoners see on the cave wall are obscure reflections of an underlying reality which analysts cannot view directly but can only seek to comprehend through intellectual reasoning. Concepts and the objects they indicate are not identical phenomena (the point of René Magritte's droll painting, "Ceci n'est pas une pipe"). Similarly, as Plato elsewhere reasoned, a triangle drawn with pencil and paper is a flawed representation of the abstract, eternal concept of "triangle" that exists beyond the realm of sensual perception. By analogy, social scientists can never accurately observe people's attitudes (nor even their visible behaviors), but can only infer their existence by making noisy, error-prone measurements – such as respondents' answers to survey questions – that are only partially influenced by their unobservable true beliefs (or actions).

"The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it's just a representation, is it not? So if I had written on my picture 'This is a pipe,' I'd have been lying!" - René Magritte
Modern measurement theory concerns the relationships between a latent construct at the theoretical or conceptual level and observed indicators at the level of empirical observations:

[Diagram -- CONCEPTUAL LEVEL: latent construct Socioeconomic Status (SES); EMPIRICAL LEVEL: observed indicators Income, Education, Occupation]

Complete these examples:

CONSTRUCTS           INDICATOR(S)
Religiosity          ______________________________
Industrialization    ______________________________
Delinquency          ______________________________
Centralization       ______________________________
Intelligence         ______________________________
_________________    Sudden numbness, confusion, difficulty seeing, severe headache, loss of coordination & balance
_________________    Fewer social services; low tax rates; stronger national defense
VALIDITY AND RELIABILITY

Measurement theory seeks to represent a latent construct with one or more observable indicators (operational measures or variables) that accurately capture that theoretical construct. Two desirable properties of empirical quantitative measures are high levels of validity and reliability:

• Validity: the degree to which a variable's operationalization accurately reflects the concept it is intended to measure.

• Reliability: the extent to which different operationalizations of the same concept produce consistent results; the proportion of an item's variance that is attributable to the unobserved cause or source.

Many validity issues concern how well or poorly an observable variable reflects its latent counterpart. Another central concern is with accurately depicting the (causal or covariational) relationships among several theoretical constructs, using information about the covariation among observed indicators. This latter interest lies at the heart of the factor analysis and structural equation models examined in later sections.

Reliability refers to the replicability of a measure under the same conditions. A perfectly reliable measure must generate the same scores when conditions are identical. A measure may be very reliable but not valid; that is, an instrument can precisely measure some phenomenon yet represent complete nonsense. For example, your bathroom scale consistently gives identical readings when you step off and on, but it invalidly operationalizes your true weight (you dialed it back 5 pounds). To be valid, a measure or indicator must be reliable. In the extreme, if a measure's reliability is zero, its validity is also zero. However, a given indicator may vary in the extent of its validity as a measure of different concepts. For example, education, measured as years of formal schooling, might be used both as an indicator of educational persistence and as an indicator of socioeconomic status (SES).
Validity is clearly affected by the choice of one’s indicator(s). For example, we can treat church attendance as a measure of Americans’ religiosity, but this indicator might have only moderate validity because some highly religious persons don’t attend services, and some go to church mainly for social purposes. A more valid measure of religiosity would include not only attendance at religious services, but also would query people about their religious beliefs (e.g., in
the efficacy of prayer, the existence of an afterlife, and the infallibility of scriptures).

Unfortunately, researchers never obtain perfect measurements in the real world; that is, all measures are subject to measurement error, hence they are all unreliable and invalid to some greater or lesser degree. Measurement theory is therefore also a theory about the magnitudes and sources of errors in empirical observations.

Reliability assumes random errors. When a measurement is repeated over numerous occasions under the same conditions, if random error occurs, then the resulting variations in scores form a normal distribution about the measure's true value. The standard error of that distribution represents the magnitude of the measurement error: the larger the standard error, the lower the measure's reliability. By definition, random errors are uncorrelated with any variable, including other random error variables. Natural scientists also face measurement reliability problems. For example, astronomers made important contributions to measurement theory by developing techniques for estimating the true transit times of Jupiter's moons from erroneous telescopic observations. (See Stephen M. Stigler. 1986. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, MA: Harvard University Press.)

Systematic error (nonrandom error) implies a miscalibration of the measuring instrument that biases the scores by consistently over- or underestimating a latent construct (e.g., your miscalibrated bathroom scale). Such consistent biases don't alter the measure's reliability, but they clearly alter its validity because they prevent the indicator from accurately representing the theoretical concept. The research methodology literature discusses several types of validity, but we lack space to examine all these conceptual distinctions (Box 12.1 defines a variety of validity concepts).
For purposes of explicating structural equation models, we'll assume that the empirical observations we use have adequate content validity as indicators of the designated latent constructs. Therefore, we turn next to the quantification of reliability in classical test theory.
BOX 12.1 Varieties of Validity

Validity indicates the appropriateness of a measurement instrument, such as a battery of test items, for the concept it intends to measure. In other words, an instrument's validity denotes the extent to which it measures what it is supposed to measure. Validity can be established by experts knowledgeable about a substantive domain, or by demonstrating a measure's consistency with the theoretical concepts it is designed to represent. Three traditional types of measurement validity are construct, criterion-related, and content validity. Brief definitions and examples of these validity types are:

Construct validity: the extent to which a measure agrees with theoretical expectations; for example, IQ test items try to measure theoretically hypothesized dimensions of intelligence. Measures with high convergent validity and discriminant validity exhibit high agreement with theoretically similar measures but low correlations with dissimilar measures, respectively.

Criterion-related validity: the extent to which a measure accurately predicts performance on some subsequently observable activity (the criterion); for example, how highly a written driving-test score correlates with people's actual skills in operating an automobile. A measure's concurrent validity is assessed by its ability to discriminate between persons with and without the criterion. A measure's predictive validity is demonstrated by its accuracy in forecasting future behavior.

Content validity: the extent to which a measure adequately represents the defined domain of interest that it was designed to measure; for example, a mathematical ability test should cover the full range of students' mathematical knowledge.
Classical Test Theory

Classical test theory depicts the observed score (X) of respondent i on a measuring instrument, such as a test battery or survey item, as arising from two hypothetical unobservable sources, the respondent's "true score" and an error component:

X_i = T_i + e_i

A person's true score is the average that would be obtained across infinitely repeated measures of X. In the theoretical definition of random error, the distribution of error forms a normal distribution around a mean value of zero. Because the ± error deviations around the true score cancel one another, the expected value (mean) of the errors is zero and the expected value of the observed scores equals respondent i's true score:

E(X_i) = \mu_{T_i}

Further, the error term is assumed to be uncorrelated with its true score (which makes sense if the errors are really random). Hence, both components make unique contributions to the variances of the observed scores in a population:

\sigma_X^2 = \sigma_T^2 + \sigma_e^2

That is, the observed score variance is the sum of the true score variance plus the error variance. For good measures, the error variance is small relative to the observed variance; poor measures have the opposite pattern.
The reliability of X is defined as the ratio between true score and observed score variances ("rho" here is not the same as the Pearsonian correlation):

\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_e^2}

Note that reliability ranges between 0 (when the true score variance is zero) and 1 (when the error variance is zero). Values between these extremes reflect the relative proportions of error and true score variation in the measure of X.

Rearranging the definition of reliability reveals that the true score variance equals the observed score variance times the reliability:

\sigma_T^2 = \rho_X \sigma_X^2

Hence, we can estimate the unobserved true score variance from a measure's reliability and its observed variance. Finally, reliability can also be expressed using the error term*:

\rho_X = 1 - \frac{\sigma_e^2}{\sigma_X^2}

This formula again demonstrates that reliability ranges between 0 and 1: if the entire observed variance is error, \rho_X = 0; but if no random error exists, then \rho_X = 1.
_______________________________________________________________
* The derivation of the error-term expression above is:

\sigma_T^2 = \rho_X \sigma_X^2

\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_X^2 - \sigma_e^2}{\sigma_X^2} = 1 - \frac{\sigma_e^2}{\sigma_X^2}
_______________________________________________________________

Parallel Measures

If we had a second measure of the same unobservable construct that differed from the first indicator only in its errors (the true scores are equal), we would have two parallel measures, where:

X_{1i} = T_i + e_{1i}
X_{2i} = T_i + e_{2i}

Assuming that the population variances of their error terms are equal, a measure's reliability equals the correlation between the parallel forms. The proof:

_______________________________________________________________
1. The correlation coefficient for two variables is defined as the ratio of the covariance to the product of standard deviations:

r_{X_1 X_2} = \frac{\sigma_{X_1 X_2}}{\sigma_{X_1} \sigma_{X_2}}
2. In the numerator, substitute the two variables' true and error scores and expand the cross-product:

\sigma_{X_1 X_2} = \sigma_{(T + e_1)(T + e_2)} = \sigma_T^2 + \sigma_{T e_2} + \sigma_{T e_1} + \sigma_{e_1 e_2} = \sigma_T^2

The three terms on the right are zero because the error terms and true scores are uncorrelated.

3. Because the standard deviations of parallel measures are equal, the denominator simplifies to:

\sigma_{X_1} \sigma_{X_2} = \sigma_X^2

4. Hence, by substituting this term into the denominator in step 1, the correlation coefficient for parallel measures becomes:

r_{X_1 X_2} = \frac{\sigma_T^2}{\sigma_X^2}

5. We previously defined the right-side expression in step 4 as the reliability; therefore:

r_{X_1 X_2} = \rho_X
_______________________________________________________________

An important consequence of this identity is that the true score variance can be estimated as the product of just two empirical measures, the correlation coefficient and the observed variance. Rearranging step 4 above:

\sigma_T^2 = r_{X_1 X_2} \sigma_X^2
The correlation between the true score and an observed variable equals the square root of the reliability, which is also the square root of the correlation between two parallel measures:

\rho_{T X_1} = \sqrt{\rho_X} = \sqrt{r_{X_1 X_2}}

This equation shows that the correlation between an observable indicator and the unobservable true score it measures can be estimated as the square root of the reliability of indicator X. For example, if X has reliability = 0.64, then its correlation with the true score = 0.80. What is the reliability of X if its true-score correlation = 0.81?

The measurement theory principles discussed in this section are incorporated into structural equation models, which I introduce next through the confirmatory factor analytic approach to modeling the relationships between observed indicators and latent constructs.
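Before moving to factor analysis, these classical-test-theory identities can be checked by simulation. The sketch below (illustrative, not from the lecture) builds two parallel measures with true-score variance 4 and equal error variances 2.25, so the theoretical reliability is 4/(4 + 2.25) = 0.64; the parallel-forms correlation should then approach 0.64 and the true-score/indicator correlation should approach sqrt(0.64) = 0.80, matching the worked example above.

```python
# Simulation sketch: parallel measures X1 = T + e1 and X2 = T + e2
# with var(T) = 4 and equal error variances 2.25.
import math
import random

random.seed(8811)
N = 100_000

T = [random.gauss(0.0, 2.0) for _ in range(N)]     # true scores, var = 4
X1 = [t + random.gauss(0.0, 1.5) for t in T]       # error variance = 2.25
X2 = [t + random.gauss(0.0, 1.5) for t in T]       # parallel form

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)

rel = 4.0 / (4.0 + 2.25)     # theoretical reliability = 0.64
r12 = corr(X1, X2)           # estimates the reliability
rt1 = corr(T, X1)            # estimates sqrt(reliability), about 0.80
print(rel, r12, rt1)
```

Sampling error keeps the estimates within a percentage point or so of the theoretical values at this sample size.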
FACTOR ANALYSIS

Factor analysis refers to a family of statistical methods that represent the relationships among a set of observed variables in terms of a hypothesized smaller number of latent constructs, or common factors. The common factors are assumed to generate the observed variables' covariations (or correlations, if all measures are standardized with zero means and unit variances). For example, respondents' observed scores on several mental ability tests (e.g., IQ, SAT, GRE exams) allegedly result from unobserved common verbal and quantitative factors. Or covariations among numerous socioeconomic indicators of urban communities depend on latent industrialization, health, and welfare factors.

Of the two major classes of factor analysis, exploratory and confirmatory, we limit our discussion to the latter. In confirmatory factor analysis (CFA) a researcher posits an a priori theoretical measurement model to describe or explain the relationship between the underlying unobserved constructs ("factors") and the empirical measures. Then the analyst uses statistical fit criteria to assess the degree to which the sample data are consistent with the posited model; that is, to ask whether the results confirm the hypothesized model. In practice, however, researchers seldom conduct only one test of a confirmatory factor model. Rather, based on initial estimates, they typically alter some model specifications and re-analyze the new model, trying to improve its fit to the data. Hence, most applications of CFA to investigate latent factors involve successive modeling attempts. We apply this successive model-fitting strategy in estimating alternative models to explain the empirical relationships among a set of observed variables.
Researchers use confirmatory factor analysis to estimate the parameters of a measurement model. Consider this diagram showing a single latent factor measured by four empirical variables:

[Diagram: latent factor F with loading arrows b1–b4 to indicators X1–X4; each indicator also receives an arrow di from its unique error term ei]

where:

F = latent common factor
Xi = observed variable i (indicator)
ei = unobserved "error" source (unique factor) for variable Xi
bi = "factor loading" effect of common factor F on observed variable Xi
di = effect of unique factor ei on observed variable Xi

This diagram implies that, if the latent variable were observed, it would produce values of the indicators. Each observed score is a linear combination of this common factor plus a unique error term. We can see these relationships clearly by writing the four implied measurement equations, which closely resemble the classical test theory equation:

X1 = b1 F + d1 e1
X2 = b2 F + d2 e2
X3 = b3 F + d3 e3
X4 = b4 F + d4 e4
Note the non-coincidental similarity of these factor analytic equations to classical test theory's representation of an observed score as a sum of a true score and an error term. The diagram above shows that the error terms are uncorrelated with the factor and among themselves. Hence, the only sources of indicator i's variance are the common factor F and the indicator's unique error term:

\sigma_{X_i}^2 = \beta_i^2 \sigma_F^2 + \Theta_{\varepsilon_i}

where \Theta_{\varepsilon_i} signifies the variance of the error in X_i. Because F is unobserved, its variance is unknown. And because it is unknown, we can assume it is a standardized variable, which means that its variance = 1.0. Therefore,

\sigma_{X_i}^2 = \beta_i^2 + \Theta_{\varepsilon_i}

Note that this formulation closely resembles the classical test theory equation in which the variance of a measure equals the sum of two components -- the true score variance plus the error variance. Next note that if we standardize X_i, then the sum of these two components must equal 1.0.

A CFA model has another similarity to classical test theory. The reliability of indicator X_i is defined as the squared correlation between a factor and an indicator. This value is the proportion of variation in X_i that is statistically "explained" by the common factor (the "true score" in classical test theory) that it purports to measure:

\rho_{X_i} = \rho_{F X_i}^2 = \beta_i^2

Hence, item reliability equals the square of its factor loading.
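A small simulation can illustrate this decomposition. Below (an illustrative sketch with parameters of my choosing, not a lecture example), a standardized factor with loading β = 0.7 and error variance Θ = 1 − β² = 0.51 generates an indicator whose variance is near β² + Θ = 1.0 and whose squared correlation with the factor (the item reliability) is near β² = 0.49.

```python
# Sketch: X = beta*F + eps with standardized F and theta = 1 - beta^2,
# so var(X) should be near 1.0 and corr(F, X)^2 near beta^2.
import math
import random

random.seed(8811)
N = 100_000
beta = 0.7
theta = 1.0 - beta ** 2                 # error variance = 0.51

F = [random.gauss(0.0, 1.0) for _ in range(N)]                   # var(F) = 1
X = [beta * f + random.gauss(0.0, math.sqrt(theta)) for f in F]

def pvar(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / math.sqrt(pvar(a) * pvar(b))

var_x = pvar(X)          # near beta^2 + theta = 1.0
rel = corr(F, X) ** 2    # item reliability, near beta^2 = 0.49
print(var_x, rel)
```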
Finally, the covariance between two indicators in a single-factor model is the expected value of the product of their measurement equations:

\sigma_{X_1 X_2} = E[(\beta_1 F + \varepsilon_1)(\beta_2 F + \varepsilon_2)]

which, because the error terms are uncorrelated with the factor and with each other, simplifies to:

\sigma_{X_1 X_2} = \beta_1 \beta_2 \sigma_F^2 = \beta_1 \beta_2

When all variances are standardized, this relationship further simplifies to:

\sigma_{Z_1 Z_2} = \rho_{Z_1 Z_2}

That is, the correlation of a pair of observed variables loading on a common factor is the product of their standardized factor loadings. What are the reliabilities of each indicator X, the error variances, and the expected correlations between every pair of X's for this single-factor model?

[Diagram: factor F with standardized loadings .8, .7, .9, .6 on indicators X1, X2, X3, X4]
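One way to work the exercise (assuming the diagram's standardized loadings are b1 = .8, b2 = .7, b3 = .9, b4 = .6): each item's reliability is b_i², its error variance is 1 − b_i², and the model-implied correlation between items i and j is b_i · b_j. A short Python sketch:

```python
# Implied quantities for a single-factor model with standardized loadings
# (values assumed from the exercise diagram).
loadings = {"X1": 0.8, "X2": 0.7, "X3": 0.9, "X4": 0.6}

for name, b in loadings.items():
    # reliability = squared loading; error variance = 1 - reliability
    print(f"{name}: reliability = {b ** 2:.2f}, error variance = {1 - b ** 2:.2f}")

items = list(loadings)
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        bi, bj = loadings[items[i]], loadings[items[j]]
        # implied correlation = product of standardized loadings
        print(f"corr({items[i]},{items[j]}) = {bi * bj:.2f}")
```

For instance, X1's reliability is .64 with error variance .36, and the implied correlation between X1 and X2 is .8 × .7 = .56.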
Factor Analysis of GSS Job Values

To illustrate LISREL procedures for estimating a single-factor model, I use responses in the 1998 General Social Survey to seven questions about the importance of particular job values:

On the following list there are various aspects of jobs. Please circle one number to show how important you personally consider it is in a job:
• SECJOB: Job security?
• HIINC: High income?
• PROMOTN: Good opportunities for advancement?
• INTJOB: An interesting job?
• WRKINDP: A job that allows someone to work independently?
• HLPOTHS: A job that allows someone to help other people?
• HLPSOC: A job that is useful to society?

The response categories ranged from "Very important" = 1 to "Not important at all" = 5. Here are SPSS recode commands that reverse-code those values, then write out a data file to be subsequently read by the PRELIS program to create a covariance matrix for input into LISREL:

RECODE secjob hiinc promotn intjob wrkindp hlpoths hlpsoc
  (1=5)(2=4)(3=3)(4=2)(5=1)(ELSE = -999).
WRITE OUTFILE = JOBVALS.TXT
  / secjob hiinc promotn intjob wrkindp hlpoths hlpsoc (7F5.0).
FREQUENCIES VAR=secjob hiinc promotn intjob wrkindp hlpoths hlpsoc.

By including (ELSE = -999), SPSS changes the three missing value codes on each variable to -999. The WRITE OUTFILE command stores the seven recoded variables as a fixed-format ASCII file (JOBVALS.TXT). The format (7F5.0) creates five-column fields to contain the new variable values, allowing at least one space of separation between each score. Here are a few lines from the JOBVALS.TXT file:

    4    3    4    4    4    4    4
    5    5    4    4    4    4    4
    4    4    4    5    4    4    4
 -999 -999 -999 -999 -999 -999 -999
 -999 -999 -999 -999 -999 -999 -999
    4    4    3    4    3    4    4
    5    4    4    4    4    3    3
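The reverse-coding logic is just "new value = low + high − old value" on a 1–5 scale, with everything else mapped to the missing code. A Python sketch of the same transformation (the function name is mine, not part of SPSS or PRELIS):

```python
# Mirror of the SPSS RECODE above: 1<->5, 2<->4, 3 unchanged; any other
# value (e.g., a missing code) becomes -999, like (ELSE = -999).
def reverse_code(value, low=1, high=5, missing=-999):
    if isinstance(value, int) and low <= value <= high:
        return low + high - value        # e.g., 1 -> 5, 2 -> 4, 3 -> 3
    return missing

row = [1, 2, 3, 9, 5]                    # 9 stands in for a missing code
print([reverse_code(v) for v in row])    # -> [5, 4, 3, -999, 1]
```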
21

PRELIS to Create a Matrix

The ultimate data analyzed by LISREL 8.54 comprise a matrix of covariances or product-moment (Pearson) correlations among the indicator variables. PRELIS 2.30 within the LISREL program can set up a matrix from data that are either entered interactively or imported from an SPSS .SAV file. This section shows another option, where these PRELIS commands are saved in an ASCII text file called JOBVALS.PR2:

PRELIS FOR JOB VALUES (SAVED IN JOBVALS.PR2)
DATA NI=7 NO=2832 MI=-999 TR=LI
RAW FI=JOBVALS.TXT FO
(7F5.0)
LABELS
SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
CONTINUOUS SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
OUTPUT MATRIX=CM SM=JOBVALS.MAT

To run this job, launch LISREL 8.54, click “File” on the upper left toolbar, then “Open”. In the window select JOBVALS.PR2, then click “Open.” Again click “File” on the upper left toolbar, then “Run PRELIS” to execute. The printout will be stored in a file named JOBVALS.OUT.

NOTES:
The first line is an optional title; I included the file's own name.
DATA is the input data description, where:
  NI = number of observed indicators (variables in the data file)
  NO = number of observations (total number of cases)
  MI = missing value codes (if more than one, separate them with commas)
  TR=LI indicates listwise deletion: calculations are based only on cases with no missing values on any variable. TR=PA is pairwise deletion: for each pair of variables, computations are based on all cases with nonmissing values on both variables.
RAW specifies the external file where the “raw” data are stored: FI=JOBVALS.TXT. The FO option indicates that the format will appear on the next line. If FO is omitted, the format is either the first line of the external raw data file, or the data are stored in free format (separated by spaces, commas, or return characters).
(7F5.0) indicates the case records in the external file consist of seven 5-column fields with no decimals.
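To see what TR=LI (listwise deletion) does, here is a minimal Python sketch applied to the seven-column JOBVALS.TXT snippet shown earlier. The -999 missing code follows the text; the parsing itself is an illustration, not PRELIS's actual algorithm:

```python
# The seven cases from the JOBVALS.TXT excerpt, one list per record.
rows = [
    [4, 3, 4, 4, 4, 4, 4],
    [5, 5, 4, 4, 4, 4, 4],
    [4, 4, 4, 5, 4, 4, 4],
    [-999] * 7,
    [-999] * 7,
    [4, 4, 3, 4, 3, 4, 4],
    [5, 4, 4, 4, 4, 3, 3],
]

MISSING = -999

# Listwise deletion keeps only cases with no missing value on ANY variable.
complete = [r for r in rows if MISSING not in r]
print(len(complete))  # 5 of the 7 cases survive
```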
22

LABELS assigns the sequence of names on the next line to the NI variables; the maximum label length is eight characters.
CONTINUOUS defines the listed variables as interval-level measures. By default, PRELIS 2 treats variables with fewer than 16 values as ordinal.
OUTPUT command, where:
  MATRIX=CM computes a covariance matrix
  MATRIX=KM computes a correlation matrix
  SM=FILENAME stores the matrix to be read into LISREL

The covariance matrix below (edited from the output) was computed using listwise data from 1,129 respondents:

           SECJOB    HIINC  PROMOTN   INTJOB  WRKINDP  HLPOTHS   HLPSOC
          -------  -------  -------  -------  -------  -------  -------
SECJOB      0.442
HIINC       0.167    0.594
PROMOTN     0.194    0.270    0.514
INTJOB      0.119    0.132    0.207    0.399
WRKINDP     0.068    0.139    0.178    0.213    0.616
HLPOTHS     0.068    0.062    0.159    0.170    0.244    0.596
HLPSOC      0.098    0.065    0.146    0.178    0.214    0.430    0.649

Means       4.508    3.982    4.233    4.461    4.051    4.069    4.060
Std Devs    0.665    0.770    0.717    0.632    0.785    0.772    0.806

SIMPLIS Commands

In the earliest versions of LISREL, analysts had to specify the full set of eight parameter matrices, indicating which coefficients were constrained to zero and which were free to vary. A major benefit of this approach was to force researchers to think very carefully and completely about their models. However, the opportunities for errors were numerous and frustrating to the learning process. LISREL 8 introduced the SIMPLIS (SIMPlified LISREL) command language, which avoids the necessity of completely specifying the parameter matrices. It undoubtedly speeds the model-testing process and reduces trial-and-error learning. These Notes present a variety of examples using SIMPLIS commands.
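Any entry of this covariance matrix converts to a correlation by dividing by the product of the two standard deviations, r = cov / (sd_x · sd_y). A short Python check, using values read off the printed matrix (so the results carry its rounding):

```python
# Standard deviations and selected covariances from the PRELIS output above.
sd = {"SECJOB": 0.665, "HIINC": 0.770, "PROMOTN": 0.717, "INTJOB": 0.632}
cov = {("SECJOB", "HIINC"): 0.167,
       ("SECJOB", "PROMOTN"): 0.194,
       ("HIINC", "PROMOTN"): 0.270}

def corr(a, b):
    """Pearson correlation implied by a covariance and two SDs."""
    return cov[(a, b)] / (sd[a] * sd[b])

print(round(corr("SECJOB", "HIINC"), 3))    # 0.326
print(round(corr("HIINC", "PROMOTN"), 3))   # 0.489
```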
23

CONFIRMATORY FACTOR ANALYSIS

Methodologists usually describe factor analysis with LISREL as confirmatory factor analysis (CFA) because the researcher formulates an a priori theoretical model to describe or explain the empirical data. Then, statistical analyses determine whether the sample data are consistent with the imposed model; that is, do the results confirm the substantively generated model? In practice, however, researchers seldom conduct only one test of a factor model. Rather, based on the initial parameter estimates, they typically alter some specifications and re-analyze the new model, trying to improve its fit to the data. Hence, most applications of LISREL to investigate latent factors are mixtures of exploratory and confirmatory procedures.

During my initial analyses of GSS job values, I discovered that a single factor could not account for the observed covariances among the seven items. So, to demonstrate how LISREL estimates a single-factor model, I concentrate on the relations among the first four GSS indicators (SECJOB, HIINC, PROMOTN, INTJOB), most of which appear to emphasize “extrinsic” job rewards based on such external benefits as money, promotions, and job security. Here's the SIMPLIS command file, saved in file LISJV1.LS8:

Single Factor LISREL Model with 4 Job Indicators (LISJV1.LS8)
Observed Variables: SECJOB HIINC PROMOTN INTJOB WRKINDP HLPOTHS HLPSOC
Covariance Matrix From File JOBVALS.MAT
Sample Size = 1129
Latent Variables: Jobvalue
Relationships:
PROMOTN = 1*Jobvalue
SECJOB HIINC INTJOB = Jobvalue
Path Diagram
LISREL Output: SC MI
End of Problem

To run this job, launch LISREL 8.54, click “File” on the upper left toolbar, then “Open”. In the window select LISJV1.LS8, then click “Open.” Again click “File” on the upper left toolbar, then “Run LISREL” to execute. The printout will be stored in a file named LISJV1.OUT, and a path diagram in LISJV1.PTH.
24

NOTES:
The first line is the job title.
The Observed Variables line lists all seven GSS variable names in their exact order of occurrence in the covariance matrix previously created by PRELIS.
Covariance Matrix identifies the JOBVALS.MAT file where PRELIS stored that covariance matrix.
Sample Size reports the number of observations used to compute the covariances. (The listwise deletion in PRELIS found 1,129 cases with no missing data on all seven variables.)
Latent Variables provides a name for the single unobserved factor.
Relationships is followed by a specification for the factor loadings to be estimated. The observed variables' names appear on the left-hand side of an equals sign and the factor name on the right-hand side.

SCALING LATENT CONSTRUCTS: A latent construct is unobserved and hence has no definite scale; that is, its origin and unit of measurement are arbitrary. A researcher usually fixes the origin by assuming a latent construct to have zero mean; LISREL automatically does this unless otherwise instructed. The unit of measurement can be scaled in one of two ways:
(1) Assume that a latent construct is standardized to have a variance = 1.00; this is the LISREL default option.
(2) Assign a unit of measurement to the construct by fixing the factor loading for one indicator to a nonzero value (typically 1.00). This method defines the latent construct's scale in terms of an observed reference variable, usually an indicator that the researcher believes best represents the factor.

I used the second procedure for scaling constructs in this course. I chose PROMOTN as the reference indicator for the unobserved “Jobvalue” factor (based on preliminary analyses showing it to have the highest factor loading).

LISREL Output: SC MI requests a completely standardized solution and modification indices.
End of Problem signals the termination of the model specification.
25

After five iterations, LISREL produced these maximum likelihood estimates (MLE) of the parameters, with standard errors in parentheses and t-ratios in the third rows:

LISREL Estimates (Maximum Likelihood)

Measurement Equations

SECJOB = 0.56*Jobvalue, Errorvar.= 0.33, R² = 0.25
         (0.042)                  (0.016)
          13.41                    21.03

HIINC = 0.75*Jobvalue, Errorvar.= 0.39, R² = 0.34
        (0.051)                  (0.020)
         14.69                    19.33

PROMOTN = 1.00*Jobvalue, Errorvar.= 0.15, R² = 0.70
                                   (0.021)
                                    7.52

INTJOB = 0.56*Jobvalue, Errorvar.= 0.28, R² = 0.29
         (0.040)                  (0.014)
          13.99                    20.44

The loading for the reference indicator, PROMOTN, was fixed to 1.00, so its coefficient doesn't have a standard error or t-test. This observed variable has the largest proportion of variance explained by “Jobvalue” (70 percent), suggesting that it was a good choice for fixing the latent construct's scale. The other three observed variables all have highly significant loadings on the latent factor. But each parameter estimate is smaller than the fixed value for the reference indicator, and their R-squares are also much smaller.

LISREL also draws a diagram corresponding to the model. To save it, click “File” on the top toolbar, then “Export As Gif file (.gif)”. I inserted it on the next page, and cropped the excess borders with MS Word's “Format/Picture” options.
26

The factor loadings are based on the covariances among the four indicators, which are measured in the original 5-point scales. A completely standardized factor solution rescales both the latent factor(s) and all observed variables to have standard deviations equal to one (i.e., transformed to Z-scores). Therefore, all variances also equal one. This rescaling produces parameter estimates that are proportional to the MLE parameters. If no indicator's scale is fixed to 1.00, then LISREL will automatically set the latent construct's variance = 1.00. (See the Jobvalue entry for PHI in the output below.) In that case, all corresponding MLE and SC parameters will be identical; why? [HINT: What is the relationship between correlation and covariance?]

Here's the completely standardized solution, which appears at the end of the output:

Completely Standardized Solution

LAMBDA-X
          Jobvalue
          --------
SECJOB        0.50
HIINC         0.58
PROMOTN       0.84
INTJOB        0.54

PHI
27

          Jobvalue
          --------
              1.00

THETA-DELTA
  SECJOB     HIINC   PROMOTN    INTJOB
--------  --------  --------  --------
    0.75      0.66      0.30      0.71

These standardized factor loadings clearly reveal that PROMOTN has the strongest relationship with the “Jobvalue” construct. Hence, PROMOTN is that factor's most reliable indicator: (.84)² = 0.71. SECJOB and INTJOB have the lowest factor loadings; what are their reliabilities?

The variance of Jobvalue equals 1.00 (reported in PHI), consistent with a standardized solution. The completely standardized solution reveals that the squared factor loadings and the error variances (reported in THETA-DELTA) jointly account for all the variation in each indicator, as required in classical test theory. That is, the sum of a squared factor loading plus its error variance = 1.00. For example, SECJOB: (0.50)² + 0.75 = 0.25 + 0.75 = 1.00. What are these sums for the other three indicators?

To view a model diagram with the standardized coefficients, click “View” on the top toolbar, then “Estimations” and “Standardized Solutions.”
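The proportionality between the MLE and completely standardized loadings can be sketched numerically: a standardized loading equals the covariance-metric loading times the factor's standard deviation, divided by the indicator's standard deviation. The factor SD below is recovered from the reference indicator PROMOTN, whose variance (0.514) splits into factor variance plus error variance (0.15); because the printed output is rounded, the results are only approximate:

```python
import math

# SDs from the PRELIS output; covariance-metric loadings from the MLE output.
sd_x = {"SECJOB": 0.665, "HIINC": 0.770, "PROMOTN": 0.717, "INTJOB": 0.632}
lam = {"SECJOB": 0.56, "HIINC": 0.75, "PROMOTN": 1.00, "INTJOB": 0.56}

# Factor variance = PROMOTN's variance minus its error variance.
sd_factor = math.sqrt(0.514 - 0.15)

# lambda_std = lambda * sd(factor) / sd(indicator)
lam_std = {x: lam[x] * sd_factor / sd_x[x] for x in lam}
for x, v in lam_std.items():
    # Values land within rounding error of the printed 0.50, 0.58, 0.84, 0.54.
    print(x, round(v, 2))
```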
28

[Path diagram of the single-factor model with completely standardized coefficients]
29

In the figure below, I converted the LISREL error term values into path coefficients. To measure all effects in a standard-deviation metric, take the positive square root of each LISREL error:

[Path diagram: Jobvalue with standardized loadings SECJOB = .50, HIINC = .58, PROMOTN = .84, INTJOB = .54, and error paths .87, .81, .55, .84]

Now the sum of the squared factor loading plus the squared error path equals 1.0 for each indicator:

PROMOTN = (0.55)² + (0.84)² = 0.30 + 0.70 = 1.00

Calculate the expected correlations by multiplying pairs of factor loadings; calculate the differences by subtracting the observed correlations:

Observed Variables    Expected r            Observed r    Difference
SECJOB-HIINC          (.50)(.58) = 0.290       0.326        -0.036
SECJOB-PROMOTN        (.50)(.84) = 0.420       0.408         0.012
SECJOB-INTJOB         (.50)(.54) = 0.270       0.284        -0.014
HIINC-PROMOTN         (.58)(.84) = 0.487       0.489        -0.002
HIINC-INTJOB          (.58)(.54) = 0.313       0.272         0.041
PROMOTN-INTJOB        (.84)(.54) = 0.454       0.458        -0.004

The discrepancies are fairly small, implying that the unobserved Jobvalue factor accounts reasonably well for most correlations among its four indicators. We can more precisely assess how well a model fits the data with several goodness-of-fit statistics generated by LISREL.
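This expected-versus-observed comparison is easy to reproduce: expected r is the product of standardized loadings, while observed r comes from the PRELIS covariance matrix (covariance divided by the product of SDs). A short Python sketch, with printed-output rounding carried along:

```python
# Standardized loadings, SDs, and covariances from the outputs above.
loads = {"SECJOB": 0.50, "HIINC": 0.58, "PROMOTN": 0.84, "INTJOB": 0.54}
sd = {"SECJOB": 0.665, "HIINC": 0.770, "PROMOTN": 0.717, "INTJOB": 0.632}
cov = {("SECJOB", "HIINC"): 0.167, ("SECJOB", "PROMOTN"): 0.194,
       ("SECJOB", "INTJOB"): 0.119, ("HIINC", "PROMOTN"): 0.270,
       ("HIINC", "INTJOB"): 0.132, ("PROMOTN", "INTJOB"): 0.207}

diffs = {}
for (a, b), c in cov.items():
    expected = loads[a] * loads[b]          # product of loadings
    observed = c / (sd[a] * sd[b])          # covariance -> correlation
    diffs[(a, b)] = round(expected - observed, 3)
    print(a, b, round(expected, 3), round(observed, 3), diffs[(a, b)])
```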
30

MODEL FIT STATISTICS

As with logistic regression, not only the individual parameters but an entire LISREL model's fit to the data can be assessed statistically. A specific structural equation model implies an expected covariance matrix (or a correlation matrix) for the k observed variables, Σ(θ), where θ is a vector of parameters to be estimated. PRELIS uses the N sample cases to create the observed covariance matrix, S, which LISREL then uses to estimate the expected model parameters. LISREL fits the analyst's hypothesized model to the empirical data by minimizing a fit function F involving both matrices. In matrix algebra notation, this function is:

   F[S, Σ(θ)] = ln |Σ| + tr(SΣ⁻¹) − ln |S| − k

where k is the number of observed indicators and tr means “trace” – the sum of the elements in a matrix diagonal. The F function is non-negative and is zero only if a perfect fit occurs, that is, if S = Σ.

For a large sample N, multiplying F[S, Σ(θ)] by (N − 1) yields a test statistic that is approximately distributed as a χ² with degrees of freedom equal to:

   d = [k(k + 1)/2] − t

where t is the number of independent parameters estimated and k(k + 1)/2 is the number of distinct variances and covariances among the k indicators. Because d must be nonnegative, the number of independent parameters to be estimated (t) cannot be more than k(k + 1)/2. For example, if k = 5 indicators, what is the maximum number of parameters that LISREL could estimate?

A researcher's strategy for finding a best-fitting LISREL model involves using the fit function to conduct chi-square tests on a series of nested models with successive parameter constraints. Ideally, poorer-fitting models will be rejected in favor of alternative models yielding improved fits to the data. The ultimate goal is to specify a best-fitting LISREL model that cannot be rejected, indicating that the hypothesized model's covariance matrix closely approximates the observed covariance matrix.
(This strategy is opposite to the conventional chi-square testing approach for a crosstab, where the goal is to reject the null hypothesis of independence between variables.)

To use the minimum fit function in chi-square tests, a researcher chooses an α level of significance at which to reject a hypothesized model; for example, by setting α = .05. If the model χ² exceeds the (1 − α) percentile of
31

the chi-square distribution with d degrees of freedom, then that model must be rejected as producing a poor fit to the observed variance-covariance matrix. For example, a model is a poor fit if p < .05; but if p > .05, then the model would have a fit acceptable to a researcher setting the region of rejection at α = .05.

In practice, a researcher who wants to find an acceptable latent structure model (i.e., not seeking to reject the model) hopes to obtain a low chi-square value relative to the degrees of freedom. Because the minimum fit function χ² test statistic increases proportionally with sample size, (N − 1), obtaining low chi-square values with large samples often proves difficult. Many analysts come to regard chi-square more usefully as an overall “goodness-of-fit” measure rather than as a test statistic. That is, χ² measures the distance (difference) between the sample covariance matrix and the expected covariance matrix, (S − Σ). Jöreskog and Sörbom half-jokingly refer to χ² as a “badness of fit” measure in the sense that a large chi-square corresponds to a bad fit and a low chi-square to a good fit. Zero χ² is a “perfect” fit.

LISREL prints several goodness-of-fit measures that are functions of chi-square. Two measures that do not depend explicitly on sample size gauge how much better the specified model fits the data, compared to no model at all. Both indices range between 0 and 1, with values closer to 1 indicating a better fit of model to data. Most researchers seek values of 0.95 or higher. The goodness-of-fit index (GFI) is:

   GFI = 1 − F[S, Σ(θ̂)] / F[S, Σ(0)]

where the numerator of the ratio is the minimum of the hypothesized model's fit function and the denominator is the fit function for a model whose parameters all equal zero (the null hypothesis model). (This latter model is conceptually equivalent to the “constant only” equation in logistic regression or EHA, used to calculate the model chi-square value for a hypothesized equation.)
The adjusted goodness-of-fit index (AGFI) deflates the GFI by taking into account the degrees of freedom consumed in estimating the parameters:
32

   AGFI = 1 − [k(k + 1) / 2d] (1 − GFI)

where k = the number of observed indicators and d is the model df.

Using chi-square as a test statistic assumes that the model holds exactly in the population, an implausible assumption. Models that hold approximately in the population will be rejected for large samples. An alternative approach takes into account the errors of approximation in the population and the precision of the fit measure. The estimated population discrepancy function (PDF) is defined as:

   F̂₀ = Max{F̂ − d/(N − 1), 0}

where F̂ = the minimum value of the fit function and d = degrees of freedom. Because the PDF usually decreases as additional parameters are added to the model, the Root Mean Square Error of Approximation (RMSEA) measures the discrepancy per degree of freedom:

   ε = √(F̂₀ / d)

A RMSEA value of ε ≤ 0.05 indicates a “close fit,” while values up to 0.08 indicate “reasonable” errors of approximation in the population. A 90-percent confidence interval for RMSEA indicates whether the sample point estimate falls into a range that also includes the 0.05 criterion.
33

Here are all the Goodness of Fit Statistics for the one-factor Jobvalue model. Can you conclude that the one-factor model is a good fit?

Goodness of Fit Statistics

Degrees of Freedom = 2
Minimum Fit Function Chi-Square = 8.31 (P = 0.016)
Normal Theory Weighted Least Squares Chi-Square = 8.11 (P = 0.017)
Estimated Non-centrality Parameter (NCP) = 6.11
90 Percent Confidence Interval for NCP = (0.78 ; 18.91)

Minimum Fit Function Value = 0.0074
Population Discrepancy Function Value (F0) = 0.0054
90 Percent Confidence Interval for F0 = (0.00069 ; 0.017)
Root Mean Square Error of Approximation (RMSEA) = 0.052
90 Percent Confidence Interval for RMSEA = (0.019 ; 0.092)
P-Value for Test of Close Fit (RMSEA < 0.05) = 0.39

Expected Cross-Validation Index (ECVI) = 0.021
90 Percent Confidence Interval for ECVI = (0.017 ; 0.033)
ECVI for Saturated Model = 0.018
ECVI for Independence Model = 0.88

Chi-Square for Independence Model with 6 Degrees of Freedom = 987.81
Independence AIC = 995.81
Model AIC = 24.11
Saturated AIC = 20.00
Independence CAIC = 1019.92
Model CAIC = 72.34
Saturated CAIC = 80.29

Normed Fit Index (NFI) = 0.99
Non-Normed Fit Index (NNFI) = 0.98
Parsimony Normed Fit Index (PNFI) = 0.33
Comparative Fit Index (CFI) = 0.99
Incremental Fit Index (IFI) = 0.99
Relative Fit Index (RFI) = 0.97
Critical N (CN) = 1251.05

Root Mean Square Residual (RMR) = 0.0087
Standardized RMR = 0.018
Goodness of Fit Index (GFI) = 1.00
Adjusted Goodness of Fit Index (AGFI) = 0.98
Parsimony Goodness of Fit Index (PGFI) = 0.20

Although the chi-square test (p = .016) narrowly fails to attain the p > .05 level needed to retain the model, the other fit statistics suggest a quite acceptable fit of the single-factor model to the data.
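Two of these printed statistics can be re-derived by hand as a consistency check. The sketch below (rounded as in the output, so small discrepancies are rounding artifacts) computes RMSEA from the normal-theory chi-square (8.11, df = 2, N = 1129) and the NFI from the model (8.31) and independence-model (987.81) chi-squares:

```python
import math

def rmsea(chi2, d, n):
    # F0_hat = Max{chi2/(N-1) - d/(N-1), 0}; RMSEA = sqrt(F0_hat / d)
    f0 = max((chi2 - d) / (n - 1), 0.0)
    return math.sqrt(f0 / d)

def nfi(chi2_model, chi2_indep):
    # Normed Fit Index: proportional reduction from the independence model.
    return (chi2_indep - chi2_model) / chi2_indep

print(round(rmsea(8.11, 2, 1129), 3))   # 0.052
print(round(nfi(8.31, 987.81), 2))      # 0.99
```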
34

A TWO-FACTOR MODEL

My initial effort to fit a single-factor model using all seven GSS job items produced a terrible fit: Chi-Square = 828.1, df = 14, p < .000; GFI = 0.83; AGFI = 0.65; and RMSEA = 0.23. However, the underlying attitude structure may consist of two intercorrelated latent factors, each of which influences the variation in a different subset of observed variables. Such a specification could resemble this diagram:

[Path diagram: two correlated factors, Jobvalue1 and Jobvalue2, each sending arrows to its own subset of the indicators X1 through X8]

After trying several alternative model specifications, I discovered that two latent factors could plausibly account for the covariations among five of the seven indicators: (1) a subset consisting of SECJOB HIINC PROMOTN; and (2) another subset of HLPOTHS HLPSOC, which may reflect “intrinsic” job rewards from helping others or accomplishing something worthwhile. The new LISREL commands are:

Latent Variables: Jobval1 Jobval2
Relationships:
PROMOTN = 1*Jobval1
SECJOB HIINC = Jobval1
HLPOTHS = 1*Jobval2
HLPSOC = Jobval2
35

Latent Variables assigns two distinct construct names, while Relationships specifies the pair of reference indicators and identifies the other variables' factor loadings. Here's a diagram of that model specification:

[Path diagram: Jobval1 with indicators SECJOB, HIINC, PROMOTN; Jobval2 with indicators HLPOTHS, HLPSOC; the two factors correlated]

The overall fit statistics indicate a much better fit: Chi-Square = 27.7, df = 4, p = .00; GFI = 0.99; AGFI = 0.96; and RMSEA = 0.073 (a “reasonable fit,” with the 90% confidence interval from 0.049 to 0.099). Given a sample size of more than a thousand, I would be tempted to stop trying to improve the fit. However, I want to demonstrate how to use LISREL's modification indexes for clues about altering a model's specification to fit the data better.

Modification Indexes

LISREL's modification indexes are powerful diagnostic tools for identifying which parameters might be added to a model (that is, set free rather than constrained to equal 0). By adding “MI” to the “LISREL Output:” line, modification values will be generated for every missing parameter. These values are predictions about the decrease in model Chi-square that will occur if a particular parameter were added to the model. Here are two sets of MIs for the two-factor model above:
36

Modification Indices and Expected Change

Modification Indices for LAMBDA-X
           Jobval1   Jobval2
           -------   -------
SECJOB        - -       0.03
HIINC         - -      15.63
PROMOTN       - -      11.71
HLPOTHS       - -        - -
HLPSOC        - -        - -

Modification Indices for THETA-DELTA
           SECJOB     HIINC   PROMOTN   HLPOTHS    HLPSOC
           ------     -----   -------   -------    ------
SECJOB        - -
HIINC       11.71       - -
PROMOTN     15.63      0.03       - -
HLPOTHS      7.63      4.45     13.81       - -
HLPSOC       9.55      0.65      2.19       - -       - -

The MI for LAMBDA-X indicates that adding an arrow from Jobval2 to HIINC should reduce Chi-square by 15.63. Similarly, three MIs in THETA-DELTA indicate that correlating pairs of errors would improve Chi-square by more than 10.00. Because I wanted to allow each indicator to load on just one factor, I chose the latter respecification. Correlating two error terms will use one of the four degrees of freedom, but should produce a much better fit. Although the PROMOTN-SECJOB pair has the largest value, they are indicators of the same unmeasured construct. My attempt to correlate them produced some unusual estimates, so instead I correlated the errors of the PROMOTN-HLPOTHS indicators.

Because LISREL computes the MIs independently of one another, you generally should make only one parameter change at a time. Then, use the new MI results to decide which further changes to try. Inserted before the last line, the SIMPLIS command to correlate the errors of two indicators closely resembles natural language:

Let the Errors of HLPOTHS and PROMOTN Correlate
End of Problem
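The degrees-of-freedom bookkeeping behind "use one of the four degrees of freedom" can be sketched with d = k(k+1)/2 − t. The parameter tally below is my own: 3 free loadings (PROMOTN and HLPOTHS are fixed at 1), 5 error variances, 2 factor variances, and 1 factor covariance; freeing one correlated-error parameter then consumes one more df:

```python
k = 5                                # observed indicators in the two-factor model
t_base = 3 + 5 + 2 + 1               # 11 free parameters (tally described above)

df_base = k * (k + 1) // 2 - t_base  # distinct (co)variances minus parameters
df_respecified = df_base - 1         # one error covariance set free

print(df_base, df_respecified)       # 4 3, matching the reported model dfs
```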
37

This new model's fit statistics are better: Chi-Square = 13.9, df = 3, p < .003; GFI = 1.00; AGFI = 0.98; RMSEA = 0.057 (“reasonable fit”), with 90% CI from 0.029 to 0.089. So let's examine the diagram with the completely standardized values attached to the unconstrained parameters:

[Path diagram with completely standardized estimates]

The small but significant correlated errors of HLPOTHS and PROMOTN (0.08) suggest that their covariation arises from an additional unspecified common source. PROMOTN now clearly has the highest factor loading on Jobval1 (and thus the highest reliability), while HLPSOC is the most reliable Jobval2 indicator. My substantive interpretation is that the second factor represents an “intrinsic rewards” dimension, in contrast to the “extrinsic rewards” dimension of the first factor. The two latent factors correlate moderately (0.32), indicating that respondents who report extrinsic job rewards as important to them also tend to view intrinsic values as important. What substantive interpretation might you venture about the correlated errors?
38

A MIMIC MODEL

LISREL also can be used to estimate several regression-like structural equation models, in which one or more dependent variables are predicted by several independent variables. Some or all of these variables may be latent constructs with two or more observed indicators. Structural equation models combine two conceptually distinct levels of analysis -- a measurement level and a structural level. In parallel to confirmatory factor analysis, the parameter estimates at the measurement level show how well (or poorly) the observed variables serve as indicators of the unobserved theoretical concepts. Parameters at the structural level show the magnitudes and significance of the hypothesized relations among the latent concepts. And, again in common with factor analysis, the various goodness-of-fit statistics reveal how well the combined measurement and structural equation models reproduce the matrix of covariances among the indicators.

Our first example of a structural equation model is a Multiple Indicator-Multiple Cause (MIMIC) model. This model's relationships involve a latent dependent variable, indicated by several observed measures, that is predicted by a set of exogenous or predetermined variables, each of which has just one indicator (see SSDA pp. 475-8). These predictors can be termed “directly observed variables.” More complex models discussed below have multiple indicators for both independent and dependent variables.

The MIMIC example involves four indicators of attitudes towards the federal government's role in solving social problems, using the 1998 GSS. Each observed variable is measured on a five-point scale where “I strongly agree with [the governmental involvement position]” is 1, “I strongly agree with [the individualist position]” is 5, and “I agree with both answers” is 3. The four item wordings:

• HELPPOOR: I'd like to talk with you about issues some people tell us are important. Please look at CARD AT.
Some people think that the government in Washington should do everything possible to improve the standard of living of all poor Americans; they are at Point 1 on this card. Other people think it is not the government's responsibility, and that each person should take care of himself; they are at Point 5.

• HELPNOT: Now look at CARD AU. Some people think that the government in Washington is trying to do too many things that should be left to individuals and private
39

businesses. Others disagree and think that the government should do even more to solve our country's problems. Still others have opinions somewhere in between.

• HELPSICK: Look at CARD AV. In general, some people think that it is the responsibility of the government in Washington to see to it that people have help in paying for doctors and hospital bills. Others think that these matters are not the responsibility of the federal government and that people should take care of these things themselves.

• HELPBLK: Now look at CARD AW. Some people think that (Blacks/Negroes/African-Americans) have been discriminated against for so long that the government has a special obligation to help improve their living standards. Others believe that the government should not be giving special treatment to (Blacks/Negroes/African-Americans).

The four single-indicator independent variables are AGE, POLVIEWS, EDUC, and WHITE (a 1-0 dichotomy from recoding RACE(1=1)(2,3=0)). Here's a diagram of the model specification to be estimated:

[Path diagram: AGE, POLVIEWS, EDUC, and WHITE each sending an arrow to the latent Help construct, which in turn sends arrows to HELPPOOR, HELPNOT, HELPSICK, and HELPBLK]

Arrows from the latent construct (Help) to the four indicators are the measurement level of analysis, while the arrows to Help coming directly from the four independent variables occur at the structural level.
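The same df = k(k+1)/2 − t bookkeeping applies to this MIMIC specification. The parameter tally below is my own, not from the LISREL output: 3 free loadings (HELPPOOR serves as the fixed reference indicator), 4 indicator error variances, 4 structural paths, 1 residual variance for Help, and the 10 variances/covariances among the 4 observed predictors:

```python
k = 8                            # observed variables in the MIMIC model
t = 3 + 4 + 4 + 1 + 10           # 22 free parameters (tally described above)

d = k * (k + 1) // 2 - t         # 36 distinct (co)variances minus 22 parameters
print(d)                         # 14, matching the chi-square test's reported df
```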
40

After recoding missing values to -999 and writing a raw data file (HELP.TXT), I used the following PRELIS commands to create the covariance matrix for input to LISREL:

PRELIS FOR GOVERNMENT HELP (SAVED IN HELP.PR2)
DATA NI=8 NO=2832 MI=-999 TR=LI
RAW FI=HELP.TXT FO
(8F5.0)
LABELS
HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS WHITE
CONTINUOUS HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS
ORDINAL WHITE
OUTPUT MATRIX=CM SM=HELP.MAT

Note the designation of WHITE as an ordinal variable. PRELIS distinguishes only three levels of measurement – continuous, ordinal, and “censored” – so any measures that you are unwilling to consider as continuous (including dummy variables) should be labeled as ordinal. PRELIS will compute: (1) Pearson product-moment correlations for pairs of continuous variables; (2) polyserial correlations for an ordinal-continuous pair; and (3) polychoric correlations for pairs of ordinal variables.* More below.

The covariance matrix (HELP.MAT), based on a pairwise average of about 1,654 cases:

          HELPBLK  HELPNOT HELPPOOR HELPSICK     EDUC      AGE POLVIEWS    WHITE
          -------  ------- -------- --------  -------  ------- --------  -------
HELPBLK     1.439
HELPNOT     0.519    1.378
HELPPOOR    0.521    0.609    1.300
HELPSICK    0.448    0.519    0.577    1.451
EDUC       -0.168    0.319    0.403    0.115    8.133
AGE         1.080    1.956    1.214    2.176   -7.113  289.143
POLVIEWS    0.351    0.382    0.371    0.394   -0.159    1.904    1.917
WHITE       0.379    0.369    0.307    0.231    0.529    3.246    0.170    1.000

Means       3.566    3.164    3.051    2.548   13.262   45.505    4.139    0.826
Std Dev     1.199    1.174    1.140    1.204    2.852   17.004    1.384    1.000

____________________
*For details, see Jöreskog and Sörbom. 1996. PRELIS 2: User's Reference Guide. Chicago: SSI Scientific Software International. Pp. 18-25.
41

The LISREL commands are similar to a factor analysis, except the latent dependent variable (Help) is regressed directly onto the four observed independent variables (EDUC AGE POLVIEWS WHITE):

MIMIC MODEL FOR 4 HELP VARS (SAVED IN MIMIC.LS8)
Observed variables: HELPBLK HELPNOT HELPPOOR HELPSICK EDUC AGE POLVIEWS WHITE
Covariance Matrix From File: HELP.MAT
Sample Size: 1654
Latent Variables: Help
Relationships:
HELPPOOR = 1*Help
HELPNOT = Help
HELPSICK = Help
HELPBLK = Help
Help = EDUC AGE POLVIEWS WHITE
LISREL Output: SC MI
End of Problem

The minimum fit function chi-square test = 99.1 for 14 degrees of freedom, which is not a good fit; however, the large sample size makes model rejection relatively easy. Other statistics suggest a somewhat better fit: GFI = 0.99 and AGFI = 0.96. Moreover, the root mean square error of approximation (0.061) falls into the intermediate range of a “reasonable fit” (the 90% confidence interval for RMSEA is 0.050 to 0.072). A portion of the modification indices (MI):

Modification Indices for THETA-DELTA-EPS
           HELPBLK  HELPNOT HELPPOOR HELPSICK
           -------  ------- -------- --------
EDUC         48.29     3.31    16.33     0.02
AGE           8.78     1.06     0.22     5.37
POLVIEWS      0.00     0.11     0.86     2.22
WHITE        34.56     1.89     9.41    14.49

These values imply that the error term of HELPBLK correlates significantly with both WHITE and EDUC; together those two parameters may account for much of the model chi-square. To improve the model fit, I added these commands and re-ran the analysis:

Let the Errors between WHITE and HELPBLK Correlate
Let the Errors between EDUC and HELPBLK Correlate
End of Problem

The minimum fit function chi-square test now falls to 30.55 for 12 df, which has a probability = .002. But the other fit statistics attained very desirable
42

values: GFI = 1.00, AGFI = 0.99, and RMSEA = 0.030, a “close” fit (90% CI from 0.017 to 0.044). Here is the completely standardized solution:

[Path diagram with completely standardized estimates]

At the measurement level, the indicators of the latent Help construct all have highly significant factor loadings (p < .001) and roughly equal magnitudes. At the structural level, three of the four path coefficients are highly significant (p < .001), according to t-tests in the LISREL output. The AGE effect (0.05) does not differ from zero at α = .05 for a two-tailed research hypothesis. POLVIEWS and WHITE have much stronger standardized effects (0.32 and 0.34 standard deviations) than EDUC (0.10) on the latent Help construct. The predictors jointly explain about 27 percent of the variation in the Help construct (R² = .27).

High scores on the four social problems variables mean that respondents prefer individual or nongovernmental solutions to social problems. Therefore, the positive path coefficients reveal that conservatives, whites, and highly educated respondents were more likely to endorse such policy positions. The diagram does not show the two correlated error terms:

THETA-DELTA-EPS
  • 43. 43 HELPBLK HELPNOT HELPPOOR HELPSICK -------- -------- -------- -------- EDUC -0.44 - - - - - - (0.08) -5.77 AGE - - - - - - - - POLVIEWS - - - - - - - - WHITE 0.11 - - - - - - (0.03) 4.83 The error term for HELPBLK has significant correlations with the error terms for both EDUC (-0.44) and WHITE (0.11). This re-specification was estimated mainly to demonstrate a technique for improving a model’s statistical fit. However, I didn’t have a priori reasons for correlating these error terms, and post hoc interpretations – e.g., classist or racist stereotypical responses – risk capitalizing on chance occurrences in the sample.
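As a check on the fit statistics reported for these MIMIC models, RMSEA can be reproduced by hand from the minimum fit function chi-square, its degrees of freedom, and the sample size. A minimal Python sketch, using one common formula for RMSEA, with values taken from the initial model above:

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from a minimum-fit chi-square.
    One common formula: sqrt(max((chi2 - df) / (df * (n - 1)), 0))."""
    return math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))

# Values reported for the initial MIMIC model: chi-square = 99.1, 14 df, N = 1654
print(round(rmsea(99.1, 14, 1654), 3))  # matches the reported 0.061
```

When the chi-square is no larger than its df, the formula returns 0, the floor value for a perfectly fitting model.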
  • 44. 44 A CHAIN MODEL A widespread application of LISREL involves models in which several or all of the latent constructs have multiple indicators. The simplest version is a chain model, involving one independent and one dependent latent variable. The example below modifies the preceding MIMIC model by including a second political measure (PARTYID) with POLVIEWS as indicators of an unobserved political ideology construct. The LISREL commands: CHAIN MODEL FOR POLITICS-HELP (SAVED IN CHAIN.LS8) Observed variables: HELPBLK HELPNOT HELPPOOR HELPSICK POLVIEWS PARTYID Covariance Matrix From File: CHAIN.MAT Sample Size: 1615 Latent Variables: Help Politic Relationships: HELPPOOR = 1*Help HELPNOT = Help HELPSICK = Help HELPBLK = Help POLVIEWS = Politic PARTYID = Politic Help = Politic LISREL Output: SC MI End of Problem In the Relationships commands, I did not specify that one indicator of the latent Politic construct should serve as the reference variable. Instead, LISREL automatically set the scale by standardizing the variance of Politic to 1.00. The structural-level regression coefficient is estimated by the command involving the latent constructs: “Help = Politic”. The minimum fit function χ2 = 15.9 for 8 df (p = .044). Other fit statistics -- GFI = 1.00; AGFI = 0.99; RMSEA = 0.025 -- indicate a very “close” fit. The standardized coefficients are all highly significant:
  • 45. 45 The structural parameter (0.63) indicates that a one-standard-deviation difference in political ideology produces about a three-fifths standard deviation difference in attitude toward the federal government’s role in solving social problems. The positive sign means that more conservative respondents favor more individualistic solutions. The value of the error term for the Help construct (from the output, but not shown in the diagram) is the square root of PSI: √0.60 = 0.77. Show that the squared structural parameter plus the squared error term accounts for all the variance of Help. For each indicator in the diagram, show that the squared factor loading plus the (squared) error term accounts for all of that indicator’s variance: Indicator λ2 + θδ2 = 1.00 HELPBLK 1.00 HELPNOT 1.00 HELPPOOR 1.00 HELPSICK 1.00 POLVIEWS 1.00 PARTYID 1.00 Equality Constraints on Parameters
  • 46. 46 The chain model above shows that the estimated parameters from the unobserved constructs to the indicators have roughly similar magnitudes (0.55 to 0.69). LISREL allows an explicit statistical test for the hypothesis that one parameter equals another in the population. That is: H0: βi = βj Hypothesizing that a pair of “paths” are equal (rather than free to take on independent values) requires that LISREL estimate just one parameter instead of two. As a result, one degree of freedom becomes available to test whether the two models’ chi-square goodness-of-fit statistics differ at a chosen alpha-level. If no significant difference occurs between the free and constrained versions, then the more parsimonious version with equal parameters is accepted (i.e., the model with fewer free parameters). The LISREL commands for constraining paths to equality have this syntax, inserted before the “End of Problem” line: Set Path Help -> HELPSICK = Path Help -> HELPNOT To constrain multiple parameters, continue the command: Set Path Help -> HELPSICK = Path Help -> HELPNOT = Path Help -> HELPBLK Because the HELPPOOR indicator is already used as the reference variable for Help, I did not include it in the constraints.
  • 47. 47 This table shows the chi-squares for alternative models constraining several combinations of parameters among three Help indicators: Equality constraints Model Chi-square Model df 1. No equality constraints (baseline model) 15.90 8 2. HELPSICK = HELPBLK 17.84 9 3. HELPNOT = HELPBLK 23.64 9 4. HELPSICK = HELPNOT 18.23 9 5. HELPSICK = HELPNOT = HELPBLK 23.66 10 In comparison to the initial model with no equality constraints (#1), some of these equality constraints don’t result in significantly worse fits to the data. For example, the difference in χ2 for model #2 versus model #1 is (17.84 – 15.90 = 1.94) for 1 df. The critical value at α = .05 is 3.84, so we cannot reject the null hypothesis that these two parameters are equal in the population. Perform a chi-square difference test for model #1 versus model #3 and decide whether the latter is the most parsimonious model.
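The worked comparison above (model #2 versus model #1) can be replicated numerically. A minimal Python sketch; the 1-df chi-square tail probability uses the fact that a chi-square variate with 1 df is the square of a standard normal:

```python
from math import sqrt
from statistics import NormalDist

def chi2_sf_1df(x):
    """Upper-tail probability for a chi-square variate with 1 df,
    via the identity X = Z**2 for standard normal Z."""
    return 2.0 * (1.0 - NormalDist().cdf(sqrt(x)))

# Model #2 (HELPSICK = HELPBLK) versus model #1 (no constraints):
diff = 17.84 - 15.90      # chi-square difference, df = 9 - 8 = 1
p = chi2_sf_1df(diff)
print(round(diff, 2))     # 1.94, below the 3.84 critical value at alpha = .05
```

Because p exceeds .05, the equality constraint is retained; the same function applies to the model #1 versus model #3 exercise.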
  • 48. 48 COMPARING MODELS FOR GROUPS Researchers frequently want to compare structural equation models for multiple (sub)groups, such as women vs. men, whites vs. blacks, Republicans vs. Democrats. LISREL is a powerful tool for analyzing multiple samples simultaneously, with some or all parameters constrained to be equal across the groups. To illustrate this application, I compare the preceding chain model for a subset of white and black respondents (omitting the “other” race). In a variation on PRELIS-LISREL procedures, these commands show how to enter separate matrices of correlations and descriptive statistics into the command file, which LISREL converts to the respective covariances: Group 1: CHAIN MODEL FOR POLITIC-HELP: WHITES Observed variables: PARTYID POLVIEWS HELPPOOR HELPNOT HELPSICK HELPBLK Correlation Matrix: 1.000 .419 1.000 .190 .231 1.000 .226 .258 .450 1.000 .246 .242 .443 .350 1.000 .187 .232 .367 .337 .278 1.000 Means: 3.00 4.15 3.15 3.28 2.61 3.71 Standard Deviations: 1.92 1.40 1.11 1.13 1.21 1.14 Sample Size: 1282 Latent Variables: Help Politic Relationships: HELPPOOR = 1*Help HELPNOT = Help HELPSICK = Help HELPBLK = Help POLVIEWS = Politic PARTYID = Politic Help = Politic Group 2: CHAIN MODEL FOR POLITIC-HELP: BLACKS Correlation Matrix: 1.000 .111 1.000 .008 .138 1.000 .137 -.009 .365 1.000 .082 .091 .328 .389 1.000 .153 .065 .299 .369 .335 1.000 Means: 1.43 3.93 2.46 2.62 2.16 2.73 Standard Deviations: 1.51 1.27 1.14 1.19 1.15 1.24 Sample Size: 260 LISREL Output: SC MI End of Problem My initial comparison constrains corresponding parameters to be exactly equal across the two groups. Because the Relationships commands appearing in the white specification do not also appear for the black group,
  • 49. 49 LISREL assumes by default that all pairs of parameters must be constrained to equality. The minimum fit function χ2 = 94.88 for 29 df (p < .0001). Other fit statistics are GFI = 0.93 (AGFI is not calculated) and RMSEA = 0.052 (“reasonable”), 90% CI from 0.040 to 0.064. The structural coefficient for the regression of Help on Politic = 0.53 for both groups. My next model specification freed all the factor loadings and structural regression parameters, by repeating the same Relationships for the black group that appeared in the white group. In addition, I allowed the indicators’ error terms to vary freely across groups, and the error term of the Help construct to take on different values, by inserting these lines at the end of both subgroups’ commands: Set the Error Variances of HELPPOOR-HELPBLK Free Set the Error Variances of PARTYID-POLVIEWS Free Set the Error Variance of Help Free This completely free model has χ2 = 35.27 for 16 df (p = .0037); GFI = 0.99 and RMSEA = 0.038 (“close fit”), 90% CI from 0.020 to 0.056. Here are some completely standardized parameter estimates for the two racial groups: Parameters Blacks Whites HELPPOOR .68 .68 HELPNOT .84 .60 HELPSICK .70 .58 HELPBLK .74 .49 PARTYID .34 .63 POLVIEWS .25 .70 Politic -> Help .33 .57 Help Error Covar .56 .74 The factor loadings for the two political indicators are substantially smaller for blacks than for whites, perhaps reflecting the narrower span of black political ideology. The estimated magnitude of the structural effect of Politic on Help is much larger for whites (0.57) than for blacks (0.33). Finally, to determine whether any parameters may be constrained to equality without worsening the model fit, additional LISREL analyses can be conducted that delete a single Relationship line from just one subgroup. For example, to test for racial equality of the structural effect, I deleted the “Help = Politic” line from the black subgroup, which forces
  • 50. 50 LISREL to replicate all the relationships in the second group except the deleted one. The result was a χ2 = 36.50 for 17 df, which does not differ significantly from the model above (i.e., the difference in chi-squares is just 1.23 for one df). In other words, the population regression coefficients are probably equal (both estimates = 0.56). The large standard errors due to the small black sample size may have prevented rejection of this null hypothesis.
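A side note on the hand-entered matrices used in this multigroup run: when correlations, means, and standard deviations are supplied, LISREL rebuilds each group’s covariance matrix element by element as cov(i, j) = r(i, j) × sd(i) × sd(j). A minimal Python sketch using the first two white-group variables (PARTYID and POLVIEWS) from the input above:

```python
def corr_to_cov(r, sd):
    """Convert a full correlation matrix (list of lists) and a list of
    standard deviations into a covariance matrix:
    cov[i][j] = r[i][j] * sd[i] * sd[j]."""
    p = len(sd)
    return [[r[i][j] * sd[i] * sd[j] for j in range(p)] for i in range(p)]

# White-group values entered above: r(PARTYID, POLVIEWS) = .419, SDs 1.92 and 1.40
r = [[1.000, 0.419],
     [0.419, 1.000]]
sd = [1.92, 1.40]
cov = corr_to_cov(r, sd)
print(round(cov[0][0], 3), round(cov[0][1], 3))  # 3.686 1.126
```

The diagonal entries are simply the squared standard deviations, i.e., the observed variances.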
  • 51. 51 A PATH MODEL This section extends the preceding MIMIC and chain examples to a causal model of the relationships among three unobservable constructs with multiple indicators. I introduce some basic principles of path analysis (see Chapter 11 in SSDA4 for more details). At the structural equation level, a causal diagram is indispensable for displaying the hypothesized causal effects among the latent constructs: [Diagram: SES with direct paths to both Politic and Help, and Politic with a path to Help] The diagram displays several path analytic principles: BOX 11.1 Rules for Constructing Causal Diagrams 1. Variable names are represented either by short keywords or letters. 2. Variables placed to the left in a diagram are assumed to be causally prior to those on the right. 3. Causal relationships between variables are represented by single-headed arrows. 4. Variables assumed to be correlated but not causally related are linked by a curved double-headed arrow. 5. Variables assumed to be correlated but not causally related should be at the same point on the horizontal axis of the causal diagram. 6. The causal effect presumed between two variables is indicated by placing + or - signs along the causal arrows to show how increases or decreases in one variable affect the other.
  • 52. 52 • The model asserts that a respondent’s socioeconomic status (SES) directly causes both political ideology (Politic) and attitude about the governmental role (Help). • SES also indirectly affects Help, via its impact on Politic (i.e., transmitted via the compound product of the path from SES to Politic times the path from Politic to Help). [See SSDA Chapter 11 for a detailed discussion of disaggregating the covariation between any pair of variables into direct, indirect, and so-called correlated effects.] • The unsourced arrows to the two endogenous variables (Politic and Help) mean that other sources of their variation aren’t included in this model. These residual effects presumably operate independently of (i.e., are uncorrelated with) the explicitly included causes. At the measurement equation level, each construct has two or more observable indicators. The variables for Politic and Help are the same as above; the three SES indicators are EDUC, INCOME98 (measured as midpoints of the $000 income ranges), and PRESTG80 (occupational prestige). Here are the LISREL commands: CAUSAL MODEL FOR SES-POLITIC-HELP (SAVED IN PATH.LS8) Observed variables: HELPBLK HELPNOT HELPPOOR HELPSICK POLVIEWS PARTYID EDUC INCOME98 PRESTG80 Covariance Matrix From File: PATH.MAT Sample Size: 1396 Latent Variables: Help Politic SES Relationships: HELPPOOR = Help HELPNOT = Help HELPSICK = Help HELPBLK = Help POLVIEWS = Politic PARTYID = Politic INCOME98 = SES EDUC = SES PRESTG80 = SES Help = Politic Help = SES Politic = SES Path Diagram LISREL Output: SC MI End of Problem Although the minimum fit function is much higher than desirable (χ2 = 143.58 for 24 df; p < .0001), the other statistics indicate a good fit -- GFI = .98; AGFI = 0.96; RMSEA = 0.060 (“reasonable fit”), 90% CI from 0.051 to
  • 53. 53 0.069. All coefficients for the factor loadings and structural effects were highly significant. The completely standardized path coefficients: The factor loadings for Politic and Help are similar to those in the chain model. The three SES indicators also have comparably high loadings. At the structural level, the strongest path coefficient is from Politic to Help, at 0.60 standard deviations. The direct path from SES to Help is just 0.11, and its indirect path via Politic is almost as strong: (0.14)(0.60) = 0.08. Thus, higher status persons are more politically conservative and support more individualistic solutions to social policies.
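The indirect and total effects quoted above follow from the path-tracing rules: multiply coefficients along a compound path, then sum across distinct paths. A short sketch with the standardized estimates reported for this model:

```python
# Completely standardized paths reported above for the SES-Politic-Help model:
ses_to_politic = 0.14
politic_to_help = 0.60
ses_to_help_direct = 0.11

# Indirect effect: product of coefficients along the compound path
indirect = ses_to_politic * politic_to_help
# Total effect: direct path plus all indirect paths
total = ses_to_help_direct + indirect
print(round(indirect, 3), round(total, 3))   # 0.084 0.194
```

The indirect effect (0.08 after rounding) is what the text computes as (0.14)(0.60).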
  • 54. 54 MODEL IDENTIFICATION To be estimable, a latent variable structural equation model must be identified. Models are identified if one optimal (best) value exists for each parameter whose value is not known. Models that are identified usually converge to best estimates for these parameters. Models in which at least one parameter does not have a unique solution are called “not identified” or “underidentified.” Underidentification occurs when the specified equations contain more unknowns to be estimated than known quantities available to estimate them. To illustrate, does one unique pair of values for X and Y solve this algebraic equation? Y = 4 + 3X No; infinitely many (X, Y) pairs satisfy it. However, if we add a second equation, the system of two equations becomes “just identified”: given two equations with two unknowns, unique X and Y values are easily calculated: Y = 14 – 2X (HINT: set the two righthand sides equal and solve for X.) This example illustrates a “just identified” model because it has exactly as many knowns as unknowns and yields precise estimates. If we include a third equation in the system, such as Y = 2 + 4X, this “overidentified” system would still yield the same precise X and Y values from any pair of the three equations. However, if that third equation were Y = 2 + 2X, would an exact solution be possible? SEM computer programs iteratively calculate overidentified parameter estimates that minimize discrepancies between the observed and expected covariance matrices, (S - Σ(θ)). The known values in structural equation models are the observed variances and covariances, while the unknowns are those model parameters you allow to vary freely. For example, if we have 5 indicators, the covariance matrix contains ((5)(5+1))/2 = 15 nonduplicate values. Hence, a CFA or SEM would not be identified if you specify more than 15 free parameters to be estimated. SEM programs usually provide information about a model’s identifiability.
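The counting argument in the last paragraph (often called the t-rule) is a necessary but not sufficient condition for identification. A small helper makes the bookkeeping explicit; the parameter tally for the earlier chain model is my own count, not LISREL output:

```python
def model_df(n_indicators, n_free_params):
    """Model degrees of freedom: distinct variances and covariances among the
    indicators minus the number of free parameters. A negative value signals
    an underidentified specification (t-rule: necessary, not sufficient)."""
    knowns = n_indicators * (n_indicators + 1) // 2
    return knowns - n_free_params

# The 5-indicator example above: 15 knowns, so 15 free parameters leave 0 df.
print(model_df(5, 15))
# The earlier chain model: 6 indicators (21 knowns) and, by my count, 13 free
# parameters (5 loadings, 6 error variances, 1 structural path, 1 residual
# variance), which reproduces that model's reported 8 df.
print(model_df(6, 13))
```

A model can pass this count and still be underidentified, which is why the separate checks described below remain necessary.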
If one or more parameters are not identified, the program will be unable to produce standard errors for the parameter estimates. In such cases, you should try to identify the model by placing additional constraints on appropriate parameters (consistent with theory). For example, set some
  • 55. 55 parameter value(s) equal to 0, 1, or equal to another free parameter. Run the model again to see whether it yields a complete set of estimates. SEM experts disagree on whether computer programs can be trusted to find all instances of nonidentification, particularly for very complex models. That is, programs sometimes produce standard errors for models that are not identified. Purists argue that researchers should check whether both the measurement and structural models are separately identified prior to submitting a SEM model for computer estimation. Researchers can study the necessary-and-sufficient rules and requirements for model identification. Two good sources are: (1) Kenneth Bollen. 1989. Structural Equations with Latent Variables. New York: Wiley. (2) David A. Kenny’s Rules for Identification Webpage <http://w3.nai.net/~dakenny/identify.htm>. For this course, identification is unlikely to be problematic for the types of CFAs and SEMs that most students will estimate -- multiple-indicator recursive models, in which every observed variable loads on only one latent construct, and one indicator per construct is fixed to 1.0 to set the latent factor’s scale.
  • 56. 56 FACTOR ANALYSIS OF DICHOTOMIES If a LISREL analysis includes some or all variables measured at the ordinal or discrete (dichotomous) level, computing a covariance or Pearson correlation matrix from such scores and applying maximum likelihood estimation (MLE) may lead to distorted parameter estimates and inaccurate test statistics. Jöreskog and Sörbom recommend alternative correlation coefficients and estimation methods for such situations. An observed variable with a set of ordered categories might be viewed as a crude classification of an unobserved (latent) continuous variable z* with a standard normal distribution. For example, a low-medium-high measure X could be trichotomized at two threshold values (α1 and α2) for z*: X is scored 1 if z* ≤ α1 X is scored 2 if α1 < z* ≤ α2 X is scored 3 if α2 < z* A variety of correlation coefficients can be calculated when one or both observed variables are ordinal: • Polychoric correlation coefficient for two ordinal variables assumes their underlying continuous measures have a bivariate normal distribution • Tetrachoric correlation, a subtype of polychoric, is used for two dichotomies • Polyserial correlation involves an ordinal and a continuous variable, and also assumes an underlying bivariate normal distribution • Biserial correlation, a subtype of polyserial, is used for a dichotomous and a continuous variable To include an ordinal variable in a linear relationship, LISREL iteratively computes polychoric and polyserial correlations not from the observed scores but from the theoretical correlations of the underlying z* variables. A matrix of estimated correlation coefficients is created from the separate crosstabulations for every pair of observed continuous, ordinal, or dichotomous variables.
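The thresholds themselves are estimated from the observed category proportions: each alpha is the inverse standard normal CDF evaluated at a cumulative proportion. A minimal sketch with made-up proportions for a low-medium-high item:

```python
from statistics import NormalDist

def thresholds(proportions):
    """Given observed category proportions for an ordinal item, return the
    threshold values (alphas) on the underlying standard normal z*."""
    z = NormalDist()
    cum = 0.0
    alphas = []
    for p in proportions[:-1]:      # the top category needs no upper threshold
        cum += p
        alphas.append(z.inv_cdf(cum))
    return alphas

# Hypothetical item with 30%, 50%, and 20% of responses in the three categories:
a1, a2 = thresholds([0.30, 0.50, 0.20])
print(round(a1, 2), round(a2, 2))   # -0.52 0.84
```

A K-category item thus yields K - 1 thresholds; a dichotomy has a single threshold at the inverse CDF of its proportion scored 0.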
  • 57. 57 As an alternative to MLE, LISREL obtains correct large-sample standard errors and chi-square values using a weighted least squares (WLS) estimation method. A weight matrix required for WLS is the inverse of an estimated asymptotic covariance matrix (W) of polychoric and polyserial correlations. This inversion will be performed by LISREL, based on input of a W matrix generated by PRELIS and stored on the computer as a binary file. PRELIS computes estimates of the asymptotic covariances of the correlations when instructed: OUTPUT MATRIX=PM SM=FILE.MAT SA=FILE.ACM PA The PM option instructs PRELIS to compute a matrix of polychoric correlations when some or all variables have been declared as ordinal; SM saves this correlation matrix in a first named file with extension “mat”; SA saves the asymptotic covariance matrix in another file with extension “acm”; and PA tells it to write the W matrix in the PRELIS output file. To illustrate, I analyze the seven GSS2000 items on confidence in institutions, whose responses were recoded into dichotomies (1 = a great deal of confidence; 0 = only some or hardly any confidence): 165. I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in them? CONFINAN: Banks and financial institutions CONBUS: Major companies CONCLERG: Organized religion CONEDUC: Education CONFED: Executive branch of the federal government CONLABOR: Organized labor CONPRESS: Press CONMEDIC: Medicine CONTV: TV CONJUDGE: U.S. Supreme Court CONSCI: Scientific Community CONLEGIS: Congress CONARMY: Military The research question is whether a single factor or multiple factors are required to represent the tetrachoric correlations among these dichotomies.
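Before looking at the estimates, it helps to see why Pearson correlations computed on dichotomies understate the latent association. One case has a closed form: if two standard normal variables with correlation rho are each dichotomized at the mean (a median split), the Pearson (phi) correlation between the resulting dichotomies is (2/π)·arcsin(rho), always smaller in magnitude than rho. A short illustration; the confidence items are not median splits, so this is only a qualitative guide:

```python
import math

def phi_from_latent(rho):
    """Pearson correlation between two dichotomies formed by cutting a
    bivariate normal pair at their means: (2/pi) * arcsin(rho)."""
    return (2.0 / math.pi) * math.asin(rho)

# A latent (tetrachoric) correlation of 0.50 attenuates to exactly 1/3:
print(round(phi_from_latent(0.50), 4))   # 0.3333
# This attenuation is why MLE applied to Pearson correlations of
# dichotomies distorts parameter estimates and test statistics.
```

The same direction of attenuation appears in the matrices below, where every Pearson coefficient is smaller than its tetrachoric counterpart.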
  • 58. 58 These SPSS commands recode all 13 indicators into dichotomies, replace all missing values with –999, and write the raw datafile to “CONFIDE.TXT”: MISSING VALUES confinan to conarmy (). RECODE confinan to conarmy (2,3=0)(1=1)(ELSE=-999). FREQ VAR = confinan to conarmy. WRITE OUTFILE = CONFIDE.TXT/confinan to conarmy (13F5.0). EXECUTE. These PRELIS commands estimate the asymptotic covariance matrix for listwise deletion of cases: PRELIS FOR 13 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.PR2) DATA NI=13 NO=2817 MI=-999 TR=LI RAW-DATA-FROM FILE=CONFIDE.TXT LABELS FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY ORDINAL FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY OUTPUT MATRIX=PM SM=CONFIDE.MAT SA=CONFIDE.ACM PA Here’s a matrix of tetrachoric correlations among 8 indicators analyzed below, based on N = 1,496 cases: FINAN FED PRESS MEDIC TV JUDGE SCI LEGIS ------- ------ ------- ------- ------- ------- ------ ------ FINAN 1.00 FED 0.33 1.00 PRESS 0.32 0.40 1.00 MEDIC 0.41 0.38 0.35 1.00 TV 0.32 0.32 0.61 0.41 1.00 JUDGE 0.39 0.52 0.37 0.35 0.23 1.00 SCI 0.43 0.33 0.22 0.45 0.15 0.55 1.00 LEGIS 0.49 0.72 0.47 0.41 0.41 0.68 0.40 1.00 In contrast, the Pearson correlation coefficients, which treat the confidence dichotomies as continuous variables, are substantially smaller: FINAN FED PRESS MEDIC TV JUDGE SCI LEGIS ------- ------ ------- ------- ------- ------- ------ ------ FINAN 1.00 FED 0.18 1.00 PRESS 0.16 0.20 1.00 MEDIC 0.25 0.19 0.17 1.00 TV 0.16 0.14 0.33 0.19 1.00 JUDGE 0.25 0.28 0.19 0.22 0.11 1.00 SCI 0.27 0.17 0.10 0.30 0.07 0.36 1.00 LEGIS 0.26 0.45 0.24 0.20 0.19 0.38 0.20 1.00 For a single-factor model, LISREL automatically applies WLS when instructed to read the asymptotic covariance matrix:
  • 59. 59 LISREL FOR 13 GSS00 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.LS8) Observed variables: FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY Correlation Matrix from File: CONFIDE.MAT Asymptotic Covariance Matrix from File: CONFIDE.ACM Sample size: 1496 Latent Variables: Confide1 Relationships: FINAN - ARMY = Confide1 Path Diagram LISREL Output: SC MI End of Problem This single-factor model yielded an absurdly high χ2 = 2,063.1 (14 df; p < .0001), and two other statistics indicated poor fits -- GFI = 0.83; AGFI = 0.76. However, RMSEA = 0.053 (“reasonable fit”), 90% CI from 0.048 to 0.059. I decided to estimate a multi-factor model, incrementally adding an indicator and observing the change in fit statistics. After several trials, I concluded that a model having three factors with 8 indicators and four correlated error terms produced the best fit. Here are the commands: LISREL FOR 13 GSS00 CONFIDENCE DICHOTOMIES (SAVED IN CONFIDE.LS8) Observed variables: FINAN BUS CLERG EDUC FED LABOR PRESS MEDIC TV JUDGE SCI LEGIS ARMY Correlation Matrix from File: CONFIDE.MAT Asymptotic Covariance Matrix from File: CONFIDE.ACM Sample size: 1496 Latent Variables: Confide1 Confide2 Confide3 Relationships: LEGIS FED JUDGE = Confide1 TV PRESS = Confide2 FINAN MEDIC SCI = Confide3 Let the Errors between JUDGE and SCI Correlate Let the Errors between MEDIC and PRESS Correlate Let the Errors between FINAN and FED Correlate Let the Errors between JUDGE and PRESS Correlate Path Diagram LISREL Output: SC MI End of Problem
  • 60. 60 The factor loadings are high, reflecting the tetrachoric correlations on which they are based. The three factors are strongly correlated: r12 = 0.59, r23 = 0.59, and r13 = 0.71. As usual, the substantive meanings of latent constructs can be inferred from the content of the specific measures that load on them. What dimensions, if any, do you conjecture? The importance of using WLS in conjunction with the asymptotic covariance matrix becomes evident when the preceding analysis uses only the tetrachoric correlation matrix. The fit statistics are much worse: χ2 = 159.3 (13 df; p < .0001) and RMSEA = 0.087.
  • 61. 61 SEM WITH ORDINAL VARIABLES Continuing to examine how LISREL handles ordinal indicators, I next estimate a structural equation model that treats all indicators as ordinal variables. The data are from GSS98. The dependent variable is a latent abortion attitude construct with three dichotomous indicators of legal abortions for specific circumstances (1 = yes, 0 = no): Please tell me whether or not you think it should be possible for a pregnant woman to obtain a legal abortion if. . . ABDEFECT: If there is a strong chance of serious defect in the baby? ABHLTH: If the woman's own health is seriously endangered by the pregnancy? ABRAPE: If she became pregnant as a result of rape? The two independent constructs affecting variation in abortion attitude are: (1) a political orientation construct with indicators PARTYID and POLVIEWS, each having seven ordered categories; and (2) a religiosity construct consisting of two items having six and eight ordered categories, respectively: PRAY: About how often do you pray? PRIVPRAY: How often do you pray privately in places other than at church or synagogue? The structural model has a curved two-headed arrow to indicate the antecedent constructs are correlated but not causally related: [Diagram: Prays and Politics, linked by a curved two-headed arrow, each with a causal path to Abort]
  • 62. 62 After recoding the two religiosity indicators to make frequent praying the high scores, I used SPSS to write the datafile. Next, these PRELIS commands created the correlation matrix and ACM files: PRELIS FOR PRAY-ABORTION VARIABLES (PRAY.PR2) DATA NI=7 NO=2832 MISSING=-999 TR=LI RAW-DATA-FROM FILE=PRAY.TXT LABELS ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS ORDINAL ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS OUTPUT MATRIX=PM SM=PRAY.MAT SA=PRAY.ACM PA The estimated polychoric correlation matrix (tetrachoric for the pairs of dichotomies): ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS -------- -------- -------- ------- -------- -------- -------- ABDEFECT 1.000 ABHLTH 0.866 1.000 ABRAPE 0.802 0.859 1.000 PRAY -0.233 -0.289 -0.263 1.000 PRIVPRAY -0.295 -0.351 -0.303 0.824 1.000 PARTYID -0.243 -0.105 -0.166 0.075 0.124 1.000 POLVIEWS -0.068 -0.085 -0.120 -0.001 0.061 0.434 1.000 Here are the commands: LISREL FOR GSS98 PRAY-POLITICS-ABORTION MODEL Observed variables: ABDEFECT ABHLTH ABRAPE PRAY PRIVPRAY PARTYID POLVIEWS Correlation Matrix from File: PRAY.MAT Asymptotic Covariance Matrix from File: PRAY.ACM Sample size: 598 Latent Variables: Abortion Prays Politics Relationships: ABDEFECT ABHLTH ABRAPE = Abortion PRAY PRIVPRAY = Prays PARTYID POLVIEWS = Politics Abortion = Prays Abortion = Politics Path Diagram LISREL Output: SC MI End of Problem
  • 63. 63 The path diagram with completely standardized estimates: Notice that the error term for PRIVPRAY is negative (-0.02), a nonsensical value (a “Heywood case”). To constrain that parameter to zero, which gains 1 df, include this LISREL command: Let the Error Variance of PRIVPRAY Equal 0 What are your substantive interpretations about the relative impacts of political orientation and religiosity on abortion attitudes?
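Why a negative error variance can arise here: in a completely standardized solution an indicator's implied error variance is 1 minus its squared loading, so any standardized loading above 1.0 forces the error variance below zero. A small sketch; the loading of 1.01 is a hypothetical value chosen to reproduce the -0.02 above, not a number from the output:

```python
def implied_error_variance(std_loading):
    """Standardized error variance implied by a completely standardized
    factor loading: theta = 1 - lambda**2."""
    return 1.0 - std_loading**2

# A standardized loading barely above 1 yields a slightly negative
# error variance (hypothetical loading, for illustration only):
print(round(implied_error_variance(1.01), 2))   # -0.02
# Fixing the error variance to 0, as the Let command above does,
# amounts to forcing the standardized loading back to 1.0.
```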