A Method for Meta-Analytic Confirmatory Factor Analysis
Kamden K. Strunk, Ph.D.
Oklahoma State University
Center for Research on STEM Teaching and Learning
School of Educational Studies
Research Evaluation Statistics and Assessment
The Need for a Method
Many instruments have a controversial structure.
Instrument structure has practical implications.
Changes in the scoring of an instrument have impacts
Mental Health diagnoses
Existing methods use Generalized Least Squares
This is necessary in MA-SEM, but results in imprecise
Generalized Least Squares estimation (or the two-stage
MA-SEM method) is unnecessary in the case of MA-CFA.
An Exact Method
In the case of factor analysis, correlation matrices and
item descriptives are often given.
When not given, they can be easily obtained by
Correlation matrices with item SDs are easily converted
to covariance matrices.
Covariance matrices are then easily converted to sums of squares and cross-products (SSCP) matrices.
SSCP matrices use summation, and thus can be
First, the inter-item correlation matrices are converted
to variance ($s_x^2 = (s_x)^2$)/covariance ($s_{xy} = r_{xy} s_x s_y$) matrices.
Next, the variance/covariance matrices are converted to
SSCP matrices ($\sum (X - \bar{X})(Y - \bar{Y}) = s_{xy}(N - 1)$).
Then, the SSCP matrices are added together.
Finally, the combined SSCP matrix is divided by the total
sample size for all combined samples minus one.
This results in a combined variance/covariance matrix
for all of the sampled studies.
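The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's own implementation: the function names (`corr_to_cov`, `cov_to_sscp`, `combine_studies`) and the two small two-item "studies" are hypothetical, invented only to make the arithmetic visible.

```python
def corr_to_cov(corr, sds):
    """Step 1: s_xy = r_xy * s_x * s_y (the diagonal gives the variances)."""
    k = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(k)] for i in range(k)]


def cov_to_sscp(cov, n):
    """Step 2: SSCP = covariance * (N - 1)."""
    return [[value * (n - 1) for value in row] for row in cov]


def combine_studies(studies):
    """Steps 3 and 4: sum the SSCP matrices, then divide by total N minus one.

    `studies` is a list of (correlation matrix, item SDs, sample size) tuples.
    """
    k = len(studies[0][1])
    total_sscp = [[0.0] * k for _ in range(k)]
    total_n = 0
    for corr, sds, n in studies:
        sscp = cov_to_sscp(corr_to_cov(corr, sds), n)
        for i in range(k):
            for j in range(k):
                total_sscp[i][j] += sscp[i][j]
        total_n += n
    return [[value / (total_n - 1) for value in row] for row in total_sscp]


# Hypothetical input: two studies reporting two items each (not real BDI-II data).
studies = [
    ([[1.0, 0.5], [0.5, 1.0]], [2.0, 3.0], 101),
    ([[1.0, 0.2], [0.2, 1.0]], [1.0, 2.0], 100),
]
pooled = combine_studies(studies)
for row in pooled:
    print([round(value, 3) for value in row])
```

Because each study's SSCP matrix is computed around that study's own item means, this sketch pools within-study variation only; it does not add any variation due to differences in item means between studies.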
An Illustrative Case
One example is the Beck Depression Inventory, 2nd
edition (BDI-II; Beck, Steer, & Brown, 1996).
This scale is widely used by clinicians in the
measurement of depression, thus making it all the more
important to understand its psychometric properties.
Among those who have explored the structure of the
BDI-II, a number of differing solutions have emerged.
The controversy over the structure of the BDI-II concerns
both the number of factors and their
Proposed Factor Solutions
Among the two-factor solutions, the cognitive/somatic split is
However, there are many variations on this general theme.
Some include a somatic-affective factor alongside a cognitive factor
(Arnau, Meagher, Norris, & Bramson, 2001; Vanheule, Desmet, Groenvynck, Rosseel, &
Fontaine, 2008; Viljoen, Iverson, Griffiths, & Woodward, 2003).
Others include a cognitive-affective factor paired with a somatic
factor (Patterson, Morasco, Fuller, Indest, Loftis, & Hauser, 2011; Siegert, Walkey, & Turner-
Stokes, 2009; Storch, Roberti, & Roth, 2004; Whisman, Perez, & Ramel, 2000).
In one study, Wilson VanVoorhis and Blumentritt (2007) found a
cognitive-somatic factor and an affective factor.
Still others follow a simple cognitive/somatic split (Grothe, Dutton,
Jones, Bodenlos, Ancona, & Brantley, 2005; Palmer & Binks, 2008; Quilty, Zhang, & Bagby,
2010; Thombs, Ziegelstein, Beck, & Pilote, 2008), though one labels this
differently as cognitive and non-cognitive (Steer, Ball, Ranieri, & Beck,
Proposed Factor Solutions
There is some convergence around the idea of cognitive,
affective, and somatic factors (Brouwer, Meijer, & Zevalkink,
2012; Johnson, Neal, Brems, & Fisher, 2006; Lindsay & Skene,
2007; Tully, Winefield, Baker, Turnbull, & de Jonge, 2011;
Vanheule, et al., 2008).
Byrne, et al. (2007) suggest negative attitude, performance
difficulty, and somatic elements, though the items on the
factors deviate only somewhat from other solutions interpreted
as cognitive, affective, and somatic.
Chilcot, Norton, Wellsted, Almond, Davenport, and Farrington
(2011) suggest a cognitive, self-criticism, and anhedonia
solution that differs substantially from other solutions.
Lopez, Pierce, Gardner, and Hanson (2012) found a three-factor
solution interpreted as negative rumination, somatic
complaints, and mood that also differs from the other
three-factor solutions in the general structure of the items.
Proposed Factor Solutions
Hierarchical and General Factor Solutions:
In general factor solutions, the researchers add an
additional factor for “depression” onto which all items load.
In hierarchical solutions, the researchers add an additional
factor for “depression” onto which all first-order factors load.
Both Quilty, Zhang, and Bagby (2010) and Thombs, et al.
(2008) found the best fit with a general depression factor
onto which all items load in their two-factor solutions, as
did Chilcot, et al. (2011) with a three-factor solution plus
the general factor.
Byrne, et al. (2007) have suggested a hierarchical solution
on top of the three-factor model, while Grothe, et al. (2005)
have suggested the same for the two-factor model.
It has been suggested by some that the structure of the
BDI-II varies by population.
However, a population-dependent structure is less useful
for diagnostic purposes.
For example, it has been suggested that the instrument
has different structures in depressed and non-depressed
populations. How then would it be useful in determining
which population one belongs to?
Data were collected from published factor analyses of the
BDI-II that included an inter-item correlation matrix.
Additionally, authors of other factor analytic work with the
BDI-II were contacted and asked for copies of the inter-
item correlation matrix with standard deviations.
In total, 10 studies were included in the final data set.
Although these studies include samples with varied
characteristics, they are combined in this case in an attempt
to approach the population as a whole, rather than any
As a result, for this study, the “population” is thought of as
all individuals who may be assessed with the BDI-II, both
depressed and not depressed, of all age groups, and all
Several models seemed to fit the BDI-II data relatively well, with the
exception of the chi-square to degrees of freedom ratio.
Although recommended cutoffs for this ratio are much lower than
values obtained in these models, there are a number of possible
explanations for these high values.
For example, Hammervold and Olsson (2012) found that even slightly
misspecified models were highly likely to be rejected by the
chi-square test, even with extremely large sample sizes.
Given the controversy surrounding the BDI-II and the idea that its
structure may be sample-dependent, it is likely that each model is
slightly misspecified in a sense.
On the other hand, reliable models also tend to produce larger chi-
square values in very large samples, according to simulation studies
by Miles and Shevlin (2007).
In other words, it may be that the large chi-square to degrees of
freedom ratios in this case are misleading.
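A small numerical sketch may help show why the ratio can mislead in very large samples. It rests on assumptions that are not part of the original analysis: the test statistic is treated as following a noncentral chi-square distribution, so its expected value is roughly df + (N − 1)·F0, and the population discrepancy F0 is expressed through an RMSEA value via F0 = df · RMSEA². The function name, the degrees of freedom, the RMSEA of 0.03, and the sample sizes are all hypothetical.

```python
def expected_chisq_df_ratio(n, df, rmsea):
    """Approximate expected chi-square/df ratio for a slightly misspecified model.

    Assumes a noncentral chi-square statistic with noncentrality (N - 1) * F0,
    where F0 = df * rmsea**2 is the population discrepancy.
    """
    f0 = df * rmsea ** 2
    noncentrality = (n - 1) * f0
    return (df + noncentrality) / df


# Same (hypothetical) degree of misfit, increasingly large samples.
for n in (201, 2001, 20001):
    print(n, round(expected_chisq_df_ratio(n, df=100, rmsea=0.03), 2))
```

Under these assumptions the amount of misfit is held constant, yet the expected ratio rises from about 1.2 to about 19 purely because the sample grows.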
If one chooses to view the large chi-square to degrees of
freedom ratios as artifacts of the
large sample and the measurement properties of the
BDI-II, the picture becomes even more troubling.
It would appear that just about any published model
(with the exception of the Wilson VanVoorhis and
Blumentritt [2007] model) fits relatively well.
Obviously some models fit better than others, but there
is not adequate evidence to suggest that there is one
true and superior model at the population level for the
BDI-II from an empirical standpoint, at least in terms of
However, in general, the hierarchical and general factor
models did not seem to offer much in terms of fit.
One notable exception is the Thombs, et al. (2008)
model. This model had very good fit, and one of the
lowest chi-square to degrees of freedom ratios overall.
Moreover, it does appear that the current research
supports the assertions of Byrne, et al. (2007), Ward
(2006), and others, who have argued strongly for the
superiority of three-factor solutions. It is worth noting
that the advantage in fit for three-factor models was
significant, but slight.
Additionally, because the difference in fit is so slight in many cases,
researchers may wish to avoid structures that deviate
heavily from the BDI-II’s original scoring or administration.
For example, Chilcot, et al. (2011) and Patterson, et al. (2011) each
use fewer than half of the original items.
Given that the BDI-II is a commercial instrument unlikely to change
its item content to match factor analytic work (particularly if that
work is not in the majority), it may be more useful to explore
structures that are more practically relevant by using the entire (or
close to the entire) instrument, and focusing on structures that
provide information relevant to clinicians.
Particularly with instruments like the BDI-II that are in constant use
for diagnosis and patient care, it may be meaningful to prioritize
instrument structures that speak to such issues, especially
given the very slight differences in fit for most models.
However, the suitability of different models cannot be
fully resolved through empirical means such as
confirmatory factor analysis alone.
Future researchers may wish to focus on theoretical and
practical advantages that particular models offer.
Additionally, empirical studies of construct validity for
particular models, especially for predictive validity, will
be an important next step in evaluating alternative
While large-scale confirmatory factor analysis offers
insight into the scale’s structure, these practical steps
are necessary to decide on the appropriate structure for
practical, clinical use.
Discussion - Method
The primary purpose of this study was to serve as a
demonstration of a method for assessing instrument
structure across multiple samples.
This method has clear advantages: a side-by-side
comparison of many published factor models can be
carried out on shared data.
This puts the models on equal ground, and gives a large,
varied data set on which to test the fit of those models.
In addition, no estimation is involved in
creating the combined variance/covariance matrix, so
researchers using this method can have more
confidence in analyses based on such calculated
matrices than in analyses based on estimated matrices.
Discussion - Method
A potential limitation is the use of disparate samples in
creating the combined covariance matrix.
The samples have very different characteristics, which may lead to
one of two outcomes.
On one hand, combining them may result in a closer approximation of the
population matrix. On the other hand, it may result in a highly
variable matrix that includes more than one population.
That is, these disparate samples may actually represent
Future research may focus on the suitability of covariance
matrices for combination, and the implications of combining
or not combining matrices on how the population is
Discussion - Method
This method for testing instrument structure across
multiple samples may be useful in any case where there
are multiple factor analytic studies with conflicting
This is particularly important to consider when the
instrument in question has, like the BDI-II, practical
implications for health, education, policy, or other areas
that affect the people who will be measured
using the instrument.
In such cases, understanding the structure of the
instrument takes on an added importance and value,
and having an additional tool for assessing structure in
the population may prove extremely useful.
For a list of references, a copy of the finished paper, or
additional information, please contact: