2. Two Quotations
If it matters, measure it. If it can’t be measured, measure it
anyway.
---- Milton Friedman
When you can measure what you are speaking about, and can
express it in numbers, you know something about it; but when you
cannot measure it, and when you cannot express it in numbers, then
your knowledge is of a meager and unsatisfactory kind.
---- William Thomson
4. Observations
• Any observation has two components:
• a true component (∞), in which you are
interested, and an error (ε).
• As the number of observations increases, the
estimate of ∞ becomes more
accurate (μ∞), whereas the mean of ε
comes closer to zero (με).
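A small simulation can illustrate this slide. The sketch below assumes a fixed true score of 50 and normally distributed error with a standard deviation of 5; these values and the function name are illustrative assumptions, not taken from the slides.

```python
import random
import statistics

def simulate_observations(true_score, error_sd, n, seed=0):
    """Return n observations, each equal to the true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]

for n in (10, 100, 10_000):
    observations = simulate_observations(true_score=50.0, error_sd=5.0, n=n)
    mean_observation = statistics.fmean(observations)
    # Mean error is the mean observation minus the (assumed) true score.
    print(n, round(mean_observation, 2), round(mean_observation - 50.0, 2))
```

With more observations the mean observation should settle near the true score and the mean error near zero.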
5. Error is Random
• Because the error is random, it averages out to 0 over many observations.
In practice, however, there has to be some error
for any test to be performed.
How do you calculate a t or F test?
• When the error is systematic, it is called bias.
– OBC children score higher on intelligence tests
when the test administrator is an OBC than when
he is of high-caste.
– Students score higher on Tuesday than on Friday.
6. Reliability
• Reliability of a measure indicates how far it
is free from random error.
• Responses that are consistent are called
reliable.
• Operationally, reliability of a measure
indicates its ability to correlate with itself.
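As an illustration of a measure "correlating with itself", the sketch below computes a Pearson r between two administrations of the same test. The scores are made-up values and `pearson_r` is a helper written for this example, not something from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of six participants tested twice with the same form.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]
print(round(pearson_r(time1, time2), 2))
```

A value close to 1 would indicate that the measure ranks people consistently across the two occasions.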
7. Base | Error Sources | Reliability coefficients | Procedures | Analyses
TIME | Change in participants over time; change in testing situation | Retest (or stability) | Test participants at different times with the same form | Pearson correlation (r)
FORMS | Differences in content sampling across "parallel" forms | Equivalence | Test participants at one time with two forms covering the same content | Pearson r
ITEMS | Content heterogeneity and low content saturation in the items | Split-half; internal consistency | Test participants with multiple items at one time | (a) r between test halves; (b) Cronbach's alpha
JUDGES | Disagreement among judges | Internal consistency | Obtain ratings from multiple judges on one form and occasion | (a) Inter-judge correlation; (b) Coefficient alpha
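Coefficient (Cronbach's) alpha from the ITEMS and JUDGES rows can be sketched as follows. The formula used is the usual one, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores); the data and the function name are illustrative assumptions.

```python
import statistics

def cronbach_alpha(scores):
    """scores: one row per respondent, each row holding k item (or judge) scores."""
    k = len(scores[0])
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    sum_item_variances = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses of five participants to four 5-point items.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(responses), 2))
```

The same function could be applied to ratings from multiple judges by treating each judge as a column.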
SCALE CONSTRUCTION AND RELIABILITY
9. Six Blind Men and the Elephant
John Godfrey Saxe: (1816-1887)
Wall, Spear, Snake, Tree, Fan, Rope
“… what we observe is not nature in itself, but nature
exposed to our method of questioning.”
Werner Heisenberg
(1901-1976)
Nobel Prize in Physics
Max Planck Medal
10. Representing Abstract by Actual
Abstract; Actual
Deficiency of the Actual
Contamination in the Actual
Relevance of the Actual for the Abstract
11. Relevance, Deficiency, and Contamination
in Hired Teacher
Ideal Teacher
Hired Teacher
Deficiency of the Actual
Contamination in the Actual
Representation of the Abstract by the Actual
9. Look; 10. Age; 11. Religion; 12. Connection; 13. Liking
12. Criterion Relevance as Validity:
Major Approaches
Face validity: Extent to which the items appear to measure the intended
construct.
Content validity: Extent to which the items are a representative sample of
the behavior domain to be measured.
Criterion-Related (or External) Validity
Concurrent: Extent to which the test scores estimate an individual’s
present criterion score
Predictive: Extent to which an individual’s future score on a criterion is
predicted from prior test scores.
Construct validity: Whether the measure accurately reflects the
conceptual or ultimate construct (Criterion relevance by Convergent and
Divergent validities)
13. Criterion-Related Validity
From the JAP article, we learned that time is also a basis for establishing
validity:
Proximal (short-term) vs. Distal (long-term)
If you have a particular criterion in mind, then you can use your new
scale with data available right now (concurrent) or in the future
(predictive).
Concurrent: Extent to which scores estimate an individual’s present
criterion score. For example, performance in this course now, peer
nominations for a leadership position.
Predictive: Extent to which scores predict an individual’s future score on
a criterion. Most selection and aptitude tests are required to predict
future performance on the job.
Statistics: simple or multiple r, depending upon how many predictors you
have.
14. Validation Procedures
Expert judgments and reviews. Test whether experts agree that
items are relevant and represent construct domain; use ratings to
assess item characteristics, such as comprehensibility and clarity.
Differentiation between criterion (or contrast) groups: Test size and
direction of expected differences between groups on the construct
of interest.
Correlation: Test relation between a measure of the construct and
other measures of the same construct.
Multi-trait multi-method: Test whether correlations from different
measures (e.g., instruments, data sources, languages) of the same
and different constructs form a specific pattern.
Factor analysis: Test hypothesized structure of the construct
domain (e.g., whether items thought to define the construct load
on the same factor and not on the other factors)
15. Are Two Constructs Distinct?
• You took responses to 12 items, supposed to
measure two constructs, competence and
warmth, in 200 school teachers.
• The reliabilities of the competence and warmth
measures are .90 and .95, respectively. The
correlation between them is .68.
• Is competence distinct from warmth of
teachers?
16. Correlation Corrected for Unreliability
• Correlation corrected for attenuation =
r_xy / √(r_xx × r_yy) =
.68 / √(.90 × .95)
= .68 / .92 = .74
• As the corrected correlation (.74) is still lower than either
reliability (.90 and .95), the two measures are related but
distinct.
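The correction on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the function name is an assumption, and the final comparison simply encodes the slide's own decision rule (corrected correlation lower than either reliability).

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Observed correlation divided by the square root of the product of the reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

corrected = correct_for_attenuation(r_xy=0.68, r_xx=0.90, r_yy=0.95)
print(round(corrected, 2))  # 0.74

# Decision rule as stated on the slide: compare with the lower of the two reliabilities.
related_but_distinct = corrected < min(0.90, 0.95)
print(related_but_distinct)  # True
```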
17. 95% Confidence Intervals of r
• You can also calculate the 95% confidence intervals of
the correlation of .68. The sample size is 200.
• Go to this site to calculate the 95% confidence
intervals. Give the r and sample size (e.g., .68 and
200):
– http://glass.ed.asu.edu/stats/analysis/rci.html
• The 95% Confidence Intervals on the Population
Correlation range from .60 to .76 .
• As the highest correlation of .76 is still lower than
either reliability, the two measures are distinct.
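If the site is unavailable, an approximate interval can be computed with the Fisher z transformation. The sketch below assumes that method (the slides do not say which method the site uses) and gives roughly .60 to .75 for r = .68 and N = 200, close to the interval quoted above.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                  # transform r to z
    se = 1.0 / math.sqrt(n - 3)        # standard error of z
    lower = math.tanh(z - z_crit * se) # back-transform the limits to r
    upper = math.tanh(z + z_crit * se)
    return lower, upper

lower, upper = fisher_ci(r=0.68, n=200)
print(round(lower, 2), round(upper, 2))
```

Small differences in the second decimal place can arise from the method or rounding used by a particular calculator.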
19. Construct Validation through Multi-
trait Multi-method
• Measure TWO constructs, one of interest and another
that is distinct, using TWO different methods.
– Convergent validity: The same construct measured
by the different methods should correlate highly.
– Divergent validity: The different constructs
measured by the same method should have low or
no correlation.
20. Evidence for Convergent and Divergent
Validities
                        Questionnaire        Behavior
                        ATW      ATM         ATW      ATM
Questionnaire   ATW    (.90)
                ATM     .30     (.90)
Behavior        ATW     .70      .10        (.90)
                ATM     .10      .70         .30     (.90)
ATW = attitude toward women; ATM = attitude toward men; Behavior = observations of behavior.
Note: The coefficients in parentheses are reliability estimates.
Same Construct, Different Method: .70
Different Constructs, Same Method: .30
21. Lack of Convergent or Divergent Validity
                        Questionnaire        Behavior
                        ATW      ATM         ATW      ATM
Questionnaire   ATW    (.90)
                ATM     .80     (.90)
Behavior        ATW     .40      .30        (.90)
                ATM     .30      .40         .80     (.90)
ATW = attitude toward women; ATM = attitude toward men; Behavior = observations of behavior.
Note: The coefficients in parentheses are reliability estimates.
All four correlations are opposite of those in the previous slide.
The scales are reliable but not valid.
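The contrast between the two matrices (slides 20 and 21) can be checked mechanically. The sketch below uses one simple rule, assumed here for illustration: the smallest same-construct, different-method correlation should exceed the largest different-construct, same-method correlation. "Q" and "B" are shorthand labels for the questionnaire and behavior methods.

```python
def mtmm_summary(correlations):
    """Return (lowest convergent r, highest same-method different-construct r).

    correlations maps a pair of measures to r; each measure is (construct, method).
    """
    convergent, discriminant = [], []
    for (a, b), r in correlations.items():
        same_construct = a[0] == b[0]
        same_method = a[1] == b[1]
        if same_construct and not same_method:
            convergent.append(r)      # same construct, different method
        elif same_method and not same_construct:
            discriminant.append(r)    # different constructs, same method
    return min(convergent), max(discriminant)

def matrix(q_pair, b_pair, same_construct, cross):
    """Build the six off-diagonal correlations of a 2-construct, 2-method design."""
    return {
        (("ATW", "Q"), ("ATM", "Q")): q_pair,
        (("ATW", "B"), ("ATM", "B")): b_pair,
        (("ATW", "Q"), ("ATW", "B")): same_construct,
        (("ATM", "Q"), ("ATM", "B")): same_construct,
        (("ATW", "Q"), ("ATM", "B")): cross,
        (("ATM", "Q"), ("ATW", "B")): cross,
    }

slide_20 = matrix(q_pair=0.30, b_pair=0.30, same_construct=0.70, cross=0.10)
slide_21 = matrix(q_pair=0.80, b_pair=0.80, same_construct=0.40, cross=0.30)

for name, data in (("slide 20", slide_20), ("slide 21", slide_21)):
    lowest_convergent, highest_discriminant = mtmm_summary(data)
    print(name, lowest_convergent, highest_discriminant,
          lowest_convergent > highest_discriminant)
```

For the first matrix the rule is satisfied (.70 against .30); for the second it is not (.40 against .80).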
22. Factor Analysis
• Part I: Introduction to factor analysis
• Part II: Principal Components, Principal Axis,
and Maximum Likelihood Analyses
• Part III: Using the Three Extraction Procedures
• Part IV: Confirmatory Factor Analysis as a
Measurement Model
24. Construct
• The ideal construct is represented by the actual one (i.e.,
predictor).
• To show that a predictor does represent the ideal criterion, we
learnt how to distinguish between the two constructs.
• Given two constructs, we can use two measurement methods to
test convergent (i.e., the same construct measured differently
should correlate) and divergent (i.e., the different constructs
measured by even a similar method should not correlate)
validities of the predictor.
25. Applied Research
• In most applied research, we take multiple
measurements on the same respondents:
– Employees judge the management.
– Consumers judge the inflation.
– Students judge their teachers.
– Voters judge the candidate endorsed.
26. Part II: Principal Components, Principal Axis,
and Maximum Likelihood Analyses
27. Goals of Factor Analysis
– To parsimoniously represent scores on a large set of
measured variables.
– Do the measured variables reflect a single underlying
construct or do they represent distinct constructs?
– The general goal is pursued for one of two reasons:
• Construct identification
• Construct measurement
28. Decisions in Factor Analysis
• Three major procedural decisions that must be taken when
conducting a FA
– Factor Extraction
– Determining the Number of Common Factors
– Factor Rotation
29. Factor Extraction
– Specified in the “extraction” subcommand of
PASW Statistics
– Computational procedure used to calculate the
estimates of the factor loadings and
communalities of the measured variables.
– Also called “model fitting” or “parameter
estimation” method
30. Principal Components Analysis (PCA)
• PCA is a data reduction technique that maximizes the amount
of variance accounted for in the observed variables by a
smaller set of PRINCIPAL COMPONENTS (PC).
– PCA identifies patterns in the data and expresses the
results in a way that highlights similarities and
differences between the variables and the components.
– PC represent all of the variance in the X variables through a
small set of components.
31. Principal Axis Factor Analysis (PAF)
• PAF tries to understand the shared variance in a set of X measurements
through a small set of latent variables called factors.
– Latent factors are unobserved conceptual or ideal variables,
such as the qualities of an ideal teacher (i.e., competence and warmth of the
teacher).
– They supposedly cause the scores observed on the
measured or indicator variables.
Latent Variables → Observed Responses
– PAF is good at recovering even weak factors
32. Maximum Likelihood Factor Analysis (ML)
ML also deals with latent variables
but provides an estimate of the
goodness of fit of the hypothesized
model to the data obtained.
One can specify the number of factors
and see whether those factors
satisfactorily represent the data.
33. Extraction Procedures Differ
• The three factor extraction procedures differ in
important ways:
– They assume fundamentally different underlying
models.
– They make different assumptions regarding
distributional properties of the measured
variables.
– They provide different results.
34. Number of “Major” Factors
• The appropriate number is one that:
– does a good job accounting for the correlations among the
measured variables;
– fits worse if one factor is removed;
– does not fit better if one additional factor is
allowed;
– allows readily interpretable and theoretically useful
factors.
35. Eigenvalues ≥ 1
• Most widely used of all rules for deciding the number of factors
• For any matrix of correlations, it is possible to compute a set
of numerical values called eigenvalues.
• They reflect the variance accounted for by principal
components,
– with the first value reflecting the variance explained by the
strongest component,
– the second value the variance explained by the second strongest
component, and so on.
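The rule can be sketched in plain Python: compute the eigenvalues of the correlation matrix and count those of at least 1. The example assumes a made-up 6-variable matrix with two blocks of three items (r = .6 within a block, r = .1 between blocks), whose eigenvalues are 2.5, 1.9, and 0.4 (four times) analytically. The Jacobi-rotation routine is a generic method chosen for this illustration, not the procedure of any particular statistics package.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that zeroes a[p][q]; pi/4 when the two diagonal entries are equal.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: items 0-2 form one block, items 3-5 another.
R = [[1.0 if i == j else (0.6 if i // 3 == j // 3 else 0.1) for j in range(6)]
     for i in range(6)]

eigenvalues = symmetric_eigenvalues(R)
n_factors = sum(1 for value in eigenvalues if value >= 1.0)
print([round(value, 2) for value in eigenvalues], n_factors)
```

Each eigenvalue divided by the number of variables gives the proportion of total variance explained by that component.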
36. Scree Test
– Involves constructing a graph in which eigenvalues
from the matrix are plotted in descending
order.
– The graph is then examined to determine the number
of eigenvalues that precede the last major drop.
37. Example of a Scree Plot
[Scree plot: eigenvalue (vertical axis, 0 to 3.5) plotted against eigenvalue ordinal position (horizontal axis)]
The scree plot of this
sample suggests 3 factors.
38. Two Criticisms
• There is no clear definition of what constitutes a major
drop.
• Sometimes the data may produce a gradually decreasing
slope with no major break points.
Note. The scree test has been found to function reasonably
well in cases where strong PCs are present.
39. Some More Points
• Factor interpretability
– When different procedures suggest different numbers of factors, these
models can be compared on the basis of their interpretability.
• Factor stability
– When multiple data sets are available, the stability of different factor
solutions can be compared across them.
• Factor number should be decided in a holistic fashion
– No procedure or criterion is infallible.
– Decision should be based on the configuration of evidence.
40. Factor Rotation
• Factor extraction procedures arrive at their initial solutions on
the basis of computational ease rather than conceptual
plausibility.
• For any model with 2 or more factors (or components), there
will exist an infinite number of equally good fitting solutions.
• How do we select which of these equally good solutions should
serve as the basis for our interpretation?
41. Assumptions
• Are the factors orthogonal?
– Varimax
• Are the factors correlated?
– Direct oblimin
Rotation makes the loadings or regression weights
look much clearer!
43. Positive and Negative Mood
• Participants rated their immediate mood
along 4 negative and 4 positive 5-point items.
• Do the responses to the 8 items imply simple
negative and positive responses?
• Let us analyze the data, using PCA, PAF, and
ML.
49. EFA vs. CFA
• Exploratory
– Determines the number of factors
– Determines whether factors are correlated or not
– Variables free to load wherever
Theory Generating
• Confirmatory
– Confirms the a priori factors
– Correlations among factors set a priori
– A priori variable loadings
Theory Testing
50. Advantages of CFA
• The measurement model includes error
– Random error
– Error from unreliability
• In SEM, the entire measurement model
can be included
53. Current Fit Indices
1. χ²(df, N), p = nonsignificant
2. Non-normed Fit Index (NNFI)/Tucker-Lewis Index (TLI) ≥ .95 [GFI]
3. Incremental Fit Index (IFI) ≥ .95 [CFI]
4. Root Mean Square Error of Approximation (RMSEA) ≤ .05
5. Standardized Root Mean Square Residual (SRMR) ≤ .05
54. Stages in SEM
• Model Specification
• Model Identification
• Model Estimation
• Model Testing
• Model Modification
55. Model Specification
• There are many ways a model can be
developed
– EFA can suggest a model. The model is specified a
priori.
– There may be a theoretical prediction.
56. Model Identification
• Comparing the number of observed values (1)
and free parameters (2)
• (1) < (2): Under-identified (can’t run the
analysis)
• (1) = (2): Just identified (not useful)
• (1) > (2): Over-identified (can run the analysis)
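The counting rule can be sketched for a CFA with 12 items, as in the exercise later in the deck. The parameter count below assumes that each item loads on exactly one factor, errors are uncorrelated, and factor variances are fixed to 1; these are assumptions of this illustration, and the function names are made up. The comparison is a counting check only.

```python
def observed_values(n_items):
    """Unique variances and covariances among the observed variables."""
    return n_items * (n_items + 1) // 2

def cfa_free_parameters(n_items, n_factors):
    """Loadings + error variances + factor covariances (factor variances fixed to 1)."""
    loadings = n_items
    error_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + error_variances + factor_covariances

def identification(n_items, n_factors):
    """Return (df, status) by comparing observed values with free parameters."""
    df = observed_values(n_items) - cfa_free_parameters(n_items, n_factors)
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just identified"
    else:
        status = "over-identified"
    return df, status

for n_factors in (3, 2, 1):
    print(n_factors, identification(12, n_factors))
```

Under these assumptions the degrees of freedom come out as 51, 53, and 54, which agree with the df reported for the three-, two-, and one-factor models in the exercise.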
57. Model Estimation
1. Draw the model diagram.
2. Name the factors.
3. Bring in the items from the file.
4. Unplug the observed variables.
5. Click on SRMR (standardized root mean square residual).
6. Estimate the parameters and the
model fit.
58. Model Testing
• Is the fit of the model to the data good?
1. χ²(df), p = nonsignificant
2. NNFI/TLI ≥ .95
3. IFI ≥ .95
4. RMSEA ≤ .05
5. SRMR ≤ .05
59. Exercise
Hypothesis: People are drawn to others when
they (a) believe that others are respectable
and (b) assume that others would like
them.
Method: You asked 384 people to think of a
person they either liked or disliked (ns =
194) and then rate their respect, inferred
attraction, and attraction.
60. Three Responses Measured
Respect Construct
I think my partner would make a good leader.
My partner will probably achieve all of his/her goals.
My interaction partner is probably well respected.
My partner will probably be successful in life.
Inferred Attraction Construct
I think my partner will like me.
I think my partner will care for me.
My partner could help me accomplish my goals.
I think my partner will enjoy working together with me.
Attraction Construct
I would like to meet my partner.
I would like to be with my partner.
I look forward to working with my partner.
I would like to get to know this person better.
61. Do 12 Responses Have a 3-Factor
Structure?
• Run a 3-factor CFA to test the hypothesis.
• Run a 2-factor CFA to show that the 3-factor CFA
provides a better fit.
• Run a single-factor CFA to rule out the possibility that the
12 responses are mere evaluations of the
partner.
62. Your Models
CFA           Three      Two       One
----------------------------------------------
χ2            134.25    237.13    395.60
df             51        53        54
p             .001      .001      .001
NNFI/TLI      .96       .91       .84
IFI           .97       .93       .87
RMSEA         .07       .10       .13
SRMR          .03       .05       .06
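The table can be screened against the cutoffs listed under "Current Fit Indices", and the RMSEA values can be recomputed from χ² and df. The sketch below assumes one common RMSEA formula, sqrt(max(χ² − df, 0) / (df × (N − 1))), and treats the 384 respondents as a single group; both are assumptions of this illustration.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation from chi-square, df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

N = 384  # sample size stated in the exercise

models = {
    "three-factor": {"chi2": 134.25, "df": 51, "tli": 0.96, "ifi": 0.97, "srmr": 0.03},
    "two-factor": {"chi2": 237.13, "df": 53, "tli": 0.91, "ifi": 0.93, "srmr": 0.05},
    "one-factor": {"chi2": 395.60, "df": 54, "tli": 0.84, "ifi": 0.87, "srmr": 0.06},
}

for name, fit in models.items():
    value = rmsea(fit["chi2"], fit["df"], N)
    checks = {
        "NNFI/TLI >= .95": fit["tli"] >= 0.95,
        "IFI >= .95": fit["ifi"] >= 0.95,
        "RMSEA <= .05": value <= 0.05,
        "SRMR <= .05": fit["srmr"] <= 0.05,
    }
    print(name, round(value, 2), checks)
```

Under these assumptions the recomputed RMSEA values round to .07, .10, and .13, matching the table; only the three-factor model meets the NNFI/TLI and IFI cutoffs, and none of the models meets the RMSEA cutoff.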
63. Comments
When the factors are known or theoretically specified,
CFA is a better analytic procedure than EFA.
Given that there are so many criteria for judging the
model fit, it is clear that this procedure is still under
development.
Of ML and CFA, the former is preferable unless the
reviewers insist on CFA.