1.
Effect of Number of Categories and Category
Boundaries on Recovery of Latent Linear
Correlations from Optimally Weighted
Categorical Data
Johnny Lin
Advisor: Peter Bentler
November 19, 2008
2.
Outline
Introduction
LINEALS
Forming a Hypothesis
Method
Description
Simulation
Analysis
Results
Main Effects
Interactions
4.
Introducing LINEALS
A Method of Optimal Scaling
Algorithm
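An iterative process that minimizes $\sum_{j=1}^{m} \sum_{l=1}^{m} (\eta_{jl}^2 - r_{jl}^2)$,
where $\eta_{jl}^2$ is the correlation ratio and $r_{jl}^2$ the squared correlation,
so each term is a measure of nonlinearity.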
Developed by Jan de Leeuw and implemented by Patrick Mair.
Assumption
That bi-linearization is possible. No assumption of normality.
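To make the loss concrete, here is a minimal Python sketch (not the LINEALS implementation itself) that evaluates one term, $\eta^2 - r^2$, for a two-way frequency table with fixed category scores. The function name `eta2_and_r2` and the example tables are assumptions made for illustration; LINEALS additionally searches over the category scores, which this sketch does not do.

```python
import numpy as np

def eta2_and_r2(table, x_scores, y_scores):
    """Return (eta^2 of Y on X, r^2) for a frequency table with given scores.

    Rows of `table` index the categories of X, columns the categories of Y.
    Every row is assumed to have a nonzero total.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # joint proportions
    px, py = p.sum(axis=1), p.sum(axis=0)
    x = np.asarray(x_scores, dtype=float)
    y = np.asarray(y_scores, dtype=float)

    mx, my = px @ x, py @ y              # marginal means
    vx = px @ (x - mx) ** 2              # marginal variances
    vy = py @ (y - my) ** 2

    cond_mean_y = (p @ y) / px           # E[Y | X = category]
    eta2 = (px @ (cond_mean_y - my) ** 2) / vy

    cov = (x - mx) @ p @ (y - my)
    r2 = cov ** 2 / (vx * vy)
    return eta2, r2

# Linear regression of Y on X: eta^2 equals r^2, so the term is zero.
linear = eta2_and_r2(np.eye(3), [1, 2, 3], [1, 2, 3])

# U-shaped regression: Y is determined by X (eta^2 = 1) but r^2 = 0.
u_shaped = eta2_and_r2([[0, 1], [1, 0], [0, 1]], [1, 2, 3], [0, 1])
```

Because $\eta^2 \ge r^2$ always holds, each term is non-negative and reaches zero only when the regression is linear, which is why driving the sum toward zero linearizes the regressions.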
5.
Plot of LINEALS Transformation
Criterion: Linearize both X on Y and Y on X simultaneously.
Figure: Red: X on Y , Blue: Y on X
7.
Questions to ask
First, define good recovery as a small deviation from the true score.
1. Does LINEALS recover true population correlations better
than Pearson for categorical data?
2. Is the performance of LINEALS robust?
3. What factors influence good recovery?
9.
Conditions tested
Correlation Type, True Population Correlation, Number of
Categories, and Homogeneity
Condition: Parameters
1. Correlation Type (r): {0=LINEALS, 1=Pearson}
2. True Population Correlation (P): {0.3,0.5,0.7,0.9}
3. Number of Categories (V): {2,3,5,7,10}
4. Homogeneity (h): {0=Non-Homogeneous, 1=Homogeneous}
Total of 80 combinations (2x4x5x2).
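The count of 80 can be checked by enumerating the full factorial design. This is a small Python sketch; the variable names are illustrative and not taken from the original R code.

```python
from itertools import product

correlation_type = [0, 1]                 # r: 0=LINEALS, 1=Pearson
population_corr = [0.3, 0.5, 0.7, 0.9]    # P
n_categories = [2, 3, 5, 7, 10]           # V
homogeneity = [0, 1]                      # h: 0=Non-Homogeneous, 1=Homogeneous

# Every (r, P, V, h) combination of the four conditions.
conditions = list(product(correlation_type, population_corr,
                          n_categories, homogeneity))
print(len(conditions))  # 80
```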
11.
Creating functions in R
For each combination (total of 80):
1. Generate 1000 sets of bivariate normal data.
2. Make “cuts” (homogeneous vs. non-homogeneous).
3. Run through LINEALS / Pearson.
4. Calculate the deviation of the result from the true population correlation.
5. Repeat Steps 1 - 4 twenty-five times.
Result: Total of 2000 deviations (80x25).
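The steps above can be sketched for the Pearson arm as follows. The original functions were written in R and are not shown in the slides, so this Python version is only an illustration under stated assumptions: the cut-point scheme (equal-width intervals on [-2, 2] for the homogeneous case, random thresholds otherwise), the use of the same cuts for both variables, and the names `cut_points` and `pearson_deviation` are all hypothetical. The LINEALS arm is omitted because it requires the optimal scaling routine.

```python
import numpy as np

def cut_points(n_categories, homogeneous, rng):
    """Interior thresholds for categorizing a standard normal variable."""
    if homogeneous:
        # Assumption: equal interval widths over [-2, 2].
        return np.linspace(-2.0, 2.0, n_categories + 1)[1:-1]
    # Assumption: unequal widths from randomly placed thresholds.
    return np.sort(rng.uniform(-1.5, 1.5, n_categories - 1))

def pearson_deviation(rho, n_categories, homogeneous, n=1000, rng=None):
    """Steps 1-4: simulate, cut, correlate, and return |rho| - |rho_hat|."""
    rng = np.random.default_rng() if rng is None else rng
    cov = [[1.0, rho], [rho, 1.0]]
    data = rng.multivariate_normal([0.0, 0.0], cov, size=n)   # Step 1
    cuts = cut_points(n_categories, homogeneous, rng)         # Step 2
    x = np.digitize(data[:, 0], cuts)
    y = np.digitize(data[:, 1], cuts)
    rho_hat = np.corrcoef(x, y)[0, 1]                         # Step 3
    return abs(rho) - abs(rho_hat)                            # Step 4

# Step 5: twenty-five replications for one condition.
rng = np.random.default_rng(2008)
deviations = [pearson_deviation(0.7, 2, True, rng=rng) for _ in range(25)]
```

With two categories the Pearson correlation of the categorized data is strongly attenuated, so the deviations in this condition are clearly positive.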
13.
Hierarchical Regression
Description
DV: deviation of sample correlation from true population
correlation, $|\rho_{12}| - |\hat{\rho}_{12}|$
IVs: main effects and interactions of four conditions (total of
15)
Four main eﬀects (h,r,P,V)
Six 2-way interactions (hr, hP, hV, . . . )
Four 3-way interactions (hrP, hrV, . . . )
One 4-way interaction (hrPV)
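The 15 terms can be generated mechanically as products of the four condition codes. The following Python sketch builds such a set of regressor columns; the helper name `interaction_design` is hypothetical and the analysis in the slides was run in SPSS, not with this code.

```python
from itertools import combinations, product
import numpy as np

def interaction_design(factors):
    """Build main-effect and interaction columns as products of factors.

    `factors` maps a one-letter name to a 1-D array of codes.
    Returns (labels, matrix) with one column per term, ordered by
    interaction order (main effects first, then 2-way, and so on).
    """
    names = list(factors)
    labels, columns = [], []
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            labels.append("".join(combo))
            columns.append(np.prod([factors[k] for k in combo], axis=0))
    return labels, np.column_stack(columns)

# Full factorial grid of the four conditions (80 rows).
grid = np.array(list(product([0, 1], [0, 1], [0.3, 0.5, 0.7, 0.9],
                             [2, 3, 5, 7, 10])), dtype=float)
factors = {"h": grid[:, 0], "r": grid[:, 1], "P": grid[:, 2], "V": grid[:, 3]}
labels, X = interaction_design(factors)
```

In a hierarchical regression the columns would then be entered in blocks (main effects, then 2-way terms, and so on) and each block tested against the smaller model.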
14.
Hierarchical Regression
Model Selection
Tested full model against nested models.
Conﬁrmed with Best Subset Regression.
Optimal adjusted $R^2$ and Mallows' $C_p$ found with 7-8 parameters.
Figure: (a) Adjusted $R^2$, (b) Mallows' $C_p$
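For reference, the two selection criteria can be computed as in the sketch below, using their standard textbook definitions. The function names and the numbers in the example are illustrative and are not values from this study.

```python
def adjusted_r2(r2, n, n_predictors):
    """Adjusted R^2 for a model with an intercept and n_predictors terms."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def mallows_cp(sse_subset, sigma2_full, n, n_params):
    """Mallows' Cp; n_params counts the intercept of the subset model.

    sigma2_full is the residual variance estimate from the full model.
    """
    return sse_subset / sigma2_full - n + 2 * n_params

# For the full model itself, Cp equals its number of parameters.
n, k_full, sse_full = 100, 16, 10.0
cp_full = mallows_cp(sse_full, sse_full / (n - k_full), n, k_full)
```

A subset model is considered adequate when its Cp is close to its own parameter count, which is the pattern the slide reports around 7-8 parameters.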
15.
Final Model
SPSS Output: Coefficients (a)

Model 1       B (Unstd.)   Std. Error   Beta (Std.)   t          Sig.
(Constant)    .189         .006                       31.240     .000
h             -.113        .012         -.620         -9.299     .000
r             .007         .002         .041          3.054      .002
V             -.024        .001         -.773         -40.558    .000
P             .098         .008         .241          12.655     .000
hV            .013         .002         .487          7.164      .000
hP            .117         .018         .435          6.392      .000
hPV           -.017        .003         -.422         -6.326     .000

a. Dependent Variable: difference
With r coded 0=LINEALS and 1=Pearson, the Pearson deviation is .007 larger
than the LINEALS deviation, controlling for other factors.
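As a worked example, the fitted equation can be evaluated for any condition by plugging in the unstandardized coefficients from the SPSS output. This Python sketch assumes the variable coding listed under "Conditions tested"; the function name `predicted_deviation` is hypothetical.

```python
# Unstandardized coefficients (B) of the final model.
COEF = {"const": 0.189, "h": -0.113, "r": 0.007, "V": -0.024,
        "P": 0.098, "hV": 0.013, "hP": 0.117, "hPV": -0.017}

def predicted_deviation(h, r, V, P):
    """Predicted deviation for given codes of h, r, V and P."""
    return (COEF["const"]
            + COEF["h"] * h + COEF["r"] * r
            + COEF["V"] * V + COEF["P"] * P
            + COEF["hV"] * h * V + COEF["hP"] * h * P
            + COEF["hPV"] * h * P * V)

# Non-homogeneous cuts, 5 categories, population correlation 0.5.
lineals = predicted_deviation(h=0, r=0, V=5, P=0.5)
pearson = predicted_deviation(h=0, r=1, V=5, P=0.5)
gap = pearson - lineals
```

Here `lineals` works out to .189 - .024(5) + .098(.5) = .118, and the gap between the two methods is the r coefficient, .007, for every condition because r enters no interaction in the final model.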
17.
Plot of Main Effects I
Figure: Main Effect of Number of Categories V
Figure: Main Effect of Population Correlation P
18.
Plot of Main Effects II
Figure: Main Effect of Homogeneity h
Figure: Main Effect of Correlation Type r
20.
Plot of Significant Interactions
Note: The significant 3-way interaction hPV is not plotted.
Figure: Population Correlation by Levels of Homogeneity hP
Figure: Number of Categories by Levels of Homogeneity hV
21.
Interaction of Correlation Type and Number of Categories
When rV is added to the regression model, the main effect of
Correlation Type r goes away.
This suggests that the number of categories may contribute to the LINEALS vs.
Pearson difference.
Figure: Number of Categories by Correlation Type (rV, marginally sig.)
22.
Summary
1. LINEALS performs slightly better than Pearson on
categorized bivariate normal data.
2. The non-significant interactions with Correlation Type suggest
that LINEALS is robust.
3. Recovery of true population correlations is highly influenced by
homogeneity (i.e., the underlying equality of interval widths).
Future Studies
How does it compare against polychoric correlations?
Is the resulting matrix positive definite?