Categorical Factor Analysis Using R




Kosugi,E.Koji (Yamadai.R)   Categorical Factor Analysis by using R   2012/10/05   1/9
Why we use...

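     I want to run a factor analysis, but I was told a 3-point scale is not acceptable
     I collected data on a 5-point scale, but the responses were skewed
     I ran a factor analysis, but items kept dropping out one after another...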

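FA vs categorical FA

Factor analysis is a multivariate technique that extracts the factors common to a large number of questionnaire items.
The computation proceeds as follows.
 1   Build a correlation matrix from the data
 2   Eigendecompose the correlation matrix
 3   Decide the number of factors from the eigenvalues, and obtain the factor loadings from the eigenvectors
The correlation matrix here consists of Pearson product-moment correlation coefficients, and computing them requires data measured at the interval scale level or higher.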
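The three steps above can be sketched in base R. This is only an illustration under assumptions: the data are simulated, the eigenvalue-greater-than-one rule stands in for the factor-number decision, and the loadings are the principal-component style solution (eigenvectors scaled by the square roots of the eigenvalues), not the GLS estimates that `fa()` produces later in this deck.

```r
# Illustrative data (an assumption): four items driven by one common factor.
set.seed(1)
n <- 500
f <- rnorm(n)
x <- sapply(c(0.8, 0.7, 0.6, 0.5), function(l)
  l * f + sqrt(1 - l^2) * rnorm(n))
colnames(x) <- paste0("item", 1:4)

R <- cor(x)                      # step 1: correlation matrix
e <- eigen(R)                    # step 2: eigenvalue decomposition
n.factors <- sum(e$values > 1)   # step 3a: one simple rule for the number of factors

# step 3b: loadings = eigenvectors scaled by sqrt(eigenvalues)
loadings <- e$vectors[, 1:n.factors, drop = FALSE] %*%
  diag(sqrt(e$values[1:n.factors]), n.factors)
rownames(loadings) <- colnames(x)

print(round(e$values, 3))
print(round(loadings, 3))
```

The sign of each loading column is arbitrary, because an eigenvector multiplied by -1 is still an eigenvector.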

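One of the reasons

     A 3-point scale cannot be regarded as interval-level (statistically, 7 points or more are needed)
     Skewed data = the categories at the upper or lower end are not being discriminated
     The correlations underlying the analysis are small = the data are skewed, so the variance is small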
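The link between skewed responses and small correlations can be checked with a small simulation. The sketch below assumes a latent correlation of 0.6 and two arbitrary sets of cut points; `categorize` is a hypothetical helper name, not something from the slides.

```r
# Simulation sketch (assumed values): latent correlation 0.6, n = 5000.
set.seed(123)
n   <- 5000
rho <- 0.6
xi  <- rnorm(n)
eta <- rho * xi + sqrt(1 - rho^2) * rnorm(n)   # cor(xi, eta) is close to rho

# Cut latent scores into ordered categories 1, 2, 3 at the given thresholds.
categorize <- function(z, thresholds) findInterval(z, thresholds) + 1

# 3-point scale with symmetric cut points
r.sym  <- cor(categorize(xi, c(-0.5, 0.5)), categorize(eta, c(-0.5, 0.5)))
# 3-point scale where most respondents pile up in the top category
r.skew <- cor(categorize(xi, c(-2, -1)), categorize(eta, c(-2, -1)))

print(round(c(latent = cor(xi, eta), symmetric = r.sym, skewed = r.skew), 3))
```

In this setup, the Pearson correlation of the categorized scores falls below the latent correlation, and falls further when the cut points are skewed.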

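The problem goes away if the correlation coefficient is computed in a way that suits ordinal or nominal scale levels. For example, according to Kano and Miura (狩野・三浦, 2002), there are three options for analyzing ordinal data:
 1   Treat the data as continuous
 2   Use the polychoric correlation coefficient or the polyserial correlation coefficient
 3   Use a method based on the multinomial distribution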

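Correlation coefficients for ordinal scales

   Polychoric correlation (ポリコリック相関係数) is rendered in Japanese as 多分相関係数. It is the correlation between two ordinal variables.
   Polyserial correlation (ポリシリアル相関係数) is rendered as 多分系列相関係数 or 重双相関係数. It is the correlation between an ordinal variable and a continuous variable.
   Tetrachoric correlation (テトラコリック相関係数) is rendered as 四分相関係数. "Tetra" refers to a 2×2 table, that is, the correlation between two binary variables. It is a special case of the polychoric correlation.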

images of latent continuity

                Figure: latent continuity and its categorical expression

Assume that behind the observed variable x there is a latent variable ξ that is normally distributed. The relation between x and ξ can be written as

\[
x = \begin{cases}
1 & \xi < a_1 \\
2 & a_1 \le \xi < a_2 \\
3 & a_2 \le \xi < a_3 \\
\vdots & \vdots \\
s & a_{s-1} \le \xi
\end{cases}
\tag{1}
\]
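The threshold model in equation (1) can be tried out directly. The sketch below assumes s = 5 categories and arbitrary thresholds; it cuts a simulated standard normal ξ into categories and compares the observed category proportions with the probabilities the thresholds imply.

```r
# Sketch of equation (1): x = k when a_{k-1} <= xi < a_k.
# The thresholds below are arbitrary illustrative values.
set.seed(42)
a  <- c(-1.5, -0.5, 0.5, 1.5)    # s - 1 = 4 thresholds, so s = 5 categories
xi <- rnorm(10000)               # latent variable, standard normal
x  <- findInterval(xi, a) + 1    # observed category, 1..5

observed <- as.numeric(prop.table(table(factor(x, levels = 1:5))))
expected <- diff(pnorm(c(-Inf, a, Inf)))   # probabilities implied by the thresholds
print(round(rbind(observed, expected), 3))
```

`findInterval()` uses left-closed intervals, which matches the "a_{k-1} ≤ ξ < a_k" convention in equation (1).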
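
Correlation coefficients for ordinal scales

   Think of the two variables as correlated at the level of unobserved latent variables, with that correlation being expressed categorically.
   What we then estimate is the latent-level correlation ρ and the thresholds seen in the categories of X and Y.
   The thresholds can also be approximated from the marginal frequencies of the cross-tabulation (2-step ML).
   ↓
   The idea is that distortions such as ceiling and floor effects are adjusted appropriately by the thresholds.
   Hence categorical correlation coefficients are generally larger than Pearson's correlation coefficient (which forcibly assumes equal intervals).
   Larger correlations make the factors easier to pull out.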
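A rough sketch of the two-step idea in base R. Step 1 takes thresholds from marginal frequencies (first for the V1 counts printed by `table(sample$V1)` in the R session, then for simulated data). Step 2 maximizes the likelihood over ρ alone with the thresholds held fixed. The simulated data, the latent correlation of 0.5, and the helper names `pbiv` and `negloglik` are assumptions for illustration; this is not claimed to be the algorithm inside `polychoric()` or `hetcor()`.

```r
# Step 1 on the V1 counts shown in the session: thresholds from the margins.
counts <- c(2, 26, 61, 178, 88)
thresholds.V1 <- qnorm(cumsum(counts)[-length(counts)] / sum(counts))
print(round(thresholds.V1, 3))

# Bivariate standard normal CDF P(X < a, Y < b), by one-dimensional integration.
pbiv <- function(a, b, rho) {
  if (a == -Inf || b == -Inf) return(0)
  integrate(function(x) dnorm(x) * pnorm((b - rho * x) / sqrt(1 - rho^2)),
            lower = -Inf, upper = a)$value
}

# Negative log-likelihood of rho for a contingency table, thresholds fixed.
negloglik <- function(rho, tab, thr.x, thr.y) {
  a <- c(-Inf, thr.x, Inf)
  b <- c(-Inf, thr.y, Inf)
  cdf <- outer(seq_along(a), seq_along(b),
               Vectorize(function(i, j) pbiv(a[i], b[j], rho)))
  # cell probabilities as rectangle probabilities of the bivariate normal
  p <- cdf[-1, -1] - cdf[-nrow(cdf), -1] - cdf[-1, -ncol(cdf)] +
    cdf[-nrow(cdf), -ncol(cdf)]
  -sum(tab * log(pmax(p, 1e-12)))
}

# Simulated ordinal data (assumed latent correlation 0.5).
set.seed(7)
n   <- 5000
rho <- 0.5
xi  <- rnorm(n)
eta <- rho * xi + sqrt(1 - rho^2) * rnorm(n)
x <- findInterval(xi,  c(-1, 0, 1)) + 1
y <- findInterval(eta, c(-0.5, 0.5, 1.5)) + 1   # skewed cut points
tab <- table(x, y)

# Step 1: thresholds from the marginal frequencies of the cross-tabulation.
thr.x <- qnorm(cumsum(rowSums(tab))[-nrow(tab)] / n)
thr.y <- qnorm(cumsum(colSums(tab))[-ncol(tab)] / n)
# Step 2: maximize the likelihood over rho only.
fit <- optimize(negloglik, interval = c(-0.99, 0.99),
                tab = tab, thr.x = thr.x, thr.y = thr.y)
print(round(c(pearson = cor(x, y), two.step = fit$minimum), 3))
```

Fixing the thresholds first leaves a one-parameter search over ρ, which is why a plain `optimize()` call is enough here.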


Follow me with R code...

The code follows.


>   library(psych)
>   library(polycor)
>   # sample statistics
>   sample <- read.csv("cEFAsample.csv",head=F,na.strings="*")
>   head(sample)
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  1  1  1  4  1  1  1
2  3  4  4  1  4  4  1  1
3  3  4  4  3  4  3  3  4
4  2  4  5  2  2  4  1  4
5  2  2  2  3  4  2  2  3
6  3  3  5  3  3  2  2  3
> summary(sample)
       V1                     V2              V3               V4
 Min.   :1.000          Min.   :1.000   Min.   :1.000    Min.    :1.000
 1st Qu.:3.500          1st Qu.:4.000   1st Qu.:4.000    1st Qu.:3.000
 Median :4.000          Median :4.000   Median :4.000    Median :4.000
 Mean   :3.913          Mean   :4.127   Mean   :3.901    Mean    :3.853
 3rd Qu.:4.000          3rd Qu.:5.000   3rd Qu.:4.000    3rd Qu.:4.000
 Max.   :5.000          Max.   :5.000   Max.   :5.000    Max.    :5.000
                                        NA's   :2        NA's    :1
      V5                      V6              V7              V8
Min.   :1.000           Min.   :1.000   Min.   :1.00    Min.   :1.000
1st Qu.:4.000           1st Qu.:3.000   1st Qu.:2.00    1st Qu.:3.000
Median :4.000           Median :3.000   Median :3.00    Median :4.000
Mean   :3.955           Mean   :3.138   Mean   :2.78    Mean   :3.442
3rd Qu.:5.000           3rd Qu.:4.000   3rd Qu.:4.00    3rd Qu.:4.000
Max.   :5.000           Max.   :5.000   Max.   :5.00    Max.   :5.000
                                        NA's   :1
> table(sample$V1)
  1   2   3   4   5
  2  26  61 178  88
> describe(sample)
     var     n   mean     sd median trimmed mad min max range skew kurtosis   se
V1     1   355   3.91   0.87      4    4.00 0.00  1   5     4 -0.70    0.22 0.05
V2     2   355   4.13   0.78      4    4.22 0.00  1   5     4 -1.11    2.08 0.04
V3     3   353   3.90   0.78      4    3.95 0.00  1   5     4 -0.76    0.96 0.04
V4     4   354   3.85   0.90      4    3.94 0.00  1   5     4 -0.82    0.66 0.05
V5     5   355   3.95   0.87      4    4.04 1.48  1   5     4 -0.71    0.24 0.05
V6     6   355   3.14   0.95      3    3.16 1.48  1   5     4 -0.22   -0.12 0.05
V7     7   354   2.78   1.01      3    2.79 1.48  1   5     4 0.14    -0.70 0.05
V8     8   355   3.44   1.00      4    3.47 1.48  1   5     4 -0.42   -0.28 0.05

> # Pearson correlation
> peason.cor <- cor(sample,use="complete.obs")
> print(peason.cor,digit=2)

       V1      V2     V3     V4     V5     V6      V7     V8
V1   1.00   0.380   0.43   0.40   0.26   0.19   0.285   0.26
V2   0.38   1.000   0.28   0.34   0.27   0.16   0.099   0.21
V3   0.43   0.277   1.00   0.26   0.21   0.15   0.150   0.16
V4   0.40   0.339   0.26   1.00   0.42   0.26   0.276   0.23
V5   0.26   0.265   0.21   0.42   1.00   0.23   0.255   0.22
V6   0.19   0.157   0.15   0.26   0.23   1.00   0.341   0.39
V7   0.29   0.099   0.15   0.28   0.26   0.34   1.000   0.41
V8   0.26   0.212   0.16   0.23   0.22   0.39   0.415   1.00

> # polychoric cor
> polychoric.cor <- polychoric(sample)

> print(polychoric.cor$rho)

            V1          V2           V3            V4            V5          V6          V7
V1   1.0000000   0.4693292    0.4993862     0.4702445     0.3260640   0.2015360   0.3172379
V2   0.4693292   1.0000000    0.3661174     0.4283065     0.3544777   0.1925806   0.1164603
V3   0.4993862   0.3661174    1.0000000     0.3131351     0.2971062   0.1704954   0.1565841
V4   0.4702445   0.4283065    0.3131351     1.0000000     0.5128292   0.2805638   0.3020316
V5   0.3260640   0.3544777    0.2971062     0.5128292     1.0000000   0.2612329   0.2785856
V6   0.2015360   0.1925806    0.1704954     0.2805638     0.2612329   1.0000000   0.3832876
V7   0.3172379   0.1164603    0.1565841     0.3020316     0.2785856   0.3832876   1.0000000
V8   0.2939444   0.2544516    0.1885443     0.2562720     0.2513339   0.4138156   0.4444297
            V8
V1   0.2939444
V2   0.2544516
V3   0.1885443
V4   0.2562720
V5   0.2513339
V6   0.4138156
V7   0.4444297
V8   1.0000000

>   #
>   # compare: Pearson vs. polychoric
>   #
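The comparison announced in the comment can be made concrete with the numbers already printed. The sketch below copies the V1 rows of the two matrices above and checks whether every polychoric value exceeds its Pearson counterpart; the vector names are made up for this example.

```r
# Values copied from the V1 rows of the Pearson and polychoric matrices above.
pearson.V1    <- c(V2 = 0.380, V3 = 0.43, V4 = 0.40, V5 = 0.26,
                   V6 = 0.19, V7 = 0.285, V8 = 0.26)
polychoric.V1 <- c(V2 = 0.4693292, V3 = 0.4993862, V4 = 0.4702445, V5 = 0.3260640,
                   V6 = 0.2015360, V7 = 0.3172379, V8 = 0.2939444)

print(round(polychoric.V1 - pearson.V1, 3))   # gain from using polychoric
print(all(polychoric.V1 > pearson.V1))        # TRUE for this row
```

Within the session itself, the same comparison over the full matrices would presumably be `round(polychoric.cor$rho - peason.cor, 3)`.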
>   # FA
>   fa.parallel(peason.cor,n.obs=355)

Parallel analysis suggests that the number of factors =  3  and the number of components =  2

> fa.parallel(polychoric.cor$rho,n.obs=355)
Parallel analysis suggests that the number of factors =   3   and the number of components =
> fa.result.peason <- fa(peason.cor,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> fa.result.polych <- fa(polychoric.cor$rho,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.peason,digit=3,sort=T)
Factor Analysis using method = gls
Call: fa(r = peason.cor, nfactors = 3, n.obs = 355, rotate = "promax",
    fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS2   GLS1   GLS3    h2    u2
V8    8 0.695 0.073 -0.076 0.468 0.532
V7    7 0.583 0.032 0.028 0.377 0.623
V6    6 0.529 -0.062 0.121 0.333 0.667
V1    1 0.056 0.886 -0.091 0.721 0.279
V3    3 -0.006 0.453 0.083 0.261 0.739
V4    4 -0.015 0.017 0.696 0.490 0.510
V5    5 0.023 -0.145 0.692 0.377 0.623
V2    2 -0.063 0.289 0.313 0.273 0.727

                         GLS2    GLS1    GLS3
SS loadings             1.140   1.094   1.065
Proportion Var          0.143   0.137   0.133
Cumulative Var          0.143   0.279   0.412
Proportion Explained    0.346   0.332   0.323
Cumulative Proportion   0.346   0.677   1.000

 With factor correlations of
      GLS2 GLS1 GLS3
GLS2 1.000 0.409 0.566
GLS1 0.409 1.000 0.688
GLS3 0.566 0.688 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was        1.469 wit
The degrees of freedom for the model are 7 and the objective function was 0.03

The root mean square of the residuals (RMSR) is 0.014
The df corrected root mean square of the residuals is 0.04
The number of observations was 355 with Chi Square = 10.559      with prob <   0.159

Tucker Lewis Index of factoring reliability = 0.9706
RMSEA index = 0.0388 and the 90 % confidence intervals are     NA 0.0814
BIC = -30.546

Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
                                                 GLS2 GLS1 GLS3
Correlation of scores with factors              0.830 0.885 0.851
Multiple R square of scores with factors        0.689 0.783 0.723
Minimum correlation of possible factor scores   0.378 0.565 0.447

> print(fa.result.polych,digit=3,sort=T)

Factor Analysis using method = gls
Call: fa(r = polychoric.cor$rho, nfactors = 3, n.obs = 355, rotate = "promax",
    fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS3   GLS1   GLS2    h2    u2
V5    5 0.806 -0.179 0.029 0.497 0.503
V4    4 0.709 0.024 0.020 0.543 0.457
V2    2 0.383 0.312 -0.069 0.376 0.624
V1    1 -0.138 0.976 0.069 0.826 0.174
V3    3 0.145 0.470 -0.028 0.326 0.674
V7    7 -0.019 0.052 0.657 0.447 0.553
V8    8 -0.038 0.097 0.650 0.452 0.548
V6    6 0.143 -0.083 0.555 0.365 0.635

                         GLS3    GLS1    GLS2
SS loadings             1.319   1.289   1.226
Proportion Var          0.165   0.161   0.153
Cumulative Var          0.165   0.326   0.479
Proportion Explained    0.344   0.336   0.320
Cumulative Proportion   0.344   0.680   1.000

 With factor correlations of
      GLS3 GLS1 GLS2
GLS3 1.000 0.716 0.522
GLS1 0.716 1.000 0.392
GLS2 0.522 0.392 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was    1.986 wit
The degrees of freedom for the model are 7 and the objective function was 0.055

The root mean square of the residuals (RMSR) is 0.017
The df corrected root mean square of the residuals is 0.048
The number of observations was 355 with Chi Square = 19.207     with prob <   0.00756

Tucker Lewis Index of factoring reliability = 0.9265
RMSEA index = 0.0711 and the 90 % confidence intervals are 0.0335 0.1085
BIC = -21.897
Fit based upon off diagonal values = 0.995
Measures of factor score adequacy
                                               GLS3 GLS1 GLS2
Correlation of scores with factors            0.882 0.928 0.839
Multiple R square of scores with factors      0.779 0.862 0.704
Minimum correlation of possible factor scores 0.557 0.724 0.408

>   #
>   # sample <- subset(sample,select=c("V11","V13","V20","V5","V4","V17","V12","V15"))
>   # write.table(sample,"cEFAsample.csv",sep=",",,,na="*")
>   # mixed pattern
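>   # NOTE: the object name was lost in the source; `sample.mixed` is a placeholder
>   sample.mixed <- data.frame(lapply(sample[1:3],factor),sample[4:8])
>   summary(sample.mixed)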

 V1       V2           V3              V4                  V5                V6
 1: 2     1: 3      1   : 2      Min.   :1.000       Min.   :1.000     Min.   :1.000
 2: 26    2: 13     2   : 18     1st Qu.:3.000       1st Qu.:4.000     1st Qu.:3.000
 3: 61    3: 31     3   : 60     Median :4.000       Median :4.000     Median :3.000
 4:178    4:197     4   :206     Mean   :3.853       Mean   :3.955     Mean   :3.138
 5: 88    5:111     5   : 67     3rd Qu.:4.000       3rd Qu.:5.000     3rd Qu.:4.000
                    NA's: 2      Max.   :5.000       Max.   :5.000     Max.   :5.000
                                 NA's   :1
       V7                V8
 Min.   :1.00      Min.   :1.000
 1st Qu.:2.00      1st Qu.:3.000
 Median :3.00      Median :4.000
 Mean   :2.78      Mean   :3.442
 3rd Qu.:4.00      3rd Qu.:4.000
 Max.   :5.00      Max.   :5.000
 NA's   :1

> # NOTE: `sample.mixed` is a placeholder for the object name lost in the source
> hetcor.cor <- hetcor(sample.mixed)
> hetcor.cor$correlations

            V1          V2          V3          V4          V5          V6          V7
V1   1.0000000   0.4766232   0.4902862   0.4305458   0.2853987   0.2076320   0.3015123
V2   0.4766232   1.0000000   0.3740222   0.3757560   0.3093574   0.1839428   0.1175596
V3   0.4902862   0.3740222   1.0000000   0.2752806   0.2491626   0.1686548   0.1583849
V4   0.4305458   0.3757560   0.2752806   1.0000000   0.4202661   0.2636989   0.2758351
V5   0.2853987   0.3093574   0.2491626   0.4202661   1.0000000   0.2279503   0.2550014
V6   0.2076320   0.1839428   0.1686548   0.2636989   0.2279503   1.0000000   0.3414939
V7   0.3015123   0.1175596   0.1583849   0.2758351   0.2550014   0.3414939   1.0000000
V8   0.2663878   0.2378540   0.1553257   0.2324400   0.2175612   0.3937855   0.4146257

            V8
V1   0.2663878
V2   0.2378540
V3   0.1553257
V4   0.2324400
V5   0.2175612
V6   0.3937855
V7   0.4146257
V8   1.0000000

> hetcor.cor$type

       [,1]           [,2]           [,3]           [,4]           [,5]
[1,]   ""             "Polychoric"   "Polychoric"   "Polyserial"   "Polyserial"
[2,]   "Polychoric"   ""             "Polychoric"   "Polyserial"   "Polyserial"
[3,]   "Polychoric"   "Polychoric"   ""             "Polyserial"   "Polyserial"
[4,]   "Polyserial"   "Polyserial"   "Polyserial"   ""             "Pearson"
[5,]   "Polyserial"   "Polyserial"   "Polyserial"   "Pearson"      ""
[6,]   "Polyserial"   "Polyserial"   "Polyserial"   "Pearson"      "Pearson"
[7,]   "Polyserial"   "Polyserial"   "Polyserial"   "Pearson"      "Pearson"
[8,]   "Polyserial"   "Polyserial"   "Polyserial"   "Pearson"      "Pearson"
       [,6]           [,7]           [,8]
[1,]   "Polyserial"   "Polyserial"   "Polyserial"
[2,]   "Polyserial"   "Polyserial"   "Polyserial"
[3,]   "Polyserial"   "Polyserial"   "Polyserial"
[4,]   "Pearson"      "Pearson"      "Pearson"
[5,]   "Pearson"      "Pearson"      "Pearson"
[6,]   ""             "Pearson"      "Pearson"
[7,]   "Pearson"      ""             "Pearson"
[8,]   "Pearson"      "Pearson"      ""

> fa.parallel(hetcor.cor$correlations,n.obs=355)

Parallel analysis suggests that the number of factors =            3   and the number of components =

> fa.result.hetcor <- fa(hetcor.cor$correlations,n.obs=355,fm="gls",nfactors=3,rotate="promax")
> print(fa.result.hetcor,digit=3,sort=T)

Factor Analysis using method = gls
Call: fa(r = hetcor.cor$correlations, nfactors = 3, n.obs = 355, rotate = "promax",
    fm = "gls")
Standardized loadings (pattern matrix) based upon correlation matrix
   item   GLS1   GLS2   GLS3    h2    u2
V1    1 0.868 0.082 -0.101 0.695 0.305
V3    3 0.599 -0.029 0.017 0.359 0.641
V2    2 0.459 -0.058 0.235 0.384 0.616
V8    8 0.077 0.686 -0.075 0.460 0.540
V7    7 0.020 0.597 0.034 0.391 0.609
V6    6 -0.031 0.520 0.109 0.328 0.672
V5    5 -0.131 -0.004 0.735 0.420 0.580
V4    4 0.078 0.014 0.613 0.459 0.541

                         GLS1    GLS2    GLS3
SS loadings             1.364   1.144   0.988
Proportion Var          0.171   0.143   0.123
Cumulative Var          0.171   0.314   0.437
Proportion Explained    0.390   0.327   0.283
Cumulative Proportion   0.390   0.717   1.000

 With factor correlations of
      GLS1 GLS2 GLS3
GLS1 1.000 0.409 0.704
GLS2 0.409 1.000 0.552
GLS3 0.704 0.552 1.000

Test of the hypothesis that 3 factors are sufficient.

The degrees of freedom for the null model are 28 and the objective function was   1.705 wit
The degrees of freedom for the model are 7 and the objective function was 0.046

The root mean square of the residuals (RMSR) is 0.016
The df corrected root mean square of the residuals is 0.046
The number of observations was 355 with Chi Square = 16.046   with prob <   0.0247

Tucker Lewis Index of factoring reliability = 0.9361
RMSEA index = 0.0613 and the 90 % confidence intervals are 0.0203 0.0998
BIC = -25.059
Fit based upon off diagonal values = 0.994
Measures of factor score adequacy
                                               GLS1 GLS2 GLS3
Correlation of scores with factors            0.892 0.829 0.849
Multiple R square of scores with factors      0.795 0.688 0.721
Minimum correlation of possible factor scores 0.590 0.375 0.441



YamadaiR(Categorical Factor Analysis)

  • 1. Categorical Factor Analysis Using R. Koji Kosugi (小杉考司), Yamadai.R (やまだいあ~る), 2012/10/05
  • 2. Why we use... I want to run a factor analysis, but I was told a 3-point scale is not acceptable. I collected data on a 5-point scale, but the responses were skewed. I ran a factor analysis, but items kept dropping out one after another...
  • 3. FA vs categorical FA. Factor analysis is a multivariate technique that extracts the factors common to a large number of questionnaire items. The computation proceeds as follows. 1 Build a correlation matrix from the data. 2 Eigendecompose the correlation matrix. 3 Decide the number of factors from the eigenvalues, and obtain the factor loadings from the eigenvectors. The correlation matrix here consists of Pearson product-moment correlation coefficients, and computing them requires data measured at the interval scale level or higher.
  • 4. One of the reasons. A 3-point scale cannot be regarded as interval-level (statistically, 7 points or more are needed). Skewed data = the categories at the upper or lower end are not being discriminated. The correlations underlying the analysis are small = the data are skewed, so the variance is small.
  • 5. The problem goes away if the correlation coefficient is computed in a way that suits ordinal or nominal scale levels. For example, according to Kano and Miura (狩野・三浦, 2002), there are three options for analyzing ordinal data: 1 treat the data as continuous; 2 use the polychoric correlation coefficient or the polyserial correlation coefficient; 3 use a method based on the multinomial distribution.
  • 6. Correlation coefficients for ordinal scales. Polychoric correlation is rendered in Japanese as 多分相関係数; it is the correlation between two ordinal variables. Polyserial correlation is rendered as 多分系列相関係数 or 重双相関係数; it is the correlation between an ordinal variable and a continuous variable. Tetrachoric correlation is rendered as 四分相関係数; "tetra" refers to a 2×2 table, that is, the correlation between two binary variables. It is a special case of the polychoric correlation.
  • 7. images of latent continuity. Figure: latent continuity and its categorical expression. Assume that behind the observed variable x there is a latent variable ξ that is normally distributed. The relation between x and ξ can be written as: x = 1 if ξ < a_1; x = 2 if a_1 ≤ ξ < a_2; x = 3 if a_2 ≤ ξ < a_3; ...; x = s if a_{s−1} ≤ ξ. (1)
  • 8. Correlation coefficients for ordinal scales. Think of the two variables as correlated at the level of unobserved latent variables, with that correlation being expressed categorically. What we then estimate is the latent-level correlation ρ and the thresholds seen in the categories of X and Y. The thresholds can also be approximated from the marginal frequencies of the cross-tabulation (2-step ML). ↓ The idea is that distortions such as ceiling and floor effects are adjusted appropriately by the thresholds. Hence categorical correlation coefficients are generally larger than Pearson's correlation coefficient (which forcibly assumes equal intervals). Larger correlations make the factors easier to pull out.
  • 9. Follow me with R code... The code follows.