19th Epidemiology Seminar, Using the Statistical Analysis Software R: "Linear Models in R"

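Slides for the talk "Linear Models in R", given as part of the Japan Epidemiological Association's 19th Epidemiology Seminar, "Using the Statistical Analysis Software R".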

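  • I would like to use this space to answer a question that was asked at the venue.
    The question was how to get the exact p-value when the output shows something like p<2e-16. I assume this matters in settings such as genome analysis, where a very large number of multiple comparisons has to be taken into account. For an object created by lm(), the object returned by summary.lm has a coefficients component, and extracting it directly gives the exact p-values. For example, summary(m1)$coefficients shows them. For a t-test, extracting the p.value component of the object returned by t.test with $ gives the exact p-value.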
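
The answer above can be sketched in a few lines of R. This is a minimal illustration, assuming the m1 model from the slides; the two-species t-test is only an example chosen here, not something from the talk.

```r
# Refit the simple regression used in the slides.
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)

# The coefficient table is a numeric matrix; the "Pr(>|t|)" column holds
# the p-values at full precision (the "<2e-16" is only a print format).
ctab <- summary(m1)$coefficients
p_lm <- ctab[, "Pr(>|t|)"]
print(p_lm)

# Same idea for a t-test: the returned object has a p.value component.
# (Comparing setosa with versicolor is an illustrative choice.)
x  <- iris$Sepal.Length[iris$Species == "setosa"]
y  <- iris$Sepal.Length[iris$Species == "versicolor"]
tt <- t.test(x, y)
p_t <- tt$p.value
print(p_t)
```

Because p_lm and p_t are ordinary numeric values, they can be passed on to functions such as p.adjust() for multiple-comparison corrections.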

    1. Linear Models in R. 19th Epidemiology Seminar (22), 2012/1/26
    2. Outline. Linear models in R: lm(). Generalized linear models in R: glm()
    3. Linear models in R: lm
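    4. Example data: Edgar Anderson's Iris Data
        • Bundled with R; see help(package="datasets"). The data frame is called iris.
        • 150 flowers (50 of each species)
        • Variables: Species, sepal size (Sepal.Length, Sepal.Width), petal size (Petal.Length, Petal.Width)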
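
A quick way to confirm the structure of the data described on this slide is sketched below. It uses only base R; the variable names n_rows and n_species are introduced here for illustration.

```r
# iris ships with base R (package "datasets"), so nothing needs installing.
str(iris)                     # 150 observations of 5 variables
print(table(iris$Species))    # number of flowers per species
print(summary(iris[, 1:4]))   # ranges of the four measurements

n_rows    <- nrow(iris)
n_species <- nlevels(iris$Species)
```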
    5. > m1 <- lm(Sepal.Length ~ Petal.Length, data=iris)
        > m1

        Call:
        lm(formula = Sepal.Length ~ Petal.Length, data = iris)

        Coefficients:
         (Intercept)  Petal.Length
              4.3066        0.4089

        • Fits a model that explains Sepal.Length by Petal.Length in iris and stores it in m1.
        • Typing m1 prints the call and the estimated coefficients.
    6. > summary(m1)

        Call:
        lm(formula = Sepal.Length ~ Petal.Length, data = iris)

        Residuals:
             Min       1Q   Median       3Q      Max
        -1.24675 -0.29657 -0.01515  0.27676  1.00269

        Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
        (Intercept)   4.30660    0.07839   54.94   <2e-16 ***
        Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
        (...)

        • summary() prints the detailed results; coef(), fitted() and residuals() extract individual parts of the fit.
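    7. (...)
        Residual standard error: 0.4071 on 148 degrees of freedom
        Multiple R-squared:  0.76,  Adjusted R-squared:  0.7583
        F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16

        • summary() reports the residual distribution, the coefficient estimates with their standard errors, a t-test of each coefficient = 0 (with its p-value), the residual standard error, R2, adjusted R2, and the F statistic.
        • coef(), fitted() and residuals() return the coefficients, the fitted values and the residuals.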
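
The extractor functions named on these two slides can be tried as follows. This is a small sketch that recomputes three of the summary() figures by hand so the relationship between them is visible; the helper variable names are introduced here.

```r
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
s1 <- summary(m1)

b    <- coef(m1)        # intercept and slope
yhat <- fitted(m1)      # one fitted value per flower
e    <- residuals(m1)   # observed minus fitted

# Fitted values are intercept + slope * x.
manual_fit <- b[1] + b[2] * iris$Petal.Length
print(all.equal(unname(yhat), manual_fit))

# R-squared recomputed from the residuals, next to summary()'s value.
y  <- iris$Sepal.Length
r2 <- 1 - sum(e^2) / sum((y - mean(y))^2)
print(c(manual = r2, summary = s1$r.squared))

# Residual standard error: sqrt(RSS / residual degrees of freedom).
rse <- sqrt(sum(e^2) / df.residual(m1))
print(c(manual = rse, summary = s1$sigma))
```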
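    8. > m2 <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data=iris)
        > m2

        Call:
        lm(formula = Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

        Coefficients:
         (Intercept)  Petal.Length   Petal.Width
              4.1906        0.5418       -0.3196

        • Adds Petal.Width as a second explanatory variable.
        • Fits a model that explains Sepal.Length by Petal.Length and Petal.Width in iris and stores it in m2.
        • Typing m2 prints the call and the estimated coefficients.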
    9. > summary(m2)
        (...)
        Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
        (Intercept)   4.19058    0.09705  43.181  < 2e-16 ***
        Petal.Length  0.54178    0.06928   7.820 9.41e-13 ***
        Petal.Width  -0.31955    0.16045  -1.992   0.0483 *
        (...)
        Residual standard error: 0.4031 on 147 degrees of freedom
        Multiple R-squared:  0.7663,  Adjusted R-squared:  0.7631
        F-statistic:   241 on 2 and 147 DF,  p-value: < 2.2e-16

        • summary() gives the detailed results.
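    10. > m2 <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data=iris)
        • "~" separates the response (left) from the explanatory variables (right).
        • "+" joins the explanatory variables.
        • data names the data frame that holds the variables.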
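    11. > m3 <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Petal.Length:Petal.Width, data=iris)
        • ":" denotes the interaction of two variables.
        • "*" is shorthand for the main effects plus their interaction:
        > m3d <- lm(Sepal.Length ~ Petal.Length * Petal.Width, data=iris)
        • m3 and m3d are the same model.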
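
To check that the two ways of writing the interaction really give the same fit, something like the following can be run. It is a minimal sketch; same and prod_ok are names introduced here.

```r
# Main effects plus an explicit interaction term.
m3  <- lm(Sepal.Length ~ Petal.Length + Petal.Width + Petal.Length:Petal.Width,
          data = iris)
# The "*" shorthand expands to exactly the same terms.
m3d <- lm(Sepal.Length ~ Petal.Length * Petal.Width, data = iris)

print(coef(m3))
print(coef(m3d))
same <- isTRUE(all.equal(coef(m3), coef(m3d)))

# For two numeric variables the interaction column of the design matrix
# is simply their elementwise product.
X <- model.matrix(m3d)
prod_ok <- isTRUE(all.equal(unname(X[, "Petal.Length:Petal.Width"]),
                            iris$Petal.Length * iris$Petal.Width))
print(c(same = same, prod_ok = prod_ok))
```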
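    12. > m4 <- lm(Sepal.Length ~ Petal.Length^2 , data=iris)      # does not fit a squared term
        > m4 <- lm(Sepal.Length ~ I(Petal.Length^2) , data=iris)   # fits the squared term
        • "^", "+", "-", "*" and ":" have special meanings inside a formula. To use them as arithmetic operators, wrap the expression in I() (capital "i").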
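
The pitfall on this slide is easy to see by comparing coefficient names. The sketch below uses model names (m_lin, m_caret, m_sq, m_quad) introduced here for illustration; they are not from the slides.

```r
m_lin   <- lm(Sepal.Length ~ Petal.Length,      data = iris)
m_caret <- lm(Sepal.Length ~ Petal.Length^2,    data = iris)  # "^" read as formula syntax
m_sq    <- lm(Sepal.Length ~ I(Petal.Length^2), data = iris)  # "^" read as arithmetic

# Without I(), the model is just the linear one again.
print(names(coef(m_caret)))
print(isTRUE(all.equal(coef(m_caret), coef(m_lin))))

# With I(), the explanatory variable is the squared petal length.
print(names(coef(m_sq)))

# A quadratic curve needs both the linear and the squared term.
m_quad <- lm(Sepal.Length ~ Petal.Length + I(Petal.Length^2), data = iris)
print(coef(m_quad))
```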
    13. > m5 <- lm(Sepal.Length ~ Petal.Length + Species, data=iris)
        > summary(m5)
        (...)
        Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
        (Intercept)        3.68353    0.10610  34.719  < 2e-16 ***
        Petal.Length       0.90456    0.06479  13.962  < 2e-16 ***
        Speciesversicolor -1.60097    0.19347  -8.275 7.37e-14 ***
        Speciesvirginica  -2.11767    0.27346  -7.744 1.48e-12 ***
        (...)

        • Species is a factor with three levels ("setosa", "versicolor", "virginica").
        • A factor is automatically expanded into (number of levels - 1) dummy variables (Speciesversicolor, Speciesvirginica).
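
How the factor is turned into dummy variables can be inspected directly. The following is a sketch using base R only; the releveled copy iris2 and model m5b are introduced here to show how the reference level can be changed.

```r
print(levels(iris$Species))      # the first level is the reference
print(contrasts(iris$Species))   # 3 levels coded by 2 dummy columns

m5 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
X  <- model.matrix(m5)           # the design matrix lm() actually used
print(head(X[iris$Species == "versicolor", ], 2))

# Changing the reference level changes the dummy variables,
# but not the fitted values.
iris2 <- transform(iris, Species = relevel(Species, ref = "virginica"))
m5b   <- lm(Sepal.Length ~ Petal.Length + Species, data = iris2)
print(names(coef(m5b)))
print(isTRUE(all.equal(fitted(m5), fitted(m5b))))
```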
    14. Standardized partial regression coefficients
        > m2

        Call:
        lm(formula = Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

        Coefficients:
         (Intercept)  Petal.Length   Petal.Width
              4.1906        0.5418       -0.3196

        > coef(m2)
         (Intercept) Petal.Length  Petal.Width
           4.1905824    0.5417772   -0.3195506
        > sd(iris$Petal.Length) / sd(iris$Sepal.Length)
        [1] 2.131832
        > sd(iris$Petal.Width) / sd(iris$Sepal.Length)
        [1] 0.9205034
        > 0.5417772 * 2.131832
        [1] 1.154978
        > -0.3195506 * 0.9205034
        [1] -0.2941474
        > coef(m2)[-1] * apply(m2$model[-1],2,sd) / sd(m2$model[,1])
        Petal.Length  Petal.Width
           1.1549781   -0.2941474
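
As a cross-check on the calculation above, the same standardized coefficients should come out of refitting the model on variables scaled to mean 0 and standard deviation 1. This is a sketch of that alternative route (using scale()), not the method shown on the slide; z, m2z and the std_* names are introduced here.

```r
m2 <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)

# Route 1 (as on the slide): rescale each slope by sd(x) / sd(y).
std_formula <- coef(m2)[-1] * apply(m2$model[-1], 2, sd) / sd(m2$model[, 1])

# Route 2: standardize all variables first, then refit.
vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")
z    <- as.data.frame(scale(iris[, vars]))
m2z  <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = z)
std_refit <- coef(m2z)[-1]

print(rbind(std_formula, std_refit))
```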
    15. > cor(iris[,1:4])
                     Sepal.Length Sepal.Width Petal.Length Petal.Width
        Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
        Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
        Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
        Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

        • Column 5 (Species) is a factor, so only the numeric columns iris[,1:4] are passed to cor().
    16. > m6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
        > step(m6)
        Start:  AIC=-348.57
        Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species

                       Df Sum of Sq    RSS     AIC
        <none>                      13.556 -348.57
        - Petal.Width   1    0.4090 13.966 -346.11
        - Species       2    0.8889 14.445 -343.04
        - Sepal.Width   1    3.1250 16.681 -319.45
        - Petal.Length  1   13.7853 27.342 -245.33

        • step() selects variables by AIC (Akaike's information criterion), preferring the model with the smaller AIC. In this output, dropping any single term gives a larger AIC than the <none> row.
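
The selection shown on this slide can also be run quietly and inspected afterwards. The sketch below assumes the same m6; trace = 0 only suppresses the printed search. Note that the AIC printed by step() comes from extractAIC(), which for lm omits constant terms and therefore differs from AIC() by a constant.

```r
m6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
         data = iris)

# Run the stepwise search without printing each step.
sel <- step(m6, trace = 0)
print(formula(sel))             # the model step() ends up with

# The value step() reports versus the full log-likelihood AIC.
aic_step <- extractAIC(m6)[2]
aic_full <- AIC(m6)
print(c(step_scale = aic_step, AIC_function = aic_full))

# drop1() shows the same single-term deletions as the table on the slide.
print(drop1(m6))
```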
    17. Residuals vs fitted values (Residual-Fitted plot)
        > m1 <- lm(Sepal.Length ~ Petal.Length, data=iris)
        > m6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
        > plot(m1, which=1)
        > plot(m6, which=1)
        [Figure: two "Residuals vs Fitted" panels, one for lm(Sepal.Length ~ Petal.Length) and one for lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species); x-axis: Fitted values, y-axis: Residuals]
    18. Normal Q-Q plot of the residuals (q-q plot)
        > m1 <- lm(Sepal.Length ~ Petal.Length, data=iris)
        > m6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species, data = iris)
        > plot(m1, which=2)
        > plot(m6, which=2)
        [Figure: two "Normal Q-Q" panels, one for lm(Sepal.Length ~ Petal.Length) and one for lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species); x-axis: Theoretical Quantiles, y-axis: Standardized residuals]
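
The four diagnostic plots from the last two slides can be drawn on a single page for side-by-side comparison. This is a sketch; writing to a temporary PDF file is an assumption made here so that the code also runs non-interactively.

```r
m1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
m6 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
         data = iris)

# Hypothetical output file; in an interactive session the pdf()/dev.off()
# lines can be dropped to draw on screen instead.
pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file, width = 8, height = 8)
par(mfrow = c(2, 2))            # 2 x 2 grid of panels
plot(m1, which = 1)             # residuals vs fitted, simple model
plot(m6, which = 1)             # residuals vs fitted, full model
plot(m1, which = 2)             # normal Q-Q, simple model
plot(m6, which = 2)             # normal Q-Q, full model
invisible(dev.off())

# Standardized residuals are what the Q-Q plot puts on its y-axis.
r1 <- rstandard(m1)
r6 <- rstandard(m6)
print(range(r1))
print(range(r6))
```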
    19. glm: generalized linear models, extending lm
    20. Example data: data from a case-control study of esophageal cancer in Ille-et-Vilaine, France
        • Available in R as esoph.
        • Five variables: agegp (Age group, 6 levels), alcgp (Alcohol consumption, 4 levels), tobgp (Tobacco consumption, 4 levels), ncases (number of cases), ncontrols (number of controls).
        • Each row gives the numbers of cases and controls for one combination of agegp, alcgp and tobgp.
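    21. > m7 <- glm(cbind(ncases, ncontrols) ~ agegp+tobgp+alcgp,
                     data=esoph, family=binomial,
                     contrasts=list(agegp="contr.treatment",
                                    tobgp="contr.treatment",
                                    alcgp="contr.treatment"))
        • Fits a model that explains case/control status by agegp, tobgp and alcgp in esoph and stores it in m7.
        • If each row were one individual with a 0 or 1 outcome, the formula would be written as in the iris examples, "outcome ~ explanatory variables".
        • "family=binomial" specifies a binomial response.
        • contrasts specifies how each (ordered) factor is coded into dummy variables.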
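
Why the contrasts argument is needed here can be seen by fitting the model with and without it. The sketch below relies on the fact that the three esoph factors are ordered factors, for which R's default coding is polynomial (contr.poly) rather than treatment dummies. m7_default is a name introduced here.

```r
print(is.ordered(esoph$agegp))   # the esoph factors are ordered
print(options("contrasts"))      # defaults for unordered and ordered factors

# Default coding: polynomial contrasts (.L, .Q, .C, ...).
m7_default <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
                  data = esoph, family = binomial)

# Treatment coding, as on the slide: one dummy per non-reference level.
m7 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
          data = esoph, family = binomial,
          contrasts = list(agegp = "contr.treatment",
                           tobgp = "contr.treatment",
                           alcgp = "contr.treatment"))

print(names(coef(m7_default)))
print(names(coef(m7)))

# Only the parameterization differs; the fitted probabilities agree.
print(isTRUE(all.equal(fitted(m7_default), fitted(m7), tolerance = 1e-6)))
```

With treatment coding each coefficient is a log odds ratio against the lowest category, which is usually what an epidemiological report needs.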
    22. > summary(m7)
        (...)
        Deviance Residuals:
            Min       1Q   Median       3Q      Max
        -1.6891  -0.5618  -0.2168   0.2314   2.0642

        Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
        (Intercept)  -5.9108     1.0302  -5.737 9.61e-09 ***
        agegp35-44    1.6095     1.0676   1.508 0.131652
        agegp45-54    2.9752     1.0242   2.905 0.003675 **
        agegp55-64    3.3584     1.0198   3.293 0.000991 ***
        agegp65-74    3.7270     1.0253   3.635 0.000278 ***
        agegp75+      3.6818     1.0645   3.459 0.000543 ***
        tobgp10-19    0.3407     0.2054   1.659 0.097159 .
        tobgp20-29    0.3962     0.2456   1.613 0.106708
        tobgp30+      0.8677     0.2765   3.138 0.001701 **
        alcgp40-79    1.1216     0.2384   4.704 2.55e-06 ***
        alcgp80-119   1.4471     0.2628   5.506 3.68e-08 ***
        alcgp120+     2.1154     0.2876   7.356 1.90e-13 ***
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        (Dispersion parameter for binomial family taken to be 1)

            Null deviance: 227.241  on 87  degrees of freedom
        Residual deviance:  53.973  on 76  degrees of freedom
        AIC: 225.45

        Number of Fisher Scoring iterations: 6
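    23. Reading the summary
        • The layout is similar to the summary of an lm fit.
        • Residual deviance measures how well the model fits.
        • Each coefficient is tested against 0 with a Wald z-test.
        • AIC is reported for model comparison.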
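
The Wald z-test mentioned on this slide is simple enough to reproduce by hand. The following is a sketch assuming the m7 model from the slides. Depending on the R version, the esoph counts (and therefore the estimates) may not match the numbers printed on the slide exactly.

```r
m7 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
          data = esoph, family = binomial,
          contrasts = list(agegp = "contr.treatment",
                           tobgp = "contr.treatment",
                           alcgp = "contr.treatment"))

ctab <- coef(summary(m7))        # the coefficient table as a matrix

# Wald z = estimate / standard error; two-sided p from the normal distribution.
z <- ctab[, "Estimate"] / ctab[, "Std. Error"]
p <- 2 * pnorm(-abs(z))
print(cbind(z = z, p = p))

# The other quantities named on the slide.
print(c(residual_deviance = deviance(m7),
        residual_df       = df.residual(m7),
        AIC               = AIC(m7)))
```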
    24. > exp(cbind(coef(m7),confint(m7)))
        Waiting for profiling to be done...
                                        2.5 %       97.5 %
        (Intercept)  0.002710046 0.0001500676   0.01309911
        agegp35-44   5.000426461 0.9048631872  93.44857822
        agegp45-54  19.592860766 4.1082301889 351.60508496
        agegp55-64  28.741838956 6.1156513661 513.64343522
        agegp65-74  41.554820823 8.6954216578 746.54178414
        agegp75+    39.716132031 7.3282690051 740.73696224
        tobgp10-19   1.405982889 0.9376683713   2.10030180
        tobgp20-29   1.486221090 0.9107730044   2.39108827
        tobgp30+     2.381435327 1.3753149494   4.07656452
        alcgp40-79   3.069638047 1.9438541677   4.96418587
        alcgp80-119  4.250811157 2.5569142185   7.18622034
        alcgp120+    8.292938857 4.7505293148  14.70778864

        • confint() gives the 95% confidence intervals of the coefficients.
        • Applying exp() turns the coefficients and their limits into odds ratios with 95% confidence intervals.
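
A slightly more labeled version of the odds-ratio table is sketched below, together with Wald intervals for comparison. confint() on a glm uses profile likelihood, while confint.default() uses estimate +/- 1.96 standard errors. The column name OR and the object names are choices made here, and the values may differ from the slide depending on the R version's copy of esoph.

```r
m7 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
          data = esoph, family = binomial,
          contrasts = list(agegp = "contr.treatment",
                           tobgp = "contr.treatment",
                           alcgp = "contr.treatment"))

# Profile-likelihood intervals (the "Waiting for profiling" message is muted).
ci_profile <- suppressMessages(confint(m7))
or_table   <- exp(cbind(OR = coef(m7), ci_profile))
print(round(or_table, 2))

# Wald intervals for comparison.
ci_wald <- confint.default(m7)
or_wald <- exp(cbind(OR = coef(m7), ci_wald))
print(round(or_wald, 2))
```

The two kinds of interval tend to agree for well-estimated coefficients and diverge where the data are sparse, as in the youngest age groups.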
    25. Web
    26. R has 3500 packages. Epidemiology-related R packages: Epi, epicalc, epitools, epiR, epibasix. Topics: 2x2 tables, APC, Mantel-Haenszel
    27. Web resources
        • RjpWiki: Q and A
        • R-bloggers: R
        • inside-R: R (Revolution R)
        • Twitter: #rstats, #rstatsj
    28. "R": slideshare.net, Twitter, Ustream
    29. R