Linear regression with R 2

  1. Linear Regression with R 2: Model selection. 2012-12-10 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student.
  2. Group website: http://rpubs.com/kaz_yos/useR_at_HSPH
  3. Previously in this group: Introduction; Graphics; Reading Data into R (1); Groupwise, continuous; Reading Data into R (2); Linear regression; Descriptive, continuous; Descriptive, categorical; Deducer.
  4. Menu: Linear regression: Model selection.
  5. Ingredients. Statistics: selection methods. Programming: step(), drop1(), add1(), leaps::regsubsets().
  6. Open RStudio.
  7. Open the saved script that we created last time. See also the Linear Regression with R 1 slides.
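The slides assume the lbw data frame prepared in the previous session's script. If that script is unavailable, a stand-in can be sketched from the birthwt data in the MASS package; the recodes and variable names below (race.cat, ftv.cat, preterm) are assumptions chosen to match the names used in these slides, not the original script's code.

```r
## Sketch: approximate the lbw data frame from MASS::birthwt.
## race.cat, ftv.cat, and preterm are assumed recodes, not the
## original session's code.
library(MASS)  # ships with R; contains the birthwt data

lbw <- birthwt
lbw$race.cat <- factor(lbw$race, levels = 1:3,
                       labels = c("white", "black", "other"))
lbw$ftv.cat  <- cut(lbw$ftv, breaks = c(-Inf, 0, 1, Inf),
                    labels = c("none", "one", "two or more"))
lbw$preterm  <- as.numeric(lbw$ptl >= 1)  # any prior premature labor
```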
  8. Create full & null models:
       lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat +
                     race.cat + preterm, data = lbw)
       lm.null <- lm(bwt ~ 1, data = lbw)  # intercept-only model
  9. Compare the two models:
       anova(lm.full, lm.null)
  10. [anova() output, annotated: each row is one of the two models; the table shows the residual degrees of freedom and residual sum of squares, and the difference in residual SS gives the partial F-test, which is significant here.]
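A minimal sketch of what the partial F statistic in that anova() table computes, built from the two residual sums of squares. MASS::birthwt stands in for lbw here, with a reduced predictor set for brevity.

```r
## Partial F-test by hand: compare a null and a full model via their
## residual sums of squares. birthwt is a stand-in for lbw.
library(MASS)

fit.null <- lm(bwt ~ 1, data = birthwt)
fit.full <- lm(bwt ~ age + lwt + smoke + ht + ui, data = birthwt)

rss0 <- sum(residuals(fit.null)^2)  # residual SS, null model
rss1 <- sum(residuals(fit.full)^2)  # residual SS, full model
df1  <- fit.null$df.residual - fit.full$df.residual  # parameters added
df2  <- fit.full$df.residual

F.stat <- ((rss0 - rss1) / df1) / (rss1 / df2)
p.val  <- pf(F.stat, df1, df2, lower.tail = FALSE)
```

The same F.stat and p-value appear in anova(fit.null, fit.full).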
  11. Backward elimination: specify the full model.
       lm.step.bw <- step(lm.full, direction = "backward")  # final model object
  12. [step() trace, annotated: the initial AIC is that of the full model; at the first step, removing ftv.cat makes AIC smallest; at the next step, removing age makes AIC smallest; elimination stops when doing nothing makes AIC smallest.]
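A side note on the AIC column in that trace: step() gets its values from extractAIC(), which differs from AIC() by an additive constant that is the same for every model fitted to the same data, so model rankings agree. A quick sketch, again with MASS::birthwt standing in for lbw:

```r
## extractAIC() returns c(equivalent df, generalized AIC); its AIC
## differs from AIC() by a constant shared across models on the same
## data. birthwt is a stand-in for lbw.
library(MASS)

fit1 <- lm(bwt ~ age + lwt, data = birthwt)
fit2 <- lm(bwt ~ age + lwt + smoke, data = birthwt)

d1 <- AIC(fit1) - extractAIC(fit1)[2]
d2 <- AIC(fit2) - extractAIC(fit2)[2]
## d1 and d2 are numerically identical, so either criterion
## orders the candidate models the same way
```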
  13. Forward selection: specify the null model, plus a scope formula listing the candidate variables.
       lm.step.fw <- step(lm.null,
                          scope = ~ age + lwt + smoke + ht + ui + ftv.cat +
                                    race.cat + preterm,
                          direction = "forward")  # final model object
  14. [step() trace, annotated: the initial AIC is that of the null model; adding ui makes AIC smallest, then adding race.cat, then adding smoke, and so on.]
  15. Stepwise selection/elimination: specify the null model and the scope formula of candidate variables.
       lm.step.both <- step(lm.null,
                            scope = ~ age + lwt + smoke + ht + ui + ftv.cat +
                                      race.cat + preterm,
                            direction = "both")  # final model object
  16. [step() trace, annotated: starting from the null model's AIC, adding ui makes AIC smallest; at each later step both adding and removing variables are considered (adding race.cat, then adding smoke, makes AIC smallest), and so on.]
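One variation worth knowing: step() minimizes AIC by default (penalty k = 2), and passing k = log(n) switches the criterion to BIC, which penalizes model size more heavily and tends to pick smaller models. A sketch with MASS::birthwt standing in for lbw and a shortened scope:

```r
## BIC-based stepwise selection: same step() call, but with
## k = log(n) instead of the default AIC penalty of 2.
## birthwt is a stand-in for lbw.
library(MASS)

fit.null <- lm(bwt ~ 1, data = birthwt)
n <- nrow(birthwt)

fit.bic <- step(fit.null,
                scope = ~ age + lwt + smoke + ht + ui,
                direction = "both",
                k = log(n),   # BIC penalty
                trace = 0)    # suppress the step-by-step printout
```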
  17. F-test using drop1():
       ## age is the least significant by partial F-test
       drop1(lm.full, test = "F")
       ## After elimination, ftv.cat is the least significant
       drop1(update(lm.full, ~ . -age), test = "F")
       ## After elimination, preterm is the least significant, at p = 0.12
       drop1(update(lm.full, ~ . -age -ftv.cat), test = "F")
       ## After elimination, all variables are significant at p < 0.1
       drop1(update(lm.full, ~ . -age -ftv.cat -preterm), test = "F")
       ## Show summary for the final model
       summary(update(lm.full, ~ . -age -ftv.cat -preterm))
  18. Updating models:
       ## Remove age from the full model: all current variables (.) minus age
       lm.age.less <- update(lm.full, ~ . -age)
       ## Add ui to the null model: the current right-hand side (.) plus ui
       lm.ui.only <- update(lm.null, ~ . +ui)
  19. [Workflow diagram for drop1(): test the full model; age is least significant (an F-test comparing the age-in model to the age-out model); remove age and test again; ftv.cat is now least significant; remove age and ftv.cat; and so on.]
  20. F-test using add1():
       ## ui is the most significant variable
       add1(lm.null, scope = ~ age + lwt + race.cat + smoke + preterm +
            ht + ui + ftv.cat, test = "F")
       ## After inclusion, race.cat is the most significant
       add1(update(lm.null, ~ . +ui), scope = ~ age + lwt + race.cat +
            smoke + preterm + ht + ui + ftv.cat, test = "F")
       ## After inclusion, smoke is the most significant
       add1(update(lm.null, ~ . +ui +race.cat), scope = ~ age + lwt +
            race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
       ## After inclusion, ht is the most significant
       add1(update(lm.null, ~ . +ui +race.cat +smoke), scope = ~ age + lwt
            + race.cat + smoke + preterm + ht + ui + ftv.cat, test = "F")
       ...
  21. [Workflow diagram for add1(): test the null model; ui is most significant (an F-test comparing the ui-out model to the ui-in model); add ui and test again; race.cat is now most significant; add ui and race.cat; and so on.]
  22. All-subset regression using the leaps package.
  23. Fit all subsets:
       library(leaps)
       regsubsets.out <-
         regsubsets(bwt ~ age + lwt + smoke + ht + ui + ftv.cat +
                    race.cat + preterm,
                    data = lbw,
                    nbest = 1, nvmax = NULL,
                    force.in = NULL, force.out = NULL,
                    method = "exhaustive")
       summary(regsubsets.out)
  24. [Same code, annotated: the formula gives the full model; nbest = how many best models to keep per model size; nvmax = maximum model size (NULL for no limit); force.in / force.out = variables forced into or out of every model; the result is stored in regsubsets.out.]
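The object returned by summary() exposes the per-model fit statistics (adjr2, cp, bic, ...), so the best subset can also be picked programmatically rather than read off a plot. A sketch, assuming the leaps package is installed and using MASS::birthwt as a stand-in for lbw with a reduced scope:

```r
## Pick the best subset by adjusted R^2 from the regsubsets fit.
## birthwt is a stand-in for lbw; predictor set shortened for brevity.
library(leaps)
library(MASS)

rs <- regsubsets(bwt ~ age + lwt + smoke + ht + ui,
                 data = birthwt, nbest = 1, nvmax = NULL)
rs.sum <- summary(rs)

best <- which.max(rs.sum$adjr2)  # index of the winning model size
coef(rs, best)                   # coefficients of that model
```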
  25. [summary() output, annotated: one row per model size showing the variable combination in the best 1-predictor model, the best 7-predictor model, the best 10-predictor model, and so on; forced variables are marked.]
  26. Plot the results by adjusted R^2 (the higher the better):
       plot(regsubsets.out, scale = "adjr2", main = "Adjusted R^2")
       ## Top models include ~ lwt + smoke + ht + ui + race.cat + preterm,
       ## ~ smoke + ht + ui + race.cat, ..., down to ~ ui
  27. Alternative plot via the car package:
       library(car)
       subsets(regsubsets.out, statistic = "adjr2", legend = FALSE,
               min.size = 5, main = "Adjusted R^2")
       ## Best by adjusted R^2: ~ lwt + smoke + ht + ui + race.cat + preterm
  28. The same plot by Mallows' Cp:
       subsets(regsubsets.out, statistic = "cp", legend = FALSE,
               min.size = 5, main = "Mallows Cp")
       ## Choose the first model for which Mallows' Cp is less than the
       ## number of regressors + 1: ~ lwt + smoke + ht + ui + race.cat + preterm
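For reference, the statistic behind that plot can be computed by hand: Cp = RSS_p / s^2 - n + 2p, where s^2 is the residual variance of the full model and p counts the candidate model's coefficients; a good model has Cp close to p. A sketch with MASS::birthwt standing in for lbw:

```r
## Mallows' Cp by hand for one candidate submodel.
## birthwt is a stand-in for lbw; predictor set shortened for brevity.
library(MASS)

full <- lm(bwt ~ age + lwt + smoke + ht + ui, data = birthwt)
sub  <- lm(bwt ~ lwt + smoke + ui, data = birthwt)

s2 <- summary(full)$sigma^2   # residual variance of the full model
n  <- nrow(birthwt)
p  <- length(coef(sub))       # coefficients, intercept included
Cp <- sum(residuals(sub)^2) / s2 - n + 2 * p
## compare Cp to p: values near p suggest little lack of fit
```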