Linear regression with R 1

Published in: Education

  1. 1. Linear Regression with R 1: Prepare data/specify model/read results. 2012-12-07 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  2. 2. Group Website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
  3. 3. Previously in this group: Introduction; Graphics; Reading Data into R (1); Groupwise, continuous; Reading Data into R (2); Descriptive, continuous; Descriptive, categorical; Deducer
  4. 4. Menu: Linear regression
  5. 5. Ingredients
      Statistics: Data preparation; Model formula
      Programming: within(); factor(), relevel(); lm(); formula = Y ~ X1 + X2; summary(); anova(), car::Anova()
  6. 6. Open RStudio
  7. 7. Create a new script and save it.
  8. 8. http://www.umass.edu/statdata/statdata/data/
  9. 9. We will use the lowbwt dataset (lowbwt.dat) used in BIO213: http://www.umass.edu/statdata/statdata/data/lowbwt.txt http://www.umass.edu/statdata/statdata/data/lowbwt.dat
  10. 10. Load dataset from web
      lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)
      skip = 4: skip the first 4 rows. header = TRUE: pick up variable names.
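How skip and header work together can be checked without a network connection. This is a minimal sketch on made-up text: the four preamble lines, the column names, and the values are assumptions for illustration, not the real contents of lowbwt.dat.

```r
## Hypothetical file contents: 4 preamble lines, then a header row, then data.
## All values here are made up for illustration only.
raw <- c("preamble line 1",
         "preamble line 2",
         "preamble line 3",
         "preamble line 4",
         "ID LOW AGE",
         "1 0 19",
         "2 1 33")

## skip = 4 drops the preamble; header = TRUE uses the next row as variable names
toy <- read.table(text = raw, header = TRUE, skip = 4)
str(toy)
```

Running str() right after loading is a cheap way to confirm that the names and column types came through as intended.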
  11. 11. “Fix” dataset
      lbw[c(10,39), "BWT"] <- c(2655, 3035)
      Replace the data points in the 10th and 39th rows of the BWT column to make the dataset identical to the BIO213 dataset.
  12. 12. Lower case variable names
      names(lbw) <- tolower(names(lbw))
      tolower(names(lbw)): convert variable names to lower case. names(lbw) <- : put them back into the variable names.
  13. 13. See overview
  14. 14. library(gpairs)
      gpairs(lbw)
  15. 15. Recoding: Changing and creating variables
  16. 16. dataset <- within(dataset, { _variable manipulations_ })
      dataset on the left: name of the newly created dataset (here replacing the original).
      within(dataset, ...): take the dataset and perform the variable manipulations.
      You can specify variables by name only. No need for dataset$var_name.
  17. 17. lbw <- within(lbw, {
          ## Relabel race
          race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
          ## Categorize ftv (frequency of visit)
          ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
          ftv.cat <- relevel(ftv.cat, ref = "Normal")
          ## Dichotomize ptl
          preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
      })
  18. 18. Numeric to categorical: element by element (code as on slide 17)
      race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
      Categorize race and label: 1 to White, 2 to Black, 3 to Other. The 1st level (White) will be the reference.
  19. 19. Explained more in depth: factor() to create a categorical variable
      lbw <- within(lbw, {
          ## Relabel race
          race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
      })
      race.cat <- : create a new variable named race.cat.
      factor(race, ...): take the race variable.
      levels = 1:3: order the levels 1, 2, 3 and make 1 the reference level.
      labels = c("White","Black","Other"): label levels 1, 2, 3 as White, Black, Other.
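The levels/labels mapping can be verified on a small vector before touching the real data. A minimal sketch, assuming made-up race codes (these are not rows from the lbw dataset):

```r
## Made-up race codes (1, 2, 3), not taken from the lbw dataset
race <- c(3, 1, 2, 1, 3)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)       # order of the levels; the first one is the reference in lm()
as.integer(race.cat)   # underlying codes follow the order given in levels
table(race, race.cat)  # cross-check: each code maps to exactly one label
```

A cross-tabulation of the old and new variable, as in the last line, is a quick guard against mislabelled categories.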
  20. 20. Numeric to categorical: range to element (code as on slide 17)
      ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
      How breaks work: (-Inf, 0] is None (0); (0, 2] is Normal (1, 2); (2, Inf] is Many (3, 4, 5, 6, ...). The 1st level (None) will be the reference.
  21. 21. Reset reference level (code as on slide 17)
      ftv.cat <- relevel(ftv.cat, ref = "Normal")
      Change the reference level of the ftv.cat variable from None to Normal.
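The break points and the effect of relevel() can be seen directly on the values 0 to 6. A minimal sketch with a made-up ftv vector (not the lbw data):

```r
## Made-up visit counts covering the values discussed on the slides
ftv <- 0:6

ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))
data.frame(ftv, ftv.cat)   # intervals are right-closed: (-Inf,0], (0,2], (2,Inf]

levels(ftv.cat)            # "None" comes first, so it would be the reference
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)            # "Normal" is now first; the data values are unchanged
```

relevel() only reorders the levels, so the category assigned to each observation stays the same.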
  22. 22. Numeric to Boolean to Category (code as on slide 17)
      preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
      ptl >= 1: a TRUE/FALSE vector is created here. ptl < 1 goes to FALSE (levels), then to “0” (labels); ptl >= 1 goes to TRUE, then to “1+”.
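The two steps (numeric to logical, logical to factor) can be run separately to see what each one produces. A minimal sketch with made-up ptl counts:

```r
ptl <- c(0, 0, 1, 3, 0)   # made-up counts, not the lbw data

is.preterm <- ptl >= 1    # step 1: logical vector
is.preterm

## step 2: FALSE is labelled "0", TRUE is labelled "1+"
preterm <- factor(is.preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))
preterm
table(ptl, preterm)       # cross-check the dichotomization
```

The intermediate name is.preterm is only for illustration; the slide code does both steps in one call.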
  23. 23. Binary 0,1 to No,Yes
      ## One-by-one method
      lbw <- within(lbw, {
          ## Categorize smoke ht ui
          smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
          ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
          ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
      })
      ## Alternative to above (loop method)
      lbw[,c("smoke","ht","ui")] <-
          lapply(lbw[,c("smoke","ht","ui")], function(var) {
              factor(var, levels = 0:1, labels = c("No","Yes"))
          })
  24. 24. model formula
  25. 25. formula: outcome ~ predictor1 + predictor2 + predictor3
      SAS equivalent: model outcome = predictor1 predictor2 predictor3;
  26. 26. In the case of a t-test: age ~ zyg
      age: continuous variable to be compared (variable to be explained)
      zyg: grouping variable to separate groups (variable used to explain)
  27. 27. Linear sum: Y ~ X1 + X2
  28. 28. . : All variables except for the outcome
      + X2 : Add X2 term
      - 1 : Remove intercept
      X1:X2 : Interaction term between X1 and X2
      X1*X2 : Main effects and interaction term
  29. 29. Interaction term: Y ~ X1 + X2 + X1:X2 (X1 + X2: main effects; X1:X2: interaction)
  30. 30. Interaction term: Y ~ X1 * X2 (main effects & interaction)
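One way to see what a formula expands to is to look at the column names of its design matrix. A minimal sketch, assuming a tiny made-up data frame in which X2 is a two-level factor:

```r
## Made-up predictors; X2 is a two-level factor
d <- data.frame(X1 = c(1, 2, 3, 4),
                X2 = factor(c("a", "b", "a", "b")))

m.long  <- model.matrix(~ X1 + X2 + X1:X2, data = d)
m.star  <- model.matrix(~ X1 * X2,         data = d)
m.noint <- model.matrix(~ X1 + X2 - 1,     data = d)

colnames(m.long)    # intercept, main effects, interaction
colnames(m.star)    # same columns: X1*X2 is shorthand for X1 + X2 + X1:X2
colnames(m.noint)   # "- 1" removes the "(Intercept)" column
```

model.matrix() needs only the right-hand side of the formula, which is why no outcome appears here.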
  31. 31. On-the-fly variable manipulation: Y ~ X1 + I(X2 * X3)
      I(): inhibit formula interpretation, for math manipulation.
      A new variable (X2 times X3) is created on-the-fly and used.
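The difference I() makes shows up in the design matrix. A minimal sketch on made-up numbers, comparing the same expression with and without I():

```r
## Made-up numeric predictors
d <- data.frame(X1 = c(1, 2, 3, 4),
                X2 = c(2, 1, 4, 3),
                X3 = c(5, 6, 7, 9))

## Without I(): "*" is a formula operator (main effects plus interaction)
colnames(model.matrix(~ X1 + X2 * X3, data = d))

## With I(): "*" is ordinary multiplication, giving a single product column
mm <- model.matrix(~ X1 + I(X2 * X3), data = d)
colnames(mm)
mm[, "I(X2 * X3)"]   # the product of X2 and X3, row by row
```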
  32. 32. Fit a model
      lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
  33. 33. See model object lm.full
  34. 34. Call: command repeated. Coefficient for each variable.
  35. 35. See summary
      summary(lm.full)
  36. 36. Call: command repeated; residual distribution; Coef/SE = t; dummy variables created; model R^2 and adjusted R^2; F-test
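The pieces of the summary output can also be pulled out as numbers. A minimal sketch on simulated stand-in data: the data frame toy and its coefficients are made up, and only the variable names mimic the slides.

```r
## Simulated stand-in data (NOT the lbw dataset); names mimic the slides
set.seed(1)
n   <- 100
toy <- data.frame(age = rnorm(n, 23, 5), lwt = rnorm(n, 130, 30))
toy$bwt <- 2500 + 10 * toy$age + 4 * toy$lwt + rnorm(n, sd = 400)

fit  <- lm(bwt ~ age + lwt, data = toy)
ctab <- coef(summary(fit))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
ctab

## The t value is the estimate divided by its standard error
all.equal(ctab[, "t value"], ctab[, "Estimate"] / ctab[, "Std. Error"])

## R^2 and adjusted R^2 are stored in the summary object
summary(fit)$r.squared
summary(fit)$adj.r.squared
```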
  37. 37. ftv.catNone: No 1st trimester visit people compared to Normal 1st trimester visit people (reference level)
      ftv.catMany: Many 1st trimester visit people compared to Normal 1st trimester visit people (reference level)
  38. 38. race.catBlack: Black people compared to White people (reference level)
      race.catOther: Other people compared to White people (reference level)
  39. 39. Confidence intervals
      confint(lm.full)
  40. 40. Confidence intervals: lower boundary and upper boundary
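The lower and upper boundaries can be reproduced by hand as estimate plus or minus a t quantile times the standard error. A minimal sketch on simulated stand-in data (the data and model are made up; the default 95% level is assumed):

```r
## Simulated stand-in data (NOT the lbw dataset)
set.seed(2)
n   <- 80
toy <- data.frame(age = rnorm(n, 23, 5), lwt = rnorm(n, 130, 30))
toy$bwt <- 2500 + 10 * toy$age + 4 * toy$lwt + rnorm(n, sd = 400)
fit <- lm(bwt ~ age + lwt, data = toy)

ctab  <- coef(summary(fit))
tcrit <- qt(0.975, df = df.residual(fit))   # two-sided 95% critical value
manual <- cbind(lower = ctab[, "Estimate"] - tcrit * ctab[, "Std. Error"],
                upper = ctab[, "Estimate"] + tcrit * ctab[, "Std. Error"])

manual
confint(fit)   # default level = 0.95; same boundaries as the manual version
```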
  41. 41. ANOVA table (type I)
      anova(lm.full)
  42. 42. ANOVA table (type I). Columns: degree of freedom; sequential SS; Mean SS = SS/DF; F = Mean SS / Mean SS of residual
  43. 43. Type I = Sequential SS [Venn diagram of 1 age, 2 lwt, 3 smoke: in type I the 1st gets all, the 2nd gets all but its overlap with the 1st, and the last gets only what remains]
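Because type I SS are sequential, changing the order of the terms changes the table. A minimal sketch on simulated data with two deliberately correlated predictors (all numbers are made up; the names only mimic the slides):

```r
## Simulated data with correlated predictors (NOT the lbw dataset)
set.seed(3)
n   <- 200
age <- rnorm(n)
lwt <- 0.7 * age + rnorm(n)
bwt <- 1 + 0.5 * age + 0.5 * lwt + rnorm(n)

a1 <- anova(lm(bwt ~ age + lwt))   # age entered first
a2 <- anova(lm(bwt ~ lwt + age))   # lwt entered first

a1["age", "Sum Sq"]   # age gets everything it can explain on its own
a2["age", "Sum Sq"]   # age gets only what lwt has not already explained

## The total is the same either way; only the split between terms changes
sum(a1[["Sum Sq"]])
sum(a2[["Sum Sq"]])
```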
  44. 44. ANOVA table (type III)
      library(car)
      Anova(lm.full, type = 3)
  45. 45. ANOVA table (type III). Columns: marginal SS; degree of freedom; F = Mean SS / Mean SS of residual. Multi-category variables are tested as one.
  46. 46. Type III = Marginal SS [Venn diagram of 1 age, 2 lwt, 3 smoke: in type III the 1st, the 2nd, and the last each get only their margin]
  47. 47. Comparison: Type I vs. Type III
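The sequential versus marginal contrast can be sketched without the car package. The example below swaps in base R's drop1(), which tests each term adjusted for all the others; for a model with no interaction terms this should be the same marginal comparison, but it is a stand-in, not the Anova() call from the slides. The data are simulated.

```r
## Simulated stand-in data (NOT the lbw dataset)
set.seed(4)
n     <- 200
age   <- rnorm(n)
lwt   <- 0.7 * age + rnorm(n)
smoke <- factor(sample(c("No", "Yes"), n, replace = TRUE))
bwt   <- 1 + 0.5 * age + 0.5 * lwt - 0.4 * (smoke == "Yes") + rnorm(n)
fit   <- lm(bwt ~ age + lwt + smoke)

seq.ss  <- anova(fit)               # sequential: each term adjusted for earlier ones
marg.ss <- drop1(fit, test = "F")   # marginal: each term adjusted for all others

seq.ss[c("age", "lwt", "smoke"), "Sum Sq"]
marg.ss[c("age", "lwt", "smoke"), "Sum of Sq"]
## The last term (smoke) is adjusted for everything else in both tables,
## so its SS agrees; earlier terms generally differ.
```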
  48. 48. Effect plot
      library(effects)
      plot(allEffects(lm.full), ylim = c(2000,4000))
      ylim: fix Y-axis values for all plots.
  49. 49. Effect of a variable with other covariates set at average
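The idea of holding other covariates at their average can be approximated with predict(). This is a rough sketch on simulated data with numeric covariates only; it is not how the effects package is implemented, and its handling of factors is not reproduced here.

```r
## Simulated stand-in data (NOT the lbw dataset), numeric covariates only
set.seed(5)
n   <- 150
toy <- data.frame(age = rnorm(n, 23, 5), lwt = rnorm(n, 130, 30))
toy$bwt <- 2500 + 10 * toy$age + 4 * toy$lwt + rnorm(n, sd = 400)
fit <- lm(bwt ~ age + lwt, data = toy)

## Vary age over a grid while holding lwt at its sample mean
grid <- data.frame(age = seq(15, 35, by = 5), lwt = mean(toy$lwt))
pred <- predict(fit, newdata = grid, interval = "confidence")
cbind(grid, pred)   # fit, lwr, upr: fitted mean and its 95% confidence limits
```

With lwt fixed, the fitted values change with age by exactly the age coefficient per unit, which is what the age panel of an effect plot displays.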
  50. 50. Interaction
  51. 51. This model is for demonstration purposes.
      lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw)
      age*lwt: Continuous * Continuous
      age*ftv.cat: Continuous * Categorical
      race.cat*preterm: Categorical * Categorical
  52. 52. Anova(lm.full.int, type = 3)
  53. 53. Marginal SS; degree of freedom; interaction terms; F = Mean SS / Mean SS of residual
  54. 54. Continuous * Continuous (panels labeled by lwt level)
      plot(effect("age:lwt", lm.full.int))
  55. 55. Continuous * Categorical
      plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
  56. 56. Categorical * Categorical
      plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)