# Linear regression with R 1

Published in: Education
### Linear regression with R 1

1. 1. Linear Regression with R 1: Prepare data / specify model / read results. 2012-12-07 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
2. 2. Group website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
3. 3. Previously in this group: Introduction; Graphics; Reading Data into R (1); Groupwise, continuous; Reading Data into R (2); Descriptive, continuous; Descriptive, categorical; Deducer
5. 5. Ingredients. Statistics: data preparation; model formula. Programming: `within()`; `factor()`, `relevel()`; `lm()`; `formula = Y ~ X1 + X2`; `summary()`; `anova()`, `car::Anova()`
6. 6. Open RStudio
7. 7. Create a new script and save it.
8. 8. http://www.umass.edu/statdata/statdata/data/
9. 9. We will use the lowbwt dataset used in BIO213 (lowbwt.dat): http://www.umass.edu/statdata/statdata/data/lowbwt.txt and http://www.umass.edu/statdata/statdata/data/lowbwt.dat
10. 10. Load dataset from web: `lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)`. `skip = 4` skips the first 4 rows; `header = TRUE` picks up the variable names.
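How `skip` and `header` interact is easier to see on a small inline example. The sketch below uses a made-up text excerpt (`raw_text` and its contents are hypothetical, not the real lowbwt.dat) with four lines of notes before the header row.

```r
# Hypothetical miniature file: 4 lines of notes, then a header row, then data
raw_text <- "Low birth weight data (toy excerpt)
Source: made up for illustration
Columns: ID, AGE, BWT
----
ID AGE BWT
85 19 2523
86 33 2551
87 20 2557"

# skip = 4 drops the notes; header = TRUE reads the next line as column names
toy <- read.table(text = raw_text, header = TRUE, skip = 4)

str(toy)
names(toy)
```

Spelling out `header = TRUE` rather than `head = T` avoids relying on partial argument matching and on `T`, which can be reassigned.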
11. 11. “Fix” dataset: `lbw[c(10, 39), "BWT"] <- c(2655, 3035)`. This replaces the data points in the 10th and 39th rows of the BWT column to make the dataset identical to the BIO213 dataset.
12. 12. Lower case variable names: `names(lbw) <- tolower(names(lbw))`. `tolower()` converts the variable names to lower case; the assignment puts them back into the variable names.
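Both steps can be tried on a toy data frame first. This is a minimal sketch with made-up values (`toy` is hypothetical); it only illustrates row/column replacement and renaming.

```r
# Hypothetical toy data with upper-case column names
toy <- data.frame(ID = 1:5, BWT = c(2500, 2600, 2700, 2800, 2900))

# Replace BWT in rows 2 and 4 (rows by position, column by name)
toy[c(2, 4), "BWT"] <- c(2655, 3035)

# Convert all column names to lower case and assign them back
names(toy) <- tolower(names(toy))

toy
```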
13. 13. See overview
14. 14. `library(gpairs); gpairs(lbw)`
15. 15. Recoding: changing and creating variables
16. 16. `dataset <- within(dataset, { _variable manipulations_ })`. The left-hand side is the name of the newly created dataset (here replacing the original). `within()` takes the dataset and performs the variable manipulations. Inside the braces you can specify variables by name only; no need for `dataset$var_name`.
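The `within()` pattern can be sketched on a toy data frame. The variable names here (`height_cm`, `weight_kg`, `bmi`) are hypothetical and not part of the lbw data.

```r
# Hypothetical toy data
toy <- data.frame(height_cm = c(150, 160, 170),
                  weight_kg = c(50, 60, 80))

toy <- within(toy, {
    ## Variables are referred to by name only, without toy$
    height_m <- height_cm / 100
    ## A variable created earlier in the block can be used right away
    bmi <- weight_kg / height_m^2
})

toy
```

Because the result is assigned back to `toy`, the original data frame is replaced by the version with the new columns.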
17. 17.

        lbw <- within(lbw, {
            ## Relabel race
            race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

            ## Categorize ftv (frequency of visit)
            ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
            ftv.cat <- relevel(ftv.cat, ref = "Normal")

            ## Dichotomize ptl
            preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0","1+"))
        })
18. 18. Numeric to categorical, element by element (same `within()` block as slide 17): `race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))` categorizes race and labels it: 1 to White, 2 to Black, 3 to Other. The 1st level will be the reference.
19. 19. Explained more in depth: `factor()` to create a categorical variable. `lbw <- within(lbw, { race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other")) })` creates a new variable named race.cat. It takes the race variable, orders the levels 1, 2, 3 (making 1 the reference level), and labels levels 1, 2, 3 as White, Black, Other.
20. 20. Numeric to categorical, range to element (same `within()` block as slide 17): `ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))`. How breaks work: (-Inf, 0] is None; (0, 2], i.e. 1 and 2, is Normal; (2, Inf], i.e. 3, 4, 5, 6, ..., is Many. The 1st level will be the reference.
21. 21. Reset reference level (same `within()` block as slide 17): `ftv.cat <- relevel(ftv.cat, ref = "Normal")` changes the reference level of the ftv.cat variable from None to Normal.
22. 22. Numeric to Boolean to category (same `within()` block as slide 17): `preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0","1+"))`. `ptl >= 1` creates a TRUE/FALSE vector; through levels and labels, ptl < 1 goes to FALSE, then to “0”, and ptl >= 1 goes to TRUE, then to “1+”.
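The behaviour of `cut()`, `relevel()`, and the logical-to-factor conversion can be checked on short vectors. The values below are made up for illustration; only the function calls mirror the slides.

```r
## Range to category: intervals are (-Inf, 0], (0, 2], (2, Inf]
ftv <- c(0, 1, 2, 3, 6)
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))
as.character(ftv.cat)   # "None" "Normal" "Normal" "Many" "Many"
levels(ftv.cat)         # "None" "Normal" "Many" (None is the reference)

## Move Normal to the front so that it becomes the reference level
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)         # "Normal" "None" "Many"

## Numeric to logical to category
ptl <- c(0, 1, 3)
preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
as.character(preterm)   # "0" "1+" "1+"
```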
23. 23. Binary 0,1 to No,Yes. Use one method or the other, not both in sequence.

        ## One-by-one method
        lbw <- within(lbw, {
            ## Categorize smoke ht ui
            smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
            ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
            ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
        })

        ## Alternative to above: loop method
        lbw[,c("smoke","ht","ui")] <-
            lapply(lbw[,c("smoke","ht","ui")],
                   function(var) {
                       factor(var, levels = 0:1, labels = c("No","Yes"))
                   })
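A compact way to try the loop method is on a toy data frame. The data below are made up; the extra arguments to `lapply()` are passed on to `factor()`, which is equivalent to writing the anonymous function out.

```r
# Hypothetical toy data with three 0/1 columns and one numeric column
toy <- data.frame(smoke = c(0, 1, 1),
                  ht    = c(0, 0, 1),
                  ui    = c(1, 0, 0),
                  age   = c(19, 33, 20))

bin_vars <- c("smoke", "ht", "ui")

# Apply the same factor() call to each selected column
toy[bin_vars] <- lapply(toy[bin_vars], factor,
                        levels = 0:1, labels = c("No", "Yes"))

sapply(toy, class)
```

Only the listed columns are converted; `age` stays numeric.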
24. 24. model formula
25. 25. formula: `outcome ~ predictor1 + predictor2 + predictor3`. SAS equivalent: `model outcome = predictor1 predictor2 predictor3;`
26. 26. In the case of a t-test: `age ~ zyg`. Left side: the continuous variable to be compared (variable to be explained). Right side: the grouping variable used to separate the groups (variable used to explain).
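The same formula notation works directly in `t.test()`. The sketch below uses simulated data; the group labels `"MZ"` and `"DZ"` are a guess at what `zyg` might encode and are purely hypothetical.

```r
set.seed(6)

# Simulated data: a continuous outcome and a two-level grouping variable
toy <- data.frame(
    age = c(rnorm(20, mean = 30, sd = 5), rnorm(20, mean = 35, sd = 5)),
    zyg = factor(rep(c("MZ", "DZ"), each = 20))
)

# outcome ~ group: compare mean age between the two zyg groups
res <- t.test(age ~ zyg, data = toy)

res$estimate   # the two group means
res$p.value
```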
27. 27. Linear sum: `Y ~ X1 + X2`
28. 28. Formula operators: `.` all variables except for the outcome; `+ X2` add X2 term; `- 1` remove intercept; `X1:X2` interaction term between X1 and X2; `X1*X2` main effects and interaction term
29. 29. Interaction term: `Y ~ X1 + X2 + X1:X2`. `X1 + X2` are the main effects; `X1:X2` is the interaction.
30. 30. Interaction term: `Y ~ X1 * X2`. Main effects and interaction in one term.
31. 31. On-the-fly variable manipulation: `Y ~ X1 + I(X2 * X3)`. `I()` inhibits formula interpretation, for math manipulation. A new variable (X2 times X3) is created on the fly and used.
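The difference between `I(X2 * X3)` and a plain `X2 * X3` shows up in the design matrix. This is a small sketch with made-up numbers.

```r
# Hypothetical toy data
toy <- data.frame(Y  = c(1, 2, 3, 4),
                  X1 = c(1, 0, 1, 0),
                  X2 = c(2, 3, 4, 5),
                  X3 = c(1, 1, 2, 2))

# With I(): one extra column holding the arithmetic product
mm_I <- model.matrix(Y ~ X1 + I(X2 * X3), data = toy)
colnames(mm_I)      # "(Intercept)" "X1" "I(X2 * X3)"

# Without I(): * is read as a formula operator (main effects + interaction)
mm_star <- model.matrix(Y ~ X1 + X2 * X3, data = toy)
colnames(mm_star)   # "(Intercept)" "X1" "X2" "X3" "X2:X3"
```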
32. 32. Fit a model: `lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)`
33. 33. See model object: `lm.full`
34. 34. Output: Call (command repeated); coefficient for each variable
35. 35. See summary: `summary(lm.full)`
36. 36. Output: Call (command repeated); residual distribution; coefficient table, where Coef/SE = t and dummy variables are created for categorical predictors; model R^2 and adjusted R^2; F-test
37. 37. `ftv.catNone`: people with no 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level). `ftv.catMany`: people with many 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level).
38. 38. `race.catBlack`: Black people compared to White people (reference level). `race.catOther`: Other people compared to White people (reference level).
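The pieces of the summary output can also be pulled out programmatically. The sketch below fits a small model to simulated data (all variable names and effect sizes are made up) and checks that the t value is the estimate divided by its standard error.

```r
set.seed(1)
n <- 99

# Simulated data: one continuous and one three-level categorical predictor
toy <- data.frame(x1  = rnorm(n),
                  grp = factor(rep(c("a", "b", "c"), length.out = n)))
toy$y <- 1 + 2 * toy$x1 + (toy$grp == "b") + rnorm(n)

fit <- lm(y ~ x1 + grp, data = toy)
coefs <- coef(summary(fit))

# Dummy variables are named <variable><level>; "a" is the reference level
rownames(coefs)   # "(Intercept)" "x1" "grpb" "grpc"

# t value = Estimate / Std. Error
all.equal(coefs[, "t value"], coefs[, "Estimate"] / coefs[, "Std. Error"])

summary(fit)$r.squared
summary(fit)$adj.r.squared
```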
39. 39. Confidence intervals: `confint(lm.full)`
40. 40. Confidence intervals output: lower boundary and upper boundary for each coefficient
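For a linear model, the default 95% intervals from `confint()` are the estimate plus or minus a t quantile times the standard error. A minimal sketch on simulated data (names and values are made up):

```r
set.seed(2)

toy <- data.frame(x = rnorm(50))
toy$y <- 3 + 1.5 * toy$x + rnorm(50)

fit <- lm(y ~ x, data = toy)

ci <- confint(fit)   # default level = 0.95
colnames(ci)         # "2.5 %" "97.5 %"

# Manual calculation: estimate +/- t quantile * standard error
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]
tq  <- qt(0.975, df = df.residual(fit))
manual <- cbind(est - tq * se, est + tq * se)

all.equal(unname(ci), unname(manual))
```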
41. 41. ANOVA table (type I): `anova(lm.full)`
42. 42. ANOVA table (type I) output: degrees of freedom; sequential SS; mean SS = SS/DF; F = mean SS / mean SS of residual
43. 43. Type I = Sequential SS (Venn diagram of 1 age, 2 lwt, 3 smoke): in type I, the 1st term gets all of its SS, the 2nd gets all but its overlap with the 1st, and the last gets only what remains.
44. 44. ANOVA table (type III): `library(car); Anova(lm.full, type = 3)`
45. 45. ANOVA table (type III) output: marginal SS; degrees of freedom; F = mean SS / mean SS of residual. Multi-category variables are tested as one.
46. 46. Type III = Marginal SS (Venn diagram of 1 age, 2 lwt, 3 smoke): in type III, the 1st, the 2nd, and the last term each get only their margin, the part not shared with the other terms.
47. 47. Comparison: Type I vs. Type III
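The order dependence of sequential SS can be demonstrated with base R alone. In the sketch below, simulated correlated predictors are entered in both orders, and `drop1()` is used as a stand-in for the marginal test; for a model without interaction terms its sums of squares should match what `car::Anova(..., type = 3)` reports for each term, but that equivalence is stated here as an assumption rather than shown.

```r
set.seed(3)
n <- 200

# Simulated correlated predictors, so that their explained variation overlaps
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
y  <- 1 + x1 + x2 + rnorm(n)
toy <- data.frame(y, x1, x2)

# Type I (sequential): the SS of a term depends on its position
a12 <- anova(lm(y ~ x1 + x2, data = toy))
a21 <- anova(lm(y ~ x2 + x1, data = toy))
a12["x1", "Sum Sq"]   # x1 entered first
a21["x1", "Sum Sq"]   # x1 entered last

# Marginal: each term is tested as if it were entered last
marg <- drop1(lm(y ~ x1 + x2, data = toy), test = "F")
all.equal(marg["x1", "Sum of Sq"], a21["x1", "Sum Sq"])
all.equal(marg["x2", "Sum of Sq"], a12["x2", "Sum Sq"])
```

The marginal SS of a term equals its sequential SS when that term is entered last, which is why only the last row of a type I table agrees with the type III table.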
48. 48. Effect plot: `library(effects); plot(allEffects(lm.full), ylim = c(2000,4000))`. `ylim` fixes the Y-axis values for all plots.
49. 49. Effect of a variable with other covariates set at average
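The idea of holding the other covariates at their average can be approximated in base R with `predict()`. This is a sketch on simulated data with numeric covariates only (the coefficients and ranges are made up); it is not the effects package's own computation.

```r
set.seed(4)
n <- 150

# Simulated data loosely shaped like the birth weight example
toy <- data.frame(age = runif(n, 15, 45),
                  lwt = rnorm(n, mean = 130, sd = 30))
toy$bwt <- 2500 + 5 * toy$age + 4 * toy$lwt + rnorm(n, sd = 400)

fit <- lm(bwt ~ age + lwt, data = toy)

# Vary age over a grid while fixing lwt at its sample mean
grid <- data.frame(age = seq(15, 45, by = 5),
                   lwt = mean(toy$lwt))
pred <- predict(fit, newdata = grid, interval = "confidence")

cbind(grid, pred)   # columns fit, lwr, upr give the line and its 95% band
```

Plotting `pred[, "fit"]` against `grid$age` gives a line comparable to one panel of the effect plot.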
50. 50. Interaction
51. 51. This model is for demonstration purposes: `lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw)`. `age*lwt` is Continuous * Continuous; `age*ftv.cat` is Continuous * Categorical; `race.cat*preterm` is Categorical * Categorical.
52. 52. Anova(lm.full.int, type = 3)
53. 53. Output: marginal SS; degrees of freedom; F = mean SS / mean SS of residual. The interaction terms appear as separate rows.
54. 54. `plot(effect("age:lwt", lm.full.int))`. Continuous * Continuous; one panel per lwt level.
55. 55. `plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)`. Continuous * Categorical.
56. 56. `plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)`. Categorical * Categorical.
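What a continuous-by-categorical interaction means for the coefficients can be sketched in base R. The data below are simulated (group labels and effect sizes are made up); the nested-model F-test stands in for the interaction row of the ANOVA table.

```r
set.seed(5)
n <- 200

# Simulated data: the slope of age differs between the two groups
toy <- data.frame(
    age = runif(n, 15, 45),
    grp = factor(rep(c("Normal", "None"), length.out = n),
                 levels = c("Normal", "None"))
)
toy$bwt <- 3000 + 10 * toy$age -
    30 * toy$age * (toy$grp == "None") + rnorm(n, sd = 300)

fit_main <- lm(bwt ~ age + grp, data = toy)   # main effects only
fit_int  <- lm(bwt ~ age * grp, data = toy)   # main effects + interaction

names(coef(fit_int))   # "(Intercept)" "age" "grpNone" "age:grpNone"

# F-test for the interaction term (comparison of nested models)
int_test <- anova(fit_main, fit_int)
int_test

# Slope of age within each group
b <- coef(fit_int)
slopes <- c(Normal = unname(b["age"]),
            None   = unname(b["age"] + b["age:grpNone"]))
slopes
```

The `age:grpNone` coefficient is the difference in the age slope between the None group and the reference group, which is what non-parallel lines in a multiline effect plot show.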