Linear regression with R 1
Published in: Education

Transcript

  • 1. Linear Regression with R 1: Prepare data / specify model / read results. 2012-12-07 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student
  • 2. Group website: http://rpubs.com/kaz_yos/useR_at_HSPH
  • 3. Previously in this group: Introduction; Graphics; Reading Data into R (1); Groupwise, continuous; Reading Data into R (2); Descriptive, continuous; Descriptive, categorical; Deducer
  • 4. Menu: Linear regression
  • 5. Ingredients
      Statistics: data preparation; model formula
      Programming: within(); factor(), relevel(); lm(); formula = Y ~ X1 + X2; summary(); anova(), car::Anova()
  • 6. Open RStudio
  • 7. Create a new script and save it.
  • 8. http://www.umass.edu/statdata/statdata/data/
  • 9. We will use the lowbwt dataset used in BIO213 (lowbwt.dat): http://www.umass.edu/statdata/statdata/data/lowbwt.txt and http://www.umass.edu/statdata/statdata/data/lowbwt.dat
  • 10. Load dataset from web
      lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)
      skip = 4: skip 4 rows. header = TRUE: pick up variable names.
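To see what skip and header do without downloading anything, here is a minimal sketch on inline text. The contents of `raw` are made up for illustration (a hypothetical stand-in for a file with four lines of notes above the header row); only the read.table() arguments mirror slide 10.

```r
## Hypothetical stand-in for a file with 4 lines of notes above the header row
raw <- "note line 1
note line 2
note line 3
note line 4
ID LOW BWT
85 0 2523
86 0 2551"

## skip = 4 drops the note lines; header = TRUE uses the next line as variable names
toy <- read.table(text = raw, header = TRUE, skip = 4)
names(toy)
nrow(toy)
```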
  • 11. “Fix” dataset
      lbw[c(10,39), "BWT"] <- c(2655, 3035)
      Replace the data points in the 10th and 39th rows of the BWT column to make the dataset identical to the BIO213 dataset.
  • 12. Lower case variable names
      names(lbw) <- tolower(names(lbw))
      Convert the variable names to lower case, then put them back into the variable names.
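The two steps on slides 11 and 12 can be tried on a small made-up data frame. The data frame `toy` and its values are hypothetical; the indexing and tolower() calls follow the slides.

```r
## Hypothetical toy data standing in for lbw
toy <- data.frame(ID = 1:3, BWT = c(2500, 2600, 2700))

## Replace rows 1 and 3 of the BWT column (row index vector, column name)
toy[c(1, 3), "BWT"] <- c(2555, 2755)

## Convert variable names to lower case and assign them back
names(toy) <- tolower(names(toy))
toy
```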
  • 13. See overview
  • 14. library(gpairs)
      gpairs(lbw)
  • 15. Recoding: changing and creating variables
  • 16. dataset <- within(dataset, { _variable manipulations_ })
      Left side: name of the newly created dataset (here replacing the original). within() takes the dataset and performs the variable manipulations. Inside the braces you can specify variables by name only; there is no need for dataset$var_name.
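A minimal sketch of the within() pattern, assuming a made-up data frame with hypothetical columns height.cm and weight.kg. It shows that variables are referred to by name only and that a variable created earlier in the block can be used later in the same block.

```r
## Hypothetical toy data
toy <- data.frame(height.cm = c(150, 160, 170), weight.kg = c(50, 60, 70))

toy <- within(toy, {
    ## No toy$ prefix needed inside the braces
    height.m <- height.cm / 100
    ## height.m was created one line above and is already usable here
    bmi <- weight.kg / height.m^2
})
names(toy)
```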
  • 17. lbw <- within(lbw, {
          ## Relabel race
          race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
          ## Categorize ftv (frequency of visit)
          ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
          ftv.cat <- relevel(ftv.cat, ref = "Normal")
          ## Dichotomize ptl
          preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
      })
  • 18. Numeric to categorical: element by element (same code as slide 17)
      race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
      Categorize race and label: 1 to White, 2 to Black, 3 to Other. The 1st level will be the reference.
  • 19. Explained more in depth: factor() to create a categorical variable
      lbw <- within(lbw, {
          ## Relabel race
          race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
      })
      Take the race variable and create a new variable named race.cat. levels = 1:3: order the levels 1, 2, 3 and make 1 the reference level. labels: label levels 1, 2, 3 as White, Black, Other.
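The element-by-element mapping can be checked on a short made-up vector. The values in `race` are hypothetical; the factor() call is the one from the slide.

```r
## Hypothetical numeric codes
race <- c(1, 3, 2, 1)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # level order; the first level acts as the reference in lm()
as.character(race.cat)  # each code replaced by its label
```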
  • 20. Numeric to categorical: range to element (same code as slide 17)
      ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
      How breaks work: (-Inf, 0] is None; (0, 2] is Normal; (2, Inf] is Many. The 1st level will be the reference.
  • 21. Reset reference level (same code as slide 17)
      ftv.cat <- relevel(ftv.cat, ref = "Normal")
      Change the reference level of the ftv.cat variable from None to Normal.
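A small sketch of how cut() assigns values to the right-closed intervals and how relevel() then moves the reference level. The vector `ftv` is made up so that each boundary case appears once.

```r
## Hypothetical visit counts, including the boundary values 0 and 2
ftv <- c(0, 1, 2, 3, 6)

## Intervals are (-Inf, 0], (0, 2], (2, Inf] because cut() closes on the right by default
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None", "Normal", "Many"))
as.character(ftv.cat)

## Move "Normal" to the front so it becomes the reference level
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)
```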
  • 22. Numeric to Boolean to category (same code as slide 17)
      preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
      ptl >= 1 creates a TRUE/FALSE vector. ptl < 1 goes to FALSE (level), then to “0” (label); ptl >= 1 goes to TRUE, then to “1+”.
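The two-stage conversion can be followed step by step on a made-up vector. The values in `ptl` are hypothetical.

```r
## Hypothetical counts of previous preterm labor
ptl <- c(0, 1, 0, 3)

## Stage 1: numeric to logical
is.preterm <- ptl >= 1
is.preterm

## Stage 2: logical to factor, FALSE labelled "0" and TRUE labelled "1+"
preterm <- factor(is.preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))
as.character(preterm)
```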
  • 23. Binary 0,1 to No,Yes
      ## One-by-one method
      lbw <- within(lbw, {
          ## Categorize smoke ht ui
          smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
          ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
          ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
      })
      ## Alternative to above: loop method
      lbw[,c("smoke","ht","ui")] <- lapply(lbw[,c("smoke","ht","ui")], function(var) {
          var <- factor(var, levels = 0:1, labels = c("No","Yes"))
      })
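A sketch of the loop method on a made-up data frame, to show that only the selected columns are converted. The data frame `toy` and the helper vector `vars` are hypothetical.

```r
## Hypothetical toy data: two 0/1 columns and one continuous column
toy <- data.frame(smoke = c(0, 1, 1), ht = c(0, 0, 1), age = c(20, 30, 40))

## Columns to convert
vars <- c("smoke", "ht")

## lapply() returns a list of factors, which is assigned back column by column
toy[vars] <- lapply(toy[vars], function(var) {
    factor(var, levels = 0:1, labels = c("No", "Yes"))
})
str(toy)
```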
  • 24. Model formula
  • 25. formula: outcome ~ predictor1 + predictor2 + predictor3
      SAS equivalent: model outcome = predictor1 predictor2 predictor3;
  • 26. In the case of a t-test: age ~ zyg
      Left side (age): the continuous variable to be compared, i.e. the variable to be explained. Right side (zyg): the grouping variable to separate groups, i.e. the variable used to explain.
  • 27. Linear sum: Y ~ X1 + X2
  • 28.
      .      All variables except for the outcome
      + X2   Add X2 term
      - 1    Remove intercept
      X1:X2  Interaction term between X1 and X2
      X1*X2  Main effects and interaction term
  • 29. Interaction term: Y ~ X1 + X2 + X1:X2 (X1 + X2: main effects; X1:X2: interaction)
  • 30. Interaction term: Y ~ X1 * X2 (main effects & interaction)
  • 31. On-the-fly variable manipulation: Y ~ X1 + I(X2 * X3)
      I() inhibits formula interpretation, for math manipulation. A new variable (X2 times X3) is created on-the-fly and used.
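One way to see what each formula symbol expands to is to look at the column names of the design matrix that model.matrix() builds. This is a sketch on a made-up data frame; the variables Y, X1, X2, X3 and their values are hypothetical.

```r
## Hypothetical toy data
toy <- data.frame(Y  = c(1, 3, 2, 5, 4, 6),
                  X1 = c(1, 2, 3, 4, 5, 6),
                  X2 = c(2, 1, 4, 3, 6, 5),
                  X3 = c(1, 0, 1, 0, 1, 1))

## X1 * X2 expands to main effects plus the interaction
cols.star  <- colnames(model.matrix(Y ~ X1 * X2, data = toy))
cols.colon <- colnames(model.matrix(Y ~ X1 + X2 + X1:X2, data = toy))

## "." means all variables except the outcome; "- 1" removes the intercept
cols.dot <- colnames(model.matrix(Y ~ . - 1, data = toy))

## I() protects the product so it is computed arithmetically as one new variable
cols.I <- colnames(model.matrix(Y ~ X1 + I(X2 * X3), data = toy))

cols.star; cols.colon; cols.dot; cols.I
```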
  • 32. Fit a model
      lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
  • 33. See model object
      lm.full
  • 34. Output: "Call:" repeats the command, followed by the coefficient for each variable.
  • 35. See summary
      summary(lm.full)
  • 36. Output: "Call:" repeats the command; residual distribution; Coef/SE = t; dummy variables created; model R^2 and adjusted R^2; F-test.
  • 37. ftv.catNone: people with no 1st trimester visits compared to people with Normal 1st trimester visits (reference level). ftv.catMany: people with Many 1st trimester visits compared to people with Normal 1st trimester visits (reference level).
  • 38. race.catBlack: Black people compared to White people (reference level). race.catOther: Other people compared to White people (reference level).
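The summary annotations (Coef/SE = t, dummy variables named after non-reference levels) can be verified on any fitted model. This sketch uses R's built-in mtcars data as a stand-in, which is an assumption of the example and not the lbw data from the slides.

```r
## Stand-in model on built-in data (not the lbw model from the slides)
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

## Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

## One dummy variable per non-reference level of the factor (cyl = 4 is the reference)
rownames(coefs)

## The t value is the estimate divided by its standard error
t.manual <- coefs[, "Estimate"] / coefs[, "Std. Error"]
all.equal(t.manual, coefs[, "t value"])
```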
  • 39. Confidence intervals
      confint(lm.full)
  • 40. Confidence intervals output: one column for the lower boundary and one for the upper boundary.
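As a rough check of what the two boundaries are, the intervals from confint() on a linear model can be rebuilt from the estimate, its standard error, and a t quantile. The model below uses built-in mtcars data as a stand-in, not the lbw data.

```r
## Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

ci <- confint(fit)   # default level is 95%
ci

## Manual version: estimate -/+ t quantile * standard error
coefs <- coef(summary(fit))
q     <- qt(0.975, df = df.residual(fit))
lower <- coefs[, "Estimate"] - q * coefs[, "Std. Error"]
upper <- coefs[, "Estimate"] + q * coefs[, "Std. Error"]
```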
  • 41. ANOVA table (type I)
      anova(lm.full)
  • 42. ANOVA table (type I) output: degrees of freedom; sequential SS; mean SS = SS/DF; F = mean SS / mean SS of residual.
  • 43. Type I = Sequential SS. [Diagram: overlapping circles for 1 age, 2 lwt, 3 smoke.] In type I, the 1st gets all; the 2nd gets all but the overlap with the 1st; the last gets only what remains.
  • 44. ANOVA table (type III)
      library(car)
      Anova(lm.full, type = 3)
  • 45. ANOVA table (type III) output: marginal SS; degrees of freedom; multi-category variables tested as one; F = mean SS / mean SS of residual.
  • 46. Type III = Marginal SS. [Diagram: overlapping circles for 1 age, 2 lwt, 3 smoke.] In type III, the 1st gets only its margin; the 2nd gets only its margin; the last gets only its margin.
  • 47. Comparison: Type I vs. Type III
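The difference between sequential and marginal sums of squares can be seen by fitting the same model with the predictors in two different orders. This is a sketch on built-in mtcars data, not the lbw model. It uses base R's drop1() as a stand-in for car::Anova(); for a main-effects-only model like this one, the marginal SS from drop1() should correspond to what the slides call type III, but that correspondence is an assumption of this example and does not carry over to models with interactions.

```r
## Same model, two predictor orders (built-in data as a stand-in)
fit.a <- lm(mpg ~ wt + hp, data = mtcars)
fit.b <- lm(mpg ~ hp + wt, data = mtcars)

## Sequential (type I) SS: wt gets more when it enters first
ss.wt.first <- anova(fit.a)["wt", "Sum Sq"]
ss.wt.last  <- anova(fit.b)["wt", "Sum Sq"]

## Marginal SS: each term tested after all the others, so order does not matter
marg <- drop1(fit.a, test = "F")
marg

## The last term's sequential SS equals its marginal SS
all.equal(anova(fit.a)["hp", "Sum Sq"], marg["hp", "Sum of Sq"])
all.equal(ss.wt.last, marg["wt", "Sum of Sq"])
```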
  • 48. Effect plot
      library(effects)
      plot(allEffects(lm.full), ylim = c(2000,4000))
      ylim fixes the Y-axis values for all plots.
  • 49. Effect of a variable with the other covariates set at their averages
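The idea behind an effect plot can be approximated by hand with predict(): vary one predictor over a grid and hold the other at its mean. This sketch uses built-in mtcars data with numeric covariates only; it is not how the effects package is implemented, and the grid values are arbitrary.

```r
## Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

## Vary wt, hold hp at its sample mean
grid <- data.frame(wt = seq(2, 5, by = 1), hp = mean(mtcars$hp))
grid$fitted <- predict(fit, newdata = grid)
grid

## With hp held fixed, each one-unit step in wt changes the prediction by the wt coefficient
diff(grid$fitted)
coef(fit)["wt"]
```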
  • 50. Interaction
  • 51. This model is for demonstration purposes.
      lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw)
      age*lwt: continuous * continuous. age*ftv.cat: continuous * categorical. race.cat*preterm: categorical * categorical.
  • 52. Anova(lm.full.int, type = 3)
  • 53. Output: marginal SS; degrees of freedom; interaction terms; F = mean SS / mean SS of residual.
  • 54. plot(effect("age:lwt", lm.full.int))
      Continuous * continuous; panels by lwt level.
  • 55. plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
      Continuous * categorical
  • 56. plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)
      Categorical * categorical
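What a continuous * continuous interaction means numerically: the slope of one predictor depends on the value of the other. This sketch uses built-in mtcars data as a stand-in for the lbw model, and `slope.wt` is a hypothetical helper defined only for this example.

```r
## Stand-in interaction model on built-in data
fit.int <- lm(mpg ~ wt * hp, data = mtcars)
b <- coef(fit.int)
names(b)

## Slope of wt at a given hp: main effect plus interaction coefficient times hp
slope.wt <- function(hp) unname(b["wt"] + b["wt:hp"] * hp)

## Check against predictions one unit of wt apart, at hp = 100
p <- predict(fit.int, newdata = data.frame(wt = c(3, 4), hp = 100))
unname(diff(p))
slope.wt(100)
```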
