Linear regression with R 1

Linear Regression with R
1: Prepare data / specify model / read results

2012-12-07 @HSPH
Kazuki Yoshida, M.D.
MPH-CLE student


Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH

Previously in this group

- Introduction
- Reading Data into R (1)
- Reading Data into R (2)
- Descriptive, continuous
- Descriptive, categorical
- Deducer
- Graphics
- Groupwise, continuous

Menu

n Linear regression

Ingredients

Statistics
- Data preparation
- Model formula

Programming
- within()
- factor(), relevel()
- lm()
- formula = Y ~ X1 + X2
- summary()
- anova(), car::Anova()

Create a new script and save it.

http://www.umass.edu/statdata/statdata/data/

We will use the lowbwt dataset used in BIO213

lowbwt.dat
http://www.umass.edu/statdata/statdata/data/lowbwt.txt
http://www.umass.edu/statdata/statdata/data/lowbwt.dat

Load dataset from web

lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat",
                  header = TRUE, skip = 4)

skip = 4: skip the first 4 rows
header = TRUE: pick up variable names
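
A minimal sketch of what skip and header do, using a made-up inline text block in place of the web file. The four preamble lines and the values are illustrative assumptions, not the contents of lowbwt.dat.

```r
## Made-up stand-in for a data file with 4 preamble lines before the header
raw <- "Preamble line 1
Preamble line 2
Preamble line 3
Preamble line 4
ID LOW AGE
85 0 19
86 0 33"

## skip = 4 drops the preamble; header = TRUE reads the next line as names
toy <- read.table(text = raw, header = TRUE, skip = 4)
names(toy)
nrow(toy)
```

Without skip = 4, read.table() would try to parse the preamble as data.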

“Fix” dataset

lbw[c(10,39), "BWT"] <- c(2655, 3035)

c(10,39): the 10th and 39th rows
"BWT": the BWT column
Replace these data points to make the dataset identical to the BIO213 dataset

Lower case variable names

names(lbw) <- tolower(names(lbw))

tolower(names(lbw)): convert variable names to lower case
names(lbw) <- : put them back into the variable names

Recoding
Changing and creating variables

dataset <- within(dataset, {
    _variable manipulations_
})

dataset <- : name of the newly created dataset (here replacing the original)
within(dataset, ...): take the dataset
{ ... }: perform variable manipulations. You can specify variables by name only; there is no need for dataset$var_name.
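
A small sketch of the within() pattern on a made-up data frame. The column names and values are hypothetical and chosen only for illustration.

```r
## Made-up toy data; column names are illustrative only
toy <- data.frame(weight.lb = c(120, 150), height.in = c(62, 70))

## Inside within(), columns are referred to by bare name
toy <- within(toy, {
    weight.kg <- weight.lb * 0.4536
})

## Equivalent without within(): the dataset name must be repeated
toy$weight.kg2 <- toy$weight.lb * 0.4536

toy
```

Both approaches give the same column; within() mainly saves typing when several variables are created at once.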

lbw <- within(lbw, {

## Relabel race
race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

## Categorize ftv (frequency of visit)
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
ftv.cat <- relevel(ftv.cat, ref = "Normal")

## Dichotomize ptl
preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

})

Numeric to categorical: element by element

## Relabel race
race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

Categorize race and label: 1 to White, 2 to Black, 3 to Other.
The 1st level will be the reference.
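Explained more in depth
factor() to create a categorical variable

## Relabel race
race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

race.cat <- : create a new variable named race.cat
factor(race, ...): take the race variable
levels = 1:3: order levels 1, 2, 3 and make 1 the reference level
labels = c(...): label levels 1, 2, 3 as White, Black, Other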
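
To see the mapping from codes to labels, here is a short sketch on a made-up vector of race codes (the values are illustrative, not taken from the dataset).

```r
## Made-up numeric codes
race <- c(1, 3, 2, 1)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

## Order of the levels; the first one is the reference level in lm()
levels(race.cat)

## Each code has been replaced by its label
as.character(race.cat)
```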

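Numeric to categorical: range to element

## Categorize ftv (frequency of visit)
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))

The 1st level will be the reference.

How breaks work

Interval     ftv values        Label
(-Inf, 0]    0                 None
(0, 2]       1, 2              Normal
(2, Inf]     3, 4, 5, 6, ...   Many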
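
The interval logic can be checked on a made-up vector of visit counts (illustrative values only). By default cut() uses intervals that are open on the left and closed on the right.

```r
## Made-up visit counts
ftv <- c(0, 1, 2, 3, 6)

ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

## Cross-tabulate original values against the new categories
table(ftv, ftv.cat)
```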

Reset reference level

ftv.cat <- relevel(ftv.cat, ref = "Normal")

Change the reference level of the ftv.cat variable from None to Normal.
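
A brief sketch of what relevel() does to the level order, using a made-up three-element factor.

```r
## Made-up factor with None as the first (reference) level
ftv.cat <- factor(c("None", "Normal", "Many"),
                  levels = c("None", "Normal", "Many"))
levels(ftv.cat)

## Move Normal to the front; the remaining levels keep their order
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)
```

The data values themselves are unchanged; only the ordering of the levels, and therefore the reference category, differs.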

Numeric to Boolean to Category

## Dichotomize ptl
preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

ptl >= 1: a TRUE/FALSE vector is created here
levels and labels: ptl < 1 becomes FALSE, then "0"; ptl >= 1 becomes TRUE, then "1+"
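
The two-step conversion can be traced on a made-up vector of preterm labor counts (illustrative values only).

```r
## Made-up counts
ptl <- c(0, 1, 3, 0)

## Step 1: numeric to Boolean
ptl >= 1

## Step 2: Boolean to category
preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
preterm
```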

Binary 0,1 to No,Yes

## One-by-one method
lbw <- within(lbw, {
    ## Categorize smoke ht ui
    smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
    ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
    ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
})

## Alternative to above: loop method (use one method or the other, not both)
lbw[,c("smoke","ht","ui")] <-
    lapply(lbw[,c("smoke","ht","ui")],
           function(var) {
               factor(var, levels = 0:1, labels = c("No","Yes"))
           })

formula

outcome ~ predictor1 + predictor2 + predictor3

SAS equivalent:
model outcome = predictor1 predictor2 predictor3;

In the case of t-test

age ~ zyg

age: continuous variable to be compared (the variable to be explained)
zyg: grouping variable to separate groups (the variable used to explain)
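
As a sketch of the same formula in use, the block below runs t.test() on a made-up data frame. The values and the group codes 1 and 2 are hypothetical; only the variable names age and zyg come from the slide.

```r
## Made-up data: age is continuous, zyg is a two-level grouping variable
toy <- data.frame(age = c(30, 34, 29, 41, 38, 45),
                  zyg = factor(c(1, 1, 1, 2, 2, 2)))

## Outcome on the left of ~, grouping variable on the right
res <- t.test(age ~ zyg, data = toy)

## Group means being compared
res$estimate
```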

n . All variables except for the outcome

n + X2 Add X2 term

n - 1 Remove intercept

n X1:X2 Interaction term between X1 and X2

n X1*X2 Main effects and interaction term

Interaction term

Y ~ X1 + X2 + X1:X2
X1 + X2: main effects; X1:X2: interaction

Interaction term

Y ~ X1 * X2
X1 * X2: main effects and interaction
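
The formula operators can be inspected without fitting a model by asking terms() how a formula expands. The helper term_labels() is a hypothetical convenience function written for this sketch, and the one-row data frame exists only so that "." has columns to expand to.

```r
## Hypothetical helper: list the terms a formula expands to
term_labels <- function(f, ...) attr(terms(f, ...), "term.labels")

term_labels(Y ~ X1 + X2)          # two main effects
term_labels(Y ~ X1:X2)            # interaction only
term_labels(Y ~ X1 * X2)          # main effects and interaction
term_labels(Y ~ X1 + X2 + X1:X2)  # same terms as X1 * X2

## "- 1" removes the intercept (the intercept attribute becomes 0)
attr(terms(Y ~ X1 - 1), "intercept")

## "." needs data to know which variables exist
toy <- data.frame(Y = 1, X1 = 1, X2 = 1)
term_labels(Y ~ ., data = toy)
```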

On-the-fly variable manipulation

Y ~ X1 + I(X2 * X3)

I(): inhibits formula interpretation, so the contents are treated as math manipulation
I(X2 * X3): a new variable (X2 times X3) is created on the fly and used
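
The difference between I(X2 * X3) and a bare X2 * X3 shows up in the design matrix. This sketch uses a made-up data frame with illustrative values.

```r
## Made-up predictors
toy <- data.frame(X2 = c(1, 2, 3), X3 = c(4, 5, 6))

## With I(): a single column holding the arithmetic product
mm.product <- model.matrix(~ I(X2 * X3), data = toy)
colnames(mm.product)

## Without I(): * is read as a formula operator (main effects and interaction)
mm.formula <- model.matrix(~ X2 * X3, data = toy)
colnames(mm.formula)
```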

Fit a model

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui +
ftv.cat + race.cat + preterm ,
data = lbw)

See model object

lm.full

The output shows:
Call: the command repeated
Coefficients: the coefficient for each variable

See summary

summary(lm.full)

The output shows:
Call: the command repeated
Residuals: the residual distribution
Coefficients: Coef/SE = t; dummy variables are created for categorical variables
Model R^2 and adjusted R^2
F-test

ftv.catNone     People with no 1st trimester visits compared to
                people with normal 1st trimester visits (reference level)
ftv.catMany     People with many 1st trimester visits compared to
                people with normal 1st trimester visits (reference level)

race.catBlack   Black people compared to
                White people (reference level)
race.catOther   Other people compared to
                White people (reference level)
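
The naming of dummy variables and the relation Coef/SE = t can be checked on any fitted model. This sketch uses R's built-in mtcars data as a stand-in, which is an assumption made so the block runs without the lowbwt download.

```r
## Stand-in model: one continuous and one three-level categorical predictor
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

## Coefficient table from summary()
ctab <- coef(summary(fit))

## Dummy variables are named "variable name + level"; the reference
## level (cyl = 4) has no row of its own
rownames(ctab)

## t value is the estimate divided by its standard error
all.equal(ctab[, "Estimate"] / ctab[, "Std. Error"], ctab[, "t value"])
```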

Confidence intervals

confint(lm.full)

The output shows the lower and upper boundaries of the confidence interval for each coefficient.
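
For a linear model, the default 95% intervals can be reproduced by hand as estimate plus or minus a t quantile times the standard error. The sketch below checks this on the built-in mtcars data, used here as a stand-in for lbw.

```r
## Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)
ctab <- coef(summary(fit))

## Critical value of the t distribution on the residual degrees of freedom
tcrit <- qt(0.975, df = df.residual(fit))

## Lower and upper boundaries computed by hand
manual <- cbind(ctab[, "Estimate"] - tcrit * ctab[, "Std. Error"],
                ctab[, "Estimate"] + tcrit * ctab[, "Std. Error"])

## Compare with confint()
all.equal(unname(manual), unname(confint(fit)))
```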

ANOVA table (type I)

anova(lm.full)

The output columns are:
Df: degrees of freedom
Sum Sq: sequential SS
Mean Sq: mean SS = SS/DF
F value: F = Mean SS / Mean SS of residual

Type I = Sequential SS

[Figure: overlapping circles for 1 age, 2 lwt, 3 smoke. In type I, the 1st variable gets all of its SS, including the overlap; the 2nd and the last variables get only what remains.]
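
Because type I SS are sequential, the SS credited to a variable depend on where it appears in the formula. A small sketch on the built-in mtcars data (a stand-in for lbw) makes the order dependence visible.

```r
## Same two predictors, entered in different orders
a.wt.first <- anova(lm(mpg ~ wt + hp, data = mtcars))
a.hp.first <- anova(lm(mpg ~ hp + wt, data = mtcars))

## SS for wt when it is entered 1st versus 2nd
a.wt.first["wt", "Sum Sq"]
a.hp.first["wt", "Sum Sq"]

## The total SS explained by the two predictors does not depend on order
sum(a.wt.first[c("wt", "hp"), "Sum Sq"])
sum(a.hp.first[c("wt", "hp"), "Sum Sq"])
```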

ANOVA table (type III)

library(car)
Anova(lm.full, type = 3)

The output columns are the marginal SS and the degrees of freedom.
Multi-category variables are tested as one.

Type III = Marginal SS

[Figure: overlapping circles for 1 age, 2 lwt, 3 smoke. In type III, the 1st, the 2nd, and the last variables each get only their margin.]
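
Marginal SS do not depend on the order of the terms. The sketch below uses base R's drop1() rather than car::Anova(), so it runs without extra packages; for a model without interaction terms, drop1() tests each term after all the others, which is the marginal idea described here. The built-in mtcars data is a stand-in for lbw.

```r
## Same two predictors, entered in different orders
fit.wt.first <- lm(mpg ~ wt + hp, data = mtcars)
fit.hp.first <- lm(mpg ~ hp + wt, data = mtcars)

## Drop each term in turn from the full model
d1 <- drop1(fit.wt.first, test = "F")
d2 <- drop1(fit.hp.first, test = "F")

## Marginal SS for wt is the same in both orderings
d1["wt", "Sum of Sq"]
d2["wt", "Sum of Sq"]

## The last term in a sequential (type I) table already gets only its margin
anova(fit.wt.first)["hp", "Sum Sq"]
d1["hp", "Sum of Sq"]
```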

Comparison

Type I Type III

Effect plot

library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))

ylim = c(2000,4000): fix Y-axis values for all plots
Each plot shows the effect of a variable with other covariates set at average.
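
The idea of "other covariates set at average" can be sketched by hand with predict(). This is a hand-rolled approximation for numeric covariates on the built-in mtcars data, not the effects package's own implementation.

```r
## Stand-in model on built-in data
fit <- lm(mpg ~ wt + hp, data = mtcars)

## Vary wt over a grid while holding hp at its sample mean
grid <- data.frame(wt = seq(2, 5, by = 1), hp = mean(mtcars$hp))
grid$fitted <- predict(fit, newdata = grid)
grid

## Without interactions, each unit step in wt changes the fitted value
## by the wt coefficient
diff(grid$fitted)
coef(fit)["wt"]
```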

This model is for demonstration purposes.

lm.full.int <- lm(bwt ~ age*lwt + smoke +
                  ht + ui + age*ftv.cat + race.cat*preterm,
                  data = lbw)

age*lwt: Continuous * Continuous
age*ftv.cat: Continuous * Categorical
race.cat*preterm: Categorical * Categorical
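
To see which terms this formula expands to, terms() can be applied to the formula alone, with no data or fitting needed. The object name f is introduced only for this sketch.

```r
## The demonstration formula from the slide
f <- bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm

## Main effects and interaction terms after expansion; age appears in two
## products but is listed only once
tl <- attr(terms(f), "term.labels")
tl
```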

The ANOVA table for this model shows the marginal SS and the degrees of freedom, including rows for the interaction terms.


Continuous * Continuous (the panel strips show the lwt level)
plot(effect("age:lwt", lm.full.int))

Continuous * Categorical
plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)

Categorical * Categorical
plot(effect("race.cat*preterm", lm.full.int),
     x.var = "preterm", z.var = "race.cat", multiline = TRUE)
