Event link: http://www.meetup.com/NYC-Open-Data/events/161342472/
A free R workshop given by SupStat Inc. at the New York R User Group and the NYC Open Data Meetup group
Introduction to Mixed Effect Models for Data Analysis
1. Introduction of Mixed effect model
http://nycdatascience.com/mixed_effect_model_supstat/index.html#1
Introduction of Mixed effect model
Learning by simulation
Supstat Inc.
1 of 34
1/29/14, 10:51 PM
Outline
· What is a mixed effect model
· Fixed effect model
· Mixed effect model
- Random Intercept model
- Random Intercept and Slope Model
· General Mixed effect model
· Case study
What is a mixed effect model
Classical normal linear model
Formulation:
Yi = b0 + b1*Xi + ei
· Yi is the response from subject i.
· Xi are the covariates.
· b0, b1 are the parameters that we want to estimate.
· ei are the random terms in the model, and are assumed to be independently and identically
distributed as Normal(0, sigma^2). It is very important that there is no structure in ei; it
represents the variation that could not be controlled in our study.
Violation of the independence assumption
In many cases, responses are not independent of each other. Such data usually have some
cluster structure.
· Repeated measures, where measurements are taken multiple times from the same subjects.
(clustered by subject)
· A survey of all the family members. (clustered by family)
· A survey of students from 20 classrooms in a high school. (clustered by classroom)
· Longitudinal data, also known as panel data, where several responses are collected from the
same subjects over time. (clustered by subject)
We need a new tool: the mixed effect model.
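To make the dependence concrete, here is a minimal simulation sketch. It is not from the slides: the cluster count, the standard deviations, and all variable names are assumptions chosen for illustration. Two observations that share a cluster effect are strongly correlated, while pairing observations from different clusters removes the correlation.

```r
set.seed(42)
n_clusters <- 2000                                      # assumed number of clusters
cluster_effect <- rnorm(n_clusters, mean = 0, sd = 2)   # shared by both members of a cluster

# two observations per cluster: shared effect plus independent noise
obs1 <- cluster_effect + rnorm(n_clusters, mean = 0, sd = 1)
obs2 <- cluster_effect + rnorm(n_clusters, mean = 0, sd = 1)

# correlation within clusters; in theory 2^2 / (2^2 + 1^2) = 0.8
within_cor <- cor(obs1, obs2)

# shuffling obs2 pairs observations from different clusters
between_cor <- cor(obs1, sample(obs2))

print(c(within = within_cor, between = between_cor))
```

A classical linear model would treat all of these observations as independent, which is what the within-cluster correlation contradicts.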
Mixed effect model
Mixed effect model = Fixed effects + Random effects
· Fixed effects
- expected to have a systematic and predictable influence on your data.
- exhaust “the levels of a factor”. Think of sex (male/female).
· Random effects
- expected to have a non-systematic, unpredictable, or “random” influence on your data.
- Random effects have factor levels that are drawn from a large population, but we do not
know exactly how or why they differ.
Example of Fixed effects and Random effects
FIXED EFFECTS                  RANDOM EFFECTS
Male or female                 Individuals with repeated measures
Insecticide sprayed or not     Block within a field
Upland or lowland              Brood
One country versus another     Split plot within a plot
Wet versus dry                 Family
Fixed effect model
Fixed effect model
The fixed effect model is just the linear model that you may already know.
Yi = b0 + b1*Xi + ei
1 <= i <= n, where n is the number of samples
· Yi: Response Variable
· b0: fixed intercept
· b1: fixed slope
· Xi: Explanatory Variable (fixed effect)
· ei: noise (error)
Data generation of fixed effect model
set.seed(1)
# generate x
x <- seq(1,5,length.out=100)
# generate error
noise <- rnorm(n=100,mean=0,sd=1)
b0 <- 1
b1 <- 2
# generate y
y <- b0 + b1*x + noise
Data generation of fixed effect model
plot(y~x)
Coefficient estimation of fixed effect model
model <- lm(y~x)
summary(model)
Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.3401 -0.6058  0.0155  0.5851  2.2975

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1424     0.2491    4.59  1.3e-05 ***
x             1.9888     0.0774   25.70  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.903 on 98 degrees of freedom
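Because the data were simulated with b0 = 1 and b1 = 2, one natural follow-up is to check whether the fitted model recovers those values. The sketch below is an addition rather than part of the original slides: it repeats the simulation from the data generation slide and uses `confint()` from base R to compute 95% confidence intervals. The names `truth`, `ci`, and `covers` are introduced here for illustration.

```r
set.seed(1)
x <- seq(1, 5, length.out = 100)
noise <- rnorm(n = 100, mean = 0, sd = 1)
b0 <- 1
b1 <- 2
y <- b0 + b1 * x + noise

model <- lm(y ~ x)

# 95% confidence intervals for the intercept and the slope
ci <- confint(model, level = 0.95)
print(ci)

# does each interval contain the value used in the simulation?
truth <- c(b0, b1)
covers <- ci[, 1] <= truth & truth <= ci[, 2]
print(covers)
```

Simulating from known parameters and then checking the fit is the "learning by simulation" idea used throughout the deck.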
plot of fixed effect model
plot(y~x)
abline(model)
Mixed effect model
Random Intercept model
There are several people, indexed by i, and each person is measured repeatedly, with
measurements indexed by j. The people differ from one another in ways we do not observe, so there
is a random effect caused by the person, plus separate random noise in each measurement.
Yij = b0 + b1*Xij + bi + eij
· b0: fixed intercept
· b1: fixed slope
· Xij: explanatory variable (fixed effect)
· bi: random effect of person i (shifts the intercept)
· eij: noise
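The slide leaves the distributions of the two random terms implicit. A common way to write them, and the one the simulation in the following slides uses (normal draws with standard deviation 10 for bi and 2 for the noise), is sketched below. The symbols sigma_b and sigma are notation introduced here, not taken from the slides.

```latex
Y_{ij} = b_0 + b_1 X_{ij} + b_i + e_{ij}, \qquad
b_i \sim \mathcal{N}(0, \sigma_b^2), \qquad
e_{ij} \sim \mathcal{N}(0, \sigma^2), \qquad
b_i \text{ independent of } e_{ij}

\operatorname{Corr}(Y_{ij}, Y_{ij'}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}
\quad (j \neq j')
```

Under these assumptions, two measurements on the same person are correlated because they share bi. With sigma_b = 10 and sigma = 2 the correlation is 100/104, about 0.96.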
Data generation of Random Intercept model
b0 <- 9.9
b1 <- 2
# number of repeated measurements for each of 6 people
n <- c(13, 14, 14, 15, 12, 13)
npeople <- length(n)
set.seed(1)
# generate x(fixed effect)
x <- matrix(rep(0, length=max(n) * npeople),ncol = npeople)
for (i in 1:npeople){
x[1:n[i], i] <- runif(n[i], min = 1, max = 5)
x[1:n[i], i] <- sort(x[1:n[i], i])
}
# random effect
bi <- rnorm(npeople, mean = 0, sd = 10)
Data generation of Random Intercept model
xall <- NULL
yall <- NULL
peopleall <- NULL
for (i in 1:npeople){
xall <- c(xall, x[1:n[i], i]) # combine x
# generate y
y <- rep(b0 + bi[i], length = n[i]) +
b1 * x[1:n[i],i] +
rnorm(n[i], mean = 0, sd = 2) # noise
yall <- c(yall, y) # combine y
people <- rep(i, length = n[i])
peopleall <- c(peopleall, people)
}
# final dataset
data1 <- data.frame(yall,peopleall,xall)
Coefficient estimation of Random Intercept model
library(nlme)
# xall is the fixed effect
# the random effect bi shifts the intercept for each person (peopleall)
lme1 <- lme(yall~xall,random=~1|peopleall,data=data1)
summary(lme1)
Linear mixed-effects model fit by REML
 Data: data1
  AIC BIC logLik
  358 368   -175

Random effects:
 Formula: ~1 | peopleall
        (Intercept) Residual
StdDev:         7.3     1.77

Fixed effects: yall ~ xall
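The printed summary can be long, so it is often handy to pull out individual pieces of a fitted `lme` object. The sketch below is an addition, not part of the slides. To stay self-contained it swaps in the `Orthodont` dataset that ships with nlme instead of `data1`, and it uses the nlme accessors `fixef()`, `ranef()`, and `VarCorr()`.

```r
library(nlme)

# Orthodont: distance measured at several ages for each Subject (bundled with nlme);
# used here only as a stand-in for the simulated data1 from the slides
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

fixed_est  <- fixef(fit)     # estimated fixed intercept and slope
random_est <- ranef(fit)     # predicted random intercept for each subject
var_comp   <- VarCorr(fit)   # variance and SD of the random intercept and the residual

print(fixed_est)
print(head(random_est))
print(var_comp)
```

The same three calls should work on `lme1` from the slide, with `peopleall` playing the role of `Subject`.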
Plot of Random Intercept model
Random Intercept and slope model
Yij = b0 + (b1+si)*Xij + bi + eij
· b0: fixed intercept
· b1: fixed slope
· Xij: explanatory variable (fixed effect)
· bi: random effect of person i (shifts the intercept)
· eij: noise
· si: random effect of person i (shifts the slope)
Data generation of Random Intercept and slope model
a0 <- 9.9
a1 <- 2
n <- c(12, 13, 14, 15, 16, 13)
npeople <- length(n)
set.seed(1)
si <- rnorm(npeople, mean = 0, sd = 0.5) # random slope
x <- matrix(rep(0, length = max(n) * npeople),
ncol = npeople)
for (i in 1:npeople){
x[1:n[i], i] <- runif(n[i], min = 1,
max = 5)
x[1:n[i], i] <- sort(x[1:n[i], i])
}
Data generation of Random Intercept and slope model
bi <- rnorm(npeople, mean = 0, sd = 10) # random intercept
xall <- NULL
yall <- NULL
peopleall <- NULL
for (i in 1:npeople){
xall <- c(xall, x[1:n[i], i])
y <- rep(a0 + bi[i], length = n[i]) +
(a1 + si[i]) * x[1:n[i],i] +
rnorm(n[i], mean = 0, sd = 0.5)
yall <- c(yall, y)
people <- rep(i, length = n[i])
peopleall <- c(peopleall, people)
}
# generate final dataset
data2 <- data.frame(yall, peopleall, xall)
Coefficient estimation of Random Intercept and slope model
# bi influences the intercept and si the slope of the model
lme2 <- lme(yall~xall,random=~1+xall|peopleall,data=data2)
print(summary(lme2))
Linear mixed-effects model fit by REML
 Data: data2
  AIC BIC logLik
  179 194  -83.6

Random effects:
 Formula: ~1 + xall | peopleall
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev Corr
(Intercept) 11.593 (Intr)
xall         0.464 0.044
Residual     0.445
Plot of Random Intercept and slope model
What if we just use a linear model?
· complete pooling
# wrong estimation
lm1 <- lm(yall~xall,data=data2)
summary(lm1)
Call:
lm(formula = yall ~ xall, data = data2)

Residuals:
   Min     1Q Median     3Q    Max
-17.80  -6.27  -3.67   2.19  24.33

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     6.86       3.72    1.84  0.06874 .
xall            4.31       1.15    3.76  0.00032 ***
What if we just use a linear model?
· no pooling
# wrong estimation: it wastes too many degrees of freedom, and we don't care about the exact differences of pe
lm2 <- lm(yall~xall+factor(peopleall)+xall*factor(peopleall),data=data1)
summary(lm2)
Call:
lm(formula = yall ~ xall + factor(peopleall) + xall * factor(peopleall),
    data = data1)

Residuals:
   Min     1Q Median     3Q    Max
-2.983 -1.194  0.054  1.092  4.238

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   18.818      1.342   14.02  < 2e-16 ***
xall           0.929      0.413    2.25    0.028 *
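One symptom of complete pooling can be shown with a small side-by-side fit. The sketch below is an addition with its own simulated data, not `data1` or `data2` from the slides. The equal number of measurements per person (`nper`) and all variable names are assumptions. It compares the residual standard deviation of a pooled `lm` fit with that of a random intercept `lme` fit when the true noise standard deviation is 2.

```r
library(nlme)

set.seed(1)
npeople <- 6
nper <- 13                                  # assumed: equal measurements per person
person <- rep(1:npeople, each = nper)
x <- runif(npeople * nper, min = 1, max = 5)
bi <- rnorm(npeople, mean = 0, sd = 10)     # person-level random intercepts
y <- 9.9 + 2 * x + bi[person] + rnorm(npeople * nper, mean = 0, sd = 2)
sim <- data.frame(y = y, x = x, person = person)

pooled <- lm(y ~ x, data = sim)                           # complete pooling
mixed  <- lme(y ~ x, random = ~ 1 | person, data = sim)   # random intercept model

pooled_sd <- summary(pooled)$sigma   # also absorbs the between-person variation
mixed_sd  <- mixed$sigma             # within-person residual SD only

print(c(pooled = pooled_sd, mixed = mixed_sd))
```

The pooled fit has nowhere to put the differences between people except the residual, so its residual standard deviation is inflated, while the mixed model separates the two sources of variation.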
General Mixed effect model
Logistic Mixed effect model
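P(Yij = 1) = exp(eta_ij)/(1+exp(eta_ij))
eta_ij = b0 + b1*Xij + bi
· Yij: binary response (0 or 1)
· b0: fixed intercept
· b1: fixed slope
· Xij: explanatory variable (fixed effect)
· bi: random effect of person i (shifts the intercept)
· There is no separate eij noise term: the randomness in Yij comes from the Bernoulli draw with probability P(Yij = 1).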
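The transform exp(eta)/(1+exp(eta)) in the formula above is the inverse logit, which base R provides as `plogis()`. The short sketch below is an addition with arbitrary example values of eta. It checks that the two forms agree and then draws binary responses from the resulting probabilities with `rbinom()`.

```r
eta <- c(-6, -2, 0, 2, 6)                # arbitrary linear predictor values

p_manual  <- exp(eta) / (1 + exp(eta))   # formula as written on the slide
p_builtin <- plogis(eta)                 # base R inverse logit

print(all.equal(p_manual, p_builtin))

# one Bernoulli draw per probability gives a binary response
set.seed(1)
y <- rbinom(n = length(eta), size = 1, prob = p_builtin)
print(y)
```

For very large positive eta, `exp(eta)` overflows and the manual formula returns NaN, so `plogis()` is the safer choice in practice.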
Data generation of Logistic Mixed effect model
b0 <- - 6
b1 <- 2.1
set.seed(1)
n <- c(12, 13, 14, 15, 16, 13)
npeople <- length(n)
x <- matrix(rep(0, length = max(n) * npeople),
ncol = npeople)
bi <- rnorm(npeople, mean = 0, sd = 1.5)
for (i in 1:npeople){
x[1:n[i], i] <- runif(n[i], min = 1,max = 5)
x[1:n[i], i] <- sort(x[1:n[i], i])
}
Data generation of Logistic Mixed effect model
xall <- NULL
yall <- NULL
peopleall <- NULL
for (i in 1:npeople){
xall <- c(xall, x[1:n[i], i])
y <- NULL
for(j in 1:n[i]){
eta1 <- b0 + b1 * x[j, i] + bi[i]
y <- c(y, rbinom(n = 1, size = 1,
prob = exp(eta1)/(exp(eta1) + 1)))
}
yall <- c(yall, y)
people <- rep(i, length = n[i])
peopleall <- c(peopleall, people)
}
data3 <- data.frame(xall, peopleall,yall)
Coefficient estimation of Logistic Mixed effect model
library(lme4)
# lme4 puts the random effect inside the formula, as (1|peopleall)
lmer3 <- glmer(yall~xall+(1|peopleall),data=data3,family=binomial)
print(summary(lmer3))
Generalized linear mixed model fit by maximum likelihood ['glmerMod']
 Family: binomial ( logit )
Formula: yall ~ xall + (1 | peopleall)
   Data: data3

     AIC      BIC   logLik deviance
    69.8     77.1    -31.9     63.8

Random effects:
 Groups    Name        Variance Std.Dev.
 peopleall (Intercept) 3.94     1.98
Number of obs: 83, groups: peopleall, 6
Plot of Logistic Mixed effect model
Case study